First, I apologize for the long post - this takes a bit of explanation.
I recently started a project with a new collaborator. The project is still in the proposal/contract phase, but I have some ethical concerns about it. I'm interested to hear what others would do in my situation.
The researcher I'm working with has requested that I keep the details of his project private, so I'm going to make up an application. Essentially, he's investigating the link (which he suspects is causal) between two physical processes. For discussion, let's say it's the link between levels of radiation exposure (to radon-222) and the occurrence of hiccups in adults in the United States. Based on graphical exploration of secondary data, he thinks there is evidence that certain patterns of radiation exposure (e.g., rapid increases in exposure, peaks of exposure, rapid decreases in exposure, etc.) make the occurrence of hiccups more likely. He's asked me to quantify the strength of the relationship between radiation exposure events and the occurrence of hiccups and to perform a test to show that it is significantly different from zero.
During an initial meeting, I asked him to explain his theory on why these two phenomena should be related. He said he's not an expert in radiation (or physiology) and doesn't have any theory on why they should be related. One of his main goals for the project is to get the results published (assuming there is a quantifiable relationship) so that researchers who are experts in these fields can start developing theories to explain the observed relationship.
My main concern is that he identified the radiation events of interest by looking at the same data he wants me to analyze (since he has no governing theory). So, for example, he saw that hiccups occur at almost every peak in radiation, so he labeled peaks as a radiation event of interest. But some hiccups occur when radiation exposure decreases rapidly, so he labeled rapid decreases as an event of interest as well. I asked him to create an operational definition for each of the radiation events he believes is related to hiccups, and he did that (e.g., a peak time is the first time at which the measured level of radon-222 attains its maximum value after exceeding a threshold of 200 Bq/m^3 and before returning to a value below 200 Bq/m^3, and it is counted as an event of interest for 5 time points before and after the peak time). For the analysis, I would just calculate the strength of association between two binary variables (occurrence of one of the radiation events during a time interval and occurrence of hiccups during that time interval).
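To make the planned calculation concrete, here is a minimal sketch in Python of the peak definition above and of one possible association measure. The function names, the toy data, and the choice of the phi coefficient are my own assumptions for illustration, not something the client specified. I have also assumed that an excursion consists of consecutive readings strictly above the threshold, which is one reading of "exceeding" and "returning below".

```python
from math import sqrt


def peak_event_flags(levels, threshold=200.0, half_width=5):
    """Flag time points within half_width of each excursion's peak time.

    An excursion is a run of consecutive readings strictly above the
    threshold (an assumption about how ties at the threshold are handled).
    The peak time is the first index at which the run attains its maximum.
    """
    n = len(levels)
    flags = [False] * n
    i = 0
    while i < n:
        if levels[i] > threshold:
            # Walk to the end of this excursion above the threshold.
            j = i
            while j < n and levels[j] > threshold:
                j += 1
            segment = levels[i:j]
            peak = i + segment.index(max(segment))  # first time the max occurs
            lo = max(0, peak - half_width)
            hi = min(n, peak + half_width + 1)
            for t in range(lo, hi):
                flags[t] = True
            i = j
        else:
            i += 1
    return flags


def two_by_two(event, hiccup):
    """Return the cell counts (a, b, c, d) of the 2x2 table."""
    a = sum(1 for e, h in zip(event, hiccup) if e and h)
    b = sum(1 for e, h in zip(event, hiccup) if e and not h)
    c = sum(1 for e, h in zip(event, hiccup) if not e and h)
    d = sum(1 for e, h in zip(event, hiccup) if not e and not h)
    return a, b, c, d


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table; NaN if any margin is empty."""
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else float("nan")


if __name__ == "__main__":
    # Made-up readings in Bq/m^3 and made-up hiccup indicators.
    levels = [100, 150, 250, 300, 300, 220, 180, 120]
    hiccup = [False, False, True, True, False, False, False, True]
    event = peak_event_flags(levels, threshold=200.0, half_width=1)
    print(event, phi_coefficient(*two_by_two(event, hiccup)))
```

Nothing in this calculation knows how the event definition was arrived at, which is exactly why the number it produces cannot be taken at face value here.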
It seems like there's no harm in calculating the strength of association between the occurrence of his identified radiation events and whether or not hiccups occurred. However, the estimate is sure to be biased, since he's "cherry picking" the events by looking at the historical data. Is it unethical to numerically quantify the strength of the relationship under these circumstances? I've warned the client up front that to really get a good measure of the strength of the relationship we need to collect data prospectively (or find another secondary data source for verification purposes). However, I think he'll push forward with publication regardless of that warning.
Calculating a p-value for this seems really absurd to me, but the client is certain to push for it if I quantify the strength of the relationship. Should I avoid the project altogether since I suspect the client will either be dissatisfied (if I refuse to calculate a p-value) or misuse the results (if I do calculate a p-value)? Again, I warned the client (in the proposal) that a p-value is not valid in this situation, but I don't think that will head off the problem.
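To illustrate why the p-value worries me, here is a rough simulation sketch in Python. It is a deliberately simplified stand-in for what the client did: hiccups and several candidate "event" indicators are generated independently (so there is no true relationship), and the analyst reports whichever candidate looks most associated. The rates of 0.3 and 0.2, the four candidates, and the function names are all assumptions I made up for the illustration.

```python
import random
from math import erfc, sqrt


def chi2_p_value(x, y):
    """P-value of the 1-df Pearson chi-square test for two binary series."""
    n = len(x)
    a = sum(1 for e, h in zip(x, y) if e and h)
    b = sum(1 for e, h in zip(x, y) if e and not h)
    c = sum(1 for e, h in zip(x, y) if not e and h)
    d = n - a - b - c
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 1.0  # degenerate table: no evidence of association
    stat = n * (a * d - b * c) ** 2 / margins
    # Upper tail of chi-square with 1 df equals erfc(sqrt(stat / 2)).
    return erfc(sqrt(stat / 2.0))


def simulate(n_sims=1000, n_intervals=200, n_candidates=4, seed=1):
    """Return rejection rates at 0.05 for a prespecified vs. a picked event."""
    rng = random.Random(seed)
    prespecified_hits = 0
    picked_hits = 0
    for _ in range(n_sims):
        # Hiccups are independent of every candidate event (the null is true).
        hiccups = [rng.random() < 0.3 for _ in range(n_intervals)]
        p_values = []
        for _ in range(n_candidates):
            events = [rng.random() < 0.2 for _ in range(n_intervals)]
            p_values.append(chi2_p_value(events, hiccups))
        prespecified_hits += p_values[0] < 0.05   # definition fixed in advance
        picked_hits += min(p_values) < 0.05       # definition chosen after looking
    return prespecified_hits / n_sims, picked_hits / n_sims


if __name__ == "__main__":
    prespecified, picked = simulate()
    print("prespecified:", prespecified, "picked after looking:", picked)
```

With four independent candidates, the chance that at least one falls below 0.05 is about 1 - 0.95^4, or roughly 0.19, so the picked rate should land well above the nominal 5%. The client's actual selection process was less formal than this, so the real inflation is unknown, but it presumably goes in the same direction.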
Thanks!
Chris
-------------------------------------------
Christopher Holloman
Ohio State University
-------------------------------------------