A question of statistical ethics

  • 1.  A question of statistical ethics

    Posted 01-22-2014 14:32

    First, I apologize for the long post - this takes a bit of explanation.

    I recently started a project with a new collaborator.  The project is still in the proposal/contract phase, but I have some ethical concerns about it.  I'm interested to hear what others would do in my situation.

    The researcher I'm working with has requested that I keep the details of his project private, so I'm going to make up an application.  Essentially, he's investigating the link (which he suspects is causal) between two physical processes.  For discussion, let's say it's the link between levels of radiation exposure (to radon-222) and occurrence of hiccups in adults in the United States.  Based on graphical exploration of secondary data, he thinks there is evidence that certain patterns of radiation exposure (e.g., rapid increases in exposure, peaks of exposure, rapid decreases in exposure, etc.) make the occurrence of hiccups more likely.  He's asked me to quantify the strength of relationship between radiation exposure events and the occurrence of hiccups and to perform a test to show it is significantly different from zero.

    During an initial meeting, I asked him to explain his theory on why these two phenomena should be related.  He said he's not an expert in radiation (or physiology) and doesn't have any theory on why they should be related.  One of his main goals for the project is to get the results published (assuming there is a quantifiable relationship) so that researchers who are experts in these fields can start developing theories to explain the observed relationship.

    My main concern is that he identified the radiation events of interest by looking at the data he wants me to analyze (since he has no governing theory).  So, for example, he saw that hiccups occur at almost every occurrence of a peak in radiation, so he labeled peaks as a radiation event of interest.  But some occurrences of hiccups happen when radiation exposure decreases rapidly, so he labeled rapid decreases as an event of interest as well.  I asked him to create an operational definition for each of the radiation events he believes is related to hiccups, and he did that (e.g., a peak time is the first time of obtaining the maximum value of measured levels of radon-222 after exceeding a threshold of 200 Bq/m3 and before returning to a value below 200 Bq/m3, and is counted as an event of interest for 5 time points before and after the peak time).  For the analysis, I would just calculate the strength of association between two binary variables (occurrence of one of the radiation events during a time interval and occurrence of hiccups during that time interval).
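
    As a rough illustration of the operational definition and the binary association described above, here is a minimal Python sketch. The function names, the toy series, the handling of a reading exactly at the threshold, and the choice of the phi coefficient as the association measure are all assumptions made for illustration; none of them come from the client's actual project.

```python
from math import sqrt
from typing import List, Tuple


def peak_event_flags(levels: List[float], threshold: float = 200.0,
                     window: int = 5) -> List[bool]:
    """Flag every time point within `window` steps of a peak time.

    An excursion starts when a reading exceeds `threshold` and ends when a
    reading falls below it. The peak time is the first time the maximum is
    reached within the excursion.
    """
    n = len(levels)
    flags = [False] * n
    i = 0
    while i < n:
        if levels[i] > threshold:
            j = i
            # Assumption: a reading exactly at the threshold stays in the excursion.
            while j < n and levels[j] >= threshold:
                j += 1
            # max() returns the first maximal element, i.e. the first peak time.
            peak = max(range(i, j), key=lambda k: levels[k])
            for k in range(max(0, peak - window), min(n, peak + window + 1)):
                flags[k] = True
            i = j
        else:
            i += 1
    return flags


def two_by_two(event: List[bool], hiccup: List[bool]) -> Tuple[int, int, int, int]:
    """Return counts (event & hiccup, event only, hiccup only, neither)."""
    a = b = c = d = 0
    for e, h in zip(event, hiccup):
        if e and h:
            a += 1
        elif e:
            b += 1
        elif h:
            c += 1
        else:
            d += 1
    return a, b, c, d


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient of a 2x2 table; NaN if any margin is zero."""
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else float("nan")


# Toy data (hypothetical): radon readings and hiccup indicators per interval.
levels = [100, 150, 210, 260, 260, 220, 190, 100]
hiccups = [False, False, True, True, False, False, False, True]
events = peak_event_flags(levels, window=1)
print(two_by_two(events, hiccups), phi_coefficient(*two_by_two(events, hiccups)))
```

    The sketch only computes a descriptive measure. It does nothing to address the concern that the event definitions were chosen after looking at the same data.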

    It seems like there's no harm in calculating the strength of association between the occurrence of his identified radiation events and whether or not hiccups occurred.  However, it's sure to be biased since he's "cherry picking" the events by looking at the historical data.  Is it unethical to numerically quantify the strength of the relationship under these circumstances?  I've warned the client up-front that to really get a good measure of the strength of the relationship we need to prospectively collect data (or find another secondary data source for verification purposes).  However, I think he'll push forward with publication regardless of that warning.

    Calculating a p-value for this seems really absurd to me, but the client is certain to push for it if I quantify the strength of the relationship.  Should I avoid the project altogether since I suspect the client will either be dissatisfied (if I refuse to calculate a p-value) or misuse the results (if I do calculate a p-value)?  Again, I warned the client (in the proposal) that a p-value is not valid in this situation, but I don't think that will head off the problem.

    Thanks!

    Chris

    -------------------------------------------
    Christopher Holloman
    Ohio State University
    -------------------------------------------


  • 2.  RE:A question of statistical ethics

    Posted 01-22-2014 14:50

    Hello Christopher,

    Well, you know: lies, damned lies, and statistics.  If you are concerned about the validity of the design or other aspects of a research project, then you don't have to take it on.  It does sound like your client is cherry picking at best.

    People will try to make cases for anything, especially with correlation.  There is almost perfect direct correlation between the sun rising and roosters crowing, but the roosters don't make the sun rise...and yes, I know the other way around works.

    I also try to avoid clients who need something tomorrow, or have a huge set of problems to address. Because I have enough problems of my own, and I don't need to take responsibility for problems that other people made before I came along! :)

    Just some things to think about.  Follow your gut.

    Best,

    Elaine 

    -------------------------------------------
    Elaine Eisenbeisz
    Owner and Principal Statistician
    Omega Statistics
    -------------------------------------------








  • 3.  RE:A question of statistical ethics

    Posted 01-22-2014 14:59
    Chris,

    Possible guidance can come from ASA's Ethical Guidelines for Statistical Practice, http://www.amstat.org/about/ethicalguidelines.cfm

    Ask yourself this: If he published the results with a big, fat acknowledgment to you for the statistical work, will you feel proud or embarrassed?

    Clearly a p-value is meaningless in this case.  It fails to measure any meaningful probability because the hypotheses were determined after looking at the data.  Providing an uncorrected p-value in such a case seems borderline unethical when you know that it will be misused.  I don't know how to correct the p-value to account for the data snooping, similar to a Scheffe analysis of linear contrasts, when the test is a Pearson or Fisher test on a table specially selected from the data.
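
    To make the data-snooping point concrete, here is a small simulation sketch in Python. It assumes a simplified form of snooping: several candidate event definitions, all generated independently of the outcome, with the analyst keeping whichever looks strongest and then testing it at the nominal 5% level. All names and parameter values are hypothetical, and the data are simulated rather than drawn from the project under discussion.

```python
import random

CHI2_CRIT_1DF_05 = 3.841  # upper 5% point of a chi-square with 1 df


def chi2_stat(x, y):
    """Pearson chi-square statistic (no continuity correction) for two binary lists."""
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def false_positive_rate(n_candidates, n_obs=100, n_sims=1000, seed=0):
    """Share of simulations where the best-looking candidate is 'significant'.

    Every candidate is independent of the outcome, so each rejection is a
    false positive.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        outcome = [rng.random() < 0.3 for _ in range(n_obs)]
        best = max(
            chi2_stat([rng.random() < 0.3 for _ in range(n_obs)], outcome)
            for _ in range(n_candidates)
        )
        hits += best > CHI2_CRIT_1DF_05
    return hits / n_sims


print("one pre-specified definition:", false_positive_rate(1))
print("best of ten definitions:     ", false_positive_rate(10))
```

    With a single pre-specified definition the rejection rate should sit near the nominal 5%. With ten candidates to choose from it should be far higher, on the order of 1 - 0.95^10, or about 0.40, if the candidates behaved like independent tests. Real snooping is less tidy than picking from a fixed list, which is part of why no simple correction applies.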

    I would decline, and send him to Stan Young's work, an easy example of which is here: http://errorstatistics.com/2013/03/11/s-stanley-young-scientific-integrity-and-transparency/ .  

    Good luck.

    -Tom.

    -------------------------------------------
    Thomas Loughin
    Simon Fraser University
    -------------------------------------------








  • 4.  RE:A question of statistical ethics

    Posted 01-22-2014 15:06

    Hi Chris,
    I completely agree with you. I would tell him that a statistical test is not warranted since there was no predesigned experiment and no a priori hypothesis. Perhaps advise him to write up his serendipitous observation as a letter to the editor, explaining everything he did to make the association, and leave it at that.
    Bonita

    -------------------------------------------
    Bonita Singal
    Associate Director for Clinical Research
    St. Joseph Mercy Health System
    -------------------------------------------








  • 5.  RE:A question of statistical ethics

    Posted 01-22-2014 15:12
    My view is that it is OK to do what your colleague proposes, provided that you are completely open about what you are doing.

    The results won't be very secure, but still might be useful. Exploring data and noting things about it is part of statistics.  The real danger is that the media will quote your results without your qualifications. This happens a lot. If the real result (as opposed to your invented example) is highly controversial or likely to cause harm, then you might consider the idea that bad data can be worse than no data.

    However..... there are ways to strengthen the research quite a bit. If the radiation events are relatively common, then you can continue to collect data and see if the pattern continues. Or if you can get data from another location, that could work too.

    That's my .02.

    Peter

    -------------------------------------------
    Peter Flom
    -------------------------------------------








  • 6.  RE:A question of statistical ethics

    Posted 01-22-2014 16:01
    I'd be inclined to agree with Peter that it's okay to proceed. I would stand firm on defining the methodology for "variable selection"...i.e. make sure the reader knows you (he) did a healthy round of "exploratory data analysis"...which pointed to radiation changes, etc. Science starts with the observation of a phenomenon...followed by gut checks of varying rigor. I think you can make responsible forward progress in this scenario.

    I would also encourage you to attempt to find internal follow-up hypothesis tests which challenge the finding with alternative explanations of the phenomenon...Often I will say something like: "Let's say that comes out significant, that's one sentence for your results...what else are you going to do? How do we kick the tires on this thing? What might make it look like that, when it's not true (confounding)?"

    ...trying to lead the client into a theoretical mode of thinking again.  Of course, a picture of a horse staring at a puddle comes to mind.

    I also find it rewarding when I can contribute to this process, myself, after conducting enough of a literature review to come up with some plausible theories of my own. This shows them you're invested in their project and may provide a stimulus where they may challenge what you propose on some grounds...perhaps even theoretical grounds! (gasp)

    ...and by all means...anything you can do to understand hiccups is worth the risk!  ;-)



    -------------------------------------------
    Jason T. Machan
    Director, Lifespan Biostatistics Core,
    Lifespan Hospital System
    Research Scientist, Biostatistics, Research
    Rhode Island Hospital
    Assistant Professor, Departments of Orthopaedics and Surgery
    The Warren Alpert Medical School, Brown University
    Director Biostatistics Externship, Adjunct Assistant Professor, Department of Psychology
    University of Rhode Island

    -------------------------------------------








  • 7.  RE:A question of statistical ethics

    Posted 01-22-2014 16:07
    I think looking to an epidemiological approach might give him first steps, i.e., if it is the change in radiation that provokes the hiccups, then the control would be data from the same region/time where radiation is stable.  This is how the work on smoking and lung disease started.  Epi might be able to provide a framework to explore association without implying causality.

    -------------------------------------------
    Janet McDougall
    President
    McDougall Scientific Ltd
    -------------------------------------------








  • 8.  RE:A question of statistical ethics

    Posted 01-22-2014 16:36
    There are well-established methods of the exploration-confirmation approach which should be considered.

    1) Splitting sample into 2/3-1/3. 2/3 for pattern, 1/3 for confirmation. This can be done by randomly selecting, and doing it several times.
    2) Determining whether additional data can be collected. The collection of confirmatory data which supports the initial observation considerably strengthens the case.
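
    The repeated 2/3-1/3 split in point 1 can be sketched as follows in Python. This is only an illustration: the function names, the use of the phi coefficient, and the idea of choosing among named candidate event definitions are assumptions, and the random split treats time intervals as exchangeable, which may not hold for time-ordered readings.

```python
import random
from math import sqrt


def phi(x, y):
    """Phi coefficient between two binary lists; 0.0 if a margin is empty."""
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


def split_indices(n, rng, explore_frac=2 / 3):
    """Randomly split range(n) into exploration and confirmation indices."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = round(n * explore_frac)
    return idx[:cut], idx[cut:]


def explore_then_confirm(candidates, outcome, n_repeats=20, seed=0):
    """Repeat the split; pick the pattern on 2/3, re-measure it on the held-out 1/3.

    `candidates` maps a (hypothetical) event-definition name to a binary list.
    Returns one (chosen name, exploration phi, confirmation phi) per repeat.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        explore, confirm = split_indices(len(outcome), rng)

        def phi_on(name, idx):
            return phi([candidates[name][i] for i in idx],
                       [outcome[i] for i in idx])

        # Selection uses the exploration part only.
        best = max(candidates, key=lambda name: abs(phi_on(name, explore)))
        results.append((best, phi_on(best, explore), phi_on(best, confirm)))
    return results
```

    If the confirmation values are consistently much weaker than the exploration values, that suggests the pattern owes a good deal to the selection step. The split is only clean if the event definitions were not already shaped by looking at the full data set.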

    Essentially, in my view of science, there are three steps. 1) What is the amount of X? 2) Is the amount of X related to the amount of Y? 3) Is the relationship of X and Y causal or non-arbitrary in some manner?

    Thus, establishing 2- a relationship - is important. The degree to which the relationship is arbitrary and based on data-driven (of the same data) categorization is clearly problematic, and one that you properly are leery of. Yet noting a relationship is not inappropriate.

    The SEM literature is a good place to look. With SEM, you collect a large amount of data and compute covariances. You attempt to determine the support of the covariance structure for an a priori structure. Problems arise when 1) there is little or no a priori structure - this is the factor analysis paradigm or 2) the structure obtained is different in minor but theoretically important ways.

    -------------------------------------------
    Paul Thompson
    Director, Methodology and Data Analysis Center
    Sanford Research/USD
    -------------------------------------------








  • 9.  RE:A question of statistical ethics

    Posted 01-22-2014 17:10
    This seems like a good example of the obsession with null hypothesis tests and p-values, especially in publishing, not serving the need for data-driven, exploratory research very well.

    I don't see anything unethical about quantifying the association between these phenomena, and I'm not sure the p-value is completely meaningless here. A non-significant p-value would mean something, at least. Question for the group: Could the p-value be presented simply as a measure of inconsistency between the sample data and the (a posteriori) null hypothesis?

    The key, I think, would be to communicate openly about the exploratory nature of the study, and it sounds like you may have limited influence on that front. 

    This reminds me of the practice of choosing regressors based on their correlation with the response variable. 

    -------------------------------------------
    Vincent Staggs
    Research Assistant Professor
    Department of Biostatistics
    University of Kansas Medical Center
    -------------------------------------------







  • 10.  RE:A question of statistical ethics

    Posted 01-22-2014 17:21
    Ramsey and Schafer (1997) have an "Example of a Hypothesis Based on How the Data Turned Out" in section 6.5.2 of their book, The Statistical Sleuth, that examines this very issue. It speaks to the dangers of testing a hypothesis generated after observing the data. It is also written in such a way that I'm sure your client would understand it.

    Regarding the ethics of agreeing to calculate a p-value knowing that it is very likely to be misconstrued and/or misused, I feel it is incumbent on us as statisticians to not commit the same errors we advise non-statisticians to avoid. If the client insists that it be done, let them find someone else to do it.

    Ramsey, F. L., and D. W. Schafer. 1997. The Statistical Sleuth. Duxbury Press, Belmont, CA, USA.


    -------------------------------------------
    Manuela Huso
    Research Statistician
    US Geological Survey
    -------------------------------------------








  • 11.  RE:A question of statistical ethics

    Posted 01-22-2014 17:28

    I see that you are from an academic organization, so I recommend that you convince the investigator to put forth a protocol for expedited review by the IRB.  The reasons for this include:  clarification of their ideas in a written protocol, justification of the hypothesis with supporting evidence from a lit search, drafting the statistical methods section, and overall justification of the project, which is needed when submitting the manuscript.  Other benefits include having formally articulated the hypothesis ('ownership' by investigator) and review by the multidisciplinary team of IRB members, who can possibly better clarify the mechanism of action. The protocol will also serve as a foundation for the manuscript, and if the IRB approves, you can feel more comfortable going ahead with the analysis. Also, a journal would be more likely to accept a research paper if the project had IRB approval.

    Also, from my experience designing studies and analyzing data in an academic medical center setting, it is rare that there are no potential confounding factors that enter into this 'bivariate' relationship--a lit search by both you and the investigator will likely help determine this.


    -------------------------------------------
    Katherine Freeman
    President
    EXTRAPOLATE
    -------------------------------------------








  • 12.  RE:A question of statistical ethics

    Posted 03-12-2014 13:59
    As I get older, I get more mellow about things like this. Calculate the p-value and then warn the client to be sure to mention in any report or publication that the hypothesis was generated post hoc. Stress that failure to state that this was a post hoc hypothesis leaves him/her open to a charge of fraud. Be sure that your client knows and appreciates that anything that mentions your name without your prior review also constitutes fraud.

    The sin in research is not in publishing something with lots of limitations. The sin is in pretending that it is something more than it really is. As long as the report clearly states the post hoc nature of the hypothesis and discusses the limitations associated with it, you should be fine. Use phrases like "this is an exploratory effort" or "the results need to be verified in a separate data set" or whatever. Some naive people will ignore your limitations, but if we all refused to publish articles that might be misinterpreted by naive people who ignore limitations, nothing would get published.

    If you find out after the fact that something was published with your name on it and without you reviewing it, you have plenty of options to pursue. Write to the journal and talk to the ethics or compliance officer at the place where your client works.

    I'd only turn down the work if you thought this person was a loose cannon who wouldn't mind getting in trouble with everyone involved in the process. Most people are comfortable with reasonable requests to review things. Most people will also respect your request to leave your name out if you can't agree on appropriate language in the report.

    -------------------------------------------
    Stephen Simon
    Independent Statistical Consultant
    P. Mean Consulting
    -------------------------------------------








  • 13.  RE:A question of statistical ethics

    Posted 01-23-2014 13:13

    My experience with such post hoc hypotheses is very similar to that of Simon.  However, I would say that such situations tend to come up in the first 5 years or so of a consulting business.  Once one has been in the business a while, one can make it clear upfront that such results won't be accommodated or supported.  For instance, in that same direction, I have clients that want traditional approaches on non-random samples, but the proper approaches demand randomization/permutation test approaches.  I charge most for such, but the data demands it.  I may look at some things in parallel, but ethics drives a lot of what we do!

    -------------------------------------------
    William Seaver
    -------------------------------------------