ASA Connect

  • 1.  Single-cell analyses - suggestions requested

    Posted 02-21-2024 12:47

    I am working with a researcher who has done single-cell flow cytometry. She has 6 flasks with 3 different treatments (each treatment repeated in 2 flasks), and each flask contains 5,000 cells. This setup was then repeated, for 12 total flasks. The flow cytometry intensities were "normalized" by min-max scaling within each flask.

    The flasks are obviously (to me) the experimental unit, but I am unsure about the best way to compare intensities between treatments. Is this a case for a mixed model with flask as a random effect? The researcher is concerned that this type of analysis wipes out the advantage of single-cell analysis (60,000 data points vs. 12 experimental units). The measurements in each flask are strongly right-skewed, and a log transformation still leaves some heavy tails.

    Thoughts? Suggestions? Recommended manuscripts?


    Barbara Graham
    Colorado State University

  • 2.  RE: Single-cell analyses - suggestions requested

    Posted 02-22-2024 09:14

    One simple, perhaps old-fashioned, step would be to compare the responses between flasks that received the same treatment and were handled similarly. If they are shown to be different, the argument for treating the flask as the unit, or for some kind of mixed-model approach, is strengthened.

    If flasks within a condition are not significantly different, one could make an argument for treating flask as a fixed factor, which preserves the advantage of many cells.

    This is what we would have done 30 years ago! But even today, I am concerned that with only a few replications of each condition (and would the second set also have to be treated as a random factor, giving an n of 2 per design cell?) even our fine modern methods are going to fail. Mixed models depend on having enough units to meaningfully estimate between-group variability. But this is not my expertise, so maybe there are enough here across 6 groups of 2.


    Edward Gracely
    Associate Professor
    Drexel University

  • 3.  RE: Single-cell analyses - suggestions requested

    Posted 02-23-2024 13:31

    Dear Barbara:

    I would consider the cells as the unit of observation, so yes, you have tens of thousands of them. However, you have clustered data, with flask being the cluster, i.e., the measurements within the same flask are correlated.

    One option is, as you say, a mixed-effects model. The random effect (RE) for flask will induce the within-flask correlation.
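
    In symbols, a minimal sketch of such a random-intercept model looks like the following. The notation is an assumption for illustration and does not come from the thread: $i$ indexes flasks, $j$ indexes cells within a flask, $t(i)$ is the treatment applied to flask $i$, and the normal distributions are the usual default rather than something checked against these data.

```latex
y_{ij} = \mu + \tau_{t(i)} + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma_e^2)

\operatorname{Cov}(y_{ij}, y_{ik}) = \sigma_b^2 \quad (j \neq k),
\qquad
\operatorname{Corr}(y_{ij}, y_{ik}) = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_e^2}
```

    Any two cells in the same flask share the same $b_i$, so under these assumptions they have a common correlation (the intraclass correlation), while cells in different flasks are uncorrelated.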

    Another option is to use a multivariate regression model, with flask as the independent unit and the cells within it as the repeated (correlated) observations. In this case, you will model the within-flask correlation directly (eg, via an exchangeable correlation matrix).

    If you use SAS, the first option uses the RANDOM statement and the second option uses the REPEATED statement.

    Regarding the skewed data, I can't really say without knowing the data. I would definitely start with a log transformation (assuming you don't have lots of zero values).

    Then, if you have heteroskedasticity (eg, different variability for different treatments), you can account for it in the residual variance. In SAS, you can use the repeated statement again (with the 'group=' option).

    If you truly still have a skewed conditional distribution of Y|Tx (even after log transformation), you might want to look into robust regression methods or median regression. However, I am not sure whether those methods have been extended to include REs. If not, you could still use those methods, but then apply a GEE-type sandwich variance instead of the model-based one.



    Constantine Daskalakis, ScD
    Thomas Jefferson University, Philadelphia, PA

  • 4.  RE: Single-cell analyses - suggestions requested

    Posted 03-02-2024 17:58


    I agree with you and Constantine. The flasks are the experimental units, and the cells are the observational units. 

    I like your idea of a mixed-effects model. I vote for two random effects, one for flask within repetition and one for repetition. I think you have to explain to the investigator that the 60,000 data points, although real, are misleading because they are not independent. As Constantine indicated, you have 12 clusters of 5,000 observations per cluster, not 60,000 independent observations. I don't know if this analogy will work on your investigator, but the 12 flasks with 5,000 cells per flask are much like 12 pregnant rats that one treats before they give birth so that one can see what happens to their offspring from the in-utero exposure. Ask the investigator if she can really consider different pups from the same litter to be independent.
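
    A rough way to put numbers on this is the standard design effect for equal-sized clusters, 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intraclass correlation. The sketch below is only illustrative: the ICC values are hypothetical (nothing in the thread reports one), and the function names are made up for this example.

```python
def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations carrying the same information
    about a mean as n_clusters * cluster_size correlated ones."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)


# 12 flasks x 5,000 cells, under a few hypothetical within-flask ICCs.
for icc in (0.0, 0.01, 0.05, 0.2, 1.0):
    n_eff = effective_sample_size(12, 5000, icc)
    print(f"ICC = {icc:.2f}: effective n is about {n_eff:.0f}")
```

    With no within-flask correlation the effective sample size is the full 60,000; with perfectly correlated cells it collapses to the 12 flasks. Even a small ICC moves the figure much closer to the second number than the first.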

    However, although I like your idea of a mixed-effects model, all that it addresses will be how the means of the 5,000 cells/flask vary with treatment. But what if the treatment also affects variability? What if some treatments make the cell expressions tightly clustered around their means while other treatments make the cell expressions really spread out? Or how about skewness? What if some treatments make the cell expressions really right-skewed while other treatments make for roughly symmetric cell-expression distributions? 

    For that reason, I'd like to propose a second approach, the heart of which is this: The 5,000 cells per flask are not just 5,000 observations, they are a distribution. That distribution has not only a mean, but also a standard deviation (SD), a skewness, a kurtosis, and various percentiles of potential interest, each of which will vary from flask to flask. My proposal is to summarize the distribution within each flask using the above summary measures, and then to treat each summary measure as a flask-level outcome in the analysis. If you want to, you should be able to use the Repeated statement in the SAS Mixed Procedure to analyze flask means, SDs, skewnesses, etc., as components of a vector-valued outcome.
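
    A minimal sketch of the summarizing step, using only the Python standard library. Everything about the data here is hypothetical: the treatment labels, the lognormal shape, and the spread parameters are invented so the example runs, and the simple moment estimators of skewness and kurtosis are one choice among several.

```python
import random
import statistics


def summarize_flask(values):
    """Return flask-level summaries of one flask's cell intensities."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments with divisor n (simple moment estimators).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # 99 cut points; index k - 1 holds the k-th percentile.
    pct = statistics.quantiles(values, n=100)
    return {
        "mean": mean,
        "sd": statistics.stdev(values),  # sample SD, divisor n - 1
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
        "p10": pct[9],
        "median": pct[49],
        "p90": pct[89],
    }


# Hypothetical data: 2 repetitions x 3 treatments x 2 flasks, 5,000 cells each.
random.seed(2024)
sigma_by_treatment = {"A": 0.5, "B": 0.8, "C": 1.1}  # made-up spreads
flask_rows = []
for repetition in (1, 2):
    for treatment, sigma in sigma_by_treatment.items():
        for flask in (1, 2):
            cells = [random.lognormvariate(0.0, sigma) for _ in range(5000)]
            row = {"rep": repetition, "trt": treatment, "flask": flask}
            row.update(summarize_flask(cells))
            flask_rows.append(row)

# One row per flask: these 12 rows are the data for a flask-level analysis.
for row in flask_rows:
    print(row["rep"], row["trt"], row["flask"],
          f"mean={row['mean']:.2f} sd={row['sd']:.2f} skew={row['skewness']:.2f}")
```

    The result is a 12-row table with one row per flask, and each summary column can then be analyzed as a flask-level outcome against treatment and repetition.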

    Did you say that the investigator normalized the flow-cytometry intensities to min-max for each flask? On the one hand, ouch, I wish she hadn't done that, but on the other hand, hmmm, each flask must now have a minimum of zero, a maximum of one, and 4,998 values in between that could maybe be modelled using a beta distribution. Hmmm....   
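
    To illustrate why per-flask min-max scaling can be a concern, here is a small sketch with made-up intensities (the numbers and the `min_max` helper are hypothetical). It shows that the scaled values depend entirely on the two most extreme cells in the flask, so one unusually bright cell compresses all the others toward zero. It is also worth keeping in mind that a beta density is defined on the open interval (0, 1), so the exact 0 and 1 in each flask would need some handling before a beta model could be fitted.

```python
def min_max(values):
    """Scale values to [0, 1] using the smallest and largest value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]


# Made-up raw intensities for a handful of cells in one flask.
raw = [10.0, 12.0, 15.0, 20.0, 30.0]
# The same cells plus one very bright cell.
raw_with_extreme = raw + [300.0]

scaled = min_max(raw)
scaled_with_extreme = min_max(raw_with_extreme)

# The cell with raw intensity 30 is the maximum in the first flask, but it
# lands near the bottom of the scale once the extreme cell is present.
print("without extreme cell:", [round(v, 3) for v in scaled])
print("with extreme cell:   ", [round(v, 3) for v in scaled_with_extreme])
```

    Because each flask is rescaled by its own extremes, differences between flasks in overall intensity level are removed, and what remains for comparison is mainly the shape of each flask's distribution.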

    Good luck. It sounds like you have an interesting problem. 

    Eric Siegel, MS
    Biostatistics Project Manager
    Department of Biostatistics
    Univ. Arkansas Medical Sciences