ASA Connect

  • 1.  A classroom discussion problem or quiz question?

    Posted 12-17-2018 10:45
    Saw this in a TV ad and can't resist sharing. At first glance, the plot looks pretty stupid. However, what labeling of the X-axis would make it reasonable? Strong hint: 770/148 = 5.2; 4100/770 = 5.3. Of course, a TV ad would never do that.

    ------------------------------
    Ralph O'Brien
    Professor of Biostatistics (officially retired; still keenly active)
    Case Western Reserve University
    http://rfuncs.weebly.com/about-ralph-obrien.html
    ------------------------------


  • 2.  RE: A classroom discussion problem or quiz question?

    Posted 12-18-2018 13:50
    The X-axis must be logarithmic.

    ------------------------------
    Emil M Friedman, PhD
    emilfriedman@gmail.com
    http://www.statisticalconsulting.org
    ------------------------------



  • 3.  RE: A classroom discussion problem or quiz question?

    Posted 12-19-2018 11:53
    Correct, Emil. Thanks for responding.

    If the X-axis had been labeled, it could have read "Roofing Costs (log-scaling)". With respect to measurement properties, this implies that roofing cost is ratio scaled, meaning that the difference between $x and $2x is the same "true impact" as the difference between $2x and $4x. This is not so if I were paying.
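    A quick numerical check (using the values 148, 770, and 4100 from the hint above) shows why equal spacing is defensible on a log scale:

    # On the log scale the successive gaps are nearly equal, so the three values
    # would sit almost evenly spaced along a logarithmic X-axis.
    diff(log(c(148, 770, 4100)))   # roughly 1.65 and 1.67, i.e., nearly equal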

    But it is often justifiable with biological measurements, such as concentrations or counts. (Depends on various factors.) And when used wisely, log transforming variables and then exponentiating the results often leads to summary statements that are easier for everyone to comprehend, remember, and act upon.

    Here's an example based on using the Welch t test. A sentence in the abstract might be, "The geometric mean for group A was 39.2% greater than for Group B; 95% CI: [12.5%, 72.4%]." A little more thought and effort on our part leads to more straightforward communications to our investigators and their audiences.


    n <- c(38, 37)
    group <- rep(c("A","B"),n)
    Median1 <- 50       # Any positive value yields the same ratio of geometric means.
    TrueEffect <- 1.40  # Group A's geometric mean is 40% greater than group B's.
    RelSpread95 <- 6    # ratio of the 0.975 and 0.025 quantiles of Y ~ logNormal
    SD.logY <- log(RelSpread95)/(1.96*2)  # SD of log(Y) ~ Normal
    set.seed(170322)
    Y <- exp(c(rnorm(n[1], log(Median1*TrueEffect), SD.logY),
               rnorm(n[2], log(Median1), SD.logY)))
    (Welcht <- t.test(log(Y) ~ group))
    # Welch Two Sample t-test

    # data:  log(Y) by group
    # t = 3.0919, df = 69.573, p-value = 0.002861
    # alternative hypothesis: true difference in means is not equal to 0
    # 95 percent confidence interval:
    #  0.1174653 0.5445459
    # sample estimates:
    # mean in group A mean in group B 
    #        4.307968        3.976962

    # ratio of geometric means
    exp(Welcht$estimate[1] - Welcht$estimate[2])
    # 1.392368

    # 95% CI for ratio of geometric means
    exp(Welcht$conf.int)
    # [1] 1.124643 1.723825



    ------------------------------
    Ralph O'Brien
    Professor of Biostatistics (officially retired; still keenly active)
    Case Western Reserve University
    http://rfuncs.weebly.com/about-ralph-obrien.html
    ------------------------------



  • 4.  RE: A classroom discussion problem or quiz question?

    Posted 12-19-2018 12:23
    Log-scaling is also common when looking at polymer molecular weight distributions. It also means that terms like "bimodal" can be ambiguous unless one specifies what sort of scale is used for the X-axis (see "Modality of Molecular Weight Distributions", Emil M Friedman, Polymer Engineering and Science, 30, 569 (1990), http://dx.doi.org/10.1002/pen.760301002, attached).



    ------------------------------
    Emil M Friedman, PhD
    emilfriedman@gmail.com
    http://www.statisticalconsulting.org
    ------------------------------




  • 5.  RE: A classroom discussion problem or quiz question?

    Posted 12-20-2018 17:42
    For more on lognormal distributions and more examples where they are appropriate, see https://web.ma.utexas.edu/users/mks/ProbStatGradTeach/LognormalDistributions1.pdf (a handout I used in a summer course for secondary math teachers).

    ------------------------------
    Martha Smith
    University of Texas
    ------------------------------



  • 6.  RE: A classroom discussion problem or quiz question?

    Posted 12-22-2018 10:48
    In the preclinical sciences (toxicology, immunology, biochemistry, etc.), many situations arise in which investigators and statisticians feel the need for a logarithmic transformation of the ordinate, the abscissa, or both. Such transformations offer certain advantages in analysis of variance/covariance and in regression analysis for determining a dose response (i.e., a trend). Here are some of the reasons:

    1. Linearization of the dose-response curve for use in dose extrapolation/interpolation.  An exponential response can be linearized using the log-logistic or log-probit transformations, as is generally done in estimating the median lethal/effective dose and its confidence interval.  In biochemistry and some other biological systems one deals with simple first-order decays of radioactivity and other particles, expressed as the linear ordinary differential equation dx/dt = -k x(t), which has the exponential solution x(t) = C exp(-kt); taking logarithms gives ln x(t) = ln C - kt, where k is the decay rate (a constant) and C is the constant of integration.  (A minimal sketch of fitting such a decay follows this list.)
    2. Getting rid of (or minimizing) heteroscedasticity of the error variances; homoscedasticity is a requirement for the standard univariate analyses.
    3. Producing equal or approximately equal spacing on the X-axis by logarithmically transforming that axis.  Often the design uses geometric or otherwise unequal spacing of doses (such as 0, 1, 10, 100, ...).  Equal spacing of the independent variable helps to bring about optimal statistics and is easier to handle programmatically.
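    A minimal sketch of reason 1 (the values C = 100, k = 0.3, and the noise level are illustrative choices, not from any real assay), fitting a first-order decay on the log scale by ordinary least squares:

    # A first-order decay x(t) = C*exp(-k*t) becomes linear after taking logs,
    # so C and k can be estimated with lm(). Multiplicative error keeps the
    # log-scale residuals roughly homoscedastic (cf. reason 2).
    set.seed(1)
    t <- seq(0, 10, by = 0.5)
    x <- 100 * exp(-0.3*t) * exp(rnorm(length(t), 0, 0.05))
    fit <- lm(log(x) ~ t)
    exp(coef(fit)[1])   # back-transformed intercept estimates C
    -coef(fit)[2]       # slope (sign-flipped) estimates the decay rate k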

    There may be other reasons for such transformations.  However, some problems can arise that people do not always point out.  For example:

    1. How to deal with the old evil '0' which implies a control in such fields.  Some standard statistical packages add a scaling or fudge factor (f) to the "dose" metamer so that instead of 0, 1, 10, 100,... etc. one now deals with f, 1+f, 10+f, 100+f, ... etc. Since 'f' is a constant scaling factor, it has no contribution to the variances.
    2. If your data actually come from a normal (or at least a symmetric) distribution, then taking logarithms produces a skewed (asymmetric) distribution.  In other words, even while you are producing homoscedasticity of the error variances, you may be producing a long-tailed distribution.  It is always an excellent idea to check normality after such transformations, graphically (e.g., by examining the residuals) or with a formal test; see the sketch below.
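    A minimal sketch of that check (the dose levels, coefficients, and error structure are illustrative assumptions):

    # After a log transformation, inspect the residuals graphically and, if
    # desired, with a formal test of normality.
    set.seed(2)
    dose <- rep(c(1, 10, 100), each = 10)
    resp <- exp(1 + 0.5*log(dose) + rnorm(30, 0, 0.3))   # multiplicative errors
    fit  <- lm(log(resp) ~ log(dose))
    qqnorm(residuals(fit)); qqline(residuals(fit))       # graphical check
    shapiro.test(residuals(fit))                         # formal test (use with care)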

    In summary, there may be no magic solution; it has to be handled case by case.

    ------------------------------
    Ajit K. Thakur, Ph.D.
    Retired Statistician
    ------------------------------



  • 7.  RE: A classroom discussion problem or quiz question?

    Posted 12-23-2018 09:14
    Responding to:

    How to deal with the old evil '0' which implies a control in such fields.  Some standard statistical packages add a scaling or fudge factor (f) to the "dose" metamer so that instead of 0, 1, 10, 100,... etc. one now deals with f, 1+f, 10+f, 100+f, ... etc. Since 'f' is a constant scaling factor, it has no contribution to the variances.

    f is not a "constant scaling factor"; it is an additive constant. While the standard deviation of Y + f, sd(Y + f), does not depend on f, the issue here is about sd(log(Y + f)), which certainly does depend on f.

    For example:

    # Generate X~logNormal, but rounding to 0.1 creates x = 0.00 values.
    x <- round(exp(rnorm(100000, 0.1, 1.1)),1)
    f <- c(0.00, 0.001, 0.005, 0.01, 0.05, 0.10)
    SD <- sd(log(x+f[1]))
    for (i in 2:length(f)) { SD <- c(SD, sd(log(x+f[i]))) }
    data.frame(f,SD=round(SD,3))
    #             f         SD
    # 1   0.000      NaN
    # 2   0.001    1.138
    # 3   0.005    1.110
    # 4   0.010    1.095
    # 5   0.050    1.029
    # 6   0.100    0.973

    Given below is an R function that handles the issue by Winsorizing only the X = 0.00 values. It is completely data driven, i.e., there is no need to arbitrarily define a value (here, f) for some patch-up repair like log(X + f).

    logWin0 <- function(X) {
      # Returns log(X) after Winsorizing the X = 0.00 values so that log(0.00) is
      # set as follows. Let X.u be the sorted unique values of X, so X.u[1] = 0.00
      # and X.u[2] < X.u[3] are the next smallest unique values of X. To equalize
      # the spacing between log(X.u[1]), log(X.u[2]), and log(X.u[3]), set
      #                 log(X.u[1]) = 2*log(X.u[2]) - log(X.u[3]).
      #
      # This is equivalent to Winsorizing X = 0.00 to X' = w*X.u[2], where
      # w = X.u[2]/X.u[3] < 1. So if X.u[2] and X.u[3] are far apart, X' is closer
      # to 0.00; if X.u[2] and X.u[3] are nearly equal, X' is slightly less than
      # X.u[2]. See the examples below.
      #
      # Ralph O'Brien, 23 December 2018, obrienralph@gmail.com

      if (any(X < 0)) { stop("At least one X is negative.") }
      X.u <- sort(unique(X))  # sorted, so X.u[2] and X.u[3] are the two smallest positive values
      if (length(X.u) < 3) { stop("X does not have at least 3 unique values.") }
      w <- X.u[2]/X.u[3]
      logX <- numeric(length(X))
      logX[X != 0] <- log(X[X != 0])
      logX[X == 0] <- log(w*min(X[X != 0]))
      return(logX)
    }  # end logWin0()
    # Examples (set RunExamples <- TRUE to run them).
    RunExamples <- FALSE
    if (RunExamples) {
      # Generate 100 X ~ logNormal observations; rounding to 0.1 creates
      # one X = 0 value.
      set.seed(170322)
      (x <- sort(round(exp(rnorm(100, 0.1, 1.1)), 1)))
      # [1] 0.0 0.1 0.1 0.2 0.2 0.2 0.3 0.3 0.4 0.4 0.4 0.4 0.4
      # [14] 0.4 0.4 0.4 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5
      # [27] 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.7 0.7 0.7 0.7 0.7 0.7
      # [40] 0.7 0.7 0.8 0.8 0.9 0.9 1.0 1.1 1.1 1.1 1.1 1.1 1.2
      # [53] 1.3 1.4 1.4 1.4 1.4 1.5 1.6 1.6 1.6 1.7 1.7 1.8 1.9
      # [66] 1.9 1.9 2.0 2.0 2.1 2.4 2.4 2.4 2.4 2.5 2.5 2.5 2.5
      # [79] 2.6 2.9 3.1 3.1 3.3 3.5 3.6 3.9 3.9 4.1 4.1 4.3 4.8
      # [92] 5.0 5.8 5.9 6.6 7.4 8.2 21.0 21.5 34.8

      # Example 1. Use x as generated. Note the single x = 0 value. The smallest
      # positive unique value (0.10) is half of the next unique value (0.20), so
      # x = 0 is Winsorized to x' = 0.50*0.10 = 0.05. No other values are changed.
      log.X <- logWin0(x)
      data.frame(X = x[1:6], X.Win = exp(log.X[1:6]), logX.Win = log.X[1:6])
      #       X  X.Win   logX.Win
      # 1   0.0   0.05  -2.995732
      # 2   0.1   0.10  -2.302585
      # 3   0.1   0.10  -2.302585
      # 4   0.2   0.20  -1.609438
      # 5   0.2   0.20  -1.609438
      # 6   0.2   0.20  -1.609438

      # Example 2. Make the first two positive x values different but close together.
      x. <- x
      x.[2] <- 0.90*x.[3]
      log.X <- logWin0(x.)
      data.frame(X = x.[1:6], X.Win = exp(log.X[1:6]), logX.Win = log.X[1:6])
      #        X   X.Win   logX.Win
      # 1   0.00   0.081  -2.513306
      # 2   0.09   0.090  -2.407946
      # 3   0.10   0.100  -2.302585
      # 4   0.20   0.200  -1.609438
      # 5   0.20   0.200  -1.609438
      # 6   0.20   0.200  -1.609438

      # Example 3. Make the first two positive x values different and far apart.
      x. <- x
      x.[2] <- 0.10*x.[3]
      log.X <- logWin0(x.)
      data.frame(X = x.[1:6], X.Win = exp(log.X[1:6]), logX.Win = log.X[1:6])
      #        X   X.Win   logX.Win
      # 1   0.00   0.001  -6.907755
      # 2   0.01   0.010  -4.605170
      # 3   0.10   0.100  -2.302585
      # 4   0.20   0.200  -1.609438
      # 5   0.20   0.200  -1.609438
      # 6   0.20   0.200  -1.609438
    }  # end RunExamples

    ------------------------------
    Ralph O'Brien
    Professor of Biostatistics (officially retired; still keenly active)
    Case Western Reserve University
    http://rfuncs.weebly.com/about-ralph-obrien.html
    ------------------------------