ASA Connect

  • 1.  Seeking advice about how and where to publish an article on an application of statistics in public policy

    Posted 10-30-2023 20:27

    Hi!  I'm an old geezer who careened through a brief stint in academia back when Gorbachev sat atop the USSR and Mandela languished in a prison cell in apartheid South Africa.  (Wow, I guess some things have improved!)  It all came crashing down for me in the perish partition of the publish or perish paradigm.  The last of my modest handful of refereed publications appeared in the Journal of Chemical Physics a few months after the Tiananmen Square protests.  It's now but a distant memory.

    Recently I took an interest in a problem related to fair lending laws.  Banks and other financial institutions often use credit scoring models to help determine whether to extend credit offers to customers.  The best-known such model is the ubiquitous FICO score from the Fair Isaac Corporation.  There are other vendors of credit scores, and many institutions develop their own in-house models.  The laws stipulate that the decisions must be fair, and not discriminate against any groups distinguished by attributes such as gender, race, or age.  In particular, the models must be tested for disparate outcomes.  A common way of doing this, which has been deemed suitable for use in litigation, is to compute the Standardized Mean Difference (SMD) between a "control" group and a "test" group, e.g., male vs. female, white vs. black, "below 62" vs. "62 and above."  If the SMD exceeds a certain threshold value, typically 0.3, then remediation actions may need to be taken to avoid unfair treatment.

    For more information about the legal applications, see: 
    1088_Upstart Initial Report - Final.pdf (relmanlaw.com)

    1180_PUBLIC Upstart Monitorship_2nd Report_FINAL.pdf (relmanlaw.com)
    PUBLIC Upstart Monitorship 3rd Report FINAL.pdf (relmanlaw.com)

    SMD is defined as (mean[1] – mean[2]) / (pooled standard deviation of groups 1 and 2), and it is usually computed from the population values of each quantity; this estimator is sometimes known as Cohen's d.

    The SMD formula looks deceptively familiar, like the formula used in difference-of-means significance tests.  But in those tests, the denominator is the standard error, i.e., the standard deviation of the difference in sample means, and by the central limit theorem the sample means are asymptotically normal in most cases of interest.  SMD instead uses the standard deviation of the underlying distribution(s), which may not be normal, so it is harder to interpret.  And here's where things really get tricky.
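
    To make the distinction concrete, here is a small Python sketch with toy numbers (not credit data). It uses the conventional sample-variance (n – 1) pooled standard deviation for both quantities, which is an assumption of the sketch; with that choice the pooled two-sample t statistic equals the SMD divided by sqrt(1/N1 + 1/N2), so t grows with the group sizes while the SMD stays roughly the same.

```python
import math
from statistics import fmean, variance


def pooled_sample_sd(x, y):
    """Pooled SD from the usual (n - 1) sample variances of the two groups."""
    n1, n2 = len(x), len(y)
    return math.sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2))


def smd(x, y):
    """Effect size: difference in means divided by the SD of the data."""
    return (fmean(x) - fmean(y)) / pooled_sample_sd(x, y)


def pooled_t(x, y):
    """t statistic: the same difference divided by the standard error of the
    difference in means, which shrinks like 1/sqrt(n) as the groups grow."""
    se = pooled_sample_sd(x, y) * math.sqrt(1 / len(x) + 1 / len(y))
    return (fmean(x) - fmean(y)) / se


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 4.0, 5.0, 6.0]
# Repeating every observation ten times barely moves the SMD,
# but multiplies the t statistic by about 3.5.
print(smd(x, y), smd(x * 10, y * 10))
print(pooled_t(x, y), pooled_t(x * 10, y * 10))
```

    The SMD answers "how far apart are the groups in units of individual spread," while the t statistic answers "how sure are we that the means differ," which is why only the former is used against a fixed threshold such as 0.3.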

    Credit scores like FICO are often derived from models that predict the probability of default.  The scores are produced by transforming the probabilities p to the "log-odds" scale: a + b*log(p / (1-p)), where a and b are suitably chosen constants.  The log-odds scale values are more likely to be normally distributed than the probabilities, but the opposite could occur, and it's possible that neither set of scores is normal.  Furthermore, the SMD of the log-odds scores will generally differ from the SMD of the probabilities, and the log-odds SMD can be above the threshold value while the probability SMD is below it, or vice versa.  So which SMD do you believe?  The choice matters because there are legal ramifications, and there is no established legal standard specifying which version of the model output (probabilities, log-odds scale scores, or some other scale) should be used for the SMD test.
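
    As a toy illustration of the scale dependence, the sketch below computes Cohen's d for the same two groups on the probability scale, the raw log-odds scale, and a score scale. The default probabilities are made-up numbers, not real credit data. The sketch assumes the population (divide-by-N) form of the pooled standard deviation, and the constants a = 600, b = –50 are hypothetical, chosen only so that a higher score means a lower default probability.

```python
import math
from statistics import fmean, pstdev


def cohens_d(x, y):
    """(mean_x - mean_y) / pooled SD, using population (divide-by-N) SDs."""
    n1, n2 = len(x), len(y)
    pooled = math.sqrt((n1 * pstdev(x) ** 2 + n2 * pstdev(y) ** 2) / (n1 + n2))
    return (fmean(x) - fmean(y)) / pooled


def log_odds(p):
    return math.log(p / (1 - p))


def log_odds_score(p, a=600.0, b=-50.0):
    """Hypothetical score a + b*log(p / (1 - p)); a and b are illustrative."""
    return a + b * log_odds(p)


# Made-up default probabilities for two groups (not real data).
group_1 = [0.01, 0.01, 0.01, 0.01, 0.50]
group_2 = [0.01, 0.02, 0.02, 0.02, 0.03]

d_prob = cohens_d(group_1, group_2)
d_logit = cohens_d([log_odds(p) for p in group_1], [log_odds(p) for p in group_2])
d_score = cohens_d([log_odds_score(p) for p in group_1],
                   [log_odds_score(p) for p in group_2])

print(f"probability scale: {d_prob:.3f}")
print(f"log-odds scale:    {d_logit:.3f}")
print(f"score scale:       {d_score:.3f}")
```

    With these numbers the probability-scale d lands above 0.3 and the log-odds d below it. The affine step from log-odds to score (a + b*...) only flips the sign here because b is negative; it is the nonlinear step from p to log(p / (1-p)) that changes the magnitude.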

    So here is my proposal: a single "SMD-equivalent" value that would equal the SMD if the two groups were normally distributed.  It depends only on the relative rank ordering of the two groups, so it is invariant under monotonic transformations, such as the one between probabilities and log-odds scale scores.  The value will not be exactly the same as the Cohen's d estimate, because it uses each group's median in place of its mean, relying on the property of the normal distribution that the mean equals the median.

    Suppose the size of group one is N1 and the size of group two is N2, where group one has the lower median value.  Let K1 be the number of members of group one with a value below the group two median, and let K2 be the number of members of group two with a value below the group one median.  Since we're assuming both groups are normally distributed, without loss of generality we can assume that group one has a standard normal distribution, with mean 0 and standard deviation 1.  Let C(x) be the cumulative standard normal distribution function.  If m2 is the value such that C(m2) = K1 / N1, then m2 is taken as the median, and hence the mean, of group two.  What is the standard deviation of group two?  If z2 is the value for which C(z2) = K2 / N2, then z2 = (0 – m2) / s2, where s2 is the standard deviation of group two, so s2 = –m2 / z2 (m2 is positive and z2 negative, since group one has the lower median).  Now we have the means, standard deviations, and sizes of both groups, so we can compute the pooled standard deviation and then the SMD.
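
    Here is a minimal Python sketch of the calculation above, using only the standard library (statistics.NormalDist supplies C(x) and its inverse). The function name rank_based_smd is hypothetical, and three details are assumptions of this sketch rather than part of the proposal: it uses statistics.median_low, which always returns an actual data point, so K1 and K2 are exactly unchanged by any strictly increasing transformation; it uses the population form of the pooled standard deviation, sqrt((N1*1 + N2*s2²) / (N1 + N2)); and it counts only values strictly below each median, ignoring ties.

```python
import math
from statistics import NormalDist, median_low
from typing import Sequence


def rank_based_smd(a: Sequence[float], b: Sequence[float]) -> float:
    """Normal-equivalent SMD, (mean_a - mean_b) / pooled SD, from median counts.

    Sketch only. Assumptions of this sketch: median_low() is used as the
    median, and the pooled SD uses the population (divide-by-N) form.
    """
    std_normal = NormalDist()  # C(x) in the text is std_normal.cdf(x)
    med_a, med_b = median_low(a), median_low(b)
    if med_a == med_b:
        return 0.0  # equal medians: equal means under the normal model

    swapped = med_a > med_b
    g1, g2 = (b, a) if swapped else (a, b)  # group one has the lower median
    med1, med2 = (med_b, med_a) if swapped else (med_a, med_b)
    n1, n2 = len(g1), len(g2)

    k1 = sum(1 for x in g1 if x < med2)  # K1: group one below group-two median
    k2 = sum(1 for x in g2 if x < med1)  # K2: group two below group-one median

    # Group one is taken as N(0, 1). inv_cdf raises StatisticsError if a
    # fraction is exactly 0 or 1 (groups that do not overlap at the medians).
    m2 = std_normal.inv_cdf(k1 / n1)  # C(m2) = K1 / N1 -> mean of group two
    z2 = std_normal.inv_cdf(k2 / n2)  # C(z2) = K2 / N2
    s2 = -m2 / z2                     # from z2 = (0 - m2) / s2

    pooled_sd = math.sqrt((n1 * 1.0 + n2 * s2 ** 2) / (n1 + n2))
    smd = (0.0 - m2) / pooled_sd      # (mean1 - mean2) / pooled SD
    return -smd if swapped else smd   # reported as (mean_a - mean_b) / pooled SD
```

    Because only the counts K1 and K2 enter the calculation, passing in the default probabilities p of two groups or their log-odds log(p / (1-p)) should give the same result, which is the invariance the proposal is after.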

    I have tested this computation against normally distributed synthetic data, and it gets pretty close to the Cohen's d estimate as the population size increases.

    So now I'm wondering what would be the best way to publish this method?  Your suggestions are most welcome.  Thanks!



    ------------------------------
    Talbot Katz
    ------------------------------


  • 2.  RE: Seeking advice about how and where to publish an article on an application of statistics in public policy

    Posted 10-31-2023 08:48

    The ASA journal Statistics and Public Policy would be your best bet. It is open access, so everyone will be able to read your research without a paywall.



    ------------------------------
    David Marker
    Senior Statistician
    Retired
    ------------------------------



  • 3.  RE: Seeking advice about how and where to publish an article on an application of statistics in public policy

    Posted 10-31-2023 10:26

    Hi @David Marker!  Thank you for the excellent suggestion, that sounds like a good venue.  Now I just have to see if I can figure out how to submit (and whether I qualify to do so).



    ------------------------------
    Talbot Katz
    ------------------------------