Joe,
you (probably) didn't do anything wrong in the first two parts of your R code. The crucial point is that two different MLEs are involved here, and that "logistic regression" in Haggstrom's paper is not identical to today's textbook logistic regression.
Note that in Haggstrom's paper, a distributional assumption is made for X given class membership (multivariate normal, with equal covariance matrices in the two classes). You can view this as a joint model for Y and X; the likelihood function is then the joint probability of the observed Y and X, and maximizing it yields the formulas for the MLE in Haggstrom's paper. (Note that these are closed-form solutions.)
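To make this concrete, here is a small R sketch of those closed-form estimates, written in the usual logistic parameterization (group proportions, group means, and the pooled within-group covariance with divisor n). The function name joint_mle is mine, and you should check the formulas against (1.13) in the paper:

# Closed-form MLE under the joint model: Y Bernoulli, X | Y multivariate
# normal with a common covariance matrix; reported as logistic coefficients.
joint_mle <- function(X, y) {
  X  <- as.matrix(X)
  n  <- nrow(X)
  X1 <- X[y == 1, , drop = FALSE]
  X0 <- X[y == 0, , drop = FALSE]
  p1 <- nrow(X1) / n                  # ML estimate of P(Y = 1)
  m1 <- colMeans(X1)                  # group mean vectors
  m0 <- colMeans(X0)
  # pooled within-group covariance, ML version (divisor n, not n - 2)
  S  <- (crossprod(sweep(X1, 2, m1)) + crossprod(sweep(X0, 2, m0))) / n
  beta  <- solve(S, m1 - m0)          # S^{-1} (m1 - m0)
  alpha <- log(p1 / (1 - p1)) - 0.5 * sum((m1 + m0) * beta)
  list(alpha = alpha, beta = beta)
}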
What is usually done in logistic regression as we know it (and what you did by calling R's glm routine) is not a joint modelling of Y and X, but a modelling of Y conditional on X. The likelihood is then just the conditional probability of the observed Y given X. One could call this "conditional MLE" (and Haggstrom actually does; see Section 5 of his paper). In general there is no closed-form solution for this conditional MLE.
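In R this conditional MLE is just the familiar glm call; with the mtcars data it would look something like the following (I'm guessing at the particular predictors you used):

# conditional MLE: model Y given X only; no distributional assumption on X
fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)
coef(fit)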
MLE and conditional MLE result in different estimators. Haggstrom's idea of using OLS to derive the ML estimator does not carry over to the conditional ML estimator (though that would be nice, since it would give a closed-form solution).
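You can see the difference by running both estimators on the same data, e.g. with the joint_mle sketch above and my guessed predictors; the two coefficient vectors will in general not coincide:

X <- mtcars[, c("mpg", "wt")]
joint_mle(X, mtcars$am)    # MLE under the joint normal model
coef(glm(am ~ mpg + wt, data = mtcars, family = binomial))    # conditional MLE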
(A Google search for "difference between logistic regression and discriminant analysis" will give you even more insight.)
Concerning your "manual" calculation of formula (1.13): you use the overall sample covariance matrix as an estimator of the common within-group covariance. That is not a good estimator (think of two groups, each with variance 0: if the group means differ, the overall sample variance is always larger than 0, although the common within-group variance is 0), and it is not the ML estimator; the ML estimator pools the within-group deviations.
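A minimal illustration of that point in R:

x <- c(1, 1, 1, 3, 3, 3)    # two groups, zero variance within each group
g <- c(0, 0, 0, 1, 1, 1)
var(x)                      # overall sample variance: 1.2
tapply(x, g, var)           # within-group variances: both 0
# ML estimator of the common variance: pooled within-group SS over n
sum(tapply(x, g, function(v) sum((v - mean(v))^2))) / length(x)    # 0 here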
-Hans-
------------------------------
Hans Kiesl
Regensburg University of Applied Sciences
Germany
Original Message:
Sent: 04-03-2016 18:09
From: Jose Maisog
Subject: Logistic Regression and Discriminant Analysis by OLS
Dear colleagues, I recently dug up an interesting 1983 paper by Gus W. Haggstrom, entitled "Logistic Regression and Discriminant Analysis by Ordinary Least Squares" (URL: http://www.jstor.org/stable/1391344). In the paper, it is claimed that one can obtain the MLE coefficients for a dichotomous logistic regression from an "intermediate least squares" (ILS) regression model (Equation 2.1 in the paper):
y_i = alpha + beta' x_i + e_i
where y_i is a 0/1 indicator variable and the ILS regression coefficients are fitted by ordinary least squares. The MLE coefficients can then be obtained from the ILS coefficients, it is claimed, via fairly simple relations; see Equation 2.3 in the paper.
Intrigued, I tried to confirm the result in R, but to no avail. Attached is a plain text file containing R code, in which I unsuccessfully try out Theorem 1 of the paper (Equation 2.3), as well as a hand-computation of the MLE coefficients (Equation 1.13), using a data set that comes pre-packaged with R ("mtcars"). Where are my errors?
Thanks very much!
Joe
------------------------------
Jose Maisog
Senior Informatics Scientist
Blue Health Intelligence
------------------------------