
SSPA Blog: New programming languages and new statistical techniques: Irrelevant or necessary?

By Rick Wicklin posted 03-07-2012 16:01

I like to think of myself as "current" regarding statistical programming, but when I read blogs and attend talks by younger researchers, I am amazed by the number of newer computer languages that are in vogue.

Of course, "newer" depends on how old you are! For many programmers, "newer" means either "since I left school" or "since I arrived at my current job." For me, newer languages include Groovy, Haskell, Julia, and Lua, just to name a few.

Crista Videira Lopes, an academic researcher in programming languages, recently wrote a long but interesting essay on recent programming languages. She makes several interesting claims:
  • Lopes claims that "a considerable percentage of [popular] new languages... were designed by ... kids with no research inclination, some as a side hobby, and without any grand goal other than either making some routine activities easier or for plain hacking fun."
  • Lopes argues that "there appears to be no correlation between the success of a programming language" and the "deep thoughts, consistency, rigor" that comes from a programming language that has been designed by a professional researcher. 
  • Lopes states that "one striking commonality in all modern programming languages, especially the popular ones, is how little innovation there is in them! Without exception, ... they all feel like mashups of concepts that already existed in programming languages in 1979, wrapped up in their own idiosyncratic syntax."
  • Lopes decries language proponents who claim "improved software development productivity...without providing any evidence for it whatsoever." In particular, she rails against claims such as "Haskell programs have fewer bugs because Haskell is...."

Lopes does concede that a language that "addresses an important practical need" can become popular, regardless of whether it is professionally designed. 

There are interesting parallels between people's attitudes about new programming languages and people's attitudes about new statistical methods. I sometimes hear statisticians rail against newer data mining methods as "black boxes" that are produced by the computer science or machine learning communities. What are the complaints? Well, in analogy with Lopes's arguments, here are some arguments against some newer predictive techniques:

  • They are created by people without statistical research backgrounds.
  • They can become successful in spite of the fact that they are not the product of "deep thoughts" and "rigor."
  • They are mashups of ideas that existed previously, but with their own idiosyncratic terminology.
  • They claim improved prediction or classification without providing rigorous proofs.

The opposite argument (that statistics need not be constrained by rigor) is presented in a 2001 article, "Statistical Modeling: The Two Cultures," in which Leo Breiman (famous for his work on classification and regression trees, bagging, and random forests) criticizes the statistical community for its commitment to data models. Breiman states that "this commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems." Breiman says that "statisticians need to be more pragmatic. Given a statistical problem, find a good solution, whether it is a data model, an algorithmic model...or a completely different approach." With minor modification, his arguments also apply to new programming languages: given a programming problem, find a language that helps you solve it easily.

The Breiman article is followed by criticisms by Sir David Cox and Brad Efron, who defend traditional statistics. Efron's comments begin: "At first glance Leo Breiman’s stimulating paper looks like an argument against parsimony and scientific insight, and in favor of black boxes with lots of knobs to twiddle. At second glance it still looks that way." To me, this sounds like an argument Lopes might favor.

Where do you fall in this spectrum? Are you a fervent proponent of new programming languages or has it been a while since you last learned a new language?  Do you gravitate to new data mining techniques or do you favor the statistical rigor of logistic regression and mixed models?  What arguments do you use to justify your choices?




03-14-2012 19:41

Funny, we were just talking about this during lunch today. I've been using JavaScript more often because I am collecting data as students play a game and it is much better for game programming. I can't even imagine trying to write a game in SAS. When given a choice, I'd be more on the logistic regression end of the spectrum for analysis because in my experience the newer methods add very little in terms of improved prediction. Yes, a Monte Carlo study might show that method A is better than B, but if it is only slightly better and B is a lot easier to explain to a lay audience and a lot better known, then I'm going with B. I sometimes have clients that want the shiniest new technique out of the box and haven't a clue how it works or how to interpret it. I do whatever is in the contract even though I think they'd be better off with an ANOVA they understood. I know why they want the latest data mining - neural network - propensity score whiz bang. Because a lot of academic journals seem to favor the more complex models whether needed for the specific question or not. But that's a whole new soap box.

03-09-2012 12:38

IMO, most (if not all) new languages are created to solve a problem that exists in other languages. In other words, since language X doesn't do function A well, I'll create language Y that does. And of course, while solving that problem it creates others, which propagate other new languages.
It actually appears to follow the same route as religion. Back 'in the day', nobody had bibles (before printing press) and all interpretations had to come from 'the church.' Once bibles became mass-produced, then everybody was free to interpret it according to their own values. Hence, 'protestant' (protesting) religions were born and derivations of them continue to this day.
Now that everybody has laptops and the tools to create their own languages, they do so and we have thousands of 'languages' out there.
Nobody can learn them all, so there is a problem out there in terms of fracturing. You're either a language X programmer, or language Y, or... Really difficult to be proficient in more than a couple.
My hope is that we teach computers how to do all the translation so we can *write* in any language we prefer and read it the same way. I write in Java, you read in Ruby. Or vice versa. This will expose the real shortcomings and provide a distinct list of features to implement to give them all as much functionality as possible.