ASA Connect

  • 1.  Statistical Challenges of AI

    Posted 07-11-2016 11:01

    Hi all!

    Some of you may be aware of new White House efforts to reach out to the scientific community regarding Artificial Intelligence. The Office of Science and Technology Policy has organized a series of workshops exploring opportunities brought forth by AI, and the National Science and Technology Council has formed a Subcommittee on Machine Learning and Artificial Intelligence, which will continue to monitor new developments and advances in the field.

        The ASA wants to make sure the important role of statistics in AI is recognized. Specifically, we would like to know what important statistical challenges need to be addressed. We would also appreciate suggestions for framing such a broad question. Please respond below, or send me your ideas at nussbaum@amstat.org. Thanks in advance!


    ------------------------------
    Amy Nussbaum
    Science Policy Fellow
    American Statistical Association
    ------------------------------


  • 2.  RE: Statistical Challenges of AI

    Posted 07-21-2016 05:38
    Edited by Christian Graf 07-21-2016 05:38

    Good morning Amy,

    About two years ago I became engaged with what is called machine learning for the first time.
    I was really surprised that I already knew many of the methods presented on the topic - not from other applied fields, but from my graduate education in numerics and statistics, which was more than 20 years ago.

    From my point of view, a large part of the methods applied in AI and machine learning are methods traditionally used in statistics & numerics, and therefore in physics, astronomy, biology and so on. Many of the pioneers of modern informatics were mathematicians or natural scientists and brought their methods with them. So it is hardly surprising that almost (surely) all basic methods (and many advanced ones as well) used in AI originally come from numerics, statistics and their applied fields.

    To name some prominent methods now (also) tagged as machine learning (supervised and unsupervised):

    • Regression methods (e.g., logistic regression)
    • Gradient descent (supervised learning)
    • Clustering and classification methods
    • All kinds of Markov chain models 
    • Bayesian methods

    What is new and fascinating in machine learning is the amazing scale and creativity in combining well-known methods and developing new ones from them. This has already set a new horizon for statistics.
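
    To make the point concrete, here is a minimal sketch combining the first two items on the list: logistic regression fitted by plain gradient descent. The Python code and the simulated data are my own, chosen only for illustration - not any particular library's implementation.

        import numpy as np

        rng = np.random.default_rng(1)
        X = rng.normal(size=(500, 2))                     # two predictors
        true_beta = np.array([1.5, -2.0])
        p_true = 1.0 / (1.0 + np.exp(-(X @ true_beta)))   # logistic model
        y = rng.binomial(1, p_true)                       # binary response

        beta = np.zeros(2)                                # start at zero
        step = 0.1
        for _ in range(2000):
            p = 1.0 / (1.0 + np.exp(-(X @ beta)))         # current fitted probabilities
            gradient = X.T @ (y - p)                      # gradient of the log-likelihood
            beta += step * gradient / len(y)              # one ascent step

        # beta now approximates the maximum likelihood estimate of true_beta

    Of course, this is simply maximum likelihood estimation for a generalized linear model, solved with a first-order method instead of Fisher scoring - which is exactly the point.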

    Hope this helps & best regards,
    Christian

    ------------------------------
    Christian Graf
    Dipl.-Math.
    Qualitaetssicherung & Statistik

    "To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of."

    Ronald Fisher, Presidential Address, Sankhyā: The Indian Journal of Statistics, Vol. 4, No. 1 (1938), pp. 14-17



  • 3.  RE: Statistical Challenges of AI

    Posted 07-29-2016 15:55

    I'm not sure how best to get the message out to non-statisticians--or to many statisticians for that matter--but regarding AI, we should essentially claim credit.  Perhaps it's worth recounting the broad arc of AI history.  There were computers, and then computer scientists developed ways to easily encode complex deterministic logic algorithms and there was great hope for AI, and then it fizzled.  Research funding dried up in the so-called "AI winter".  In recent years we've had a remarkable thaw.  Why?  The application of models that comprehend that observed data comes from noisy processes.  That's statistical in nature, even if the people developing the models don't call themselves statisticians.

    To your original question, i.e., statistical challenges, one general topic could be "What types of AI problems cannot be addressed by probabilistic modeling methods, if any?"

    Another statistical challenge, of particular concern to statisticians, is assessing the performance of AI methods on a theoretical basis and, if possible, explaining that performance.  What practices tend to work better than others for certain classes of problems?  What are those classes of problems?  I can think of two examples where statistical methodological research has yielded benefits.  First, it's my understanding that Dr. Jerome Friedman took it upon himself to figure out why "boosting" works.  He found that it optimizes a criterion similar to likelihood.  Why not use likelihood itself?  Thus was gradient boosting born.
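
    For anyone who hasn't seen it, here is a bare-bones sketch of the resulting idea. The Python code, data, and tuning constants are my own toy choices, not Friedman's implementation: each small tree is fit to the negative gradient of the binomial deviance, i.e., to y minus the current fitted probability, and is added to the fit with a small learning rate.

        import numpy as np
        from sklearn.tree import DecisionTreeRegressor

        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 3))
        y = (X[:, 0] - X[:, 1] + rng.normal(size=300) > 0).astype(float)

        F = np.zeros(300)                  # current fit, on the log-odds scale
        rate, trees = 0.1, []
        for _ in range(100):
            p = 1.0 / (1.0 + np.exp(-F))   # current fitted probabilities
            tree = DecisionTreeRegressor(max_depth=2).fit(X, y - p)
            trees.append(tree)             # y - p is the negative gradient of the deviance
            F += rate * tree.predict(X)    # a small gradient step in function space

        p_hat = 1.0 / (1.0 + np.exp(-F))   # fitted probabilities on the training data

    A real implementation also optimizes the terminal-node values and predicts on new data by summing over the stored trees; the loop above is just the core of the algorithm.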

    Second, I recall seeing an article (I don't have references in front of me at the moment) proving that for every neural net that is fitted without weight decay, there exists a neural net fitted with non-zero weight decay that dominates it.  Further, weight decay can be seen as a roughness penalty, and also as approximately Bayesian.  In these examples, statistical (decision-theoretic) research applied to computer science ideas yielded both improved performance and increased knowledge about why or how these models work.
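
    To spell out that Bayesian connection in the standard textbook way (the notation is mine, not the article's): with squared-error loss, fitting with weight decay means minimizing

        sum_i ( y_i - f(x_i; w) )^2  +  lambda * ||w||^2

    over the weights w, and the minimizer is exactly the posterior mode obtained by giving each weight an independent Normal(0, tau^2) prior, with lambda = sigma^2 / tau^2 (sigma^2 being the error variance).  The penalty is just the negative log-prior, which is why weight decay behaves like ridge-type shrinkage and can be viewed as approximately Bayesian.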

    ------------------------------
    Jim Garrett, PhD
    Sr. Assoc. Dir. of Biostatistics
    Novartis