
  • 1.  Data Mining Software (open source)

    Posted 10-31-2011 18:09
    This message has been cross-posted to the following eGroups: Statistical Consulting Section and Statistical Learning and Data Mining Section.
    -------------------------------------------

    Has anyone used "RapidMiner"?
    http://rapid-i.com/

    The software has an "open source" version, and an "enterprise version" (for a license fee).

    I'd appreciate comments, pro or con, about the software.
    You are welcome to send comments to my private email.

    -------------------------------------------
    Chris Barker, Ph.D.
    President - San Francisco Bay Area Chapter of the American Statistical Association
    www.barkerstats.com
    -------------------------------------------


  • 2.  RE:Data Mining Software (open source)

    Posted 11-01-2011 07:44
    I have not used that particular software. I was doing what later came to be called data mining before dedicated data mining packages existed. I now use SPSS, although in the early seventies I also used numerous ad hoc FORTRAN programs.

    Some data mining tasks have older names in the statistics world.
    Unsupervised learning is like cluster analysis or pattern detection.
    Supervised learning is like discriminant function analysis, categorical regression, or pattern recognition.
    Feature selection is like factor analysis or sometimes stepwise variable selection.
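
    To make the first two correspondences concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the data are made up, the function names (kmeans_1d, fit_centroids, classify) are hypothetical, and a nearest-centroid rule stands in for discriminant analysis (in one dimension with equal variances and equal priors the two give the same decision rule). It is not how SPSS or RapidMiner implements these methods.

```python
from statistics import mean


def kmeans_1d(values, centers, iterations=10):
    """Unsupervised: group unlabeled values around k centers (cluster analysis)."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            # Assign each value to its nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


def fit_centroids(values, labels):
    """Supervised: learn one mean per known label."""
    by_label = {}
    for v, lab in zip(values, labels):
        by_label.setdefault(lab, []).append(v)
    return {lab: mean(vs) for lab, vs in by_label.items()}


def classify(value, centroids):
    """Assign a new value to the label with the nearest learned mean."""
    return min(centroids, key=lambda lab: abs(value - centroids[lab]))


# Made-up illustrative data: two well-separated groups.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["low", "low", "low", "high", "high", "high"]

# Unsupervised: labels are ignored; the groups are discovered from the values.
cluster_centers = sorted(kmeans_1d(values, centers=[0.0, 6.0]))

# Supervised: labels are used to learn a rule, then applied to a new case.
centroids = fit_centroids(values, labels)
prediction = classify(4.5, centroids)
```

    The only difference between the two halves is whether the labels are used, which is the same distinction statisticians draw between cluster analysis and discriminant analysis.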

    I have not used more than 30 million cases, but with many stat packages the limit is set by your disk storage. I did this on a home-built quad-CPU 64-bit machine with 8 GB RAM and 350 GB of available disk.
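
    One reason the limit can be disk rather than RAM is that many statistics can be computed in a single pass over the cases, holding only a few running totals in memory. The sketch below shows the idea with Welford's running mean and variance. It is an assumption-laden illustration, not a description of how any particular package works internally: the function name streaming_mean_variance and the column name "score" are hypothetical, and a small in-memory string stands in for a large file on disk.

```python
import csv
import io


def streaming_mean_variance(rows, column):
    """One pass over the rows; memory use does not grow with the number of cases."""
    n = 0
    running_mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for row in rows:
        x = float(row[column])
        n += 1
        delta = x - running_mean
        running_mean += delta / n
        m2 += delta * (x - running_mean)
    variance = m2 / (n - 1) if n > 1 else float("nan")  # sample variance
    return n, running_mean, variance


# Stand-in for a large CSV file; with a real file you would pass open(path, newline="").
sample = io.StringIO("score\n2\n4\n4\n4\n5\n5\n7\n9\n")
n, avg, var = streaming_mean_variance(csv.DictReader(sample), "score")
```

    Because csv.DictReader yields one row at a time, the same function would work on a file far larger than available RAM.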

    Depending on what you want to do, you may be able to do it with software that you are already using.


    -------------------------------------------
    Arthur Kendall
    Social Research Consultants
    -------------------------------------------