ASA Connect

  • 1.  How can I play with Hadoop

    Posted 01-03-2018 16:27
    I'm looking at a career change, and it looks like all the interesting industry jobs are now called "big data analyst" or "data scientist." I don't want to start another debate on that, but I have noticed that a lot of these jobs require Hadoop. It looks easy enough, and I'd like to say I have some familiarity with it. But I don't have access to the sort of hardware that Hadoop typically would run on. Can I put something on my laptop and run it so I could brag that I know how to run Hadoop? Does that even make sense? If not, is there something out in the cloud running Hadoop that would allow anyone to play with their system for free?

    ------------------------------
    Stephen Simon, blog.pmean.com
    Independent Statistical Consultant
    P. Mean Consulting
    ------------------------------


  • 2.  RE: How can I play with Hadoop

    Posted 01-03-2018 16:36
    Stephen,
    > access to the sort of hardware that Hadoop typically would run on. Can I put something on my laptop and run it
    > so I could brag that I know how to run Hadoop? Does that even make sense? If not, is there something out in the
    > cloud running Hadoop that would allow anyone to play with their system for free?

    What you are looking for is essentially a way to run a server on a
    separate machine that lives on your own laptop. You could install a copy
    of VirtualBox, pick a preferred Linux distribution (Ubuntu and Red Hat
    are two common ones), and install it as a virtual machine. From there,
    you can boot up the virtual machine, search for something like "Running
    Hadoop on [preferred Linux distribution]", and follow those
    instructions. Distributing work across multiple machines or cores is
    essentially what Hadoop is designed to handle "transparently" for you,
    so what you learn on a single virtual machine should, in principle,
    carry over to a larger cluster.

    Then you can play around with Hadoop as needed.






  • 3.  RE: How can I play with Hadoop

    Posted 01-04-2018 02:34
    Hadoop is a framework for processing data quickly, in parallel, across a large number of disks. A common way to process data spread over multiple storage units is MapReduce, which is more of a computational strategy than a piece of software (start with the Wikipedia article on MapReduce). MapReduce jobs on Hadoop are natively written in Java, and other languages such as Python or C can be used through Hadoop Streaming, if you know one of these.
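    To make the map/reduce strategy concrete, here is a minimal sketch in plain Python that counts words in a few lines of text. It does not use Hadoop at all; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample data are made up for illustration. On a real cluster the framework would run the map and reduce steps on many machines and handle the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, standing in for what the framework
    does between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big hype", "data science"]
    print(word_count(sample))
```

    Because each line is mapped independently and each word is reduced independently, both steps can be split across many disks or machines, which is the part Hadoop takes care of.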
    In practice, Hadoop is often used through higher-level languages that are less flexible than hand-written MapReduce but can save you programming time. Two of these are Apache Pig and Apache Hive, which provide their own scripting and query languages. Hive, in particular, is very similar to SQL. I learned Hive, since most people who process large datasets should know SQL anyway (and SQL is pretty easy to learn). O'Reilly has good books on both Pig and Hive.
    To set it up on your own computer, search for "Apache Hadoop single node cluster setup".


    ------------------------------
    Ronald Barry
    Professor of Statistics
    Univ of Alaska Fairbanks
    ------------------------------



  • 4.  RE: How can I play with Hadoop

    Posted 01-04-2018 09:43

    Some of the data science boot camps may be quite useful.

    For example, here is a course for beginners offered through Coursera:

    https://www.coursera.org/learn/hadoop

    "With no prior experience, you will have the opportunity to walk through hands-on examples with Hadoop and Spark frameworks, two of the most common in the industry. You will be comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment."

  • 5.  RE: How can I play with Hadoop

    Posted 01-05-2018 15:56
    Take a look at the Amazon Web Services (AWS) free tier.

    ------------------------------
    Alan Forsythe
    ------------------------------



  • 6.  RE: How can I play with Hadoop

    Posted 01-08-2018 02:10
    If you want some basic Hadoop experience, with a bit of MapReduce as well, you can check out the free course at udacity.com. Then you could experiment with the virtual machine they set you up with.