Setting Up Your Machine for Learning Big Data

When starting to learn big data, the biggest problem most people face is not having a place to try things out. This becomes a hindrance, and eventually people lose hope and abandon the attempt altogether. So, I have decided to document the steps needed to get started on this journey.

In this series of posts, I will walk you through the steps needed to set up your machine (a MacBook in my case) with the tools needed for learning big data technologies. I will start with a simple post about setting up the bash profile on your MacBook. Then I will install the Hadoop services and move on to installing Hive and Presto against the same Hadoop namenode. The inspiration here is a set of wonderful posts from Keith, but unfortunately not all of the steps work anymore. So, I will be documenting the steps that worked for me (as of June 2019).



Setting up the .bash_profile 


By default, macOS does not create a bash profile for you. In this post, I will document the steps needed to create this file. We will be using it a lot, since the path has to be updated after every install, so make sure you understand what you are doing here.

  1. Start Terminal
  2. Type "cd ~/" to go to your home directory
  3. Create a new file (either vi .bash_profile or touch .bash_profile)
  4. Edit this file to add your path values. For example, I will add a couple of lines to my file to set my JAVA_HOME and SCALA_HOME

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_201.jdk/Contents/Home/
export SCALA_HOME=/usr/local/opt/scala/idea

(To find the install location of your libraries, you can use "brew info scala" and then cross-check the path from the output by actually going there. A more flexible way to set JAVA_HOME is sketched below.)
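
As a side note, and not required for the steps above: macOS ships a small helper, /usr/libexec/java_home, that locates an installed JDK for you, so you do not have to hard-code the version-specific path. A minimal sketch, assuming a JDK 8 is already installed:

# Let macOS locate an installed JDK 8 instead of hard-coding the versioned path
export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)

# Prepend the JDK's binaries to the PATH for new Terminal sessions
export PATH="$JAVA_HOME/bin:$PATH"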

  5. Save this file
  6. Reload it in the current Terminal session with "source ~/.bash_profile"
  7. You can check whether the environment variables were updated by printing any one of the path variables

> echo $SCALA_HOME
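
With the SCALA_HOME value set above, this should print:

/usr/local/opt/scala/idea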


This concludes the introductory post of this series. In the next post, I will set up Apache Hadoop and lay the groundwork for the bigger and better things to come. Do share your feedback.

