Setting Up Presto On Your Machine

This is the fourth post in this series, which aims to make the reader self-sufficient in learning big data applications. In the previous posts of the series, we installed Hadoop, Hive on the same namenode, and a MySQL metastore for Hive. In this post, we will build on that setup and install Presto so that it uses the same HDFS and Hive metastore. Let's get started.

Prerequisites

This post assumes you have the working Hadoop and Hive installation (with the MySQL metastore) from the previous posts in this series, and that Homebrew is available on your machine.

Installing Presto

Again, we will use Homebrew to install Presto on our machine. Run brew install presto in your terminal.
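If you want to confirm what Homebrew installed before moving on, here is a quick check (the version you see will depend on when you run it):

brew install presto
brew info presto   # shows the installed version and install path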

Configuring Presto

You will need to edit the following files (located in the etc/ directory of your Presto installation); only the first two need changes, as noted after the list:

node.properties
config.properties
log.properties
jvm.config
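We only change node.properties and config.properties below; log.properties and jvm.config ship with workable defaults from Homebrew. For reference, here are minimal sketches of those two (the values are assumptions for a typical single-node setup; adjust the heap size to your machine):

jvm.config, one JVM flag per line:

-server
-Xmx4G
-XX:+UseG1GC
-XX:+ExitOnOutOfMemoryError

log.properties, package-level log levels:

com.facebook.presto=INFO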

Export Environment Variables

Add the following to your .bash_profile file and restart your terminal (or source the file):

export PRESTO_VERSION=0.221
export PRESTO_HOME=/usr/local/Cellar/presto/0.221/libexec
export PRESTO_CONF_DIR=$PRESTO_HOME/etc
export PRESTO_CATALOG_DIR=$PRESTO_CONF_DIR/catalog
export PATH=$PATH:$PRESTO_HOME/bin

Make sure the Presto version in these paths matches the version you actually installed.
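To load the new variables and sanity-check them, something like this works (exact output depends on your install):

source ~/.bash_profile
echo $PRESTO_HOME      # should print the libexec path set above
which presto-server    # should resolve via the PATH entry above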

Edit config.properties

This file contains the configuration that controls, among other things, the port on which the Presto server runs. Change the http-server.http.port and discovery.uri entries in this file:

http-server.http.port=9988
discovery.uri=http://localhost:9988
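For context, on a single-machine setup this file usually also marks the node as both coordinator and worker. A minimal sketch (the memory limits are assumptions; the port and URI are the values we just set):

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=9988
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:9988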

Edit node.properties

Next, we need to set the data directory for Presto. Create a folder and set its path in this file:

node.data-dir=/Users/kautukp/Documents/learning/presto/data
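To create the folder first:

mkdir -p /Users/kautukp/Documents/learning/presto/data

A complete node.properties usually also carries an environment name and a unique node id; a minimal sketch (the environment name and node id below are placeholders, the data directory is the one from above):

node.environment=production
node.id=presto-local-node-1
node.data-dir=/Users/kautukp/Documents/learning/presto/data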

Starting Presto Server

Now that the configuration is done, we need to start our Presto server. Use presto-server start to start the Presto services.


Similarly, if you want to stop the Presto server, just run presto-server stop.
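For reference, the launcher supports a few more lifecycle commands, and the server writes its logs under the data directory we configured (the log path below assumes the node.data-dir set earlier):

presto-server start     # start the server as a background daemon
presto-server status    # check whether the server is running
presto-server restart   # stop and start in one step
presto-server stop      # stop the server
tail -f /Users/kautukp/Documents/learning/presto/data/var/log/server.log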

Presto UI

You can go to http://localhost:9988 in your browser to see the Presto UI.


Any query running on the Presto server can be tracked here using the filters. The UI is also useful for monitoring server status and load.
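If you prefer the terminal, the same server can be checked over Presto's REST API; a minimal probe (standard endpoint, with the port we configured):

curl http://localhost:9988/v1/info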

Using Hive Metastore

Now to the good part: we will register the Hive metastore (that we installed in the last post) as a catalog to be queried from Presto. This lets you query data on HDFS using SQL. You can also use any JDBC-capable SQL client (like SQL Workbench/J) to connect to Presto and run SQL queries directly on Hive tables.

Go to $PRESTO_CATALOG_DIR and create a file named hive.properties. Enter the following in the file and save it:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083 
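If Presto has trouble reaching HDFS with its default client settings, the Hive connector can also be pointed at your Hadoop configuration explicitly. This line is optional, and the paths below are placeholders; point them at your actual Hadoop config files:

# optional: explicit Hadoop configs (placeholder paths)
hive.config.resources=/path/to/core-site.xml,/path/to/hdfs-site.xml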

Now, to make sure these changes are picked up by Presto, stop and start the Presto server (or just run presto-server restart).

Testing Hive Connectivity

To test whether our Presto server can connect to Hive, open the Hive CLI and create a new database/schema. Switch to the new schema and create a test table, as sketched below.
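A minimal sketch of those Hive CLI steps (the schema name matches the one used below; the table name and columns are hypothetical):

CREATE DATABASE presto_test;
USE presto_test;
CREATE TABLE test_table (id INT, name STRING);
INSERT INTO test_table VALUES (1, 'presto');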


Now connect to the Presto CLI using:

presto --server localhost:9988 --catalog hive --schema presto_test

Then fire show tables; you should see the Hive table you just created.
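A sample session might look something like this (the output shape is illustrative, and the table name comes from the hypothetical Hive step above):

presto:presto_test> show tables;
   Table
------------
 test_table
(1 row)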


You can also see the same query on the Presto UI.

So, with this working demo, I will conclude this post. I hope it helps readers set up their machines to learn and work on big data. Do share your feedback and thoughts in the comments below.



