Setting Up Presto On Your Machine
This is the fourth post in this series, geared towards making the reader self-sufficient in learning big data applications. In the previous posts of the series, we installed Hadoop and Hive on the same namenode, along with a MySQL metastore for Hive. In this post, we will build on the same setup and install Presto to use the same HDFS and Hive metastore. Let's get started then.
Prerequisites
This post builds on the previous posts in the series: you should already have Hadoop and Hive running on your machine, with the MySQL-backed Hive metastore configured. You will also need Homebrew installed.
Installing Presto
Again, we will use Homebrew to install Presto on our machine. Run brew install presto in your terminal.
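For reference, here is the install step together with a quick version check (brew info simply prints the installed version, which you will need for the environment variables below):

brew install presto
brew info presto   # shows the installed version, e.g. 0.221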
Configuring Presto
You will need to edit the following files (located under etc/ in the Presto installation directory); sample contents for jvm.config and log.properties are shown right after this list:
node.properties
config.properties
log.properties
jvm.config
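The config.properties and node.properties edits are covered in their own sections below. For jvm.config and log.properties, the Homebrew install usually ships working defaults; as a reference, here is a minimal pair based on the examples in the Presto deployment docs (the 4G heap is an assumption suited to a laptop, so tune it to your machine):

jvm.config:
-server
-Xmx4G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

log.properties:
com.facebook.presto=INFO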
Export Environment Variables
Add the following to your .bash_profile file and restart terminal (or source it):
export PRESTO_VERSION=0.221
export PRESTO_HOME=/usr/local/Cellar/presto/0.221/libexec
export PRESTO_CONF_DIR=$PRESTO_HOME/etc
export PRESTO_CATALOG_DIR=$PRESTO_CONF_DIR/catalog
export PATH=$PATH:$PRESTO_HOME/bin
Make sure that your Presto version is the same as what you have installed.
Edit config.properties
This file contains the configuration that controls the port on which the Presto server runs. Change http-server.http.port and discovery.uri in this file:
http-server.http.port=9988
discovery.uri=http://localhost:9988
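Since we are running a single-node setup where the same process acts as both coordinator and worker, the full file should look something like the single-machine example from the Presto deployment docs (the memory limits below are assumptions; adjust them to your machine):

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=9988
query.max-memory=2GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:9988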
Edit node.properties
Next, we need to set the data directory for Presto. Create a folder and set its path in this file:
node.data-dir=/Users/kautukp/Documents/learning/presto/data
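node.properties also needs a node.environment and a unique node.id; the Homebrew install typically pre-fills these, but for reference a complete file (with a placeholder node.id, which can be any unique string) looks like:

node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/Users/kautukp/Documents/learning/presto/data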
Starting Presto Server
Now that the configurations are done, we need to start our Presto server. Use presto-server start to start Presto services.
Similarly, if you want to stop the Presto server, just run presto-server stop.
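The presto-server launcher supports a few more subcommands that are handy while experimenting:

presto-server run      # run in the foreground, logging to the console
presto-server status   # check whether the server is running
presto-server restart  # stop and start in one step

Logs for the background daemon end up under var/log inside the data directory we configured above.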
Presto UI
You can go to localhost:9988 to see the Presto UI. It should look something like this:
Any query running on the Presto server can be tracked here using the filters. Also, this UI is useful for monitoring the server status and load on the server.
Using Hive Metastore
Now to the good part: we will set the Hive metastore (that we installed in the last post) as a catalog to be queried from Presto. This lets you query data on HDFS using SQL. You can use any JDBC-capable SQL client (like SQL Workbench/J) to connect and run SQL queries directly against Hive tables.
Go to $PRESTO_CATALOG_DIR and create a file named hive.properties. Enter the following in the file and save it:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://localhost:9083
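If Presto has trouble locating your HDFS, the hive-hadoop2 connector also lets you point it at your Hadoop config files explicitly via hive.config.resources. The paths below are assumptions; substitute wherever your core-site.xml and hdfs-site.xml actually live:

hive.config.resources=/usr/local/opt/hadoop/libexec/etc/hadoop/core-site.xml,/usr/local/opt/hadoop/libexec/etc/hadoop/hdfs-site.xml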
Now, to make sure these changes are picked up by Presto, stop and start the Presto server (or just restart it).
Testing Hive Connectivity
To test whether our Presto server is able to connect to Hive, open the Hive CLI and create a new database/schema. Switch to the new schema and create a test table:
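A minimal sequence in the Hive CLI could look like the following (the table name and columns are just placeholders; the schema name presto_test matches what we pass to the Presto CLI below):

CREATE DATABASE presto_test;
USE presto_test;
CREATE TABLE test_table (id INT, name STRING);
INSERT INTO test_table VALUES (1, 'hello');   -- gives us a row to query from Presto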
Now connect to the Presto CLI using
presto --server localhost:9988 --catalog hive --schema presto_test
and then run show tables; you should see the Hive table you just created.
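Continuing with the placeholder table from above, a quick check inside the Presto CLI would be:

show tables;
select * from test_table;   -- should return the row inserted via Hive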
You can also see the same query on the Presto UI.
So, with this working demo, I conclude this post. I hope this was helpful for readers in setting up their machines to learn and work on big data. Do share your feedback and thoughts in the comments below.
Previous Post: Setting up Apache Hive on your Machine