Big Data Things

Posts

Showing posts from May, 2019

Controlling Executors and Cores in Spark Applications

May 30, 2019

In this post, I will cover the core concepts which govern the execution model of Spark. I will talk about the different components, how they interact with each other and what happens when you fire a query. I will also take few examples to illustrate how Spark configs change these behaviours. Let's talk about the architecture first. Each Spark application (instance of SparkContext) starts with a Driver program. The Driver talks to the Cluster Manager( YARN, Mesos, Kubernetes etc.) to demand the resources it needs. Executors (being one such resource) are processes which run computations and stored data for applications. Each application gets its own executor processes on Worker Nodes. Effectively, it means that data can't be shared across different applications without writing to a storage layer. Also, executors typically run for the lifetime of a Spark application and run multiple tasks over its lifetime (parallelly and sequentially). What happens when you submit