Posts

Showing posts from April, 2019

Uber Data Model

In this post I will try to come up with a data model that can serve the requirements of ride-sharing companies like Uber, Lyft, Ola, etc. We will approach the problem as an interview and see how we can arrive at a feasible data model by answering important questions.

Important Entities

The first step towards building a data model is to identify the important actors/entities involved in the process. In our case, thinking about how we interact with taxi apps is enough to identify them. The user (i.e. the Rider) is one such entity, and so is the Driver/Partner. Once we open the app, we try to book a trip from one location to another by finding a suitable taxi/cab. After the trip finishes, the app collects the payment and we are done. Ideally, the flow continues to reviews/ratings, the help center in case of issues, etc., but for this post we will only consider scenarios up to the point where the ride finishes. So, to summarize, we have the following key entities;
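As a rough illustration of the entities named in the paragraph above (Rider, Driver/Partner, cab, trip, payment), here is a minimal Python sketch. It is not the post's own entity list: every class name, field name, and sample value below is an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


# All fields are hypothetical; they only mirror the entities the text names.
@dataclass
class Rider:
    rider_id: int
    name: str


@dataclass
class Driver:
    driver_id: int
    name: str


@dataclass
class Cab:
    cab_id: int
    plate_number: str


@dataclass
class Trip:
    trip_id: int
    rider_id: int            # who requested the ride
    driver_id: int           # who drove it
    cab_id: int              # which vehicle was used
    pickup_location: str
    dropoff_location: str
    requested_at: datetime
    completed_at: Optional[datetime] = None  # None until the ride finishes


@dataclass
class Payment:
    payment_id: int
    trip_id: int             # payment is collected once per finished trip
    amount: float


trip = Trip(1, rider_id=10, driver_id=20, cab_id=30,
            pickup_location="Airport", dropoff_location="Downtown",
            requested_at=datetime(2019, 4, 1, 9, 0))
payment = Payment(payment_id=1, trip_id=trip.trip_id, amount=12.5)
```

Keeping Trip as the record that links Rider, Driver, and Cab by id is one possible design choice; it lets the payment refer to a single trip rather than to each participant.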

Data Engineer Interview Questions: SQL

In this post, I will try to share some actual questions asked by top companies for Data Engineer positions. Many of these companies cover data modelling in one round and then reuse that data model in the next round, which is based on SQL queries.

Q1: Find the number of drivers available for rides in any area at any given point of time.
Q2: Do you consider Driver and Rider as separate entities? Why or why not?
Q3: Give me the names of all passengers who used the app only for airport rides.
Q4: How will you decide where to apply surge pricing?
Q5: How will you calculate wait times for rides?
Q6: A driver can drive multiple cars; how will you find out who is driving which car at any moment?
Q7: Find the rank without using any function.
Q8: How will you delete duplicates from a table?
Q9: How will you find a percentile?
Q10: You have 3 tables, user_dim (user_id, account_id), account_dim (account_id, paying_customer), and dload_facts (date, user_id, and downloads), find the ave
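For Q7 and Q8, one possible approach can be sketched with Python's built-in `sqlite3` module. The tables `driver_trips` and `ride_log` and their rows are hypothetical, and the sketch reads "without using any function" as "without ranking/window functions" (it still uses `COUNT`). The duplicate removal relies on SQLite's implicit `rowid`, so other databases would need a different row identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Q7 (sketch): rank = 1 + number of rows with a strictly larger value.
conn.execute("CREATE TABLE driver_trips (driver TEXT, trips INTEGER)")
conn.executemany(
    "INSERT INTO driver_trips VALUES (?, ?)",
    [("asha", 50), ("ben", 40), ("chen", 40), ("dev", 10)],
)
rank_sql = """
SELECT d.driver,
       d.trips,
       (SELECT COUNT(*) FROM driver_trips AS o WHERE o.trips > d.trips) + 1 AS rnk
FROM driver_trips AS d
ORDER BY rnk, d.driver
"""
ranked = conn.execute(rank_sql).fetchall()

# Q8 (sketch): keep the first physical row of each duplicate group.
conn.execute("CREATE TABLE ride_log (rider TEXT, ride_date TEXT)")
conn.executemany(
    "INSERT INTO ride_log VALUES (?, ?)",
    [("r1", "2019-04-01"), ("r1", "2019-04-01"), ("r2", "2019-04-02")],
)
conn.execute("""
DELETE FROM ride_log
WHERE rowid NOT IN (
    SELECT MIN(rowid) FROM ride_log GROUP BY rider, ride_date
)
""")
remaining = conn.execute(
    "SELECT rider, ride_date FROM ride_log ORDER BY rider"
).fetchall()
```

Tied values share a rank here and the next rank is skipped, which matches the usual behaviour of `RANK()` rather than `DENSE_RANK()`.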

UBER Data Architecture

In this post, I will try to cover the data architecture Uber has built to support its big data applications. This should be applicable to other ride-hailing apps as well. There are multiple modules at play here, so I will give a brief overview followed by a detailed discussion of the data architecture. A bird's-eye view of the problem tells us that we are effectively trying to solve a demand vs. supply problem. All the Drivers active at any point in time constitute the supply, while all the cab requesters (Riders) form the demand. What Uber tries to do is find the best match between demand and supply "at any point at any moment". I have highlighted the location and temporal conditions as they are critical to the success of what Uber tries to do. Let's get started and see how Uber gets around these challenges by creating a scalable data architecture. Every Driver active on Uber keeps sending their location data to the server (e.g. every 5 seconds). T
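To make the supply side of the paragraph above concrete, here is a minimal Python sketch that turns periodic driver location pings into a count of active drivers per area. It is not Uber's implementation: the ping format, the `cell` area identifiers, and the 15-second activity window are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical: a driver counts as active if their last ping is this recent.
ACTIVE_WINDOW_SECONDS = 15


def latest_pings(pings):
    """Keep only the newest (timestamp, cell) pair per driver."""
    latest = {}
    for driver_id, ts, cell in pings:
        if driver_id not in latest or ts > latest[driver_id][0]:
            latest[driver_id] = (ts, cell)
    return latest


def supply_by_cell(pings, now):
    """Count drivers whose most recent ping falls inside the activity window."""
    counts = Counter()
    for ts, cell in latest_pings(pings).values():
        if now - ts <= ACTIVE_WINDOW_SECONDS:
            counts[cell] += 1
    return counts


# (driver_id, unix-style timestamp in seconds, area cell) - made-up sample data
pings = [
    ("d1", 100, "cell_a"),
    ("d1", 105, "cell_b"),  # d1 has moved to cell_b
    ("d2", 104, "cell_b"),
    ("d3", 60, "cell_a"),   # stale ping: d3 is treated as inactive
]
supply = supply_by_cell(pings, now=110)
```

Using only each driver's latest ping captures the location condition, and the activity window captures the temporal one; demand could be counted per cell in the same way from ride requests.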