Amazon Price Tracker: A Simple Python Web Crawler

Have you ever wanted something but couldn't afford it at the regular price? Or thought that something was good to have but not really worth it at the regular price? Have you ended up checking Amazon every few days for a price drop on the item? Well, now you don't need to. You can build a very simple Python program that does this for you, and that's what we are going to do in this post. We will build a very basic web crawler that scrapes the price of the item from its Amazon page and checks whether the price has dropped, increased or remained the same since the last run.

Prerequisites:

You need to install a couple of things to get started: Python itself and then a couple of libraries. If you are using a Mac, you can install Python using Homebrew:

>brew install python
>brew install python@2

Once you have installed Python, you can use pip to install requests and bs4.

>pip install requests
>pip install bs4

Go to the Google search box and type "my user agent". Copy the result; we will need it later, when we request the item page.



Now, we have everything that we need.

We will be using "BeautifulSoup" from bs4. It lets you search HTML content for any tag and read its value. Open the Amazon link of the item you want to track, select the price, right-click and click on "inspect element". Notice that the id of the span holding the price is "priceblock_ourprice". We will need this as well.
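The fetch-and-search step could be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the function names `fetch_page` and `extract_price_text` are hypothetical, and the span id is passed in as a parameter so you can supply whatever id you saw in the inspector. It assumes the price sits in a span with that id, which may not hold for every item page.

```python
import requests
from bs4 import BeautifulSoup


def fetch_page(url, user_agent):
    """Download the item page, sending the copied user agent as a header."""
    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def extract_price_text(html, span_id):
    """Return the raw price string from the span with the given id.

    Returns None if no such span exists in the page.
    """
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", id=span_id)
    return span.get_text(strip=True) if span else None
```

You would call `fetch_page` with the item link and your user agent string, then pass the returned HTML and the span id to `extract_price_text`. Keeping the parsing separate from the download makes it easy to try the parser on a saved copy of the page.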



Once we have the price, we need to strip the Rupee symbol and convert the rest into a number (the scraped value is a string).
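One way to do this conversion is shown below. It is only a sketch under the assumption that the scraped text looks something like "₹ 1,299.00", with commas as thousands separators; `parse_price` is a name chosen for illustration.

```python
import re


def parse_price(price_text):
    """Convert a scraped price string such as '₹ 1,299.00' into a float."""
    # Drop everything except digits and the decimal point. This removes the
    # currency symbol, spaces and thousands separators in one pass.
    cleaned = re.sub(r"[^0-9.]", "", price_text)
    # A prefix like "Rs." would leave a stray leading dot, so trim those.
    cleaned = cleaned.strip(".")
    if not cleaned:
        raise ValueError(f"no numeric price found in {price_text!r}")
    return float(cleaned)
```

Raising an error on an empty result is a deliberate choice here: if the page layout changes and the price can no longer be found, the script stops instead of silently recording a wrong value.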

In my current version, I store this value in a file. On each run, I read the previous price from the file into a variable, compare it with the current price fetched from Amazon, and then overwrite the file with the new value. In a future version, we can store multiple values and build a history of the price. You can also implement a mailing feature that sends you an email alert when the price drops, so that you can quickly order that thing you always wanted.
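The read, compare and overwrite cycle described above might look something like this. It is a sketch rather than the repository's code: the function name `compare_and_store` and the returned status strings are assumptions made for the example, and the file is assumed to hold a single number as plain text.

```python
from pathlib import Path


def compare_and_store(current_price, price_file):
    """Compare current_price with the price saved in price_file.

    Overwrites the file with current_price and returns one of
    'first run', 'dropped', 'increased' or 'unchanged'.
    """
    path = Path(price_file)

    # Read the previous price, if a non-empty file exists from an earlier run.
    previous_price = None
    if path.exists():
        stored = path.read_text().strip()
        if stored:
            previous_price = float(stored)

    # Overwrite the file so the next run compares against today's price.
    path.write_text(str(current_price))

    if previous_price is None:
        return "first run"
    if current_price < previous_price:
        return "dropped"
    if current_price > previous_price:
        return "increased"
    return "unchanged"
```

The "first run" case covers the situation where the file does not exist yet, so the script does not crash the first time you track an item.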

You can run the script from your Terminal by calling the Python interpreter:



You can get the full code from my GitHub repository. Do share your feedback.




