NETWORK PROGRAMMING IN PYTHON

Implementing Web Scraping

Let's try to extract some data from the e-commerce giant, Amazon. We will search for "Protein Bars" and related products, and then scrape data from the search results that we get.

Example of Web Scraping


Above is a screenshot of the webpage with the search results. The first step is to identify the HTML tags that hold the data we want to scrape.

  • For Item Name: Right click on Product Name → Inspect element
  • For Item Price: Right click on Product Price → Inspect element

For the item price, the following HTML tag is used:

<span class="a-size-base a-color-price a-text-bold">

So, inside the span tag, we have to look for a class attribute with the value a-size-base a-color-price a-text-bold.

Similarly, for the item name, the following HTML tag is used:

<h2 class="a-size-medium s-inline s-access-title a-text-normal" ...>
    ITEM_NAME
</h2>
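
Before touching the live page, the two tag/class pairs identified above can be checked against a small piece of markup. The following is only a sketch: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without installing anything, and the SAMPLE_HTML snippet, the ClassTextCollector class and the collect helper are all made up for illustration.

```python
from html.parser import HTMLParser

# Class values copied from the two tags identified above.
NAME_CLASSES = {"a-size-medium", "s-inline", "s-access-title", "a-text-normal"}
PRICE_CLASSES = {"a-size-base", "a-color-price", "a-text-bold"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every <tag> whose class list contains all wanted classes."""

    def __init__(self, tag, wanted_classes):
        super().__init__()
        self.tag = tag
        self.wanted = set(wanted_classes)
        self.depth = 0      # greater than 0 while inside a matching element
        self.parts = []     # text fragments of the element being collected
        self.results = []   # finished text of every matching element

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        if self.depth:      # nested tag of the same kind inside a match
            self.depth += 1
            return
        # The class attribute is a whitespace-separated list of class names.
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.depth = 1
            self.parts = []

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.parts).strip())

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def collect(html, tag, wanted_classes):
    """Hypothetical helper: return the text of all matching elements in html."""
    parser = ClassTextCollector(tag, wanted_classes)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up sample markup; a real results page is far larger.
SAMPLE_HTML = """
<div>
  <h2 class="a-size-medium s-inline s-access-title a-text-normal">Sample Protein Bar</h2>
  <span class="a-size-base a-color-price a-text-bold">100.00</span>
  <span class="a-size-small">unrelated text</span>
</div>
"""

names = collect(SAMPLE_HTML, "h2", NAME_CLASSES)
prices = collect(SAMPLE_HTML, "span", PRICE_CLASSES)
print(names, prices)
```

The span with the class a-size-small is skipped because it does not carry all three price classes, which is the same filtering idea the scraping script relies on.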

Now, let's write the program/script for extracting the data.


Program/Script for Web Scraping

#!/usr/bin/env python3

import requests
from bs4 import BeautifulSoup

# URL of the search page
url = "http://www.amazon.in/s/ref=nb_sb_ss_i_4_8?url=search-alias%3Daps&field-keywords=protein+bars&sprefix=protein+%2Caps%2C718&crid=1SW4WFJE8O22T&rh=i%3Aaps%2Ck%3Aprotein+bars"

r = requests.get(url)                           # fetch the search page using requests
soup = BeautifulSoup(r.content, "html.parser")  # parse the response body into a BeautifulSoup object 'soup'

# Item Name
# 'find_all' returns every tag that matches the criteria given in the parentheses
i_name = soup.find_all("h2", {"class": "a-size-medium s-inline s-access-title a-text-normal"})

# Item Price
i_price = soup.find_all("span", {"class": "a-size-base a-color-price a-text-bold"})

# Now print item name and price
# 'zip' traverses both lists in parallel, pairing each name with a price
for name, price in zip(i_name, i_price):
    print("Item Name: " + name.get_text(strip=True))
    print("Item Price: " + price.get_text(strip=True))
    print("-" * 70)
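
The long search URL in the script is easier to read once its query string is decoded. The sketch below uses only the standard library's urllib.parse to show that the search terms travel in the field-keywords parameter. The build_search_url helper is hypothetical, and it is an assumption that the site would accept a URL carrying only these two parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The search URL used in the script above, split across lines for readability.
url = ("http://www.amazon.in/s/ref=nb_sb_ss_i_4_8?url=search-alias%3Daps"
       "&field-keywords=protein+bars&sprefix=protein+%2Caps%2C718"
       "&crid=1SW4WFJE8O22T&rh=i%3Aaps%2Ck%3Aprotein+bars")

parts = urlsplit(url)
params = parse_qs(parts.query)          # decode the query string into a dict of lists
keywords = params["field-keywords"][0]  # the search terms typed into the search box
print(keywords)


def build_search_url(search_terms):
    """Hypothetical helper: same host and path, only two of the query parameters."""
    query = urlencode({"url": "search-alias=aps", "field-keywords": search_terms})
    return f"{parts.scheme}://{parts.netloc}{parts.path}?{query}"


print(build_search_url("energy bars"))
```

urlencode takes care of the escaping, so spaces become + and the = inside search-alias=aps becomes %3D, matching the form seen in the original URL.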

Covering all the technicalities and features of the BeautifulSoup module in a single tutorial is impossible, so we recommend reading the official documentation.

Note: You might get confused here because the prices of some products are not displayed correctly. This is because the class name used here for price extraction is different for some items (those on offer). For such items, you need to change the class name.
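
The note above also affects how names and prices are paired. Because zip matches items purely by position and stops at the shorter list, a price that was not found leaves the two lists with different lengths. Here is a minimal sketch with made-up lists standing in for the scraped names and prices:

```python
from itertools import zip_longest

# Made-up data: three names were found, but only two prices matched the class.
names = ["Bar A", "Bar B", "Bar C"]
prices = ["100", "250"]

# zip stops at the shorter list, so the last name is silently dropped.
pairs = list(zip(names, prices))
print(pairs)

# zip_longest keeps every name and fills the gap with a placeholder.
padded = list(zip_longest(names, prices, fillvalue="N/A"))
print(padded)
```

Padding only makes the gap visible. If the missing price belongs to an item in the middle of the list, every later name is still paired with the wrong price, so comparing the lengths of the two lists before printing is a cheap sanity check.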

Example of Web Scraping


So now you know the basic approach to scraping data from a website. The BeautifulSoup module provides a lot of other functionalities too, but the above script/program, with the tag and class names adjusted for the page in question, is enough to extract data from many simple pages.

Remember the 2 steps: Identify the HTML tag and then use the program to scrape.