
Scraping Topic Names from Studytonight Tutorial Webpage

NOTE: This tutorial is for educational purposes only. We request that readers not use this code to harm the website in any way whatsoever.

Every tutorial page on the Studytonight website has a content section that holds the tutorial itself and a left sidebar that lists the topics.

In this tutorial, as part of our web scraping project, your task is to scrape the list of topics from the sidebar of the current page and print all the topic names.

Here is what we will be scraping from this tutorial page:

[Image: Studytonight sidebar tutorial topics]

Let the scraping begin...

## importing bs4, requests and fake_useragent modules
import bs4
import requests
from fake_useragent import UserAgent

## initializing the UserAgent object
user_agent = UserAgent()
url = "https://www.studytonight.com/python/web-scraping/scraping-studytonight-tutorial-sidebar"

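## getting the response from the page using the get method of the requests module
## (timeout stops the script from hanging if the server does not answer)
page = requests.get(url, headers={"user-agent": user_agent.chrome}, timeout=10)

## raising an error for 4xx/5xx responses instead of parsing an error page
page.raise_for_status()

## storing the content of the page (raw bytes) in a variable
html = page.content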

At this point, the complete source code of the webpage is stored in the variable html. Next, let's create a BeautifulSoup object. You can also try running its prettify() method, which returns the parsed HTML as an indented string that is easier to read.

## creating BeautifulSoup object
soup = bs4.BeautifulSoup(html, "html.parser")
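If you want to see what prettify() does before running it on the whole page, here is a small self-contained sketch. It uses a hypothetical two-item HTML snippet in place of the downloaded page, so the tag contents are made up for illustration.

```python
import bs4

# Hypothetical snippet standing in for the downloaded page
sample_html = "<ul><li class='main'>Introduction</li><li>Topic A</li></ul>"

sample_soup = bs4.BeautifulSoup(sample_html, "html.parser")

# prettify() returns the parsed document as a string with one tag or
# text node per line, indenting nested elements
pretty = sample_soup.prettify()
print(pretty)
```

On the real page the output is long, so printing only the beginning, for example print(soup.prettify()[:1000]), is usually enough to get a feel for the structure.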

We have created the BeautifulSoup object, so now what? How do we know which tags to find and extract from the HTML code? Should we search through the raw HTML by hand? No way!

Now it's time to use the browser's Developer Tools.

Open the Developer Tools in the Chrome browser by pressing the F12 key if you are using Windows, or Option + Command + I if you are a Mac user.

Click the element-selector button in the top-left corner of the Developer Tools panel, then hover the mouse over the sidebar to find the HTML tags used to build the list of tutorial topics.
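Once the Developer Tools show you a tag such as <li class="main">, you can reproduce that lookup in BeautifulSoup. The sketch below runs on a hypothetical snippet (the sidebar class and item names are invented, and the real page may differ) and shows two equivalent queries: find_all() with the class_ keyword, and select() with a CSS selector.

```python
import bs4

# Hypothetical markup modelled on what the Elements panel might show;
# the live Studytonight sidebar may be structured differently
sample_html = """
<ul class="sidebar">
  <li class="main">Python Basics</li>
  <li>Introduction</li>
</ul>
"""

sample_soup = bs4.BeautifulSoup(sample_html, "html.parser")

# Keyword form: class_ (with an underscore) because "class" is a Python keyword
by_find_all = sample_soup.find_all("li", class_="main")

# CSS selector form: <li> elements whose class list contains "main"
by_css = sample_soup.select("li.main")

print([li.get_text(strip=True) for li in by_find_all])
print([li.get_text(strip=True) for li in by_css])
```

Both forms match an element if "main" is one of its classes, so use whichever reads better to you.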

Since this is a project for you guys to complete, here are a few hints to help you.

  1. The sidebar's list items are created using the HTML tag <li>.
  2. The list items that act as headings have the class main for styling.
  3. The tutorial topics for a heading are the list items that follow its main item, up to the next list item with class main.
  4. First, find all the <li> HTML tags and store them in a list.
  5. Then find the <li> HTML tags with class main and store them in a list too.
  6. The <li> tags that are in the first list but not in the second are the tutorial topics.
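The hints above can be sketched roughly as follows. This is a minimal sketch, not the official solution: it runs on a hypothetical sidebar snippet (the markup and topic names are made up for illustration) instead of the live page, so you would swap in the soup object created earlier once you have confirmed the real structure in the Developer Tools.

```python
import bs4

# Hypothetical sidebar markup following hints 1 to 3; the live page may differ
sample_html = """
<ul>
  <li class="main">Python Web Scraping</li>
  <li>Introduction</li>
  <li>Installing BeautifulSoup</li>
  <li class="main">Projects</li>
  <li>Scraping Topic Names</li>
</ul>
"""

sample_soup = bs4.BeautifulSoup(sample_html, "html.parser")

# Hint 4: every <li> tag
all_items = sample_soup.find_all("li")

# Hint 5: only the heading items, i.e. <li> tags with class "main"
headings = sample_soup.find_all("li", class_="main")

# Hint 6: items in the first list but not in the second are topics.
# find_all() returns the same Tag objects from the parsed tree, so we compare
# by identity; Tag equality compares markup, which could merge look-alike tags.
heading_ids = {id(tag) for tag in headings}
topics = [li.get_text(strip=True) for li in all_items if id(li) not in heading_ids]

for topic in topics:
    print(topic)
```

Hint 3 points to an alternative: walk all_items in order and start a new group each time you reach an item with class main, which gives you the topics grouped under their headings rather than one flat list.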

It sounds a bit tricky, but you guys can do this. If you run into any problems while coding the solution, you can ask your question on Studytonight's Q & A Forum.