Find HTML Tags using BeautifulSoup

In this tutorial we will learn how to search for any tag using the BeautifulSoup module. We suggest going through the previous tutorials first: the basic introduction to the BeautifulSoup module and the tutorial covering its most useful methods.

We have already learned different ways to traverse the HTML tree, such as parent, parents, next_sibling and previous_sibling. But finding all the similar tags with those attributes is difficult. So now we will learn how to find any particular HTML tag using the find and find_all methods of the BeautifulSoup module.

If you are coming from the last tutorial, we will be using the same HTML code. If you are new here, please create a file named sample_webpage.html and copy the following HTML code into it:

<!DOCTYPE html>
<html>
    
    <head>
        <title> Sample HTML Page</title>
        <style>
            * {
                margin: 0;
                padding: 0;
            }

            div {
                width: 95%;
                height: 75px;
                margin: 10px 2.5%;
                border: 1px dotted grey;
                text-align: center;
            }
              
            p {
                font-family: sans-serif;
                font-size: 18px;
                color: #000;
                line-height: 75px;
            }

            a {
                position: relative;
                top: 25px;
            }
        </style>
    </head>
    
    <body>
        <div id="first-div">
            <p class="first">First Paragraph</p>
        </div>

        <div id="second-div">
            <p class="second">Second Paragraph</p>
        </div>

        <div id="third-div">
            <a href="https://www.studytonight.com">Studytonight</a>
            <p class="third">Third Paragraph</p>        
        </div>

        <div id="fourth-div">
            <p class="fourth">Fourth Paragraph</p>        
        </div>

        <div id="fifth-div">
            <p class="fifth">Fifth Paragraph</p>        
        </div>
    </body>
</html>

To read the content of the above HTML file, use the following Python code, which stores the content in a variable:

## reading content from the file
with open("sample_webpage.html") as html_file:
    html = html_file.read()

Once we have read the file, we create the BeautifulSoup object:

import bs4

## reading content from the file
with open("sample_webpage.html") as html_file:
    html = html_file.read()
    
## creating a BeautifulSoup object
soup = bs4.BeautifulSoup(html, "html.parser")

And the process of web scraping begins...
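As a side note on the loading step, BeautifulSoup also accepts an open file handle, so the separate read() call is optional. The sketch below assumes a cut-down stand-in for sample_webpage.html written to a temporary file so that it runs on its own; the names SAMPLE_HTML, path and title_text are only illustrative.

```python
import os
import tempfile

import bs4

# A cut-down stand-in for sample_webpage.html (assumption, not the full page).
SAMPLE_HTML = """<html><head><title>Sample HTML Page</title></head>
<body><div id="first-div"><p class="first">First Paragraph</p></div></body></html>"""

# Write the stand-in page to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(SAMPLE_HTML)
    path = tmp.name

# Pass the open file handle straight to BeautifulSoup instead of calling read().
with open(path, encoding="utf-8") as html_file:
    soup = bs4.BeautifulSoup(html_file, "html.parser")

# Clean up the temporary file.
os.remove(path)

title_text = soup.title.text
print(title_text)
```

With the real file, the same pattern would be to open sample_webpage.html and hand the file object to bs4.BeautifulSoup.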

BeautifulSoup: `find_all` method

The find_all method finds all the tags that match what we are searching for; in the simplest case we provide the name of the tag as an argument. It returns a list-like ResultSet containing all the matching HTML elements. Following is the syntax:

find_all(name, attrs, recursive, string, limit, **kwargs)

We will cover the most commonly used parameters of the find_all method one by one. Let's start with the name parameter.

`find_all`: name Parameter

Let's find all the p tags from the HTML code:

import bs4

## reading content from the file
with open("sample_webpage.html") as html_file:
    html = html_file.read()
    
## creating a BeautifulSoup object
soup = bs4.BeautifulSoup(html, "html.parser")

## finding all p tags
p_tags = soup.find_all("p")     

print(p_tags)

print("\n-----Class Names Of All Paragraphs-----\n")

for tag in p_tags:
    print(tag['class'][0])
    
print("\n-----Content Of All Paragraphs-----\n")

for tag in p_tags:
    print(tag.text)

[<p class="first">First Paragraph</p>, <p class="second">Second Paragraph</p>, <p class="third">Third Paragraph</p>, <p class="fourth">Fourth Paragraph</p>, <p class="fifth">Fifth Paragraph</p>]

-----Class Names Of All Paragraphs-----

first
second
third
fourth
fifth

-----Content Of All Paragraphs-----

First Paragraph
Second Paragraph
Third Paragraph
Fourth Paragraph
Fifth Paragraph

As you can see, we can not only find the tags but also read the information attached to them, such as their attributes and text.
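The name argument is not limited to a single tag name. As a minimal sketch, assuming a shortened inline copy of the sample page (SAMPLE_HTML and the result variable names are illustrative), a list of names matches any of the listed tags and True matches every tag:

```python
import bs4

# Shortened inline copy of two divs from the sample page (assumption).
SAMPLE_HTML = """
<div id="third-div">
    <a href="https://www.studytonight.com">Studytonight</a>
    <p class="third">Third Paragraph</p>
</div>
<div id="fourth-div">
    <p class="fourth">Fourth Paragraph</p>
</div>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# A list of names matches any tag whose name is in the list.
a_and_p_tags = soup.find_all(["a", "p"])
tag_names = [tag.name for tag in a_and_p_tags]
print(tag_names)

# True matches every tag in the document, whatever its name.
all_tag_names = [tag.name for tag in soup.find_all(True)]
print(all_tag_names)
```

Results come back in document order, so the a tag is listed before the p tag that follows it in third-div.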

`find_all`: attribute Parameter

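Let's find all the tags in the HTML code whose class attribute is equal to link (this code comes after we have created the soup object in the above code snippet). Note that sample_webpage.html as given above has no element with this class, so with that file the returned list is empty; the output below assumes the page contains an anchor such as <a class="link" href="https://www.google.com">Google</a>: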

## finding using class name
link_class_tags = soup.find_all(class_="link")

print(link_class_tags)

print("----------")

for tag in link_class_tags:
    print(tag.name)

[<a class="link" href="https://www.google.com">Google</a>]
----------
a

Note the trailing underscore in class_. Because class is a reserved keyword in Python, it cannot be used as a keyword argument name, so BeautifulSoup accepts class_ instead.
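Other attributes can be filtered in the same way. The following is a small sketch, assuming an inline excerpt of the sample page (SAMPLE_HTML and the variable names are illustrative), that combines a tag name with class_, filters by id, and passes the attrs dictionary as an alternative to class_:

```python
import bs4

# Inline excerpt of the sample page (assumption, not the full file).
SAMPLE_HTML = """
<div id="third-div">
    <a href="https://www.studytonight.com">Studytonight</a>
    <p class="third">Third Paragraph</p>
</div>
<div id="fourth-div">
    <p class="fourth">Fourth Paragraph</p>
</div>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# Tag name and class together: only <p> tags with class "third".
third_paragraphs = soup.find_all("p", class_="third")
print(third_paragraphs)

# Any attribute can be used as a keyword argument, for example id.
third_div = soup.find_all(id="third-div")
print([tag.name for tag in third_div])

# The attrs dictionary is an alternative that avoids the class_ spelling.
fourth_paragraphs = soup.find_all(attrs={"class": "fourth"})
print([tag.text for tag in fourth_paragraphs])
```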

`find_all`: Tags containing any string

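We can use the find_all method to search by text through its string argument. Called with only string, it returns the matching strings themselves rather than the tags that contain them. A plain string matches only text that is exactly equal to it, so to match any text containing a given word we pass a regular expression, generated in the code example below with Python's re module.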

## finding tags using a string

## importing regular expression module to find all the strings
import re

## defining an re variable which contains "Paragraph" text
s = re.compile("Paragraph")

## finding all the content of the tags which contains "Paragraph"
tags_containing_paragraph = soup.find_all(string=s)

print(tags_containing_paragraph)

['First Paragraph', 'Second Paragraph', 'Third Paragraph', 'Fourth Paragraph', 'Fifth Paragraph']

While writing the above code, keep the import re statement at the top of the file along with the import bs4 statement.
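To make the difference between exact and pattern matching concrete, here is a hedged sketch on an inline excerpt of the sample page (SAMPLE_HTML and the variable names are illustrative). It also shows that adding a tag name to the call returns the tags rather than the bare strings:

```python
import re

import bs4

# Inline excerpt of the sample page (assumption, not the full file).
SAMPLE_HTML = """
<div><p class="first">First Paragraph</p></div>
<div>
    <a href="https://www.studytonight.com">Studytonight</a>
    <p class="third">Third Paragraph</p>
</div>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# A plain string must equal the whole text of a string in the document.
exact = soup.find_all(string="First Paragraph")
no_match = soup.find_all(string="Paragraph")

# A compiled pattern matches any text that contains "Paragraph".
strings = soup.find_all(string=re.compile("Paragraph"))

# With a tag name as well, the tags themselves are returned.
p_tags = soup.find_all("p", string=re.compile("Paragraph"))

print(exact)
print(no_match)
print(strings)
print(p_tags)
```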

`find_all`: limit Parameter

The limit parameter limits the size of the result set. When a limit is provided, find_all stops after finding that many matches and returns at most that many tags; any other qualifying tags are left out of the returned list.

## finding the first p tag using limit parameter
first_p_tag = soup.find_all("p", limit=1)

print(first_p_tag)

[<p class="first">First Paragraph</p>]

The result is still a list, even though it holds a single tag. As this example shows, you can use multiple parameters together.
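As a further sketch, assuming an inline three-paragraph excerpt of the sample page (SAMPLE_HTML and the variable names are illustrative), a larger limit returns the first matches in document order. The recursive parameter from the syntax line is shown as well: with recursive=False only the direct children of the tag being searched are considered.

```python
import bs4

# Inline three-div excerpt of the sample page (assumption).
SAMPLE_HTML = """<body>
<div id="first-div"><p class="first">First Paragraph</p></div>
<div id="second-div"><p class="second">Second Paragraph</p></div>
<div id="third-div"><p class="third">Third Paragraph</p></div>
</body>"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# limit=2 keeps only the first two matches, in document order.
first_two = soup.find_all("p", limit=2)
first_two_classes = [tag["class"][0] for tag in first_two]
print(first_two_classes)

# recursive=False looks only at direct children of <body>.
direct_divs = soup.body.find_all("div", recursive=False)
direct_paragraphs = soup.body.find_all("p", recursive=False)
print(len(direct_divs), len(direct_paragraphs))
```

The p tags sit inside the div tags, so they are not direct children of body and the last search comes back empty.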

BeautifulSoup: `find` method

The find method finds the first matching tag. It is similar to calling find_all with limit=1, except that find returns the tag itself rather than a list, and returns None when nothing matches.

Let's take an example:

p_tag = soup.find("p")

print(p_tag)
print("----------")
print(p_tag.text)

<p class="first">First Paragraph</p>
----------
First Paragraph

One more example:

a_tag = soup.find("a")

print(a_tag)
print("----------")
print(a_tag.text)
print("\n")
print(a_tag['href'])

<a href="https://www.studytonight.com">Studytonight</a>
----------
Studytonight


https://www.studytonight.com
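Because find returns None when no tag matches, it is worth checking the result before reading attributes from it. A minimal sketch, assuming an inline excerpt of the sample page (SAMPLE_HTML, message and href are illustrative names):

```python
import bs4

# Inline excerpt of the sample page (assumption, not the full file).
SAMPLE_HTML = """
<div id="third-div">
    <a href="https://www.studytonight.com">Studytonight</a>
    <p class="third">Third Paragraph</p>
</div>
"""

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

# There is no <table> in the page, so find returns None instead of a tag.
table_tag = soup.find("table")
if table_tag is None:
    message = "no <table> tag found"
else:
    message = table_tag.text
print(message)

# href=True matches the first <a> tag that has an href attribute at all.
link = soup.find("a", href=True)
href = link["href"] if link is not None else None
print(href)
```

Skipping the None check and calling table_tag.text directly would raise an AttributeError on pages where the tag is missing.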

And with that we have learned the basics of web scraping with the BeautifulSoup module. We have covered the most important and useful methods, but there are many more. If you want to dig deeper, check the BeautifulSoup documentation.

In the next tutorial we will scrape a website.
