Python's mechanize Library

Generally, a user can view a website either in a browser or by reading its source code with one of many methods and tools; the Linux program wget is a popular choice. In Python, browsing the Internet means retrieving and parsing a website's HTML source code. In this tutorial, we'll learn how to use the mechanize library for this purpose.

To use the mechanize library, download its tar.gz file from here. Extract the archive and install it by running python setup.py install from the extracted directory.
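Once the install step has finished, it can be handy to confirm that Python can actually find the library. The snippet below is a minimal sketch, assuming Python 3; is_installed is a hypothetical helper name used only for illustration, not part of mechanize.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("mechanize"):
    print("mechanize is installed")
else:
    print("mechanize was not found; check the installation step")
```

The check only looks the module up on the import path, so it does not need network access.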

Mechanize's primary class, Browser, lets you manipulate anything that can be manipulated inside a browser. Let's look at an example that views the source code of a website using the mechanize library:

mech1.py

#!/usr/bin/env python
# Program to view a page's source code using mechanize

import mechanize

def page_view(url):
    try:
        # Create a browser object
        browser = mechanize.Browser()

        # Uncomment the next line to stop mechanize from honouring robots.txt
        # browser.set_handle_robots(False)
        page = browser.open(url)
        src_code = page.read()
        # Print the source code
        print(src_code)
    except Exception:
        print("Error in browsing...")

url = "http://www.syngress.com/"
page_view(url)

Output:

Now, in the script mech1.py, change the url to https://www.google.com. What do you see? "Error in browsing..." Let's analyse the error more closely. Remove the try and except statements from the above code and run it again. Oops! It still doesn't work, but this time you will see the detailed error message:

As we can see in the error, it has something to do with the robots.txt file. Do you know what a robots.txt file is? With this file, a website can tell search engines such as Google and Bing which pages they may or may not crawl. Hence, if you have a website and you don't want Google to crawl a particular webpage (perhaps because it is for internal use), you can specify that in the robots.txt file.
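To make the idea concrete, here is a minimal sketch of how a crawler might interpret such a file, using the robots.txt parser from Python 3's standard library rather than mechanize itself. The file contents, the user agent name "my-crawler", the example.com URLs, and the is_allowed helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that hides an internal section from every crawler
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /internal/
"""


def is_allowed(robots_txt, user_agent, url):
    """Check whether the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


public_ok = is_allowed(SAMPLE_ROBOTS_TXT, "my-crawler",
                       "http://example.com/index.html")
internal_ok = is_allowed(SAMPLE_ROBOTS_TXT, "my-crawler",
                         "http://example.com/internal/report.html")

print("public page allowed:", public_ok)
print("internal page allowed:", internal_ok)
```

A well-behaved client runs a check like this before every request and refuses to fetch any URL that the file disallows.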

Now, back to the problem. The above error is raised because the website's robots.txt disallows our browser from visiting its pages, and mechanize honours that file by default. So, what should we do? We instruct our mechanize browser object to skip the robots.txt check. To do that, simply uncomment the following line in mech1.py: browser.set_handle_robots(False)

Now, if you visit Google.com, the script prints the page's source code.
