In my previous post, we discussed a few performance metrics and the situations in which they can be used. In today's post, we will explore a popular 2-dimensional plotting library known as Matplotlib.
For reference purposes, the official documentation of Matplotlib version 3.1.1 (stable) can be found here: https://matplotlib.org/3.1.1/users/index.html
Matplotlib is a huge library that has around 70,000 lines of code. It is inspired by MathWorks's software MATLAB. It is built on NumPy and is designed to work with the wider SciPy stack. It is one of the most powerful plotting libraries for working with Python and NumPy.
It is usually seen that we tend to remember pictures and images more than words. On the same lines, it is easier to grab insights from visually represented data in comparison to data which is stored in the formats of Excel, CSV, Word or any form of text documents. This is where plotting libraries come into the picture.
In my previous posts, I have visualized the data using Matplotlib so that it would be easy to understand the data, the abnormalities (anomalies), and trends in the data. Matplotlib can be used to represent line plots, bar plots, histograms, scatter plots and much more.
This library can be installed with the following command:
pip install matplotlib
To use the library in your Python code, import the module with the following statement:
import matplotlib.pyplot as plt # or from matplotlib import pyplot as plt
A Matplotlib plot has several parts, namely:
Figure(): It encompasses the entire visualization and contains the x and y-axes.
Axes(): These are the individual plots, each with its X, Y and (in certain cases) Z axes.
Axis(): This contains the numbers and helps in generating the limits of the graph.
Artist(): The visual objects which are tied to axes, such as 'Text', 'Line2D', and 'collection'.
xticks(), yticks(): The two components of matplotlib that are used to label the tick points of the x and y-axis.
Legend(): The component which helps in naming the observation variables.
When a graph is plotted, the x and y axes are adjusted to take the default xticks() and yticks(), but these values can be customized as per one's requirements.
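To make these parts concrete, here is a minimal sketch that touches each of them through pyplot. The data values, the tick positions, and the label "sample series" are made up for illustration.

```python
import matplotlib.pyplot as plt

# Figure: the container for the whole visualization
fig = plt.figure()

# plot() draws a Line2D artist on the current Axes;
# the label is what the legend will display
lines = plt.plot([2, 4, 6, 8], [3, 1, 5, 0], label="sample series")

# Replace the default tick positions with custom ones (illustrative values)
plt.xticks([2, 4, 6, 8])
plt.yticks([0, 1, 3, 5])

# Legend: names the plotted series
legend = plt.legend()

# Axes: the plotting area that holds the line, ticks and legend
ax = plt.gca()

# plt.show()  # uncomment to display the figure in a window
```

Changing the lists passed to xticks() and yticks() is enough to see how the tick marks move while the plotted line stays the same.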
We will start by using pyplot, which is a matplotlib module that will help us add plot elements such as lines, images, and text to the axes.
import matplotlib.pyplot as plt
import numpy as np

plt.plot([2, 4, 6, 8], [3, 1, 5, 0])
plt.title('My first plot')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.show()
Output of the Code:
NOTE: Always remember to run the plt.plot() function and the plt.show() function together.
In the above code, we have imported the necessary packages to plot a graph. We invoke the plot() function by passing 2 arrays to it and then call the show() function to display the graph. The first array will be plotted against the x-axis and the second array will be plotted against the y-axis. The title and the names for the x and y-axis can be added using the functions title(), xlabel() and ylabel().
import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(8, 5))
plt.plot([2, 4, 6, 8], [1, 2, 3, 4])
plt.show()
Output of the Code:
In this code, the method figure() has been used to specify the size of the plot. Here the figsize parameter to figure() gives the width and height of the figure in inches, i.e. 8 is the width and 5 is the height.
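As a quick check of what figsize means, the short sketch below creates the same figure and reads its size back. The variable names are only for illustration.

```python
import matplotlib.pyplot as plt

# figsize is (width, height), measured in inches
fig = plt.figure(figsize=(8, 5))
plt.plot([2, 4, 6, 8], [1, 2, 3, 4])

# Read the size back from the Figure object
width, height = fig.get_size_inches()
print("width:", width, "height:", height)

# plt.show()  # uncomment to display the figure in a window
```

Swapping the two numbers, for example figsize=(5, 8), gives a tall figure instead of a wide one.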
In the plot() function, an optional string can be passed after the data values to indicate the color and line type of the plot. For example, the default is b-, which is a solid blue line. Other options include go (green circles) and color names such as red, yellow and orange.
import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(8, 5))
# Only one list is given, so it is used as the y-values
# and the x-values default to 0, 1, 2, 3
plt.plot([2, 4, 6, 8], "green")
plt.title('My first plot')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.show()
Output of the Code:
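To compare a few format strings side by side, here is a minimal sketch that draws three series with different color and line-type codes. The y-values are made up for illustration.

```python
import matplotlib.pyplot as plt

x = [2, 4, 6, 8]

plt.figure(figsize=(8, 5))

# 'b-'  : blue, solid line
(solid_blue,) = plt.plot(x, [3, 1, 5, 0], "b-")
# 'go'  : green, circle markers only (no connecting line)
(green_dots,) = plt.plot(x, [1, 2, 3, 4], "go")
# 'r--' : red, dashed line
(red_dashed,) = plt.plot(x, [4, 3, 2, 1], "r--")

# plt.show()  # uncomment to display the figure in a window
```

The first character of each string picks the color and the rest picks the marker or line style, so the same data can be restyled without touching the values.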
Your next question might be: what if I wish to plot more than one visualization in a single graph?
Yes, it is possible. The code example below shows how:
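One way this could look, as a minimal sketch with made-up data and labels: every call to plot() made before show() draws onto the same graph, so three calls give three lines in one plot.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample data for illustration
x = np.linspace(0, 10, 50)

plt.figure(figsize=(8, 5))

plt.plot(x, x, label="linear")
# plt.show()

plt.plot(x, x ** 2, label="quadratic")
# plt.show()

plt.plot(x, x ** 3, label="cubic")
plt.legend()
# plt.show()
```

Each commented-out plt.show() marks a point where the graph could be displayed; everything plotted before the call that actually runs ends up in the same graph.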
In the code example above, try uncommenting the plt.show() call on one line, with the other two commented out; you will see a different plot every time.
Another illustration of how multiple plots can be represented in one single figure:
import matplotlib.pyplot as plt
import numpy as np

# Sample data generation
x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)

# subplots(2) returns the figure and an array of two Axes
fig, axs = plt.subplots(2)
fig.suptitle('Vertically stacked subplots')
axs[0].plot(x, y)    # top plot
axs[1].plot(-x, -y)  # bottom plot
plt.show()
Here, subplots() is used to create 2 plots, and the suptitle() method creates a centered title for the figure.
The subplot() method takes three arguments: nrows, ncols and index. They indicate the number of rows, the number of columns and the index number of the sub-plot.
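To illustrate the three arguments, here is a minimal sketch that builds two stacked plots with subplot() instead of subplots(). The names top and bottom are only for illustration, and the index counts from 1, starting at the top-left cell.

```python
import matplotlib.pyplot as plt
import numpy as np

# Sample data generation
x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)

fig = plt.figure()

# 2 rows, 1 column, index 1: the top cell of the grid
top = plt.subplot(2, 1, 1)
top.plot(x, y)

# 2 rows, 1 column, index 2: the bottom cell of the grid
bottom = plt.subplot(2, 1, 2)
bottom.plot(x, -y)

# plt.show()  # uncomment to display the figure in a window
```

Changing the calls to subplot(1, 2, 1) and subplot(1, 2, 2) would place the same two plots side by side instead.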
In the next post, we will see how different types of visualizations (graphs) such as bar graph, pie chart, histogram, scatter plots and 3-D plotting can be represented.
Following are some of the advantages of using the Matplotlib module:
Large datasets can be easily visualized
Variety of formats to visualize in
Helps understand trends and make correlations