

Visual Representation of Large Dataset using Matplotlib

           
 AUGUST 2, 2019   by SmritiS

In my previous post, we discussed a few performance metrics and the situations in which they can be used. In today's post, we will explore a popular 2-dimensional plotting library known as Matplotlib.

For reference, the official documentation for Matplotlib version 3.1.1 (stable) can be found here: https://matplotlib.org/3.1.1/users/index.html




What is Matplotlib?

Matplotlib is a large library with around 70,000 lines of code. Its plotting interface is inspired by MATLAB, the numerical computing software from MathWorks. It is built on NumPy and works well alongside SciPy. It is one of the most widely used plotting libraries in the Python ecosystem.

[Figure: Matplotlib module for graph plotting]

We tend to remember pictures and images better than words. In the same way, it is easier to draw insights from visually represented data than from data stored in Excel, CSV, Word, or other text-based formats. This is where plotting libraries come into the picture.

In my previous posts, I visualized data using Matplotlib to make it easier to understand the data, its abnormalities (anomalies), and its trends. Matplotlib can draw line plots, bar plots, histograms, scatter plots, and much more.

This library can be installed with the following command:

pip install matplotlib

To use the library in your Python code, import the pyplot module with either of the following statements:

import matplotlib.pyplot as plt

# or

from matplotlib import pyplot as plt

A Matplotlib figure is made up of several parts, namely:

  1. Figure: The top-level container for the entire visualization. It holds one or more Axes, along with figure-level elements such as an overall title.

  2. Axes: An individual plot inside the Figure, i.e. the region where the data is drawn. Each Axes has an x-axis and a y-axis (and a z-axis for 3-D plots).

  3. Axis: The number-line-like object attached to an Axes. It sets the limits of the graph and generates the ticks and tick labels.

  4. Artist: Every visible object in the figure, such as 'Text', 'Line2D', and collection objects. Most Artists are tied to an Axes.

  5. xticks(), yticks(): The two pyplot functions used to get or set the tick locations and labels of the x and y-axis.

  6. Legend: The component that names the plotted data series so they can be told apart.

When a graph is plotted, the x and y axes get default tick values, but these can be customized with xticks() and yticks() as required.
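As a minimal sketch of how these parts fit together, the snippet below creates a Figure and an Axes with plt.subplots() (a convenience function chosen for this illustration), plots one line, overrides the default ticks, and adds a legend. The data values and the label are illustrative assumptions.

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

# Figure is the whole canvas; Axes is the single plot inside it
fig, ax = plt.subplots()

# plot() returns a list of Line2D objects, which are Artists
(line,) = ax.plot([2, 4, 6, 8], [3, 1, 5, 0], label="sample data")

# Each Axes owns an x-axis and a y-axis (Axis objects)
print(type(ax.xaxis).__name__)   # XAxis
print(isinstance(line, Line2D))  # True

# Override the default tick positions on the current Axes
plt.xticks([2, 4, 6, 8])
plt.yticks([0, 1, 3, 5])

# The legend names the plotted series using its label
legend = ax.legend()
print(legend.get_texts()[0].get_text())  # sample data

# Call plt.show() here to display the figure
```

Printing the objects like this is only meant to show which part is which; in everyday plotting you rarely need to inspect them.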

We will start by using pyplot, a Matplotlib module that helps us add plot elements such as lines, images, and text to the axes.

import matplotlib.pyplot as plt
import numpy as np

plt.plot([2,4,6,8], [3,1,5,0])
plt.title('My first plot')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.show()

Output of the Code:

[Figure: Matplotlib code example 1]

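NOTE: plt.plot() only draws onto the current figure; nothing appears on screen until plt.show() is called. When running a script, always remember to call plt.show() after your plotting calls.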


Explanation of the above code:

In the above code, we imported the packages needed to plot a graph (NumPy is imported but not yet used). We call the plot() function with two lists and then call the show() function to display the graph. The values in the first list are used as x-coordinates and the values in the second list as y-coordinates. A title and names for the x and y-axis can be added using the functions title(), xlabel(), and ylabel().




Using figure() method:

import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize = (8,5))
plt.plot([2,4,6,8], [1,2,3,4])
plt.show()

Output of the Code:

[Figure: Matplotlib code example 2]


Explanation of the above code:

In this code, figure() has been used to specify the size of the plot. The figsize parameter takes a (width, height) tuple in inches, so here the figure is 8 inches wide and 5 inches tall.
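As a quick check of how figsize is interpreted, the sketch below reads the size back from the figure and converts it to pixels. The dpi value of 100 is an assumption made for the arithmetic, not something the example above sets.

```python
import matplotlib.pyplot as plt

dpi = 100  # assumed dots per inch, chosen to keep the arithmetic simple
fig = plt.figure(figsize=(8, 5), dpi=dpi)

# figsize is (width, height) in inches
width, height = fig.get_size_inches()
print(width, height)              # 8.0 5.0

# Inches multiplied by dots per inch gives the canvas size in pixels
print(width * dpi, height * dpi)  # 800.0 500.0
```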




Colour of the plot and line

In the plot() function, a third parameter can be specified, which is a format string that indicates the color and the line or marker style of the plot. For example, 'b-' is a solid blue line, 'go' draws green circles, and 'r--' is a red dashed line. A color name such as green, red, yellow, or orange can also be passed on its own.

import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize = (8,5))
# Only y-values are given here, so x defaults to the indices 0, 1, 2, 3
plt.plot([2,4,6,8], "green")
plt.title('My first plot')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.show()

Output of the Code:

[Figure: Matplotlib code example 3]
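To see how a combined format string is read, here is a small sketch that passes 'go--' and then inspects the resulting line. The data values are illustrative assumptions.

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 5))

# 'go--' = green ('g'), circle markers ('o'), dashed line ('--')
(line,) = plt.plot([2, 4, 6, 8], [3, 1, 5, 0], "go--")

print(line.get_color())      # g
print(line.get_marker())     # o
print(line.get_linestyle())  # --

# Call plt.show() here to display the figure
```

The same styling can also be spelled out with the keyword arguments color, marker, and linestyle, which is often easier to read than a packed format string.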

Your next question might be: what if I wish to plot more than one visualization in a single graph?

Yes, it is possible. The code example below shows how:
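As a minimal sketch of the idea, successive plt.plot() calls keep drawing on the same axes until plt.show() is called. The data, the labels, and the placement of the commented-out plt.show() calls are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: lets the sketch run without a display; remove this line to open a window
import matplotlib.pyplot as plt

x = [2, 4, 6, 8]  # illustrative data

plt.plot(x, [3, 1, 5, 0], label="first")
# plt.show()   # showing here would display only the first line

plt.plot(x, [1, 2, 3, 4], label="second")
# plt.show()   # showing here would display the lines drawn so far

plt.plot(x, [4, 3, 2, 1], label="third")
plt.legend()
plt.show()     # all three lines share one set of axes
```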

In the code example above, try uncommenting one plt.show() call and commenting out the other two; you will see a different plot each time. Here is another illustration of how multiple plots can be shown in a single figure:

import matplotlib.pyplot as plt
import numpy as np

# Sample data generation
x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)
fig, axs = plt.subplots(2)

fig.suptitle('Vertically stacked subplots')
axs[0].plot(x, y)
axs[1].plot(-x, -y)
plt.show()

Explanation of the above code:

subplots(2) creates a figure with two vertically stacked plots and returns the figure together with an array of Axes objects (axs). The suptitle() method adds a centered title to the whole figure.

The related subplot() function takes three arguments: nrows, ncols, and index. They indicate the number of rows, the number of columns, and the index number of the sub-plot, which starts at 1.
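The difference between the two functions can be sketched as follows. The grid sizes and the sine and cosine curves are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 400)

# subplots(nrows, ncols) creates the whole grid at once
fig, axs = plt.subplots(2, 2)
print(axs.shape)  # (2, 2)
axs[0, 0].plot(x, np.sin(x))
axs[1, 1].plot(x, np.cos(x))

# subplot(nrows, ncols, index) adds one plot at a time; index counts
# from 1, left to right and then top to bottom
plt.figure()
top = plt.subplot(2, 1, 1)
top.plot(x, np.sin(x))
bottom = plt.subplot(2, 1, 2)
bottom.plot(x, np.cos(x))

# Call plt.show() here to display both figures
```

subplots() is usually the more convenient choice when the layout is known up front, since every Axes is available immediately through the returned array.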

In upcoming posts, we will see how different types of visualizations (graphs), such as bar graphs, pie charts, histograms, scatter plots, and 3-D plots, can be created.




Advantages of Matplotlib

Following are some of the advantages of using the Matplotlib module:

  • Large datasets can be easily visualized

  • Variety of formats to visualize in

  • Helps understand trends and make correlations
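As a rough illustration of the first point, the sketch below plots 100,000 values with a single plot() call. The random walk is synthetic data generated with NumPy and stands in for a real dataset; the seed and labels are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

# Assumption: synthetic data standing in for a real large dataset
rng = np.random.default_rng(seed=0)
n_points = 100_000
values = np.cumsum(rng.normal(size=n_points))  # a random walk

plt.figure(figsize=(8, 5))
(line,) = plt.plot(values)  # x defaults to the indices 0 .. n_points - 1
plt.title('Random walk with 100,000 points')
plt.xlabel('Step')
plt.ylabel('Value')

# Call plt.show() here to display the figure
```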

