# Visual Representation of Large Dataset using Matplotlib

AUGUST 2, 2019   by SmritiS

In my previous post, we discussed a few performance metrics and the situations in which they can be used. In today's post, we will explore a popular 2-dimensional plotting library known as Matplotlib.

For reference, the official documentation for Matplotlib version 3.1.1 (stable) can be found here: https://matplotlib.org/3.1.1/users/index.html

## What is Matplotlib?

Matplotlib is a large library with around 70,000 lines of code. It was inspired by MathWorks' MATLAB software. It is built on NumPy and is designed to work with the broader SciPy stack. It is one of the most powerful plotting libraries available for Python. People tend to remember pictures and images more readily than words. In the same way, it is easier to draw insights from visually represented data than from data stored in Excel, CSV, Word or any other text format. This is where plotting libraries come into the picture.

In my previous posts, I have visualized the data using Matplotlib so that it would be easy to understand the data, the abnormalities (anomalies), and trends in the data. Matplotlib can be used to represent line plots, bar plots, histograms, scatter plots and much more.

This library can be installed with the following command:

``pip install matplotlib``

To use the library in your Python code, import the module with either of the following statements:

```python
import matplotlib.pyplot as plt

# or

from matplotlib import pyplot as plt
```

A Matplotlib plot is made up of several parts, namely:

1. `Figure`: The top-level container for the entire visualization. It holds one or more Axes.

2. `Axes`: The plotting area where the data is drawn. Each Axes has an x-axis and a y-axis (and a z-axis for 3-D plots).

3. `Axis`: The number-line-like objects that set the limits of the graph and generate the ticks and tick labels.

4. `Artist`: Every visual object drawn on the figure, such as `Text`, `Line2D` and collection objects. Most artists are tied to an Axes.

5. `xticks()`, `yticks()`: The two pyplot functions used to get or set the tick locations and labels of the x and y axes.

6. `Legend`: The component that names the plotted data series.
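The parts listed above can be inspected directly as Python objects. The following is a minimal sketch, assuming the object-oriented interface (`plt.subplots()`), which is only one way of creating a figure; the data values and the label are made up for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.axes import Axes
from matplotlib.axis import Axis
from matplotlib.figure import Figure
from matplotlib.legend import Legend
from matplotlib.lines import Line2D

# One Figure that holds a single Axes (the plotting area)
fig, ax = plt.subplots()

# plot() returns a list of Line2D artists; the label is used by the legend
(line,) = ax.plot([2, 4, 6, 8], [3, 1, 5, 0], label="sample values")
legend = ax.legend()

# Check which class each part belongs to
print(isinstance(fig, Figure))       # the whole visualization
print(isinstance(ax, Axes))          # the plotting area
print(isinstance(ax.xaxis, Axis))    # the x-axis of that area
print(isinstance(line, Line2D))      # the drawn line
print(isinstance(line, Artist))      # a line is also an Artist
print(isinstance(legend, Legend))    # the legend box
```

Every check prints `True`. The line and the legend are both artists that live inside the Axes, which in turn lives inside the Figure.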

When a graph is plotted, the x and y axes get default tick locations and labels, but these can be customized with `xticks()` and `yticks()` as per one's requirements.
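As a small sketch of such a customization, the snippet below replaces the default ticks with hand-picked ones. The tick positions and the labels `"Q1"` to `"Q4"` are illustrative assumptions, not values from this post.

```python
import matplotlib.pyplot as plt

plt.plot([2, 4, 6, 8], [3, 1, 5, 0])

# Put x ticks only at the data points and give each one a text label
plt.xticks([2, 4, 6, 8], ["Q1", "Q2", "Q3", "Q4"])

# Put y ticks at every integer from 0 to 5
plt.yticks([0, 1, 2, 3, 4, 5])

plt.show()
```

Calling `plt.xticks()` or `plt.yticks()` without arguments returns the current tick locations and labels instead of changing them.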

We will start by using `pyplot`, a Matplotlib module that helps us add plot elements such as lines, images and text to the axes.

```python
import matplotlib.pyplot as plt
import numpy as np

# x values first, then the matching y values
plt.plot([2,4,6,8], [3,1,5,0])
plt.title('My first plot')
plt.xlabel('X axis')
plt.ylabel('Y axis')
# Display the figure
plt.show()
```

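Output of the code: a line plot through the points (2, 3), (4, 1), (6, 5) and (8, 0), titled 'My first plot'.

NOTE: Always remember to call `plt.show()` after `plt.plot()`. When the code is run as a script, the figure is displayed only once `plt.show()` is called.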

### Explanation of the above code:

In the above code, we import the packages needed to plot a graph. We call the `plot()` function with two lists and then call the `show()` function to display the graph. The first list supplies the x-coordinates of the points and the second list supplies the y-coordinates. The title and the names of the x and y axes are added using the functions `title()`, `xlabel()` and `ylabel()`.
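The same call works with NumPy arrays, which is the more common choice once the dataset grows. The following is a minimal sketch, assuming a sine wave sampled at 10,000 points as stand-in data; the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

# 10,000 evenly spaced x values and one y value for each of them
x = np.linspace(0, 10, 10_000)
y = np.sin(x)

# x and y must have the same length: point i is drawn at (x[i], y[i])
(line,) = plt.plot(x, y)
plt.title('Sine wave from NumPy arrays')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.show()
```

If the two arrays have different lengths, `plot()` raises a `ValueError` instead of drawing anything.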

## Using the `figure()` method:

```python
import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize = (8,5))
plt.plot([2,4,6,8], [1,2,3,4])
plt.show()
```

Output of the code: a straight line from (2, 1) to (8, 4), drawn on a figure 8 inches wide and 5 inches tall.

### Explanation of the above code:

In this code, the `figure()` function is used to specify the size of the plot. Its `figsize` parameter takes the width and height of the figure in inches, i.e. 8 is the width and 5 is the height of the plot.
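The size that was requested can be read back from the figure object. The sketch below assumes we keep the return value of `plt.figure()` in a variable named `fig` (a name chosen here for illustration).

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 5))
plt.plot([2, 4, 6, 8], [1, 2, 3, 4])

# The figure reports its size in inches as (width, height)
width, height = fig.get_size_inches()
print(width, height)

# Size in pixels = size in inches * dots per inch (dpi)
print(width * fig.dpi, height * fig.dpi)

plt.show()
```

The pixel size depends on the `dpi` setting of your installation, so the second line of output can differ from one machine to another.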

## Colour of the plot and line

In the `plot()` function, an optional format string can be passed after the data to indicate the color and line type of the plot. For example, `'b-'` draws a solid blue line and `'go'` draws green circle markers. A color can also be given by name, such as `'green'`, `'red'`, `'yellow'` or `'orange'`.
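A few format strings can be compared side by side on one graph. This is a minimal sketch with made-up y values; each `plot()` call adds one more line to the same axes.

```python
import matplotlib.pyplot as plt

x = [2, 4, 6, 8]

(solid_blue,) = plt.plot(x, [1, 2, 3, 4], 'b-')    # solid blue line
(green_dots,) = plt.plot(x, [2, 3, 4, 5], 'go')    # green circles, no connecting line
(red_dashed,) = plt.plot(x, [3, 4, 5, 6], 'r--')   # dashed red line
(by_name,) = plt.plot(x, [4, 5, 6, 7], 'orange')   # a color name on its own

plt.show()
```

The format string combines a color letter with an optional marker and line style, so `'go'` means "green" plus "circle marker".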

```python
import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize = (8,5))
# Only y values are passed here, so x defaults to 0, 1, 2, 3
plt.plot([2,4,6,8], "green")
plt.title('My first plot')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.show()
```

Output of the code: a green line through the values 2, 4, 6 and 8, plotted against the indices 0 to 3.

Your next question might be: what if I wish to plot more than one visualization in a single graph?

Yes, it is possible. The code example below shows how:
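A minimal sketch of what such an example could look like is given here. It is an assumption made for illustration: the data, the labels and the exact placement of the `plt.show()` calls are hypothetical and are not taken from the original post.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]  # hypothetical sample data

plt.plot(x, [1, 2, 3, 4], label='linear')
# plt.show()
plt.plot(x, [1, 4, 9, 16], label='squared')
# plt.show()
plt.plot(x, [1, 8, 27, 64], label='cubed')
plt.legend()
plt.show()
```

With only the last `plt.show()` active, all three lines are drawn on the same axes, because every `plot()` call before it adds to the current figure.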

In the code example above, try uncommenting the `plt.show()` call on one line and commenting out the other two; you will see a different plot each time. Here is another illustration of how multiple plots can be placed in a single figure:

```python
import matplotlib.pyplot as plt
import numpy as np

# Sample data generation
x = np.linspace(0, 2 * np.pi, 400)
y = np.sin(x ** 2)

# Two subplots stacked vertically; axs is an array holding one Axes per subplot
fig, axs = plt.subplots(2)

fig.suptitle('Vertically stacked subplots')
axs[0].plot(x, y)    # top subplot
axs[1].plot(-x, -y)  # bottom subplot
plt.show()
```

### Explanation of the above code:

`subplots(2)` creates a figure with two vertically stacked plots and returns the figure together with an array of two Axes objects, one per plot. The `suptitle()` method adds a centered title for the whole figure.

The related `subplot()` function (singular) takes three arguments: `nrows`, `ncols` and `index`. They indicate the number of rows, the number of columns and the index number of the sub-plot, counted from 1.
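The three arguments of `subplot()` can be seen in action in the sketch below. It assumes a grid of 2 rows and 1 column and uses sine and cosine curves as stand-in data; the names `top` and `bottom` are chosen here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 400)

# Grid of 2 rows and 1 column; index 1 selects the top cell
top = plt.subplot(2, 1, 1)
top.plot(x, np.sin(x))
top.set_title('sin(x)')

# Same grid; index 2 selects the bottom cell
bottom = plt.subplot(2, 1, 2)
bottom.plot(x, np.cos(x))
bottom.set_title('cos(x)')

plt.tight_layout()  # keep the titles from overlapping the plots
plt.show()
```

`subplot()` adds one sub-plot per call, whereas `subplots()` creates the whole grid in a single call, so the latter is usually more convenient when every cell is needed.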

In the next post, we will see how different types of visualizations (graphs), such as bar graphs, pie charts, histograms, scatter plots and 3-D plots, can be created.