Machine Learning

SEPTEMBER 14, 2019
by **SmritiS**

In my previous post, we explored **Matplotlib**, the Python plotting library used to visualize datasets as graphs. We also saw various attributes that can be used to modify plots, and explored line graphs. In today's post, we will discuss other types of graphs: **bar graph**, **pie chart**, **histogram**, **scatter plot**, as well as **3-D plotting**.

A **bar graph** is a type of visualization that helps in representing categorical data. It has rectangular bars (hence the name, bar graph) that can be drawn horizontally or vertically.

In the code for this example, two lists are defined so as to plot popular programming languages against the number of people using them. These lists are hypothetical and haven't been taken from any survey, so the numerical values themselves should be ignored.

The first function, `plot_bar_vertically()`, shows a vertical graph and the second function, `plot_bar_horizontally()`, shows the same data in a horizontal graph.

The only change needed to draw the graph horizontally is the use of `barh` instead of `bar` in the code.
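The listing for the two functions is not reproduced here, so the following is only a minimal sketch of what they might look like. The language names, the user counts, the colors, and the `ax` parameter are all assumptions made for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical data, not taken from any survey
languages = ['Python', 'Java', 'C++', 'JavaScript', 'Go']
users = [100, 80, 60, 90, 40]

def plot_bar_vertically(ax):
    """Draw the data as vertical bars on the given axes."""
    bars = ax.bar(languages, users, color='b')
    ax.set_xlabel('Programming language')
    ax.set_ylabel('Number of users')
    return bars

def plot_bar_horizontally(ax):
    """Draw the same data as horizontal bars: bar becomes barh, axis labels swap."""
    bars = ax.barh(languages, users, color='g')
    ax.set_xlabel('Number of users')
    ax.set_ylabel('Programming language')
    return bars

# Put both versions side by side in one figure
fig, (ax_vertical, ax_horizontal) = plt.subplots(1, 2, figsize=(10, 4))
vertical_bars = plot_bar_vertically(ax_vertical)
horizontal_bars = plot_bar_horizontally(ax_horizontal)
fig.tight_layout()
```

Calling `plt.show()` after this displays both graphs. In the vertical graph the values set the bar heights, while in the horizontal graph they set the bar widths.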

Also known as a **circle chart** or **Emma chart**, a pie chart is used to represent **proportions** of data. The central angle and the area of each slice are proportional to the quantity it represents. It is named **pie chart** because of its resemblance to a pie cut into slices.

Usually the data shown in a pie chart are in **percentages**.

```
import matplotlib.pyplot as plt
slices_usage = [2.5, 4, 1]
persons = ['Person A', 'Person B', 'Person C']
colors = ['r', 'b', 'y']
plt.pie(slices_usage, labels=persons, colors=colors, startangle=90, autopct='%.1f%%')
plt.show()
```

**Output:**


The percentages in the code above are calculated as follows.

The slice usage list is `[2.5, 4, 1]`.

Since the pie chart shows percentages, first take the sum of the slice usage values, i.e. **7.5**. Then find what percentage of 7.5 each value is: for 2.5, that is 2.5 / 7.5 × 100 ≈ 33.3%.

The same calculation is applied to every other value in the slice usage list.
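As a quick check, the same arithmetic can be done in plain Python. This is only an illustration of the calculation described above, not how Matplotlib is implemented internally; the rounding to one decimal place mirrors the `'%.1f%%'` format used in the plot.

```python
slices_usage = [2.5, 4, 1]

total = sum(slices_usage)  # 7.5

# Share of each slice, as a percentage of the total, rounded to one decimal
percentages = [round(value / total * 100, 1) for value in slices_usage]

print(percentages)  # [33.3, 53.3, 13.3]
```

The rounded values add up to 99.9 rather than 100 because each one is rounded separately.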

If we want, we can also add a title to our pie chart:

```
import matplotlib.pyplot as plt
daily_life = [8, 2, 9, 1, 4]
activities = ['Sleep', 'Activities-Commute', 'Work', 'Exercise', 'Family-Time']
plt.pie(daily_life, labels=activities, startangle=90, autopct='%.1f%%')
plt.title("An engineer's life")
plt.show()
```

**Output:**

A **scatter plot** is also known as a **scatter graph, scatter chart, scattergram** or **scatter diagram**.

It helps visualize the relationship between two or more variables. It also helps in identifying **outliers** (**abnormalities**) that may be present in a dataset, so that exceptions in the data can be better understood and the reasons behind them investigated.

It uses Cartesian coordinates to display the values of two variables in a dataset. The data is drawn as a collection of points, where one variable sets each point's horizontal position and the other sets its vertical position.

```
import matplotlib.pyplot as plt
surprise_test_grades = [56, 90, 75, 89, 99, 45, 90, 100, 86, 64]
prepared_test_grades = [10, 92, 80, 48, 100, 48, 77, 99, 68, 77]
grades_range = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
plt.scatter(grades_range, surprise_test_grades, color='r')
plt.scatter(grades_range, prepared_test_grades, color='g')
plt.xlabel('Range')
plt.ylabel('Grades Scored')
plt.show()
```

**Output:**

This code has 2 lists, namely `surprise_test_grades` and `prepared_test_grades`. 10 students scored certain marks in the surprise test (`surprise_test_grades`) and the same 10 students scored certain marks in the test for which they were prepared (`prepared_test_grades`). These marks are plotted as a scatter plot against the `grades_range` list, which supplies one x-axis position per student.

The plot clearly shows outliers: a few students scored much lower in the prepared test than in the surprise test.
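To make the outliers concrete, the two lists can also be compared directly. This is a minimal sketch; the cut-off of a 20-mark drop is an assumption chosen for illustration and is not part of the original example.

```python
surprise_test_grades = [56, 90, 75, 89, 99, 45, 90, 100, 86, 64]
prepared_test_grades = [10, 92, 80, 48, 100, 48, 77, 99, 68, 77]

# Change in marks per student (prepared minus surprise)
differences = [p - s for s, p in zip(surprise_test_grades, prepared_test_grades)]

# Hypothetical threshold: flag students who dropped by more than 20 marks
DROP_THRESHOLD = 20
flagged = [
    (student, diff)
    for student, diff in enumerate(differences, start=1)
    if diff < -DROP_THRESHOLD
]

print(differences)  # [-46, 2, 5, -41, 1, 3, -13, -1, -18, 13]
print(flagged)      # [(1, -46), (4, -41)]
```

With this threshold, the first and fourth students stand out, which matches the two points that sit far below the rest in the prepared-test series.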

Histograms help in representing grouped data. The x-axis and the y-axis represent the ranges (bins) and the frequencies, respectively. Strictly, a histogram encodes frequency in the area of each bar rather than its height, although the two are equivalent when all bars have the same width. It usually represents the number of data points (y-axis values) that fall in a particular range (x-axis values).

```
import matplotlib.pyplot as plt
import numpy as np
x = np.random.randint(1, 101, 7)  # 7 random integers from 1 to 100 (upper bound is exclusive)
plt.hist(x, bins=11)
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.show()
```

**Output:**

In the code above, a random set of 7 integers between 1 and 100 is generated on every run and passed to the `hist` function. The `bins` argument of `plt.hist` sets the number of equal-width ranges into which the data is grouped. After this, the x-axis and y-axis are given labels and the plot is displayed on the screen.

**Quick Note:** The number of **bins** has to be chosen so that each bin is neither too narrow nor too wide. If the bins are too narrow (too many bins), the histogram shows too much individual data and misses the underlying pattern in the data. On the other hand, if the bins are too wide (too few bins), the patterns in the data dissolve and we end up observing nothing. Usually, the number of **bins** is in the range of **8 to 15**, but this isn't a hard and fast rule.
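The effect of the bin count can be seen without plotting by calling `np.histogram`, which `plt.hist` uses to group the data. The sketch below is an illustration under assumptions: the sample size of 200, the fixed seed, and the three bin counts are chosen here and are not part of the original example.

```python
import numpy as np

# Fixed seed so the example is repeatable (an assumption for illustration)
rng = np.random.default_rng(0)
data = rng.integers(1, 101, size=200)  # 200 integers from 1 to 100

results = {}
for bins in (3, 11, 100):
    # counts: how many values fall in each bin; edges: the bin boundaries
    counts, edges = np.histogram(data, bins=bins)
    results[bins] = (counts, edges)
    print(f"bins={bins:3d}  bin width={edges[1] - edges[0]:.1f}  counts={counts}")
```

With 3 bins, every bar lumps together roughly a third of the range; with 100 bins, most bars hold only a handful of values. A count of 11 sits between the two extremes.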

**mplot3d** is the toolkit (installed along with Matplotlib) that helps in the 3-dimensional representation of data. Lines as well as points can be represented in 3-D space. The advantage of 3-D plots is that they can be viewed from different angles.

```
from mpl_toolkits import mplot3d
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
axis = plt.axes(projection="3d")
plt.show()
```

**Output:**

The code above creates an empty set of 3-D axes.

Plotting a 3-D graph can be broken down into 3 steps:

**Step 1:** Generate the points that will make up the surface of the 3-D plot. Define the points for **x** and **y**, and define a function that uses **x** and **y** to calculate the **z** value.

```
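import numpy as np
import matplotlib.pyplot as plt

def z_function(x, y):
    # Height of the surface at each (x, y) point
    return np.sin(np.sqrt(x ** 2 + y ** 2))

# 20 evenly spaced values along each axis
x = np.linspace(-5, 10, 20)
y = np.linspace(-5, 10, 20)
# Expand the two 1-D axes into 2-D coordinate grids
X, Y = np.meshgrid(x, y)
Z = z_function(X, Y)
# New figure for the plots in the next steps
fig = plt.figure()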
```
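To see what `np.meshgrid` does in Step 1, here is a tiny example with hand-picked values. The arrays and the simple `X + Y` height function are assumptions chosen so the result is easy to read.

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([10, 20])

# Each grid has one row per y value and one column per x value
X, Y = np.meshgrid(x, y)
Z = X + Y  # a z value for every (x, y) pair on the grid

print(X)  # [[0 1 2]
          #  [0 1 2]]
print(Y)  # [[10 10 10]
          #  [20 20 20]]
print(Z)  # [[10 11 12]
          #  [20 21 22]]
```

Every position in `X`, `Y` and `Z` describes one point of the surface, which is why the plotting functions take all three grids together.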

**Step 2:** Plot a wireframe, which will help in the estimation of the surface for the 3-D plot.

```
ax = plt.axes(projection="3d")
ax.plot_wireframe(X, Y, Z, color='green')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
plt.show()
```

**Step 3:** Plot the surface over the same grid of points that the wireframe outlined. Setting `rstride` and `cstride` to 1 uses every row and column of the grid.

```
ax = plt.axes(projection='3d')
ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap='flag', edgecolor='none')
ax.set_title('3-D plot')
plt.show()
```

**Note:** The `cmap` argument is the **color map**. The Matplotlib documentation describes the different color maps that are available.
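The names that can be passed as `cmap` can also be listed from Python. A minimal sketch, assuming a reasonably recent Matplotlib in which `plt.colormaps()` returns the registered color map names:

```python
import matplotlib.pyplot as plt

# Names of all registered color maps, e.g. 'flag' as used above
colormap_names = plt.colormaps()

print(len(colormap_names), "color maps available")
print(colormap_names[:10])
```

Any of these names can replace `'flag'` in the `plot_surface` call.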

In the upcoming posts, we will look into other Machine Learning algorithms and their implementations using Python.