This tutorial covers the basic functions of Python's numpy module. Whether you are learning numpy from scratch or just revising, this tutorial is for you; no prior familiarity with numpy is needed. After completing it, you will have a solid footing in numpy and can either dig further into this library, move on to matplotlib, or start with machine learning using what you have learned.
Without any further delay, let's get started!
Numpy, short for Numerical Python, is one of the most widely used libraries for machine learning and data science. It is popular, powerful and easy to use, and it makes a lot of mathematical and logical computation straightforward. In short, numpy is used for mathematical operations and, most importantly, for creating N-dimensional arrays (we'll talk about these later).
The numpy library, combined with the pandas and matplotlib libraries, is used for working with data: visualization, analysis, building machine learning models, etc.
As mentioned above, numpy is used for creating N-dimensional arrays. The question arises: why do we need it when Python already has the list type, on which we can perform operations like slicing? The answer is speed.
Numpy's computational speed is far better than that of plain Python lists for numerical work, which matters when working with large datasets. The numpy functions are easy to use and give results faster. Numpy arrays also take less memory than lists holding the same numbers, and they are more convenient for vector and matrix manipulations.
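To make the memory claim concrete, here is a rough sketch that compares a list of 1000 integers with a numpy array holding the same values. The variable names are made up for this illustration, the element type is fixed to int64 as an assumption, and the exact list size depends on your Python build.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores references to separate int objects, so count both the
# list itself and every int it points to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# nbytes is the size of the raw element buffer: n elements * 8 bytes each.
array_bytes = np_array.nbytes

print("list :", list_bytes, "bytes")
print("array:", array_bytes, "bytes")
```

On a standard CPython installation the list should come out several times larger than the array.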
Unlike a list, a numpy array has a fixed size. When an array is grown (for example with np.append()), numpy does not resize it in place: a new array is created, and the original can be discarded once nothing refers to it, so no unnecessary copies are kept around.
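A small sketch of this behaviour, using np.append() as one way of "growing" an array (the names a and b are chosen just for this example):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)  # returns a new, longer array; `a` is not modified

print(a)  # [1 2 3]
print(b)  # [1 2 3 4]
print(np.shares_memory(a, b))  # False: `b` has its own data buffer
```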
The list is a built-in data structure in Python, whereas numpy is a third-party library. Fortunately it is open source, so anyone can use it and contribute to its codebase, which is another point in favour of this library.
To import numpy, use the import keyword.
import numpy as np
Adding the above statement to your Python script imports the numpy library, but the library must already be installed in your Python installation. If it is not, you can install it using the pip/pip3 command:
>> pip3 install numpy
N-dimensional arrays, or ndarrays, are the reason numpy is so popular. An ndarray is an array or matrix of N dimensions in which every element has the same data type. The number of dimensions of an array is known as its rank, and the shape of the array gives the number of elements along each dimension. Numpy arrays are created using the np.array() function.
To create an n-dimensional array we use the np.array() function, passing the data just as we would write a Python list (a nested list for more than one dimension), as in the code example below.
arr1 = np.array([[1,2], [3,4], [5,6]]) # the rows are passed together as one nested list
The elements of a numpy array are accessed much as in Python lists, i.e., via square brackets and the index of the element. All the elements of a numpy array have to be of the same data type. For example,
arr1[0] # for a 1-D array, the element at index 0; for a 2-D array, the first row
arr2[1][4] # for a 2-D array arr2, the element at row 1 and column 4 (also written arr2[1,4])
To get the rank (number of dimensions) of our array we can use the ndim attribute. For example,
arr1.ndim
To get the shape of the array (for a 2-D array, the number of rows and the number of columns) we can use the shape attribute. For example,
arr1.shape
Now that you know these attributes, let's try them out with a code example.
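A minimal sketch putting indexing, ndim and shape together. The two arrays are defined here only for illustration; their values are arbitrary.

```python
import numpy as np

arr1 = np.array([1, 2, 3])                            # 1-D array
arr2 = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])  # 2-D array, 2 rows x 5 columns

print(arr1.ndim, arr1.shape)   # 1 (3,)
print(arr2.ndim, arr2.shape)   # 2 (2, 5)

print(arr1[0])                 # 1
print(arr2[1][4], arr2[1, 4])  # 10 10 (both forms address row 1, column 4)
```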
The np.array() function has additional optional arguments that provide more information about the array. Some of the optional arguments are:
dtype: By explicitly specifying the data type we can create an array in which every element has that data type.
ndmin: Specifies the minimum number of dimensions of the resultant array.
order: Specifies the memory layout of the array: column major (with 'F') or row major (with 'C'). The default is 'K', which keeps the layout of the input where possible; for input that is not already an array, such as a Python list, the result is row major.
Let's take an example specifying some of the additional parameters while calling the np.array() function:
import numpy as np
arr3=np.array([1,2,3,4,5], ndmin=2, dtype=complex)
print(arr3)
Output:
[[1.+0.j 2.+0.j 3.+0.j 4.+0.j 5.+0.j]]
reshape() function and shape attribute
After creating a one-dimensional array, what if we want to convert it into a two-dimensional array? In that case we can use the reshape function of the numpy module. The reshape function is used to change the shape of an existing array. It takes the dimensions of the desired new shape as arguments.
import numpy as np
# creating one dimensional array
arr1 = np.array([1,2,3,4,5])
print(arr1)
# converted arr1 which is 1-D array to 2-D array
arr1 = arr1.reshape(5,1)
print(arr1)
Output:
[1 2 3 4 5]
[[1]
 [2]
 [3]
 [4]
 [5]]
We can also reshape the array in place by assigning to its shape attribute. Let's use it to turn the array back into a one-dimensional one:
# converting arr1 back to 1-D array again
arr1.shape = (5,)
flatten() and ravel() functions
A multi-dimensional array can be flattened using the flatten() and ravel() functions of the numpy module. Irrespective of the dimension and shape of the input, the array returned by flatten() or ravel() is 1-dimensional.
Both functions work in the same manner. The only difference is that ravel() returns a view of the parent array whenever possible, so any change made to the array returned by ravel() is reflected in the original/parent array, whereas flatten() always creates a copy of the original array.
This makes ravel() more memory efficient, since it usually does not copy the original array. Let's see a code example of both functions in action:
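Here is a short sketch of the difference. The array parent and its values are invented for this illustration; ravel() returns a view in this case because the array is stored contiguously in memory.

```python
import numpy as np

parent = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = parent.flatten()  # always an independent copy
flat_view = parent.ravel()    # a view onto `parent` here

flat_copy[0] = 100  # changes only the copy
flat_view[1] = 200  # writes through to `parent`

print(flat_copy)
print(flat_view)
print(parent)  # parent[0, 0] is still 1, but parent[0, 1] is now 200
```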
Numpy arrays can be sliced to create subarrays. Slicing is done using colons inside the square brackets of the array, [:], just as we do with Python lists.
Let's say we have an array arr = [1,2,4445,4657,767,878,86] and we want the elements up to index 4. To slice the array we can simply write arr[:5], which gives the elements from the beginning up to, but not including, index 5.
In other words, the statement arr[:5] slices the original array to create a subarray with the elements at indices 0,1,2,3,4. The index to the left of the : indicates the starting index of the subarray and the index to the right of the colon indicates the ending index.
The basic slicing syntax is:
array[start_index : end_index]
Did you notice that the slicing operation above created a subarray up to index 4, although I wrote :5 and not :4?
The reason is that the number to the right of the colon is exclusive and the number before the colon is inclusive. It is a half-open interval: imagine slicing as [start, end).
Let's understand this with help of a few examples:
array = np.array([5,51,35,6,59,98,74,56]) # new array created
# creating sub-array starting from index 0 and ending at index 5.
sub_array = array[:6] # 6 is exclusive
print(sub_array)
Output:
[ 5 51 35 6 59 98]
What if we need a subarray that does not start from index 0 but from somewhere in between, and runs to the end? We can do that too:
sub_array2 = array[4:] # creating sub_array starting from index 4 and going till end
print(sub_array2)
Output:
[59 98 74 56]
What about starting from somewhere in between and ending before the last index? We can do that too:
sub_array3 = array[2:6] # sub_array starting from index 2 and ending at index 5
print(sub_array3)
Output:
[35 6 59 98]
To exclude only the last element we can use arr[:-1].
We can also reverse the whole array with just one line of code: arr[::-1].
Let's have a quick example demonstrating the above tricks:
print(array[:-1]) # excluded last element 56
print(array[::-1]) # reversing the array with just one line of code
Output:
[ 5 51 35 6 59 98 74]
[56 74 98 59 6 35 51 5]
To slice a 2-D numpy array, we give one slice for the rows and one for the columns, separated by a comma. Say we are slicing a 2-D array arr; we write:
arr[row_start:row_end, column_start:column_end]
In this case too, the index after each colon is exclusive, i.e. not included. Let's take an example:
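A minimal sketch of 2-D slicing. The 4 x 4 array below is hypothetical and its values are for illustration only.

```python
import numpy as np

# Hypothetical 4 x 4 array; the values are made up for this example.
arr = np.array([[ 1,  2,   3,  4],
                [ 5,  6,   7,  8],
                [ 9, 10,  11, 12],
                [13, 14, 676, 16]])

# Rows 2 and 3 (4 is exclusive), columns 1 and 2 (3 is exclusive).
sub_array3 = arr[2:4, 1:3]
print(sub_array3)

# First two rows, all columns.
first_rows = arr[:2, :]
print(first_rows)
```

Leaving out a start or end index on either side of the comma works the same way as in the 1-D case.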
We can reverse a 2-D array in a similar manner using [::-1]:
# run this to see the output
print(arr[::-1]) # reversing only rows
print(arr[::-1,::-1]) # reversing both rows and columns
Now that you are comfortable with slicing, there is an important point to note: when subarrays are created by slicing, they are not copies of the original array. A subarray is a view onto part of the original array, and modifying it changes the original array too. Let us change the element at 1,1 of sub_array3 (taken here as the 2-D slice arr[2:4,1:3]) and see what happens to the original array arr.
print("Original array", arr) # before changing anything
print("\nElement at 1,1", sub_array3[1,1])
# changing element at 1,1
sub_array3[1,1] = 51
print("\nElement at 1,1", sub_array3[1,1])
print("\nOriginal array is updated", arr)
Run this code and you will see that the element which earlier had the value 676 changes to 51 in the parent array, even though the change was made in the subarray. So we need to be careful before modifying subarrays.
To avoid this, we can use the copy() function while creating the subarray. For example, while creating sub_array3 we can use the following code:
sub_array3 = arr[2:4,1:3].copy()
Thanks to the copy() function, we can now safely modify our subarray:
sub_array3 = arr[2:4,1:3].copy()
sub_array3[1,1] = 74
print("Original array", arr) # see parent array is not affected this time by modifying the sub_array
Try running the above code to see if the original array is affected or not.
rand(), randn() and randint()
Numpy comes with multiple built-in functions for generating random numbers. Three of them are randint, randn and rand.
NOTE: To use rand(), randn() and randint() we need to provide the complete name, like np.random.randn(), where np.random is the module that contains these functions.
rand(): Returns an array of the given shape filled with random values drawn uniformly from the interval [0, 1).
randn(): Returns an array of the given shape filled with values drawn from the standard normal distribution (mean 0, standard deviation 1). The values can be negative or greater than 1.
randint(): Returns random integers from the given range, where the start is inclusive and the end is exclusive. If the size keyword is not given, it returns a single random integer instead of an array. To get an array of random integers, call np.random.randint(start_index, end_index, size=(m,n)).
empty(): It is not a random function, but it is included in this list for a reason. When we don't have values to initialize our array, we can use the empty() function to create an array of the given shape. The array is left uninitialized, so it contains whatever arbitrary values happen to be in memory, which can look random.
Let us dive deeper using examples,
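A short sketch of the four functions. The shapes, the range 1 to 10 and the seed value are arbitrary choices for this illustration; the printed numbers will differ if you change the seed.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed so that repeated runs print the same numbers

uniform = np.random.rand(2, 3)                # shape (2, 3), values in [0, 1)
normal = np.random.randn(2, 3)                # shape (2, 3), standard normal values
single = np.random.randint(1, 10)             # one integer from 1 to 9
ints = np.random.randint(1, 10, size=(2, 3))  # shape (2, 3), integers from 1 to 9
blank = np.empty((2, 3))                      # shape (2, 3), uninitialized contents

print("rand:\n", uniform)
print("randn:\n", normal)
print("randint (single value):", single)
print("randint (array):\n", ints)
print("empty:\n", blank)
```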
arange() function
The arange(start,stop,step) function returns an array with a sequence of numbers in the given interval, where start is included and stop is excluded. Of the arguments only stop is mandatory; by default start=0 and step=1. This is another way to create an array in numpy.
print(np.arange(10))
print(np.arange(5,20,2))
Output:
[0 1 2 3 4 5 6 7 8 9]
[ 5 7 9 11 13 15 17 19]
arange() vs linspace() function
We used the arange() function to generate a sequence of numbers between the start and stop values with the given step (1 by default). But what if we care about the number of elements instead?
For example, np.arange(1,55,1) generates an array of 54 elements starting from 1 and ending at 54, but what if we want only 10 numbers between 1 and 55?
We would have to calculate the step ourselves. Working it out manually is tedious, so we can take advantage of numpy's linspace() function, which creates an array with as many evenly spaced elements as we want, without an explicitly calculated step. Unlike arange(), linspace() includes the end value by default. Below is the syntax of the linspace() function:
np.linspace(start_index, end_index, num=no_of_elements, dtype=dtype)
And let's take an example as well,
linspace_array1 = np.linspace(1,55,num=10) # by default the result is a float array
print("Default float type array", linspace_array1)
linspace_array2=np.linspace(20,75,num=15,dtype=float) # explicitly mentioning dtype as float
print("\nFloat type array", linspace_array2)
arange()
It's very simple: to count in reverse using the arange function we just need to set start, stop and step appropriately. Since stop is exclusive, it must be 0 for the result to include 1. For example, to print the numbers from 50 down to 1:
print(np.arange(50,0,-1))
ones(), zeros() and eye()
In numpy we can create arrays filled with identical values, for example an array with every element equal to 1 or 0, using the following functions:
np.ones(): creates an array of the specified shape filled with 1s.
np.zeros(): creates an array of the specified shape filled with 0s.
np.eye(): creates an identity matrix, which has diagonal elements equal to 1 and 0 elsewhere.
Below we have a simple code example to demonstrate this:
print(np.ones((3, 1))) # 3 x 1 matrix filled with one
print(np.zeros((2, 1))) # 2 X 1 matrix filled with zero
print(np.eye(4, 4)) # 4 X 4 identity matrix
unique() and count()
We can get the unique items of an array by using the np.unique() function. For an array containing duplicate values, unique() returns all the distinct elements. To also get the number of times each element occurs, we can add the optional parameter return_counts=True to the unique() call.
The count() function, available as np.char.count(), works on arrays of strings and tells us how many times a substring occurs. For example, if we have an array of strings and we want to count how many times a substring (say aa) is present in each of them, we can use the count() function.
It returns an array with the count of the substring for each element of the numpy array.
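A small sketch of both functions. The sample arrays are made up for illustration, and np.char.count is used here as the array-wide string counting function.

```python
import numpy as np

values = np.array([3, 1, 2, 3, 1, 3])

# unique() returns the sorted distinct values; return_counts adds how
# often each of them occurs.
uniques, counts = np.unique(values, return_counts=True)
print(uniques)  # [1 2 3]
print(counts)   # [2 1 3]

words = np.array(["aab", "baa aa", "xyz"])

# Number of non-overlapping occurrences of "aa" in each string.
aa_counts = np.char.count(words, "aa")
print(aa_counts)  # [1 2 0]
```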
When we don't have a value to put at some index of an array, we can use NaN or inf to fill the vacant place.
NaN: Not a Number, used to mark missing values, available as np.nan
inf: used to fill the missing values with infinity, available as np.inf
arr1 = np.array([1,2,np.nan,5,6,8,np.inf])
print("Array1 is", arr1)
# we can also explicitly set values as NaN and inf
arr2 = np.random.randn(5,2)
arr2[2,1] = np.nan
arr2[4,1] = np.inf
print("Array2 is", arr2)
Output:
Array1 is [ 1. 2. nan 5. 6. 8. inf]
Array2 is [[-1.99064682 -0.1090853 ]
 [ 0.49331139 0.5788719 ]
 [ 0.63429376 nan]
 [ 0.14265484 -0.05095396]
 [-1.31894955 inf]]
We can use various mathematical functions to perform operations on numpy arrays. Here are some of the most widely used:
max(): We can get the maximum value of an array using ndarray.max()
min(): We can get the minimum value of an array using ndarray.min()
mean(): We can calculate the mean value of an array using ndarray.mean()
sin()/cos(): We can use the np.sin() and np.cos() functions for trigonometric operations; they are applied element-wise and expect angles in radians.
Let's see these functions in action in the code below,
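A minimal sketch of these functions on a small array. The data is invented for this example, and the axis argument shown on one line is an optional extra that the text above does not cover.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

print(data.max())        # 6
print(data.min())        # 1
print(data.mean())       # 3.5
print(data.max(axis=0))  # maximum of each column: [4 5 6]

# Trigonometric functions work element-wise on angles given in radians.
angles = np.array([0, np.pi / 2, np.pi])
sines = np.sin(angles)
cosines = np.cos(angles)
print(sines)
print(cosines)
```

The sine of pi prints as a tiny non-zero number rather than exactly 0 because of floating point rounding.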
If we have multiple arrays and we want to concatenate/stack them, we can use several methods:
np.concatenate([x,y],axis): Passing the arrays as a list to this function returns one array containing the elements of both. With axis=0 (the default) the arrays are joined along the rows, one below the other; with axis=1 they are joined along the columns, side by side.
np.hstack([x,y]), np.vstack([x,y]): One of my favourite methods. Intuitively, hstack stacks the arrays horizontally (side by side) and vstack stacks them vertically (one below the other).
np.r_[x,y] and np.c_[x,y]: np.r_[] concatenates the arrays along the first axis, adding rows, and np.c_[] concatenates them along the last axis, adding columns.
Let's take an example,
import numpy as np
x = np.random.randint(1,10,size=(2,3))
y = np.random.randint(11,20,size=(2,3))
print('Using Concatenate function:', np.concatenate([x,y],axis=1))
print('Using hstack function:', np.hstack([x,y]))
print('Using vstack function:', np.vstack([x,y]))
print('Using np.r_ function:', np.r_[x,y])
print('Using np.c_ function:', np.c_[x,y])
You can run this code to see it in action.
Numpy has an extremely useful data type called datetime64 for working with dates and times; np.datetime64('today') gives the current date. We can easily calculate yesterday's and tomorrow's dates too, using timedelta64 together with datetime64. Below is a simple code example using them:
today = np.datetime64('today')
print("Today's date is:",today)
yesterday = np.datetime64('today') - np.timedelta64(1, 'D') # 'D' makes the one-day unit explicit
print("Yesterday's date :",yesterday)
tomorrow = np.datetime64('today') + np.timedelta64(1, 'D') # 'D' makes the one-day unit explicit
print("Tomorrow's date :",tomorrow)
np.meshgrid() function for Visualization
This function is extremely useful in visualization. Some matplotlib functions, such as contourf, require a grid of coordinates like the one numpy's meshgrid produces. The purpose of meshgrid is to create a rectangular grid out of an array of x values and an array of y values, so that we have coordinates for every (x,y) combination.
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(-6, 6, 0.1)
y = np.arange(-6, 6, 0.1)
x_, y_ = np.meshgrid(x, y, sparse=True)
z = np.sin(x_**2 + y_**2) / (x_**2 + y_**2)
plt.contourf(x,y,z)
plt.show() # display the plot window when run as a script
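To see what meshgrid actually returns, it can help to try it on a tiny input first. This is only a sketch; the arrays below are arbitrary values picked for illustration.

```python
import numpy as np

x = np.array([0, 1, 2])
y = np.array([10, 20])

# Dense grid: both outputs have shape (len(y), len(x)).
xx, yy = np.meshgrid(x, y)
print(xx)  # every row repeats x
print(yy)  # every column repeats y

# Sparse grid: the same coordinates stored without repetition; numpy
# broadcasting expands them when they are combined in an expression.
xs, ys = np.meshgrid(x, y, sparse=True)
print(xs.shape, ys.shape)  # (1, 3) (2, 1)
print((xs + ys).shape)     # (2, 3)
```

Reading xx[i, j] and yy[i, j] together gives the (x, y) coordinates of grid point (i, j), which is why an expression such as x_**2 + y_**2 evaluates a function over the whole grid at once.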
Congratulations! You are now familiar with some of the most useful functions of Python's numpy module. After following this tutorial carefully, you will be able to use these functions by yourself. Please leave your comments on how useful this post was for you, and don't forget to share it with your friends.