Visual Representation of Large Dataset using Matplotlib

Posted in Machine Learning   LAST UPDATED: JUNE 27, 2023

    In my previous post, we discussed a few performance metrics and the situations in which they can be used. In today's post, we will explore a popular 2-dimensional plotting library known as Matplotlib.

    For reference, the official documentation for Matplotlib version 3.1.1 can be found here: https://matplotlib.org/3.1.1/users/index.html


    What is Matplotlib?

    Matplotlib is a large library with around 70,000 lines of code. Its interface was inspired by MATLAB, the numerical computing software from MathWorks. It is built on NumPy and is designed to work alongside SciPy. It is one of the most widely used plotting libraries for Python.

    Matplotlib module for graph plotting

    We tend to remember pictures better than words. In the same way, it is easier to draw insights from data that is represented visually than from data stored in Excel, CSV, Word, or other text-based formats. This is where plotting libraries come into the picture.

    In my previous posts, I have visualized the data using Matplotlib so that it would be easy to understand the data, the abnormalities (anomalies), and trends in the data. Matplotlib can be used to represent line plots, bar plots, histograms, scatter plots and much more.

    This library can be installed with the following command:

    pip install matplotlib

    To use the library in your Python code, import the module with either of the following statements:

    import matplotlib.pyplot as plt
    
    # or
    
    from matplotlib import pyplot as plt

    A Matplotlib plot is made up of several parts, namely:

    1. Figure: The top-level container that encompasses the entire visualization. It holds one or more Axes, along with titles, legends, and other elements.

    2. Axes: The individual plot, that is, the region of the figure in which the data is drawn. Each Axes has an x-axis and a y-axis (and a z-axis in the case of 3-D plots).

    3. Axis: The number-line-like objects attached to an Axes. They hold the limits of the graph and generate the ticks and tick labels.

    4. Artist: The visual objects drawn on the figure, such as 'Text', 'Line2D', and collection objects. Most artists are tied to an Axes.

    5. xticks(), yticks(): The two pyplot functions that are used to get or set the tick locations and labels of the x and y-axis.

    6. Legend: The component which helps in naming the plotted data series.

    When a graph is plotted, the x and y axes are adjusted to take the default xticks() and yticks(), but these values can be customized as per one's requirements.
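To make the parts listed above more concrete, here is a minimal sketch that creates a Figure and an Axes explicitly, customizes the ticks, and adds a legend. The tick labels and the "sample values" label are made up for illustration.

```python
import matplotlib.pyplot as plt

# Create the Figure (the whole canvas) and one Axes (the plotting region).
fig, ax = plt.subplots()

# plot() returns a list of Line2D artists, one per plotted line.
(line,) = ax.plot([2, 4, 6, 8], [3, 1, 5, 0], label="sample values")

# Replace the default ticks with custom tick positions and labels.
ax.set_xticks([2, 4, 6, 8])
ax.set_xticklabels(["two", "four", "six", "eight"])
ax.set_yticks([0, 1, 3, 5])

# The legend picks up the label that was passed to plot().
legend = ax.legend()

print(type(fig).__name__)       # Figure
print(type(ax.xaxis).__name__)  # XAxis
print(type(line).__name__)      # Line2D

plt.show()
```

The same tick customization is available through the pyplot functions plt.xticks() and plt.yticks(), which act on the current Axes.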

    We will start by using pyplot, which is a matplotlib module that will help us add plot elements such as lines, images, and text to the axes.

    import matplotlib.pyplot as plt
    import numpy as np
    
    plt.plot([2,4,6,8], [3,1,5,0])
    plt.title('My first plot')
    plt.xlabel('X axis')
    plt.ylabel('Y axis')
    plt.show()

    Output of the Code:

    Matplotlib code example 1

    NOTE: When running a script, the graph is not displayed until plt.show() is called, so always remember to call plt.show() after plt.plot() and the other plotting functions.

    Explanation of the above code:

    In the above code, we have imported the necessary packages to plot a graph. We call the plot() function with two lists and then call the show() function to display the graph. The values in the first list are used as the x-coordinates and the values in the second list as the y-coordinates. The title and the names of the x and y-axis can be added using the functions title(), xlabel() and ylabel().


    Using figure() method:

    import matplotlib.pyplot as plt
    import numpy as np
    
    plt.figure(figsize = (8,5))
    plt.plot([2,4,6,8], [1,2,3,4])
    plt.show()
    

    Output of the Code:

    Matplotlib code example 2

    Explanation of the above code:

    In this code, the function figure() has been used to specify the size of the plot. The figsize parameter is a (width, height) pair measured in inches, i.e. 8 is the width and 5 is the height of the plot.
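As a small sketch of how figsize relates to the size of the rendered image: the size in pixels is roughly the size in inches multiplied by the figure's dpi (dots per inch). The dpi value of 100 below is chosen only for illustration, and some high-resolution displays scale it.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; dpi is dots (pixels) per inch.
fig = plt.figure(figsize=(8, 5), dpi=100)
plt.plot([2, 4, 6, 8], [1, 2, 3, 4])

width_in, height_in = fig.get_size_inches()
print(width_in, height_in)  # 8.0 5.0

# Approximate size in pixels: inches multiplied by dots per inch.
print(width_in * fig.dpi, height_in * fig.dpi)

plt.show()
```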


    Color of the plot and line

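    In the plot() function, an optional format string can be passed after the data to indicate the color and line type of the plot. By default, plot() draws a solid blue line. Other colors can be selected with single-letter codes such as g (green), r (red), and y (yellow), or with a full color name such as "green" or "orange". When only one list is passed before the format string, as in the example below, its values are plotted on the y-axis against their indices (0, 1, 2, ...).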

    import matplotlib.pyplot as plt
    import numpy as np
    
    plt.figure(figsize = (8,5))
    plt.plot([2,4,6,8], "green")
    plt.title('My first plot')
    plt.xlabel('X axis')
    plt.ylabel('Y axis')
    plt.show()

    Output of the Code:

    Matplotlib code example 3
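A format string can combine a color, a marker, and a line style in one short code. The sketch below uses "go--" (green, circle markers, dashed line) on made-up data; the same result can be obtained with the color, marker, and linestyle keyword arguments.

```python
import matplotlib.pyplot as plt

x = [2, 4, 6, 8]
y = [3, 1, 5, 0]

plt.figure(figsize=(8, 5))

# "go--" means: g = green, o = circle markers, -- = dashed line.
(line,) = plt.plot(x, y, "go--")

# Keyword form of the same styling:
# plt.plot(x, y, color="g", marker="o", linestyle="--")

plt.title("Format string example")
plt.xlabel("X axis")
plt.ylabel("Y axis")
plt.show()
```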

    Your next question might be, What if I wish to plot more than one visualization in a single graph?

    The answer is yes, it is possible. The following example illustrates how multiple plots can be represented in a single figure:

    import matplotlib.pyplot as plt
    import numpy as np
    
    # Sample data generation
    x = np.linspace(0, 2 * np.pi, 400)
    y = np.sin(x ** 2)
    fig, axs = plt.subplots(2)
    
    fig.suptitle('Vertically stacked subplots')
    axs[0].plot(x, y)
    axs[1].plot(-x, -y)
    plt.show()
    

    Explanation of the above code:

    subplots(2) creates a figure with 2 vertically stacked plots and returns the figure together with an array of its axes. The suptitle() method creates a centered title for the whole figure.

    The related subplot() function adds one sub-plot at a time. It takes three arguments: nrows, ncols and index. They indicate the number of rows, the number of columns, and the index number of the sub-plot, which starts at 1.
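As a minimal sketch of the subplot() arguments, the example below lays out two panels side by side (1 row, 2 columns). The left panel also shows that calling plot() more than once draws several lines on the same axes. The data is generated only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 400)

plt.figure(figsize=(8, 4))

# subplot(nrows, ncols, index): 1 row, 2 columns, first panel.
left = plt.subplot(1, 2, 1)
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), label="cos(x)")  # second line on the same axes
plt.legend()

# Second panel of the same 1 x 2 grid.
right = plt.subplot(1, 2, 2)
plt.plot(x, np.sin(x ** 2))

plt.show()
```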

    In the next post, we will see how different types of visualizations (graphs) such as bar graphs, pie charts, histograms, scatter plots, and 3-D plotting can be represented.


    Advantages of Matplotlib

    Following are some of the advantages of using the Matplotlib module:

    • Large datasets can be easily visualized

    • Variety of formats to visualize in

    • Helps understand trends and make correlations

    Conclusion

    Visualizing large datasets using Matplotlib opens up a world of possibilities for data exploration and analysis. By harnessing the capabilities of this powerful library, you can effectively communicate complex information and unlock valuable insights. In this article, we have explored various techniques to handle the challenges posed by large datasets, optimizing performance and enhancing scalability.

    From choosing the appropriate plot types and managing memory efficiently to utilizing interactivity and customization, you now have the tools to craft visually appealing and informative plots.

    Frequently Asked Questions (FAQs)

    1. How can I visualize large datasets using Matplotlib?

    Matplotlib offers various plot types and customization options to visualize large datasets effectively. Techniques like data sampling, aggregation, and interactivity can be employed to handle scalability and enhance plot performance.
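One possible way to apply aggregation is sketched below: instead of drawing every point of a large scatter plot, hexbin() groups the points into hexagonal bins and colors each bin by the number of points it contains. The data is randomly generated, and the point count and gridsize are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data standing in for a large dataset.
rng = np.random.default_rng(seed=0)
n_points = 200_000
x = rng.normal(size=n_points)
y = x + rng.normal(scale=0.5, size=n_points)

fig, ax = plt.subplots(figsize=(8, 5))

# Aggregate the points into hexagonal bins; color encodes the count per bin.
hb = ax.hexbin(x, y, gridsize=60, cmap="viridis")
fig.colorbar(hb, ax=ax, label="points per bin")

ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
```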

    2. What are some recommended plot types for visualizing large datasets?

    Depending on the nature of your data, scatter plots, line plots, and bar plots are often suitable for visualizing large datasets. These plot types provide an overview of the data distribution, trends, and relationships.

    3. How can I optimize performance when plotting large datasets with Matplotlib?

    To optimize performance, consider techniques such as data sampling, reducing the number of plotted points, using interactive zooming and panning to inspect only the region of interest, and using the non-interactive "agg" backend when the plots only need to be saved to files rather than shown in a window.
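The sketch below combines two of these ideas under simple assumptions: it selects the non-interactive "agg" backend and keeps only every n-th sample before plotting. The data is synthetic, and the target of 5,000 points is an arbitrary choice.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive raster backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data standing in for a large dataset.
n_points = 2_000_000
x = np.linspace(0, 100, n_points)
y = np.sin(x) + 0.1 * np.random.default_rng(0).normal(size=n_points)

# Keep every step-th sample so that about max_points points are drawn.
max_points = 5_000
step = max(1, n_points // max_points)
x_small, y_small = x[::step], y[::step]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(x_small, y_small, linewidth=0.8)
ax.set_title(f"{len(x_small)} of {n_points} points plotted")

# With the Agg backend, the figure is written to a buffer or a file
# instead of being shown on screen.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Taking every n-th sample is the simplest form of sampling and can miss short spikes in the data; aggregating each block of samples (for example, its minimum and maximum) is a common alternative when peaks matter.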

    4. Can I customize the appearance of plots created with Matplotlib?

    Absolutely! Matplotlib provides extensive customization options, allowing you to tailor the appearance of your plots to suit your needs. You can adjust colors, markers, labels, axes, and much more to create visually appealing and informative visualizations.
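As a brief illustration of these options, the sketch below sets the color, line style, marker, axis limits, grid, and legend of a single plot. All of the values are arbitrary and chosen only to show where each option goes.

```python
import matplotlib.pyplot as plt

x = [2, 4, 6, 8]
y = [3, 1, 5, 0]

fig, ax = plt.subplots(figsize=(8, 5))

# Style the line through keyword arguments.
(line,) = ax.plot(
    x, y,
    color="orange", linestyle="--", linewidth=2,
    marker="s", markersize=8,
    label="measurements",
)

# Titles, axis labels, axis limits, grid, and legend.
ax.set_title("Customized plot")
ax.set_xlabel("X axis")
ax.set_ylabel("Y axis")
ax.set_xlim(0, 10)
ax.set_ylim(-1, 6)
ax.grid(True, alpha=0.3)
ax.legend(loc="upper right")

plt.show()
```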

    5. Are there any alternatives to Matplotlib for visualizing large datasets?

    While Matplotlib is a powerful library for data visualization, there are alternative libraries such as Seaborn, Plotly, and Bokeh that offer additional features and interactivity for handling large datasets. Exploring these libraries can provide alternative approaches to visualize your data effectively.

    About the author:
    Hi, My name is Smriti. I enjoy coding, solving puzzles, singing, blogging and writing on new technologies. The idea of artificial intelligence and the fact that machines learn, impresses me every day.