Python is a powerful language for data analysis, largely due to its extensive ecosystem of libraries. Two of the most fundamental libraries for data analysis and visualization are NumPy and Matplotlib. This article gives an overview of these libraries and their capabilities.
NumPy, short for Numerical Python, is a Python library that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays.
At the core of the NumPy package is the ndarray object, which encapsulates n-dimensional arrays of a homogeneous data type, with many operations performed in compiled code for performance. Unlike Python lists, which can grow dynamically, NumPy arrays have a fixed size at creation. Changing the size of an ndarray creates a new array and deletes the original.
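The fixed-size and single-datatype behaviour described above can be seen in a short sketch. The variable names are illustrative only, and np.append is just one of several functions that return a resized copy.

```python
import numpy as np

# A one-dimensional ndarray; every element shares one dtype.
a = np.array([1, 2, 3])

# "Growing" the array does not modify it in place:
# np.append returns a brand-new array and leaves `a` untouched.
b = np.append(a, 4)

# Mixing integers and floats still yields a single dtype:
# the integers are upcast so that all elements are float64.
mixed = np.array([1, 2.5, 3])

print(a.shape, b.shape)  # shapes of the original and the new array
print(mixed.dtype)       # the common dtype chosen for mixed input
```

Because a resize always produces a new array, code that repeatedly appends in a loop is usually better served by collecting values in a Python list first and converting once at the end.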
NumPy provides a host of operations you can perform with and on arrays: you can reshape an array, compute its transpose, find the indices of elements, and so on. Arithmetic operations on arrays are element-wise. For example, multiplying two arrays of the same shape produces an array whose entries are the products of the corresponding elements.
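As a minimal sketch of these operations, the example below reshapes an array, transposes it, looks up the position of a value, and multiplies two arrays element-wise. The sample values are arbitrary, and np.argwhere is one of several ways to find indices.

```python
import numpy as np

# Reshape six consecutive integers into a 2x3 matrix.
m = np.arange(6).reshape(2, 3)   # rows: [0, 1, 2] and [3, 4, 5]

# The transpose swaps rows and columns, giving a 3x2 matrix.
t = m.T

# Find the (row, column) index of every element equal to 4.
idx = np.argwhere(m == 4)

# Arithmetic is element-wise: each entry of x is multiplied
# by the entry of y at the same position.
x = np.array([1, 2, 3])
y = np.array([10, 20, 30])
prod = x * y

print(t.shape)
print(idx)
print(prod)
```

Note that the * operator here is not matrix multiplication; for that, NumPy provides the @ operator and np.matmul.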
NumPy provides a large set of numeric datatypes that you can use to construct arrays. NumPy tries to infer a datatype when you create an array, but functions that construct arrays usually also accept an optional argument to specify the datatype explicitly.
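The following sketch contrasts inferred and explicit datatypes using the dtype argument. The exact integer width NumPy infers depends on the platform, so the example only checks that it is some integer type.

```python
import numpy as np

# dtype is inferred from the values: integers give an integer dtype
# (the exact width is platform-dependent), floats give float64.
ints = np.array([1, 2, 3])
floats = np.array([1.0, 2, 3])

# The optional dtype argument overrides the inferred type.
explicit = np.array([1, 2, 3], dtype=np.float32)
small = np.zeros(4, dtype=np.int8)

print(ints.dtype, floats.dtype, explicit.dtype, small.dtype)
```

Choosing a smaller datatype such as float32 or int8 reduces memory use, at the cost of precision or range.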
Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits such as Tkinter, wxPython, Qt, or GTK.
Matplotlib's pyplot submodule provides the plot() function, which is frequently used to draw lines and markers on a figure. The figure() function creates a new figure, and the show() function displays the figure to the user.
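A minimal sketch of the figure(), plot(), and show() workflow follows. The sine data and figure size are arbitrary choices, and the Agg backend line is an assumption made so the script also runs on machines without a display; remove it to get an interactive window.

```python
import matplotlib
# Assumption: select a non-interactive backend so the sketch runs headless.
# Delete this line to let show() open a window on a desktop machine.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# Create a new figure, then draw a line with a marker at each point.
fig = plt.figure(figsize=(6, 4))
lines = plt.plot(x, np.sin(x), marker="o")

# With an interactive backend this displays the figure;
# under Agg it only emits a warning and returns.
plt.show()
```

plot() returns a list of line objects, one per plotted line, which can be kept for later adjustments.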
Matplotlib allows you to adjust every aspect of your plot. You can change the line width, line style, marker style, font size, font style, text color, etc. You can also add labels to the x-axis and y-axis and add a title to the plot.
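Several of these adjustments can be sketched with keyword arguments to plot() and the pyplot labelling functions. The data, colors, and sizes below are arbitrary, and the Agg backend is again an assumption for headless use.

```python
import matplotlib
# Assumption: non-interactive backend so the sketch runs without a display.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 30)

fig = plt.figure()

# Line width, line style, marker style, and color are set per line.
(line,) = plt.plot(x, np.cos(x), linewidth=2.5, linestyle="--",
                   marker="s", color="tab:red")

# Axis labels and the title accept text properties such as
# font size, font style, and text color.
plt.xlabel("x", fontsize=12)
plt.ylabel("cos(x)", fontsize=12, color="gray")
plt.title("Customized line plot", fontsize=14, fontstyle="italic")
```

Call plt.show() or fig.savefig() with a file name of your choice to view the result.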
Matplotlib's pyplot submodule provides functions to generate multiple plots and subplots in the same figure. The subplot() function adds a single subplot to the current figure at a given grid position, so calling it repeatedly places multiple plots in one figure. The subplots() function creates a figure and a full set of subplots in a single call.
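The difference between the two functions can be sketched as follows. The grid layouts and plotted curves are arbitrary, and the Agg backend is an assumption for headless use.

```python
import matplotlib
# Assumption: non-interactive backend so the sketch runs without a display.
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# subplots(): create the figure and a 1x2 grid of axes in one call.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))
axes[0].plot(x, np.sin(x))
axes[0].set_title("sin")
axes[1].plot(x, np.cos(x))
axes[1].set_title("cos")
fig.tight_layout()

# subplot(): add axes one at a time to the current figure.
# The arguments are (rows, columns, index), with the index starting at 1.
fig2 = plt.figure()
ax_top = plt.subplot(2, 1, 1)
ax_top.plot(x, np.sin(x))
ax_bottom = plt.subplot(2, 1, 2)
ax_bottom.plot(x, np.cos(x))
```

subplots() is usually the more convenient choice when the grid is known up front, since it returns every axes object at once.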
By the end of this unit, you should have a solid understanding of how to manipulate arrays using NumPy and how to create and customize plots using Matplotlib. These skills are fundamental to any data analysis task in Python.