A Beginner’s Quick Guide to NumPy

On bigger projects, relying on plain Python alone can get messy and hard to follow. Data scientists and analysts typically reach for more powerful tools, and this is where the NumPy library comes in. NumPy (short for Numerical Python) provides the data structures and algorithms needed for scientific applications in Python, and serves as the foundation for pandas, scikit-learn, and Matplotlib. So, if it’s the most basic, why bother? Well, a strong foundation in the basics builds a deeper understanding of the scientific process of data analysis. Let’s get started!

Installing & Importing NumPy

Like most of the cool things in Python, NumPy requires a bit of installation. The fastest (and easiest) way is to install Anaconda, which includes NumPy (and many other libraries) by default. If you don’t wish to install Anaconda but still want NumPy’s numerical processing, you can install it from the command line with pip install numpy or by following the official installation instructions.

Once installed, NumPy becomes available as soon as you import it in an interactive environment such as IPython or JupyterLab. These environments provide a shell where you can visually verify that your data manipulation is working correctly. To import, run the following:
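
The import is usually written with the conventional alias np, which the rest of this guide assumes. A minimal sketch (the version check is an optional extra, not something the guide requires):

```python
# Import NumPy under its conventional alias "np"
import numpy as np

# Optional sanity check: confirm the import worked by printing the version
print(np.__version__)
```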

Just a quick note: you will need to run this import each time you open your notebook before NumPy will work with your commands. In this guide, we are using JupyterLab as the preferred environment.

Getting Started with NumPy

Creating a NumPy Array

NumPy works with multi-dimensional arrays. An array behaves like a list of objects and can be organized as a vector (a simple one-dimensional list) or a matrix (a two-dimensional table). NumPy can create arrays from scratch, or organize and filter large datasets, which are typically a series of matrices. Because an array stores elements of a single type in one block of memory, operations on it are typically much faster than the same work done on plain Python lists, so we already begin to see the benefits of the library.

Creating an array is simple:
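
A small sketch of one common way to do this, using np.array on a list literal. The variable name arr and the values are just illustrative:

```python
import numpy as np

# Build a one-dimensional array directly from a list literal
arr = np.array([1, 2, 3, 4, 5])

print(arr)        # [1 2 3 4 5]
print(type(arr))  # <class 'numpy.ndarray'>
```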

You can also write this in a different way:
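
One possible alternative, assuming the idea is to start from an existing Python list (my_list is a made-up name for illustration), is to store the list in a variable first and then convert it:

```python
import numpy as np

# Store the values in a regular Python list first...
my_list = [1, 2, 3, 4, 5]

# ...then convert that list into a NumPy array
arr = np.array(my_list)

print(arr)  # [1 2 3 4 5]
```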

Create an array from a range of numbers:
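
A likely candidate here is np.arange, which works much like Python's built-in range but returns an array. The bounds below are arbitrary example values:

```python
import numpy as np

# Integers from 0 up to (but not including) 10
numbers = np.arange(0, 10)
print(numbers)  # [0 1 2 3 4 5 6 7 8 9]

# An optional third argument sets the step between values
evens = np.arange(0, 10, 2)
print(evens)    # [0 2 4 6 8]
```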

You can also create a matrix:
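
A minimal sketch: passing a list of equal-length lists to np.array gives a two-dimensional array, where each inner list becomes a row. The values are illustrative only:

```python
import numpy as np

# Each inner list becomes one row of the matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix)
print(matrix.shape)  # (2, 3) -> 2 rows, 3 columns
```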

To get a list of random numbers:
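
There are several ways to do this in NumPy. The sketch below uses the np.random.default_rng generator, which may differ from the call the original guide showed. The seed and sizes are arbitrary choices:

```python
import numpy as np

# Create a random number generator; the seed makes results repeatable
rng = np.random.default_rng(seed=42)

# Five random floats drawn uniformly from [0, 1)
random_values = rng.random(5)
print(random_values)

# Five random integers from 0 up to (but not including) 10
random_ints = rng.integers(low=0, high=10, size=5)
print(random_ints)
```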

Reminder: NumPy follows Python’s range convention, so when you create a range, the stop value is not included in the result.
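
To illustrate with example bounds of 1 and 5, both Python's range and np.arange stop just before the second number:

```python
import numpy as np

# Plain Python: 5 is not part of the result
py_range = list(range(1, 5))
print(py_range)  # [1, 2, 3, 4]

# NumPy behaves the same way
np_range = np.arange(1, 5)
print(np_range)  # [1 2 3 4]
```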

Array Indexing & Manipulation

Now that you have a list or a matrix, you may want to organize and isolate specific information. Basic NumPy slice syntax takes the optional arguments [start:stop:step]. start is the index where the slice begins, stop is the index where it ends (the element at stop itself is excluded), and step is the amount added to the index at each move. step can be any nonzero integer; a negative step walks through the array backwards.
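
As a quick sketch of all three slice arguments working together (the array and bounds are arbitrary examples):

```python
import numpy as np

arr = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

# Start at index 1, stop before index 8, take every 2nd element
subset = arr[1:8:2]
print(subset)  # [1 3 5 7]
```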

To get the first element in an array:
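
Indexing starts at 0, so the first element lives at index 0. A small example with made-up values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Index 0 is the first element
first = arr[0]
print(first)  # 10
```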

Or the last:
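
Negative indices count from the end, so -1 refers to the last element. Again with illustrative values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Index -1 is the last element, whatever the array's length
last = arr[-1]
print(last)  # 50
```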

Just the ones in the middle:
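
One way to read "the middle" is everything except the first and last elements, which a slice from 1 to -1 gives you. This interpretation is an assumption:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Start at index 1 and stop before the last element
middle = arr[1:-1]
print(middle)  # [20 30 40]
```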

Sorting through your array:
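
The original snippet for this step is not available, so the sketch below assumes that sorting the values in ascending order with np.sort is what was meant. np.sort returns a sorted copy and leaves the original untouched:

```python
import numpy as np

unsorted = np.array([30, 10, 50, 20, 40])

# np.sort returns a new, sorted array
sorted_arr = np.sort(unsorted)
print(sorted_arr)  # [10 20 30 40 50]

# The original array keeps its order
print(unsorted)    # [30 10 50 20 40]
```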

Reversing the order:
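
A common idiom is a slice with a step of -1, which walks the array from the end to the beginning:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Omitting start and stop with a step of -1 reverses the array
reversed_arr = arr[::-1]
print(reversed_arr)  # [50 40 30 20 10]
```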

Logic Functions & Boolean Arrays

A Boolean value is either True or False. Applying a condition to an array returns a Boolean array with one True or False per element, which is useful on large datasets when you want to see which values stand out from the typical ones. You can also use a Boolean array as a mask to select only the elements you want.

Adding up your values:
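
In a Boolean context, one plausible reading of this step is counting how many elements meet a condition: True counts as 1 and False as 0, so summing a Boolean array gives that count. The values and the threshold of 4 are made up for illustration:

```python
import numpy as np

values = np.array([1, 5, 8, 12, 3])

# Comparing an array to a number gives a Boolean array
mask = values > 4
print(mask)   # [False  True  True  True False]

# Summing the Boolean array counts the True entries
count = np.sum(mask)
print(count)  # 3
```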

Checking if one or more values are True with np.any():
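
A short sketch with example values and thresholds:

```python
import numpy as np

values = np.array([1, 5, 8, 12, 3])

# True, because 12 is greater than 10
any_above_10 = np.any(values > 10)
print(any_above_10)   # True

# False, because no element is greater than 100
any_above_100 = np.any(values > 100)
print(any_above_100)  # False
```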

Or if all values are True with np.all():
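
Using the same kind of example data, np.all only returns True when every element passes the condition:

```python
import numpy as np

values = np.array([1, 5, 8, 12, 3])

# True, because every element is greater than 0
all_positive = np.all(values > 0)
print(all_positive)  # True

# False, because 1 and 3 are not greater than 4
all_above_4 = np.all(values > 4)
print(all_above_4)   # False
```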

Masking for specific elements in your array:
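
Placing a Boolean condition inside the square brackets keeps only the elements where the condition is True. The threshold here is an arbitrary example:

```python
import numpy as np

values = np.array([1, 5, 8, 12, 3])

# Keep only the elements greater than 4
selected = values[values > 4]
print(selected)  # [ 5  8 12]
```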

Statistical Arrays

For most projects, summary statistics are going to play a role in analyzing your data.

To get the sum of your elements:
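
A minimal sketch using np.sum on a small example matrix (the name data and its values are illustrative):

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

# With no axis given, np.sum adds up every element
total = np.sum(data)
print(total)  # 21
```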

To get the mean:
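
np.mean works the same way, averaging over all elements by default. Shown here on an example matrix:

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

# Average of all six elements: 21 / 6
average = np.mean(data)
print(average)  # 3.5
```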

Or maybe just the sum of one column:
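
One way to do this, sketched on an example matrix, is to slice out the column first and then sum it. Passing axis=0 to np.sum gives the totals for every column at once:

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

# data[:, 0] selects all rows of the first column
first_column_sum = data[:, 0].sum()
print(first_column_sum)  # 5

# axis=0 sums down the rows, giving one total per column
column_sums = np.sum(data, axis=0)
print(column_sums)       # [5 7 9]
```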

Now you have a good base for getting started with NumPy. Of course, the best way to get a deeper understanding is to get started with your own project. You can also look at some tried and true resources like numpy.org for more details on the math behind the library. For the time being, feel free to use this as your guide and make sure to leave a comment below if you find it helpful!
