Introducing NumPy in Python for Data Science

Introduction

NumPy is a Python library for computation on large arrays and matrices. It provides fast and efficient processing of n-dimensional arrays.

Array elements in NumPy are stored in contiguous memory locations, which makes processing them fast and efficient. A Python list, by contrast, stores references to separate objects that can be scattered across memory.
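One way to see this layout is to inspect an array’s memory flags and strides. The sketch below fixes the dtype to int64 (an assumption made only so that each element takes a predictable 8 bytes):

```python
import numpy as np

# Fix the dtype so that every element occupies exactly 8 bytes
arr = np.array([1, 3, 5, 6, 8, 9], dtype=np.int64)

# C_CONTIGUOUS is True when the elements sit in one unbroken block of memory
print(arr.flags['C_CONTIGUOUS'])   # True
# Bytes per element, and bytes to step from one element to the next
print(arr.itemsize, arr.strides)   # 8 (8,)
# Total bytes in the data buffer: 6 elements * 8 bytes
print(arr.nbytes)                  # 48
```

A stride equal to the item size means neighbouring elements are neighbours in memory, which is what lets NumPy process them in tight loops.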

 

Installation of numpy

Make sure that Python and pip are already installed on your system, then run:

$ pip install numpy

Creating a NumPy array

First of all, we have to import the numpy module:

import numpy as np

Here, numpy is imported under the alias np, so we can write np instead of numpy throughout the code.

 

One dimensional array

Let’s create a numpy array

arr = np.array([1,3,5,6,8,9])
print("Array Created: ", arr)

Output

Array Created:  [1 3 5 6 8 9]

 

Now, let’s see the type of arr we just created

print("Type of arr is: ", type(arr))

Output

Type of arr is:  <class 'numpy.ndarray'>

We can also create a one-dimensional array using the arange() function:

arr = np.arange(10)
print("Array created: ", arr)

Output

Array created:  [0 1 2 3 4 5 6 7 8 9]

Note: NumPy’s arange() function is similar to Python’s built-in range() function, except that it returns a NumPy array.
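Like range(), arange() takes start, stop and step arguments; unlike range(), it also accepts floating-point steps. A short sketch (the variable names are illustrative):

```python
import numpy as np

# arange(stop): from 0 up to, but not including, stop
print(np.arange(5))          # [0 1 2 3 4]

# arange(start, stop, step), in the same order as range()
evens = np.arange(2, 10, 2)
print(evens)                 # [2 4 6 8]

# A floating-point step, which range() would reject
quarters = np.arange(0, 1, 0.25)
print(quarters)              # 0, 0.25, 0.5 and 0.75 as floats
```

For floating-point steps, rounding can make the number of elements hard to predict, so the NumPy documentation suggests np.linspace() when the exact count matters.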

 

Now, let’s create an array from an existing Python list:

a = [1,2,3,4,5]
arr = np.array(a)
print("Array created from list: ", arr)

Output

Array created from list:  [1 2 3 4 5]
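When converting a list, NumPy picks a single element type (dtype) for the whole array. A minimal sketch of how that choice is made; the exact integer dtype can vary by platform, so the comment hedges it:

```python
import numpy as np

# All integers -> an integer dtype (int64 on most 64-bit platforms)
ints = np.array([1, 2, 3])
# One float in the list makes NumPy convert every element to float64
mixed = np.array([1, 2, 3.5])
# The dtype can also be requested explicitly
floats = np.array([1, 2, 3], dtype=float)

print(ints.dtype, mixed.dtype, floats.dtype)
print(mixed)   # [1.  2.  3.5]
```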

 

Two dimensional array

Previously, we saw how to create a one-dimensional array using NumPy. Now, let’s see how to create a two-dimensional array:

a = [1,2,3,4,5]
b = [6,7,8,9,10]
arr = np.array([a, b])
print(arr)

Output

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]

Here, the two lists a and b become the two rows of a two-dimensional array.

arr = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(arr)

Output

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]

Alternatively, we can pass the elements directly as a nested list to create the same two-dimensional array.

 

The shape of an array

We can determine the shape of an array using its shape attribute (it is an attribute, not a function, so no parentheses are needed):

arr = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print("Shape of array is: ", arr.shape)

Output

Shape of array is:  (2, 5)

The shape attribute is a tuple holding the size of each dimension; for a two-dimensional array, that is the number of rows and columns. In this case, there are 2 rows and 5 columns.
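Because shape is an ordinary tuple, the row and column counts can be unpacked or indexed directly. A small sketch (rows and cols are illustrative names):

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# shape is a plain tuple, so it can be unpacked directly
rows, cols = arr.shape
print(rows, cols)      # 2 5
# The same numbers by indexing into the tuple
print(arr.shape[0])    # 2
```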

 

Dimension of an array

We can find the number of dimensions of a NumPy array using the ndim attribute:

arr = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print("Dimension of array is: ", arr.ndim)

Output

Dimension of array is:  2

Let’s see another example

arr = np.array([1,2,3,4,5])
print("Dimension of array is: ", arr.ndim)

Output

Dimension of array is:  1

And here is a three-dimensional array:

arr = np.array([[[1,2,3,4,5], [1,2,3,4,5]], [[1,2,3,4,5], [1,2,3,4,5]]])
print(arr)
print("Dimension of array is: ", arr.ndim)

Output

[[[1 2 3 4 5]
  [1 2 3 4 5]]

 [[1 2 3 4 5]
  [1 2 3 4 5]]]
Dimension of array is:  3
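The value of ndim is always the length of the shape tuple. A short check on the three arrays above, rebuilt here (with illustrative names) so the snippet runs on its own:

```python
import numpy as np

one_d = np.array([1, 2, 3, 4, 5])
two_d = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
three_d = np.array([[[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]],
                    [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]])

for a in (one_d, two_d, three_d):
    # ndim counts the entries in the shape tuple
    print(a.shape, a.ndim, a.ndim == len(a.shape))
```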

 

Size of an array

We can also check the size of a NumPy array using the size attribute:

a = [1,3,5,6,8,9,6,8]
arr = np.array(a)
print("size of arr is: ", arr.size)

Output

size of arr is:  8

The size attribute gives the total number of elements in the array, counted across all of its dimensions:

arr = np.array([[[1,2,3,4,5], [1,2,3,4,5]], [[1,2,3,4,5], [1,2,3,4,5]]])
print("size of array is: ", arr.size)

Output

size of array is:  20
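The size is simply the product of the numbers in shape, which is easy to confirm. The sketch also contrasts it with Python’s len(), which only counts the first dimension:

```python
import numpy as np

arr = np.array([[[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]],
                [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]])

# size is the product of the shape entries: 2 * 2 * 5
print(arr.shape, arr.size, np.prod(arr.shape))   # (2, 2, 5) 20 20
# len() only counts the first dimension
print(len(arr))                                  # 2
```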

 

Accessing elements of an array

Array elements can be accessed by index, just like elements of a list or tuple:

a = [1,2,3,4,5]
arr = np.array(a)
print(arr[0])
print(arr[2])
print(arr[4])

Output

1
3
5

Elements can also be accessed one by one with a for loop:

for ele in arr:
    print(ele)

Output

1
2
3
4
5
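For arrays with more than one dimension, a plain for loop walks over the first axis, so each step yields a whole row. A minimal sketch, using arr.flat to reach every individual element instead:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Looping over a 2-D array yields one row (a 1-D array) per iteration
for row in arr:
    print(row)          # [1 2 3], then [4 5 6]

# arr.flat iterates over every element in row-major order
for ele in arr.flat:
    print(ele)          # 1 to 6, one per line
```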

 

Let’s see an example of a 2-D array

arr = np.array([[1,2,3,4,5], [23, 45, 67 ,98, 100]])
print(arr[0][1])
print(arr[0][0])
print(arr[1][3])
print(arr[1][0])

Output

2
1
98
23

Explanation

In the above example, we passed two lists inside a single outer list to make a two-dimensional array with 2 rows and 5 columns. The first inner list becomes row 0 and the second becomes row 1.

To access an element, we pass two indices: the first selects the row and the second selects the column. The first print statement prints the element in the first row and second column, which is 2. Similarly, the third print statement prints the element in the second row and fourth column, which is 98.
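NumPy also accepts both indices inside a single pair of brackets, arr[row, column]. This is the more idiomatic form and avoids first building the intermediate row arr[0]. A short sketch using the same array:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [23, 45, 67, 98, 100]])

# One [row, column] index is equivalent to arr[0][1]
print(arr[0, 1], arr[0][1])   # 2 2
print(arr[1, 3])              # 98
# Negative indices count from the end of each axis
print(arr[-1, -1])            # 100
```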

 

Slicing array

Syntax

array_name[start : end : step]

Let’s look at slicing a one-dimensional array:

a = [1,2,3,4,5]
arr = np.array(a)
print(arr[0:4])

Output

[1 2 3 4]

Array indices start at 0 and end at one less than the length of the array. In arr[0:4], the slice starts at index 0 and stops at index 4; since the end index is exclusive, it retrieves the elements at indices 0 to 3.

print(arr[-5:-1])

Output

[1 2 3 4]

Negative indices work as well: -1 refers to the last element, and the negative of the array’s length (here -5) refers to the first. So arr[-5:-1] selects everything from the first element up to, but not including, the last.
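The step part of the slice syntax has not been used yet. A short sketch of how it combines with start, end and negative values on the same five-element array:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr[::2])     # every second element: [1 3 5]
print(arr[1:4:2])   # start 1, stop 4, step 2: [2 4]
print(arr[::-1])    # a negative step walks backwards: [5 4 3 2 1]
print(arr[-3:])     # the last three elements: [3 4 5]
```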

Now let’s look at slicing a two-dimensional array. Each axis gets its own start, end, and step, separated by a comma:

Syntax

array_name[start_row : end_row : step_row, start_column : end_column : step_column]

Let’s create a 3 x 3 array to work with:

arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(arr)

Output

[[1 2 3]
 [4 5 6]
 [7 8 9]]

Now, let’s take the elements from the first two rows and the last two columns:

print(arr[0:2, 1:])

Output

[[2 3]
 [5 6]]

 

Let’s take out the last element of the array:

print(arr[2:, 2:])

Output

[[9]]

 

Let’s take out the last two elements from the second row:

print(arr[1:2, 1:])

Output

[[5 6]]
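Slices like 1:2 keep the result two-dimensional, as the [[5 6]] output shows. A short sketch of two related patterns on the same 3 x 3 array: a bare ":" to keep every row, and an integer index, which drops that axis and returns a 1-D result:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# ':' keeps every row; an integer picks one column, giving a 1-D result
print(arr[:, 1])        # [2 5 8]
# An integer row index gives a 1-D row, unlike the slice 1:2
print(arr[1, 1:])       # [5 6]
# Steps work per axis: rows 0 and 2, columns 0 and 2 (the corners)
print(arr[::2, ::2])    # [[1 3]
                        #  [7 9]]
```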

 

Reshaping an array

Using the reshape() method, we can create a new array with a different shape from a previously defined array:

arr1 = np.array([1,2,3,4,5,6])
arr2 = arr1.reshape(3,2)
print(arr2)

Output

[[1 2]
 [3 4]
 [5 6]]

Let’s also print the shapes before and after reshaping:

arr1 = np.array([1,2,3,4,5,6])
print("shape of arr1: ", arr1.shape, "\n")
arr2 = arr1.reshape(3,2)
print(arr2, "\n")
print("shape of arr2: ", arr2.shape)

Output

shape of arr1:  (6,) 

[[1 2]
 [3 4]
 [5 6]] 

shape of arr2:  (3, 2)

Here, reshape() returns a new array of shape (3, 2); arr1 itself still has shape (6,). The only requirement when reshaping is that the total number of elements must be the same in the original and the reshaped array.
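Since the element count is fixed, one dimension can be left for NumPy to work out by passing -1 in its place. A short sketch:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5, 6])

# -1 tells reshape() to compute that dimension from the total element count
print(arr1.reshape(2, -1).shape)        # (2, 3)
print(arr1.reshape(-1, 1).shape)        # (6, 1), a single column
# reshape(-1) flattens an array back to one dimension
print(arr1.reshape(3, 2).reshape(-1))   # [1 2 3 4 5 6]
```

At most one dimension may be -1; NumPy raises an error if asked to infer more than one.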

 

arr2 = arr1.reshape(2,2)
print(arr2)

Output

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-70-a75024147a88> in <module>
      1 arr1 = np.array([1,2,3,4,5,6])
----> 2 arr2 = arr1.reshape(2,2)
      3 print(arr2)

ValueError: cannot reshape array of size 6 into shape (2,2)

Here, we get a ValueError because we are trying to reshape an array of 6 elements into a shape that holds only 4 elements (2 x 2).
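In a script, this error can be caught, or avoided by comparing element counts before reshaping. A minimal sketch; new_shape is an illustrative name:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4, 5, 6])

try:
    arr1.reshape(2, 2)            # 6 elements cannot fill a 2 x 2 array
except ValueError as err:
    print("reshape failed:", err)

# Checking the element count first avoids the exception
new_shape = (2, 3)
if arr1.size == np.prod(new_shape):
    print(arr1.reshape(new_shape))
```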

arr1 = np.array([[1,2,3,4,5,6], [7,8,9,10,11,12]])
arr2 = arr1.reshape(12,1)
print(arr2)

Output

[[ 1]
 [ 2]
 [ 3]
 [ 4]
 [ 5]
 [ 6]
 [ 7]
 [ 8]
 [ 9]
 [10]
 [11]
 [12]]

Both arrays contain the same 12 elements; only the shape has changed.

arr2 = arr1.reshape(4, 3)
print(arr2)

Output

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]

 

Appending & Inserting rows and columns in the array

Using the append() function

Row-wise appending

Syntax

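np.append(previous_array, [array_to_be_added], axis=0)

Here, arr2 is the 4 x 3 array from the previous example, so the new row must also have 3 elements:

a = np.array([20,21,22])
np.append(arr2,[a],axis=0)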

Output

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12],
       [20, 21, 22]])
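Two details of np.append() are worth knowing: it returns a new array rather than changing the original, and if axis is omitted, both inputs are flattened first. A short sketch that rebuilds arr2 so it runs on its own:

```python
import numpy as np

arr2 = np.arange(1, 13).reshape(4, 3)   # the same 4 x 3 array as above
a = np.array([20, 21, 22])

with_row = np.append(arr2, [a], axis=0)
print(with_row.shape)   # (5, 3)
print(arr2.shape)       # (4, 3): arr2 itself is unchanged

# Without axis, both inputs are flattened and the result is 1-D
flat = np.append(arr2, a)
print(flat.shape)       # (15,)
```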

 

Column-wise appending

Syntax

np.append(previous_array, array_to_be_added, axis=1)

For a column, the new values must first be shaped as 4 rows and 1 column:

a = np.array([20,21,22,23])
b = a.reshape(4,-1)
np.append(arr2,b,axis=1)

Output

array([[ 1,  2,  3, 20],
       [ 4,  5,  6, 21],
       [ 7,  8,  9, 22],
       [10, 11, 12, 23]])
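The column shape can also be produced with np.newaxis, and np.hstack() joins arrays side by side in the same way. A sketch comparing these against the np.append() call above:

```python
import numpy as np

arr2 = np.arange(1, 13).reshape(4, 3)
a = np.array([20, 21, 22, 23])

# Two equivalent ways to turn the 1-D array into a 4 x 1 column
col_reshaped = a.reshape(4, -1)
col_newaxis = a[:, np.newaxis]
print(col_reshaped.shape, col_newaxis.shape)   # (4, 1) (4, 1)

# hstack joins along columns, matching np.append(..., axis=1)
print(np.array_equal(np.hstack([arr2, col_newaxis]),
                     np.append(arr2, col_reshaped, axis=1)))   # True
```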

 

Using the insert() function

Row-wise inserting

Syntax

np.insert(previous_array, inserting_index, array_to_be_inserted, axis=0)

arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
inserting_arr = np.array([11,12,13])
print("Before insertion: ")
print(arr)
print("after insertion at index 2: ")
print(np.insert(arr, 2, inserting_arr, axis=0))

Output

Before insertion: 
[[1 2 3]
 [4 5 6]
 [7 8 9]]
after insertion at index 2: 
[[ 1  2  3]
 [ 4  5  6]
 [11 12 13]
 [ 7  8  9]]

 

Column-wise inserting

Syntax

np.insert(previous_array, inserting_index, array_to_be_inserted, axis=1)

arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
inserting_arr = np.array([11,12,13])
print("Before insertion: ")
print(arr)
print("after insertion at index 1: ")
print(np.insert(arr, 1, inserting_arr, axis=1))

Output

Before insertion: 
[[1 2 3]
 [4 5 6]
 [7 8 9]]
after insertion at index 1: 
[[ 1 11  2  3]
 [ 4 12  5  6]
 [ 7 13  8  9]]
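np.insert() also accepts a single scalar, which is repeated to fill the whole new row or column, and like np.append() it leaves the original array unchanged. A short sketch:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A scalar value is repeated to fill the entire new column
with_zeros = np.insert(arr, 1, 0, axis=1)
print(with_zeros)
# [[1 0 2 3]
#  [4 0 5 6]
#  [7 0 8 9]]

# The original array keeps its shape
print(arr.shape)   # (3, 3)
```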

 

Matrix generation using numpy

We can generate matrices filled entirely with zeros or ones using the zeros() and ones() functions:

zero_matrix = np.zeros([3,3], dtype=int)
print(zero_matrix)

Output

[[0 0 0]
 [0 0 0]
 [0 0 0]]

 

Let’s take another example

ones_matrix = np.ones([4,5], dtype=float)
print(ones_matrix)

Output

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
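Two related constructors, np.full() for any constant value and np.eye() for an identity matrix, follow the same pattern. A short sketch:

```python
import numpy as np

# A 2 x 3 matrix where every element is 7
sevens = np.full((2, 3), 7)
print(sevens)
# [[7 7 7]
#  [7 7 7]]

# A 3 x 3 identity matrix: ones on the diagonal, zeros elsewhere
identity = np.eye(3, dtype=int)
print(identity)
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]]
```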

We can also generate matrices of random values using NumPy’s random module.

Syntax

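np.random.rand(d0, d1, ..., dn)

Each argument is the length of one dimension, so rand(3,5) returns a 3 x 5 matrix of values drawn uniformly from [0, 1). The values differ on every run.

arr = np.random.rand(3,5)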
print(arr)
print(arr.shape)

Output

[[0.74206847 0.44733595 0.10237527 0.34372174 0.87838503]
 [0.48042584 0.46966427 0.318181   0.88341896 0.46838867]
 [0.02591508 0.58777176 0.07273747 0.80669176 0.69172011]]
(3, 5)
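For results that repeat from run to run, newer NumPy code usually creates a seeded Generator with np.random.default_rng(). A sketch; the seed 42 is arbitrary, so no exact values are shown:

```python
import numpy as np

# A Generator with a fixed seed produces the same numbers on every run
rng = np.random.default_rng(seed=42)
arr = rng.random((3, 5))     # uniform floats in [0, 1); shape passed as a tuple
print(arr.shape)             # (3, 5)

# A 2 x 3 matrix of random integers from 0 up to, but not including, 10
ints = rng.integers(0, 10, size=(2, 3))
print(ints)
```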

 

Element-wise operations on arrays

One dimensional array

arr1 = np.array([1,2,3,4])
arr2 = np.array([4,3,2,1])
print("sum: ", arr1+arr2)
print("Difference: ", arr1-arr2)
print("Multiplication: ", arr1*arr2)
print("Division: ", arr1/arr2)

Output

sum:  [5 5 5 5]
Difference:  [-3 -1  1  3]
Multiplication:  [4 6 6 4]
Division:  [0.25       0.66666667 1.5        4.        ]
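The same operators also work between an array and a single number, which is applied to every element (broadcasting). A sketch, with the list-based equivalent for comparison:

```python
import numpy as np

arr1 = np.array([1, 2, 3, 4])

# A scalar is applied to every element
print(arr1 + 10)    # [11 12 13 14]
print(arr1 * 2)     # [2 4 6 8]
print(arr1 ** 2)    # [ 1  4  9 16]

# With a plain list, the same result needs an explicit loop
print([x * 2 for x in [1, 2, 3, 4]])   # [2, 4, 6, 8]
```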

 

Two-dimensional array

arr1 = np.array([[1,2,3,4], [4,3,2,1]])
arr2 = np.array([[4,3,2,1], [4,3,2,1]])
print("sum:\n ", arr1+arr2)
print("Difference:\n ", arr1-arr2)
print("Multiplication:\n ", arr1*arr2)
print("Division:\n ", arr1/arr2)

Output

sum:
  [[5 5 5 5]
 [8 6 4 2]]
Difference:
  [[-3 -1  1  3]
 [ 0  0  0  0]]
Multiplication:
  [[ 4  6  6  4]
 [16  9  4  1]]
Division:
  [[0.25       0.66666667 1.5        4.        ]
 [1.         1.         1.         1.        ]]
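Note that * multiplies matching elements; it is not matrix multiplication. For the matrix product, NumPy uses the @ operator, which requires compatible shapes. Here, arr1 (2 x 4) is multiplied by the transpose of arr2 (4 x 2):

```python
import numpy as np

arr1 = np.array([[1, 2, 3, 4], [4, 3, 2, 1]])
arr2 = np.array([[4, 3, 2, 1], [4, 3, 2, 1]])

# arr1 is 2 x 4, so it needs a 4 x N partner; arr2.T is 4 x 2
product = arr1 @ arr2.T
print(product.shape)   # (2, 2)
print(product)
# [[20 20]
#  [30 30]]
```

Each entry is a row of arr1 dotted with a row of arr2, e.g. 1*4 + 2*3 + 3*2 + 4*1 = 20.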

 

Conclusion

NumPy is a powerful library for fast computation on large arrays and matrices. Because its elements are stored in contiguous memory, processing NumPy arrays is typically much faster than processing Python lists.

NumPy is also used in image processing: OpenCV’s Python interface represents images as NumPy arrays of pixel values. NumPy also works with the matplotlib library to plot bar charts, histograms, and more. These qualities make NumPy one of the most widely used libraries in Python.

Happy Learning 🙂

