NumPy arrays are designed for data analysis, offering specific functionalities such as element-wise mathematics and matrix operations, unlike general-purpose Python data structures like lists, tuples, dictionaries, and sets.
NumPy arrays are similar to lists as they store sequences of objects but are optimized for numerical operations and handling multi-dimensional data.
Heterogeneous vs. Homogeneous Data: Lists can store items of different types (heterogeneous), whereas NumPy arrays store items of the same type (homogeneous).
Dimensionality: NumPy arrays have dimensions, allowing them to be one-dimensional (like lists), two-dimensional (tables or matrices), or multi-dimensional.
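The two differences above can be seen in a short sketch. The variable names are illustrative only, and the example assumes nothing beyond a working NumPy install; .ndim is the attribute that reports the number of dimensions.

```python
import numpy as np

mixed_list = [1, 2.5, 3]             # a list keeps each item's own type
mixed_array = np.array(mixed_list)   # the array upcasts every item to one type

print([type(x).__name__ for x in mixed_list])  # ['int', 'float', 'int']
print(mixed_array.dtype)                        # float64

one_d = np.array([1, 2, 3])
two_d = np.array([[1, 2, 3], [4, 5, 6]])
three_d = np.zeros((2, 3, 4))
print(one_d.ndim, two_d.ndim, three_d.ndim)     # 1 2 3
```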
To use NumPy arrays, import the NumPy package with import numpy as np. The alias np is the conventional shorthand for NumPy.
If NumPy is not installed, it can be installed from the command line with pip install numpy.
The Anaconda distribution of Python includes NumPy and other data science packages by default.
NumPy arrays can be created by passing a list to the np.array() function. Example:
    my_list = [1, 2, 3, 4]
    my_array = np.array(my_list)
    type(my_array)  # Output: numpy.ndarray
Multi-dimensional arrays can be created by passing a list of lists to np.array():
    my_array_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
.shape: Returns the dimensions of the array as a tuple; for a 2D array this is (rows, columns).
    my_array_2d.shape  # Output: (2, 4)
.size: Returns the total number of elements in the array.
    my_array_2d.size  # Output: 8
.dtype: Returns the data type of the elements in the array.
    my_array_2d.dtype  # Output: int64 (the default integer type can differ by platform and NumPy version)
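As a hedged illustration of how the three attributes relate, the sketch below rebuilds my_array_2d with an explicit dtype (an addition not in the original example) so the reported type does not depend on the platform default.

```python
import numpy as np

# dtype is requested explicitly here so the result is the same everywhere
my_array_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int64)

rows, columns = my_array_2d.shape
print(rows, columns)       # 2 4
print(my_array_2d.size)    # 8, equal to rows * columns
print(my_array_2d.dtype)   # int64
```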
np.identity(n): Creates an n × n identity matrix, with ones on the diagonal and zeros elsewhere.
    identity_matrix = np.identity(5)
np.ones(shape): Creates an array of the given shape filled with ones.
    ones_array = np.ones((2, 4))
np.zeros(shape): Creates an array of the given shape filled with zeros.
    zeros_array = np.zeros((4, 6))
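The three constructors can be tried together as below. This is a minimal sketch; the dtype=int argument on the last call is an optional extra shown only to illustrate that the default element type (float64) can be overridden.

```python
import numpy as np

identity_matrix = np.identity(3)
ones_array = np.ones((2, 4))
zeros_array = np.zeros((4, 6), dtype=int)  # dtype is optional; the default is float64

print(identity_matrix)
print(ones_array.shape, ones_array.dtype)   # (2, 4) float64
print(zeros_array.shape)                    # (4, 6)
```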
Indexing and slicing in one-dimensional NumPy arrays work as they do for Python lists.
    my_array = np.array([1, 2, 3, 4, 5])
    my_array[3]   # Returns 4 (index 3)
    my_array[3:]  # Returns array([4, 5]) (from index 3 to the end)
Reversing an array using slicing:
    my_array[::-1]  # Returns array([5, 4, 3, 2, 1])
To index a 2D array, use comma-separated indices within square brackets: array[row, column].
    two_d_array = np.array([[1, 2, 3, 4, 5, 6],
                            [7, 8, 9, 10, 11, 12],
                            [13, 14, 15, 16, 17, 18]])
    two_d_array[1, 4]  # Returns 11 (row 1, column 4)
Slicing in two dimensions allows you to extract segments of the array.
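A few two-dimensional slices, sketched on the same two_d_array used for indexing. The names first_row, third_column, and block are illustrative.

```python
import numpy as np

two_d_array = np.array([[1, 2, 3, 4, 5, 6],
                        [7, 8, 9, 10, 11, 12],
                        [13, 14, 15, 16, 17, 18]])

first_row = two_d_array[0, :]      # [1 2 3 4 5 6]
third_column = two_d_array[:, 2]   # [ 3  9 15]
block = two_d_array[1:, 4:]        # rows 1-2, columns 4-5: [[11 12] [17 18]]

print(first_row)
print(third_column)
print(block)
```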
Reversing a two-dimensional array:
    two_d_array[::-1, ::-1]  # Reverses both dimensions (row order and column order)
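np.reshape(array, shape) or array.reshape(shape): Changes the dimensions of an array. The new shape must hold the same total number of elements.
    reshaped_array = np.reshape(two_d_array, (6, 3))  # 3 x 6 becomes 6 x 3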
np.ravel(array, order='C'): Unravels a multi-dimensional array into a one-dimensional array. The order argument is commonly 'C' (the default) or 'F'.
order='C' unravels by rows (C style).
order='F' unravels by columns (Fortran style).
    raveled_array = np.ravel(two_d_array, order='C')
.flatten(): Flattens a multi-dimensional array into a one-dimensional array (returns a copy).
    flattened_array = two_d_array.flatten()
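The sketch below contrasts the two unravel orders and shows that changing the result of .flatten() leaves the original untouched. It uses a small illustrative array rather than two_d_array to keep the output short. (np.ravel, by contrast, makes a copy only when it needs to, so its result may share memory with the original.)

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])

by_rows = np.ravel(small, order='C')      # [1 2 3 4 5 6]
by_columns = np.ravel(small, order='F')   # [1 4 2 5 3 6]

flat_copy = small.flatten()
flat_copy[0] = 99                         # modifies only the copy
print(by_rows, by_columns)
print(small[0, 0])                        # 1
```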
.T: Returns the transpose of a two-dimensional array (rows become columns and vice versa).
    transposed_array = two_d_array.T
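np.flipud(array): Flips an array vertically (up-down), reversing the order of its rows.
np.fliplr(array): Flips an array horizontally (left-right), reversing the order of its columns.
np.rot90(array, k=n): Rotates an array by 90 degrees n times, counterclockwise.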
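A small sketch of the flip and rotate functions on an illustrative 2 x 2 array named grid. Note that these are called as functions in the np namespace rather than as array methods, and that np.rot90 rotates counterclockwise.

```python
import numpy as np

grid = np.array([[1, 2],
                 [3, 4]])

print(np.flipud(grid))       # [[3 4] [1 2]]
print(np.fliplr(grid))       # [[2 1] [4 3]]
print(np.rot90(grid))        # [[2 4] [1 3]]
print(np.rot90(grid, k=2))   # [[4 3] [2 1]]
```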
np.roll(array, shift, axis): Shifts elements in an array along a specified axis. Elements pushed past the last position wrap around to the first.
    rolled_array = np.roll(two_d_array, 2, axis=1)  # shift each row by 2 columns
If the axis argument is not specified, the array is flattened before rolling and then restored to its original shape.
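To make the role of axis concrete, here is a sketch on a small illustrative array (grid), rolling by one position along each axis and then with no axis given.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(np.roll(grid, 1, axis=1))   # [[3 1 2] [6 4 5]]  each row shifted right
print(np.roll(grid, 1, axis=0))   # [[4 5 6] [1 2 3]]  rows shifted down
print(np.roll(grid, 1))           # [[6 1 2] [3 4 5]]  flattened, rolled, reshaped back
```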
np.concatenate((array1, array2), axis): Joins two arrays along a specified axis.
    new_array = np.array([[19, 20, 21],
                          [22, 23, 24],
                          [25, 26, 27]])
    concatenated_array = np.concatenate((two_d_array, new_array), axis=1)  # 3 x 6 joined with 3 x 3 gives 3 x 9
When concatenating, the arrays must have the same size along every axis except the one being joined. With axis=1, for example, the row counts must match.
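A short sketch of how shapes interact with axis: every dimension other than the one being joined has to agree, and NumPy raises a ValueError otherwise. The arrays a, wide, and tall are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])           # shape (2, 3)
wide = np.array([[7], [8]])         # shape (2, 1): same row count as a
tall = np.array([[9, 10, 11]])      # shape (1, 3): same column count as a

side_by_side = np.concatenate((a, wide), axis=1)   # shape (2, 4)
stacked = np.concatenate((a, tall), axis=0)        # shape (3, 3)
print(side_by_side.shape, stacked.shape)

try:
    np.concatenate((a, tall), axis=1)   # row counts differ (2 vs 1)
except ValueError as error:
    print("cannot concatenate:", error)
```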
NumPy arrays allow for efficient element-wise math operations, which are faster than using loops with lists.
Scalar operations are applied to each element in the array.
    array = np.array([1, 2, 3, 4])
    array + 100  # Adds 100 to each element
    array * 2    # Multiplies each element by 2
    array ** 2   # Squares each element
    array % 2    # Remainder after division by 2 for each element
Element-wise operations between two NumPy arrays of the same shape:
    small_array = np.array([[1, 2], [3, 4]])
    small_array + small_array   # Adds corresponding elements
    small_array * small_array   # Multiplies corresponding elements
    small_array ** small_array  # Element-wise exponentiation
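For comparison, the sketch below computes the same squares with a plain Python loop (a list comprehension) and with a single array expression, then prints two of the array-to-array results. No timing is shown; the point is only that the results match.

```python
import numpy as np

values = [1, 2, 3, 4]
array = np.array(values)

squared_with_loop = [v ** 2 for v in values]   # plain-Python equivalent
squared_vectorized = array ** 2                # one expression, no explicit loop

print(squared_with_loop)     # [1, 4, 9, 16]
print(squared_vectorized)    # [ 1  4  9 16]

small_array = np.array([[1, 2], [3, 4]])
print(small_array * small_array)    # [[ 1  4] [ 9 16]]
print(small_array ** small_array)   # [[  1   4] [ 27 256]]
```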
np.mean(array, axis): Calculates the mean of an array (optionally along a specified axis).
np.std(array): Calculates the standard deviation of an array.
np.sum(array, axis): Calculates the sum of elements in an array (optionally along a specified axis).
np.log(array): Calculates the natural logarithm of each element.
np.sqrt(array): Calculates the square root of each element.
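The effect of the axis argument is easiest to see on a small table. In this sketch (table is an illustrative array), axis=0 collapses the rows and gives one value per column, while axis=1 gives one value per row. By default np.std computes the population standard deviation (ddof=0).

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(np.sum(table))             # 21
print(np.sum(table, axis=0))     # [5 7 9]   column sums
print(np.sum(table, axis=1))     # [ 6 15]   row sums
print(np.mean(table, axis=0))    # [2.5 3.5 4.5]
print(np.std(table))             # roughly 1.708

print(np.sqrt(np.array([1, 4, 9])))     # [1. 2. 3.]
print(np.log(np.array([1.0, np.e])))    # [0. 1.]
```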
np.dot(array1, array2): Calculates the dot product of two arrays (for 1D arrays) or performs matrix multiplication (for 2D arrays).
    row1 = two_d_array[0, :]
    row2 = two_d_array[1, :]
    dot_product = np.dot(row1, row2)
Matrix multiplication with np.dot():
    matrix_product = np.dot(small_array, small_array)
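A worked sketch with small numbers makes the difference between the dot product, matrix multiplication, and element-wise multiplication visible. The @ operator shown on the last line is an addition not mentioned above; for 2D arrays it gives the same result as np.dot.

```python
import numpy as np

row1 = np.array([1, 2, 3])
row2 = np.array([4, 5, 6])
print(np.dot(row1, row2))    # 32 = 1*4 + 2*5 + 3*6

small_array = np.array([[1, 2], [3, 4]])
print(np.dot(small_array, small_array))   # [[ 7 10] [15 22]]  matrix product
print(small_array * small_array)          # [[ 1  4] [ 9 16]]  element-wise, not a matrix product
print(small_array @ small_array)          # [[ 7 10] [15 22]]
```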
NumPy arrays are powerful and efficient for numerical calculations, especially with multi-dimensional data.
However, NumPy arrays are limited to homogeneous data types. For datasets with mixed data types, such as a table with both text and numeric columns, pandas DataFrames are the usual choice.