
NumPy Arrays


Introduction to NumPy Arrays

  • Unlike general-purpose Python data structures such as lists, tuples, dictionaries, and sets, NumPy arrays are designed for numerical data analysis, with built-in support for element-wise math and matrix operations.

  • NumPy arrays are similar to lists as they store sequences of objects but are optimized for numerical operations and handling multi-dimensional data.

Key Differences Between Lists and NumPy Arrays

  • Heterogeneous vs. Homogeneous Data: Lists can store items of different types (heterogeneous), whereas NumPy arrays store items of the same type (homogeneous).

  • Dimensionality: every NumPy array has a defined number of dimensions (axes), so it can be one-dimensional (like a list), two-dimensional (a table or matrix), or multi-dimensional.
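A minimal sketch of both differences, using illustrative variable names: a list keeps each item's own type, while np.array coerces a mixed int/float list to one common dtype. The ndim attribute (not covered in these notes) reports how many dimensions an array has.

```python
import numpy as np

# A list can hold items of different types side by side
mixed_list = [1, 2.5, "three"]
print([type(item).__name__ for item in mixed_list])  # ['int', 'float', 'str']

# An array holds a single type: the ints are upcast to floats to match 2.5
numeric_array = np.array([1, 2.5, 3])
print(numeric_array.dtype)  # float64

# ndim reports the number of dimensions (axes)
one_d = np.array([1, 2, 3])
two_d = np.array([[1, 2], [3, 4]])
three_d = np.zeros((2, 3, 4))
print(one_d.ndim, two_d.ndim, three_d.ndim)  # 1 2 3
```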

Accessing NumPy Arrays

  • To use NumPy arrays, import the NumPy package using import numpy as np. The alias np is a common shorthand for NumPy.

  • If NumPy is not installed, it can be installed via the command line using pip install numpy.

  • The Anaconda distribution of Python includes NumPy and other data science packages by default.

Creating NumPy Arrays

  • NumPy arrays can be created by passing a list into the np.array() function. Example:
    import numpy as np
    my_list = [1, 2, 3, 4]
    my_array = np.array(my_list)
    type(my_array)  # Output: numpy.ndarray

  • Multi-dimensional arrays can be created by passing a list of lists to np.array():
    my_array_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
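The two creation patterns above as one runnable sketch. The final tolist() call, which converts an array back into nested Python lists, is an extra not mentioned in the notes; it is a handy way to confirm the rows were laid out as intended.

```python
import numpy as np

my_list = [1, 2, 3, 4]
my_array = np.array(my_list)  # 1D array built from a flat list
print(type(my_array))  # <class 'numpy.ndarray'>

# A list of equal-length lists becomes a 2D array (2 rows, 4 columns)
my_array_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(my_array_2d)
# [[1 2 3 4]
#  [5 6 7 8]]

# tolist() converts back to (nested) Python lists
print(my_array_2d.tolist())  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```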

Array Attributes

  • .shape: Returns the dimensions of the array as a tuple with one entry per axis; for a 2D array this is (rows, columns).
    my_array_2d.shape # Output: (2, 4)

  • .size: Returns the total number of elements in the array.
    my_array_2d.size # Output: 8

  • .dtype: Returns the data type of the elements in the array.
    my_array_2d.dtype # Output: int64 (the default integer type can vary by platform)
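A quick check of the three attributes on the 2D array from above, plus a 3D array to show that .shape has one entry per dimension and that .size is the product of those entries. Because the default integer type is platform-dependent, the sketch only checks that the dtype is some integer type.

```python
import numpy as np

my_array_2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(my_array_2d.shape)  # (2, 4): 2 rows, 4 columns
print(my_array_2d.size)   # 8: the product of the shape entries
print(my_array_2d.dtype)  # an integer dtype, typically int64

# shape has one entry per dimension, so a 3D array reports three numbers
cube = np.zeros((2, 3, 4))
print(cube.shape)  # (2, 3, 4)
print(cube.size)   # 24
print(cube.dtype)  # float64 (np.zeros defaults to floats)
```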

Special Array Creation Functions

  • np.identity(n): Creates an n × n identity matrix, with ones on the main diagonal and zeros elsewhere.
    identity_matrix = np.identity(5)

  • np.ones(shape): Creates an array filled with ones.
    ones_array = np.ones((2, 4))

  • np.zeros(shape): Creates an array filled with zeros.
    zeros_array = np.zeros((4, 6))
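A small sketch of the three creation functions, using a 3 × 3 identity instead of 5 × 5 to keep the printout short. It also shows that all three return floating-point arrays by default; the dtype argument used at the end is an addition to the notes and switches the element type.

```python
import numpy as np

identity_matrix = np.identity(3)  # 3 x 3, ones on the main diagonal
print(identity_matrix)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

ones_array = np.ones((2, 4))    # shape given as a tuple: 2 rows, 4 columns
zeros_array = np.zeros((4, 6))  # 4 rows, 6 columns
print(ones_array.shape, zeros_array.shape)  # (2, 4) (4, 6)

# The defaults are float64; pass dtype to get integers instead
int_ones = np.ones((2, 4), dtype=int)
print(ones_array.dtype, int_ones[0, 0])  # float64 1
```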

Indexing and Slicing

  • Indexing and slicing in NumPy arrays work the same way as with Python lists.
    my_array = np.array([1, 2, 3, 4, 5])
    my_array[3]   # Returns 4 (index 3)
    my_array[3:]  # Returns [4, 5] (from index 3 to the end)

  • Reversing an array using slicing:
    my_array[::-1] # Returns [5, 4, 3, 2, 1]
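A few more 1D cases that follow the same list rules: negative indices count from the end, the stop index of a slice is excluded, and a step value skips elements. The results are printed via tolist() so they read like the list notation used in these notes.

```python
import numpy as np

my_array = np.array([1, 2, 3, 4, 5])

print(my_array[3])              # 4 (zero-based index 3)
print(my_array[-1])             # 5 (negative indices count from the end)
print(my_array[3:].tolist())    # [4, 5]
print(my_array[1:4].tolist())   # [2, 3, 4] (stop index excluded)
print(my_array[::2].tolist())   # [1, 3, 5] (every second element)
print(my_array[::-1].tolist())  # [5, 4, 3, 2, 1]
```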

Indexing in Two-Dimensional Arrays

  • To index a 2D array, use comma-separated indices within square brackets: array[row, column].
    two_d_array = np.array([[1, 2, 3, 4, 5, 6],
                            [7, 8, 9, 10, 11, 12],
                            [13, 14, 15, 16, 17, 18]])
    two_d_array[1, 4]  # Returns 11 (row 1, column 4)

  • Slicing in two dimensions takes one slice per axis; for example, two_d_array[0:2, 1:3] returns rows 0 and 1, columns 1 and 2: [[2, 3], [8, 9]].

  • Reversing a two-dimensional array:
    two_d_array[::-1, ::-1] # Reverses both dimensions
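A sketch of 2D indexing and slicing on the 3 × 6 array from above. A bare colon selects a whole axis, which is how a full row or a full column is pulled out; reversing both axes puts the last row, read backwards, first.

```python
import numpy as np

two_d_array = np.array([[1, 2, 3, 4, 5, 6],
                        [7, 8, 9, 10, 11, 12],
                        [13, 14, 15, 16, 17, 18]])

print(two_d_array[1, 4])               # 11: row 1, column 4
print(two_d_array[0, :].tolist())      # [1, 2, 3, 4, 5, 6]: the whole first row
print(two_d_array[:, 2].tolist())      # [3, 9, 15]: the whole column 2
print(two_d_array[0:2, 1:3].tolist())  # [[2, 3], [8, 9]]: rows 0-1, columns 1-2

reversed_both = two_d_array[::-1, ::-1]
print(reversed_both[0].tolist())  # [18, 17, 16, 15, 14, 13]
```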

Array Manipulation

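  • np.reshape(array, shape) or array.reshape(shape): Changes the dimensions of an array without changing its data; the total number of elements must stay the same (here 3 × 6 = 6 × 3 = 18).
    reshaped_array = np.reshape(two_d_array, (6, 3))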

  • np.ravel(array, order='C'): Unravels a multi-dimensional array into a one-dimensional array. The order argument is 'C' (the default) or 'F':
    raveled_array = np.ravel(two_d_array, order='C')

    • order='C' unravels row by row.

    • order='F' unravels column by column (Fortran style).

  • .flatten(): Flattens a multi-dimensional array into a one-dimensional array. Unlike np.ravel, which returns a view of the original data when possible, .flatten() always returns a copy.
    flattened_array = two_d_array.flatten()

  • .T: Returns the transpose of a two-dimensional array (rows become columns and vice versa).
    transposed_array = two_d_array.T

  • np.flipud(array): Flips an array vertically (reverses the order of the rows).

  • np.fliplr(array): Flips an array horizontally (reverses the order of the columns).

  • np.rot90(array, k=n): Rotates an array by 90 degrees counterclockwise, n times.

  • np.roll(array, shift, axis): Shifts elements in an array along a specified axis; elements pushed off one end wrap around to the other.
    rolled_array = np.roll(two_d_array, 2, axis=1) # shift each row's elements 2 positions to the right

  • If the axis argument is not specified, the array is flattened before rolling and then restored to its original shape.
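The manipulation functions above, applied to the 3 × 6 array from the indexing section, with a small 2 × 2 array (named small here for illustration) to make the flips and rotation easy to read. The sketch also shows that changing a flattened copy leaves the original untouched. Results are printed via tolist() for readability.

```python
import numpy as np

two_d_array = np.array([[1, 2, 3, 4, 5, 6],
                        [7, 8, 9, 10, 11, 12],
                        [13, 14, 15, 16, 17, 18]])

# reshape: same 18 elements, new layout (6 rows, 3 columns)
reshaped_array = np.reshape(two_d_array, (6, 3))
print(reshaped_array[0].tolist())  # [1, 2, 3]

# ravel: order='C' reads row by row, order='F' reads column by column
print(np.ravel(two_d_array, order='C')[:4].tolist())  # [1, 2, 3, 4]
print(np.ravel(two_d_array, order='F')[:4].tolist())  # [1, 7, 13, 2]

# flatten returns an independent copy
flattened_array = two_d_array.flatten()
flattened_array[0] = 100
print(two_d_array[0, 0])  # 1: the original is unchanged

# .T swaps rows and columns: shape (3, 6) becomes (6, 3)
print(two_d_array.T.shape)  # (6, 3)

small = np.array([[1, 2], [3, 4]])
print(np.flipud(small).tolist())      # [[3, 4], [1, 2]]: rows reversed
print(np.fliplr(small).tolist())      # [[2, 1], [4, 3]]: columns reversed
print(np.rot90(small, k=1).tolist())  # [[2, 4], [1, 3]]: 90 degrees counterclockwise

# roll along axis=1: each row shifts right by 2, wrapping at the end
print(np.roll(two_d_array, 2, axis=1)[0].tolist())  # [5, 6, 1, 2, 3, 4]
# no axis: roll the flattened array, then restore the original shape
print(np.roll(two_d_array, 2)[0].tolist())  # [17, 18, 1, 2, 3, 4]
```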

Array Concatenation

  • np.concatenate((array1, array2), axis): Joins two arrays along a specified axis.

    new_array = np.array([[19, 20, 21],
                          [22, 23, 24],
                          [25, 26, 27]])
    concatenated_array = np.concatenate((two_d_array, new_array), axis=1)
    
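  • When concatenating, all dimensions except the one along the specified axis must match. With axis=1 (side by side) both arrays need the same number of rows; with axis=0 (stacked vertically) they need the same number of columns.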
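A sketch of the shape rule using the 3 × 6 and 3 × 3 arrays from above. Joining them side by side works because both have 3 rows; stacking them vertically would fail with a ValueError because their column counts (6 and 3) differ, so the vertical example stacks new_array on itself instead.

```python
import numpy as np

two_d_array = np.array([[1, 2, 3, 4, 5, 6],
                        [7, 8, 9, 10, 11, 12],
                        [13, 14, 15, 16, 17, 18]])  # 3 x 6
new_array = np.array([[19, 20, 21],
                      [22, 23, 24],
                      [25, 26, 27]])                # 3 x 3

# axis=1 joins side by side: row counts (3 and 3) must match
wide = np.concatenate((two_d_array, new_array), axis=1)
print(wide.shape)  # (3, 9)
print(wide[0].tolist())  # [1, 2, 3, 4, 5, 6, 19, 20, 21]

# axis=0 stacks vertically: column counts must match, and 6 != 3
try:
    np.concatenate((two_d_array, new_array), axis=0)
except ValueError as err:
    print("axis=0 failed:", err)

# Vertical stacking works once the column counts agree
tall = np.concatenate((new_array, new_array), axis=0)
print(tall.shape)  # (6, 3)
```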

Element-Wise Math Operations

  • NumPy arrays support element-wise math operations that run in optimized compiled code, which is typically much faster than looping over Python lists.

  • Scalar operations are applied to each element in the array.
    array = np.array([1, 2, 3, 4])
    array + 100  # Adds 100 to each element
    array * 2    # Multiplies each element by 2
    array ** 2   # Squares each element
    array % 2    # Modulus 2 for each element

  • Element-wise operations between two NumPy arrays of the same shape:
    small_array = np.array([[1, 2], [3, 4]])
    small_array + small_array   # Adds corresponding elements
    small_array * small_array   # Multiplies corresponding elements
    small_array ** small_array  # Element-wise exponentiation
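The scalar and array-to-array operations above with their actual results. The list lines in the middle are added for contrast: on a Python list, * 2 repeats the list rather than doubling each value, so element-wise math on lists needs an explicit loop or comprehension.

```python
import numpy as np

array = np.array([1, 2, 3, 4])
print((array + 100).tolist())  # [101, 102, 103, 104]
print((array * 2).tolist())    # [2, 4, 6, 8]
print((array ** 2).tolist())   # [1, 4, 9, 16]
print((array % 2).tolist())    # [1, 0, 1, 0]

# The same operator means something different on a list
my_list = [1, 2, 3, 4]
print(my_list * 2)               # [1, 2, 3, 4, 1, 2, 3, 4]: repetition
print([x * 2 for x in my_list])  # [2, 4, 6, 8]: lists need an explicit loop

small_array = np.array([[1, 2], [3, 4]])
print((small_array + small_array).tolist())   # [[2, 4], [6, 8]]
print((small_array * small_array).tolist())   # [[1, 4], [9, 16]]
print((small_array ** small_array).tolist())  # [[1, 4], [27, 256]]
```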

NumPy Math Functions

  • np.mean(array, axis): Calculates the mean of an array (optionally along a specified axis).

  • np.std(array, axis): Calculates the standard deviation of an array (the population standard deviation by default; optionally along a specified axis).

  • np.sum(array, axis): Calculates the sum of elements in an array (optionally along a specified axis).

  • np.log(array): Calculates the natural logarithm of each element.

  • np.sqrt(array): Calculates the square root of each element.

  • np.dot(array1, array2): Calculates the dot product of two arrays (for 1D arrays) or performs matrix multiplication (for 2D arrays).
    row1 = two_d_array[0, :]
    row2 = two_d_array[1, :]
    dot_product = np.dot(row1, row2)  # 1*7 + 2*8 + ... + 6*12 = 217

  • Matrix multiplication with np.dot():
    matrix_product = np.dot(small_array, small_array)
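The math functions applied to the 2 × 2 small_array and the rows of two_d_array from earlier. With axis=0 the result is one value per column, with axis=1 one value per row. The @ operator on the last line is an addition to the notes; for 2D arrays it gives the same matrix product as np.dot.

```python
import numpy as np

small_array = np.array([[1, 2], [3, 4]])

print(np.mean(small_array))                   # 2.5: mean of all four elements
print(np.mean(small_array, axis=0).tolist())  # [2.0, 3.0]: column means
print(np.mean(small_array, axis=1).tolist())  # [1.5, 3.5]: row means
print(np.sum(small_array, axis=0).tolist())   # [4, 6]: column sums
print(np.std(small_array))                    # about 1.118 (population std)

print(np.sqrt(np.array([1, 4, 9])))  # square roots: 1.0, 2.0, 3.0
print(np.log(np.array([1, np.e])))   # natural logs: 0.0 and 1.0

two_d_array = np.array([[1, 2, 3, 4, 5, 6],
                        [7, 8, 9, 10, 11, 12],
                        [13, 14, 15, 16, 17, 18]])
row1, row2 = two_d_array[0, :], two_d_array[1, :]
print(np.dot(row1, row2))  # 217 = 1*7 + 2*8 + ... + 6*12

matrix_product = np.dot(small_array, small_array)
print(matrix_product.tolist())               # [[7, 10], [15, 22]]
print((small_array @ small_array).tolist())  # same result with the @ operator
```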

Conclusion

  • NumPy arrays are powerful and efficient for numerical calculations, especially with multi-dimensional data.

  • However, each NumPy array holds a single data type. For datasets with mixed data types, pandas DataFrames are the usual choice.