Exploring Data with Pandas: Fundamentals

20 Terms

1

What are vectorized operations?

Vectorized operations in Python perform a computation on every element of an array, Series, or DataFrame at once, instead of looping over the elements one by one. This usually makes code faster and more readable, and it can also reduce memory use.

  • series_a + series_b - Addition

  • series_a - series_b - Subtraction

  • series_a * series_b - Multiplication (element-wise; this is not the matrix multiplication used in linear algebra).

  • series_a / series_b - Division
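
A minimal sketch of the four operations above. The sample values are made up for illustration and are not from the original notes.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
series_a = pd.Series([10, 20, 30])
series_b = pd.Series([1, 2, 5])

# Each operator is applied element by element, matched on the index.
total = series_a + series_b        # addition
difference = series_a - series_b   # subtraction
product = series_a * series_b      # element-wise multiplication
ratio = series_a / series_b        # division (produces floats)

print(total.tolist())    # [11, 22, 35]
print(product.tolist())  # [10, 40, 150]
```

No explicit loop is written; pandas applies the operation to every pair of elements.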

2

How do you find the maximum value in a pandas Series?

Series.max()

3

How do you find the minimum value in a pandas Series?

Series.min()

4

How do you find the average of a pandas Series?

Series.mean()

5

How do you find the median of a pandas Series?

Series.median()

The median is the middle value when all values are sorted in order.

6

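How do you find the mode of a pandas Series?

Series.mode()

The mode of a set of values is the value that appears most often. Series.mode() returns a Series rather than a single value, because several values can tie for most frequent.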
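
A short sketch of how Series.mode() behaves when there is a tie. The data is hypothetical and chosen only so that two values are equally frequent.

```python
import pandas as pd

# Hypothetical data: 1 and 3 each appear twice.
values = pd.Series([3, 1, 3, 2, 1])

modes = values.mode()        # a Series holding every most-frequent value
print(modes.tolist())        # [1, 3]

# Take the first entry if a single value is needed.
first_mode = modes.iloc[0]
```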

7

How do you get the sum of all the values in a pandas Series?

Series.sum()
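
The aggregation methods from the cards above can be tried together on one Series. This is an illustrative sketch; the numbers are invented.

```python
import pandas as pd

# Hypothetical values for illustration only.
values = pd.Series([4, 8, 15, 16, 23, 42])

largest = values.max()      # 42
smallest = values.min()     # 4
average = values.mean()     # 18.0
middle = values.median()    # 15.5 (mean of the two middle values, 15 and 16)
total = values.sum()        # 108

print(largest, smallest, average, middle, total)
```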

8

What is the output of Series.describe()?

For a numeric Series: count, mean, std, min, 25%, 50%, 75%, and max, along with the Name (i.e., the column name) and dtype.
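
As a rough illustration, describe() on a small numeric Series returns those statistics as a new Series. The Series name and values here are assumptions made for the example.

```python
import pandas as pd

# Hypothetical numeric Series; the name stands in for a column name.
values = pd.Series([1, 2, 3, 4, 5], name="revenues")

summary = values.describe()
print(summary)

# The statistics are the index labels of the result.
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
```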

9

What is method chaining?

Method chaining is a way to combine multiple method calls in a single line, where each method works on the result of the previous one.

E.g.: countries_counts = f500["country"].value_counts()
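
A sketch comparing the step-by-step form with the chained form. The f500 DataFrame built here is a tiny hypothetical stand-in, not the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the f500 DataFrame.
f500 = pd.DataFrame(
    {"country": ["USA", "China", "USA", "Japan", "China", "USA"]}
)

# Step by step: one intermediate variable per operation.
countries = f500["country"]
countries_counts = countries.value_counts()

# Chained: each call operates on the result of the previous one.
china_count = f500["country"].value_counts().loc["China"]
print(china_count)  # 2
```

Chaining avoids intermediate variables, at the cost of making each step harder to inspect on its own.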

10

How do you extract the total number of China records in the country Series?

print(f500["country"].value_counts().loc["China"])

11

Which parameter do DataFrame methods use to set the direction of a calculation?

axis

12

What is an example of performing a calculation on particular columns?

f500[["revenues", "profits"]].median(axis=0)

The above example returns the median value of the revenues and profits columns of the f500 dataframe.

With axis=0 (or "index"), the calculation runs down the rows and gives one result per column. With axis=1 (or "columns"), the calculation runs across the columns and gives one result per row.
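
A small sketch of the two axis settings. The DataFrame and its row labels are hypothetical, chosen so the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical stand-in for f500 with made-up numbers.
f500 = pd.DataFrame(
    {"revenues": [100, 200, 300], "profits": [10, 30, 20]},
    index=["A", "B", "C"],
)

# axis=0: go down the rows, one result per column.
per_column = f500[["revenues", "profits"]].median(axis=0)
print(per_column.tolist())  # [200.0, 20.0]

# axis=1: go across the columns, one result per row.
per_row = f500[["revenues", "profits"]].sum(axis=1)
print(per_row.tolist())     # [110, 230, 320]
```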

13

What is the syntax to get maximum number from all the numerical columns in a dataframe?

print(df.max(numeric_only=True))
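
To illustrate, numeric_only=True leaves text columns out of the result. The column names and values below are assumptions for the example.

```python
import pandas as pd

# Hypothetical DataFrame mixing a text column with numeric columns.
df = pd.DataFrame(
    {
        "company": ["A", "B"],
        "revenues": [5, 9],
        "profits": [1.5, 0.5],
    }
)

maxima = df.max(numeric_only=True)
print(list(maxima.index))  # ['revenues', 'profits']
```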

14

For which type of columns does the describe method give statistics by default?

Numeric columns.

15

What is the syntax to make the describe method return statistics for non-numeric columns?

df.describe(include=['O'])

Here 'O' is the capital letter O, the dtype code for object columns, not the digit zero.
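
A hedged sketch of describe() on an object column, using the capital-letter 'O' dtype code. The data is invented, and the column is created with dtype="object" explicitly so the example does not depend on how a given pandas version stores strings by default.

```python
import pandas as pd

# Hypothetical data; the text column is forced to object dtype.
df = pd.DataFrame(
    {
        "country": pd.Series(["USA", "China", "USA"], dtype="object"),
        "revenues": [1, 2, 3],
    }
)

text_summary = df.describe(include=["O"])
print(text_summary)

# For object columns the statistics are count, unique, top, and freq.
print(list(text_summary.index))  # ['count', 'unique', 'top', 'freq']
```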

16

How do you assign a value to all the rows in a column of a dataframe?

df[column name] = value

top5_rank_revenue["revenues"] = 0

17

How do you assign a value to a specific row in a column of a dataframe?

df.loc[row label, column label] = value
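
A minimal sketch of both kinds of assignment: a whole column, then a single cell. The DataFrame, its row labels, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for top5_rank_revenue.
top5_rank_revenue = pd.DataFrame(
    {"rank": [1, 2, 3], "revenues": [500, 400, 300]},
    index=["Company A", "Company B", "Company C"],
)

# Assign one value to every row of a column.
top5_rank_revenue["revenues"] = 0

# Assign a value to one row of that column, by row and column label.
top5_rank_revenue.loc["Company B", "revenues"] = 999

print(top5_rank_revenue["revenues"].tolist())  # [0, 999, 0]
```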

18

What is boolean indexing? Give an example.

Boolean indexing is used to filter data by selecting subsets of the data from a given Pandas DataFrame.

In the example below, motor_bool is a Series of True and False values: True where a record's industry is "Motor Vehicles and Parts", and False otherwise. The motor_countries variable then holds only the countries of the records whose industry is "Motor Vehicles and Parts".

motor_bool = f500.loc[:,"industry"] == "Motor Vehicles and Parts"

motor_countries = f500.loc[motor_bool,"country"]
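
The same two lines can be run end to end on a small made-up DataFrame. The rows below are hypothetical and only stand in for f500.

```python
import pandas as pd

# Hypothetical stand-in for f500.
f500 = pd.DataFrame(
    {
        "country": ["Japan", "USA", "Germany", "China"],
        "industry": [
            "Motor Vehicles and Parts",
            "General Merchandisers",
            "Motor Vehicles and Parts",
            "Petroleum Refining",
        ],
    }
)

# Boolean Series: True where the industry matches.
motor_bool = f500.loc[:, "industry"] == "Motor Vehicles and Parts"
print(motor_bool.tolist())       # [True, False, True, False]

# Keep only the rows where motor_bool is True, and only the country column.
motor_countries = f500.loc[motor_bool, "country"]
print(motor_countries.tolist())  # ['Japan', 'Germany']
```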

19

How do you add a new column to a dataframe?

df[new column name] = value

top5_rank_revenue["year_founded"] = 0
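
As a sketch, a new column can be filled with a single value or with the result of a vectorized calculation. The "margin" column and all the numbers are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical DataFrame for illustration only.
df = pd.DataFrame({"revenues": [100, 200], "profits": [10, 50]})

# New column with the same value in every row.
df["year_founded"] = 0

# New column computed from existing columns (hypothetical "margin").
df["margin"] = df["profits"] / df["revenues"]

print(df.columns.tolist())  # ['revenues', 'profits', 'year_founded', 'margin']
```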

20

How do you find the two countries that appear most often?

top_2_countries = f500["country"].value_counts().head(2)