6 - Data Loading, Storage, and File Formats

25 Terms

1

read_csv

It loads delimited data from a file, URL, or file-like object, using a comma as the default delimiter.
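Example: a minimal sketch of the default comma-delimited behavior. It uses an in-memory `io.StringIO` buffer in place of a real file, and the sample data and column names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
csv_text = "a,b,c,message\n1,2,3,hello\n4,5,6,world\n"

# read_csv accepts a path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.columns.tolist())
print(df.shape)
```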

2

read_table

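It loads delimited data from a file, URL, or file-like object, using a tab ('\t') as the default delimiter; pass sep to override it (e.g., pd.read_table('file.txt', sep=' ')).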

3

header parameter in read_csv

It specifies the row number to use as column names. The default, 'infer', behaves like header=0 when names is not passed. Use header=None if the file has no header row.

4

skiprows parameter

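It specifies which rows to skip while reading: a list of 0-indexed line numbers, or an integer number of lines to skip at the start of the file (e.g., pd.read_csv('file.csv', skiprows=[0, 2, 3])).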
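Example: a minimal sketch of skipping specific lines by position. The sample text (with comment lines at positions 0 and 2) is hypothetical.

```python
import io
import pandas as pd

# Hypothetical file content: lines 0 and 2 are comments to be skipped.
csv_text = "# exported by some tool\na,b,c\n# units unknown\n1,2,3\n4,5,6\n"

# Line numbers are 0-indexed and counted before the header is read.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 2])

print(df)
```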

5

na_values parameter in read_csv

It specifies a list of strings to treat as missing values (e.g., na_values=['NULL', 'NA']).
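Example: a minimal sketch of marking custom sentinel strings as missing. The sentinels 'missing' and '?' and the sample data are hypothetical, chosen because pandas does not treat them as missing by default.

```python
import io
import pandas as pd

# Hypothetical data where "missing" and "?" stand for absent scores.
csv_text = "id,score\n1,10\n2,missing\n3,?\n"

df = pd.read_csv(io.StringIO(csv_text), na_values=["missing", "?"])

# Count how many scores were read as NaN.
n_missing = int(df["score"].isna().sum())
print(n_missing)
```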

6

nrows parameter

It limits how many data rows are read from the start of the file (e.g., pd.read_csv('file.csv', nrows=5) reads only the first 5 rows).

7

chunksize parameter in read_csv

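It makes read_csv return an iterator that yields DataFrames of at most the specified number of rows instead of one full DataFrame, which is useful for datasets too large to load at once (e.g., chunker = pd.read_csv('file.csv', chunksize=1000)).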
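Example: a minimal sketch of iterating over chunks and aggregating as you go. The 10-row sample data and the chunk size of 4 are hypothetical.

```python
import io
import pandas as pd

# Hypothetical data: a single "value" column holding 0 through 9.
csv_text = "value\n" + "\n".join(str(i) for i in range(10)) + "\n"

total = 0
chunk_lengths = []

# Each iteration yields a DataFrame with at most 4 rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_lengths.append(len(chunk))
    total += int(chunk["value"].sum())

print(chunk_lengths, total)
```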

8

to_csv method

It writes a DataFrame or Series to a comma-separated file (e.g., data.to_csv('output.csv')).

9

index=False in to_csv

It stops to_csv from writing the row index labels as the first column of the output (e.g., data.to_csv('output.csv', index=False)).
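Example: a minimal sketch comparing output with and without the index. Calling to_csv with no path returns the CSV text as a string, which is used here to avoid writing a file; the DataFrame contents are hypothetical.

```python
import pandas as pd

# Hypothetical data to export.
data = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = data.to_csv()
without_index = data.to_csv(index=False)

# splitlines() sidesteps platform-specific line endings.
print(with_index.splitlines())
print(without_index.splitlines())
```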

10

read_json

It reads data from a JSON string or file into a DataFrame (e.g., pd.read_json('data.json')).

11

to_json method

It serializes a DataFrame or Series to JSON; with no path argument it returns the JSON as a string (e.g., data.to_json()).
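Example: a minimal sketch of a JSON round trip through to_json and read_json. The DataFrame contents are hypothetical, and the string is wrapped in `io.StringIO` so that read_json receives a file-like object.

```python
import io
import pandas as pd

# Hypothetical data to serialize.
data = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# With no path argument, to_json returns a JSON string.
json_text = data.to_json()

# Read the JSON back into a DataFrame.
restored = pd.read_json(io.StringIO(json_text))

print(json_text)
print(restored)
```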

12

read_html

It parses all tables in an HTML file or URL into a list of DataFrames (e.g., pd.read_html('page.html')).

13

read_excel

It reads a sheet of an Excel file into a DataFrame (e.g., pd.read_excel('file.xlsx', sheet_name='Sheet1')).

14

to_excel method

It writes a DataFrame to a sheet of an Excel file (e.g., data.to_excel('output.xlsx', sheet_name='Sheet1')).

15

read_sql

It reads the results of a SQL query into a DataFrame (e.g., pd.read_sql('SELECT * FROM table', con)).

16

read_sql with a connection object

Pass the query as the first argument and an open database connection as the second (e.g., pd.read_sql('SELECT * FROM test', sqlite3.connect('mydata.sqlite'))).
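Example: a minimal sketch using an in-memory SQLite database instead of the 'mydata.sqlite' file. The table contents and column names are hypothetical.

```python
import sqlite3
import pandas as pd

# In-memory database so the example leaves nothing on disk.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (city TEXT, rank INTEGER)")
con.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [("Atlanta", 1), ("Tallahassee", 2)],
)
con.commit()

# Column names in the result come from the query.
df = pd.read_sql("SELECT * FROM test", con)
con.close()

print(df)
```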

17

HDFStore

It provides a dict-like interface for storing DataFrames in HDF5 format (e.g., store = pd.HDFStore('mydata.h5')).

18

read_hdf function

It reads the object stored under a given key in an HDF5 file (e.g., pd.read_hdf('mydata.h5', 'key')).

19

to_pickle

It serializes a DataFrame to disk in pickle format (e.g., data.to_pickle('data.pkl')).

20

read_pickle

It loads a pickled pandas object from disk (e.g., pd.read_pickle('data.pkl')). Only unpickle files from sources you trust.
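Example: a minimal sketch of a pickle round trip. It writes to a temporary directory rather than the working directory, and the DataFrame contents are hypothetical.

```python
import os
import tempfile
import pandas as pd

# Hypothetical data to serialize.
data = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# The temporary directory is removed when the block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.pkl")
    data.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.equals(data))
```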

21

sep parameter in read_csv

It specifies the delimiter to use (e.g., sep=',' for CSV, sep='\s+' for whitespace).
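Example: a minimal sketch of reading fields separated by a variable amount of whitespace with a regular-expression separator. The sample text is hypothetical.

```python
import io
import pandas as pd

# Hypothetical text with an uneven number of spaces between fields.
text = "a    b  c\n1  2     3\n4 5 6\n"

# A raw string keeps the backslash in the regular expression intact.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")

print(df)
```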

22

names parameter in read_csv

It supplies a list of column names to use, typically together with header=None when the file has no header row (e.g., pd.read_csv('file.csv', header=None, names=['col1', 'col2'])).
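Example: a minimal sketch of naming the columns of a file that has no header row. The sample data is hypothetical.

```python
import io
import pandas as pd

# Hypothetical headerless data: the first line is already a data row.
csv_text = "1,2\n3,4\n"

df = pd.read_csv(io.StringIO(csv_text), header=None, names=["col1", "col2"])

print(df)
```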

23

index_col parameter in read_csv

It specifies the column(s) to use as the DataFrame's index (e.g., index_col='message').

24

handle missing values in read_csv

Use the na_values parameter to specify strings to treat as missing (e.g., na_values=['NA', 'NULL']).

25

parse_dates parameter in read_csv

It attempts to parse specified columns as datetime objects (e.g., parse_dates=['date']).
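Example: a minimal sketch of parsing a column as datetimes while reading. The sample data uses ISO-formatted dates, and the column names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical data with ISO 8601 date strings.
csv_text = "date,value\n2024-01-15,1\n2024-02-20,2\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# The column now supports datetime accessors such as .dt.month.
months = df["date"].dt.month.tolist()
print(df.dtypes)
print(months)
```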