6 - Data Loading, Storage, and File Formats

25 Terms

read_csv

It loads delimited data from a file, URL, or file-like object, using a comma as the default delimiter.

read_table

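It loads delimited data from a file, URL, or file-like object, using a tab ('\t') as the default delimiter; pass sep to change it (e.g., pd.read_table('file.txt', sep=' ')).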
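
The default delimiters of read_csv and read_table can be compared with a minimal sketch like the one below. The sample text is hypothetical and held in memory with io.StringIO rather than read from a real file.

```python
import io
import pandas as pd

# Hypothetical sample data, kept in memory instead of on disk.
csv_text = "a,b,c\n1,2,3\n4,5,6\n"
tsv_text = "a\tb\tc\n1\t2\t3\n4\t5\t6\n"

df_csv = pd.read_csv(io.StringIO(csv_text))    # comma is the default delimiter
df_tab = pd.read_table(io.StringIO(tsv_text))  # tab is the default delimiter

# read_table can also read the comma file once sep is given explicitly.
df_same = pd.read_table(io.StringIO(csv_text), sep=",")
```

All three calls should produce the same two-row, three-column DataFrame.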

header parameter in read_csv

It specifies the row number to use as column names. When names is not passed, the first row (row 0) is used by default. Use header=None if there is no header row.

skiprows parameter

It skips rows while reading: pass a list of row indices to skip, or an integer to skip that many rows from the top (e.g., pd.read_csv('file.csv', skiprows=[0, 2, 3])).
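
A small sketch of skiprows with a list of row indices, assuming a hypothetical file whose rows 0, 2, and 3 are comment lines:

```python
import io
import pandas as pd

# Hypothetical file content: rows 0, 2, and 3 are comments to be skipped.
text = (
    "# comment line\n"
    "a,b\n"
    "# another comment\n"
    "# yet another\n"
    "1,2\n"
    "3,4\n"
)

# Row indices are counted from 0, starting at the first line of the file.
df = pd.read_csv(io.StringIO(text), skiprows=[0, 2, 3])
```

After the skipped rows are dropped, the line "a,b" becomes the header row.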

na_values parameter in read_csv

It specifies a list of strings to treat as missing values (e.g., na_values=['NULL', 'NA']).
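
As a rough illustration, the sketch below marks two made-up sentinel strings ("missing" and "unknown", both assumptions for this example) as missing values:

```python
import io
import pandas as pd

# Hypothetical data using custom placeholders for missing scores.
text = "id,score\n1,10\n2,missing\n3,unknown\n"

# Each string in na_values is converted to NaN while parsing.
df = pd.read_csv(io.StringIO(text), na_values=["missing", "unknown"])
missing_mask = df["score"].isna()
```

Because the placeholders become NaN, the score column is read as a numeric column instead of as strings.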

nrows parameter

It limits how many data rows are read from the start of the file (e.g., pd.read_csv('file.csv', nrows=5)).

chunksize parameter in read_csv

It returns an iterator that yields DataFrames of at most the specified number of rows, which is useful for datasets too large to load at once (e.g., chunker = pd.read_csv('file.csv', chunksize=1000)).
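
The sketch below contrasts nrows with chunksize on a hypothetical ten-row column of integers. The chunk size of 4 is chosen only to keep the example small.

```python
import io
import pandas as pd

# Hypothetical data: a single column x holding the integers 0..9.
text = "x\n" + "\n".join(str(i) for i in range(10)) + "\n"

# nrows reads only the first 5 data rows.
head = pd.read_csv(io.StringIO(text), nrows=5)

# chunksize returns an iterator; each chunk is a DataFrame.
chunker = pd.read_csv(io.StringIO(text), chunksize=4)
sizes = []
total = 0
for chunk in chunker:
    sizes.append(len(chunk))
    total += chunk["x"].sum()  # aggregate without loading everything at once
```

The last chunk is shorter than the others when the row count is not a multiple of the chunk size.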

to_csv method

It writes a DataFrame or Series to a comma-separated file (e.g., data.to_csv('output.csv')).

index=False in to_csv

It leaves the row index labels out of the written file (e.g., data.to_csv('output.csv', index=False)).
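
To see the effect of index=False, the sketch below calls to_csv without a path so that it returns the CSV text as a string. The DataFrame is a made-up example.

```python
import pandas as pd

# Hypothetical DataFrame with the default integer index.
data = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = data.to_csv().splitlines()
without_index = data.to_csv(index=False).splitlines()
```

With the index included, the header line starts with an empty field for the unnamed index column.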

read_json

It reads data from a JSON string or file into a DataFrame (e.g., pd.read_json('data.json')).

to_json method

It serializes a DataFrame or Series to JSON; called without a path, it returns the JSON as a string (e.g., data.to_json()).
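
A possible round trip through to_json and read_json, using a hypothetical two-column DataFrame and an in-memory buffer in place of a file:

```python
import io
import json
import pandas as pd

# Hypothetical DataFrame to serialize.
data = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Without a path, to_json returns a JSON string (column-oriented by default).
json_text = data.to_json()
parsed = json.loads(json_text)

# read_json accepts a file-like object, so wrap the string in StringIO.
restored = pd.read_json(io.StringIO(json_text))
```

In the default layout each column maps row labels to values, so the row labels appear as string keys in the JSON.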

read_html

It parses all tables in an HTML file or URL into a list of DataFrames (e.g., pd.read_html('page.html')).

read_excel

It reads a sheet of an Excel file into a DataFrame (e.g., pd.read_excel('file.xlsx', sheet_name='Sheet1')).

to_excel method

It writes a DataFrame to a sheet of an Excel file (e.g., data.to_excel('output.xlsx', sheet_name='Sheet1')).

read_sql

It reads the results of a SQL query into a DataFrame (e.g., pd.read_sql('SELECT * FROM table', con)).

read_sql with a connection object

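Pass a database connection, such as a sqlite3 connection or a SQLAlchemy engine, as the second argument so the query runs against that database (e.g., pd.read_sql('SELECT * FROM test', sqlite3.connect('mydata.sqlite'))).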
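
A self-contained sketch of read_sql with a sqlite3 connection. It uses an in-memory database and a made-up table named test in place of the mydata.sqlite file.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for a file such as mydata.sqlite.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test (a TEXT, b INTEGER)")
con.executemany("INSERT INTO test VALUES (?, ?)", [("x", 1), ("y", 2)])
con.commit()

# The query result becomes a DataFrame; column names come from the table.
df = pd.read_sql("SELECT * FROM test", con)
con.close()
```

Closing the connection after the read is safe because the DataFrame holds its own copy of the data.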

HDFStore

It provides a dict-like interface for storing DataFrames in HDF5 format (e.g., store = pd.HDFStore('mydata.h5')).

read_hdf function

It reads the object stored under a given key in an HDF5 file (e.g., pd.read_hdf('mydata.h5', 'key')).

to_pickle

It serializes a DataFrame to disk in pickle format (e.g., data.to_pickle('data.pkl')).

read_pickle

It loads a pickled pandas object from disk (e.g., pd.read_pickle('data.pkl')).
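
A pickle round trip might look like the sketch below, which writes to a temporary directory so no files are left behind. The DataFrame contents are hypothetical.

```python
import os
import tempfile
import pandas as pd

# Hypothetical DataFrame to store.
data = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")
    data.to_pickle(path)             # serialize to disk
    restored = pd.read_pickle(path)  # load it back while the file still exists
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.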

sep parameter in read_csv

It specifies the delimiter to use (e.g., sep=',' for CSV, sep='\s+' for whitespace).
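
The whitespace case can be sketched as follows, assuming hypothetical text whose fields are separated by a varying number of spaces:

```python
import io
import pandas as pd

# Hypothetical text with irregular runs of spaces between fields.
text = "a  b   c\n1  2   3\n4    5 6\n"

# A raw string keeps the backslash intact; \s+ matches one or more whitespace characters.
df = pd.read_csv(io.StringIO(text), sep=r"\s+")
```

A plain sep=' ' would treat every single space as a field boundary, which is why the regular expression is used here.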

names parameter in read_csv

It supplies the column names explicitly; combine it with header=None when the file has no header row (e.g., pd.read_csv('file.csv', header=None, names=['col1', 'col2'])).
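
A brief sketch of reading a headerless file, with and without names. The two-row sample text is made up for the example.

```python
import io
import pandas as pd

# Hypothetical file content with no header row.
text = "1,2\n3,4\n"

# header=None alone gives integer column labels 0, 1, ...
default = pd.read_csv(io.StringIO(text), header=None)

# names assigns explicit labels instead.
df = pd.read_csv(io.StringIO(text), header=None, names=["col1", "col2"])
```

In both cases the first line is kept as data rather than consumed as a header.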

index_col parameter in read_csv

It specifies the column(s) to use as the DataFrame's index (e.g., index_col='message').
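
For illustration, the sketch below uses a hypothetical message column as the index and then looks a row up by its label:

```python
import io
import pandas as pd

# Hypothetical data with a message column that identifies each row.
text = "message,a,b\nhello,1,2\nworld,3,4\n"

# The message column becomes the index and is removed from the regular columns.
df = pd.read_csv(io.StringIO(text), index_col="message")
value = df.loc["world", "a"]  # label-based lookup through the new index
```

Passing a list of column names to index_col would produce a hierarchical index instead.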

handle missing values in read_csv

Use the na_values parameter to specify strings to treat as missing (e.g., na_values=['NA', 'NULL']).

parse_dates parameter in read_csv

It attempts to parse specified columns as datetime objects (e.g., parse_dates=['date']).
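
A short sketch of parse_dates, assuming a hypothetical date column in ISO format:

```python
import io
import pandas as pd

# Hypothetical data with ISO-formatted dates.
text = "date,value\n2024-01-15,1\n2024-02-20,2\n"

# Without parse_dates the column would be read as plain strings.
df = pd.read_csv(io.StringIO(text), parse_dates=["date"])
is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])
months = df["date"].dt.month.tolist()  # the .dt accessor needs a datetime column
```

Once the column holds datetimes, components such as the month are available through the .dt accessor.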