Data Analysis

  1. Spreadsheets (e.g., Excel, Google Sheets):

    Data is arranged in rows and columns, where each row represents an observation or record, and each column represents a variable or attribute.

    Example: A survey dataset with columns for respondent ID, age, gender, responses, etc.

  2. Databases (e.g., SQL databases, Microsoft Access):

    Each table has a unique identifier for every record (for example, a respondent ID or activity ID), called the primary key.

    This ensures that every individual or record can be uniquely recognized and distinguished from the others.
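    A minimal sketch of what a primary key does, using Python's built-in sqlite3 module as a stand-in for "a database" (the table and column names are hypothetical). The database refuses a second record with the same key, so each key points to exactly one record:

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE respondents ("
    " respondent_id INTEGER PRIMARY KEY,"  # primary key: must be unique per record
    " name TEXT,"
    " age INTEGER)"
)
conn.execute("INSERT INTO respondents VALUES (1, 'Alice', 25)")
conn.execute("INSERT INTO respondents VALUES (2, 'Bob', 30)")

# A second record with respondent_id = 1 is rejected by the database.
try:
    conn.execute("INSERT INTO respondents VALUES (1, 'Charlie', 22)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Because the key is unique, a lookup by key returns exactly one record.
row = conn.execute(
    "SELECT name, age FROM respondents WHERE respondent_id = ?", (2,)
).fetchone()
print(duplicate_rejected, row)  # True ('Bob', 30)
```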

  3. CSV Files (Comma-Separated Values):

    -A plain-text format for storing and sharing structured raw data.

    -Each line of the file is one row (a record), and the values within a row are separated by commas. Each column represents a variable.

    -When the file is opened in a spreadsheet, the data is displayed as a table.

    Example:

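    Original (raw CSV text):

    Name,Age,Gender,Response
    Alice,25,Female,Yes
    Bob,30,Male,No
    Charlie,22,Male,Yes

    Spreadsheet (the same data displayed as a table):

    Name      Age   Gender   Response
    Alice     25    Female   Yes
    Bob       30    Male     No
    Charlie   22    Male     Yes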
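    A short sketch of how a program reads the example CSV above, using Python's standard csv module (in R, read.csv() does the same job). The first line becomes the column names, and every following line becomes one record:

```python
import csv
import io

# The example CSV from above as raw text: one record per line, values separated by commas.
raw = """Name,Age,Gender,Response
Alice,25,Female,Yes
Bob,30,Male,No
Charlie,22,Male,Yes
"""

# DictReader treats the first line as the column (variable) names.
records = list(csv.DictReader(io.StringIO(raw)))

# CSV stores everything as text, so numeric columns must be converted explicitly.
for rec in records:
    rec["Age"] = int(rec["Age"])

print(len(records))      # 3
print(dict(records[0]))  # {'Name': 'Alice', 'Age': 25, 'Gender': 'Female', 'Response': 'Yes'}
```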

  4. JSON (JavaScript Object Notation):

    -An alternative text-based format for representing data that makes it easy to move data between programs.

    *Different from a data frame, which is a data structure used mainly for analysis in programming environments such as R (e.g., code written in RStudio). JSON is for storing and exchanging data; a data frame holds the data while you analyze it.

    -You can convert between data frames and JSON (e.g., in R) when you need to share or transport data, but the two serve different purposes in data analysis and data exchange.

    -You can keep using the same RStudio code after switching to JSON. The basic structure of the code stays the same; only the way you load the data depends on the format:

    1. CSV or text file: use read.csv() or read.table() to load the data into a data frame, then perform the analysis.

    2. JSON: use the jsonlite or rjson package to read and parse the JSON data (e.g., with fromJSON()). After parsing, convert the result to a data frame if it is not one already.

    *The differences are limited to these loading and conversion steps.
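    The "load differently, analyze the same way" idea can be sketched in Python, where the standard json module stands in for R's jsonlite (an analogy, not the R workflow itself). The JSON text is the CSV example rewritten as an array of objects, and the dictionary of columns is a simplified stand-in for a data frame, not a real one:

```python
import json

# The same survey records written as a JSON array of objects.
json_text = """
[
  {"Name": "Alice",   "Age": 25, "Gender": "Female", "Response": "Yes"},
  {"Name": "Bob",     "Age": 30, "Gender": "Male",   "Response": "No"},
  {"Name": "Charlie", "Age": 22, "Gender": "Male",   "Response": "Yes"}
]
"""

# Loading step (format-specific): parse the JSON text into Python objects.
records = json.loads(json_text)

# Conversion step: rearrange into one list per variable, the column layout a data frame uses.
columns = {key: [rec[key] for rec in records] for key in records[0]}

# Analysis step (format-independent): this line would be identical if the data came from CSV.
mean_age = sum(columns["Age"]) / len(columns["Age"])

# The reverse direction: serialize the records back to JSON text to share them.
shared = json.dumps(records)
print(columns["Name"])  # ['Alice', 'Bob', 'Charlie']
```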

  5. Spreadsheets or Tables with Standardized Codes:

    -Use spreadsheets (Excel or Google Sheets) to clean up the data in the tables.

    -Spreadsheets are often more convenient for this, especially with data from external sources, because their built-in tools (e.g., Remove Duplicates, data validation, Find and Replace) fix a problem across a whole table at once, instead of writing code to fix each problem one by one as in RStudio:

    Fixing formatting issues

    Removing duplicates

    Validating data types

    Performing simple transformations

    -Spreadsheets also make it easier to share the data with others, apply updates, and use the data across different platforms.

    *If you are working with data from multiple sources or integrating data across platforms (for example, combining data from several surveys or databases), defining standardized codes ahead of time helps ensure consistency across all data sources.
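    For comparison, a sketch of the same cleaning tasks written as code (the "one by one" approach that spreadsheet tools spare you), in Python rather than R. The raw rows and the GENDER_CODES code book are hypothetical; the code book plays the role of the standardized codes agreed on before merging sources:

```python
# Hypothetical raw rows merged from two sources, with inconsistent formatting.
raw_rows = [
    {"id": "1", "age": " 25 ", "gender": "Female"},
    {"id": "2", "age": "30",   "gender": "m"},
    {"id": "2", "age": "30",   "gender": "m"},     # duplicate record
    {"id": "3", "age": "abc",  "gender": "MALE"},  # invalid age
]

# Hypothetical standardized code book, defined ahead of time for all sources.
GENDER_CODES = {"female": "F", "f": "F", "male": "M", "m": "M"}

def clean(rows):
    seen_ids = set()
    cleaned, rejected = [], []
    for row in rows:
        if row["id"] in seen_ids:       # removing duplicates (keeps the first record per ID)
            continue
        seen_ids.add(row["id"])
        age_text = row["age"].strip()   # fixing formatting issues (stray spaces)
        if not age_text.isdigit():      # validating data types (age must be a whole number)
            rejected.append(row)
            continue
        cleaned.append({
            "id": int(row["id"]),
            "age": int(age_text),       # simple transformation: text -> integer
            "gender": GENDER_CODES.get(row["gender"].strip().lower(), "UNKNOWN"),
        })
    return cleaned, rejected

cleaned, rejected = clean(raw_rows)
print(cleaned)        # [{'id': 1, 'age': 25, 'gender': 'F'}, {'id': 2, 'age': 30, 'gender': 'M'}]
print(len(rejected))  # 1
```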

  6. Data Frames (e.g., in programming languages like R or Python):

    -Analyze the data as usual in RStudio (using the same code taught at UGM).

    A table-like structure in programming environments that organizes data into rows and columns for statistical analysis.

    Example: A data frame in R or Python may store variables such as age, income, and education level for a group of respondents.
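    A dependency-free sketch of the data frame idea: a plain dictionary with one equal-length list per variable, standing in for a real data frame (pandas.DataFrame in Python or data.frame in R would normally be used). The values are made-up illustration data, not results:

```python
from statistics import mean

# Stand-in for a data frame: one list per column (variable), all the same length.
# Illustrative values only.
df = {
    "age":       [25, 30, 22, 41],
    "income":    [3000, 4500, 2500, 6000],
    "education": ["Bachelor", "Master", "Bachelor", "Master"],
}

n_rows = len(df["age"])  # one entry per respondent in every column

# A row (one respondent) is the i-th entry of every column.
row_1 = {col: values[1] for col, values in df.items()}

# A typical statistical summary: mean income for each education level.
mean_income = {
    level: mean(inc for inc, edu in zip(df["income"], df["education"]) if edu == level)
    for level in sorted(set(df["education"]))
}
print(row_1)  # {'age': 30, 'income': 4500, 'education': 'Master'}
```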