What is a JSON
an open standard format for creating and storing files or exchanging data that uses human-readable text made up of attribute-value pairs and serializable values
3 Pros of JSONs / Why are JSONs used
It is easy for humans to read and write.
It is easy for machines to parse and generate.
It is commonly used by APIs (Application Programming Interfaces) to share data.
Write the format of a JSON containing 4 variables (name, age, city, hobbies) for a 30-year-old man named John from New York who likes to read, hike, and code.
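A sketch of the expected answer:
{
  "name": "John",
  "age": 30,
  "city": "New York",
  "hobbies": ["read", "hike", "code"]
}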
what is a Key-Value Pair in a JSON?
Each pair consists of a key (string) and a value (can be string, number, boolean, array, or another object).
Example: in "name": "John", the key is "name" and the value is "John"
Curly Braces {} in a JSON
Enclose an object, which is a collection of key-value pairs (one row)
Square Brackets [] in a JSON
These enclose an array, which is an ordered list of values (all rows)
How do JSON files relate to data frames
Originally, a JSON is just a string of characters.
Use fromJSON(simplifyVector = TRUE):
If it is an array that satisfies the constraints of a data frame, it can be simplified to a data frame: each element of the array is one row of the dataset, and each key is a column.
Otherwise it is treated as a named list, with elements that may not all be of the same length (not a data frame).
fromJSON(json, simplifyVector = FALSE) versus fromJSON(json, simplifyVector = TRUE)
simplifyVector = FALSE:
always possible
provides a list
[[1]]
[[1]]$Name
[1] "Mario"
[[1]]$Age
[1] 32
[[1]]$Occupation
[1] "Plumber"
[[2]]
[[2]]$Name
[1] "Peach"
[[2]]$Age
[1] 21
[[2]]$Occupation
[1] "Princess"
simplifyVector = TRUE:
only possible if it is an array that satisfies the constraints of a data frame
simplifies to a data frame:
Name Age Occupation
1 Mario 32 Plumber
2 Peach 21 Princess
3 <NA> NA <NA>
4 Bowser NA Koopa
How many main parent elements are in this JSON?
{
"result_count": 3,
"results": [
{
"_href": "/ws.v1/lswitch/3ca2d5ef-6a0f-4392-9ec1-a6645234bc55",
"_schema": "/ws.v1/schema/LogicalSwitchConfig",
"type": "LogicalSwitchConfig"
},
{
"_href": "/ws.v1/lswitch/81f51868-2142-48a8-93ff-ef612249e025",
"_schema": "/ws.v1/schema/LogicalSwitchConfig",
"type": "LogicalSwitchConfig"
},
{
"_href": "/ws.v1/lswitch/9fed3467-dd74-421b-ab30-7bc9bfae6248",
"_schema": "/ws.v1/schema/LogicalSwitchConfig",
"type": "LogicalSwitchConfig"
}
]
}
2: result_count and results
If you converted this to R data types, what structure makes the most sense to use?
{
"result_count": 3,
"results": [
{
"_href": "/ws.v1/lswitch/3ca2d5ef-6a0f-4392-9ec1-a6645234bc55",
"_schema": "/ws.v1/schema/LogicalSwitchConfig",
"type": "LogicalSwitchConfig"
},
{
"_href": "/ws.v1/lswitch/81f51868-2142-48a8-93ff-ef612249e025",
"_schema": "/ws.v1/schema/LogicalSwitchConfig",
"type": "LogicalSwitchConfig"
},
{
"_href": "/ws.v1/lswitch/9fed3467-dd74-421b-ab30-7bc9bfae6248",
"_schema": "/ws.v1/schema/LogicalSwitchConfig",
"type": "LogicalSwitchConfig"
}
]
}
Named list of 2 elements ($result_count - numeric class, $results - data frame class)
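A quick way to verify this (a sketch, assuming json holds the JSON above as a string):
resp <- jsonlite::fromJSON(json, simplifyVector = TRUE)
class(resp)         # "list"
resp$result_count   # 3 (numeric)
class(resp$results) # "data.frame"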
Focus only on the results. What does each { XXX } represent?
{
"result_count": 3,
"results": [
{
"_href": "/ws.v1/lswitch/3ca2d5ef-6a0f-4392-9ec1-a6645234bc55",
"_schema": "/ws.v1/schema/LogicalSwitchConfig",
"type": "LogicalSwitchConfig"
},
{
"_href": "/ws.v1/lswitch/81f51868-2142-48a8-93ff-ef612249e025",
"_schema": "/ws.v1/schema/LogicalSwitchConfig",
"type": "LogicalSwitchConfig"
},
{
"_href": "/ws.v1/lswitch/9fed3467-dd74-421b-ab30-7bc9bfae6248",
"_schema": "/ws.v1/schema/LogicalSwitchConfig",
"type": "LogicalSwitchConfig"
}
]
}
One row of data
What is an API?
Application Programming Interface describes a general class of tool that allows computer software, rather than humans, to interact with an organization’s data.
Application refers to software.
Interface: a contract of service between two applications
This contract defines how the two communicate with each other using requests and responses
Software does not see the data in a graphical format like humans do; it interacts with it directly through requests and responses.
Is there a standard way to access APIs?
Every API has documentation for how software developers should structure requests for data / information and in what format to expect responses.
Following the documentation makes data access more transparent and ethical.
There is not one standard way to access an API
Web APIs
Web Application Programming Interfaces, which focus on transmitting requests and responses for raw data through a web browser.
Our browsers communicate with web servers using a technology called HTTP or Hypertext Transfer Protocol.
Programming languages such as R can also use HTTP to communicate with web servers.
https://api.census.gov
What are the base URL, the scheme, and the hostname?
https://api.census.gov
is the base URL.
https://
is the scheme (tells your browser or program how to communicate with the web server).
api.census.gov
is the hostname or host address, which is a name that identifies the web server that will process the request.
https://api.census.gov/data/2019/acs/acs1?get=NAME,B02015_009E,B02015_009M&for=state:*
What is the file path?
Tells the web server how to get to the desired resource.
/data/2019/acs/acs1
What is the query string?
https://api.census.gov/data/2019/acs/acs1?get=NAME,B02015_009E,B02015_009M&for=state:*
it provides the parameters for the function you would like to call
get=NAME,B02015_009E,B02015_009M&for=state:* (everything after the ?)
How are key-value pairs formatted in a query string?
This is a string of key-value pairs separated by &. That is, the general structure of this part is key1=value1&key2=value2.
In R, it is easiest to access Web APIs through…?
A wrapper package, an R package written specifically for a particular Web API
Many APIs require users to obtain a ____ to use their services. Why?
key
This lets organizations keep track of what data is being used.
It also lets them rate-limit their API, ensuring programs don't make too many requests per day/minute/hour.
API key rate limits
Ensures programs don’t make too many requests per day/minute/hour
Percent encoding in a url
URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits. URLs cannot contain spaces. URL encoding normally replaces a space with a plus (+) sign or with %20.
httr2::request()
creates an API request object using the base URL
httr2::req_url_path_append()
builds up the URL by adding path components separated by /
httr2::req_url_query()
(4 arguments)
adds the ? separating the endpoint from the query and sets the key-value pairs in the query:
get = c("variable1", "variable2")
`for` = I("row: rowname") or I("row: *") (don't forget the backticks; for is a reserved word in R)
The I() function inhibits escaping of special characters like : and *
key = your API key
.multi = "comma" controls how multiple values for a given key are combined
use httr2 to build the URL https://api.census.gov/data/2019/acs/acs1?get=NAME,B02015_009E,B02015_009M&for=state:*
req <- request("https://api.census.gov") %>%
  req_url_path_append("data") %>%
  req_url_path_append("2019") %>%
  req_url_path_append("acs") %>%
  req_url_path_append("acs1") %>%
  req_url_query(
    get = c("NAME", "B02015_009E", "B02015_009M"),
    `for` = I("state:*"),
    key = census_api_key,
    .multi = "comma"
  )
why would we use httr2 instead of just writing the URL string?
To generalize this code with functions!
To handle special characters
e.g., query parameters might have spaces, which need to be represented in a particular way in a URL (URLs can’t contain spaces)
Once we’ve fully constructed our request, we can use _____
to send out the API request and get a response.
req_perform()
What format is the API response in? What can you do with it?
JSON.
resp_body_json(simplifyVector = TRUE) creates a data frame.
Without simplifyVector = TRUE, the JSON is read in as a list.
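Putting the last two cards together (a sketch, reusing the req object built earlier):
resp <- req_perform(req)
census <- resp_body_json(resp, simplifyVector = TRUE)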
API Documentation - what to look for
Look for: the base URL and available endpoints (file paths), the required query parameters and their format, whether an API key is required, any rate limits, and the format of the response (e.g., JSON).
Consider the following url (uniform resource locator):
https://www.tutorialspoint.com/html/understanding_url_tutorial.htm
Which part is the "Scheme"?
https://
Consider the following url (uniform resource locator):
https://www.tutorialspoint.com/html/understanding_url_tutorial.htm
Which part is the "Host Address"?
www.tutorialspoint.com
Consider the following url (uniform resource locator):
https://www.tutorialspoint.com/html/understanding_url_tutorial.htm
Which part is the "File Path"?
/html/understanding_url_tutorial.htm
Consider the following url (uniform resource locator):
https://www.tutorialspoint.com/html/html_text_links.htm#top
Which part is the "fragment identifier"?
#top
Consider the following url (uniform resource locator):
https://www.tutorialspoint.com/cgi-bin/search.cgi?searchTerm=HTML
Which part is the "query"?
searchTerm=HTML
url_encode() in urltools is useful for…
converting a string to a percent-encoded URL
url_decode() in urltools is useful for…
taking a percent-encoded URL and converting it back to a plain string
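A small sketch (the input string is hypothetical):
library(urltools)
url_encode("Mexican States")   # "Mexican%20States"
url_decode("Mexican%20States") # "Mexican States"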
When to use web scraping
Whenever an API is available for your project, you should default to getting data from the API. Sometimes an API will not be available, and web scraping is another means of getting data.
what is web scraping
Web scraping describes the use of code to extract information displayed on a web page.
Ethics of web scraping
robots.txt is a file that some websites publish to clarify what can and cannot be scraped and other constraints about scraping. When a website publishes this file, we need to comply with it for both ethical and legal reasons.
robots.txt: user-agent
It determines who is allowed to scrape.
If User-agent has a wildcard (*), everyone is allowed to crawl. If it contains a specific name, such as AdsBot-Google, only that agent (here, Google's) is allowed.
robots: Allow / Disallow
When Disallow has no value, all pages are allowed for scraping. If you see /, every single page is disallowed. A path or file name, such as /folder/ or /file.html, points out what shouldn't be crawled.
An alternative instruction to Disallow is Allow, which states the only resources you should visit.
Crawl Delay
Crawl-delay sets the minimum delay, in seconds, between requests for new resources. This helps websites prevent server overload, which would slow down the site for human visitors.
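A hypothetical robots.txt combining these directives:
User-agent: *
Crawl-delay: 10
Allow: /public/
Disallow: /private/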
HTML structure
HTML (hypertext markup language) is the formatting language used to create webpages. HTML has a hierarchical structure formed by elements, which consist of a start tag (e.g. <tag>), optional attributes (id='first'), an end tag (like </tag>), and contents (everything in between the start and end tag).
HTML: Elements
consist of a start tag (e.g. <tag>) and end with an end tag (e.g. </tag>)
there are over 100 HTML elements. Some of the most important are:
Every HTML page must be in an <html> element, and it must have two children: <head>, which contains document metadata like the page title, and <body>, which contains the content you see in the browser.
HTML: tags
Common tags include <p>, <img>, <a>, <h1>, <h2>, and <div>.
Block tags like <h1> (heading 1), <p> (paragraph), and <ol> (ordered list) form the overall structure of the page.
Inline tags like <b> (bold), <i> (italics), and <a> (links) format text inside block tags.
HTML: Attributes
Tags can have named attributes which look like name1='value1' name2='value2'. Two of the most important attributes are id and class, which are used in conjunction with CSS (Cascading Style Sheets) to control the visual appearance of the page. These are often useful when scraping data off a page.
HTML: Contents of an element
Most elements can have content in between their start and end tags. This content can either be text or more elements. For example, the following HTML contains a paragraph of text, with one word in bold:
<p>Hi! My <b>name</b> is Hadley.</p>
The children of a node refers only to elements, so the <p> element above has one child, the <b> element. The <b> element has no children, but it does have contents (the text "name").
Some elements, like <img>, can't have children. These elements depend solely on attributes for their behavior.
read_html()
Reads and parses the HTML content of a webpage. First step to load the page into R.
html_elements()
enables you to select and extract specific HTML elements from a webpage based on their tags, classes, or attributes.
Selects nodes (elements) from the HTML using a CSS selector.
To extract specific sections, tags, or components of the page.
html_text()
Extracts text content from selected HTML elements, stripping out any HTML tags.
html_attr()
Extracts specific attributes from HTML elements, such as href or class. This is useful for obtaining metadata or links from the elements.
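A minimal rvest sketch combining these functions (the URL and selectors are placeholders):
library(rvest)
page <- read_html("https://example.com")
page |> html_elements("p") |> html_text()       # text of every paragraph
page |> html_elements("a") |> html_attr("href") # target of every link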
CSS manually: tags
target all elements of a specific type. ex: p
CSS manually: #id
A single element with a specific ID. syntax ex: #header
CSS manually: .class
All elements with a specific class. syntax ex: .highlight
CSS Selector Gadget
Selector Gadget is a Chrome extension that helps you find CSS selectors visually. Copy the suggested CSS selector and use it in your code or for scraping.
Cloud DBMS
like Snowflake, Amazon's Redshift, and Google's BigQuery, are similar to client-server DBMS's, but they run in the cloud. This means they can easily handle extremely large datasets and can automatically provide more compute resources as needed.
Client-Server DBMS
run on a powerful central server, which you connect to from your computer (the client). They are great for sharing data with multiple people in an organization. Popular client-server DBMS’s include PostgreSQL, MariaDB, SQL Server, and Oracle.
In-Process DBMS
like SQLite or duckdb, run entirely on your computer. They’re great for working with large datasets where you’re the primary user.
DBs row & column orientation
Most classical databases are optimized for rapidly collecting data, not analyzing existing data. These databases are called row-oriented because the data is stored row-by-row, rather than column-by-column like R. More recently, there’s been much development of column-oriented databases that make analyzing the existing data much faster.
2 differences between df and db:
Database tables are stored on disk and can be arbitrarily large. Data frames are stored in memory, and are fundamentally limited (although that limit is still plenty large for many problems).
Database tables almost always have indexes. Much like the index of a book, a database index makes it possible to quickly find rows of interest without having to look at every single row. Data frames and tibbles don’t have indexes, but data.tables do, which is one of the reasons that they’re so fast.
What is a DBI?
DataBase Interface that provides a set of generic functions that connect to the database, upload data, run SQL queries, etc.
How do you create a DB connection
you create a database connection using DBI::dbConnect()
Examples:
con <- DBI::dbConnect(RMariaDB::MariaDB(), username = "foo")
con <- DBI::dbConnect(RPostgres::Postgres(), hostname = "databases.mycompany.com", port = 1234)
and if using duckdb:
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = "duckdb")
What is the next step after establishing a connection?
Use dbWriteTable() to load data into the database. The simplest usage of dbWriteTable() needs three arguments: a database connection, the name of the table to create in the database, and a data frame of data.
dbWriteTable(con, "mpg", ggplot2::mpg)
dbWriteTable(con, "diamonds", ggplot2::diamonds)
Other initial dbi functions
You can check that the data is loaded correctly by using a couple of other DBI functions: dbListTables() lists all tables in the database, and dbReadTable() retrieves the contents of a table. dbReadTable() returns a data.frame, so we use as_tibble() to convert it into a tibble so that it prints nicely.
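For example, after the dbWriteTable() calls above:
dbListTables(con)
#> [1] "diamonds" "mpg"
con |> dbReadTable("diamonds") |> as_tibble()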
dplyr within databases
dbplyr is a dplyr backend, which means that you keep writing dplyr code but the backend executes it differently. In this, dbplyr translates to SQL; other backends include dtplyr which translates to data.table, and multidplyr which executes your code on multiple cores.
database interactions/selecting data
There are two other common ways to interact with a database. First, many corporate databases are very large so you need some hierarchy to keep all the tables organized. In that case you might need to supply a schema, or a catalog and a schema, in order to pick the table you’re interested in:
diamonds_db <- tbl(con, in_schema("sales", "diamonds"))
diamonds_db <- tbl(con, in_catalog("north_america", "sales", "diamonds"))
Other times you might want to use your own SQL query as a starting point:
diamonds_db <- tbl(con, sql("SELECT * FROM diamonds"))
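For reference, the basic form simply names the table:
diamonds_db <- tbl(con, "diamonds")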
SQL within R
You can see the SQL code generated by dplyr with show_query(). To get all the data back into R, you call collect(). Behind the scenes, collect() generates the SQL, calls dbGetQuery() to get the data, then turns the result into a tibble:
big_diamonds <- big_diamonds_db |>
collect()
big_diamonds
#> # A tibble: 1,655 × 5
#> carat cut color clarity price
#> <dbl> <fct> <fct> <fct> <int>
#> 1 1.54 Premium E VS2 15002
#> 2 1.19 Ideal F VVS1 15005
#> 3 2.1 Premium I SI1 15007
#> 4 1.69 Ideal D SI1 15011
#> 5 1.5 Very Good G VVS2 15013
#> 6 1.73 Very Good G VS1 15014
#> # ℹ 1,649 more rows
SQL components
The top-level components of SQL are called statements. Common statements include CREATE for defining new tables, INSERT for adding data, and SELECT for retrieving data. We will focus on SELECT statements, also called queries, because they are almost exclusively what you'll use as a data scientist.
A query is made up of clauses. There are five important clauses: SELECT, FROM, WHERE, ORDER BY, and GROUP BY. Every query must have the SELECT and FROM clauses, and the simplest query is SELECT * FROM table, which selects all columns from the specified table. This is what dbplyr generates for an unadulterated table.
WHERE and ORDER BY control which rows are included and how they are ordered:
flights |>
filter(dest == "IAH") |>
arrange(dep_delay) |>
show_query()
#> <SQL>
#> SELECT flights.*
#> FROM flights
#> WHERE (dest = 'IAH')
#> ORDER BY dep_delay
GROUP BY converts the query to a summary, causing aggregation to happen:
flights |>
group_by(dest) |>
summarize(dep_delay = mean(dep_delay, na.rm = TRUE)) |>
show_query()
#> <SQL>
#> SELECT dest, AVG(dep_delay) AS dep_delay
#> FROM flights
#> GROUP BY dest
differences between dplyr verbs and SELECT clauses
In SQL, case doesn't matter: you can write select, SELECT, or even SeLeCt. We'll stick with the common convention of writing SQL keywords in uppercase to distinguish them from table or variable names.
In SQL, order matters: you must always write the clauses in the order SELECT, FROM, WHERE, GROUP BY, ORDER BY. Confusingly, this order doesn't match how the clauses are actually evaluated, which is first FROM, then WHERE, GROUP BY, SELECT, and ORDER BY.
SELECT
The SELECT clause is the workhorse of queries and performs the same job as select(), mutate(), rename(), relocate(), and summarize().
select(), rename(), and relocate() have very direct translations to SELECT as they just affect where a column appears (if at all) along with its name:
planes |>
select(tailnum, type, manufacturer, model, year) |>
show_query()
#> <SQL>
#> SELECT tailnum, "type", manufacturer, model, "year"
#> FROM planes
planes |>
select(tailnum, type, manufacturer, model, year) |>
rename(year_built = year) |>
show_query()
#> <SQL>
#> SELECT tailnum, "type", manufacturer, model, "year" AS year_built
#> FROM planes
planes |>
select(tailnum, type, manufacturer, model, year) |>
relocate(manufacturer, model, .before = type) |>
show_query()
#> <SQL>
#> SELECT tailnum, manufacturer, model, "type", "year"
#> FROM planes
SQL renaming
In SQL terminology renaming is called aliasing and is done with AS. Note that unlike mutate(), the old name is on the left and the new name is on the right.
reserved words in SQL
In the examples above, note that "year" and "type" are wrapped in double quotes. That's because these are reserved words in duckdb, so dbplyr quotes them to avoid any potential confusion between column/table names and SQL operators.
When working with other databases you're likely to see every variable name quoted, because only a handful of client packages, like duckdb, know what all the reserved words are, so they quote everything to be safe:
SELECT "tailnum", "type", "manufacturer", "model", "year"
FROM "planes"
Some other database systems use backticks instead of quotes:
SELECT `tailnum`, `type`, `manufacturer`, `model`, `year`
FROM `planes`
mutate in SQL
The translations for mutate() are similarly straightforward: each variable becomes a new expression in SELECT:
flights |>
mutate(
speed = distance / (air_time / 60)
) |>
show_query()
#> <SQL>
#> SELECT flights.*, distance / (air_time / 60.0) AS speed
#> FROM flights
SQL WHERE
filter() is translated to the WHERE clause:
flights |>
filter(dest == "IAH" | dest == "HOU") |>
show_query()
#> <SQL>
#> SELECT flights.*
#> FROM flights
#> WHERE (dest = 'IAH' OR dest = 'HOU')
flights |>
filter(arr_delay > 0 & arr_delay < 20) |>
show_query()
#> <SQL>
#> SELECT flights.*
#> FROM flights
#> WHERE (arr_delay > 0.0 AND arr_delay < 20.0)
There are a few important details to note here:
| becomes OR and & becomes AND.
SQL uses = for comparison, not ==. SQL doesn't have assignment, so there's no potential for confusion there.
SQL uses only '' for strings, not "". In SQL, "" is used to identify variables, like R's ``.
SQL IN
Another useful SQL operator is IN, which is very close to R's %in%:
flights |>
filter(dest %in% c("IAH", "HOU")) |>
show_query()
#> <SQL>
#> SELECT flights.*
#> FROM flights
#> WHERE (dest IN ('IAH', 'HOU'))
SQL NULL VS NA
SQL uses NULL instead of NA. NULLs behave similarly to NAs. The main difference is that while they're "infectious" in comparisons and arithmetic, they are silently dropped when summarizing. dbplyr will remind you about this behavior the first time you hit it:
flights |>
group_by(dest) |>
summarize(delay = mean(arr_delay))
#> Warning: Missing values are always removed in SQL aggregation functions.
#> Use `na.rm = TRUE` to silence this warning
#> This warning is displayed once every 8 hours.
#> # Source: SQL [?? x 2]
#> # Database: DuckDB v1.2.1 [unknown@Linux 6.8.0-1021-azure:R 4.4.3/:memory:]
#> dest delay
#> <chr> <dbl>
#> 1 ATL 11.3
#> 2 CLT 7.36
#> 3 MCO 5.45
#> 4 MDW 12.4
#> 5 HOU 7.18
#> 6 SDF 12.7
#> # ℹ more rows
HAVING vs. WHERE in SQL
Note that if you filter() a variable that you created using a summarize, dbplyr will generate a HAVING clause, rather than a WHERE clause. This is one of the idiosyncrasies of SQL: WHERE is evaluated before SELECT and GROUP BY, so SQL needs another clause that's evaluated afterwards.
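A sketch of when this happens (the exact SQL dbplyr emits may differ slightly; it typically repeats the aggregate expression rather than the alias in HAVING):
flights |>
  group_by(dest) |>
  summarize(delay = mean(arr_delay, na.rm = TRUE)) |>
  filter(delay > 10) |>
  show_query()
#> <SQL>
#> SELECT dest, AVG(arr_delay) AS delay
#> FROM flights
#> GROUP BY dest
#> HAVING (AVG(arr_delay) > 10.0)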
ORDER BY
Ordering rows involves a straightforward translation from arrange() to the ORDER BY clause:
flights |>
arrange(year, month, day, desc(dep_delay)) |>
show_query()
#> <SQL>
#> SELECT flights.*
#> FROM flights
#> ORDER BY "year", "month", "day", dep_delay DESC
Notice how desc() is translated to DESC: this is one of the many dplyr functions whose name was directly inspired by SQL.
Subquery
A subquery is just a query used as a data source in the FROM clause, instead of the usual table.
dbplyr typically uses subqueries to work around limitations of SQL. For example, expressions in the SELECT clause can't refer to columns that were just created. That means that the following (silly) dplyr pipeline needs to happen in two steps: the first (inner) query computes year1, and then the second (outer) query can compute year2.
flights |>
mutate(
year1 = year + 1,
year2 = year1 + 1
) |>
show_query()
#> <SQL>
#> SELECT q01.*, year1 + 1.0 AS year2
#> FROM (
#> SELECT flights.*, "year" + 1.0 AS year1
#> FROM flights
#> ) q01
Order of SELECT and FROM in SQL
You'll also see this if you attempt to filter() a variable that you just created. Remember, even though WHERE is written after SELECT, it's evaluated before it.
Joins
The main thing to notice here is the syntax: SQL joins use sub-clauses of the FROM clause to bring in additional tables, using ON to define how the tables are related.
dplyr's names for these functions are so closely connected to SQL that you can easily guess the equivalent SQL for inner_join(), right_join(), and full_join():
SELECT flights.*, "type", manufacturer, model, engines, seats, speed
FROM flights
INNER JOIN planes ON (flights.tailnum = planes.tailnum)
SELECT flights.*, "type", manufacturer, model, engines, seats, speed
FROM flights
RIGHT JOIN planes ON (flights.tailnum = planes.tailnum)
SELECT flights.*, "type", manufacturer, model, engines, seats, speed
FROM flights
FULL JOIN planes ON (flights.tailnum = planes.tailnum)
Window Functions
The translation of summary functions becomes more complicated when you use them inside a mutate() because they have to turn into so-called window functions. In SQL, you turn an ordinary aggregation function into a window function by adding OVER after it (the mutate_query() helper used below is explained after the output):
flights |>
group_by(year, month, day) |>
mutate_query(
mean = mean(arr_delay, na.rm = TRUE),
)
#> <SQL>
#> SELECT
#> "year",
#> "month",
#> "day",
#> AVG(arr_delay) OVER (PARTITION BY "year", "month", "day") AS mean
#> FROM flights
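Note: mutate_query() in these cards is not a dplyr function; it is a small helper (defined in R for Data Science) that pipes a mutate() into show_query(), roughly:
mutate_query <- function(df, ...) {
  df |>
    mutate(..., .keep = "used") |>
    show_query()
}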
In SQL, the GROUP BY clause is used exclusively for summaries, so here you can see that the grouping has moved to the PARTITION BY argument of OVER.
Window functions include all functions that look forwards or backwards, like lead() and lag(), which look at the "previous" or "next" value respectively:
flights |>
group_by(dest) |>
arrange(time_hour) |>
mutate_query(
lead = lead(arr_delay),
lag = lag(arr_delay)
)
#> <SQL>
#> SELECT
#> dest,
#> LEAD(arr_delay, 1, NULL) OVER (PARTITION BY dest ORDER BY time_hour) AS lead,
#> LAG(arr_delay, 1, NULL) OVER (PARTITION BY dest ORDER BY time_hour) AS lag
#> FROM flights
#> ORDER BY time_hour
Intrinsic order of SQL output
Here it's important to arrange() the data, because SQL tables have no intrinsic order. In fact, if you don't use arrange() you might get the rows back in a different order every time! Notice for window functions, the ordering information is repeated: the ORDER BY clause of the main query doesn't automatically apply to window functions.
if_else in SQL
flights |>
mutate_query(
description = if_else(arr_delay > 0, "delayed", "on-time")
)
#> <SQL>
#> SELECT CASE WHEN (arr_delay > 0.0) THEN 'delayed' WHEN NOT (arr_delay > 0.0) THEN 'on-time' END AS description
#> FROM flights
flights |>
mutate_query(
description =
case_when(
arr_delay < -5 ~ "early",
arr_delay < 5 ~ "on-time",
arr_delay >= 5 ~ "late"
)
)
#> <SQL>
#> SELECT CASE
#> WHEN (arr_delay < -5.0) THEN 'early'
#> WHEN (arr_delay < 5.0) THEN 'on-time'
#> WHEN (arr_delay >= 5.0) THEN 'late'
#> END AS description
#> FROM flights
CASE WHEN is also used for some other functions that don't have a direct translation from R to SQL. A good example of this is cut():
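A sketch of how cut() might translate (the breaks and labels are illustrative; the exact SQL may differ by dbplyr version):
flights |>
  mutate_query(
    description = cut(
      arr_delay,
      breaks = c(-Inf, -5, 5, Inf),
      labels = c("early", "on-time", "late")
    )
  )
#> <SQL>
#> SELECT CASE
#> WHEN (arr_delay <= -5.0) THEN 'early'
#> WHEN (arr_delay <= 5.0) THEN 'on-time'
#> WHEN (arr_delay > 5.0) THEN 'late'
#> END AS description
#> FROM flights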
functional programming
Tools like across() and the purrr map functions are often called functional programming tools because they are built around functions that take other functions as inputs. Learning functional programming can easily veer into the abstract, but we'll keep things concrete by focusing on three common tasks: modifying multiple columns, reading multiple files, and saving multiple objects.
inputs of across()
across() has three particularly important arguments. You'll use the first two every time you use across(): the first argument, .cols, specifies which columns you want to iterate over, and the second argument, .fns, specifies what to do with each column. You can use the .names argument when you need additional control over the names of output columns, which is particularly important when you use across() with mutate(). You can use functions like starts_with() and ends_with() to select columns based on their names.
There are two additional selection techniques that are particularly useful for across(): everything() and where(). everything() is straightforward: it selects every (non-grouping) column:
df |> group_by(grp) |> summarize(across(everything(), median))
where()
where() allows you to select columns based on their type:
where(is.numeric) selects all numeric columns.
where(is.character) selects all string columns.
where(is.Date) selects all date columns.
where(is.POSIXct) selects all date-time columns.
where(is.logical) selects all logical columns.
Just like other selectors, you can combine these with Boolean algebra. For example, !where(is.numeric) selects all non-numeric columns, and starts_with("a") & where(is.logical) selects all logical columns whose name starts with "a".
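A tiny sketch (assuming df mixes numeric and non-numeric columns):
df |> summarize(across(where(is.numeric), mean))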
calling functions in across()
The second argument to across() defines how each column will be transformed. In simple cases, as above, this will be a single existing function. This is a pretty special feature of R: we're passing one function (median, mean, str_flatten, …) to another function (across()). This is one of the features that makes R a functional programming language.
It's important to note that we're passing this function to across(), so across() can call it; we're not calling it ourselves. That means the function name should never be followed by (). If you forget, you'll get an error:
df |>
group_by(grp) |>
summarize(across(everything(), median()))
#> Error in `summarize()`:
#> ℹ In argument: `across(everything(), median())`.
#> Caused by error in `median.default()`:
#> ! argument "x" is missing, with no default
This error arises because you’re calling the function with no input, e.g.:
median()
#> Error in median.default(): argument "x" is missing, with no default
calling a function inside of across()
It would be nice if we could pass along na.rm = TRUE to median() to remove missing values. To do so, instead of calling median() directly, we need to create a new function that calls median() with the desired arguments:
df_miss |>
summarize(
across(a:d, function(x) median(x, na.rm = TRUE)),
n = n()
)
#> # A tibble: 1 × 5
#> a b c d n
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 0.139 -1.11 -0.387 1.15 5
anonymous functions in across()
This is a little verbose, so R comes with a handy shortcut: for this sort of throwaway, or anonymous, function you can replace function with \:
df_miss |>
summarize(
across(a:d, \(x) median(x, na.rm = TRUE)),
n = n()
)
multiple functions in across()
When we remove the missing values from the median(), it would be nice to know just how many values were removed. We can find that out by supplying two functions to across(): one to compute the median and the other to count the missing values. You supply multiple functions by using a named list for .fns:
df_miss |>
summarize(
across(a:d, list(
median = \(x) median(x, na.rm = TRUE),
n_miss = \(x) sum(is.na(x))
)),
n = n()
)
naming through across()
The result of across() is named according to the specification provided in the .names argument. We could specify our own if we wanted the name of the function to come first:
df_miss |>
summarize(
across(
a:d,
list(
median = \(x) median(x, na.rm = TRUE),
n_miss = \(x) sum(is.na(x))
),
.names = "{.fn}_{.col}"
),
n = n(),
)
#> # A tibble: 1 × 9
#> median_a n_miss_a median_b n_miss_b median_c n_miss_c median_d n_miss_d
#> <dbl> <int> <dbl> <int> <dbl> <int> <dbl> <int>
#> 1 0.139 1 -1.11 1 -0.387 2 1.15 0
#> # ℹ 1 more variable: n <int>
coalesce()
Given a set of vectors, coalesce() finds the first non-missing value at each position. It's inspired by the SQL COALESCE function, which does the same thing for SQL NULLs.
Usage
coalesce(..., .ptype = NULL, .size = NULL)
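For example:
x <- c(1, NA, NA)
y <- c(NA, 2, NA)
coalesce(x, y, 0)
#> [1] 1 2 0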
across() and filtering
across() is a great match for summarize() and mutate(), but it's more awkward to use with filter(), because you usually combine multiple conditions with either | or &. It's clear that across() can help to create multiple logical columns, but then what? So dplyr provides two variants of across(), called if_any() and if_all():
# same as df_miss |> filter(is.na(a) | is.na(b) | is.na(c) | is.na(d))
df_miss |> filter(if_any(a:d, is.na))
#> # A tibble: 4 × 4
#> a b c d
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.434 -1.25 NA 1.60
#> 2 NA -1.43 -0.297 0.776
#> 3 -0.156 -0.980 NA 1.15
#> 4 1.11 NA -0.387 0.704
# same as df_miss |> filter(is.na(a) & is.na(b) & is.na(c) & is.na(d))
df_miss |> filter(if_all(a:d, is.na))
#> # A tibble: 0 × 4
#> # ℹ 4 variables: a <dbl>, b <dbl>, c <dbl>, d <dbl>
across() in functions
across() is particularly useful to program with because it allows you to operate on multiple columns. For example, Jacob Scott uses this little helper, which wraps a bunch of lubridate functions to expand all date columns into year, month, and day columns:
expand_dates <- function(df) {
df |>
mutate(
across(where(is.Date), list(year = year, month = month, day = mday))
)
}
df_date <- tibble(
name = c("Amy", "Bob"),
date = ymd(c("2009-08-03", "2010-01-16"))
)
df_date |>
expand_dates()
#> # A tibble: 2 × 5
#> name date date_year date_month date_day
#> <chr> <date> <dbl> <dbl> <int>
#> 1 Amy 2009-08-03 2009 8 3
#> 2 Bob 2010-01-16 2010 1 16
Pivoting with across()
There is an interesting connection between across() and pivot_longer(): in many cases, you can perform the same calculations by first pivoting the data and then performing the operations by group rather than by column. For example, take this multi-function summary:
df |>
summarize(across(a:d, list(median = median, mean = mean)))
#> # A tibble: 1 × 8
#> a_median a_mean b_median b_mean c_median c_mean d_median d_mean
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.0380 0.205 -0.0163 0.0910 0.260 0.0716 0.540 0.508
We could compute the same values by pivoting longer and then summarizing:
long <- df |>
pivot_longer(a:d) |>
group_by(name) |>
summarize(
median = median(value),
mean = mean(value)
)
long
#> # A tibble: 4 × 3
#> name median mean
#> <chr> <dbl> <dbl>
#> 1 a 0.0380 0.205
#> 2 b -0.0163 0.0910
#> 3 c 0.260 0.0716
#> 4 d 0.540 0.508
And if you wanted the same structure as across()
you could pivot again:
long |>
pivot_wider(
names_from = name,
values_from = c(median, mean),
names_vary = "slowest",
names_glue = "{name}_{.value}"
)
#> # A tibble: 1 × 8
#> a_median a_mean b_median b_mean c_median c_mean d_median d_mean
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.0380 0.205 -0.0163 0.0910 0.260 0.0716 0.540 0.508
reading in multiple files in 3 steps
There are three basic steps: use list.files() to list all the files in a directory, then use purrr::map() to read each of them into a list, then use purrr::list_rbind() to combine them into a single data frame. In a pipeline it would look like this:
paths |> map(readxl::read_excel) |> list_rbind()
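Spelled out with a hypothetical data/ directory of Excel files:
paths <- list.files("data", pattern = "\\.xlsx$", full.names = TRUE)
paths |>
  map(readxl::read_excel) |>
  list_rbind()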