Amazon SQL Questions

What is a JOIN in SQL?
A JOIN is a clause that combines rows from two or more tables based on a related column, known as the join key.
Which SQL command is used to add rows to a table?
The INSERT command is used to add rows to an existing table.
What are OLAP and OLTP? When do you denormalize data?
OLAP databases are designed for online analytical processing, while OLTP databases are designed for online transaction processing. Denormalize data for OLAP workloads, where read-heavy analytical queries benefit from fewer joins, and keep it normalized for OLTP workloads, where frequent writes benefit from less redundancy.
What is the difference between WHERE and HAVING?
The WHERE clause is used to filter rows before grouping and HAVING is used to exclude records after grouping.
When do you use the CASE WHEN expression?
CASE WHEN lets you write conditional logic, most often in the SELECT clause. Combined with aggregate functions, it also lets you pivot data from long to wide format.
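
As a minimal sketch of the pivoting use, the query below builds a small, hypothetical `sales` data set inline (the table and column names are assumptions for illustration) and turns one row per quarter into one column per quarter:

```sql
-- Hypothetical long-format data: one row per (region, quarter).
WITH sales AS (
    SELECT 'east' AS region, 'Q1' AS quarter, 100 AS amount
    UNION ALL SELECT 'east', 'Q2', 150
    UNION ALL SELECT 'west', 'Q1', 80
    UNION ALL SELECT 'west', 'Q2', 120
)
SELECT
    region,
    -- Each CASE keeps only one quarter's amounts, so every quarter
    -- becomes its own column after aggregation.
    SUM(CASE WHEN quarter = 'Q1' THEN amount ELSE 0 END) AS q1_amount,
    SUM(CASE WHEN quarter = 'Q2' THEN amount ELSE 0 END) AS q2_amount
FROM sales
GROUP BY region
ORDER BY region;
```
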
What is the difference between a LEFT JOIN and an INNER JOIN?
A JOIN combines data from multiple tables. A LEFT JOIN returns all rows from the left table, even if there are no matches in the right table; unmatched rows get NULLs in the right table's columns. An INNER JOIN returns only rows that have a match in both tables.
When would you use UNION vs UNION ALL? What if there were no duplicates?
UNION and UNION ALL both combine the results of two or more queries into a single result set. UNION removes duplicate rows from the combined result, while UNION ALL returns every row, including duplicates. If you know there are no duplicates, prefer UNION ALL: the result is the same, and it skips the deduplication step, so it is usually faster.
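
To make the difference concrete, here is a small sketch using two hypothetical inline result sets, `a` and `b`, that share the value 2. It should report 3 rows for UNION and 4 rows for UNION ALL:

```sql
-- a = {1, 2}, b = {2, 3}; the value 2 appears in both.
WITH a AS (SELECT 1 AS n UNION ALL SELECT 2),
     b AS (SELECT 2 AS n UNION ALL SELECT 3)
SELECT 'UNION' AS op, COUNT(*) AS row_count
FROM (SELECT n FROM a UNION SELECT n FROM b) AS u        -- duplicate 2 removed
UNION ALL
SELECT 'UNION ALL', COUNT(*)
FROM (SELECT n FROM a UNION ALL SELECT n FROM b) AS ua;  -- duplicate 2 kept
```
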
What is the difference between COUNT and COUNT DISTINCT?
COUNT returns the number of rows matched by the query, which is to say the rows that pass any WHERE condition. COUNT(*) counts every such row, while COUNT(column) counts only the rows where that column is not NULL. COUNT(DISTINCT column) eliminates duplicate values before counting, so it returns the number of unique non-NULL values in the column.
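
The three variants can be compared side by side. This sketch assumes a hypothetical `orders` data set, built inline, in which one customer repeats and one is NULL:

```sql
-- Four orders: 'alice' twice, 'bob' once, and one with no customer.
WITH orders AS (
    SELECT 1 AS order_id, 'alice' AS customer
    UNION ALL SELECT 2, 'alice'
    UNION ALL SELECT 3, 'bob'
    UNION ALL SELECT 4, NULL
)
SELECT
    COUNT(*)                 AS all_rows,            -- 4: every row
    COUNT(customer)          AS non_null_customers,  -- 3: NULL is skipped
    COUNT(DISTINCT customer) AS distinct_customers   -- 2: alice, bob
FROM orders;
```
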
When would you use a HAVING clause versus a WHERE clause?
Both appear in queries that use GROUP BY, and you can use them together. WHERE filters individual rows before the GROUP BY is applied and cannot reference aggregates, while HAVING filters groups after aggregation, so conditions on aggregates such as SUM or COUNT belong there.
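
A short sketch of the two clauses working together, using a hypothetical inline `orders` data set (names and values are illustrative only). Cancelled orders are dropped row by row first, and only then are customers with small totals dropped as groups:

```sql
WITH orders AS (
    SELECT 'alice' AS customer, 'shipped' AS status, 50 AS amount
    UNION ALL SELECT 'alice', 'shipped', 70
    UNION ALL SELECT 'alice', 'cancelled', 500
    UNION ALL SELECT 'bob', 'shipped', 30
)
SELECT customer, SUM(amount) AS shipped_total
FROM orders
WHERE status = 'shipped'      -- row filter, applied before grouping
GROUP BY customer
HAVING SUM(amount) > 100;     -- group filter, applied after aggregation
```

Here only alice remains, with a total of 120: her cancelled order never reaches the SUM, and bob's total of 30 fails the HAVING condition.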

1. How would you explain SQL to a person with no technical knowledge?
    Sample answer: SQL stands for Structured Query Language, and as the name
    suggests, it is a language used by database administrators to communicate with the
    database. The administrator inputs a query in the form of a string of commands to
    search for values in a database.
2. What is the Amazon database?
    Sample answer: There is no single "Amazon database"; Amazon Web Services offers
    several managed database services, such as DynamoDB, RDS, ElastiCache, and Redshift.
    Redshift, for example, is a fully managed, petabyte-scale data warehouse service.
3. What is DynamoDB Auto Scaling?
    Sample answer: Auto Scaling is a DynamoDB feature that automatically adjusts the
    provisioned read and write capacity of a table or global secondary index up and
    down in response to traffic.
4. Which of the four available AWS services will you choose from for collecting and
    processing e-commerce data for real-time analysis?
    Sample answer: For collecting and processing e-commerce data for real-time analysis,
    I’ll use DynamoDB. DynamoDB is a fully managed NoSQL database service that can be
    used for any type of unstructured data, including data collected from e-commerce
    websites.
5. What is an Inner Join?
    Sample answer: An inner join returns only the rows that have a match in both tables.
    It is also the default join type: a bare JOIN keyword is treated as an INNER JOIN.
6. What is a full outer join in SQL?
    Sample answer: The full outer join returns all the rows from both the tables, inserting
    NULLs in places where no matching rows exist and matching up the rows where a
    match can be made.
7. Which is the only constraint that works only at the column level?
    Sample answer: In Oracle, NOT NULL is the only constraint that can be defined only
    at the column level; the other constraints can also be declared at the table level.
8. After using the ALTER command, can a rollback be performed?
    Sample answer: No, a rollback cannot be performed, as ALTER is a DDL command,
    and when DDL statements are executed, the Oracle server performs an automatic
    COMMIT.
9. What are some examples of pseudocolumns in SQL?
    Sample answer: ROWNUM, ROWID, and USER are some examples of pseudocolumns
    in SQL.
10. Write a command to change the password of the user x from "abcd" to "dfgh".
    Sample answer: ALTER USER x IDENTIFIED BY dfgh;

What are a leader node and a compute node?
    Sample answer: In an Amazon Redshift cluster, the leader node receives queries from
    client applications and plans their execution, whereas the compute nodes execute
    the steps assigned by the leader node and send their results back to it.
### 10. You have a flights table that contains information about all the flights an airline has booked. Select all entries from the flights table.

Hint: The * serves as a wildcard that will capture all columns in a table.
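
The hint translates almost directly into a query. A minimal sketch, assuming the table is named `flights` as in the question:

```sql
-- Return every column of every row in the flights table.
SELECT *
FROM flights;
```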

### 11. Given two tables, users and neighborhoods, write a query to find all neighborhoods that have zero users.

In this Google SQL interview question, we are given two tables. The users table includes an id, name, neighborhood_id, and a created_at timestamp. The neighborhoods table includes an id, name, and city_id.

One hint: whenever a SQL question asks you to find values with zero matches, immediately think of a LEFT JOIN. Remember that an INNER JOIN returns only rows that match in both tables, while a LEFT JOIN keeps every row from the left table and fills the right table's columns with NULL where there is no match.

The additional step in this question is to use a WHERE clause to keep only the rows where the users side is NULL. Here is a simple solution:

```sql
SELECT n.name
FROM neighborhoods AS n
LEFT JOIN users AS u
    ON n.id = u.neighborhood_id
-- Neighborhoods with no users have NULL in every users column.
WHERE u.id IS NULL
```

### 12. Write a query to return pairs of projects where the end date of one project matches the start date of another project.

Here is the schema for the projects table:

| Columns | Type |
| --- | --- |
| `id` | INTEGER |
| `title` | VARCHAR |
| `start_date` | DATETIME |
| `end_date` | DATETIME |
| `budget` | FLOAT |

With this question, our aim is to use values in a single table for comparisons. Specifically, we are looking to select pairs of projects in which one project starts on the same day another project ends.

This task calls for a self-join: an INNER JOIN of the projects table to itself. The join condition must compare the right columns in the right order, so pay particular attention to aliasing to keep the two copies of the table straight. For instance, you do not want to alias one copy as project_starts but then use it to source the value for project_title_end, or vice versa.
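
One way this could look is sketched below. The aliases `p_end` and `p_start` and the output column names `project_title_start` and `handoff_date` are illustrative choices, not part of the original question:

```sql
SELECT
    p_end.title    AS project_title_end,    -- the project that is finishing
    p_start.title  AS project_title_start,  -- the project that is starting
    p_end.end_date AS handoff_date
FROM projects AS p_end
INNER JOIN projects AS p_start
    ON p_end.end_date = p_start.start_date  -- one ends exactly when the other starts
   AND p_end.id <> p_start.id;              -- assumption: never pair a project with itself
```

Because both columns are DATETIME, this compares full timestamps. If the values carry a time of day and the intent is "same calendar day", the comparison would need to be made on the date part instead.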

### 13. You are given a table that represents search results, and includes a search position and search rating. Write a query to measure the quality of the results for each search term.

More context: the search results table includes:

* Query (search term)
* Result_ID
* Position
* Rating (1 to 5, where 5 is high relevance and 1 is low relevance)

For example, the query “cat” shows a picture of a cat, and includes a rating of 4 and a position of 1. However, the query “dog” shows a picture of a hotdog, with a rating of 2 and a position of 1. The ranking precision of “cat” is better than that of “dog.” How would you write a query to create a metric that checks how well each result's position matches its rating?

Hint: Taking the average of the ratings seems like a good way to measure the quality of the search results.

We then have to find a way to weight each rating by its position, so that the metric rewards placing highly rated results first. For example, in a table of two search results for one query:

* If the first result is rated a 5 (position = 1, rating = 5) and the last result is rated a 1 (position = 2, rating = 1), this is good.
* If the first result is rated a 5 (position = 1, rating = 5) and the last result is also rated a 5 (position = 2, rating = 5), this is even better.
* If the first result is rated a 1 (position = 1, rating = 1) and the last result is rated a 5 (position = 2, rating = 5), this is bad.
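
One possible metric, offered as a sketch rather than the only answer, is the average of each rating divided by its position. The table name `search_results` and the lowercase column names are assumptions inferred from the prompt:

```sql
-- Assumed table: search_results(query, result_id, position, rating).
SELECT
    query,
    -- Dividing by position weights top results most heavily;
    -- multiplying by 1.0 avoids integer division.
    ROUND(AVG(rating * 1.0 / position), 2) AS weighted_rating
FROM search_results
GROUP BY query
ORDER BY weighted_rating DESC;
```

Applied to the three two-result cases above, this gives 2.75 for the good case, 3.75 for the even better case, and 1.75 for the bad case, which matches the intended ordering.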

### 14. Write a SQL query to select the second-highest salary in the engineering department.

You are presented with an employees table (employee_id, first_name, last_name, salary and department_id) and a departments table (id, name). The department_id field in the employees table is associated with the “id” field in the departments table.

Here, department_id is a foreign key because it is a column that references the primary key of another table. Using this common field, we can INNER JOIN the two tables to associate each employee with the name of their department.

```sql
-- Attach each employee's department through the foreign key.
SELECT salary
FROM employees
INNER JOIN departments
    ON employees.department_id = departments.id
```

What else would you need to do to keep only the employees in the engineering department, order them by salary, and pick out the second-highest salary?
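
One way to finish the query is sketched below. It assumes the department is stored under the exact name `'engineering'` and that the dialect supports `LIMIT ... OFFSET` (as MySQL, PostgreSQL, and SQLite do):

```sql
SELECT DISTINCT e.salary           -- DISTINCT so a tie for first place counts once
FROM employees AS e
INNER JOIN departments AS d
    ON e.department_id = d.id
WHERE d.name = 'engineering'       -- assumption: exact department name and casing
ORDER BY e.salary DESC
LIMIT 1 OFFSET 1;                  -- skip the highest salary, return the next one
```

If the department has fewer than two distinct salaries, this returns no rows.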

### 15. Given three tables representing customer transactions and customer attributes, write a query to get the average order value by gender.

You are provided with a transactions table (id, user_id, created_at, product_id, and quantity), a users table (id, name, and sex), and a products table (id, name, and price). To answer the question, we need to INNER JOIN users to transactions, and transactions to products, so that each transaction row carries both the buyer's sex and the product's price.

Here is a quick solution:

```sql
SELECT
    u.sex
    -- Order value per transaction row = quantity * unit price.
    , ROUND(AVG(t.quantity * p.price), 2) AS aov
FROM users AS u
INNER JOIN transactions AS t
    ON u.id = t.user_id
INNER JOIN products AS p
    ON t.product_id = p.id
GROUP BY 1
```

### 16. Write a query to get the total amount spent on each item in the ‘purchases’ table by users that registered in 2022.

Let us say you have a database with two tables. The first table is composed of user information, including their registration date, and the second table is of purchases, which has the purchase history (if any) for all users.

Here is a process you can use to solve this:

* Use INNER JOIN or JOIN to connect the users and purchases tables by the user_id column.
* Filter the results to users registered in 2022 by using the WHERE clause.
* Use GROUP BY to aggregate by item, and apply SUM to calculate the amount spent.
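
The steps above can be sketched as follows. The question does not give the schema, so the column names `registration_date`, `item`, and `amount` are hypothetical and would need to be adjusted to the real tables:

```sql
-- Hypothetical columns: users(user_id, registration_date),
-- purchases(user_id, item, amount).
SELECT
    p.item,
    SUM(p.amount) AS total_spent
FROM users AS u
INNER JOIN purchases AS p
    ON u.user_id = p.user_id
WHERE u.registration_date >= '2022-01-01'   -- registered in 2022:
  AND u.registration_date <  '2023-01-01'   -- half-open date range
GROUP BY p.item;
```

The half-open range is a design choice: it works whether the registration column is a date or a timestamp, and it leaves the column unwrapped so an index on it can still be used.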

### 17. You are given a table containing information about the rides of users for a ride-sharing company. Write a query to find the duration of each ride over two hours.

Note: Find the duration in minutes and sort results by duration in descending order.

rides table:

| Columns | Type |
| --- | --- |
| `id` | INTEGER |
| `passenger_user_id` | INTEGER |
| `start_dt` | DATETIME |
| `end_dt` | DATETIME |
| `distance` | FLOAT |
| `ride_region` | VARCHAR |
| `is_completed` | INTEGER |

Output:

| Column | Type |
| --- | --- |
| `duration_minutes` | INTEGER |

Here is a quick solution:

```sql
-- TIMESTAMPDIFF is MySQL syntax; two hours = 120 minutes.
SELECT TIMESTAMPDIFF(MINUTE, start_dt, end_dt) AS duration_minutes
FROM rides
WHERE TIMESTAMPDIFF(MINUTE, start_dt, end_dt) > 120
ORDER BY duration_minutes DESC
```