What types of data are created during day-to-day activities?
Web data, e-commerce, financial transactions, online trading, social network data, and vehicle data.
What type of data is primarily found on the web?
Text data.
What is streaming data?
Data that is continuously generated and processed in real-time.
What is semi-structured data?
Data that does not conform to a rigid structure, such as XML.
What is an example of data used in social networks?
Graph data, including Semantic Web data like RDF.
What is relational data?
Data organized in tables, often used for transactions and legacy systems.
What type of data represents relationships and connections?
Graph data.
In which industries is data science utilized?
Travel, business, healthcare, automotive technology, airlines, and logistics.
What mathematical concept is important in data science?
Mathematical Modeling.
What is data science?
An interdisciplinary field of scientific methods, processes, algorithms, and systems to extract knowledge or insights from data.
Which statistical methods are used in data science?
Statistical and Stochastic modeling, Probability.
Which field of study in data science involves pattern recognition and visualization?
Computer Science.
What is one of the main goals of data science?
To manage, manipulate, extract, and interpret knowledge from large amounts of data.
What types of data does data science work with?
Both structured and unstructured data.
What does data cleaning achieve in the data science process?
Transforms raw data into usable data.
What is involved in the data collection step of the data science process?
Gathering the right set of high quality, targeted data.
What is the first step in the data science process?
Framing and understanding the problem.
What is the goal of model building in the data science process?
To use the collected data to make predictions.
What is the purpose of Exploratory Data Analysis in the data science process?
To uncover valuable insights about the data.
What does model deployment involve in the data science process?
Deploying the model in a real-time production environment.
Why is communicating results important in the data science process?
To communicate key findings to the stakeholders.
What type of skills are required for Business Intelligence?
Basic statistics, business knowledge, data transformation, and visualization skills.
What does Data Science focus on?
Extracting information from datasets and creating forecasts.
What is the main objective of Business Intelligence?
To identify historical trends and answer questions about what happened during the last period.
How does Data Science manage data?
It is designed to manage large volumes of dynamic, less structured data.
What skills are necessary for Data Science?
More technical skills like coding, data mining, advanced statistics, and domain knowledge.
What makes Data Science more complex than Business Intelligence?
Its capacity for forecasting, ability to manage dynamic data, and requirements for more advanced skills.
How is data collection and management approached in Business Intelligence?
It is designed to manage well-organized (structured) data.
What process is used for data integration in Data Science?
ELT (Extract-Load-Transform).
What storage method is used in Data Science?
Real-time clusters.
What method does Data Science utilize?
The scientific method.
Which has higher complexity, Data Science or Business Intelligence?
Data Science.
What does Business Intelligence primarily focus on?
The past and present.
What field uses mathematics and statistics to discover hidden patterns in data?
Data Science.
What tools are commonly used in Data Science?
SAS, BigML, MATLAB, Excel, etc.
What questions does Data Science deal with?
What will happen and what if.
What technologies are available for handling large data sets in Data Science?
Technologies such as Hadoop.
How flexible is Data Science in terms of data sources?
Much more flexible as data sources can be added as per requirement.
What do data scientists build using analysis?
Mathematical models or machine learning models.
What is the primary role of a data scientist?
To gather data, process it, manipulate it as per requirements, and feed it for analysis.
What statistical concepts are important in data science?
Basic statistics, probability distribution, dimension reduction.
Which tools are used for data wrangling and management?
Oracle, MongoDB, Hadoop.
What programming languages are required for data science?
Python, R, SAS, SQL, MATLAB.
What tools are commonly used for data visualization?
Tableau, Excel.
What are some machine learning techniques used in data science?
Linear regression, decision tree.
What is a key advantage of SQL regarding data access?
It can directly access a large amount of data without copying it to other applications.
What tasks are easier to perform in SQL compared to Excel?
Joining tables, automating, and reusing code.
How does data analysis in SQL compare to Excel or CSV files?
Data analysis done in SQL is easy to audit and replicate.
What is a limitation of Excel when dealing with databases?
Excel is not useful for large-sized databases.
What types of data can SQL handle?
Data of almost any shape and massive size.
What do the LOWER() and UPPER() functions do in SQL?
LOWER() converts text data into lower case, while UPPER() converts text data into upper case.
What is data munging?
The phase of data transformation that simplifies data for better understanding.
What do the TRIM, LTRIM, and RTRIM operations do in SQL?
TRIM removes leading and trailing blank spaces, LTRIM removes leading spaces, and RTRIM removes trailing spaces from a given input string.
What does the REPLACE function do in SQL?
Replaces all occurrences of a source substring with a target substring in a given string.
What does the SUBSTR function do in SQL?
Returns a substring of a given string from a specified position.
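The string functions on the preceding cards can be tried out with a minimal sketch like the one below. It assumes Python's built-in `sqlite3` module and SQLite's dialect (where SUBSTR positions start at 1); the sample strings are made up for illustration.

```python
import sqlite3

# In-memory database; no tables are needed for literal expressions.
conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT LOWER('Data Science'),        -- all lower case
           UPPER('Data Science'),        -- all upper case
           TRIM('  sql  '),              -- strip both sides
           LTRIM('  sql  '),             -- strip leading spaces only
           RTRIM('  sql  '),             -- strip trailing spaces only
           REPLACE('a-b-c', '-', '+'),   -- replace every occurrence
           SUBSTR('database', 1, 4)      -- 4 characters from position 1
    """
).fetchone()

print(row)
```

Other database systems name or index some of these functions differently (for example SUBSTRING), so treat the exact spellings as SQLite-specific.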
What SQL operations are used to aggregate data?
MAX finds the maximum value, MIN finds the minimum value, AVG calculates the average, SUM calculates the total sum, and COUNT returns the number of records or values.
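As a rough illustration of the aggregate functions, the sketch below runs them over a small hypothetical `emp` table (the table name, columns, and values are assumptions, not taken from the cards). It also shows that COUNT(*) counts records while COUNT(column) counts only non-NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", 300), ("Bob", 500), ("Cy", None), ("Dee", 400)],
)

# NULL salaries are ignored by MAX, MIN, AVG, SUM and COUNT(salary).
stats = conn.execute(
    """
    SELECT MAX(salary), MIN(salary), AVG(salary), SUM(salary),
           COUNT(*), COUNT(salary)
    FROM emp
    """
).fetchone()

print(stats)
```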
What does an inner join do in SQL?
Returns only the rows for which the join condition finds a match in both tables.
What does a full outer join return in SQL?
All rows from both tables: matching rows are combined, and rows without a match on either side are kept, with NULLs in the other table's columns.
What does a left outer join return in SQL?
All rows from the left table plus the matching rows from the right table; left rows without a match get NULLs in the right table's columns.
What does a right outer join return in SQL?
All rows from the right table plus the matching rows from the left table; right rows without a match get NULLs in the left table's columns.
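The four join types can be compared side by side with a sketch like the following. The `emp` and `dept` tables and their rows are hypothetical. Because older SQLite versions lack RIGHT and FULL OUTER JOIN, this sketch emulates them (a LEFT JOIN with the tables swapped, and a UNION ALL of two LEFT JOINs); that emulation is a workaround chosen here, not something the cards prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emp  (eid INTEGER, ename TEXT, deptid INTEGER);
    CREATE TABLE dept (deptid INTEGER, deptname TEXT);
    INSERT INTO emp  VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cy', NULL);
    INSERT INTO dept VALUES (10, 'Sales'), (20, 'IT'), (30, 'HR');
    """
)

# INNER JOIN: only employees whose deptid matches a department.
inner = conn.execute(
    "SELECT e.ename, d.deptname FROM emp e "
    "INNER JOIN dept d ON e.deptid = d.deptid ORDER BY e.eid"
).fetchall()

# LEFT OUTER JOIN: every employee, NULL department when unmatched.
left = conn.execute(
    "SELECT e.ename, d.deptname FROM emp e "
    "LEFT JOIN dept d ON e.deptid = d.deptid ORDER BY e.eid"
).fetchall()

# RIGHT OUTER JOIN, emulated by swapping the tables in a LEFT JOIN.
right = conn.execute(
    "SELECT e.ename, d.deptname FROM dept d "
    "LEFT JOIN emp e ON e.deptid = d.deptid ORDER BY d.deptid"
).fetchall()

# FULL OUTER JOIN, emulated: left join plus the unmatched right rows.
full = conn.execute(
    "SELECT e.ename, d.deptname FROM emp e "
    "LEFT JOIN dept d ON e.deptid = d.deptid "
    "UNION ALL "
    "SELECT e.ename, d.deptname FROM dept d "
    "LEFT JOIN emp e ON e.deptid = d.deptid WHERE e.eid IS NULL"
).fetchall()

print(inner, left, right, full, sep="\n")
```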
What is the purpose of filtering in data science?
To remove redundant and useless data, making the dataset more efficient and useful.
Why is filtering important during data analysis?
It allows for the retrieval of a specific part of the actual data needed for analysis.
What types of records may exist in a dataset that require filtering?
Redundant (duplicate) records or partial (incomplete) records.
What can happen if redundant records are not removed from a dataset?
It may result in wrong analysis.
What does the data filtering process consist of?
Different strategies for refining and reducing datasets.
How can query performance be enhanced?
By running queries on refined (filtered) data instead of the full raw dataset.
What is one benefit of data filtering?
It can reduce strain on applications.
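To make the point about redundant records concrete, here is a small sketch in which a duplicated row inflates a total until it is filtered out. The `sales` table and its values are invented for the example, and SELECT DISTINCT is just one of several possible de-duplication strategies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, amount INTEGER)")
# Order 2 was loaded twice, so it is a redundant record.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(1, 100), (2, 50), (2, 50)]
)

# Summing the raw rows counts order 2 twice.
raw_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# Filtering duplicates first gives the correct total.
clean_total = conn.execute(
    "SELECT SUM(amount) FROM (SELECT DISTINCT order_id, amount FROM sales)"
).fetchone()[0]

print(raw_total, clean_total)
```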
What is the purpose of the LIKE clause and its wildcards?
The LIKE clause specifies a pattern-matching condition. It uses two wildcards: the percent sign '%' matches any string of zero or more characters, and the underscore '_' matches exactly one character.
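A quick way to see the two wildcards in action is the sketch below, which uses a hypothetical `people` table. Note that SQLite's LIKE is case-insensitive for ASCII letters by default; other databases may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (ename TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("Ann",), ("Anna",), ("Bob",), ("Ben",)],
)

def names_like(pattern):
    """Return the names matching a LIKE pattern, sorted alphabetically."""
    rows = conn.execute(
        "SELECT ename FROM people WHERE ename LIKE ? ORDER BY ename",
        (pattern,),
    ).fetchall()
    return [name for (name,) in rows]

starts_with_an = names_like("An%")   # '%' = zero or more characters
an_plus_one = names_like("An_")      # '_' = exactly one character
b_any_b = names_like("B_b")

print(starts_with_an, an_plus_one, b_any_b)
```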
What does the term 'window' refer to in SQL window functions?
A set of rows.
In which SQL clauses can window functions be called?
In the SELECT list or the ORDER BY clause.
What do SQL window functions calculate their result based on?
A set of rows rather than on a single row.
How do SQL window functions differ from aggregate functions?
Window functions return a value for every row, keeping each row's own attributes alongside the window result, while aggregate functions used with GROUP BY collapse each group of rows into a single result row.
Can window functions be called in the WHERE clause?
No, they cannot be called in the WHERE clause.
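The restriction on WHERE can be observed directly, along with a common workaround: compute the window function in a subquery and filter on its alias in the outer query. This is a sketch that assumes SQLite 3.25 or later (the first version with window functions) and a hypothetical `emp` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", 300), ("Bob", 500), ("Cy", 450)],
)

# A window function directly in WHERE is rejected by the database.
try:
    conn.execute(
        "SELECT ename FROM emp WHERE RANK() OVER (ORDER BY salary DESC) = 1"
    )
    rejected = False
except sqlite3.Error:
    rejected = True

# Workaround: rank in a subquery, then filter in the outer WHERE.
top = conn.execute(
    """
    SELECT ename FROM (
        SELECT ename, RANK() OVER (ORDER BY salary DESC) AS rnk FROM emp
    ) WHERE rnk = 1
    """
).fetchall()

print(rejected, top)
```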
What is the purpose of the PARTITION BY clause in the OVER() clause?
To define window partitions and form groups of rows for window functions.
What is the role of the ORDER BY clause in SQL window functions?
It provides logical sorting of rows within a partition.
Which SQL clause defines the window for window functions?
The OVER() clause.
Can multiple window functions be included in a single query?
Yes.
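The difference between aggregate and window functions, and the roles of OVER() and PARTITION BY, might be sketched as follows. The `emp` table and its rows are assumptions made for this example, and SQLite 3.25 or later is assumed. The second query also uses two window functions at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("Ann", "Sales", 300), ("Bob", "Sales", 500), ("Cy", "IT", 450)],
)

# Aggregate + GROUP BY: one output row per department.
grouped = conn.execute(
    "SELECT deptname, AVG(salary) FROM emp GROUP BY deptname ORDER BY deptname"
).fetchall()

# Window functions: every row is kept, and each one carries the
# average and head count of its own partition (department).
windowed = conn.execute(
    """
    SELECT ename, deptname, salary,
           AVG(salary) OVER (PARTITION BY deptname) AS dept_avg,
           COUNT(*)    OVER (PARTITION BY deptname) AS dept_size
    FROM emp
    ORDER BY ename
    """
).fetchall()

print(grouped)
print(windowed)
```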
What is the RANK() function and its usage?
The RANK() function assigns a rank to each row within a partition of a result set. It can rank salaries within departments using the PARTITION BY clause, or rank all salaries across the entire dataset without it.
How do you rank salaries in descending order using SQL?
By using ORDER BY salary DESC inside the OVER() clause of the RANK() function.
What SQL function is used to rank salaries within departments?
RANK()
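Ranking salaries within departments and across the whole dataset could look like the sketch below. The table, column names, and salary figures are illustrative assumptions (the cards' original query is not available), and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", 500), ("Bob", "Sales", 500), ("Cy", "Sales", 300),
        ("Dee", "IT", 400), ("Eve", "IT", 600),
    ],
)

ranked = conn.execute(
    """
    SELECT ename, deptname, salary,
           -- rank restarts at 1 inside each department
           RANK() OVER (PARTITION BY deptname ORDER BY salary DESC) AS dept_rank,
           -- no PARTITION BY: one ranking over all rows
           RANK() OVER (ORDER BY salary DESC) AS overall_rank
    FROM emp
    ORDER BY ename
    """
).fetchall()

for row in ranked:
    print(row)
```

Ann and Bob tie on salary, so they share a rank and the next rank is skipped.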
In the SQL query, what is used to partition the data for PERCENT_RANK()?
deptname.
What is the order used in the PERCENT_RANK() function in the provided SQL query?
By salary in descending order.
What does the PERCENT_RANK() function calculate?
The relative (percentile) rank of each row within its partition, computed as (rank - 1) / (rows in partition - 1).
What does the SQL query return along with the PERCENT_RANK()?
deptname, deptid, salary, ename, eid.
What is the range of the percentile ranking number produced by PERCENT_RANK()?
From zero to one.
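The query these cards refer to is not reproduced here, so the following is only a hypothetical reconstruction: PERCENT_RANK() partitioned by `deptname` and ordered by salary descending, on made-up rows. It assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("A", "Sales", 500), ("B", "Sales", 400), ("C", "Sales", 400),
        ("D", "Sales", 200), ("E", "Sales", 100), ("F", "IT", 900),
    ],
)

# PERCENT_RANK = (rank - 1) / (rows in partition - 1), between 0 and 1.
pct = conn.execute(
    """
    SELECT ename, deptname, salary,
           PERCENT_RANK() OVER (
               PARTITION BY deptname ORDER BY salary DESC
           ) AS pct_rank
    FROM emp
    ORDER BY ename
    """
).fetchall()

for row in pct:
    print(row)
```

In Sales the ranks are 1, 2, 2, 4, 5 over five rows, which gives 0, 0.25, 0.25, 0.75 and 1. A partition with a single row, like IT here, gets 0.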
What is the DENSE_RANK function and its differences from RANK()?
The DENSE_RANK function calculates the rank of a value in a group of rows based on the ORDER BY expression. Both RANK() and DENSE_RANK() assign the same rank to rows with equal values, but RANK() skips the following ranks after a tie, while DENSE_RANK() leaves no gaps.
What is the starting rank for each partition in DENSE_RANK?
1.
What SQL command is used to calculate DENSE_RANK?
SELECT DENSE_RANK() OVER (ORDER BY salary DESC).
What SQL clause is used with DENSE_RANK to specify the order?
The ORDER BY clause inside the OVER() clause.
In the provided SQL example, what is being ranked?
The salary of workers.
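The gap behaviour that separates RANK() from DENSE_RANK() is easiest to see with both functions in one query. The sketch below uses a hypothetical `emp` table with one tie and assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", 500), ("Bob", 500), ("Cy", 300), ("Dee", 200)],
)

rows = conn.execute(
    """
    SELECT ename, salary,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
    FROM emp
    ORDER BY salary DESC, ename
    """
).fetchall()

for row in rows:
    print(row)
```

After the tie at 500, RANK() jumps to 3 while DENSE_RANK() continues with 2.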
What is the output column name for the NTILE() function in the SQL example?
BUCKETS.
In the provided SQL example, how many buckets is the dataset partitioned into?
3 buckets.
What SQL clause is used with NTILE() to specify how to partition the data?
PARTITION BY.
What must the expression value in NTILE() result in?
A positive integer value for each partition.
What does the SQL NTILE() function do?
Partitions a logically ordered dataset into a number of buckets.
What is the purpose of the ORDER BY clause in the NTILE() function?
To determine the order of rows within each partition.
How are the buckets numbered in the NTILE() function?
From 1 through the expression value.
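The original NTILE() example is not shown on the cards, so this is a hypothetical stand-in: NTILE(3) with a BUCKETS output column, partitioned by `deptname` and ordered by salary. The data is invented and SQLite 3.25 or later is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("A", "Sales", 500), ("B", "Sales", 400), ("C", "Sales", 300),
        ("D", "Sales", 200), ("E", "Sales", 100),
        ("F", "IT", 900), ("G", "IT", 800),
    ],
)

# Each partition is split into up to 3 buckets, numbered from 1.
buckets = conn.execute(
    """
    SELECT ename, deptname, salary,
           NTILE(3) OVER (
               PARTITION BY deptname ORDER BY salary DESC
           ) AS buckets
    FROM emp
    ORDER BY deptname, salary DESC
    """
).fetchall()

for row in buckets:
    print(row)
```

When the rows do not divide evenly, the earlier buckets receive the extra rows: five Sales rows become buckets of size 2, 2 and 1, and the two IT rows land in buckets 1 and 2.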
How is the value returned by CUME_DIST() calculated?
N / (total rows in the partition), where N is the number of rows whose value is less than or equal to the current row's value (for ascending order).
What does the ORDER BY clause do in the CUME_DIST() function?
It orders the rows by SALARY within each partition.
In the SQL example, what is the purpose of the PARTITION BY clause?
To partition the data by DEPTNAME.
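As with the other examples, the SQL the cards refer to is not included, so the sketch below is an assumed equivalent: CUME_DIST() partitioned by DEPTNAME and ordered by SALARY, on invented rows, with SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("A", "Sales", 100), ("B", "Sales", 200), ("C", "Sales", 200),
        ("D", "Sales", 400), ("F", "IT", 900),
    ],
)

# CUME_DIST = (rows with salary <= current salary) / (rows in partition).
dist = conn.execute(
    """
    SELECT ename, deptname, salary,
           CUME_DIST() OVER (
               PARTITION BY deptname ORDER BY salary
           ) AS cume_dist
    FROM emp
    ORDER BY deptname, salary, ename
    """
).fetchall()

for row in dist:
    print(row)
```

For B and C, three of the four Sales rows have a salary of 200 or less, so both get 3/4 = 0.75.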