FDS: CAT1

248 Terms

1

What types of data are created during day-to-day activities?

Web data, e-commerce, financial transactions, online trading, social network data, and vehicle data.

2

What type of data is primarily found on the web?

Text data.

3

What is streaming data?

Data that is continuously generated and processed in real-time.

4

What is semi-structured data?

Data that does not conform to a rigid schema but still carries tags or markers that describe its structure, such as XML or JSON.

5

What is an example of data used in social networks?

Graph data, including Semantic Web data like RDF.

6

What is relational data?

Data organized in tables, often used for transactions and legacy systems.

7

What type of data represents relationships and connections?

Graph data.

8

In which industries is data science utilized?

Travel (including trip planning), business, healthcare, automobile technology, the airline industry, and the logistics industry.

9

What mathematical concept is important in data science?

Mathematical Modeling.

10

What is data science?

An interdisciplinary field of scientific methods, processes, algorithms, and systems to extract knowledge or insights from data.

11

Which statistical methods are used in data science?

Statistical and Stochastic modeling, Probability.

12

Which field of study in data science involves pattern recognition and visualization?

Computer Science.

13

What is one of the main goals of data science?

To manage, manipulate, extract, and interpret knowledge from large amounts of data.

14

What types of data does data science work with?

Both structured and unstructured data.

15

What does data cleaning achieve in the data science process?

Transforms raw data into usable data.

16

What is involved in the data collection step of the data science process?

Gathering the right set of high quality, targeted data.

17

What is the first step in the data science process?

Framing and understanding the problem.

18

What is the goal of model building in the data science process?

To use the collected data to make predictions.

19

What is the purpose of Exploratory Data Analysis in the data science process?

To uncover valuable insights about the data.

20

What does model deployment involve in the data science process?

Deploying the model in a real-time production environment.

21

Why is communicating results important in the data science process?

To communicate key findings to the stakeholders.

22

What type of skills are required for Business Intelligence?

Basic statistics, business knowledge, data transformation, and visualization skills.

23

What does Data Science focus on?

Extracting information from datasets and creating forecasts.

24

What is the main objective of Business Intelligence?

To identify historical trends and answer questions about what happened during the last period.

25

How does Data Science manage data?

Designed to manage a large volume of dynamic and less structured data.

26

What skills are necessary for Data Science?

More technical skills like coding, data mining, advanced statistics, and domain knowledge.

27

What makes Data Science more complex than Business Intelligence?

Its capacity for forecasting, ability to manage dynamic data, and requirements for more advanced skills.

28

How is data collection and management approached in Business Intelligence?

Designed to manage well-organized data.

29

What process is used for data integration in Data Science?

ELT (Extract-Load-Transform).

30

What storage method is used in Data Science?

Real-time clusters.

31

What method does Data Science utilize?

The scientific method.

32

Which has higher complexity, Data Science or Business Intelligence?

Data Science.

33

What does Business Intelligence primarily focus on?

The past and present.

34

What field uses mathematics and statistics to discover hidden patterns in data?

Data Science.

35

What tools are commonly used in Data Science?

SAS, BigML, MATLAB, Excel, etc.

36

What questions does Data Science deal with?

What will happen and what if.

37

What technologies are available for handling large data sets in Data Science?

Technologies such as Hadoop.

38

How flexible is Data Science in terms of data sources?

Much more flexible as data sources can be added as per requirement.

39

What do data scientists build using analysis?

Mathematical models or machine learning models.

40

What is the primary role of a data scientist?

To gather data, process it, manipulate it as per requirements, and feed it for analysis.

41

What statistical concepts are important in data science?

Basic statistics, probability distribution, dimension reduction.

42

Which tools are used for data wrangling and management?

Oracle, MongoDB, Hadoop.

43

What programming languages are required for data science?

Python, R, SAS, SQL, MATLAB.

44

What tools are commonly used for data visualization?

Tableau, Excel.

45

What are some machine learning techniques used in data science?

Linear regression, decision tree.

46

What is a key advantage of SQL regarding data access?

It can directly access a large amount of data without copying it to other applications.

47

What tasks are easier to perform in SQL compared to Excel?

Joining tables, automating, and reusing code.

48

How does data analysis in SQL compare to Excel or CSV files?

Data analysis done in SQL is easy to audit and replicate.

49

What is a limitation of Excel when dealing with databases?

Excel is not useful for large-sized databases.

50

What types of data can SQL handle?

Data of almost any shape and massive size.

51

What do the LOWER() and UPPER() functions do in SQL?

LOWER() converts text data into lower case, while UPPER() converts text data into upper case.

52

What is data munging?

The phase of data transformation that simplifies data for better understanding.

53

What do the TRIM, LTRIM, and RTRIM operations do in SQL?

TRIM removes leading and trailing blank spaces, LTRIM removes leading spaces, and RTRIM removes trailing spaces from a given input string.
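
The case and trimming functions from the cards above can be tried out with a short sketch. It assumes SQLite, driven through Python's built-in sqlite3 module, as a stand-in engine; the input strings are made up for illustration, and other databases may differ in details.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Illustrative literals only; '  sql  ' has two spaces on each side.
row = conn.execute(
    """
    SELECT LOWER('Data Science'),
           UPPER('Data Science'),
           TRIM('  sql  '),
           LTRIM('  sql  '),
           RTRIM('  sql  ')
    """
).fetchone()
conn.close()

print(row)
```

LTRIM keeps the trailing spaces and RTRIM keeps the leading ones, which is easiest to see by comparing the lengths of the returned strings.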

54

What does the REPLACE function do in SQL?

Replaces all occurrences of a source substring with a target substring in a given string.

55

What does the SUBSTR function do in SQL?

Returns a substring of a given string from a specified position.
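
A minimal sketch of REPLACE and SUBSTR, again assuming SQLite through Python's sqlite3 module. The strings are hypothetical; note that SUBSTR positions start at 1 in SQLite, and the optional third argument is a length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT REPLACE('2024-01-15', '-', '/'),  -- every '-' becomes '/'
           SUBSTR('data munging', 6),        -- from position 6 to the end
           SUBSTR('data munging', 1, 4)      -- 4 characters from position 1
    """
).fetchone()
conn.close()

print(row)
```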

56

What SQL operations are used to aggregate data?

MAX finds the maximum value, MIN finds the minimum value, AVG calculates the average, SUM calculates the total sum, and COUNT returns the number of records or values.
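
The five aggregate functions can be seen side by side in a small sketch. It assumes SQLite via Python's sqlite3 module and a hypothetical employee table with three invented salaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (eid INTEGER, ename TEXT, salary INTEGER);
    INSERT INTO employee VALUES
        (1, 'Asha', 40000), (2, 'Ben', 50000), (3, 'Chen', 60000);
    """
)

# One result row summarising the whole table.
max_sal, min_sal, avg_sal, sum_sal, n_rows = conn.execute(
    "SELECT MAX(salary), MIN(salary), AVG(salary), SUM(salary), COUNT(*) "
    "FROM employee"
).fetchone()
conn.close()

print(max_sal, min_sal, avg_sal, sum_sal, n_rows)
```

In SQLite, AVG returns a floating-point value even when every input is an integer.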

57

What does an inner join do in SQL?

Returns only the rows from both tables for which the join condition finds a match; unmatched rows from either table are dropped.

58

What does a full outer join return in SQL?

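All rows from both the left and the right table: matching rows are combined, and rows without a match on the other side are still returned, with NULLs in the other table's columns.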

59

What does a left outer join return in SQL?

All the rows of the left table plus the matching rows of the right table; where there is no match, the right table's columns are NULL.

60

What does a right outer join return in SQL?

All the rows of the right table plus the matching rows of the left table; where there is no match, the left table's columns are NULL.
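
The join cards above can be compared on one tiny dataset. This is a sketch assuming SQLite via Python's sqlite3 module; the employee and department tables and their rows are hypothetical. RIGHT and FULL OUTER JOIN are only available in newer SQLite releases, so the right-join behaviour is shown here by swapping the table order in a LEFT OUTER JOIN, which returns the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (deptid INTEGER, deptname TEXT);
    CREATE TABLE employee (eid INTEGER, ename TEXT, deptid INTEGER);
    INSERT INTO department VALUES (10, 'Sales'), (20, 'IT'), (40, 'HR');
    -- Chen's deptid 30 has no department row; HR has no employees.
    INSERT INTO employee VALUES (1, 'Asha', 10), (2, 'Ben', 20), (3, 'Chen', 30);
    """
)

# Inner join: only employees whose deptid exists in department.
inner = conn.execute(
    "SELECT e.ename, d.deptname FROM employee e "
    "INNER JOIN department d ON e.deptid = d.deptid ORDER BY e.eid"
).fetchall()

# Left outer join: every employee, NULL deptname when unmatched.
left = conn.execute(
    "SELECT e.ename, d.deptname FROM employee e "
    "LEFT OUTER JOIN department d ON e.deptid = d.deptid ORDER BY e.eid"
).fetchall()

# Same rows as 'employee RIGHT OUTER JOIN department': every department kept.
right_like = conn.execute(
    "SELECT e.ename, d.deptname FROM department d "
    "LEFT OUTER JOIN employee e ON e.deptid = d.deptid ORDER BY d.deptid"
).fetchall()
conn.close()

print(inner, left, right_like, sep="\n")
```

A full outer join would return the union of the last two results: Chen with no department and HR with no employee, alongside the two matched pairs.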

61

What is the purpose of filtering in data science?

To remove redundant and useless data, making the dataset more efficient and useful.

62

Why is filtering important during data analysis?

It allows for the retrieval of a specific part of the actual data needed for analysis.

63

What types of records may exist in a dataset that require filtering?

Redundant (duplicate) records or partial (incomplete) records.

64

What can happen if redundant records are not removed from a dataset?

It may result in wrong analysis.

65

What does the data filtering process consist of?

Different strategies for refining and reducing datasets.

66

How can query performance be enhanced?

By running queries against refined (already filtered) data rather than the full raw dataset.

67

What is one benefit of data filtering?

It can reduce strain on applications.
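
One possible way to filter out duplicate rows and rows with a missing value is sketched below. It assumes SQLite via Python's sqlite3 module; the visits table, its columns, and its rows are hypothetical, and DISTINCT plus a WHERE condition is only one of several filtering strategies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE visits (visitor TEXT, page TEXT);
    -- One exact duplicate and one row with a missing page.
    INSERT INTO visits VALUES
        ('ana', 'home'), ('ana', 'home'), ('bo', 'cart'), ('bo', NULL);
    """
)

raw_count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]

# WHERE drops the incomplete row, DISTINCT collapses the duplicate.
filtered = conn.execute(
    "SELECT DISTINCT visitor, page FROM visits "
    "WHERE page IS NOT NULL ORDER BY visitor"
).fetchall()
conn.close()

print(raw_count, filtered)
```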

68

What is the purpose of the LIKE clause and its wildcards?

The LIKE clause specifies a pattern-matching condition. It uses two wildcards: the percent sign '%' matches any string of zero or more characters, and the underscore '_' matches exactly one character (a letter, digit, or any other single character).
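
The two wildcards can be compared with a short sketch, assuming SQLite via Python's sqlite3 module and a hypothetical people table with invented names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE people (name TEXT);
    INSERT INTO people VALUES ('Sam'), ('Sara'), ('Samuel'), ('Tom'), ('Tim');
    """
)

# '%' matches zero or more characters: every name starting with 'Sa'.
starts_with_sa = [
    r[0] for r in conn.execute(
        "SELECT name FROM people WHERE name LIKE 'Sa%' ORDER BY name"
    )
]

# '_' matches exactly one character: 'T', any single character, then 'm'.
t_any_m = [
    r[0] for r in conn.execute(
        "SELECT name FROM people WHERE name LIKE 'T_m' ORDER BY name"
    )
]
conn.close()

print(starts_with_sa, t_any_m)
```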

69

What does the term 'window' refer to in SQL window functions?

A set of rows.

70

In which SQL clauses can window functions be called?

With the SELECT statement or the ORDER BY clause.

71

What do SQL window functions calculate their result based on?

A set of rows rather than on a single row.

72

How do SQL window functions differ from aggregate functions?

Window functions keep every individual row and add the window result alongside its attributes, while aggregate functions used with GROUP BY collapse each group of rows into a single result row.
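
The difference shows up in the number of rows returned. The sketch below assumes SQLite 3.25 or newer (the first release with window functions) via Python's sqlite3 module; the employee table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (eid INTEGER, ename TEXT, salary INTEGER, deptname TEXT);
    INSERT INTO employee VALUES
        (1, 'Asha', 40000, 'Sales'), (2, 'Ben', 60000, 'Sales'),
        (3, 'Dev', 70000, 'IT'),     (4, 'Eli', 90000, 'IT');
    """
)

# Aggregate with GROUP BY: one row per department.
grouped = conn.execute(
    "SELECT deptname, AVG(salary) FROM employee "
    "GROUP BY deptname ORDER BY deptname"
).fetchall()

# Window function: one row per employee, with the department average attached.
windowed = conn.execute(
    "SELECT ename, deptname, salary, "
    "       AVG(salary) OVER (PARTITION BY deptname) AS dept_avg "
    "FROM employee ORDER BY eid"
).fetchall()
conn.close()

print(grouped)
print(windowed)
```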

73

Can window functions be called in the WHERE clause?

No, they cannot be called in the WHERE clause.

74

What is the purpose of the PARTITION BY clause in the OVER() clause?

To define window partitions and form groups of rows for window functions.

75

What is the role of the ORDER BY clause in SQL window functions?

It provides logical sorting of rows within a partition.

76

Which SQL clause defines the window for window functions?

The OVER() clause.

77

Can multiple window functions be included in a single query?

Yes.

78

What is the RANK() function and its usage?

The RANK() function assigns a rank to each row within a partition of a result set. It can rank salaries within departments using the PARTITION BY clause, or rank all salaries across the entire dataset without it.

79

How do you rank salaries in descending order using SQL?

By using ORDER BY SALARY DESC inside the OVER() clause of the RANK() function.

80

What SQL function is used to rank salaries within departments?

RANK()
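
The RANK() cards can be illustrated with one query that ranks salaries both within departments and across the whole table. This is a sketch, not the query the cards were written from: it assumes SQLite 3.25+ via Python's sqlite3 module, and the employee rows and the aliases dept_rank and overall_rank are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (eid INTEGER, ename TEXT, salary INTEGER, deptname TEXT);
    INSERT INTO employee VALUES
        (1, 'Asha', 50000, 'Sales'), (2, 'Ben', 60000, 'Sales'),
        (3, 'Chen', 60000, 'Sales'), (4, 'Dev', 70000, 'IT'),
        (5, 'Eli', 90000, 'IT');
    """
)

rows = conn.execute(
    """
    SELECT ename, deptname, salary,
           RANK() OVER (PARTITION BY deptname ORDER BY salary DESC) AS dept_rank,
           RANK() OVER (ORDER BY salary DESC) AS overall_rank
    FROM employee
    ORDER BY eid
    """
).fetchall()
conn.close()

for r in rows:
    print(r)
```

Ben and Chen tie on salary, so they share a rank, and the rank that follows the tie is skipped.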

81

In the SQL query, what is used to partition the data for PERCENT_RANK()?

deptname.

82

What is the order used in the PERCENT_RANK() function in the provided SQL query?

By salary in descending order.

83

What does the PERCENT_RANK() function calculate?

The relative (percentile) rank of each row within its partition, computed as (rank - 1) / (number of rows in the partition - 1).

84

What does the SQL query return along with the PERCENT_RANK()?

deptname, deptid, salary, ename, eid.

85

What is the range of the percentile ranking number produced by PERCENT_RANK()?

From zero to one.
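
The query these PERCENT_RANK() cards refer to is not included in the deck, so the following is only a reconstruction in the same spirit: partition by deptname, order by salary descending. It assumes SQLite 3.25+ via Python's sqlite3 module; the rows and the alias pct_rank are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (eid INTEGER, ename TEXT, salary INTEGER, deptname TEXT);
    INSERT INTO employee VALUES
        (1, 'Asha', 40000, 'Sales'), (2, 'Ben', 50000, 'Sales'),
        (3, 'Chen', 60000, 'Sales'), (4, 'Dev', 70000, 'IT'),
        (5, 'Eli', 90000, 'IT');
    """
)

rows = conn.execute(
    """
    SELECT ename, deptname, salary,
           PERCENT_RANK() OVER (PARTITION BY deptname ORDER BY salary DESC) AS pct_rank
    FROM employee
    ORDER BY eid
    """
).fetchall()
conn.close()

for r in rows:
    print(r)
```

Within each department the highest salary gets 0.0 and the lowest gets 1.0, with the remaining rows spread in between.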

86

What is the DENSE_RANK function and its differences from RANK()?

The DENSE_RANK function calculates the rank of a value in a group of rows based on the ORDER BY expression. Both RANK() and DENSE_RANK() assign the same rank to rows with equal values, but RANK() skips the following rank numbers after a tie, whereas DENSE_RANK() leaves no gaps and continues with the next consecutive rank.

87

What is the starting rank for each partition in DENSE_RANK?

1. Ranking restarts at 1 in every partition.

88

What SQL command is used to calculate DENSE_RANK?

SELECT DENSE_RANK() OVER (ORDER BY salary DESC).

89

What SQL clause is used with DENSE_RANK to specify the order?

The ORDER BY clause inside the OVER() clause.

90

In the provided SQL example, what is being ranked?

The salary of workers.
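
Putting RANK() and DENSE_RANK() in the same query makes the gap behaviour visible. This sketch assumes SQLite 3.25+ via Python's sqlite3 module; the worker table and its salaries are hypothetical and are not the data behind the cards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE worker (eid INTEGER, salary INTEGER);
    -- Two workers tie on 80000.
    INSERT INTO worker VALUES (1, 90000), (2, 80000), (3, 80000), (4, 70000);
    """
)

rows = conn.execute(
    """
    SELECT salary,
           RANK()       OVER (ORDER BY salary DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
    FROM worker
    ORDER BY salary DESC, eid
    """
).fetchall()
conn.close()

for r in rows:
    print(r)
```

After the tie at rank 2, RANK() jumps to 4 while DENSE_RANK() continues with 3.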

91

What is the output column name for the NTILE() function in the SQL example?

BUCKETS.

92

In the provided SQL example, how many buckets is the dataset partitioned into?

3 buckets.

93

What SQL clause is used with NTILE() to specify how to partition the data?

PARTITION BY.

94

What must the expression value in NTILE() result in?

A positive integer value for each partition.

95

What does the SQL NTILE() function do?

Partitions a logically ordered dataset into a number of buckets.

96

What is the purpose of the ORDER BY clause in the NTILE() function?

To determine the order of rows within each partition.

97

How are the buckets numbered in the NTILE() function?

From 1 through the expression value.
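
A minimal NTILE() sketch follows. The bucket count of 3 and the output column name buckets come from the cards; everything else (SQLite 3.25+ via Python's sqlite3 module, the employee rows) is an assumption, and PARTITION BY is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (ename TEXT, salary INTEGER);
    INSERT INTO employee VALUES
        ('A', 100000), ('B', 90000), ('C', 80000), ('D', 70000),
        ('E', 60000),  ('F', 50000), ('G', 40000);
    """
)

rows = conn.execute(
    """
    SELECT ename, salary,
           NTILE(3) OVER (ORDER BY salary DESC) AS buckets
    FROM employee
    ORDER BY salary DESC
    """
).fetchall()
conn.close()

bucket_numbers = [r[2] for r in rows]
print(bucket_numbers)
```

Seven rows do not divide evenly into three buckets, so the first bucket receives the extra row.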

98

How is the value returned by CUME_DIST() calculated?

N divided by the total number of rows in the partition, where N is the number of rows whose value is less than or equal to the current row's value (for an ascending ORDER BY).

99

What does the ORDER BY clause do in the CUME_DIST() function?

It orders the rows by SALARY within each partition.

100

In the SQL example, what is the purpose of the PARTITION BY clause?

To partition the data by DEPTNAME.
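
The CUME_DIST() cards can be checked against a small example. The original query is not part of the deck, so this is a sketch under assumptions: SQLite 3.25+ via Python's sqlite3 module, an invented employee table, ascending ORDER BY salary, and a hypothetical alias cume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (eid INTEGER, ename TEXT, salary INTEGER, deptname TEXT);
    INSERT INTO employee VALUES
        (1, 'Asha', 40000, 'Sales'), (2, 'Ben', 50000, 'Sales'),
        (3, 'Chen', 50000, 'Sales'), (4, 'Dana', 60000, 'Sales'),
        (5, 'Dev', 70000, 'IT'),     (6, 'Eli', 90000, 'IT');
    """
)

rows = conn.execute(
    """
    SELECT ename, deptname, salary,
           CUME_DIST() OVER (PARTITION BY deptname ORDER BY salary) AS cume
    FROM employee
    ORDER BY deptname, salary, eid
    """
).fetchall()
conn.close()

for r in rows:
    print(r)
```

In Sales, three of the four salaries are less than or equal to 50000, so both tied rows get 3/4 = 0.75.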