Flashcards for TV Recommendation System Project

Q1: What libraries are imported at the start of the project?

A1: pandas, numpy, and matplotlib.pyplot, plus TfidfVectorizer, cosine_similarity, and linear_kernel from scikit-learn.
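
As a rough sketch, the imports named in this card could look like the block below. The aliases `pd`, `np`, and `plt` are the usual conventions and are an assumption here, since the card lists only the names.

```python
# Data handling and plotting (aliases are conventional, not confirmed by the card).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Text vectorization and similarity helpers from scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel
```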

Q2: Why is pd.set_option("display.max_colwidth", 200) used?

A2: To display up to 200 characters per cell instead of the default 50, so long text columns like overviews or taglines are not cut off.
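
A small illustration of the effect, using a hypothetical one-row frame with a 120-character overview. It compares the printed form of the frame before and after the option is set.

```python
import pandas as pd

# Hypothetical one-row frame with a 120-character overview.
long_text = "x" * 120
df = pd.DataFrame({"overview": [long_text]})

pd.reset_option("display.max_colwidth")     # start from the default width (50)
default_view = repr(df)                     # the long cell is cut and ends in "..."

pd.set_option("display.max_colwidth", 200)  # allow up to 200 characters per cell
wide_view = repr(df)                        # the full 120-character text is shown

pd.reset_option("display.max_colwidth")     # restore the default afterwards
```

The option only changes how frames are printed; the stored data is the same in both cases.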

Q3: How is the dataset loaded efficiently?

A3: By selecting only needed columns (name, overview, tagline, up to 8 genres, and numeric stats).
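
One way to do this with pandas is the `usecols` argument of `read_csv`. The sketch below reads from an in-memory string instead of the project's file, and the sample rows and the `id` and `homepage` columns are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file; rows and extra columns are invented.
csv_text = (
    "id,name,overview,tagline,genres[0].name,genres[1].name,vote_average,homepage\n"
    "1,Show A,A detective story.,Trust no one.,Crime,Drama,8.1,http://example.com/a\n"
    "2,Show B,A space adventure.,,Sci-Fi,,7.4,http://example.com/b\n"
)

# Text columns, the genre slots present in this sample, and one numeric stat.
wanted = ["name", "overview", "tagline",
          "genres[0].name", "genres[1].name", "vote_average"]

# usecols makes the parser skip every column that is not listed.
df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
```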

Q4: What does fillna("") do during preprocessing?

A4: Replaces missing values with empty strings for safe concatenation.
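
A minimal sketch of why this matters, using two hypothetical rows where the second show has no tagline:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the second show has no tagline.
df = pd.DataFrame({
    "overview": ["A detective story.", "A space adventure."],
    "tagline": ["Trust no one.", np.nan],
})

# Without fillna, one missing piece makes the whole concatenated value missing.
raw = df["overview"] + " " + df["tagline"]

# With fillna(""), the missing tagline simply contributes nothing.
text_cols = ["overview", "tagline"]
df[text_cols] = df[text_cols].fillna("")
safe = df["overview"] + " " + df["tagline"]
```

Here `raw` is missing for the second show, while `safe` still holds its overview.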

Q5: How are genres combined into one field?

A5: By joining all genres[i].name columns into a single string called genres_combined.
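
A possible implementation is sketched below. The sample frame has three genre slots rather than eight, and the `", "` separator is an assumption, since the card does not say how the genres are delimited.

```python
import pandas as pd

# Hypothetical sample with three genre slots; the project uses up to eight.
df = pd.DataFrame({
    "name": ["Show A", "Show B"],
    "genres[0].name": ["Crime", "Sci-Fi & Fantasy"],
    "genres[1].name": ["Drama", None],
    "genres[2].name": ["Mystery", None],
})

# Pick up every genres[i].name column that is present.
genre_cols = [c for c in df.columns
              if c.startswith("genres[") and c.endswith("].name")]

# Join the non-empty genres of each row. The ", " separator is an assumption.
df["genres_combined"] = (
    df[genre_cols]
    .fillna("")
    .apply(lambda row: ", ".join(g for g in row if g), axis=1)
)
```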

Q6: What is the purpose of the content column?

A6: It concatenates overview + tagline + genres into one text field for text analysis.
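
A short sketch of building such a field from hypothetical rows. Joining with single spaces and collapsing repeated whitespace are choices made for this example, not details taken from the project.

```python
import pandas as pd

# Hypothetical rows; the second show has no tagline.
df = pd.DataFrame({
    "overview": ["A detective story.", "A space adventure."],
    "tagline": ["Trust no one.", None],
    "genres_combined": ["Crime, Drama", "Sci-Fi & Fantasy"],
})

parts = ["overview", "tagline", "genres_combined"]
df[parts] = df[parts].fillna("")

# Concatenate the three text fields, then collapse the extra spaces
# left behind by empty parts.
content = df["overview"] + " " + df["tagline"] + " " + df["genres_combined"]
df["content"] = content.str.split().str.join(" ")
```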

Q7: How is genre distribution analyzed?

A7: By tokenizing genres_combined, counting occurrences, and plotting the top 20 genres.
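
The counting step might look like the sketch below. It assumes genres are separated by `", "` and uses a few invented rows; the project's actual tokenization may differ.

```python
import pandas as pd

# Hypothetical genres_combined values; the last show has no genre.
genres_combined = pd.Series([
    "Crime, Drama, Mystery",
    "Drama",
    "Comedy, Drama",
    "",
])

genre_counts = (
    genres_combined
    .str.split(", ")          # tokenize each row into a list of genres
    .explode()                # one genre per row
    .loc[lambda s: s != ""]   # drop the empty token from shows with no genre
    .value_counts()           # occurrences per genre, most frequent first
    .head(20)                 # keep the top 20
)
```

The resulting Series can be passed straight to a bar plot, for example with `genre_counts.plot(kind="bar")`.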

Q8: Which numeric features are used for correlation analysis?

A8: vote_average, vote_count, popularity, number_of_episodes, number_of_seasons.

Q9: How is the correlation heatmap created?

A9: By converting numeric columns to numbers, computing .corr(), and visualizing with a heatmap.
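
These three steps can be sketched as follows. The values are invented, and the heatmap is drawn with matplotlib's `imshow` as an assumption, because the card does not name the plotting call the project uses.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

numeric_cols = ["vote_average", "vote_count", "popularity",
                "number_of_episodes", "number_of_seasons"]

# Hypothetical values stored as strings, including one non-numeric entry.
df = pd.DataFrame({
    "vote_average": ["8.1", "7.4", "6.0", "9.0"],
    "vote_count": ["1200", "300", "n/a", "5000"],
    "popularity": ["55.2", "12.1", "3.3", "80.0"],
    "number_of_episodes": ["62", "10", "24", "73"],
    "number_of_seasons": ["5", "1", "2", "8"],
})

# errors="coerce" turns unparseable entries into NaN instead of raising.
numeric = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
corr = numeric.corr()  # pairwise Pearson correlation, skipping NaN pairs

# Draw the correlation matrix as a colored grid with labeled axes.
fig, ax = plt.subplots(figsize=(6, 5))
image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(numeric_cols)))
ax.set_xticklabels(numeric_cols, rotation=45, ha="right")
ax.set_yticks(range(len(numeric_cols)))
ax.set_yticklabels(numeric_cols)
fig.colorbar(image, ax=ax, label="Pearson correlation")
fig.tight_layout()
```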

Q10: What does TfidfVectorizer(stop_words="english") do?

A10: Converts text into numerical vectors based on word importance, removing common stop words.
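
A small demonstration on three made-up sentences, showing that stop words such as "the" never enter the vocabulary:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical content strings standing in for the shows' text.
content = [
    "A detective hunts the killer in the city.",
    "A crew explores the edge of the galaxy.",
    "The detective and the crew share a city.",
]

vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(content)  # sparse, one row per text

# Maps each kept term to its column index in tfidf_matrix.
vocabulary = vectorizer.vocabulary_
```

Terms that appear in fewer documents get a higher weight, so "galaxy" counts for more than "detective" or "city", which each appear in two of the three sentences.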

Q11: What is stored in tfidf_matrix?

A11: The TF-IDF representation of all shows’ content text: a sparse matrix with one row per show and one column per vocabulary term.

Q12: Why is an indices mapping created?

A12: To quickly map show names to their row index in the dataframe.
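
Such a mapping is commonly built as a pandas Series keyed by name. The sketch below uses invented titles, and keeping only the first row for a repeated name is an assumption about how duplicates should be handled.

```python
import pandas as pd

# Hypothetical titles; "Show B" appears twice.
df = pd.DataFrame({"name": ["Show A", "Show B", "Show C", "Show B"]})

# Series whose index is the show name and whose values are row positions.
indices = pd.Series(df.index, index=df["name"])

# Keep the first row for any repeated name so a lookup returns one index.
indices = indices[~indices.index.duplicated(keep="first")]
```

With this in place, `indices["Show C"]` returns the row index directly instead of scanning the name column.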

Q13: How does the basic recommend function work?

A13:
1. Look up show index.
2. Compute cosine similarity with all shows.
3. Sort and select top N results (excluding itself).
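
The steps above can be sketched as a small function. The four shows, the `top_n` parameter name, and the returned frame layout are illustrative assumptions, not the project's actual code. `linear_kernel` is used because TF-IDF rows are L2-normalized by default, so the dot product equals cosine similarity.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical shows and content text.
df = pd.DataFrame({
    "name": ["Cop Show", "Detective Show", "Space Show", "Baking Show"],
    "content": [
        "police detective solves murder crime",
        "detective investigates murder crime mystery",
        "astronauts explore distant galaxy planets",
        "bakers compete cakes bread pastry",
    ],
})

tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(df["content"])
indices = pd.Series(df.index, index=df["name"])

def recommend(title, top_n=2):
    """Return the top_n shows most similar to title, excluding title itself."""
    idx = int(indices[title])                       # 1. look up the show index
    scores = linear_kernel(                         # 2. similarity to every show
        tfidf_matrix[idx:idx + 1], tfidf_matrix
    ).ravel()
    ranked = scores.argsort()[::-1]                 # 3. sort, highest first
    ranked = [i for i in ranked if i != idx][:top_n]
    return pd.DataFrame({
        "name": df["name"].iloc[ranked].to_numpy(),
        "similarity": scores[ranked],
    })

result = recommend("Cop Show", top_n=2)
```

Filtering by index rather than dropping the first sorted entry keeps the function correct even when another show ties with the query at a similarity of 1.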

Q14: What improvement is added in the second recommend function?

A14: A fuzzy search fallback that suggests close matches if the title is not found.
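
The card does not say which fuzzy-matching tool the project uses. As one possibility, the standard library's `difflib.get_close_matches` can supply the suggestions. The helper name `resolve_title`, the sample titles, and the 0.6 cutoff are all assumptions made for this sketch.

```python
import difflib

# Hypothetical catalogue of known titles.
titles = ["Breaking Bad", "Better Call Saul", "The Wire"]

def resolve_title(query, known_titles, max_suggestions=3):
    """Return (exact_title, suggestions).

    exact_title is the query when it is a known title, otherwise None.
    suggestions lists close matches and is empty on an exact hit.
    """
    if query in known_titles:
        return query, []
    suggestions = difflib.get_close_matches(
        query, known_titles, n=max_suggestions, cutoff=0.6
    )
    return None, suggestions

match, suggestions = resolve_title("Breaking Bed", titles)
```

A recommend function could call a helper like this first and print the suggestions when no exact title is found, instead of raising a lookup error.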

Q15: How are recommendations visualized?

A15: With a horizontal bar chart showing the cosine similarity of the top similar shows.
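
A minimal sketch of such a chart with matplotlib's `barh`. The show names and similarity scores are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical recommendation results, already sorted from most to least similar.
names = ["Detective Show", "Mystery Show", "Legal Show"]
scores = [0.62, 0.41, 0.18]

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.barh(names, scores)   # one horizontal bar per recommended show
ax.invert_yaxis()               # put the most similar show at the top
ax.set_xlim(0, 1)               # TF-IDF cosine similarity lies in [0, 1]
ax.set_xlabel("Cosine similarity")
ax.set_title('Shows most similar to "Cop Show"')
fig.tight_layout()
```

Horizontal bars leave room for long show titles on the axis, which vertical bars would force to overlap or rotate.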
