Flashcards for TV Recommendation System Project

Q1: What libraries are imported at the start of the project?

A1: pandas, numpy, matplotlib.pyplot, TfidfVectorizer, cosine_similarity, linear_kernel.
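
As a rough sketch, the import block behind this answer might look like the following. The aliases `pd`, `np`, and `plt` are the conventional ones and are assumed here, not taken from the project.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Text vectorizer and the two pairwise similarity helpers named in the answer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel
```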

Q2: Why is pd.set_option("display.max_colwidth", 200) used?

A2: To show full text (no truncation) when displaying columns like overviews or taglines.
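
A small illustration of the effect, using a made-up overview string: with the pandas default width of 50, a long cell is cut off with "..." when the frame is displayed, while a width of 200 lets the whole cell through.

```python
import pandas as pd

long_text = ("word " * 30).strip()  # 149 characters, well past the default limit
df = pd.DataFrame({"overview": [long_text]})

pd.set_option("display.max_colwidth", 50)   # pandas default
truncated = repr(df)                        # cell is cut off and ends in "..."

pd.set_option("display.max_colwidth", 200)  # setting used in the project
full = repr(df)                             # whole cell is shown
```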

Q3: How is the dataset loaded efficiently?

A3: By selecting only needed columns (name, overview, tagline, up to 8 genres, and numeric stats).
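
One common way to load only selected columns is the `usecols` argument of `pd.read_csv`; whether the project uses exactly this call is an assumption. The inline CSV below is a stand-in for the real file, whose name is not given, and only two genre columns are shown instead of eight.

```python
import io
import pandas as pd

# Stand-in for the real dataset file (hypothetical contents).
csv_text = (
    "id,name,overview,tagline,genres[0].name,genres[1].name,vote_average,homepage\n"
    "1,Dark,A missing child exposes a town's secrets.,Everything is connected.,"
    "Drama,Mystery,8.4,https://example.com\n"
)

genre_cols = [f"genres[{i}].name" for i in range(2)]  # the project uses up to 8
wanted = ["name", "overview", "tagline", *genre_cols, "vote_average"]

# Columns not listed in usecols (id, homepage) are never parsed into memory.
df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(df.columns.tolist())
```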

Q4: What does fillna("") do during preprocessing?

A4: Replaces missing values with empty strings for safe concatenation.
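
A toy example (made-up rows) of why this matters: concatenating a string with a missing value gives a missing result, so one empty tagline would wipe out the whole combined text for that show.

```python
import pandas as pd

df = pd.DataFrame({
    "overview": ["A chemistry teacher turns to crime.", "A heist crew reunites."],
    "tagline": ["Change the equation.", None],  # second show has no tagline
})

raw = df["overview"] + " " + df["tagline"]           # row 1 becomes NaN

clean = df.fillna("")
joined = clean["overview"] + " " + clean["tagline"]  # row 1 stays a string
```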

Q5: How are genres combined into one field?

A5: By joining all genres[i].name columns into a single string called genres_combined.
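
A minimal sketch of the join, assuming the columns are literally named `genres[0].name`, `genres[1].name`, and so on, and that genres are separated by spaces. Three columns are used here for brevity; the project has up to eight.

```python
import pandas as pd

genre_cols = [f"genres[{i}].name" for i in range(3)]
df = pd.DataFrame(
    [["Drama", "Crime", None], ["Comedy", None, None]],
    columns=genre_cols,
)

# Join the non-empty genre names of each row into one space-separated string.
df["genres_combined"] = (
    df[genre_cols]
    .fillna("")
    .apply(lambda row: " ".join(g for g in row if g), axis=1)
)
```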

Q6: What is the purpose of the content column?

A6: It concatenates overview + tagline + genres into one text field for text analysis.
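
The concatenation could be sketched like this on a single made-up row. The whitespace cleanup at the end is an addition for tidiness, not something the cards mention.

```python
import pandas as pd

df = pd.DataFrame({
    "overview": ["Six friends navigate life in New York."],
    "tagline": [None],
    "genres_combined": ["Comedy"],
})

text_cols = ["overview", "tagline", "genres_combined"]
df[text_cols] = df[text_cols].fillna("")

# One text field per show; collapse the double space left by the empty tagline.
df["content"] = (
    (df["overview"] + " " + df["tagline"] + " " + df["genres_combined"])
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)
```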

Q7: How is genre distribution analyzed?

A7: By tokenizing genres_combined, counting occurrences, and plotting the top 20 genres.
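
A sketch of tokenize, count, and plot on toy data. Whitespace tokenization is assumed here; it would split multi-word genre names into separate tokens, so the real project may tokenize differently.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

genres_combined = pd.Series(
    ["Drama Crime", "Drama Mystery", "Comedy", "Drama Comedy"]
)

# One row per genre token, then count occurrences and keep the 20 most common.
genre_counts = genres_combined.str.split().explode().value_counts().head(20)

fig, ax = plt.subplots()
genre_counts.sort_values().plot.barh(ax=ax)  # largest bar ends up on top
ax.set_xlabel("Number of shows")
ax.set_title("Top genres")
```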

Q8: Which numeric features are used for correlation analysis?

A8: vote_average, vote_count, popularity, number_of_episodes, number_of_seasons.

Q9: How is the correlation heatmap created?

A9: By coercing the numeric columns to numeric values, computing .corr(), and visualizing the result as a heatmap.
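
A possible version of these three steps on made-up values. The cards do not say which plotting call draws the heatmap, so plain matplotlib `imshow` is used here as an assumption; `errors="coerce"` turns unparseable entries into NaN.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

num_cols = ["vote_average", "vote_count", "popularity",
            "number_of_episodes", "number_of_seasons"]
df = pd.DataFrame({
    "vote_average": ["8.9", "7.5", "6.1", "8.0"],
    "vote_count": ["12000", "3000", "bad", "5000"],  # one unparseable entry
    "popularity": [250.0, 80.5, 20.1, 120.3],
    "number_of_episodes": [62, 24, 10, 40],
    "number_of_seasons": [5, 2, 1, 4],
})

numeric = df[num_cols].apply(pd.to_numeric, errors="coerce")
corr = numeric.corr()  # pairwise Pearson correlation, NaN rows skipped per pair

fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(num_cols)))
ax.set_xticklabels(num_cols, rotation=45, ha="right")
ax.set_yticks(range(len(num_cols)))
ax.set_yticklabels(num_cols)
fig.colorbar(im, ax=ax)
```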

Q10: What does TfidfVectorizer(stop_words="english") do?

A10: Converts text into numerical vectors based on word importance, removing common stop words.
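
A quick way to see the stop-word removal is to fit the vectorizer on two made-up sentences and inspect the learned vocabulary: words such as "the" and "in" never become features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "The detective hunts the killer in a small town",
    "A family of detectives solves crimes in the city",
]

tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(docs)

# vocabulary_ maps each kept term to its column in the matrix.
vocab = sorted(tfidf.vocabulary_)
print(vocab)
```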

Q11: What is stored in tfidf_matrix?

A11: The TF-IDF representation of all shows’ content text.
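
A sketch on three made-up content strings: the matrix is sparse, with one row per show and one column per term. Because `TfidfVectorizer` L2-normalizes rows by default, the plain dot product from `linear_kernel` gives the same values as `cosine_similarity`, which is presumably why both appear in the imports.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

content = [
    "chemistry teacher crime drama",
    "lawyer crime drama",
    "friends comedy manhattan",
]

tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(content)
print(tfidf_matrix.shape)  # (number of shows, number of distinct terms)

# Each row has unit length, so dot products are already cosine similarities.
row_norms = np.sqrt(np.asarray(tfidf_matrix.multiply(tfidf_matrix).sum(axis=1))).ravel()
same = np.allclose(linear_kernel(tfidf_matrix), cosine_similarity(tfidf_matrix))
```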

Q12: Why is an indices mapping created?

A12: To quickly map show names to their row index in the dataframe.
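
One way to build such a mapping is a pandas Series keyed by show name. Keeping only the first row for a repeated title is an assumption about how duplicates should be handled.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Breaking Bad", "Dark", "Friends", "Dark"]})

# Series whose index is the show name and whose value is the row position.
indices = pd.Series(df.index, index=df["name"])
indices = indices[~indices.index.duplicated(keep="first")]  # one entry per title

print(indices["Dark"])  # -> 1
```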

Q13: How does the basic recommend function work?

A13:
  1. Look up the show's index.
  2. Compute cosine similarity with all shows.
  3. Sort and select the top N results (excluding itself).
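
The three steps can be sketched as follows on a made-up four-show dataset. The function signature and the use of `linear_kernel` (equivalent to cosine similarity on L2-normalized TF-IDF rows) are assumptions, not the project's exact code.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

df = pd.DataFrame({
    "name": ["Breaking Bad", "Better Call Saul", "Friends", "Seinfeld"],
    "content": [
        "chemistry teacher drug crime drama",
        "lawyer cartel crime drama",
        "friends comedy manhattan",
        "comedian friends comedy manhattan",
    ],
})

tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(df["content"])
indices = pd.Series(df.index, index=df["name"])

def recommend(title, n=5):
    # 1. Look up the show's row index.
    idx = int(indices[title])
    # 2. Similarity of that row against every row (slice keeps it 2-D).
    scores = linear_kernel(tfidf_matrix[idx:idx + 1], tfidf_matrix).ravel()
    # 3. Sort descending, drop the show itself, keep the top n.
    ranked = [i for i in scores.argsort()[::-1] if i != idx]
    return df["name"].iloc[ranked[:n]].tolist()

print(recommend("Friends", n=1))
```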

Q14: What improvement is added in the second recommend function?

A14: A fuzzy search fallback that suggests close matches if the title is not found.
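
The cards do not say which fuzzy-matching tool the project uses. As one possibility, the standard library's `difflib.get_close_matches` can supply suggestions when the exact title is missing; `resolve_title` is a hypothetical helper name.

```python
from difflib import get_close_matches

titles = ["Breaking Bad", "Better Call Saul", "Friends", "Seinfeld"]

def resolve_title(query, titles, max_suggestions=3):
    """Return (exact_match, suggestions) for a user-typed title."""
    if query in titles:
        return query, []
    # No exact hit: offer the closest known titles, best match first.
    return None, get_close_matches(query, titles, n=max_suggestions, cutoff=0.6)

match, suggestions = resolve_title("Breking Bad", titles)
print(match, suggestions)
```

A recommend function could call such a helper first and, when `match` is `None`, print the suggestions instead of raising a lookup error.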

Q15: How are recommendations visualized?

A15: With a horizontal bar chart showing the cosine similarity of the top similar shows.
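
A minimal plotting sketch on the same kind of toy data, with the query show fixed to the first row. The labels and title text are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

names = ["Friends", "Seinfeld", "Breaking Bad", "Better Call Saul"]
content = [
    "friends comedy manhattan",
    "comedian friends comedy manhattan",
    "chemistry teacher drug crime drama",
    "lawyer cartel crime drama",
]

tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(content)
scores = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix).ravel()

# Rank every show except the query (row 0), most similar first.
ranked = sorted(range(1, len(names)), key=lambda i: scores[i], reverse=True)

fig, ax = plt.subplots()
ax.barh([names[i] for i in ranked], [scores[i] for i in ranked])
ax.invert_yaxis()  # most similar show at the top
ax.set_xlabel("Cosine similarity")
ax.set_title(f"Shows similar to {names[0]}")
```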
