Fundamentals of Big Data - Ch7: Data Ingestion in Hadoop - Sqoop


11 Terms

1. What is Sqoop used for?

Sqoop imports data from a relational database (RDBMS) into HDFS and exports data from Hadoop back into an RDBMS.
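As a minimal illustration (the host, database, table, and path names here are placeholders, not from the chapter), an import and a matching export might look like:

  sqoop import \
    --connect jdbc:mysql://dbhost/sales \
    --username etl --password-file /user/etl/.pw \
    --table orders \
    --target-dir /data/orders

  sqoop export \
    --connect jdbc:mysql://dbhost/sales \
    --username etl --password-file /user/etl/.pw \
    --table order_summary \
    --export-dir /data/order_summary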

2. What kind of interface does Sqoop use?

Command-line interface (CLI).
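All of Sqoop 1's functionality is exposed as subcommands ("tools") of the single sqoop binary, for example:

  sqoop help       # lists the available tools (import, export, list-tables, ...)
  sqoop version    # prints the installed Sqoop version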

3. Which protocol does Sqoop use to connect to databases?

JDBC (Java Database Connectivity).
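Concretely, every Sqoop command takes a standard JDBC URL via --connect, and the matching JDBC driver JAR must be on Sqoop's classpath. A sketch with a placeholder MySQL host (-P prompts for the password):

  sqoop list-databases \
    --connect jdbc:mysql://dbhost:3306/ \
    --username etl -P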

4. Name some common data sources for Hadoop ingestion.

Sensors, telecom call detail records (CDRs), healthcare records, machine/log data, social media, and aerospace data.

5. What are the challenges in data ingestion?

Multiple sources, streaming/real-time ingestion, scalability, parallel processing, and data quality.

6. What is the role of connectors in Sqoop?

Database-specific connectors let Sqoop read table metadata (schemas, keys) and use vendor-optimized code paths to speed up data transfer.
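For instance, the bundled MySQL connector is selected automatically from the jdbc:mysql:// URL, and it is what lets catalog-reading commands like these work (all names are placeholders):

  sqoop list-tables --connect jdbc:mysql://dbhost/sales --username etl -P

  sqoop eval --connect jdbc:mysql://dbhost/sales --username etl -P \
    --query "SELECT COUNT(*) FROM orders"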

7. How does Sqoop 1 differ from Sqoop 2?

Sqoop 1 is a client-side tool driven entirely by the CLI and transfers data with Map-only jobs; Sqoop 2 adds a server with a REST API, centralized management of connectors and jobs, and better integration with other Hadoop tools.

8. What kind of job does Sqoop generate to transfer data?

A Map-only MapReduce job: the mappers read and write the data in parallel, so no reduce phase is needed.

9. By default, how many mappers does Sqoop use and what does each do?

Four mappers by default. Sqoop splits the table on its primary key range, so each mapper imports roughly 25% of the rows.
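Both the mapper count and the split column can be overridden. In this sketch (placeholder names), eight mappers each import roughly one-eighth of the order_id key range:

  sqoop import \
    --connect jdbc:mysql://dbhost/sales \
    --username etl -P \
    --table orders \
    --split-by order_id \
    --num-mappers 8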

10. How can you update existing records during export?

Add --update-key <column> (e.g., --update-key id) to the sqoop export command; rows whose key matches an existing database row are updated instead of inserted.
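For example, assuming a target table accounts whose primary key column is id (both placeholders), matching rows in the export data are turned into UPDATE statements:

  sqoop export \
    --connect jdbc:mysql://dbhost/sales \
    --username etl -P \
    --table accounts \
    --export-dir /data/accounts \
    --update-key id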

11. What does the --update-mode allowinsert option do?

It enables upsert semantics during export: rows that match the --update-key are updated, and rows that do not are inserted.
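Adding the flag to the previous export makes it an upsert (same placeholder names; note that allowinsert support varies by database connector):

  sqoop export \
    --connect jdbc:mysql://dbhost/sales \
    --username etl -P \
    --table accounts \
    --export-dir /data/accounts \
    --update-key id \
    --update-mode allowinsert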