What is Sqoop used for?
Sqoop transfers bulk data between relational databases (RDBMS) and Hadoop: it imports tables from an RDBMS into HDFS and exports data from HDFS back to the RDBMS.
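Example (a minimal sketch, not a complete job setup): the two directions can be illustrated by assembling the command lines in Python. The flags --connect, --table, --target-dir, --export-dir and --num-mappers are standard Sqoop options; the JDBC URL, table names and HDFS paths are hypothetical placeholders, and the helper function names are made up for this card.

```python
import shlex


def sqoop_import_args(jdbc_url, table, target_dir, num_mappers=4):
    """Build the argument list for an RDBMS -> HDFS import."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC connection string
        "--table", table,             # source table in the database
        "--target-dir", target_dir,   # destination directory in HDFS
        "--num-mappers", str(num_mappers),
    ]


def sqoop_export_args(jdbc_url, table, export_dir):
    """Build the argument list for an HDFS -> RDBMS export."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,             # destination table (must already exist)
        "--export-dir", export_dir,   # HDFS directory holding the data to export
    ]


# Hypothetical values, for illustration only.
import_cmd = shlex.join(
    sqoop_import_args("jdbc:mysql://db.example.com/shop", "orders", "/data/orders")
)
export_cmd = shlex.join(
    sqoop_export_args("jdbc:mysql://db.example.com/shop", "order_summary", "/data/summary")
)
```

Printing import_cmd or export_cmd gives a line that could be pasted into a shell on a machine where Sqoop is installed; credentials options such as --username are left out here for brevity.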
What kind of interface does Sqoop use?
Command-line interface (CLI).
Which protocol does Sqoop use to connect to databases?
JDBC (Java Database Connectivity).
Name some common data sources for Hadoop ingestion.
Sensors, telecom call detail records (CDRs), healthcare systems, machine data, social media, and aerospace data.
What are the challenges in data ingestion?
Multiple sources, streaming/real-time ingestion, scalability, parallel processing, and data quality.
What is the role of connectors in Sqoop?
Connectors are database-specific plugins that retrieve table metadata (such as column names and types) and optimize data transfer for that database.
How does Sqoop 1 differ from Sqoop 2?
Sqoop 1 is a client-side CLI tool that launches Map-only jobs; Sqoop 2 adds a server-based design with a REST API, centralized management of connections and jobs, and better integration with other Hadoop tools.
What kind of job does Sqoop generate to transfer data?
A Map-only MapReduce job.
By default, how many mappers does Sqoop use and what does each do?
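4 mappers, each handling roughly 25% of the data. Sqoop divides the value range of the split column (the primary key by default, or the column given with --split-by) evenly, so the shares are equal only when the values are evenly distributed.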
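Example (a simplified illustration, not Sqoop's actual splitter code): assuming an integer split column, the idea is to take the minimum and maximum key values and cut that range into one slice per mapper. The function names below are hypothetical.

```python
def split_ranges(min_id, max_id, num_mappers=4):
    """Cut the inclusive key range [min_id, max_id] into num_mappers slices."""
    span = max_id - min_id + 1
    # Integer boundaries; the last boundary is one past max_id.
    bounds = [min_id + (span * i) // num_mappers for i in range(num_mappers + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(num_mappers)]


def rows_per_mapper(keys, ranges):
    """Count how many key values fall into each mapper's slice."""
    return [sum(lo <= k <= hi for k in keys) for lo, hi in ranges]


ranges = split_ranges(1, 1000)

# Evenly distributed keys: every mapper gets the same number of rows.
even = rows_per_mapper(range(1, 1001), ranges)

# Skewed keys (100 low ids plus one outlier): the work is very uneven.
skewed = rows_per_mapper(list(range(1, 101)) + [1000], ranges)
```

With keys 1 to 1000 each mapper gets a slice of 250 ids. With the skewed keys, the first mapper gets nearly all the rows, which is why a well-distributed split column matters.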
How can you update existing records during export?
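Add --update-key id to the export command, where id is the column used to match existing rows. By default only matching rows are updated; rows with no match are not inserted.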
What does the --update-mode allowinsert option do?
Allows upsert during export: a row that matches the update key is updated, and a row with no match is inserted.
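Example (an in-memory model of the semantics only, not how Sqoop issues SQL): the difference between the default update-only behaviour and allowinsert can be sketched with a dictionary standing in for the database table. The function name and sample rows are hypothetical.

```python
def export_rows(table, rows, update_key, allow_insert=False):
    """Apply exported rows to `table`, a dict keyed by the update-key value.

    allow_insert=False models the default update-only behaviour;
    allow_insert=True models --update-mode allowinsert (upsert).
    """
    result = {key: dict(row) for key, row in table.items()}  # leave input untouched
    for row in rows:
        key = row[update_key]
        if key in result:
            result[key].update(row)   # matching row: update it
        elif allow_insert:
            result[key] = dict(row)   # no match: insert only in upsert mode
        # otherwise the unmatched row is skipped
    return result


existing = {1: {"id": 1, "name": "old"}}
incoming = [{"id": 1, "name": "new"}, {"id": 2, "name": "added"}]

update_only = export_rows(existing, incoming, "id")
upsert = export_rows(existing, incoming, "id", allow_insert=True)
```

In update_only the row with id 2 is absent, while upsert contains both rows.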