Page 1: Loading Data into Hive
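Loading Data: Use the following command to load the local file student.tsv into the Hive table EXT_STUDENT:
LOAD DATA LOCAL INPATH '/root/hivedemos/student.tsv' OVERWRITE INTO TABLE EXT_STUDENT;
LOCAL means the path refers to the local filesystem rather than HDFS, and OVERWRITE replaces any data already in the table.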
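Hive's LOAD DATA is essentially a file operation: with LOCAL the file is copied into the table's directory, and the rows are not parsed against the table schema at load time (parsing happens later, through the SerDe). The Python sketch below imitates the LOCAL ... OVERWRITE variant on an ordinary filesystem. It is an illustration under assumptions, not Hive's implementation: the function name load_data_local_overwrite, the warehouse/ext_student directory layout, and the sample rows are all hypothetical, and a real load targets HDFS rather than going through os and shutil.

```python
import os
import shutil
import tempfile


def load_data_local_overwrite(local_path: str, table_dir: str) -> list:
    """Hypothetical stand-in for LOAD DATA LOCAL INPATH ... OVERWRITE INTO TABLE.

    The file is copied byte for byte; no row is parsed or checked against
    the table schema at this point.
    """
    os.makedirs(table_dir, exist_ok=True)
    # OVERWRITE: clear out whatever the table directory already holds.
    for name in os.listdir(table_dir):
        path = os.path.join(table_dir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
    # LOCAL: the source file sits on the client's filesystem, so it is
    # copied (not moved) into the table's directory.
    shutil.copy(local_path, table_dir)
    return sorted(os.listdir(table_dir))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "student.tsv")
        with open(src, "w") as f:
            f.write("1\tAlice\n2\tBob\n")  # made-up sample rows
        # Hypothetical table location with one stale file already in it.
        table_dir = os.path.join(tmp, "warehouse", "ext_student")
        os.makedirs(table_dir)
        with open(os.path.join(table_dir, "old.tsv"), "w") as f:
            f.write("stale row\n")
        print(load_data_local_overwrite(src, table_dir))  # ['student.tsv']
```

Because nothing is validated at load time, a malformed row only shows up when a query reads the table.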
Retrieving Data: After loading the data, use the following query to retrieve the student details from the EXT_STUDENT table:
SELECT * FROM EXT_STUDENT;
Page 2: Understanding SerDe in Hive
What is SerDe?
- SerDe stands for Serializer and Deserializer.
- It contains the logic for converting raw records stored in files into structured row objects that Hive can query, and for converting rows back into records.
- Implemented as Java classes.
Roles of SerDe:
- Serializer: converts a row object into the storage format when data is written into the store (for example, by an INSERT).
- Deserializer: converts stored records into row objects at query time, for example when a SELECT reads the table.
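To make the two roles concrete, here is a small Python sketch of a SerDe for tab-delimited text. Everything in it is an assumption for illustration: the class name TabSeparatedSerDe, the two-column schema, and the tab delimiter (the file name student.tsv suggests tabs, while Hive's default text field delimiter is Ctrl-A, \001). Real SerDes are Java classes invoked through Hive's own interfaces; this sketch only mirrors the direction of each conversion. The "\N" NULL marker and the "unparseable value reads back as NULL" behavior are loosely modeled on Hive's default text SerDe.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

NULL_TOKEN = "\\N"  # marker Hive's default text format uses for NULL

# A column is (name, parser); the schema itself is hypothetical.
Column = Tuple[str, Callable[[str], Any]]


class TabSeparatedSerDe:
    """Toy SerDe for tab-delimited text; not Hive's Java SerDe interface."""

    def __init__(self, columns: List[Column]) -> None:
        self.columns = columns

    def deserialize(self, record: str) -> Dict[str, Any]:
        """Query time (e.g. SELECT): raw text record -> row object."""
        fields = record.rstrip("\n").split("\t")
        row: Dict[str, Any] = {}
        for i, (name, parse) in enumerate(self.columns):
            raw: Optional[str] = fields[i] if i < len(fields) else None
            if raw is None or raw == NULL_TOKEN:
                row[name] = None  # missing or explicit NULL field
                continue
            try:
                row[name] = parse(raw)
            except ValueError:
                row[name] = None  # unparseable value read back as NULL
        return row

    def serialize(self, row: Dict[str, Any]) -> str:
        """Write time (e.g. INSERT): row object -> raw text record."""
        fields = [
            NULL_TOKEN if row.get(name) is None else str(row[name])
            for name, _ in self.columns
        ]
        return "\t".join(fields) + "\n"


if __name__ == "__main__":
    serde = TabSeparatedSerDe([("id", int), ("name", str)])
    print(serde.deserialize("1\tAlice\n"))  # {'id': 1, 'name': 'Alice'}
    print(repr(serde.serialize({"id": 2, "name": None})))  # '2\t\\N\n'
```

Keeping both directions in one object reflects the point of the term: the same class knows the on-disk layout well enough to read it and to write it.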
Data Processing Flow in Hive:
- Hive uses the SerDe together with the table's FileFormat (its input and output formats) to read and write table rows.
- File handling steps:
  - Reading: HDFS files -> InputFileFormat -> <key, value> -> Deserializer -> Row object
  - Writing: Row object -> Serializer -> <key, value> -> OutputFileFormat -> HDFS files
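The read and write paths can be traced in a short Python sketch. In-memory io.StringIO buffers stand in for HDFS files, and every function name is hypothetical. The input side follows the usual Hadoop text convention in which the key is the byte offset of a line and the value is the line itself; the output side ignores the key and writes only the value, roughly as Hive's default text output format does. The column names and the tab delimiter are assumptions, and the row object is a plain dict rather than the Java objects Hive actually passes between these stages.

```python
import io
from typing import Dict, Iterable, Iterator, List, Optional, TextIO, Tuple

# Hypothetical column names; tab is assumed as the field delimiter.
COLUMNS = ["id", "name"]


def text_input_format(stream: TextIO) -> Iterator[Tuple[int, str]]:
    """InputFileFormat stand-in: yields <key, value> = <byte offset, line>."""
    offset = 0
    for line in stream:
        yield offset, line
        offset += len(line.encode("utf-8"))


def deserialize(value: str) -> Dict[str, str]:
    """Deserializer: one line of text -> row object (here a dict)."""
    return dict(zip(COLUMNS, value.rstrip("\n").split("\t")))


def serialize(row: Dict[str, str]) -> Tuple[Optional[int], str]:
    """Serializer: row object -> <key, value>; the key is left empty."""
    return None, "\t".join(row[c] for c in COLUMNS)


def text_output_format(
    pairs: Iterable[Tuple[Optional[int], str]], stream: TextIO
) -> None:
    """OutputFileFormat stand-in: ignores the key, writes each value as a line."""
    for _key, value in pairs:
        stream.write(value + "\n")


def read_rows(stream: TextIO) -> List[Dict[str, str]]:
    # HDFS file -> InputFileFormat -> <key, value> -> Deserializer -> row object
    return [deserialize(value) for _key, value in text_input_format(stream)]


def write_rows(rows: Iterable[Dict[str, str]], stream: TextIO) -> None:
    # row object -> Serializer -> <key, value> -> OutputFileFormat -> HDFS file
    text_output_format((serialize(row) for row in rows), stream)


if __name__ == "__main__":
    stored = io.StringIO("1\tAlice\n2\tBob\n")  # stands in for a file on HDFS
    rows = read_rows(stored)
    print(rows)  # [{'id': '1', 'name': 'Alice'}, {'id': '2', 'name': 'Bob'}]
    out = io.StringIO()
    write_rows(rows, out)
    print(out.getvalue() == stored.getvalue())  # True
```

Separating the file format (how records are split into key/value pairs) from the SerDe (how a record maps to columns) is what lets the same SerDe be combined with different storage formats, and vice versa.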