Page 1: Loading Data into Hive

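  • Loading Data: Use the following command to load data from the local file student.tsv into the Hive table EXT_STUDENT. LOCAL tells Hive to read the file from the local filesystem rather than HDFS, and OVERWRITE replaces any data already in the table: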

    • LOAD DATA LOCAL INPATH '/root/hivedemos/student.tsv' OVERWRITE INTO TABLE EXT_STUDENT;
  • Retrieving Data: After loading the data, use the following HiveQL query to retrieve all student records from the EXT_STUDENT table:

    • SELECT * FROM EXT_STUDENT;
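
The effect of OVERWRITE in the LOAD DATA command above can be illustrated with a toy model. This is only a sketch in plain Python, not Hive code: the table is modeled as a directory of files, and the helper names (load_data_local, select_all, demo), the pre-existing file, and the sample rows are all made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def load_data_local(src, table_dir, overwrite=False):
    """Toy model of LOAD DATA LOCAL INPATH: copy a local file into the table directory."""
    table_dir = Path(table_dir)
    if overwrite:
        # OVERWRITE: remove the files the table already holds before loading.
        for old_file in table_dir.iterdir():
            old_file.unlink()
    shutil.copy(src, table_dir / Path(src).name)


def select_all(table_dir):
    """Toy model of SELECT *: return every line of every file in the table directory."""
    lines = []
    for data_file in sorted(Path(table_dir).iterdir()):
        lines.extend(data_file.read_text().splitlines())
    return lines


def demo(overwrite):
    with tempfile.TemporaryDirectory() as tmp:
        table_dir = Path(tmp) / "ext_student"  # stands in for the table's storage location
        table_dir.mkdir()
        (table_dir / "existing.tsv").write_text("0\tOld\n")  # data loaded earlier
        student = Path(tmp) / "student.tsv"
        student.write_text("1\tAsha\n2\tRavi\n")  # made-up sample rows
        load_data_local(student, table_dir, overwrite=overwrite)
        return select_all(table_dir)


with_overwrite = demo(overwrite=True)      # only the newly loaded rows remain
without_overwrite = demo(overwrite=False)  # old rows are kept, new rows are added
print(with_overwrite)
print(without_overwrite)
```

With overwrite=True only the rows from student.tsv remain. With overwrite=False the earlier row is still returned alongside them, which corresponds to omitting the OVERWRITE keyword.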

Page 2: Understanding SerDe in Hive

  • What is SerDe?

    • SerDe stands for Serializer and Deserializer.
    • It contains the logic for converting the raw data stored in a table's files into structured records (rows), and for converting rows back into the stored format.
    • SerDes are implemented in Java.
  • Roles of SerDe:

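    • Serializer: Operates at write time, converting row objects into the form stored in the table's files (for example, during an INSERT).
    • Deserializer: Operates at query time, for example in SELECT statements, converting stored data into row objects.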
  • Data Processing Flow in Hive:

    • Hive uses a SerDe together with the table's file format (its input and output formats) to read and write table rows.
    • File Handling Steps:
    • Read path: HDFS files -> InputFileFormat -> <key, value> -> Deserializer -> Row object
    • Write path: Row object -> Serializer -> <key, value> -> OutputFileFormat -> HDFS files
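
The two paths can be traced with a small toy model. This is a sketch in plain Python, not Hive's Java SerDe API. It assumes a tab-separated text table, and the column names, function names, and sample rows are all hypothetical. For brevity, the write path passes only the value and leaves out the key.

```python
# Toy model of Hive's read and write paths for a tab-separated text table.
# Plain Python for illustration only; none of these names are Hive APIs.

COLUMNS = ["id", "name", "grade"]  # hypothetical schema, not from the source


def input_file_format(file_text):
    """Split file content into <key, value> pairs: (line offset, line text)."""
    records = []
    offset = 0
    for line in file_text.splitlines(keepends=True):
        records.append((offset, line.rstrip("\n")))
        offset += len(line)
    return records


def deserialize(value):
    """Deserializer: turn one raw value into a row object (here, a dict)."""
    return dict(zip(COLUMNS, value.split("\t")))


def serialize(row):
    """Serializer: turn a row object back into a raw value."""
    return "\t".join(row[col] for col in COLUMNS)


def output_file_format(values):
    """Join serialized values into the content of an output file."""
    return "".join(value + "\n" for value in values)


def read_rows(file_text):
    # file -> InputFileFormat -> <key, value> -> Deserializer -> Row object
    return [deserialize(value) for _key, value in input_file_format(file_text)]


def write_rows(rows):
    # Row object -> Serializer -> value -> OutputFileFormat -> file
    return output_file_format(serialize(row) for row in rows)


sample = "1\tAsha\tA\n2\tRavi\tB\n"  # made-up file content
rows = read_rows(sample)
round_trip = write_rows(rows)
print(rows)
print(round_trip == sample)
```

Reading the sample and writing the resulting rows back reproduces the original content. This shows that, in this model, the Serializer and Deserializer are inverse steps around the same row objects.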