PIG and HIVE: Large-Scale Data Management
Hadoop Ecosystem
- Overview of the Hadoop ecosystem and its components.
- Components covered include:
- Data Storage:
- Hadoop Distributed File System (HDFS)
- HBase
- Avro
- Data Processing:
- MapReduce Framework
- Pig
- Hive
- Chukwa
- Data Access
- Data Connections
- Management
- Business Intelligence and Analytic Layer
- Query
- Reporting
- Data Mining
- Predictive Analytics
- Data Warehouse
- ZooKeeper
Query Languages for Hadoop
- Java: Hadoop’s native language.
- Pig: Query and workflow language.
- Hive: SQL-based language.
- HBase: Column-oriented database for MapReduce.
Java as Hadoop's Native Language
- Hadoop is written in Java, providing Java APIs.
- These APIs are used for:
- Mappers
- Reducers
- Combiners
- Partitioners
- Input and output formats
- Languages like Pig or Hive convert their queries to Java MapReduce code.
Levels of Abstraction in Hadoop
- Abstraction levels, from less Hadoop-visible to more Hadoop-visible:
- HBase: queries against tables (more DB view).
- Hive: SQL-like language.
- Pig: query and workflow language.
- Java: write map-reduce functions (more map-reduce view).
Java Example: WordCount
- Example code illustrating a basic MapReduce program in Java for word counting.
- Key components:
- Map Class: extends MapReduceBase and implements Mapper<LongWritable, Text, Text, IntWritable>.
- Maps input key-value pairs to output key-value pairs.
- Uses StringTokenizer to split lines into words.
- Collects each word with a count of one.
- Reduce Class: extends MapReduceBase and implements Reducer<Text, IntWritable, Text, IntWritable>.
- Reduces the values for the same key.
- Sums the counts for each word.
- Main Method: configures and runs the MapReduce job.
- Sets the job name, output key/value classes, mapper/reducer classes, and input/output formats.
- Specifies the input and output paths.
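- For reference, a condensed sketch of such a program against the old org.apache.hadoop.mapred API described above; it follows Hadoop's classic WordCount example rather than any particular lecture's code.

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      // Split the input line into words and emit (word, 1) for each.
      StringTokenizer tokenizer = new StringTokenizer(value.toString());
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      // Sum the counts for each word.
      int sum = 0;
      while (values.hasNext()) sum += values.next().get();
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setReducerClass(Reduce.class);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}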
Apache Pig
- A platform for analyzing large data sets.
- Consists of a high-level language for expressing data analysis programs.
- Compiles down to MapReduce jobs.
- Developed by Yahoo! and released as open source.
Big Picture of Pig
- Pig script (Pig Latin) is compiled and optimized.
- User-Defined Functions (UDFs) can be integrated.
- Compilation results in MapReduce statements executed on Hadoop.
- Reads data from Hadoop Distributed File System (HDFS) and writes results back to HDFS.
Features of Pig
- Rich set of operators: Provides operators for join, sort, filter, etc.
- Ease of programming: Pig Latin is similar to SQL.
- Optimization opportunities: task execution is optimized automatically, letting the user focus on semantics.
- Extensibility: Users can develop their own functions (UDFs).
- Handles all kinds of data: Analyzes structured and unstructured data, storing results in HDFS.
Apache Pig vs MapReduce
| Feature | Apache Pig | MapReduce |
|---|---|---|
| Language Level | High-level data flow language | Low-level data processing paradigm |
| Programming | Simple, SQL-like | Requires Java knowledge |
| Join Operation | Pretty simple | Quite difficult |
| Code Length | Shorter, multi-query approach | Longer, requires more lines of code |
| Compilation | No compilation needed | Long compilation process |
| Internal Conversion | Converts operators into MapReduce jobs | N/A |
Apache Pig vs SQL
| Feature | Pig Latin | SQL |
|---|---|---|
| Language Type | Procedural | Declarative |
| Schema Requirement | Optional | Mandatory |
| Data Model | Nested relational | Flat relational |
| Query Optimization | Limited opportunity | More opportunity |
| Pipeline Splits | Allowed at any point | Requires separate queries or temporary tables |
| Data Storage | Can store data at any point in the pipeline | Only via tables |
| Execution Plans | Can be explicitly declared | Chosen solely by the system |
| ETL Functions | Provides ETL-style operators | Requires external ETL tools |
Data Model in Apache Pig
- Based on nested relational data model.
- Data is represented as a set of tuples, where each tuple can contain one or more fields.
- Fields can be of different data types (int, long, float, double, chararray, bytearray, etc.).
- Supports nested data structures:
- Bags (collection of tuples)
- Maps (key-value pair)
- Tuples within tuples
- Facilitates complex data modeling and processing.
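- A LOAD statement can declare such a nested schema directly; a small sketch (file and field names invented for illustration):

-- each tuple holds an atom, a bag of (item, qty) tuples, and a map of properties
users = LOAD 'users.dat' AS (name:chararray, purchases:bag{t:tuple(item:chararray, qty:int)}, info:map[]);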
Apache Pig vs Hive
| Feature | Apache Pig | Hive |
|---|---|---|
| Language | Pig Latin | HiveQL |
| Origin | Yahoo | Facebook |
| Language Type | Data flow language | Query processing language |
| Paradigm | Procedural, pipeline-oriented | Declarative |
| Data Handling | Structured, unstructured, semi-structured | Mostly structured data |
Applications of Apache Pig
- Data processing: Processes large datasets stored in HDFS.
- ETL workflows: Extracts, transforms, and loads data from various sources.
- Ad-hoc analysis: Flexible language for performing analysis on large datasets without complex MapReduce programs.
- Machine learning: Used for tasks like clustering, classification, and regression.
- Data integration: Integrates data from different sources into Hadoop for analysis.
Pig Compilation
- Compilation process involves multiple stages:
- Query Parser: Parses Pig Latin programs.
- Logical Plan: generated after semantic checking.
- Logical Optimizer: Optimizes the logical plan.
- Logical to Physical Translator: Translates to physical plan.
- MapReduce Plan: Translates to MapReduce plan.
- MapReduce Launcher: Creates a job jar to be submitted to the Hadoop cluster.
Logical Plan
- Abstract representation of data processing operations.
- Consists of a series of operators (filtering, grouping, joining).
- Generated by the Pig compiler after parsing and syntax checking.
- Serves as an intermediate representation for optimization.
- Language-independent, allowing integration of multiple data processing languages.
- Example:
A = LOAD 'file1' AS (x, y, z);
B = LOAD 'file2' AS (t, u, v);
C = FILTER A BY y > 0;
D = JOIN C BY x, B BY u;
E = GROUP D BY z;
F = FOREACH E GENERATE group, COUNT(D);
STORE F INTO 'output';
Physical Plan
- Detailed representation of data processing steps to execute a Pig Latin script on Hadoop.
- Specifies MapReduce jobs or Apache Tez DAGs.
- Generated by the Pig compiler after the logical plan is optimized.
- Each operator in the physical plan corresponds to a specific task in the MapReduce or Tez job.
- Generated automatically by the Pig compiler, with optimizations.
- 1:1 correspondence with the logical plan except for Join, Distinct, (Co)Group, Order.
Generation of Physical Plans
- If the Join and Group By are on the same key, the two map-reduce jobs can be merged into one.
Java vs. Pig
- Performance is comparable, with Java being slightly better.
Data Model
- Atom: Simple atomic value (number or string).
- Tuple: Sequence of fields (each field can be any type).
- Bag: Collection of tuples. Duplicates are possible. Tuples can have different field lengths and types.
- Map: Collection of key-value pairs; key is an atom, value can be any type.
Data Model Examples
- Atom: 'alice'
- Tuple: ('lakers', 1)
- Bag: {('lakers', 1), ('iPod', 2)}
- Map: ['age' -> 20]
Control over Dataflow
- Example of less efficient dataflow:
spam_urls = FILTER urls BY isSpam(url);
culprit_urls = FILTER spam_urls BY pagerank > 0.8;
- More efficient dataflow (the cheap pagerank comparison runs first, so the presumably expensive isSpam UDF is applied to far fewer tuples):
highpgr_urls = FILTER urls BY pagerank > 0.8;
spam_urls = FILTER highpgr_urls BY isSpam(url);
Data Model Features
- Fully nested.
- More natural for procedural programmers.
- Data is often stored on disk in a nested fashion.
- Facilitates ease of writing user-defined functions.
- No schema required.
User-Defined Functions (UDFs)
- Useful for custom processing tasks.
- Can be used in many Pig Latin statements.
- Can use non-atomic values for input and output.
- Currently must be written in Java.
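- A minimal sketch of such a UDF in Java, using Pig's EvalFunc base class (the class name and behavior are invented for illustration):

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF: upper-cases its first (chararray) argument.
public class ToUpper extends EvalFunc<String> {
  @Override
  public String exec(Tuple input) throws IOException {
    if (input == null || input.size() == 0 || input.get(0) == null)
      return null;                        // pass nulls through untouched
    return ((String) input.get(0)).toUpperCase();
  }
}

- It would then be registered and invoked from Pig Latin, e.g. REGISTER myudfs.jar; followed by B = FOREACH A GENERATE ToUpper(name); (jar, bag, and field names hypothetical).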
Speaking Pig Latin: LOAD
- Input is assumed to be a bag (sequence of tuples).
- Can specify a deserializer with "USING".
- Can provide a schema with "AS".
- Syntax:
newBag = LOAD 'filename' <USING functionName()> <AS (fieldName1, fieldName2, …)>;
- Example:
Queries = LOAD 'query_log.txt' USING myLoad() AS (userID, queryString, timeStamp);
Speaking Pig Latin: FOREACH
- Applies some processing to each tuple in a bag.
- Each field can be:
- A field name of the bag
- A constant
- A simple expression (e.g., f1+f2)
- A predefined function (e.g., SUM, AVG, COUNT, FLATTEN)
- A UDF (e.g., sumTaxes(gst, pst))
- Syntax:
newBag = FOREACH bagName GENERATE field1, field2, …;
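- For instance, one generated field of each kind listed above (bag and field names invented; sumTaxes is the hypothetical UDF from the list):

projected = FOREACH sales GENERATE customer, 'CAD', price*qty, sumTaxes(gst, pst);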
Speaking Pig Latin: FILTER
- Selects a subset of the tuples in a bag.
- Syntax:
newBag = FILTER bagName BY expression;
- The expression uses comparison operators (==, !=, <, >, etc.) and logical connectives (AND, OR, NOT).
- Example:
some_apples = FILTER apples BY colour != 'red';
- Can use UDFs:
some_apples = FILTER apples BY NOT isRed(colour);
Speaking Pig Latin: COGROUP
- Groups two datasets together by a common attribute.
- Groups data into nested bags.
- Syntax:
grouped_data = COGROUP results BY queryString, revenue BY queryString;
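- To make the nested-bag output concrete, a sketch of the result's shape for the statement above (tuple values invented):

-- each output tuple is (group, results, revenue): the grouping key followed by
-- nested bags holding the matching input tuples from each dataset, e.g.
-- ('lakers', {('lakers','nba.com'), ('lakers','espn.com')}, {('lakers','top',0.5)})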
Speaking Pig Latin: Why COGROUP and not JOIN?
- May want to process nested bags of tuples before taking the cross product.
- Keeps to the goal of a single high-level data transformation per pig-latin statement.
- However, JOIN keyword is still available:
JOIN results BY queryString, revenue BY queryString;
- Equivalent:
temp = COGROUP results BY queryString, revenue BY queryString;
join_result = FOREACH temp GENERATE FLATTEN(results), FLATTEN(revenue);
Speaking Pig Latin: STORE (& DUMP)
- Output data to a file (or screen).
- Syntax:
STORE bagName INTO 'filename' <USING serializer()>;
- Other Commands (incomplete):
- UNION - return the union of two or more bags
- CROSS - take the cross product of two or more bags
- ORDER - order tuples by a specified field(s)
- DISTINCT - eliminate duplicate tuples in a bag
- LIMIT - Limit results to a subset
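- A quick illustration chaining a few of these (bag and field names invented):

unique_users = DISTINCT users;                    -- eliminate duplicate tuples
by_clicks = ORDER unique_users BY clicks DESC;    -- sort on a field
top10 = LIMIT by_clicks 10;                       -- keep only the first 10 tuples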
Compilation
- The Pig system performs two tasks:
- Builds a Logical Plan from a Pig Latin script.
- Supports execution platform independence.
- No processing of data performed at this stage.
- Compiles the Logical Plan to a Physical Plan and Executes.
- Converts the Logical Plan into a series of MapReduce statements to be executed (in this case) by Hadoop MapReduce.
Compilation: Building a Logical Plan
- Verify input files and bags referred to are valid.
- Create a logical plan for each bag (variable) defined.
Compilation Example
- Example Pig Latin script:
A = LOAD 'user.dat' AS (name, age, city);
B = GROUP A BY city;
C = FOREACH B GENERATE group AS city, COUNT(A);
D = FILTER C BY city == 'kitchener' OR city == 'waterloo';
STORE D INTO 'local_user_count.dat';
- The logical plan would include Load, Group, Foreach, Filter, and Store operations.
Compilation: Building a Physical Plan
- Only happens when output is specified by STORE or DUMP.
- Step 1: Create a map-reduce job for each COGROUP.
- Step 2: Push other commands into the map and reduce functions where possible.
- Certain commands may require their own map-reduce job (e.g., ORDER).
Compilation: Efficiency in Execution
- Parallelism:
- Loading data: files are read in parallel from HDFS.
- Statements are compiled into map-reduce jobs, whose map and reduce tasks run in parallel across the cluster.
Compilation: Efficiency with Nested Bags
- Nested bags created in each tuple of a COGROUP statement may not need to physically materialize.
- When an aggregation follows a COGROUP, the statements for that aggregation are pushed into the reduce function.
- This applies to algebraic functions (e.g., COUNT, MAX, MIN, SUM, AVG).
Compilation: Efficiency with Nested Bags Example
- Why this works: COUNT is an algebraic function; it can be structured as a tree of sub-functions with each leaf working on a subset of the data.
- Pig provides an interface for writing algebraic UDFs so they can take advantage of this optimization as well (a sketch of the interface follows this list).
- Inefficiencies:
- Non-algebraic aggregate functions (e.g., MEDIAN) need the entire bag to materialize; a very large bag may spill to disk if it does not fit in memory.
- Every map-reduce job requires data to be written and replicated to HDFS (although this is offset by the parallelism achieved).
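- The algebraic-UDF interface mentioned above is small; a sketch of Pig's org.apache.pig.Algebraic, in which each method names an EvalFunc class that computes one stage of the tree of sub-functions:

public interface Algebraic {
    String getInitial();   // runs map-side, over a subset of the data
    String getIntermed();  // runs in the combiner, merging partial results
    String getFinal();     // runs in the reducer, producing the final value
}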
High-Level Language Example
- Example:
raw = LOAD 'excite.log' USING PigStorage('\t') AS (user, id, time, query);
clean1 = FILTER raw BY id > 20 AND id < 100;
clean2 = FOREACH clean1 GENERATE user, time, org.apache.pig.tutorial.sanitze(query) AS query;
user_groups = GROUP clean2 BY (user, query);
user_query_counts = FOREACH user_groups GENERATE group, COUNT(clean2), MIN(clean2.time), MAX(clean2.time);
STORE user_query_counts INTO 'uq_counts.csv' USING PigStorage(',');
Pig Components
- Two main components:
- High-level language (Pig Latin) with its set of commands.
- Execution environment, with two execution modes:
- Local: reads/writes the local file system.
- MapReduce: connects to a Hadoop cluster and reads/writes HDFS.
- Two usage modes:
- Interactive mode: console.
- Batch mode: submit a script.
Why Pig?…Abstraction!
- Common design patterns as keywords (joins, distinct, counts).
- Data flow analysis: a script can map to multiple map-reduce jobs.
- Avoids Java-level errors (not everyone can write Java code).
- Can be used interactively: issue commands and get results.
Example I: More Details
- Read the file from HDFS; the input format is text, tab-delimited.
- Define the run-time schema (user, id, time, query).
- Filter the rows on a predicate.
- For each row, apply a transformation (sanitize the query).
- Group the records by (user, query).
- Compute aggregations for each group.
- Store the output in a file (text, comma-delimited).
- Example:
raw = LOAD 'excite.log' USING PigStorage('\t') AS (user, id, time, query);
clean1 = FILTER raw BY id > 20 AND id < 100;
clean2 = FOREACH clean1 GENERATE user, time, org.apache.pig.tutorial.sanitze(query) AS query;
user_groups = GROUP clean2 BY (user, query);
user_query_counts = FOREACH user_groups GENERATE group, COUNT(clean2), MIN(clean2.time), MAX(clean2.time);
STORE user_query_counts INTO 'uq_counts.csv' USING PigStorage(',');
Pig: Language Features
- Keywords: Load, Filter, Foreach Generate, Group By, Store, Join, Distinct, Order By, …
- Aggregations: Count, Avg, Sum, Max, Min.
- Schema: defined at query time, not when files are loaded.
- UDFs: Packages for common input/output formats.
Example 2
- Define types of the columns.
- Data are Ctrl-A ('\u0001') delimited.
- Example:
A = load '$widerow' using PigStorage('\u0001') as (name: chararray, c0: int, c1: int, c2: int);
B = group A by name parallel 10;
C = foreach B generate group, SUM(A.c0) as c0, SUM(A.c1) as c1, AVG(A.c2) as c2;
D = filter C by c0 > 100 and c1 > 100 and c2 > 100;
store D into '$out';
- The script can take arguments ($widerow, $out).
- "parallel 10" requests 10 reduce tasks.
Example 3: Re-partition Join
- Register UDFs and custom input formats: the registered jar file provides the load function that reads the input file.
- Example:
register pigperf.jar;
A = load 'page_views' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action, timespent, query_term, timestamp, estimated_revenue);
B = foreach A generate user, (double) estimated_revenue;
alpha = load 'users' using PigStorage('\u0001') as (name, phone, address, city, state, zip);
beta = foreach alpha generate name, city;
C = join beta by name, B by user parallel 40;
D = group C by $0;
E = foreach D generate group, SUM(C.estimated_revenue);
store E into 'L3out';
- The second file (users) is loaded separately and projected to (name, city).
- The two datasets are joined with 40 reducers.
- The grouping can be done in the same map-reduce job as the join because it is on the same key (Pig performs this optimization).
Example 4: Replicated Join
register pigperf.jar;
A = load 'pageviews' using org.apache.pig.test.udf.storefunc.PigPerformanceLoader() as (user, action, timespent, queryterm, timestamp, estimated_revenue);
Big = foreach A generate user, (double) estimated_revenue;
alpha = load 'users' using PigStorage('\u0001') as (name, phone, address, city, state, zip);
small = foreach alpha generate name, city;
C = join Big by user, small by name using 'replicated';
store C into 'out';
- An optimization for joining a big dataset with a small one.
- A map-only join: the small dataset (listed second) is replicated to each map task and held in memory.
Example 5: Multiple Outputs
- SPLIT the records into sets.
- STORE multiple outputs.
A = LOAD 'data' AS (f1:int, f2:int, f3:int);
DUMP A;
(1,2,3)
(4,5,6)
(7,8,9)
SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6);
DUMP X;
(1,2,3)
(4,5,6)
DUMP Y;
(4,5,6)
STORE X INTO 'x_out';
STORE Y INTO 'y_out';
STORE Z INTO 'z_out';
Run independent jobs in parallel
D1 = load 'data1' …
D2 = load 'data2' …
D3 = load 'data3' …
C1 = join D1 by a, D2 by b;
C2 = join D1 by c, D3 by d;
- C1 and C2 are two independent jobs that can run in parallel.
Pig Latin vs. SQL
- Pig Latin is procedural (dataflow programming model).
- Step-by-step query style is much cleaner and easier to write.
- SQL is declarative, not step-by-step.
Pig Latin vs. SQL
- In Pig Latin:
- Lazy evaluation: data are not processed until a STORE command.
- Data can be stored at any point in the pipeline.
- Schema and data types are lazily defined at run-time.
- An execution plan can be explicitly defined; optimizer hints are used because Pig lacks a complex optimizer.
- In SQL:
- Query plans are decided solely by the system.
- Data cannot be stored mid-pipeline.
- Schema and data types are defined at creation time.
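- To make the contrast concrete, the same computation in both styles (adapted from the original Pig Latin paper; table and field names illustrative). In SQL, one declarative statement:

SELECT category, AVG(pagerank)
FROM urls
WHERE pagerank > 0.2
GROUP BY category
HAVING COUNT(*) > 1000000;

- The equivalent Pig Latin pipeline, written as explicit dataflow steps:

good_urls = FILTER urls BY pagerank > 0.2;
groups = GROUP good_urls BY category;
big_groups = FILTER groups BY COUNT(good_urls) > 1000000;
result = FOREACH big_groups GENERATE group AS category, AVG(good_urls.pagerank);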
Pig References
- Pig Tutorial: http://pig.apache.org/docs/r0.7.0/tutorial.html
- Pig Latin Reference Manual 1: http://pig.apache.org/docs/r0.7.0/piglatin_ref1.html
- Pig Latin Reference Manual 2: http://pig.apache.org/docs/r0.7.0/piglatin_ref2.html
- PigMix Queries https://cwiki.apache.org/PIG/pigmix.html
Hive
- Hive is a data warehouse infrastructure tool to process structured data in Hadoop.
- It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy.
- Initially Hive was developed by Facebook, later the Apache Software Foundation took it up and developed it further as an open source under the name Apache Hive.
- It is used by different companies. For example, Amazon uses it in Amazon Elastic MapReduce.
- Hive is not:
- A relational database
- A design for OnLine Transaction Processing (OLTP)
- A language for real-time queries and row-level updates
Features of Hive
- It stores the schema in a database and the processed data in HDFS.
- It is designed for OLAP.
- It provides SQL type language for querying called HiveQL or HQL.
- It is familiar, fast, scalable, and extensible.
Apache Hive
- A data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis
- Hive provides ETL capabilities.
- Structured access to different storage systems (HDFS or HBase).
- Query execution via MapReduce.
- Key Building Principles
- SQL is a familiar language
- Extensibility – Types, Functions, Formats, Scripts
- Performance
Architecture of Hive
- The component diagram (not reproduced here) depicts the following units: user interfaces (Hive Web UI, Hive command line, HD Insight) on top; the HiveQL process engine and the Meta Store beneath them; the execution engine, which translates queries to MapReduce; and HDFS or HBase as the data storage layer.
Hive Component Diagram
- The component diagram contains different units.
- The following table describes each unit:
- User Interface: Hive is data warehouse infrastructure software that enables interaction between the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive command line, and Hive HD Insight (on Windows Server).
- Meta Store: Hive chooses respective database servers to store the schema or Metadata of tables, databases, columns in a table, their data types, and HDFS mapping.
- HiveQL Process Engine: HiveQL is an SQL-like language for querying the schema information in the Metastore. It is one replacement for the traditional approach of writing MapReduce programs: instead of writing a MapReduce program in Java, we write a query, and it is processed as a MapReduce job.
- Execution Engine: the bridge between the HiveQL process engine and MapReduce. The execution engine processes the query and generates the same results that MapReduce would, using MapReduce underneath.
- HDFS or HBase: the Hadoop Distributed File System or HBase serves as the storage layer for the data.
Hive Components
- Two main components:
- High-level language (HiveQL) with its set of commands.
- Execution environment, with two execution modes:
- Local: reads/writes the local file system.
- MapReduce: connects to a Hadoop cluster and reads/writes HDFS.
- Two usage modes:
- Interactive mode: console.
- Batch mode: submit a script.
Working of Hive
- The workflow diagram (not reproduced here) shows the Hive side (user interface, Driver, Compiler, Meta Store, Execution Engine) interacting with the Hadoop side (JobTracker on the Name node, TaskTrackers on the Data nodes, MapReduce's map and reduce tasks, and HDFS); the numbered steps it depicts are described below.
Hive and Hadoop Framework Interaction
- The following table defines how Hive interacts with Hadoop framework:
- 1. Execute Query: The Hive interface, such as the command line or Web UI, sends the query to the Driver (any database driver, such as JDBC or ODBC) to execute.
- 2. Get Plan: The driver takes the help of the query compiler, which parses the query to check the syntax and the query plan, or the requirements of the query.
- 3. Get Metadata: The compiler sends a metadata request to the Metastore (any database).
- 4. Send Metadata: The Metastore sends the metadata as a response to the compiler.
- 5. Send Plan: The compiler checks the requirements and resends the plan to the driver. Up to here, the parsing and compiling of the query is complete.
- 6. Execute Plan: The driver sends the execute plan to the execution engine.
- 7. Execute Job: Internally, the execution is a MapReduce job. The execution engine sends the job to the JobTracker (in the Name node), which assigns it to TaskTrackers (in the Data nodes).
- 7.1 Metadata Ops: Meanwhile, during execution, the execution engine can perform metadata operations with the Metastore.
- 8. Fetch Result: The execution engine receives the results from the Data nodes.
- 9. Send Results: The execution engine sends those resultant values to the driver.
- 10. Send Results: The driver sends the results to the Hive interfaces.
Hive Data
- Hive deals with structured data.
- Data units:
- Databases
- Tables
- Partitions
- Buckets (or clusters)
- Very similar to SQL and relational DBs.
Hive DDL Commands
- CREATE TABLE sample (foo INT, bar STRING) PARTITIONED BY (ds STRING);
- SHOW TABLES '.*s';
- DESCRIBE sample;
- ALTER TABLE sample ADD COLUMNS (new_col INT);
- DROP TABLE sample;
- A table in Hive is an HDFS directory in Hadoop. The schema is known at creation time (like a DB schema).
- Partitioned tables have "sub-directories", one for each partition (see the layout sketch below).
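- Assuming the default warehouse location, the sample table above maps to a layout roughly like this (paths illustrative):

/user/hive/warehouse/sample/                  <- table = HDFS directory
/user/hive/warehouse/sample/ds=2012-02-24/    <- one sub-directory per partition
/user/hive/warehouse/sample/ds=2012-02-25/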
Hive DML
- Load data from the local file system, deleting the table's previous data:
LOAD DATA LOCAL INPATH './sample.txt' OVERWRITE INTO TABLE sample;
- Load data from HDFS, appending to the existing data:
LOAD DATA INPATH '/user/falvariz/hive/sample.txt' INTO TABLE partitioned_sample PARTITION (ds='2012-02-24');
- Loaded data are files copied to HDFS under the corresponding directory.
- A specific partition must be defined for a partitioned table.
Hive Components
- Hive CLI: Hive Client Interface
- MetaStore: For storing the schema information, data types, partitioning columns, etc…
- Hive QL: the query language, compiler, and executor.
Data Model
- 3-Levels: Tables -> Partitions -> Buckets
- Table: maps to an HDFS directory.
- Table R: Users all over the world
- Partition: maps to sub-directories under the table
- Partition R by country name
- It is the user’s responsibility to upload the right data to the right partition
- Bucket: maps to files under each partition
- Divide a partition into buckets based on a hash function on a certain column(s)
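- A hedged HiveQL sketch declaring all three levels at once (table and column names invented):

-- Table R: users all over the world, partitioned by country; each
-- partition is hash-divided into 32 buckets on user_id.
CREATE TABLE R (user_id INT, name STRING)
PARTITIONED BY (country STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS;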
Query Examples I
- Select & filter:
SELECT foo FROM sample WHERE ds='2012-02-24';
- Write the output into an HDFS directory:
INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT * FROM sample WHERE ds='2012-02-24';
- Write the output into a local directory:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive-sample-out' SELECT * FROM sample;
Query Examples II
- Aggregation & grouping:
SELECT MAX(foo) FROM sample;
SELECT ds, COUNT(*), SUM(foo) FROM sample GROUP BY ds;
- Store the results into a table. Note that Hive allows the FROM clause to come first; this syntax facilitates "multi-insertion":
FROM sample s
INSERT OVERWRITE TABLE bar
SELECT s.bar, count(*)
WHERE s.foo > 0
GROUP BY s.bar;
Query Example III
FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='US')
SELECT pvs.viewTime, … WHERE pvs.country = 'US'
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='CA')
SELECT pvs.viewTime, ... WHERE pvs.country = 'CA'
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='UK')
SELECT pvs.viewTime, ... WHERE pvs.country = 'UK';
Query Example IV
- Joins:
CREATE TABLE customer (id INT, name STRING, address STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '#';
CREATE TABLE order_cust (id INT,