How do I import data into a Hive table?
There are several ways to import data into a Hive table; the right method depends on the data's source and format. Here are some common approaches:
- By using the LOAD DATA statement, you can import data from a local file into a Hive table. For example, if you have a Hive table named my_table, you can use the following command to load data from a local file into that table:
LOAD DATA LOCAL INPATH '/path/to/local/data/file' INTO TABLE my_table;
Please note that without the LOCAL keyword, LOAD DATA moves the file from its current HDFS location into the table's data directory. With LOCAL, the file is copied from the local filesystem, so the original local copy is retained.
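If you want the load to replace the table's existing contents rather than append to them, add the OVERWRITE keyword (a sketch assuming the same my_table and file path as above):

```sql
-- Replace the current contents of my_table with the file's data;
-- without OVERWRITE, the new rows would be appended instead.
LOAD DATA LOCAL INPATH '/path/to/local/data/file' OVERWRITE INTO TABLE my_table;
```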
- To insert data from another Hive table or query results into an existing Hive table, you can use the INSERT statement. For example, if you have a Hive table called my_table and another table named source_table, you can use the following command to insert the data from source_table into my_table:
INSERT INTO TABLE my_table SELECT * FROM source_table;
Specific conditions can also be used to filter the data to be inserted, for example:
INSERT INTO TABLE my_table SELECT col1, col2 FROM source_table WHERE col3 > 10;
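Similarly, INSERT supports an OVERWRITE form that replaces the destination table's contents instead of appending (a sketch using the same hypothetical my_table and source_table):

```sql
-- Overwrite my_table with the filtered query results;
-- any existing rows in my_table are discarded first.
INSERT OVERWRITE TABLE my_table
SELECT col1, col2 FROM source_table WHERE col3 > 10;
```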
- You can also import data that already resides in the Hadoop Distributed File System (HDFS). Simply upload the data files to HDFS, then create a table using Hive’s CREATE TABLE statement, specifying the row format, the storage format (which determines the Hadoop InputFormat Hive uses to read the files), and the location of the data files. For example:
CREATE TABLE my_table (col1 INT, col2 STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/path/to/hdfs/data/file';
In this example, the LOCATION clause points the table at /path/to/hdfs/data/file, which can be a single file or a directory; Hive reads the data in place, so no separate load step is needed. Alternatively, if the table was created without a LOCATION clause, you can move a file from HDFS into the table with LOAD DATA:
LOAD DATA INPATH '/path/to/hdfs/data/file' INTO TABLE my_table;
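When the data in HDFS is shared with other tools, it is common to declare the table EXTERNAL so that dropping the table does not delete the underlying files. A sketch, reusing the same hypothetical schema and path (my_external_table is an illustrative name):

```sql
-- EXTERNAL: Hive manages only the metadata, not the files themselves.
CREATE EXTERNAL TABLE my_external_table (col1 INT, col2 STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/path/to/hdfs/data/file';
-- DROP TABLE my_external_table removes only the table definition;
-- the files under the LOCATION path are left intact.
```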
These are some common methods for importing data into Hive tables; choose the one that best fits your data's source and format.