Sometimes, we would need a specific Hive table’s HDFS Path which we usually get by running the statements in Hive CLI or Editor.
But there may be some situations, where we would need the consolidated list of all the Hive tables and their corresponding HDFS Paths for different purposes such as reporting or reviewing, etc., Extracting the HDFS Path of a specific table or a set of (or) all tables can be done by following .
To get the HDFS Path of Specific table:
- Connect to Beeline-Hive or Hue-Hive or any other clients connected to HiveServer2 using JDBC/ODBC connectors.
- Run the below Statement
SHOW CREATE TABLE <desired_table_name>;
3. Once done, there would be a value for the term LOCATION
in the result produced by the statement run above. It is the HDFS Path where the data for this table is stored.
To get the HDFS Path of all the Hive tables:
- Connect to the external DB that serves as Hive Metastore DB (connected to the Hive Metastore Service). This is where the Metadata details for all the Hive tables are stored.
For example, if its a mysql DB, you can connect to the Hive Metastore DB with name hive1 using the syntax
>mysql -u<userId> -p<password>;
>use hive1;
(Follow instructions as per the documentations of the DataBase you are using)
2. Run the below query to get the details of
SELECT DISTINCT B.TBL_ID AS TABLE_ID,
B.TBL_NAME AS TABLE_NAME,
A.LOCATION AS HDFS_PATH
FROM SDS A, TBLS B
WHERE A.SD_ID=B.SD_ID;
The result of the above query when run in HMS DB, you will be provided with the details of the tables and their corresponding HDFS Paths. (sample below). The above query can be modified by adding an additional WHERE
condition with the list of tables to pull the HDFS path of only a specific set of tables.
+----------+------------+-------------------------------------------------------------------------+
| TABLE_ID | TABLE_NAME | HDFS_PATH |
+----------+------------+-------------------------------------------------------------------------+
| 1 | sample_01 | hdfs://<host/nameservice>:8020/user/hive/warehouse/sample_01 |
| 2 | sample_02 | hdfs://<host/nameservice>:8020/user/hive/warehouse/sample_02 |
| 3 | customers | hdfs://<host/nameservice>:8020/user/hive/warehouse/customers |
| 4 | employees | hdfs://<host/nameservice>:8020/user/hive/warehouse/employees |
+----------+------------+-------------------------------------------------------------------------+
4 rows in set (0.00 sec)
Hope this helps!