I have the following folder structure in hdfs
  /input/data/yyyy/mm/dd/
and inside it data files, for example:
/input/data/2013/05/01/
      file_2013_05_01_01.json // file format yyyy_mm_dd_hh
      file_2013_05_01_02.json // file format yyyy_mm_dd_hh
      ....
I've defined hive external table for this folder:
CREATE EXTERNAL TABLE input_data (
    vr INT, ....
)
PARTITIONED BY (tsp STRING)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
STORED AS TEXTFILE;
adding for each folder a partition as following:
   alter table input_data ADD PARTITION (tsp="2013-05-01") LOCATION '/input/data/2013/05/01/';
The following query will take as input all files in date 2013-05-01
select ... from input_data where tps="2013-05-01"
How can I take only files of specific hour? without changing the hdfs structure to put each hour in separate folder?
LOAD DATA [LOCAL] INPATH '<The table data location>' [OVERWRITE] INTO TABLE <table_name>; Note: The LOCAL Switch specifies that the data we are loading is available in our Local File System. If the LOCAL switch is not used, the hive will consider the location as an HDFS path location.
One is INPUT__FILE__NAME , which is the input file's name for a mapper task. the other is BLOCK__OFFSET__INSIDE__FILE , which is the current global file position. For block compressed file, it is the current block's file offset, which is the current block's first byte's file offset. Since Hive 0.8.
You could make use of a virtual column called INPUT__FILE__NAME. It is one of the 2 two virtual columns provided by Hive 0.8.0 and onward and represents the input file's name for a mapper task. So you could do something like this :
select ... from input_data 
where tps="2013-05-01" 
and INPUT__FILE__NAME='file_2013_05_01_01.json';
HTH
You could make use of the following construct:
SELECT 
   *
FROM
   my_input_data
WHERE
   INPUT__FILE__NAME LIKE '%hh.json';
Here hh is your desired hour and INPUT__FILE__NAME is the virtual column available to hive queries while processing a given file.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With