I have ran a map only job with 674 mappers which hive took an has generated 674 .gz files I want to merge these files to aroung 30-35 files.have tried hive megre mapfilse property by not getting the merged output
Try using TEZ execution engine and then hive.merge.tezfiles. You might also want to specify the size as well.
set hive.execution.engine=tez; -- TEZ execution engine
set hive.merge.tezfiles=true; -- Notifying that merge step is required
set hive.merge.smallfiles.avgsize=128000000; --128MB
set hive.merge.size.per.task=128000000; -- 128MB
If you want to go with MR engine then add following settings (I haven't tried it personally)
set hive.merge.mapredfiles=true; -- Notifying that merge step is required
set hive.merge.smallfiles.avgsize=128000000; --128MB
set hive.merge.size.per.task=128000000; -- 128MB
Above setting will spawn one more step to merge the files and approx size of each part file should be 128MB.
Reference:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With