
Merging small files into a single file in HDFS

In an HDFS cluster, I receive multiple files on a daily basis, which can be of 3 types:

1) product_info_timestamp

2) user_info_timestamp

3) user_activity_timestamp

Any number of files may arrive, but each will belong to one of these 3 categories only.

I want to merge all the files belonging to one category (after checking that each is less than 100 MB) into a single file. For example: 3 files named product_info_* should be merged into one file named product_info.

How do I achieve this?

asked by user3829376

1 Answer

You can use getmerge to achieve this, but the result will be stored on your local node (edge node), so you need to be sure you have enough space there.

hadoop fs -getmerge /hdfs_path/product_info_* /local_path/product_info

You can move the merged file back to HDFS with put:

hadoop fs -put /local_path/product_info /hdfs_path
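
To automate the whole flow, including the under-100 MB check from the question, something like the sketch below could work. It is only a minimal outline: /hdfs_path and /local_path are placeholders for your real directories, and it assumes hadoop fs -du prints each file's size in bytes as the first column and its path as the last column, one file per line.

#!/usr/bin/env bash
# Sketch: merge each category's files into one, skipping any category
# that contains a file of 100 MB or more.
HDFS_DIR=/hdfs_path     # placeholder: HDFS directory holding the daily files
LOCAL_DIR=/local_path   # placeholder: scratch directory on the edge node
MAX_BYTES=$((100 * 1024 * 1024))   # 100 MB threshold from the question

for category in product_info user_info user_activity; do
  # hadoop fs -du lists "size ... path" per file; find any file at/over the limit
  too_big=$(hadoop fs -du "$HDFS_DIR/${category}_*" 2>/dev/null \
    | awk -v max="$MAX_BYTES" '$1 >= max { print $NF; exit }')
  if [ -n "$too_big" ]; then
    echo "Skipping $category: $too_big is 100 MB or larger"
    continue
  fi
  # Merge all matching files to the edge node, then push the single file back
  hadoop fs -getmerge "$HDFS_DIR/${category}_*" "$LOCAL_DIR/$category"
  hadoop fs -put -f "$LOCAL_DIR/$category" "$HDFS_DIR/$category"
done

Note that getmerge simply concatenates the source files in order, so this approach only makes sense for formats where concatenation is valid (plain text, CSV, and the like); binary formats such as Parquet need a format-aware merge instead.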
answered by SCouto