I have 2 GB of data in my HDFS.
Is it possible to sample that data randomly, like we do on the Unix command line?

cat iris2.csv | head -n 50
Native head
hadoop fs -cat /your/file | head

is efficient here, because cat closes the stream as soon as head finishes reading the requested number of lines.
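For example, to reproduce the 50-line sample from the question (the HDFS path below is just a placeholder; adjust it to wherever iris2.csv actually lives):

hadoop fs -cat /user/hduser/iris2.csv | head -n 50   # placeholder path: substitute your own HDFS location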
To get the tail, Hadoop has a dedicated, efficient command:

hadoop fs -tail /your/file

Unfortunately it returns the last kilobyte of the data, not a given number of lines.
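If you need a specific number of trailing lines, one workaround is to pipe the whole stream through the Unix tail. Note that unlike hadoop fs -tail, this reads the entire file from HDFS, so it gets expensive on large files:

hadoop fs -cat /your/file | tail -n 50   # streams the full file just to keep the last 50 lines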
You can use the head command in Hadoop too! The syntax would be:

hdfs dfs -cat <hdfs_filename> | head -n 3

This will print only the first three lines of the file.
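All of the above return the beginning (or end) of the file rather than a random sample. If you genuinely want random lines, as the question asks, a minimal sketch is to pipe through GNU shuf, assuming it is installed on the client machine. Like the tail workaround, this reads the full 2 GB stream, so it is much slower than head:

hdfs dfs -cat <hdfs_filename> | shuf -n 50   # random 50-line sample; reads the whole file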