 

Building a file index in Linux

I have a filesystem with deeply nested directories. Inside the bottom-level directory of each branch of the tree is a directory whose name is the GUID of a record in a database. That directory contains the binary file(s) (PDF, JPG, etc.) attached to that record.

Two example paths:

/g/camm/MOUNT/raid_fs0/FOO/042014/27/123.456.789/04.20.30--27.04.2014--RJ123.pdf
/g/camm/MOUNT/raid_fs1/FOO/052014/22/321.654.987/04.20.30--27.04.2014--RJ123.pdf

In the examples above, 123.456.789 and 321.654.987 are the GUIDs.

I want to build an index of the complete filesystem so that I can create a lookup table in my database that maps each record's GUID to the absolute path(s) of its attached file(s).

I can easily generate a straight list of files with:

find /g/camm/MOUNT -type f > /g/camm/MOUNT/files.index

but I want to parse each file path in that output into a CSV file that looks like:

GUID    ABSOLUTEPATH    FILENAME
123.456.789 /g/camm/MOUNT/raid_fs0/FOO/042014/27/123.456.789/04.20.30--27.04.2014--RJ123.pdf    04.20.30--27.04.2014--RJ123.pdf
321.654.987 /g/camm/MOUNT/raid_fs1/FOO/052014/22/321.654.987/04.20.30--27.04.2014--RJ123.pdf    04.20.30--27.04.2014--RJ123.pdf

I think I need to pipe the output of my find command through xargs and then awk to turn each line into the desired CSV format, but I can't get it to work.
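As a minimal sketch (assuming, as in the example paths, that every file sits directly inside its GUID directory), xargs is not actually needed here: awk reads find's output line by line from the pipe. The function name build_index and the /tmp/files.csv output location are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: print GUID,ABSOLUTEPATH,FILENAME for every regular
# file under the directory given as $1.
# Assumes each file's parent directory name is the record's GUID.
build_index() {
    # awk consumes find's output directly from the pipe, so no xargs is needed.
    # Splitting on "/" makes $NF the filename and $(NF-1) the parent directory.
    find "$1" -type f | awk -F/ '{ printf "%s,%s,%s\n", $(NF-1), $0, $NF }'
}

# Example usage (hypothetical output path): writing the index outside the
# scanned tree keeps the index file from showing up in its own listing.
# build_index /g/camm/MOUNT > /tmp/files.csv
```

Writing the result outside /g/camm/MOUNT matters because the shell creates the redirection target before find starts, so an index file inside the scanned tree would otherwise be listed as one of the files.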

Asked Dec 13 '25 at 02:12 by Adam


1 Answer

Wait for your long-running find to finish, then you can pass the list of filenames through awk:

awk -F/ '{printf "%s,%s,%s\n",$(NF-1),$0,$NF}' /g/camm/MOUNT/files.index

and this will convert lines like

/g/camm/MOUNT/raid_fs0/FOO/042014/27/123.456.789/04.20.30--27.04.2014--RJ123.pdf

into

123.456.789,/g/camm/MOUNT/raid_fs0/FOO/042014/27/123.456.789/04.20.30--27.04.2014--RJ123.pdf,04.20.30--27.04.2014--RJ123.pdf

The -F/ option splits each line into fields using "/" as the separator. NF is the number of fields, so $NF is the last field (the filename) and $(NF-1) is the next-to-last one (the parent directory), which seems to be the GUID directory you want in the first column of the output. I used "," in the printf to separate the output columns, as is typical in a CSV; you can replace it with any other character, such as a space or ";".
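One caveat: the printf above writes the fields unquoted, so a path that itself contains a comma or a double quote would shift the columns. A minimal sketch, assuming RFC 4180-style quoting is acceptable to whatever imports the CSV into the database: wrap each field in double quotes, double any embedded quotes, and emit a header row. The names paths_to_csv and csv_quote are hypothetical, and filenames containing newlines are not handled.

```shell
# Hypothetical helper: turn a list of paths (one per line, from files or
# stdin) into a quoted CSV with a GUID,ABSOLUTEPATH,FILENAME header row.
paths_to_csv() {
    awk -F/ '
        # Wrap a field in double quotes, doubling any quotes already inside it.
        # s is a local copy, so gsub does not modify the original field.
        function csv_quote(s) {
            gsub(/"/, "\"\"", s)
            return "\"" s "\""
        }
        BEGIN { print "GUID,ABSOLUTEPATH,FILENAME" }
        { printf "%s,%s,%s\n", csv_quote($(NF-1)), csv_quote($0), csv_quote($NF) }
    ' "$@"
}

# Example usage:
# paths_to_csv /g/camm/MOUNT/files.index > files.csv
```

Quoting every field, not just the ones that need it, keeps the awk program simple and is still valid CSV for most importers.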

Answered Dec 14 '25 at 16:12 by meuh


