I have a data frame with >100 columns each labeled with a unique string. Column 1 represents the index variable. I would like to use a basic UNIX command to extract the index column (column 1) + a specific column string using grep. 
For example, if my data frame looks like the following:
    Index  A  B  C...D  E  F
    p1     1  7  4   2  5  6
    p2     2  2  1   2  .  3
    p3     3  3  1   5  6  1

I would like to use some command to extract only column "X" which I will specify with grep, and display both column 1 & the column I grep'd. I know that I can use cut -f1 myfile for the first bit, but need help with the grep per column. As a more concrete example, if my grep phrase were "B", I would like the output to be:

    Index  B
    p1     7
    p2     2
    p3     3

I am new to UNIX, and have not found much in similar examples. Any help would be much appreciated!!
If applicable, consider the caret ^, as in grep -E '^foo|^bar', which matches text only at the beginning of a line. Since column one always sits at the beginning of the line, this lets you match on it. Formally, ^ matches the starting position within the string; in line-based tools such as grep, it matches the starting position of any line.
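To make the effect of the anchor concrete, here is a small sketch on made-up data shaped like the table in the question (the values and the variable names `data` and `rows` are only for illustration):

```shell
# Hypothetical tab-separated sample, similar in shape to the question's table.
data=$(printf 'Index\tA\tB\np1\t1\t7\np2\t2\t2\np3\t3\t3\n')

# Keep only the lines whose first column starts with p1 or p3.
rows=$(printf '%s\n' "$data" | grep -E '^p1|^p3')

printf '%s\n' "$rows"
```

Note that this filters lines (rows) by what column one starts with; on its own it does not pick out a column.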
To search multiple files with the grep command, insert the filenames you want to search, separated with a space character. The terminal prints the name of every file that contains the matching lines, and the actual lines that include the required string of characters. You can append as many filenames as needed.
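As a rough illustration of the multi-file form, the sketch below creates three throwaway files (the names first.tsv, second.tsv and third.tsv are invented for the example) and searches all of them in one call:

```shell
# Work in a temporary directory so nothing is left behind.
dir=$(mktemp -d)
printf 'Index\tA\tB\np1\t1\t7\n' > "$dir/first.tsv"
printf 'Index\tA\tB\np2\t2\t2\n' > "$dir/second.tsv"
printf 'Index\tA\tB\np1\t9\t9\n' > "$dir/third.tsv"

# List the files after the pattern, separated by spaces.
# With more than one file, grep prefixes each matching line with its file name.
matches=$(cd "$dir" && grep 'p1' first.tsv second.tsv third.tsv)

printf '%s\n' "$matches"
rm -r "$dir"
```

Only first.tsv and third.tsv contain p1, so only those two show up in the output.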
You need to use awk:
    awk '{print $1,$3}' <namefile>

This simple command allows printing the first ($1) and third ($3) column of the file. The software awk is actually much more powerful. I think you should have a look at the man page of awk.
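Since the question asks for a column by its header name rather than by position, awk can also look up the position itself from the first line. This is only a minimal sketch, assuming a tab-separated file; the sample data and the variable names `tmp`, `col` and `result` are made up for illustration:

```shell
# Build a small tab-separated sample file (hypothetical data).
tmp=$(mktemp)
printf 'Index\tA\tB\tC\n'  > "$tmp"
printf 'p1\t1\t7\t4\n'    >> "$tmp"
printf 'p2\t2\t2\t1\n'    >> "$tmp"
printf 'p3\t3\t3\t1\n'    >> "$tmp"

col=B   # header of the column to extract

# On the header line, find the field whose text equals col exactly and
# remember its position in n. On every line, print field 1 and field n.
result=$(awk -F'\t' -v col="$col" '
    NR == 1 {
        for (i = 1; i <= NF; i++) if ($i == col) n = i
        if (!n) { print "column not found: " col > "/dev/stderr"; exit 1 }
    }
    { print $1 "\t" $n }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because the comparison is an exact string match on the whole header field, a header such as AB is not mistaken for B.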
A nice combo is using grep and awk with a pipe. The following code will print column 1 and 3 of only the lines of your file that contain 'p1':
    grep 'p1' <namefile> | awk '{print $1,$3}'

If, instead, you want to select lines by line number you can replace grep with sed:
    sed -n 1p <namefile> | awk '{print $1,$3}'

Actually, awk can be used alone in all the examples:
    awk '/p1/{print $1,$3}' <namefile>             # will print only lines containing p1
    awk '{if(NR == 1){print $1,$3}}' <namefile>    # will print only the first line

First figure out the command to find the column number.
    columnname=C
    sed -n "1 s/${columnname}.*//p" datafile | sed 's/[^\t]//g' | wc -c

Once you know the number, use cut
    cut -f1,4 < datafile

Combine into one command
    cut -f1,$(sed -n "1 s/${columnname}.*//p" datafile |
        sed 's/[^\t]//g' | wc -c) < datafile

Finished? No, you should improve the first sed command for the case where one header is a substring of another header: include tabs in your match and put the tabs back in the replacement string.
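One way to sidestep the substring problem, as an alternative to patching the sed expression, is to split the header line into one name per line and let grep report the line number of the exact match. This is a sketch under the assumption of a tab-separated file with unique headers; the sample data and the variable names `colnum` and `result` are invented for the example:

```shell
# Hypothetical sample where one header (AB) contains another (B).
datafile=$(mktemp)
printf 'Index\tA\tAB\tB\n'  > "$datafile"
printf 'p1\t1\t7\t4\n'     >> "$datafile"
printf 'p2\t2\t2\t1\n'     >> "$datafile"

columnname=B

# Turn the header into one name per line, then ask grep for the number of
# the line that equals the name exactly:
#   -x  match the whole line, -F  treat the name as a fixed string,
#   -n  prefix the match with its line number.
colnum=$(head -n 1 "$datafile" | tr '\t' '\n' | grep -nxF -- "$columnname" | cut -d: -f1)

# Use that number with cut, as before.
result=$(cut -f1,"$colnum" < "$datafile")

printf '%s\n' "$result"
rm -f "$datafile"
```

Here the lookup finds column 4 for B and is not confused by AB in column 3. If the name does not occur in the header, `colnum` ends up empty and cut will complain, so a real script would want to check for that first.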