Find part of a string in CSV and replace whole cell with new entry?

I've got a CSV file with a column I want to sift through. I made a list of keywords in a pattern file that serves as my "pattern" bank. For every cell in that column (in this case only the second one) that contains one of these patterns, even as just part of its value, I want to replace the whole cell with that pattern.

So, for example:

My target file:

id1,Taxidermy Equipment & Supplies,moreinfo1
id2,Taxis & Private Hire,moreinfo2
id3,Tax Services,moreinfo3
id4,Tools & Hardware,moreinfo4
id5,Tool Sharpening,moreinfo5
id6,Tool Shops,moreinfo6
id7,Video Conferencing,moreinfo7
id8,Video & DVD Shops,moreinfo8
id9,Woodworking Equipment & Supplies,moreinfo9

My "pattern" file:

Taxidermy Equipment & Supplies
Taxis
Tax Services
Tool
Video
Wood

Desired output file:

id1,Taxidermy Equipment & Supplies,moreinfo1
id2,Taxis,moreinfo2
id3,Tax Services,moreinfo3
id4,Tool,moreinfo4
id5,Tool,moreinfo5
id6,Tool,moreinfo6
id7,Video,moreinfo7
id8,Video,moreinfo8
id9,Wood,moreinfo9

I came up with the usual "find and replace" sed:

sed -i 's/PATTERN/REPLACE/g' file.csv

But I want it to operate on one specific column only, so I came up with:

awk -v pat='PATTERN' -v rep='REPLACE' 'BEGIN{OFS=FS=","} $2==pat{$2=rep} {print}' file.csv

But it only matches when the pattern equals the whole cell, not when it is part of it (e.g. the pattern Video should turn "Video & DVD Shops" into "Video"), and I can't figure out how to make awk read the patterns from a file.

Is there an awk solution for this, or do I have to write something myself (in Python with the built-in csv module, for example)?
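For a single pattern, the key change is to test the cell with index(), which is non-zero when the pattern occurs anywhere in the string, instead of ==, which only matches the whole cell. A minimal sketch, assuming comma-separated input without quoted fields; replace_col2 and pat are hypothetical names:

```shell
# replace_col2 is a hypothetical helper: it reads CSV on stdin and, when the
# pattern given as $1 occurs anywhere in column 2, replaces the whole cell with it.
replace_col2() {
    awk -v pat="$1" 'BEGIN { FS = OFS = "," }
        index($2, pat) { $2 = pat }   # substring test, unlike $2 == pat
        { print }'                    # print every record, changed or not
}

# Usage with three rows from the example target file
printf '%s\n' 'id7,Video Conferencing,moreinfo7' \
              'id8,Video & DVD Shops,moreinfo8' \
              'id9,Woodworking Equipment & Supplies,moreinfo9' |
    replace_col2 'Video'
```

This prints id7,Video,moreinfo7 and id8,Video,moreinfo8 and leaves the id9 row unchanged. Note that awk processes escape sequences in -v values, so a pattern containing a backslash would need it doubled.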

asked Nov 29 '25 20:11 by vngrd1


1 Answer

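In awk, using index(). It prints a record only if a replacement is made, but it's easy to modify it to print every record, even when there is no match (for example, replace print $1,i,$3} with $0=$1 OFS i OFS $3} 1). As written, a record whose second field contains more than one pattern is printed once per matching pattern: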

$ awk -F, -v OFS=, '
NR==FNR { a[$1]; next }          # store the patterns as keys of array a
        { for(i in a)            # for each record, try every pattern
              if(index($2,i))    # if the pattern occurs anywhere in $2
                  print $1,i,$3  # print the record with $2 replaced
        }
' pattern_file target_file
id1,Taxidermy Equipment & Supplies,moreinfo1
id2,Taxis,moreinfo2
id3,Tax Services,moreinfo3
id4,Tool,moreinfo4
id5,Tool,moreinfo5
id6,Tool,moreinfo6
id7,Video,moreinfo7
id8,Video,moreinfo8
id9,Wood,moreinfo9
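The modification mentioned above, printing every record and not only the changed ones, might look like the sketch below. It assumes comma-separated files without quoted fields; replace_from_file is a hypothetical wrapper name, and the break that stops at the first matching pattern is an addition of this sketch. Because for (p in a) visits keys in an unspecified order, overlapping patterns (say, Tax and Taxis) would be picked arbitrarily:

```shell
# replace_from_file is a hypothetical wrapper:
#   replace_from_file PATTERN_FILE TARGET_FILE
replace_from_file() {
    awk -F, -v OFS=, '
    NR == FNR { a[$0]; next }        # first file: store each whole line as a pattern
    {
        for (p in a)
            if (index($2, p)) {      # pattern occurs somewhere in column 2
                $2 = p               # replace the whole cell with the pattern
                break                # keep the first match found
            }
        print                        # print every record, matched or not
    }' "$1" "$2"
}

# Usage on temporary copies of part of the example data
dir=$(mktemp -d)
printf '%s\n' 'Taxis' 'Tool' 'Video' > "$dir/patterns"
printf '%s\n' 'id2,Taxis & Private Hire,moreinfo2' \
              'id4,Tools & Hardware,moreinfo4' \
              'id9,Woodworking Equipment & Supplies,moreinfo9' > "$dir/target"
replace_from_file "$dir/patterns" "$dir/target"
rm -r "$dir"
```

Here id2 and id4 become Taxis and Tool, while id9 is printed unchanged because Wood is not in this pattern list. One caveat of the NR == FNR idiom: if the pattern file is empty, the condition stays true for the target file and nothing is printed.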
answered Dec 01 '25 11:12 by James Brown