
How can I split up this string?

Tags: bash, awk, gnu, cut

I am trying to sanitize some log files so that they are easier to read. I have been using the GNU cut command, which works fairly well, but I cannot think of a good way to remove the [INFO] part of each line:

logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!
logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh
logs/logs/server_1282136782.log:2010-08-18 16:27:32 [INFO] <pinguin> <pinguin>§F :/
logs/logs/server_1282136782.log:2010-08-18 16:27:37 [INFO] <TotempaaltJ> <TotempaaltJ>§F That helped A LOT
logs/logs/server_1282136782.log:2010-08-18 16:27:37 [INFO] <Rizual> §b<Rizual>§F hm?
logs/logs/server_1282136782.log:2010-08-18 16:29:10 [INFO] <pinguin> <pinguin>§F bah
logs/logs/server_1282136782.log:2010-08-18 16:29:35 [INFO] <TotempaaltJ> <TotempaaltJ>§F Finished my houses 
logs/logs/server_1282136782.log:2010-08-18 16:29:40 [INFO] <TotempaaltJ> <TotempaaltJ>§F or whatever
logs/logs/server_1282136782.log:2010-08-18 16:30:47 [INFO] <Rizual> §b<Rizual>§So much iron
logs/logs/server_1282136782.log:2010-08-18 16:30:58 [INFO] <TotempaaltJ> <TotempaaltJ>§F Ah yes, furnaces don't work.o
logs/logs/server_1282136782.log:2010-08-18 16:31:01 [INFO] <Rizual> §b<Rizual>§F They do
logs/logs/server_1282136782.log:2010-08-18 16:31:06 [INFO] <TotempaaltJ> <TotempaaltJ>§F Hm
logs/logs/server_1282136782.log:2010-08-18 16:31:08 [INFO] <Rizual> §b<Rizual>§F just need to use /lighter
logs/logs/server_1282136782.log:2010-08-18 16:31:12 [INFO] <Valrix> <Valrix>§FNotch fixed them?

Ultimately I would like to reduce the lines to something like the following. Keep in mind that the logs come in two formats: the older format, which contains the name twice (as in most of the lines above), and the newer format, which contains the name only once (as in the first line, the <NateMar> one).

2010-08-31 23:06:51 <NateMar> where?!    
2010-08-15 22:59:53 <BoonTheMoon> ohhhhhh (this one would require both the same editing as above, plus removal of the "extra" name §b<BoonTheMoon>§)    

How should I go about doing this? I have thought about using awk, but I am having a hard time understanding how it would work, so I am not sure how to set it up. Any help would be greatly appreciated, thanks!

asked Nov 23 '25 03:11 by lacrosse1991


1 Answer

More takes on this, in sed, awk and bash:

[ghoti@pc ~]$ cat text
logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!
logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh

[ghoti@pc ~]$ sed 's/^[^:]*://;s/[[][^]]*[]] //' text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh

[ghoti@pc ~]$ awk '{sub(/^[^:]+:/,""); $3=""} 1' text
2010-08-31 23:06:51  <NateMar> where?!
2010-08-15 22:59:53  <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh

[ghoti@pc ~]$ while read line; do line=${line#*:}; echo "${line/\[*\] }"; done < text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh

These are short, but their brevity makes them imperfect. For example, the awk script blanks the third field but keeps the separators on both sides of it, so the output has two spaces where [INFO] used to be.

Note that as "elegant" as one-liners may seem for quick jobs, it's usually a better idea to be explicit with your code, especially when you have to deal with unknown input data or if you won't be inspecting your results immediately after you run things.

This is harder to read, but could be much safer, depending on your input:

[ghoti@pc ~]$ awk '$3~/^[[].+[]]$/{$3="";sub(/  /," ")} {sub(/^[^:]+:/,"")} 1' text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh

For the bash script, you'd be safer to use a character class rather than a glob:

[ghoti@pc ~]$ shopt -s extglob
[ghoti@pc ~]$ while read line; do line=${line#*:}; echo "${line/\[+([[:upper:]])\] /}"; done < text
2010-08-31 23:06:51 <NateMar> where?!
2010-08-15 22:59:53 <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh

Note that the extglob shopt option lets you use more advanced pattern matching inside the parameter replacement pattern. Run man bash and look for "Pattern Matching" under Pathname Expansion for details.
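To make the difference between the glob and the character class concrete, here is a minimal sketch. The helper name strip_tag and the second sample line (with an extra bracketed phrase in the message) are made up for illustration and are not from the logs above.

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable +(...), *(...), ?(...) in patterns

# strip_tag is a hypothetical helper name used only for this illustration.
strip_tag() {
  local line=$1
  # Remove the first "[UPPERCASE] " tag; +([[:upper:]]) means
  # "one or more upper-case letters", so other brackets are left alone.
  printf '%s\n' "${line/\[+([[:upper:]])\] /}"
}

strip_tag '2010-08-31 23:06:51 [INFO] <NateMar> where?!'
# Made-up line whose message contains brackets of its own:
strip_tag '2010-08-31 23:06:52 [INFO] <NateMar> see [this link] please'
```

With the plain glob \[*\] the second line would lose everything from [INFO] through "link] ", because the * matches as much as it can. The character class stops at the first closing bracket after a run of upper-case letters, which is why it is the safer choice when the message text is not under your control.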

UPDATE:

You've added a new requirement to your question that wasn't there originally. Here's how you can achieve your new requirement with awk:

awk '$3~/^[[].+[]]$/{$3="";sub(/  /," ")} {sub(/^[^:]+:/,"")} $3~/^<.+>$/{sub(/^(§b)?<[[:alpha:]]+>§/,"",$4)} 1' text

This removes the coloured duplicate nickname from the start of the 4th field, provided the 3rd field looks like a nickname in angle brackets. It works for the sample you posted, but only you can determine whether it will work for the rest of your data.
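The one-liner is dense, so here is the same idea spread over several lines with comments, as a sketch rather than a drop-in replacement. Two things are my own assumptions: the wrapper name clean_log is hypothetical, and I match the nickname with [^>]+ instead of [[:alpha:]]+ so that nicknames containing digits or underscores are also accepted.

```shell
#!/usr/bin/env bash

# clean_log is a hypothetical wrapper name; it filters stdin to stdout.
clean_log() {
  awk '
    # Field 3 looks like a bracketed tag such as [INFO]: blank it, then
    # collapse the double space left behind (this also re-splits the fields).
    $3 ~ /^[[].+[]]$/ { $3 = ""; sub(/  /, " ") }

    # Drop everything up to and including the first colon (the file name).
    { sub(/^[^:]+:/, "") }

    # Field 3 is now a <nick>: strip a duplicated nick, with its optional
    # leading colour code, from the start of field 4.
    # Assumption: [^>]+ replaces the [[:alpha:]]+ of the one-liner.
    $3 ~ /^<.+>$/ { sub(/^(§b)?<[^>]+>§/, "", $4) }

    1   # print the (possibly modified) record
  '
}

printf '%s\n' \
  'logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!' \
  'logs/logs/server_1281904775.log:2010-08-15 22:59:53 [INFO] <BoonTheMoon> §b<BoonTheMoon>§ohhhhhh' |
  clean_log
```

Note that only the § after the duplicated nickname is removed, so an old-format line such as "<pinguin> <pinguin>§F :/" would keep the colour letter F in front of the message. If those letters need to go as well, the pattern would have to be extended once you know which codes occur in your logs.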

And with bash:

shopt -s extglob
while read date time tag nick line; do
  printf "%s %s %s %s\n" "${date#*:}" "$time" "$nick" "${line/#*([^< ])$nick??}"
done < text
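The loop above relies on read splitting each line on whitespace into date, time, tag and nick, with the last variable (line) receiving the rest of the message. In the spirit of the earlier advice about being explicit with unknown input, a variant could check that the fields look as expected before printing them. This is only a sketch: the function name split_fields is hypothetical, and it deliberately leaves out the duplicate-nickname removal so that it shows nothing but the splitting and the check.

```shell
#!/usr/bin/env bash

# split_fields is a hypothetical helper name used only for this illustration.
# It reads log lines on stdin and writes the cleaned lines to stdout.
split_fields() {
  local date time tag nick line
  # read splits on whitespace; the last variable gets the rest of the line.
  # -r keeps backslashes in the message intact.
  while read -r date time tag nick line; do
    date=${date#*:}   # drop the "file name:" prefix glued to the date
    if [[ $tag == '['*']' && $nick == '<'*'>' ]]; then
      printf '%s %s %s %s\n' "$date" "$time" "$nick" "$line"
    else
      # Not the expected layout: report it on stderr instead of guessing.
      printf 'skipped: %s %s %s %s %s\n' "$date" "$time" "$tag" "$nick" "$line" >&2
    fi
  done
}

printf '%s\n' \
  'logs/logs/server_1283258036.log:2010-08-31 23:06:51 [INFO] <NateMar> where?!' |
  split_fields
```

Lines that do not have a bracketed tag in the third position and a <nick> in the fourth go to stderr, so they can be reviewed separately rather than being silently mangled.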
answered Nov 24 '25 18:11 by ghoti


