I am trying to parse data from tables printed in bash, and I found strange behavior when some columns are empty. For example,
I have data like this:
containerName   ipAddress       memoryMB   name    numberOfCpus   status
--------------- --------------- ---------- ------- -------------- ----------
TEST_VM         192.168.150.111 8192       TEST_VM 4              POWERED_ON
and sometimes like this
containerName   ipAddress   memoryMB   name                   numberOfCpus   status
--------------- ----------- ---------- ---------------------- -------------- -----------
TEST_VM_second              3072       TEST_VM_second_renamed 1              POWERED_OFF
I tried both Python and bash and got the same results. I need the "name" column. When I use bash, for example awk '{print $4}', it prints the expected result for the first table:
name
-------
TEST_VM
but for the second table it prints:
name
----------------------
1
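I get the same results with Python:
import pandas as pd
from io import StringIO
# `table` holds the raw text of one table as a string
df_info = pd.read_table(StringIO(table), delim_whitespace=True)
df_info = df_info.drop(0)  # drop the row of dashes under the header
pd.set_option('display.max_colwidth', None)
print(df_info['name'], df_info['containerName'])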
Output:
1 TEST_VM
Name: name, dtype: object 1 TEST_VM
Name: containerName, dtype: object
1 1
Name: name, dtype: object 1 TEST_VM_second
Name: containerName, dtype: object
Does anyone know how to handle the case where the ipAddress field is empty?
1st solution: With GNU awk, try the following with your shown samples. It also handles VMs that have spaces in their names.
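awk '
FNR>2{        # skip the header line and the row of dashes
  NF-=2       # drop the last two fields (numberOfCpus, status)
  # capture whatever follows the last 4-digit number (memoryMB)
  match($0,/.*[0-9]{4}[[:space:]]+(.*)$/,arr)
  print arr[1]
}
' Input_file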
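Since the question also tried Python, the same idea can be sketched with the standard re module. This is only a rough translation, not part of the original answer: the function name vm_name is made up for illustration, and like the awk version it assumes memoryMB always has at least four digits and that the name itself does not contain a four-digit number followed by a space.

```python
import re


def vm_name(row: str) -> str:
    """Return the 'name' value from one data row (hypothetical helper)."""
    # Drop the last two whitespace-separated fields (numberOfCpus, status).
    head = row.rsplit(None, 2)[0]
    # Greedy match up to the last 4-digit run followed by whitespace
    # (assumed to be memoryMB), then capture the rest of the line.
    m = re.match(r".*[0-9]{4}\s+(.*)$", head)
    return m.group(1).strip() if m else ""


print(vm_name("TEST_VM 192.168.150.111 8192 TEST_VM 4 POWERED_ON"))
print(vm_name("TEST_VM_second 3072 TEST_VM_second_renamed 1 POWERED_OFF"))
```

Because the pattern keys on memoryMB rather than on a field number, it does not matter whether ipAddress is present or empty.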
2nd solution: With GNU grep, try the following with your shown samples. The regex ^.*?[0-9]{4}\s+\K\S+ uses the PCRE \K escape (available with the -P option of GNU grep) to discard everything matched so far, so only the text matched after \K is printed. This assumes that the VM name doesn't contain spaces.
grep -oP '^.*?[0-9]{4}\s+\K\S+' Input_file
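A different angle, offered only as a sketch: if the original column alignment is still intact in the input (an assumption, since it depends on how the table was captured), the row of dashes gives the start and end offset of every column, so an empty ipAddress simply comes out as an empty string. The names parse_fixed_width and SAMPLE below are made up for illustration, and the code assumes no value is wider than the dashes above it.

```python
import re

# Sample input with the column alignment preserved (assumed layout).
SAMPLE = """\
containerName   ipAddress   memoryMB   name                   numberOfCpus   status
--------------- ----------- ---------- ---------------------- -------------- -----------
TEST_VM_second              3072       TEST_VM_second_renamed 1              POWERED_OFF
"""


def parse_fixed_width(table):
    """Slice each row at the offsets given by the row of dashes."""
    lines = table.splitlines()
    header, ruler, rows = lines[0], lines[1], lines[2:]
    # Each run of dashes marks one column's (start, end) offsets.
    spans = [m.span() for m in re.finditer(r"-+", ruler)]
    names = [header[a:b].strip() for a, b in spans]
    return [
        {name: row[a:b].strip() for name, (a, b) in zip(names, spans)}
        for row in rows
        if row.strip()  # ignore blank lines
    ]


rows = parse_fixed_width(SAMPLE)
print(rows[0]["name"])
print(repr(rows[0]["ipAddress"]))
```

If pandas is already in use, pd.read_fwf is built for fixed-width text and may be worth trying on the same input.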