I'm facing one problem which is bigger than me. I have 18 relative large text files (ca 30k lines each) and I need to count the values between the empty cells in the second column. Here is a simple example of my file:
Metabolism
line_1 10.2
line_2 10.1
line_3 10.3
TCA_cycle
line_4 10.7
line_5 10.8
Pyruvate_metab
line_6 100.8
In reality, I have circa 500 description lines (Metabolism, TCA_cycle, etc.) and the range of lines is between zero to a few hundred.
I would like to count values for each block (block starts with a description and corresponding lines are always below), e.g.
Metabolism 30.6
line_1 10.2
line_2 10.1
line_3 10.3
TCA_cycle 21.5
line_4 10.7
line_5 10.8
Pyruvate_metab 100.8
line_6 100.8
Or just
30.3
21.5
100.8
It won't be a problem if results will be printed line by line into an additional file... Or another alternative way.
There is one tricky thing and it's descriptions without lines with numbers.
Transport
line_1000 100.1
line_1001 100.2
Cell_signal
Motility
Processing
Translation
line_1002 500.1
line_1003 200.2
And even for those lines and would like to get 0 value.
Transport 200.3
line_1000 100.1
line_1001 100.2
Cell_signal 0
Motility 0
Processing 0
Translation 700.3
line_1002 500.1
line_1003 200.2
The rest of the file looks same and it's consistent - 2 columns, tab separators, descriptions in the first column, values in the second, no spaces (only underlines).
Actually I have no experience with more sophisticated coding so I really don't know how to solve it in the command line. I've already tried some Excel ways but it was painful and unsuccessful.
Because you want to count empty cells, you can use a blank text string as your criteria. To use COUNTIF, open your Google Sheets spreadsheet and click on a blank cell. Type =COUNTIF(range,"") , replacing range with your chosen cell range.
We can use the ISBLANK coupled with conditional formatting. For example, suppose we want to highlight the blank cells in the range A2:F9, we select the range and use a conditional formatting rule with the following formula: =ISBLANK(A2:F9).
With tac and any awk:
tac file | awk 'NF==2{sum+=$2; print; next} {print $1 "\t" sum; sum=0}' | tac
With two improvements proposed by kvantour and Ed Morton. See the comments.
tac file | awk '($NF+0==$NF){sum+=$2; print; next} {print $1 "\t" sum+0; sum=0}' | tac
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With