Removing a line where the first column is not numeric

Question

I have a large dataset that is a CSV file and I've already cleaned it up quite a bit. However there are instances where I have a column which is not numeric and I want to remove those rows.

So far I thought something like this would work. I am thinking the solution is probably simple. I am not sure whether I can even do something like this. It is a CSV file so I tried to tell awk that its delimited by commas. Also an example of a good value in column one is this: 323870133825187840

awk '/,/$1 != numeric'

Are there any suggestions on this one. I initially thought about doing some kind of a one liner such as, [0-9], etc. I would greatly appreciate it if anyone could help me out with this. For someone out there this is probably like child's play :)

Jonathan Leffler · Accepted Answer

Assuming that you're only dealing with unsigned integer values, you can use either:

awk -F, '$1 ~ /^[[:digit:]]+$/'

or

awk -F, '$1 !~ /[^[:digit:]]/'

Both use the 'implicit print' action. The first checks that $1 only contains digits; the second checks that $1 does not contain a non-digit.

If your numbers are more general (signed, possibly with decimal point, possibly with exponential notation — 6.0221413e+23 (Avogadro's number), for example), then you need a more complex regular expression and will use positive matching, only selecting rows where field 1 matches your regex for a valid number.

Removing a line where the first column is not numeric

Tags:

unix

csv

awk

Tastybrownies

1 Answers

Jonathan Leffler

Recent Activity

Donate For Us

Removing a line where the first column is not numeric

Tags:

unix

csv

awk

Tastybrownies

1 Answers

Jonathan Leffler

Related questions

Recent Activity

Donate For Us