I have a file with multiple entries for each ID number. The file has about 2,000 IDs with 54,000 observations per ID. I need to feed the output into an algorithm that requires IDs to be shorter than 6 characters. How can I replace the IDs with just the numbers 1 to 2000? The ID column in the file looks like this:
2007I804567
2007I804567
2007I804567
2007I804568
2007I804568
2007I804568
2007I804569
2007I804569
2007I804569
I need it to look like this (I want to keep the original ID):
1 2007I804567
1 2007I804567
1 2007I804567
2 2007I804568
2 2007I804568
2 2007I804568
3 2007I804569
3 2007I804569
3 2007I804569
Thanks
$ cat file
2007I804567
2007I804567
2007I804567
2007I804568
2007I804568
2007I804568
2007I804569
2007I804569
2007I804569
$
$ awk '!seen[$0]++{++id} {print id, $0}' file
1 2007I804567
1 2007I804567
1 2007I804567
2 2007I804568
2 2007I804568
2 2007I804568
3 2007I804569
3 2007I804569
3 2007I804569
Try the following awk command:
awk '!($0 in id) {id[$0]=++n} {print id[$0], $0}' file
Short description:
awk '
!($0 in id) {        # if this line is not yet a key in the array id
    id[$0] = ++n     # assign it the next number; the line itself is the array key
}
{
    print id[$0], $0 # print the assigned number followed by the original line
}' file              # input file
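The question mentions many observations per ID, so the real file probably has more columns than the ID alone. Here is a minimal sketch of the same array-lookup idea keyed on the first field instead of the whole line. The file names `sample.txt` and `numbered.txt`, the two-column layout, and the observation values are assumptions made up for illustration; adjust `$1` if the ID sits in a different column.

```shell
# Hypothetical sample data: ID in column 1, one observation value in column 2.
# The last row repeats an earlier ID out of order on purpose.
printf '%s\n' \
  '2007I804567 0.12' \
  '2007I804567 0.40' \
  '2007I804568 0.33' \
  '2007I804569 0.08' \
  '2007I804568 0.91' > sample.txt

# Key the lookup on the first field only, so rows that share an ID get the
# same number even when the rest of the line differs.
awk '!($1 in id) { id[$1] = ++n } { print id[$1], $0 }' sample.txt > numbered.txt

cat numbered.txt
```

With this sample, the last row gets the number 2 again, because the number is looked up in the array rather than taken from a running counter. The counter-only one-liner (`!seen[$0]++{++id}`) relies on all rows of an ID being contiguous, which holds for the sample in the question but may not hold for every file.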