How to replace complex ID with number?

Question

I have a file with multiple entries for each ID number. The file has about 2,000 IDs with 54,000 observations per ID. I need to feed the output into an algorithm that requires IDs to be fewer than 6 characters long. How can I replace the IDs with just the numbers 1 to 2000? The IDs in the file look like this:

2007I804567
2007I804567
2007I804567
2007I804568
2007I804568
2007I804568
2007I804569
2007I804569
2007I804569

I need it to look like this (I want to keep the original ID next to the new number):

1 2007I804567
1 2007I804567
1 2007I804567
2 2007I804568
2 2007I804568
2 2007I804568
3 2007I804569
3 2007I804569
3 2007I804569

Thanks

Ed Morton · Accepted Answer

$ cat file
2007I804567
2007I804567
2007I804567
2007I804568
2007I804568
2007I804568
2007I804569
2007I804569
2007I804569
$ 
$ awk '!seen[$0]++{++id} {print id, $0}' file
1 2007I804567
1 2007I804567
1 2007I804567
2 2007I804568
2 2007I804568
2 2007I804568
3 2007I804569
3 2007I804569
3 2007I804569
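
A note on the one-liner above: it prints the current value of the counter, so it relies on all lines for a given ID being adjacent, as they are in the sample. The sketch below uses a made-up three-line input (not from the question) in which the first ID reappears after a different one, and shows that sorting first is one way to restore the grouping, assuming the original line order does not need to be preserved.

```shell
# Hypothetical ungrouped input: the first ID reappears after a different one.
ungrouped=$(printf '%s\n' 2007I804567 2007I804568 2007I804567)

# Counter-based form: prints the current counter, so the third line gets 2, not 1.
counter_out=$(printf '%s\n' "$ungrouped" | awk '!seen[$0]++{++id} {print id, $0}')

# Sorting first puts equal IDs next to each other, so the counter form is correct again.
sorted_out=$(printf '%s\n' "$ungrouped" | sort | awk '!seen[$0]++{++id} {print id, $0}')

printf '%s\n---\n%s\n' "$counter_out" "$sorted_out"
```

If the file is already grouped by ID, as in the question, the extra sort is unnecessary.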

jkshah · Answer

Try the following awk command:

awk '!($0 in id) {id[$0]=++n} {print id[$0], $0}' file

Short Description

awk '
    !($0 in id) {             # if this line is not yet a key in the array id
        id[$0] = ++n          # store the next sequential number under that line
    }
    {
        print id[$0], $0      # print the number assigned to the line, then the line itself
    }' file                   # input file
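
The question mentions many observations per ID, so the real file may have more columns than the sample shows. As a minimal sketch under that assumption (the two-column layout and the values below are hypothetical, not taken from the question), the same lookup can be keyed on the first field instead of the whole line:

```shell
# Hypothetical sample data: ID in column 1, an observation in column 2.
tmp=$(mktemp)
printf '%s\n' \
  '2007I804567 0.12' \
  '2007I804567 0.34' \
  '2007I804568 0.56' \
  '2007I804567 0.78' > "$tmp"

# Key the lookup on $1 (the ID field) rather than $0 (the whole line),
# so rows with the same ID but different observations share one number.
out=$(awk '!($1 in id) { id[$1] = ++n } { print id[$1], $0 }' "$tmp")
printf '%s\n' "$out"
rm -f "$tmp"
```

Because the number is looked up in the array for every row, the last row still gets 1 even though its ID is not adjacent to the earlier occurrences. For a differently delimited file, awk's -F option would set the field separator.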

How to replace complex ID with number?

Tags:

awk

Justin Buchanan

2 Answers

Ed Morton

jkshah
