I have exon data I am trying to manipulate. The only values we don't want to manipulate are the start of the first exon and the end of the last exon in each group. The first is easy, as it is always exon_1. The last is not always exon_12 or whatever; it varies.
My question is, does AWK have a way to look at the next record without moving on to it? The way I wanted to tackle this is to check whether the next record is exon_1, and if it is, then not add the +1 and +2 values after the end coordinate (column 5).
By the way, I am coding in a bash script. What else the script does is irrelevant to the question.
Code I have so far:
awk '{ if ($9~"exon_1;") {$3 = $3 FS "0"; $3 = $3 FS "0"; print $0 } else {$4 = $4-1 FS $4; $4 = $4-1 FS $4; print $0}}' exons.gff3 > exons2.gff3
awk '{ {$7 = $7 FS $7+1; $7 = $7 FS $7+2; print $0}}' exons2.gff3 > exons3.gff3
Code I'm thinking of but can't quite implement:
awk '{ if ($9~"exon_1;") {$3 = $3 FS "0"; $3 = $3 FS "0"; print $0 } else {$4 = $4-1 FS $4; $4 = $4-1 FS $4; print $0}}' exons.gff3 > exons2.gff3
awk 'BEGIN {NR+1 = a} { if (a's $11~"exon_1;") {$7 = $7 FS "0"; $7 = $7 FS "0"; print $0} {$7 = $7 FS $7+1; $7 = $7 FS $7+2; print $0}}' exons2.gff3 > exons3.gff3
Input:
Chr1 MSU_osa1r7 exon 2903 3268 . + . ID=LOC_Os01g01010.1:exon_1;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 3354 3616 . + . ID=LOC_Os01g01010.1:exon_2;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 4357 4455 . + . ID=LOC_Os01g01010.1:exon_3;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 5457 5560 . + . ID=LOC_Os01g01010.1:exon_4;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 7136 7944 . + . ID=LOC_Os01g01010.1:exon_5;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 8028 8150 . + . ID=LOC_Os01g01010.1:exon_6;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 8232 8320 . + . ID=LOC_Os01g01010.1:exon_7;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 8408 8608 . + . ID=LOC_Os01g01010.1:exon_8;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 9210 9617 . + . ID=LOC_Os01g01010.1:exon_9;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 10104 10187 . + . ID=LOC_Os01g01010.1:exon_10;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 10274 10430 . + . ID=LOC_Os01g01010.1:exon_11;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 10504 10817 . + . ID=LOC_Os01g01010.1:exon_12;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 422527 422748 . + . ID=LOC_Os01g01800.1:exon_1;Parent=LOC_Os01g01800.1
Chr1 MSU_osa1r7 exon 422910 422972 . + . ID=LOC_Os01g01800.1:exon_2;Parent=LOC_Os01g01800.1
Chr1 MSU_osa1r7 exon 423069 423379 . + . ID=LOC_Os01g01800.1:exon_3;Parent=LOC_Os01g01800.1
Chr1 MSU_osa1r7 exon 423524 423620 . + . ID=LOC_Os01g01800.1:exon_4;Parent=LOC_Os01g01800.1
Chr1 MSU_osa1r7 exon 423697 423774 . + . ID=LOC_Os01g01800.1:exon_5;Parent=LOC_Os01g01800.1
Chr1 MSU_osa1r7 exon 423871 423930 . + . ID=LOC_Os01g01800.1:exon_6;Parent=LOC_Os01g01800.1
(Ignore that it changed to space delimited, I'll fix that later. I just copied what my code produces now and added what I wanted it to do to the last exon_.)
Output:
Chr1 MSU_osa1r7 exon 0 0 2903 3268 3269 3270 . + . ID=LOC_Os01g01010.1:exon_1;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 3352 3353 3354 3616 3617 3618 . + . ID=LOC_Os01g01010.1:exon_2;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 4355 4356 4357 4455 4456 4457 . + . ID=LOC_Os01g01010.1:exon_3;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 5455 5456 5457 5560 5561 5562 . + . ID=LOC_Os01g01010.1:exon_4;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 7134 7135 7136 7944 7945 7946 . + . ID=LOC_Os01g01010.1:exon_5;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 8026 8027 8028 8150 8151 8152 . + . ID=LOC_Os01g01010.1:exon_6;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 8230 8231 8232 8320 8321 8322 . + . ID=LOC_Os01g01010.1:exon_7;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 8406 8407 8408 8608 8609 8610 . + . ID=LOC_Os01g01010.1:exon_8;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 9208 9209 9210 9617 9618 9619 . + . ID=LOC_Os01g01010.1:exon_9;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 10102 10103 10104 10187 10188 10189 . + . ID=LOC_Os01g01010.1:exon_10;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 10272 10273 10274 10430 10431 10432 . + . ID=LOC_Os01g01010.1:exon_11;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 10502 10503 10504 10817 0 0 . + . ID=LOC_Os01g01010.1:exon_12;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 0 0 2984 3255 3256 3257 . + . ID=LOC_Os01g01010.2:exon_1;Parent=LOC_Os01g01010.2
Chr1 MSU_osa1r7 exon 3352 3353 3354 3616 3617 3618 . + . ID=LOC_Os01g01010.2:exon_2;Parent=LOC_Os01g01010.2
Chr1 MSU_osa1r7 exon 4355 4356 4357 4455 4456 4457 . + . ID=LOC_Os01g01010.2:exon_3;Parent=LOC_Os01g01010.2
Chr1 MSU_osa1r7 exon 5455 5456 5457 5560 5561 5562 . + . ID=LOC_Os01g01010.2:exon_4;Parent=LOC_Os01g01010.2
Chr1 MSU_osa1r7 exon 7134 7135 7136 7944 7945 7946 . + . ID=LOC_Os01g01010.2:exon_5;Parent=LOC_Os01g01010.2
Chr1 MSU_osa1r7 exon 8026 8027 8028 8150 0 0 . + . ID=LOC_Os01g01010.2:exon_6;Parent=LOC_Os01g01010.2
What you're seeing is that the last exon_ of each group does not get the +1 and +2 values added after its end (0 0 is written instead). I want that last end value to stay unchanged.
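Awk does not look ahead, but you can always create a variable which stores the previous record and arrange to write that rather than the current record. By the time you write a held record you have already read the one after it, so you know whether it was followed by exon_1. The END section then tidies up by writing the final held record, which is always a last exon.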
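A minimal sketch of that idea, done in a single pass instead of two. The assumptions here are mine, not from the question: the file name exons_sample.gff3, the function name emit, and the variable names are all made up for illustration, the sample is an abridged copy of a few input lines, and the output is space-delimited like the output shown in the question.

```shell
#!/usr/bin/env bash
# Abridged sample: five records copied from the input in the question.
# The file name exons_sample.gff3 is hypothetical.
cat > exons_sample.gff3 <<'EOF'
Chr1 MSU_osa1r7 exon 2903 3268 . + . ID=LOC_Os01g01010.1:exon_1;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 3354 3616 . + . ID=LOC_Os01g01010.1:exon_2;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 10504 10817 . + . ID=LOC_Os01g01010.1:exon_12;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 422527 422748 . + . ID=LOC_Os01g01800.1:exon_1;Parent=LOC_Os01g01800.1
Chr1 MSU_osa1r7 exon 422910 422972 . + . ID=LOC_Os01g01800.1:exon_2;Parent=LOC_Os01g01800.1
EOF

result=$(awk '
# Write the held record. "last" is 1 when the held record is the final
# exon of its group, in which case "0 0" replaces the +1 and +2 values.
function emit(last,    tail) {
    if (!have) return                       # nothing held yet (first record)
    tail = last ? "0 0" : ((stop + 1) " " (stop + 2))
    print head, lead, start, stop, tail, rest
}
{
    first = ($9 ~ /exon_1;/)                # the ";" keeps exon_10, exon_11, ... from matching
    emit(first)                             # a new exon_1 means the held record was a last exon
    # Now hold the current record until the next one has been read.
    head  = $1 " " $2 " " $3
    start = $4
    stop  = $5
    lead  = first ? "0 0" : ((start - 2) " " (start - 1))
    rest  = $6 " " $7 " " $8 " " $9
    have  = 1
}
END { emit(1) }                             # the final record of the file is always a last exon
' exons_sample.gff3)

printf '%s\n' "$result"
```

Each record is written one step late, so the test "is the next record exon_1?" becomes "is the current record exon_1?" applied to the record being held. To run it on the real file, replace exons_sample.gff3 with exons.gff3 and redirect the awk output to a file instead of capturing it in a variable.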