
AWK: is there a way to check the next record for a number?

Tags: bash, awk

I have data I am trying to manipulate. The only parts I don't want to extend are the beginning and the end of each transcript. The beginning is easy to detect because it is always exon_1. The end varies: it is not always exon_12 or any other fixed number.

My question is: does AWK have a way to read the next record without advancing to it? My plan was to check whether the next record is exon_1 and, if it is, not add the +1 and +2 values after column 5 of the current record.

By the way, I am coding in a bash script. What else the script does is irrelevant to the question.

Code I have so far:

awk '{ if ($9~"exon_1;") {$3 = $3 FS "0"; $3 = $3 FS "0"; print $0 } else {$4 = $4-1 FS $4; $4 = $4-1 FS $4; print $0}}'  exons.gff3 > exons2.gff3
awk '{ $7 = $7 FS $7+1; $7 = $7 FS $7+2; print $0 }' exons2.gff3 > exons3.gff3
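To see why the first command yields three start values: after the first assignment, $4 holds the string "3353 3354", and in the second assignment $4-1 converts only the leading number of that string (3353), giving 3352. The second command widens $7 the same way. The sketch below wraps the two commands, otherwise unchanged, in a hypothetical function that reads standard input instead of exons.gff3, so the behavior can be checked on a few records. Neither pass can see the following record, which is why the last exon of a transcript also gets end+1 and end+2.

```shell
# question_passes: hypothetical wrapper around the two commands from the
# question; reads GFF3 lines on stdin and writes the widened lines to stdout.
# Pass 1: exon_1 gets "0 0" before its start; every other record gets
#   start-2 and start-1. After the first $4 assignment, $4 is the string
#   "start-1 start", and $4-1 uses only its leading number.
# Pass 2: the end coordinate is now $7; append end+1 and end+2 to every
#   record, including the last exon of each transcript.
question_passes() {
    awk '{ if ($9 ~ "exon_1;") { $3 = $3 FS "0"; $3 = $3 FS "0"; print $0 } else { $4 = $4-1 FS $4; $4 = $4-1 FS $4; print $0 } }' |
    awk '{ $7 = $7 FS $7+1; $7 = $7 FS $7+2; print $0 }'
}
```

For example, question_passes < exons.gff3 > exons3.gff3 reproduces the original two-step pipeline without the intermediate file.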

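Code I'm thinking of but can't quite implement (the second command is pseudocode: the variable a is meant to hold the next record, whose $11 is the ID column after the first pass):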

awk '{ if ($9~"exon_1;") {$3 = $3 FS "0"; $3 = $3 FS "0"; print $0 } else {$4 = $4-1 FS $4; $4 = $4-1 FS $4; print $0}}'  exons.gff3 > exons2.gff3
awk 'BEGIN {NR+1 = a} { if (a's $11~"exon_1;") {$7 = $7 FS "0"; $7 = $7 FS "0"; print $0} {$7 = $7 FS $7+1; $7 = $7 FS $7+2; print $0}}'  exons2.gff3 > exons3.gff3

Input:

Chr1    MSU_osa1r7  exon    2903    3268    .   +   .   ID=LOC_Os01g01010.1:exon_1;Parent=LOC_Os01g01010.1
Chr1    MSU_osa1r7  exon    3354    3616    .   +   .   ID=LOC_Os01g01010.1:exon_2;Parent=LOC_Os01g01010.1
Chr1    MSU_osa1r7  exon    4357    4455    .   +   .   ID=LOC_Os01g01010.1:exon_3;Parent=LOC_Os01g01010.1
Chr1    MSU_osa1r7  exon    5457    5560    .   +   .   ID=LOC_Os01g01010.1:exon_4;Parent=LOC_Os01g01010.1
Chr1    MSU_osa1r7  exon    7136    7944    .   +   .   ID=LOC_Os01g01010.1:exon_5;Parent=LOC_Os01g01010.1
Chr1    MSU_osa1r7  exon    8028    8150    .   +   .   ID=LOC_Os01g01010.1:exon_6;Parent=LOC_Os01g01010.1
Chr1    MSU_osa1r7  exon    8232    8320    .   +   .   ID=LOC_Os01g01010.1:exon_7;Parent=LOC_Os01g01010.1
Chr1    MSU_osa1r7  exon    8408    8608    .   +   .   ID=LOC_Os01g01010.1:exon_8;Parent=LOC_Os01g01010.1
Chr1    MSU_osa1r7  exon    9210    9617    .   +   .   ID=LOC_Os01g01010.1:exon_9;Parent=LOC_Os01g01010.1
Chr1    MSU_osa1r7  exon    10104   10187   .   +   .   ID=LOC_Os01g01010.1:exon_10;Parent=LOC_Os01g01010.1
Chr1    MSU_osa1r7  exon    10274   10430   .   +   .   ID=LOC_Os01g01010.1:exon_11;Parent=LOC_Os01g01010.1
Chr1    MSU_osa1r7  exon    10504   10817   .   +   .   ID=LOC_Os01g01010.1:exon_12;Parent=LOC_Os01g01010.1
Chr1    MSU_osa1r7  exon    422527  422748  .   +   .   ID=LOC_Os01g01800.1:exon_1;Parent=LOC_Os01g01800.1
Chr1    MSU_osa1r7  exon    422910  422972  .   +   .   ID=LOC_Os01g01800.1:exon_2;Parent=LOC_Os01g01800.1
Chr1    MSU_osa1r7  exon    423069  423379  .   +   .   ID=LOC_Os01g01800.1:exon_3;Parent=LOC_Os01g01800.1
Chr1    MSU_osa1r7  exon    423524  423620  .   +   .   ID=LOC_Os01g01800.1:exon_4;Parent=LOC_Os01g01800.1
Chr1    MSU_osa1r7  exon    423697  423774  .   +   .   ID=LOC_Os01g01800.1:exon_5;Parent=LOC_Os01g01800.1
Chr1    MSU_osa1r7  exon    423871  423930  .   +   .   ID=LOC_Os01g01800.1:exon_6;Parent=LOC_Os01g01800.1

(Ignore that the output below is space-delimited; I'll fix that later. I copied what my code currently produces and edited the last exon of each transcript by hand to show what I want.)

Output:

Chr1 MSU_osa1r7 exon 0 0 2903 3268 3269 3270 . + . ID=LOC_Os01g01010.1:exon_1;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 3352 3353 3354 3616 3617 3618 . + . ID=LOC_Os01g01010.1:exon_2;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 4355 4356 4357 4455 4456 4457 . + . ID=LOC_Os01g01010.1:exon_3;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 5455 5456 5457 5560 5561 5562 . + . ID=LOC_Os01g01010.1:exon_4;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 7134 7135 7136 7944 7945 7946 . + . ID=LOC_Os01g01010.1:exon_5;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 8026 8027 8028 8150 8151 8152 . + . ID=LOC_Os01g01010.1:exon_6;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 8230 8231 8232 8320 8321 8322 . + . ID=LOC_Os01g01010.1:exon_7;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 8406 8407 8408 8608 8609 8610 . + . ID=LOC_Os01g01010.1:exon_8;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 9208 9209 9210 9617 9618 9619 . + . ID=LOC_Os01g01010.1:exon_9;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 10102 10103 10104 10187 10188 10189 . + . ID=LOC_Os01g01010.1:exon_10;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 10272 10273 10274 10430 10431 10432 . + . ID=LOC_Os01g01010.1:exon_11;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 10502 10503 10504 10817 0 0 . + . ID=LOC_Os01g01010.1:exon_12;Parent=LOC_Os01g01010.1
Chr1 MSU_osa1r7 exon 0 0 2984 3255 3256 3257 . + . ID=LOC_Os01g01010.2:exon_1;Parent=LOC_Os01g01010.2
Chr1 MSU_osa1r7 exon 3352 3353 3354 3616 3617 3618 . + . ID=LOC_Os01g01010.2:exon_2;Parent=LOC_Os01g01010.2
Chr1 MSU_osa1r7 exon 4355 4356 4357 4455 4456 4457 . + . ID=LOC_Os01g01010.2:exon_3;Parent=LOC_Os01g01010.2
Chr1 MSU_osa1r7 exon 5455 5456 5457 5560 5561 5562 . + . ID=LOC_Os01g01010.2:exon_4;Parent=LOC_Os01g01010.2
Chr1 MSU_osa1r7 exon 7134 7135 7136 7944 7945 7946 . + . ID=LOC_Os01g01010.2:exon_5;Parent=LOC_Os01g01010.2
Chr1 MSU_osa1r7 exon 8026 8027 8028 8150 0 0 . + . ID=LOC_Os01g01010.2:exon_6;Parent=LOC_Os01g01010.2

As you can see, the last exon of each transcript does not get +1 and +2 values after its end coordinate (0 0 appears instead). That is what I want: the end of the last exon should not change.

adrotter asked Jan 25 '26 04:01



1 Answer

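Awk does not look ahead, but you can get the same effect by staying one record behind: keep the previous record in a variable and print that instead of the current one. When a new record arrives, you know whether it is exon_1, and therefore whether the stored record was the last exon of its transcript. The END section then tidies up by printing the final stored record, which is always a last exon.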
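A minimal sketch of that approach, assuming whitespace-separated GFF3 records with nine columns and that every transcript begins with a record whose attribute column contains exon_1; (with the semicolon, so exon_10 and exon_11 do not match). It folds the question's two passes into one awk program and buffers one record, so each record is printed only after its successor has been read. The shell function widen_exons and the awk function emit are hypothetical names; the 0 0 placeholders follow the desired output shown in the question.

```shell
# widen_exons: hypothetical helper; reads GFF3 exon lines from the files
# given as arguments (or stdin) and writes the widened records to stdout.
widen_exons() {
    awk '
    # Print one buffered record. is_last is 1 when it is the final exon of
    # its transcript (the next record is exon_1, or there is no next record).
    function emit(rec, is_last,    f, n, i, out) {
        n = split(rec, f)    # default FS: runs of blanks or tabs
        # Before the start coordinate (column 4): 0 0 for exon_1, else start-2 start-1.
        if (f[9] ~ /exon_1;/) f[4] = "0" OFS "0" OFS f[4]
        else                  f[4] = (f[4] - 2) OFS (f[4] - 1) OFS f[4]
        # After the end coordinate (column 5): 0 0 for the last exon, else end+1 end+2.
        if (is_last) f[5] = f[5] OFS "0" OFS "0"
        else         f[5] = f[5] OFS (f[5] + 1) OFS (f[5] + 2)
        out = f[1]
        for (i = 2; i <= n; i++) out = out OFS f[i]
        print out
    }
    # Stay one record behind: once the current record is read, we know
    # whether the buffered one (prev) was the last exon of its transcript.
    NR > 1 { emit(prev, $9 ~ /exon_1;/) }
    { prev = $0 }
    # The final buffered record has no successor, so it is always a last exon.
    END { if (NR > 0) emit(prev, 1) }
    ' "$@"
}
```

Usage: widen_exons exons.gff3 > exons3.gff3. Because every inserted value is joined with OFS, an awk operand assignment placed before the file, as in widen_exons 'OFS=\t' exons.gff3, should also produce tab-separated output.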

Thomas Dickey answered Jan 27 '26 17:01



