How to identify consecutive lines with specific features using for loop in Python?

I am working with a huge data set. The format looks like this:

1 1 1 1 1 1 1 1 A 1 1 1 1
1 1 1 1 1 1 1 1 A 1 1 1 1
1 1 1 1 1 1 1 1 A 1 1 1 1
1 1 1 1 1 1 1 1 B 1 1 1 1
1 1 1 1 1 1 1 1 B 1 1 1 1
1 1 1 1 1 1 1 1 C 1 1 1 1
1 1 1 1 1 1 1 1 C 1 1 1 1
1 1 1 1 1 1 1 1 C 1 1 1 1

Each '1' stands for a value that can differ from line to line. My goal is to find the consecutive lines containing 'B' (two here, but runs of three or four are possible) and extract them together with their surrounding lines (e.g., the two preceding lines with 'A' and the two following lines with 'C'). The file contains several blocks of this kind, and I was considering using a for loop to read it line by line, marking the position every time an 'A' line is followed by a 'B' line. I tried:

for line in file:
    if 'A' in line and 'B' in next(file):
        pass  # A-B pair found here

But it seemed that some lines were lost. My question is: how can I reliably identify an A-B (or B-C) line pair using a for loop? And after that, how can I easily go back (or forward) several lines within the loop to extract all of them?

liyue84 asked Jan 31 '26 03:01


1 Answer

The linecache module retrieves lines from a file by line number. You can use it to mark the boundary points (A-B, B-C) as you go through the file, and then loop over the lines between them to get the output you want.

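import linecache

FILENAME = "file.txt"
CONTEXT = 2  # number of lines to keep before and after each run of B lines

final_lines = []
linestart = None  # set when an A-B boundary is found
with open(FILENAME) as f:
    for i, line in enumerate(f, 1):  # linecache line numbers start at 1
        if "B" in line:
            if "A" in linecache.getline(FILENAME, i - 1):
                linestart = max(1, i - CONTEXT)  # A-B boundary
            if linestart is not None and "C" in linecache.getline(FILENAME, i + 1):
                lineend = i + CONTEXT  # B-C boundary
                for j in range(linestart, lineend + 1):
                    text = linecache.getline(FILENAME, j)
                    if text:  # getline returns "" past the end of the file
                        final_lines.append(text)
                linestart = None  # wait for the next A-B boundary
print(final_lines)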
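
On the lost lines in the question: calling next() on the file inside the for loop advances the same iterator the loop is reading from, so the line it consumes is never seen by the loop itself. Also note that linecache keeps the whole file in memory, which may matter for a huge data set. As an alternative, here is a minimal single-pass sketch that only remembers the last few lines. It is not the approach above, and it rests on assumptions: the function name extract_b_blocks is made up for illustration, and the marker is assumed to sit in the 9th whitespace-separated column (index 8), as in the sample data. It also keeps the surrounding lines whatever their marker is, rather than requiring 'A' before and 'C' after.

```python
from collections import deque


def extract_b_blocks(lines, marker="B", column=8, context=2):
    """Yield lists of lines: each run of marker lines plus surrounding context.

    Hypothetical helper; `column` is the assumed position of the marker field.
    """
    before = deque(maxlen=context)  # the last `context` non-marker lines seen
    block = None                    # lines of the block being collected, if any
    remaining = 0                   # trailing context lines still wanted
    for line in lines:
        fields = line.split()
        if len(fields) > column and fields[column] == marker:
            if block is None:
                block = list(before)  # start the block with the leading context
            block.append(line)
            remaining = context       # (re)start the trailing-context countdown
        else:
            if block is not None:
                if remaining > 0:
                    block.append(line)
                    remaining -= 1
                if remaining == 0:    # trailing context complete
                    yield block
                    block = None
            before.append(line)
    if block is not None:  # input ended before the trailing context was complete
        yield block


# Demo on the sample from the question (3 A lines, 2 B lines, 3 C lines).
sample = ["1 1 1 1 1 1 1 1 {} 1 1 1 1\n".format(m) for m in "AAABBCCC"]
for found in extract_b_blocks(sample):
    print([row.split()[8] for row in found])  # prints ['A', 'A', 'B', 'B', 'C', 'C']
```

For a real file, the same generator can be fed the open file object directly, e.g. `with open("file.txt") as f:` followed by `for found in extract_b_blocks(f):`, so only the current block and the last `context` lines are held in memory at any time. If two runs of 'B' lines are closer together than `context` lines, this sketch merges them into one block.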
twasbrillig answered Feb 02 '26 17:02