Python: Get the line and column number of string index?

Tags: python, file, text

Say I have a text file I'm operating on. Something like this (hopefully this isn't too unreadable):

import re

with open('my_data_file.dat') as f:
    data_raw = f.read()
# re.finditer yields match objects; re.findall returns plain strings,
# which have no .start() or .end()
for match in re.finditer(my_regex, data_raw, re.MULTILINE):
    try:
        parse(data_raw, from_=match.start(), to=match.end())
    except Exception:
        print("Error parsing data starting on line {}".format(what_do_i_put_here))
        raise

Notice in the exception handler there's a certain variable named what_do_i_put_here. My question is: how can I assign to that name so that my script will print the line number that contains the start of the 'bad region' I'm trying to work with? I don't mind re-reading the file, I just don't know what I'd do...

Dan Passaro asked Oct 25 '25 06:10


1 Answer

Here's something a bit cleaner, and in my opinion easier to understand than your own answer:

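def index_to_coordinates(s, index):
    """Returns 1-based (line_number, col) of `index` in `s`."""
    if not s:
        # Guard clause: splitlines() on an empty string returns [].
        return 1, 1
    # Slice up to and including `index`, then split while keeping line endings.
    sp = s[:index+1].splitlines(keepends=True)
    # Number of pieces is the line number; length of the last piece is the column.
    return len(sp), len(sp[-1])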

It works essentially the same way as your own answer, but slicing the string up to and including index and then calling splitlines() gives you all the information you need without any post-processing: the number of pieces is the line number, and the length of the last piece is the column.
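To tie this back to the loop in the question, here is a minimal sketch of how the function might be called with match.start(). The sample data and the pattern are made up for illustration and stand in for the real file contents and my_regex; re.finditer is used because it yields match objects.

```python
import re


def index_to_coordinates(s, index):
    """Returns 1-based (line_number, col) of `index` in `s`."""
    if not s:
        return 1, 1
    sp = s[:index + 1].splitlines(keepends=True)
    return len(sp), len(sp[-1])


# Hypothetical sample data and pattern, standing in for the file and my_regex.
data_raw = "alpha\nbeta\ngamma BAD here\ndelta\n"
pattern = re.compile(r"BAD")

locations = []
for match in pattern.finditer(data_raw):
    # match.start() is a plain index into data_raw.
    line, col = index_to_coordinates(data_raw, match.start())
    locations.append((line, col))
    print("Match starting on line {}, column {}".format(line, col))
```

In the exception handler from the question, the first element of the returned tuple would be the value to put in place of what_do_i_put_here.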

Passing keepends=True is necessary to get correct column counts for end-of-line characters.
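A small illustration of why, using a made-up two-line string: without keepends=True the newline is dropped from the last piece, so the newline itself gets the same column as the character before it.

```python
s = "ab\ncd"
newline_index = 2  # position of "\n" in s

# Slice up to and including the newline, as the function does.
prefix = s[:newline_index + 1]

with_ends = prefix.splitlines(keepends=True)  # ["ab\n"]
without_ends = prefix.splitlines()            # ["ab"]

col_with = len(with_ends[-1])        # newline counted: column 3
col_without = len(without_ends[-1])  # newline dropped: column 2

# Column of "b" (index 1) for comparison; it is also 2.
col_of_b = len(s[:2].splitlines(keepends=True)[-1])

print(col_with, col_without, col_of_b)
```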

The only remaining problem is the edge case of an empty string, which is easily handled by a guard clause.
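To see what the guard protects against, this short sketch runs the same slice-and-split steps on an empty string. splitlines() returns an empty list in that case, so indexing the last element fails.

```python
s = ""

# Same steps as in the function body, with index 0.
sp = s[:0 + 1].splitlines(keepends=True)

try:
    sp[-1]
    raised = False
except IndexError:
    # An empty list has no last element.
    raised = True

print(sp, raised)
```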

I tested it in Python 3.8, but it probably works correctly on Python 3.3 and later, since PEP 393 (Python 3.3) made len() count code points on every build. On older narrow builds len() counts UTF-16 code units instead of code points, and I assume it would break for any string containing characters outside of the BMP.
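If you want to check the non-BMP behaviour on your own interpreter, something like the following should do. The test string is made up, and the expected values assume a Python version where len() counts code points.

```python
def index_to_coordinates(s, index):
    """Returns 1-based (line_number, col) of `index` in `s`."""
    if not s:
        return 1, 1
    sp = s[:index + 1].splitlines(keepends=True)
    return len(sp), len(sp[-1])


# U+1F600 lies outside the Basic Multilingual Plane.
s = "a\U0001F600b"

length = len(s)                       # 3 when len() counts code points
coords = index_to_coordinates(s, 2)   # position of "b"

print(length, coords)
```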

Tim Seguine answered Oct 26 '25 20:10
