
Why does `'{x[1:3]}'.format(x="asd")` cause a TypeError?

Consider this:

>>> '{x[1]}'.format(x="asd")
's'
>>> '{x[1:3]}'.format(x="asd")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers

What could be the cause for this behavior?

d33tah asked Sep 01 '25 20:09



2 Answers

An experiment based on your comment, checking what value the object's __getitem__ method actually receives:

class C:
    def __getitem__(self, index):
        print(repr(index))

'{c[4]}'.format(c=C())
'{c[4:6]}'.format(c=C())
'{c[anything goes!@#$%^&]}'.format(c=C())
C()[4:6]

Output:

4
'4:6'
'anything goes!@#$%^&'
slice(4, 6, None)

So while the 4 gets converted to an int, the 4:6 isn't converted to slice(4, 6, None) as it would be in ordinary slicing. Instead, it simply stays the string '4:6'. A string is not a valid type for indexing or slicing a str, hence the TypeError: string indices must be integers that you got.
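If the goal is just to get the slice into the output, a minimal sketch of two ways around this (the variable names are only illustrative): do the slicing in ordinary Python before calling format, or use an f-string, where the braces contain a real Python expression rather than a field_name.

```python
x = "asd"

# Option 1: slice in ordinary Python code, then pass the result to format().
sliced_first = '{}'.format(x[1:3])

# Option 2: an f-string evaluates a real Python expression inside the braces,
# so 1:3 is parsed as a slice instead of being handed over as a string.
via_fstring = f'{x[1:3]}'

print(sliced_first)
print(via_fstring)
```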

Update:

Is that documented? Well... I don't see anything really clear, but @GACy20 pointed out something subtle. The grammar has these rules:

field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
element_index     ::=  digit+ | index_string
index_string      ::=  <any source character except "]"> +

Our c[4:6] is the field_name, and we're interested in the element_index part, 4:6. I think it would be clearer if digit+ had its own rule with a meaningful name:

field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
element_index     ::=  index_integer | index_string
index_integer     ::=  digit+
index_string      ::=  <any source character except "]"> +

I'd say having index_integer and index_string would more clearly indicate that digit+ is converted to an integer (instead of staying a digit string), while <any source character except "]"> + would stay a string.

That said, looking at the rules as they are, perhaps we should ask "what would be the point of separating the digits case from the any-characters case, which would match it as well?" and conclude that the point is to treat pure digits differently, presumably to convert them to an integer. Or maybe some other part of the documentation even states that digit+ in general gets converted to an integer.
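A small probe in the same spirit as the experiment above makes the digit+ versus index_string split visible. Probe is a hypothetical helper that returns the type name of whatever index it receives. Note that -1 is not pure digits, so it arrives as a string as well.

```python
class Probe:
    """Hypothetical helper: reports the type of the index it receives."""

    def __getitem__(self, index):
        return type(index).__name__


p = Probe()

# Map each element_index text to the type that __getitem__ actually sees.
results = {
    '12': '{p[12]}'.format(p=p),    # pure digits
    '-1': '{p[-1]}'.format(p=p),    # leading minus sign, so not digit+
    '1:3': '{p[1:3]}'.format(p=p),  # slice-like text
    'key': '{p[key]}'.format(p=p),  # plain text
}

for text, kind in results.items():
    print(f'{text!r} arrives as {kind}')
```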

Kelly Bundy answered Sep 04 '25 07:09



In '{x[1]}'.format(x="asd"), the [1] syntax is not the "normal" string indexing syntax, even if in this case it appears to work the same way.

It is part of the format string syntax that str.format() parses (the field_name grammar, not the Format Specification Mini-Language, which only covers the part after the colon). This is the same mechanism that lets you pass an object and access an arbitrary attribute inside the format string (e.g. '{x.name}'.format(x=some_object)).

This "fake" indexing syntax also lets you pass indexable objects to format and get the element you want directly from within the format string:

'{x[0]}'.format(x=('a', 'tuple'))
# 'a'
'{x[1]}'.format(x=('a', 'tuple'))
# 'tuple'
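The same lookup rules apply to attributes and mapping keys, and the two can be chained. The sketch below uses made-up objects (user and settings are illustrative only) to show that a dictionary key is written without quotes and that a pure-digit index is looked up as an int rather than a str.

```python
from types import SimpleNamespace

# Illustrative data; none of these names come from the question.
user = SimpleNamespace(name='Ada', tags=['admin', 'dev'])
settings = {'theme': 'dark', 1: 'int key', '1': 'str key'}

attr_text = '{u.name}'.format(u=user)        # attribute lookup via getattr()
chained_text = '{u.tags[0]}'.format(u=user)  # attribute, then index
key_text = '{s[theme]}'.format(s=settings)   # unquoted string key
digit_text = '{s[1]}'.format(s=settings)     # digits become the int 1

print(attr_text, chained_text, key_text, digit_text)
```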

The only reference (that I could find, at least) for this in the docs is this paragraph:

The field_name itself begins with an arg_name that is either a number or a keyword. If it’s a number, it refers to a positional argument, and if it’s a keyword, it refers to a named keyword argument. If the numerical arg_names in a format string are 0, 1, 2, … in sequence, they can all be omitted (not just some) and the numbers 0, 1, 2, … will be automatically inserted in that order. Because arg_name is not quote-delimited, it is not possible to specify arbitrary dictionary keys (e.g., the strings '10' or ':-]') within a format string. The arg_name can be followed by any number of index or attribute expressions. An expression of the form '.name' selects the named attribute using getattr(), while an expression of the form '[index]' does an index lookup using __getitem__().

While it mentions

while an expression of the form '[index]' does an index lookup using __getitem__().

it does not mention anything about slicing syntax not being supported.

For me this feels like an oversight in the docs, especially because '{x[1:3]}'.format(x="asd") generates such a cryptic error message, and even more so due to __getitem__ already supporting slicing.
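Because __getitem__ does receive the raw text, an object can choose to interpret it itself. The following is only a sketch of that idea: SliceProxy is a hypothetical wrapper, not something the standard library provides, and it assumes every part of the slice text is either empty or a valid integer.

```python
class SliceProxy:
    """Hypothetical wrapper that turns 'a:b:c' index strings into slices."""

    def __init__(self, seq):
        self._seq = seq

    def __getitem__(self, index):
        # str.format() hands over anything that is not pure digits as a str.
        if isinstance(index, str) and ':' in index:
            # Empty parts mean "no bound", as in ordinary slice syntax.
            parts = [int(p) if p else None for p in index.split(':')]
            return self._seq[slice(*parts)]
        # Pure-digit indices already arrive as int and pass straight through.
        return self._seq[index]


middle = '{x[1:3]}'.format(x=SliceProxy("asd"))
every_other = '{x[::2]}'.format(x=SliceProxy("asd"))
single = '{x[1]}'.format(x=SliceProxy("asd"))

print(middle, every_other, single)
```

In most cases slicing before the call to format is simpler; the wrapper is only worth considering when the format string comes from somewhere else, such as a template.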

DeepSpace answered Sep 04 '25 05:09
