Consider this:
>>> '{x[1]}'.format(x="asd")
's'
>>> '{x[1:3]}'.format(x="asd")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
What could be the cause for this behavior?
An experiment based on your comment, checking what value the object's __getitem__ method actually receives:
class C:
    def __getitem__(self, index):
        print(repr(index))

'{c[4]}'.format(c=C())
'{c[4:6]}'.format(c=C())
'{c[anything goes!@#$%^&]}'.format(c=C())
C()[4:6]
Output (Try it online!):
4
'4:6'
'anything goes!@#$%^&'
slice(4, 6, None)
So while the 4 gets converted to an int, the 4:6 isn't converted to slice(4, 6, None) as in usual slicing. Instead, it simply remains the string '4:6'. That's not a valid type for indexing or slicing a string, hence the TypeError: string indices must be integers you got.
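To confirm that the raw string '1:3' is what str rejects, you can reproduce the failure with plain subscription, no str.format involved. A minimal sketch; the variable names are illustrative only, and the exact wording of the error message can differ between Python versions:

```python
s = "asd"

# Ordinary subscription: Python builds an int or a slice object for you.
print(s[1])            # s
print(s[1:3])          # sd
print(s[slice(1, 3)])  # sd, the same slice spelled out explicitly

# '{x[1:3]}'.format(x=s) ends up doing the equivalent of s["1:3"]:
# the key is the raw string "1:3", which str.__getitem__ rejects.
try:
    s["1:3"]
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```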
Update:
Is that documented? Well... I don't see anything really clear, but @GACy20 pointed out something subtle. The grammar has these rules:
field_name ::= arg_name ("." attribute_name | "[" element_index "]")*
element_index ::= digit+ | index_string
index_string ::= <any source character except "]"> +
Our c[4:6] is the field_name, and we're interested in its element_index part, 4:6. I think it would be clearer if digit+ had its own rule with a meaningful name:
field_name ::= arg_name ("." attribute_name | "[" element_index "]")*
element_index ::= index_integer | index_string
index_integer ::= digit+
index_string ::= <any source character except "]"> +
I'd say having index_integer and index_string would more clearly indicate that digit+ is converted to an integer (instead of staying a digit string), while <any source character except "]"> + stays a string.
That said, looking at the rules as they are, perhaps we should ask: what would be the point of separating the digits case out of the any-characters case, which would match it as well? Presumably the point is to treat pure digits differently, i.e. to convert them to an integer. Or maybe some other part of the documentation states that digit or digit+ in general gets converted to an integer.
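The two-branch rule can be sketched in a few lines and checked against what str.format actually passes. parse_element_index below is a hypothetical helper, not part of Python; it only mirrors the documented grammar (all-ASCII-digit text becomes an int, anything else stays a string). Recorder plays the same role as C above. Note that '-1' is not digit+ either, so negative indexes also arrive as strings:

```python
import re

# Hypothetical helper mirroring the documented grammar:
#   element_index ::= digit+ | index_string
_DIGITS = re.compile(r"[0-9]+")


def parse_element_index(text: str):
    """Return an int for an all-digit index, otherwise the raw text."""
    return int(text) if _DIGITS.fullmatch(text) else text


class Recorder:
    """Records every key that str.format passes to __getitem__."""

    def __init__(self) -> None:
        self.keys = []

    def __getitem__(self, key):
        self.keys.append(key)
        return ""  # any formattable value works here


samples = ["4", "4:6", "-1", "anything goes!@#$%^&"]

rec = Recorder()
for index in samples:
    ("{r[" + index + "]}").format(r=rec)

predicted = [parse_element_index(index) for index in samples]
print(rec.keys)   # [4, '4:6', '-1', 'anything goes!@#$%^&']
print(predicted)  # same list as above
```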
'{x[1]}'.format(x="asd")
The [1] syntax here is not the "normal" string indexing syntax, even if in this case it appears to work the same way.
It is part of the format string syntax for replacement fields (the field_name, not the Format Specification Mini-Language, which only covers what follows the colon). It is the same mechanism that lets you pass objects and access an arbitrary attribute inside the formatted string (e.g. '{x.name}'.format(x=some_object)).
This "fake" indexing syntax also lets you pass indexable objects to format and get the element you want directly from within the format string:
'{x[0]}'.format(x=('a', 'tuple'))
# 'a'
'{x[1]}'.format(x=('a', 'tuple'))
# 'tuple'
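Attribute and index lookups can also be combined within one field. A small sketch, using types.SimpleNamespace as a stand-in for the some_object mentioned above (point and its fields are made up for illustration):

```python
from types import SimpleNamespace

# Illustrative object; any object with attributes would work the same way.
point = SimpleNamespace(name="origin", coords=(0, 0))

# '.name' is resolved with getattr() and '[0]' with __getitem__(),
# and the two kinds of lookup can be chained in one field.
line = "{p.name} at {p.coords[0]},{p.coords[1]}".format(p=point)
print(line)  # origin at 0,0
```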
The only reference (that I could find, at least) for this in the docs is this paragraph:
The field_name itself begins with an arg_name that is either a number or a keyword. If it’s a number, it refers to a positional argument, and if it’s a keyword, it refers to a named keyword argument. If the numerical arg_names in a format string are 0, 1, 2, … in sequence, they can all be omitted (not just some) and the numbers 0, 1, 2, … will be automatically inserted in that order. Because arg_name is not quote-delimited, it is not possible to specify arbitrary dictionary keys (e.g., the strings '10' or ':-]') within a format string. The arg_name can be followed by any number of index or attribute expressions. An expression of the form '.name' selects the named attribute using getattr(), while an expression of the form '[index]' does an index lookup using __getitem__().
While it mentions that "an expression of the form '[index]' does an index lookup using __getitem__()", it does not mention anything about slicing syntax not being supported.
To me this feels like an oversight in the docs, especially because '{x[1:3]}'.format(x="asd") generates such a cryptic error message, and even more so because __getitem__ already supports slicing.
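The quoted paragraph's remark that the string key '10' cannot be specified follows from the same digit rule: all-digit text is turned into an int before the lookup. A short sketch with a made-up dict (config and its keys are illustrative only):

```python
# Illustrative dict whose keys look like the examples in the quoted docs.
config = {"host": "localhost", "10": "ten"}

# Non-digit text inside [...] is used as a string key, without quotes.
host = "{c[host]}".format(c=config)

# All-digit text is converted to an int first, so this looks up 10, not "10".
try:
    "{c[10]}".format(c=config)
except KeyError as exc:
    missing_key = exc.args[0]

print(host)               # localhost
print(repr(missing_key))  # 10
```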