
In Pandas itertuples() the string 'class' gets converted to underscore ('_1') in namedtuple

Tags: python, pandas

I am trying to do some data cleaning and using the pandas 'itertuples' function to generate named tuples for storage in a data frame. However, when I use itertuples the column named 'class' is being stored as '_1' in the named tuple, whereas all the other column names convert correctly. For instance, the 'subclass' column correctly converts to 'subclass' in the named tuple.

Code and output for one row is as follows:

ipcs.rename(columns={'ipc_section':'section',
                  'ipc_class':'class',
                  'ipc_subclass':'subclass',
                  'ipc_main_group':'group',
                  'ipc_subgroup':'subgroup',
                  'ipc_sequence':'order'}, inplace=True)

[item for item in 
ipcs[['section','class', 'subclass', 'group', 'subgroup', 'order']]
.itertuples(index=False,name='IPC')]
Out[45]: 
[IPC(section='A', _1='61', subclass='F', group='9', subgroup='00', order='0')]

What is going on here? I assume it's something to do with 'class' being a keyword in Python. Any way to get around this?

bradchattergoon asked Oct 25 '25 19:10


2 Answers

I found the answer in the documentation for namedtuple and in the source of itertuples.

The full namedtuple signature is:

collections.namedtuple(typename, field_names, *, rename=False, defaults=None, module=None)

And the documentation states: "If rename is true, invalid fieldnames are automatically replaced with positional names. For example, ['abc', 'def', 'ghi', 'abc'] is converted to ['abc', '_1', 'ghi', '_3'], eliminating the keyword def and the duplicate fieldname abc."
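The example quoted from the documentation can be checked directly with the standard library alone. This is a minimal sketch; the type name Row is just a placeholder chosen for illustration.

```python
from collections import namedtuple

# Field names taken from the documentation's example: 'def' is a keyword
# and the second 'abc' is a duplicate, so both are invalid field names.
Row = namedtuple("Row", ["abc", "def", "ghi", "abc"], rename=True)

# The invalid names are replaced by '_' plus their position in the list.
print(Row._fields)  # ('abc', '_1', 'ghi', '_3')
```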

In the pandas source code for itertuples (in the version I looked at) we see the following:

if name is not None and len(self.columns) + index < 256:
    itertuple = collections.namedtuple(name, fields, rename=True)
    return map(itertuple._make, zip(*arrays))

Therefore, if we pass a name for the tuple (making it a named tuple rather than a plain tuple), this branch runs. Because pandas calls namedtuple with rename=True, 'class', which is an invalid field name, is automatically replaced with the positional name '_1'.
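One way to keep 'class' as the label, if renaming the column is not wanted, is to ask itertuples for plain tuples with name=None and pair each row with the column labels yourself. The sketch below assumes a hypothetical one-row frame that mirrors the columns in the question; it is not the original data.

```python
import pandas as pd

# Hypothetical one-row frame mirroring the question's columns.
ipcs = pd.DataFrame(
    [["A", "61", "F", "9", "00", "0"]],
    columns=["section", "class", "subclass", "group", "subgroup", "order"],
)

# With a name, pandas builds a namedtuple and 'class' becomes '_1'.
named = list(ipcs.itertuples(index=False, name="IPC"))
print(named[0]._fields)

# With name=None, plain tuples come back; zipping them with the column
# labels keeps 'class' as an ordinary dict key.
records = [
    dict(zip(ipcs.columns, row))
    for row in ipcs.itertuples(index=False, name=None)
]
print(records[0]["class"])
```

The trade-off is that the rows are dicts rather than named tuples, so fields are read as row['class'] instead of by attribute.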

Notice that this differs slightly from @chepner's comment on the question. Specifically, it IS possible to use 'class' as a column name (renaming 'ipc_class' to 'class' does work), BUT itertuples sets the rename parameter to True, so when the column names are passed to namedtuple the field name changes to a positional one. If rename were False, namedtuple would raise a ValueError instead.
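The rename=False case is easy to confirm without pandas. This is a small sketch using only the standard library; the three field names are a shortened, illustrative version of the columns in the question.

```python
from collections import namedtuple

fields = ["section", "class", "subclass"]

try:
    # rename defaults to False, so the keyword 'class' is rejected.
    namedtuple("IPC", fields)
    error = None
except ValueError as exc:
    error = exc

print(type(error).__name__, error)
```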

bradchattergoon answered Oct 28 '25 08:10


Just rename the column so that it does not conflict with a Python reserved keyword.

ipcs.rename(columns={'ipc_section':'section',
                  'ipc_class':'class_',  # class_, not class
                  'ipc_subclass':'subclass',
                  'ipc_main_group':'group',
                  'ipc_subgroup':'subgroup',
                  'ipc_sequence':'order'}, inplace=True)
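If there are many columns, the same idea can be applied automatically instead of by hand. The sketch below uses the standard keyword module; keyword_safe is a hypothetical helper name, not part of pandas.

```python
import keyword


def keyword_safe(name):
    """Append an underscore when the label is a Python keyword (hypothetical helper)."""
    return name + "_" if keyword.iskeyword(name) else name


columns = ["section", "class", "subclass", "group", "subgroup", "order"]

# Only 'class' is a keyword here, so only it gets the trailing underscore.
mapping = {col: keyword_safe(col) for col in columns}
print(mapping["class"])  # class_
```

The resulting mapping can be passed to ipcs.rename(columns=mapping), after which itertuples exposes the field as class_.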
chepner answered Oct 28 '25 09:10


