I want to set the value of a pandas column to a list of strings. However, my attempts failed because pandas treats the value as an iterable, and I get: ValueError: Must have equal len keys and value when setting with an iterable.
Here is an MWE:
>>> import pandas as pd
>>> df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
>>> df
   col1  col2
0     1     4
1     2     5
2     3     6
>>> df['new_col'] = None
>>> df.loc[df.col1 == 1, 'new_col'] = ['a', 'b']
ValueError: Must have equal len keys and value when setting with an iterable
I tried to set the dtype as list using df.new_col = df.new_col.astype(list) and that didn't work either. 
I am wondering what would be the correct approach here.
EDIT
The answer provided here: Python pandas insert list into a cell using at didn't work for me either. 
Using values.tolist(), you can convert a pandas DataFrame column to a list. df['Courses'] returns the DataFrame column as a Series, and values.tolist() then converts the column values to a list.
Use the tolist() method to convert a DataFrame column to a list. A column in a pandas DataFrame is a pandas Series, so if we need to convert a column to a list, we can call tolist() on the Series.
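As a minimal sketch of the tolist() conversion described above: the column name 'Courses' comes from the snippet, while the second column and all the values are made up for illustration.

```python
import pandas as pd

# Hypothetical data; only the 'Courses' column name appears in the text above.
df = pd.DataFrame({
    "Courses": ["Spark", "PySpark", "Hadoop"],
    "Fee": [20000, 25000, 23000],
})

# Selecting one column gives a Series; tolist() turns it into a plain Python list.
courses = df["Courses"].tolist()
print(courses)
```

Note that this goes from a column to a list, which is the opposite direction from what the question asks (storing a list inside a cell).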
One problem you will often encounter when loading data from a text format such as CSV is that pandas reads your lists as strings, not as lists. This means that you cannot loop through the lists to count unique values or frequencies. Depending on how your lists are formatted in the dataframe, there is an easy or a more complex solution.
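For the easy case, where each cell holds a valid Python list literal as text, one common approach is ast.literal_eval from the standard library. The sketch below assumes a hypothetical column named 'tags'; messier formats would need custom parsing.

```python
import ast

import pandas as pd

# Hypothetical column whose cells are string representations of lists,
# as they would appear after a round trip through a CSV file.
df = pd.DataFrame({"tags": ["['a', 'b']", "['c']", "[]"]})

# literal_eval safely parses Python literals (it does not execute code).
df["tags"] = df["tags"].apply(ast.literal_eval)

print(df["tags"].tolist())
print(type(df.at[0, "tags"]))
```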
You can set a cell value of a pandas dataframe using df.at[row_label, column_label] = 'Cell Value'. It is a fast way to set the value of a single cell. The at property of the dataframe lets you access the single value for a row/column pair using the row and column labels.
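Applied to the question's dataframe, a sketch of the at approach could look like the following. It assumes the target column has object dtype and sets one cell per iteration; the asker reports that an at-based answer did not work for them, so behaviour may depend on the pandas version and the column's dtype.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# Create the new column explicitly as object dtype so a cell can hold a list.
df["new_col"] = pd.Series([None] * len(df), index=df.index, dtype=object)

# at addresses exactly one cell, so the list is stored as a single value
# instead of being spread across the selected rows.
for label in df.index[df["col1"] == 1]:
    df.at[label, "new_col"] = ["a", "b"]

print(df)
```

The loop is not vectorised, so this is only reasonable when few rows match.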
In this article, we will discuss how to set cell values in a pandas DataFrame in Python. This method is used to overwrite an existing value or to add a new record. Here we are using the loc indexer to set the column value based on row index and column name.
Depending on your needs, you may use either of the two approaches below to set a column as the index in a pandas DataFrame: df.set_index(['column_1', 'column_2', ...])
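A short sketch of setting scalar values with loc, reusing the dataframe from the question (the replacement values 40 and 60 are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

# Set col2 for every row matching a boolean condition.
df.loc[df["col1"] == 1, "col2"] = 40

# Set a single cell by row label and column name.
df.loc[2, "col2"] = 60

print(df)
```

With a scalar on the right-hand side this works as expected; the error in the question arises only when the right-hand side is itself a list.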
To apply this to your dataframe, use this pseudo code: df[col] = df[col].apply(clean_alt_list). Note that in both cases, pandas will still assign the series an "O" datatype, which is typically used for strings.
This is not easy. One possible solution is to create a helper Series:
df.loc[df.col1 == 1, 'new_col'] = pd.Series([['a', 'b']] * len(df))
print (df)
   col1  col2 new_col
0     1     4  [a, b]
1     2     5     NaN
2     3     6     NaN
Another solution, if you also need to set the missing values to an empty list, is to use a list comprehension:
#df['new_col'] = [['a', 'b'] if x == 1 else np.nan for x in df['col1']]
df['new_col'] = [['a', 'b'] if x == 1 else [] for x in df['col1']]
print (df)
   col1  col2 new_col
0     1     4  [a, b]
1     2     5      []
2     3     6      []
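One detail worth checking with either approach is whether rows share the same list object. Multiplying a list, as in [['a', 'b']] * len(df), repeats references to a single inner list, whereas the list comprehension builds a fresh list per row. The sketch below illustrates the difference; the extra row with col1 == 1 is added only for the demonstration.

```python
import pandas as pd

# List multiplication repeats references to one and the same inner list.
shared = [["a", "b"]] * 2
print(shared[0] is shared[1])

# Hypothetical variant of the question's data with two matching rows.
df = pd.DataFrame({"col1": [1, 1, 3], "col2": [4, 5, 6]})

# The comprehension evaluates ['a', 'b'] anew for each row.
df["new_col"] = [["a", "b"] if x == 1 else [] for x in df["col1"]]

# Mutating the list in row 0 therefore leaves row 1 untouched.
df.at[0, "new_col"].append("c")
print(df)
```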
Pandas was never designed to hold lists in series / columns. You can concoct expensive workarounds, but these are not recommended.
The main reason holding lists in series is not recommended is that you lose the vectorised functionality which goes with using NumPy arrays held in contiguous memory blocks. Your series will be of object dtype, which represents a sequence of pointers, much like a list. You will lose benefits in terms of memory and performance, as well as access to optimized Pandas methods.
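To make the point about object dtype concrete, here is a small sketch with made-up numeric lists. It shows that a series of lists has object dtype, and that one possible alternative (an assumption on my part, not something the answer prescribes) is a long format via Series.explode, where ordinary vectorised methods are available again.

```python
import pandas as pd

# A series of lists is stored as object dtype (pointers to Python lists).
s = pd.Series([[1, 2], [3], [4, 5, 6]])
print(s.dtype)

# explode gives one row per list element and repeats the original index,
# so the values can be cast to a numeric dtype.
long = s.explode().astype("int64")

# Vectorised aggregation per original row, grouped on the repeated index.
totals = long.groupby(level=0).sum()
print(totals)
```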
See also What are the advantages of NumPy over regular Python lists? The arguments in favour of Pandas are the same as for NumPy.
That said, since you are going against the purpose and design of Pandas, there are many who face the same problem and have asked similar questions: