
How to read a csv into a Dask dataframe so it will not have an “Unnamed: 0” column?

Goal

I want to read a csv into a Dask dataframe without getting an “Unnamed: 0” column.

CODE

import dask.dataframe as dd

mydtype = {'col1': 'object',
           'col2': 'object',
           'col3': 'object',
           'col4': 'float32',}


do = dd.read_csv('/folder/somecsvname.csv',
                 dtype=mydtype,
                 low_memory=False,
                 parse_dates=['col3'],
                )

Result Columns

  • Unnamed: 0
  • col1
  • col2
  • col3
  • col4

Tried solutions

  • 1. Works with pandas but not with Dask: pd.read_csv add column named "Unnamed: 0"
  • 2. Works with pandas but not with Dask: How to get rid of "Unnamed: 0" column in a pandas DataFrame?
  • Added index_col=False to the read_csv call. Error message: ValueError: Keywords 'index' and 'index_col' not supported. Use dd.read_csv(...).set_index('my-index') instead
  • Added index_col=0 to the read_csv call. Error message: ValueError: Keywords 'index' and 'index_col' not supported. Use dd.read_csv(...).set_index('my-index') instead
  • Used the code recommended by the previous two error messages. Problem: this just sets a column as the index but still generates the 'Unnamed: 0' column:
do = dd.read_csv('/folder/somecsvname.csv', 
                 dtype = mydtype, 
                 low_memory=False,
                 parse_dates=['col3'],
                ).set_index('col3')
  • Added index_col=None to the read_csv call. Error message: ValueError: Keywords 'index' and 'index_col' not supported. Use dd.read_csv(...).set_index('my-index') instead
  • Added index_col=None, header=0 to the read_csv call. Error message: ValueError: Keywords 'index' and 'index_col' not supported. Use dd.read_csv(...).set_index('my-index') instead
sogu asked Oct 21 '25 22:10



1 Answer

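The problem is that this column (Unnamed: 0) is present in the original csv file. Typically it is the dataframe index, which pandas' to_csv writes by default as a first column with an empty header. It's best to address it upstream, at the time this file is generated. If that's not possible, then the best you can do with dask.dataframe is to drop the column after reading: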

ddf = dd.read_csv(my_file)
ddf = ddf.drop('Unnamed: 0', axis=1)

Here's a reproducible example:

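import dask.dataframe as dd
import pandas as pd

# to_csv writes the index by default, as a first column with an empty header
df = pd.DataFrame(range(5))
df.to_csv('abc.csv')

# the empty header is read back as 'Unnamed: 0', so drop it after reading
ddf = dd.read_csv('abc.csv')
ddf = ddf.drop('Unnamed: 0', axis=1)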
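
If you do control how the file is generated, the upstream fix mentioned above can be sketched as follows. This is a minimal illustration that assumes the csv is written with pandas; the sample data and file names are hypothetical. Passing index=False to to_csv keeps the index out of the file, so there is nothing left to drop when reading it back.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the column names only mirror the question.
df = pd.DataFrame({"col1": ["a", "b"], "col4": [1.0, 2.0]})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.csv")
    without_index = os.path.join(tmp, "without_index.csv")

    # Default behaviour: the index is written as a first column
    # with an empty header.
    df.to_csv(with_index)

    # index=False: only the data columns are written.
    df.to_csv(without_index, index=False)

    # Read both files back and compare the column names.
    cols_with_index = list(pd.read_csv(with_index).columns)
    cols_without_index = list(pd.read_csv(without_index).columns)

print(cols_with_index)     # ['Unnamed: 0', 'col1', 'col4']
print(cols_without_index)  # ['col1', 'col4']
```

A file written this way should load with dd.read_csv without the extra column, since Dask sees the same header that pandas does.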
SultanOrazbayev answered Oct 23 '25 10:10
