How to load a custom dataset from CSV in Hugging Face

I would like to load a custom dataset from a CSV file using Hugging Face Transformers.

juuso asked Sep 10 '25 14:09


2 Answers

From https://huggingface.co/docs/datasets/loading_datasets.html#loading-from-local-files

from datasets import load_dataset

dataset = load_dataset('csv', data_files={'train': "train_set.csv", 'test': "test_set.csv"})
juuso answered Sep 13 '25 12:09


You can use load_dataset directly as shown in the official documentation.

I couldn't find any documentation of the supported arguments, but in my experiments they seem to match those of pandas.read_csv.

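from datasets import load_dataset

# Map each split name to its CSV file
file_dict = {
  "train": "train.csv",
  "test": "test.csv"
}

dataset = load_dataset(
  'csv',
  data_files=file_dict,
  delimiter=',',
  column_names=['column01', 'column02', 'column03'],  # replace the file's header names
  skiprows=1  # skip the original header line, which would otherwise be read as data
)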
Waylon Flinn answered Sep 13 '25 13:09