
How do I save a Huggingface dataset?

How do I write a HuggingFace dataset to disk?

I have made my own HuggingFace dataset using a JSONL file:

Dataset({ features: ['id', 'text'], num_rows: 18 })

I would like to persist the dataset to disk.

Is there a preferred way to do this, or is the only option a general-purpose library such as joblib or pickle?

Asked by Campbell Hutcheson on Sep 07 '25 at 07:09


1 Answer

You can save a HuggingFace dataset to disk using the save_to_disk() method.

For example:

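```python
from datasets import load_dataset

# Load the JSON/JSONL file as a single Dataset (split="train" avoids getting a DatasetDict)
test_dataset = load_dataset("json", data_files="test.json", split="train")

# Write the dataset to the directory "test.hf"
test_dataset.save_to_disk("test.hf")
```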
Answered by Campbell Hutcheson on Sep 11 '25 at 01:09