I am trying to load train and test data frames into a dataset object. The usual way to load a pandas DataFrame into a Dataset object is:
from datasets import Dataset
import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3]})
dataset = Dataset.from_pandas(df)
My question is: how do I load both the train and test pandas DataFrames into the dataset?
For example, if I have two dataframes:
from datasets import Dataset
import pandas as pd
df_train = pd.DataFrame({"a": [1, 2, 3]})
df_test = pd.DataFrame({"ab": [1, 2, 3]})
How do I load these two frames?
You can load both pandas dataframes into a single object using DatasetDict and Dataset.from_pandas(). You keep the train and test splits separate by using their names as keys in the DatasetDict.
from datasets import Dataset, DatasetDict
import datasets
import pandas as pd
df_train = pd.DataFrame({"a": [1, 2, 3]})
df_test = pd.DataFrame({"ab": [1, 2, 3]})
datasets_train_test = DatasetDict({
    "train": Dataset.from_pandas(df_train),
    "test": Dataset.from_pandas(df_test)
})
which results in
DatasetDict({
    train: Dataset({
        features: ['a'],
        num_rows: 3
    })
    test: Dataset({
        features: ['ab'],
        num_rows: 3
    })
})
If you instead want everything in a single dataset, you can use concatenate_datasets() to concatenate a list of datasets.
dataset_train = Dataset.from_pandas(df_train)
dataset_test = Dataset.from_pandas(df_test)
datasets_all_in_one = datasets.concatenate_datasets([dataset_train, dataset_test])
which results in
Dataset({
    features: ['a', 'ab'],
    num_rows: 6
})