How can I back up dynamic content in a FiftyOne dataset? Tags are the most important data that needs to be backed up. Several of my users will spend quality time manually creating tags in the UI, and I'd like to make sure we back up their work. I do not need to back up static content such as the images themselves.
It would also be nice to back up detections and segmentations. For smaller datasets, I could regenerate these from a script, but for larger datasets, or in situations where the source data (e.g. detections) changes, it would be nice not to have to reconstruct them.
And once I back up this data, how would I restore it?
This workflow sounds like it would be solved with FiftyOne Teams, the enterprise version of FiftyOne designed for team-based collaboration on the same datasets. It is relevant not only because it supports multiple users working on a dataset simultaneously, but also because dataset versioning is on the near-term roadmap for FiftyOne Teams.
In FiftyOne, the current recommended method for backing up a Dataset, DatasetView, or Field is to "clone" it.
For both a Dataset and a DatasetView, the clone() method takes the existing fields of the given samples and copies them into a new Dataset. When cloning a DatasetView, only the fields that exist in the filtered view are cloned.
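The field-selection behavior of cloning a view can be illustrated with a toy model in plain Python. Here dicts stand in for FiftyOne samples (an assumption for illustration only; real samples live in MongoDB), and the helper name clone_selected_fields is hypothetical. It keeps the default fields shown in the schema printed later in this answer (id, filepath, tags, metadata), which select_fields() always retains, plus whatever fields were selected:

```python
import copy

# Default fields of an image dataset, as shown in the schema printed later in
# this answer; select_fields() keeps these regardless of the selection.
DEFAULT_FIELDS = ("id", "filepath", "tags", "metadata")


def clone_selected_fields(samples, selected):
    """Toy model (hypothetical helper) of view.clone() on a select_fields() view.

    Only the default fields plus the selected fields are copied, and the copy
    is deep, so later edits to the clone never touch the original records.
    """
    keep = set(DEFAULT_FIELDS) | set(selected)
    return [
        {name: copy.deepcopy(value) for name, value in sample.items() if name in keep}
        for sample in samples
    ]


samples = [
    {
        "id": "a1",
        "filepath": "/data/img1.jpg",
        "tags": ["validation"],
        "metadata": None,
        "ground_truth": {"detections": [{"label": "cat"}]},
        "predictions": {"detections": [{"label": "dog"}]},
    }
]

backup = clone_selected_fields(samples, ["ground_truth"])
print(sorted(backup[0]))  # ['filepath', 'ground_truth', 'id', 'metadata', 'tags']
```

The unselected "predictions" field is dropped, which mirrors why a view clone is a cheap way to back up only the fields your users edit.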
You can also use clone_sample_field() to copy the contents of a view’s field into a new field of the underlying Dataset. This applies to any sample field, including tags and labels.
import fiftyone as fo
import fiftyone.zoo as foz
dataset = foz.load_zoo_dataset("quickstart")
# Create a view with only the "ground_truth" field
# and clone it into a new Dataset
view = dataset.select_fields("ground_truth")
bu_dataset = view.clone()
# Clone the "tags" field within the dataset
bu_dataset.clone_sample_field("tags", "tags_backup")
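The point of cloning a field rather than just keeping a reference to it is that the backup must be an independent copy. A toy model with plain dicts (the helper clone_field is hypothetical, not FiftyOne's implementation) shows that edits made to the live tags after the backup leave the backup intact:

```python
import copy


def clone_field(samples, field, new_field):
    """Toy model (hypothetical helper): copy `field` into `new_field` on every dict.

    The copy is deep, so editing the live field afterwards (e.g. a user
    removing a tag in the App) leaves the backup untouched. As a choice of
    this toy, it refuses to overwrite an existing field.
    """
    if any(new_field in sample for sample in samples):
        raise ValueError(f"field {new_field!r} already exists")
    for sample in samples:
        sample[new_field] = copy.deepcopy(sample.get(field))
    return samples


samples = [{"filepath": "/data/img1.jpg", "tags": ["validation", "custom_tag"]}]
clone_field(samples, "tags", "tags_backup")

# Simulate a user removing a tag after the backup was taken
samples[0]["tags"].remove("custom_tag")
print(samples[0]["tags_backup"])  # ['validation', 'custom_tag']
```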
FiftyOne stores only the paths to images in the database; no media is ever copied. This means that when cloning a Dataset, only the media file paths are duplicated, not the media itself.
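Restoring a cloned field means copying the backup back onto the original field. Because the tags field always exists, one way to do this is to overwrite its values with those of the backup field:
bu_dataset.set_values("tags", bu_dataset.values("tags_backup"))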
For a more nuanced restoration, you can always iterate over the samples in a simple Python loop to restore exactly what you need:
for sample in bu_dataset:
    backup_tags = sample.tags_backup
    # Only restore tags on samples whose backup contains "validation"
    if "validation" in backup_tags:
        sample.tags = backup_tags
        sample.save()
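The selection rule in that loop can be generalized by passing it in as a predicate. This is a plain-Python sketch with dicts standing in for samples; restore_tags and should_restore are hypothetical names, and with real FiftyOne samples you would assign sample.tags and call sample.save() as in the loop above:

```python
def restore_tags(samples, should_restore, backup_field="tags_backup"):
    """Copy `backup_field` back onto "tags" for samples chosen by `should_restore`.

    `should_restore` receives the backed-up tag list and returns True/False.
    Returns the number of samples that were restored.
    """
    restored = 0
    for sample in samples:
        backup_tags = sample.get(backup_field)
        if backup_tags is not None and should_restore(backup_tags):
            # Copy the list so the live tags and the backup stay independent
            sample["tags"] = list(backup_tags)
            restored += 1
    return restored


samples = [
    {"filepath": "/data/a.jpg", "tags": [], "tags_backup": ["validation", "custom_tag"]},
    {"filepath": "/data/b.jpg", "tags": ["train"], "tags_backup": ["train", "reviewed"]},
]
n = restore_tags(samples, lambda tags: "validation" in tags)
print(n, samples[0]["tags"], samples[1]["tags"])  # 1 ['validation', 'custom_tag'] ['train']
```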
To restore a field from a cloned dataset into the original one, you can merge the samples from one dataset into the other. By default, merge_samples() matches samples by their filepath:
merge_view = bu_dataset.select_fields("ground_truth")
dataset.merge_samples(merge_view)
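The key-based matching behind that merge can be sketched in plain Python. This is a simplified model, not FiftyOne's implementation: merge_by_filepath is a hypothetical helper that matches records on "filepath", overwrites only the listed fields on matches, and appends unmatched records when insert_new is true. The real merge_samples() has further options (for example, how list fields are merged) that this toy ignores:

```python
import copy


def merge_by_filepath(dest, src, fields, insert_new=True):
    """Toy model of a key-based merge that matches records on "filepath".

    For matched records, each field in `fields` is overwritten with a deep
    copy of the source value. Unmatched source records are appended when
    `insert_new` is True.
    """
    index = {sample["filepath"]: sample for sample in dest}
    for sample in src:
        target = index.get(sample["filepath"])
        if target is None:
            if insert_new:
                new_sample = copy.deepcopy(sample)
                dest.append(new_sample)
                index[new_sample["filepath"]] = new_sample
            continue
        for field in fields:
            if field in sample:
                target[field] = copy.deepcopy(sample[field])
    return dest


dest = [{"filepath": "/data/a.jpg", "tags": ["custom_tag"], "ground_truth": None}]
src = [
    {"filepath": "/data/a.jpg", "ground_truth": {"label": "cat"}},
    {"filepath": "/data/b.jpg", "ground_truth": {"label": "dog"}},
]
merged = merge_by_filepath(dest, src, ["ground_truth"])
print(len(merged), merged[0]["ground_truth"])  # 2 {'label': 'cat'}
```

Matching on filepath works here because, as noted above, cloning duplicates the media paths, so the backup and the original still refer to the same files.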
When you plan to work with a Dataset more than once, set its persistent option. When a Dataset is persistent, it will not be deleted even when the Python kernel and the backing MongoDB database are shut down, and it can be reloaded at a future time.
For example, to persist a dataset:
import fiftyone as fo
import fiftyone.zoo as foz
# Create your dataset
dataset = foz.load_zoo_dataset(
"coco-2017",
split="validation",
max_samples=10,
dataset_name="my_dataset",
)
dataset.persistent = True
Now you can close Python, reopen it, and load the Dataset:
import fiftyone as fo
print(fo.list_datasets())
# ["my_dataset"]
dataset = fo.load_dataset("my_dataset")
When a tag is created and applied in the FiftyOne App, it is automatically saved to the Dataset and therefore to the backing MongoDB database.
For example, to create a custom_tag in the Dataset loaded above, you can launch the App, select samples or labels, enter a tag, and hit "Apply":
session = fo.launch_app(dataset)
Back in Python, the Dataset has been updated, and the tags created in the App can be queried or backed up as shown above:
tagged_view = dataset.match_tags("custom_tag")
print(tagged_view)
Dataset: my_dataset
Media type: image
Num samples: 3
Tags: ['custom_tag', 'validation']
Sample fields:
id: fiftyone.core.fields.ObjectIdField
filepath: fiftyone.core.fields.StringField
tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
View stages:
1. MatchTags(tags=['custom_tag'], bool=True)
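If you also want a copy of the tags outside of MongoDB, one option is to write them to a JSON file keyed by filepath. This is a minimal sketch using only the standard library; export_tags, load_tags and the file name tags_backup.json are hypothetical, and it assumes each filepath appears only once in the dataset:

```python
import json
import os
import tempfile


def export_tags(filepaths, tags, path):
    """Write a {filepath: [tags]} mapping to a JSON file."""
    if len(filepaths) != len(tags):
        raise ValueError("filepaths and tags must have the same length")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dict(zip(filepaths, tags)), f, indent=2)


def load_tags(path, filepaths):
    """Return the backed-up tags for each filepath, in order; unknown paths get []."""
    with open(path, encoding="utf-8") as f:
        backup = json.load(f)
    return [list(backup.get(fp, [])) for fp in filepaths]


with tempfile.TemporaryDirectory() as tmp:
    backup_path = os.path.join(tmp, "tags_backup.json")
    export_tags(["/data/a.jpg", "/data/b.jpg"], [["custom_tag"], []], backup_path)
    restored = load_tags(backup_path, ["/data/b.jpg", "/data/a.jpg", "/data/c.jpg"])

print(restored)  # [[], ['custom_tag'], []]
```

With FiftyOne, the inputs would presumably come from dataset.values("filepath") and dataset.values("tags"), and the list returned by load_tags() for dataset.values("filepath") could be written back with dataset.set_values("tags", ...). That wiring is an untested assumption here, not part of the sketch above.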