
Train NER in spacy v3 needs dev.spacy at command line

Tags:

python

spacy-3

I am trying to train a custom NER model in spaCy v3. Training has changed significantly in v3 compared to v2.

I am using the default config with en_core_web_lg. I have prepared the training data (training.spacy) using the convert command. However, the train command also needs a dev.spacy file.

I am not sure what data is expected in dev.spacy. Is it asking for a plain-text corpus corresponding to the training.spacy file? If so, is there a way to convert a plain-text file to the spaCy format?

Command from the spaCy site: python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy

Can someone please explain how to prepare dev.spacy?

Milind asked Oct 18 '25 16:10


2 Answers

train.spacy is a placeholder for your collection of training files: typically a directory of files produced with the spaCy convert utility. dev.spacy is a placeholder for your collection of validation files. These have the same format as the training files, but are used as a validation sample during training (for NER, to compute precision, recall and F-score periodically as training progresses). The commonly suggested size of the validation sample is 10 to 20% of the training sample. I tend to use 20% because my data varies a lot, but a larger validation sample adds training overhead.
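To make the three scores concrete, here is a minimal sketch of exact-match precision, recall and F-score over entity spans. It is an illustration only, not spaCy's own scorer; the function name `prf` and the example spans are hypothetical, and each entity is assumed to be a `(start, end, label)` tuple.

```python
def prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical spans for one dev document: the model finds one of the
# two gold entities and also predicts one entity that is not in the gold data.
gold = {(0, 5, "ORG"), (30, 35, "GPE")}
pred = {(0, 5, "ORG"), (20, 26, "ORG")}
p, r, f = prf(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Scores like these are only meaningful when the dev examples are kept out of the training data.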

mbrunecky answered Oct 21 '25 06:10


The dev.spacy file should have exactly the same format as the train.spacy file, but it should contain new examples that the training process hasn't seen before, so that you get a realistic evaluation of your model's performance.

To create this dev set, you can first split your original data into train/dev parts, and then run convert separately on each of them, calling the larger one train.spacy and the smaller one dev.spacy. As @mbrunecky suggests, an 80-20 split is usually good, but it depends on the dataset.
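An alternative to running convert twice is to do the split in Python and write both files with `DocBin`. The following is a minimal sketch, assuming the annotations are available as `(text, [(start, end, label), ...])` pairs with character offsets; the helper names `split_examples` and `save_docbin` and the `DATA` list are hypothetical and only for illustration.

```python
import random


def split_examples(examples, dev_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, dev)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[n_dev:], shuffled[:n_dev]


def save_docbin(nlp, examples, path):
    """Write (text, spans) pairs to a binary .spacy file."""
    # Imported here so the split helper can be used without spaCy installed.
    from spacy.tokens import DocBin

    doc_bin = DocBin()
    for text, spans in examples:
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in spans:
            span = doc.char_span(start, end, label=label)
            # char_span returns None when offsets do not match token boundaries.
            if span is not None:
                ents.append(span)
        doc.ents = ents
        doc_bin.add(doc)
    doc_bin.to_disk(path)


# Hypothetical annotated data, for illustration only.
DATA = [
    ("Apple is opening an office in Paris.", [(0, 5, "ORG"), (30, 35, "GPE")]),
    ("Google hired Anna.", [(0, 6, "ORG"), (13, 17, "PERSON")]),
    ("Berlin is cold.", [(0, 6, "GPE")]),
    ("Maria lives in Rome.", [(0, 5, "PERSON"), (15, 19, "GPE")]),
    ("Microsoft bought GitHub.", [(0, 9, "ORG"), (17, 23, "ORG")]),
]

train, dev = split_examples(DATA, dev_fraction=0.2)
print(len(train), "training examples,", len(dev), "dev examples")
```

With spaCy v3 installed, calling `save_docbin(spacy.blank("en"), train, "./train.spacy")` and `save_docbin(spacy.blank("en"), dev, "./dev.spacy")` should produce the two files that `--paths.train` and `--paths.dev` point to. Fixing the seed keeps the split reproducible, so the dev set stays unseen across training runs.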

Sofie VL answered Oct 21 '25 05:10