We can download all nltk data using:
> import nltk
> nltk.download('all')
Or specific data using:
> nltk.download('punkt')
> nltk.download('maxent_treebank_pos_tagger')
But I want to download all data except the 'corpora' files, for example all chunkers, grammars, models, stemmers, taggers, tokenizers, etc.
Is there any way to do so without the Downloader UI? Something like:
> nltk.download('all-taggers')
Download individual packages from https://www.nltk.org/nltk_data/ (see the “download” links). Unzip them to the appropriate subfolder.
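The manual route above could be scripted roughly as follows. This is only a sketch: install_package_zip is a hypothetical helper name, and it assumes you have already downloaded the package zip yourself and know which subfolder (tokenizers, taggers, ...) it belongs to.

```python
import zipfile
from pathlib import Path


def install_package_zip(zip_path, nltk_data_dir, subdir):
    """Extract a manually downloaded package archive into <nltk_data_dir>/<subdir>/.

    Hypothetical helper, not part of NLTK.
    """
    target = Path(nltk_data_dir) / subdir
    target.mkdir(parents=True, exist_ok=True)  # create the subfolder if missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Example call (paths are assumptions, adjust them to your setup):
# install_package_zip("punkt.zip", Path.home() / "nltk_data", "tokenizers")
```

The target directory has to be one that NLTK searches for data, for instance nltk_data in your home directory.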
List all corpora ids and set _status_cache[pkg.id] = 'installed' for each of them.
This marks every corpus as 'installed', so the corpora packages are skipped when we call download() on that same Downloader instance.
If you're unsure which corpora or packages you need, use nltk.download('popular') instead of downloading everything.
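import nltk
dwlr = nltk.downloader.Downloader()
# Mark every corpus as already installed so that download() skips it.
# Note: _status_cache is a private attribute and may change between NLTK versions.
for pkg in dwlr.corpora():
    dwlr._status_cache[pkg.id] = 'installed'
dwlr.download('popular')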
To download all packages in a specific folder:
import nltk
dwlr = nltk.downloader.Downloader()
# chunkers, corpora, grammars, help, misc,
# models, sentiment, stemmers, taggers, tokenizers
for pkg in dwlr.packages():
    if pkg.subdir == 'taggers':
        dwlr.download(pkg.id)
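The same subdir check can be inverted to get what the question asks for, i.e. everything except the corpora. A minimal sketch, assuming the package objects expose id and subdir as in the loop above; select_package_ids and download_all_except are hypothetical helper names, not part of NLTK.

```python
def select_package_ids(packages, excluded_subdirs=("corpora",)):
    """Return the ids of all packages whose subdir is not excluded."""
    excluded = set(excluded_subdirs)
    return [pkg.id for pkg in packages if pkg.subdir not in excluded]


def download_all_except(excluded_subdirs=("corpora",)):
    """Download every package outside the excluded folders (needs network access)."""
    # Imported here so the selection helper above works without NLTK installed.
    from nltk.downloader import Downloader

    dwlr = Downloader()
    for pkg_id in select_package_ids(dwlr.packages(), excluded_subdirs):
        dwlr.download(pkg_id)


# Example call: download_all_except(("corpora",))
```

Unlike the _status_cache approach, this does not touch any private attribute; it only decides which ids get passed to download().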