
How to download the punkt tokenizer in NLTK?

I installed the NLTK library using

pip install nltk

and while using the library

from nltk.tokenize import sent_tokenize 
sent_tokenize(text)

I am getting this error

LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt/english.pickle

  Searched in:
    - 'C:\\Users\\adars/nltk_data'
    - 'C:\\Users\\adars\\AppData\\Local\\Programs\\Python\\Python310\\nltk_data'
    - 'C:\\Users\\adars\\AppData\\Local\\Programs\\Python\\Python310\\share\\nltk_data'
    - 'C:\\Users\\adars\\AppData\\Local\\Programs\\Python\\Python310\\lib\\nltk_data'
    - 'C:\\Users\\adars\\AppData\\Roaming\\nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - ''

To fix this error, I tried

import nltk
nltk.download('punkt')

but the download fails. Every time I run it, I get this error:

[nltk_data] Error loading punkt: <urlopen error [WinError 10060] A
[nltk_data]     connection attempt failed because the connected party
[nltk_data]     did not properly respond after a period of time, or
[nltk_data]     established connection failed because connected host
[nltk_data]     has failed to respond>

Please help me out here.

Adarsh Agarwal asked Oct 21 '25 00:10


1 Answer

This is most likely a network issue: WinError 10060 means the connection to the download server timed out. You can try either of the following solutions:

1. Add the following to your script:

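import nltk
import ssl

# Workaround: make HTTPS requests use a context that skips certificate
# verification. Only do this on a network you trust.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # Legacy Python that doesn't verify HTTPS certificates by default.
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

# Retry the download with the patched context.
nltk.download('punkt')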

2. If the above solution doesn't help, then:

  • Manually download 'punkt' from here.
  • Unzip the file, then go to 'C:/Users/adars/AppData/Local/Programs/Python/Python310/lib/' and create a folder named nltk_data there.
  • Under nltk_data, create another folder named tokenizers and place the extracted punkt folder inside it, so that you end up with a directory tokenizers/punkt containing all the .pickle files.
  • Once that's done, you don't need to run nltk.download('punkt') again; just run your code directly.
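
The manual steps in option 2 can also be scripted. The following is only a minimal sketch using the standard library: the function names install_punkt_from_zip and has_english_pickle are made up for illustration, and it assumes the downloaded archive contains a top-level punkt/ folder (if yours is laid out differently, adjust the extraction target).

```python
import zipfile
from pathlib import Path


def install_punkt_from_zip(zip_path, nltk_data_dir):
    """Extract a manually downloaded punkt archive into <nltk_data_dir>/tokenizers.

    Assumes the archive has a top-level 'punkt/' folder, so the result is
    <nltk_data_dir>/tokenizers/punkt/.
    """
    tokenizers_dir = Path(nltk_data_dir) / "tokenizers"
    # Create nltk_data/tokenizers if it does not exist yet.
    tokenizers_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(tokenizers_dir)
    return tokenizers_dir / "punkt"


def has_english_pickle(nltk_data_dir):
    """Return True if the file named in the LookupError is present."""
    target = Path(nltk_data_dir) / "tokenizers" / "punkt" / "english.pickle"
    return target.is_file()
```

You would call install_punkt_from_zip with the path to the zip you downloaded and one of the directories listed under "Searched in:" in the error message, then confirm with has_english_pickle on the same directory. If you extract to a directory that is not in that list, appending it to nltk.data.path before calling sent_tokenize should make NLTK look there as well.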
Ro.oT answered Oct 22 '25 14:10