Getting Started with NLTK: The First Step, Installing NLTK
- May 31, 2018
- By Pawan Prasad
Natural Language Processing can be implemented in Python using the Natural Language Toolkit (NLTK), a suite of Python libraries for text analysis and human language data processing.
Before starting with NLTK, let's understand what Natural Language Processing (NLP) is. NLP is about extracting information from unstructured data, such as finding the names of places, people, and other named entities in a given text, and analyzing the linguistic structure of the text, as in semantic analysis.
NLTK is a suite of open source Python libraries for performing Natural Language Processing on human language data in text form. It provides over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
How to install NLTK
NLTK is compatible with Python versions 2.7, 3.4, 3.5, and 3.6.
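Before installing, it can help to confirm that your interpreter is one of the versions listed above. The following is a minimal sketch; the name SUPPORTED and the helper is_listed_version are made up for illustration, and the version list is simply copied from this post (newer NLTK releases may support a different set).

```python
import sys

# Versions named in this post as compatible with NLTK.
# This is an assumption taken from the text, not queried from NLTK itself.
SUPPORTED = {(2, 7), (3, 4), (3, 5), (3, 6)}


def is_listed_version(version_info=sys.version_info):
    """Return True if the (major, minor) pair appears in SUPPORTED."""
    return tuple(version_info[:2]) in SUPPORTED


# Report on the interpreter that is running this script.
print("Python %d.%d listed: %s" % (
    sys.version_info[0], sys.version_info[1], is_listed_version()))
```

If the check reports False, that only means your version is not in the list above; it does not prove that NLTK will fail to install.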
Mac/Unix
- Install NLTK: run sudo pip install -U nltk
- Install Numpy (optional): run sudo pip install -U numpy
- Test installation: run python then type import nltk
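The test step above can also be run as a small script instead of typing into the interpreter. This is only a sketch: the helper name nltk_status is hypothetical, and it uses importlib.util.find_spec (Python 3.4 or later) so that a missing package is reported as a message rather than an ImportError.

```python
import importlib.util


def nltk_status():
    """Return the installed NLTK version string, or None if NLTK is missing."""
    if importlib.util.find_spec("nltk") is None:
        return None
    import nltk  # imported only once we know the package can be found
    return nltk.__version__


version = nltk_status()
if version is None:
    print("NLTK is not installed")
else:
    print("NLTK version: %s" % version)
```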
Windows
These instructions assume that you do not already have Python installed on your machine.
32-bit binary installation
- Install Python 3.6: http://www.python.org/downloads/ (avoid the 64-bit versions)
- Install NumPy (optional): http://sourceforge.net/projects/numpy/files/NumPy/ (pick the build that matches your installed Python version)
- Install NLTK: http://pypi.python.org/pypi/nltk or, if you have pip, run pip install nltk
- Test installation: open Python from the Start menu, then type import nltk
Installing NLTK data
NLTK comes with corpora in the form of books, labeled training data, trained models, and a stop words corpus. There are 106 corpora and trained models that can be downloaded to make language processing tasks easier.
To download all of this data, run the following commands in the Python console.
>>> import nltk
>>> nltk.download()
The last command opens the NLTK downloader window.
I have already downloaded the data, which is why the status shows "installed". On a fresh installation, it would show "not installed".
Select "all" and click the Download button. This starts downloading all the corpus packages and models.
However, if you wish to change the download directory, click File and then Change Download Directory before clicking Download.
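If you would rather not use the downloader window, nltk.download() also accepts a package identifier and a download_dir argument, so the same steps can be scripted. The sketch below assumes that approach; the helper name download_packages and the example directory are hypothetical, while "stopwords" and "punkt" are examples of NLTK package identifiers.

```python
import importlib.util


def download_packages(package_ids, download_dir=None):
    """Download the given NLTK data packages without opening the GUI.

    Returns a dict mapping each package id to the value returned by
    nltk.download() for it (True when the download succeeded).
    """
    if importlib.util.find_spec("nltk") is None:
        raise RuntimeError("NLTK is not installed")
    import nltk

    results = {}
    for package_id in package_ids:
        results[package_id] = nltk.download(
            package_id, download_dir=download_dir, quiet=True)

    # When a custom directory is used, add it to NLTK's search path
    # so the data can be found later in this session.
    if download_dir is not None and download_dir not in nltk.data.path:
        nltk.data.path.append(download_dir)
    return results


# Example call (needs a network connection; the directory is a placeholder):
# download_packages(["stopwords", "punkt"], download_dir="/tmp/nltk_data")
```

Downloading only the packages you need is much faster than selecting "all", at the cost of having to fetch more later when a new corpus or model is required.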
If the download fails, your connection may be going through a proxy. In that case, set the proxy for NLTK before calling nltk.download(). This can be done as follows:
>>> nltk.set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))
>>> nltk.download()
You have now installed NLTK and the NLTK data. If you are still facing problems with the installation, please comment below and I will try to help.