Installing third-party natural language processing tools for NLTK:
https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software
How NLTK Discovers Third Party Software
NLTK finds third-party software through environment variables or through path arguments passed in API calls. This page lists installation instructions and their associated environment variables.
Java
Java is not required by NLTK; however, some third-party software may depend on it. NLTK finds the java binary via the system PATH environment variable, or through JAVAHOME or JAVA_HOME.
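The lookup described above can be sketched in plain Python. This is a minimal illustration using only the standard library, not NLTK's actual implementation; the function name `locate_java` and the precedence used here (JAVAHOME, then JAVA_HOME, then PATH) are assumptions made for the example.

```python
import os
import shutil


def locate_java(env=None):
    """Return a path to a java executable, or None if none is found.

    Hypothetical helper: checks JAVAHOME and JAVA_HOME first, then PATH.
    """
    env = os.environ if env is None else env
    exe = "java.exe" if os.name == "nt" else "java"

    # Both spellings are checked; each is expected to name a JDK/JRE root.
    for var in ("JAVAHOME", "JAVA_HOME"):
        home = env.get(var)
        if not home:
            continue
        for candidate in (os.path.join(home, "bin", exe), os.path.join(home, exe)):
            if os.path.isfile(candidate):
                return candidate

    # Fall back to searching the directories listed in PATH.
    return shutil.which("java", path=env.get("PATH", ""))
```

Calling `locate_java()` with no arguments inspects the current process environment; passing a dictionary makes it easy to try out a configuration without changing the real environment.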
To search for Java binaries (jar files), NLTK checks the Java CLASSPATH variable; however, there are usually independent environment variables that are also searched for each dependency individually.
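As a rough illustration of a CLASSPATH search, the sketch below walks each CLASSPATH entry and looks for a jar by file name. It is not NLTK's own code; the function name `find_jar_on_classpath` is hypothetical, and treating a directory entry as a place to look for jar files is an assumption that follows the installation steps on this page.

```python
import os


def find_jar_on_classpath(jar_name, classpath=None):
    """Return the first CLASSPATH location that provides jar_name, else None."""
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")

    # CLASSPATH entries are separated by ':' on Linux and ';' on Windows.
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        # An entry may name the jar file itself ...
        if os.path.isfile(entry) and os.path.basename(entry) == jar_name:
            return entry
        # ... or a directory that contains it.
        candidate = os.path.join(entry, jar_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, `find_jar_on_classpath("stanford-ner.jar")` returns the full path of the jar if one of the CLASSPATH entries provides it, and `None` otherwise.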
Windows
- Download and install the JDK from Java's official website: http://www.oracle.com/technetwork/java/javase/downloads/index.html?ssSourceSiteId=otnjp
Linux
It is best to install Java with your distribution's package manager.
Stanford Tagger, NER, Tokenizer and Parser.
To install:
- Make sure Java is installed (version 1.8+)
- Download & extract the Stanford tokenizer package (contains the Stanford tagger): http://nlp.stanford.edu/software/lex-parser.shtml
- Download & extract the Stanford NER package: http://nlp.stanford.edu/software/CRF-NER.shtml
- Download & extract the Stanford POS tagger package: http://nlp.stanford.edu/software/tagger.shtml
- Download & extract the Stanford Parser package: http://nlp.stanford.edu/software/lex-parser.shtml
- Add the directories containing stanford-postagger.jar, stanford-ner.jar and stanford-parser.jar to the CLASSPATH environment variable
- Point the STANFORD_MODELS environment variable to the directory containing the Stanford tokenizer models, Stanford POS models, Stanford NER models and Stanford parser models, e.g. (arabic.tagger, arabic-train.tagger, chinese-distsim.tagger, stanford-parser-x.x.x-models.jar ...)
- e.g.
export STANFORD_MODELS=/usr/share/stanford-postagger-full-2015-01-30/models:/usr/share/stanford-ner-2015-04-20/classifier
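The same configuration can also be set from inside Python before any Stanford tool is invoked. This is a minimal sketch, assuming the packages were extracted under /usr/share as in the export example above; the helper name `join_paths` is hypothetical, and placing the jar files at the top level of each extracted package is an assumption to check against the local install.

```python
import os


def join_paths(*dirs):
    """Join directories with the platform's path-list separator (':' or ';')."""
    return os.pathsep.join(d for d in dirs if d)


# Directories taken from the export example above; adjust to the local install.
postagger_home = "/usr/share/stanford-postagger-full-2015-01-30"
ner_home = "/usr/share/stanford-ner-2015-04-20"

# Assumption: the jar files sit at the top level of each extracted package.
os.environ["CLASSPATH"] = join_paths(postagger_home, ner_home)

# Model directories, mirroring the STANFORD_MODELS value shown above.
os.environ["STANFORD_MODELS"] = join_paths(
    os.path.join(postagger_home, "models"),
    os.path.join(ner_home, "classifier"),
)
```

Values assigned through `os.environ` apply only to the current Python process and the child processes it starts, so they do not replace an `export` in the shell profile when other programs need the same settings.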