
Installing Third-Party Natural Language Processing Tools for NLTK



Installing third-party natural language processing tools for NLTK:

https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software

How NLTK Discovers Third Party Software

NLTK finds third-party software through environment variables or through path arguments passed in API calls. This page lists installation instructions and the associated environment variables.

Java

Java is not required by NLTK, but some third-party software may depend on it. NLTK finds the java binary via the system PATH environment variable, or through JAVAHOME or JAVA_HOME.
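The lookup described above can be sketched with the standard library alone. The helper name `find_java` and the order in which the variables are tried are assumptions made for illustration; this is not NLTK's own implementation.

```python
import os
import shutil


def find_java(env=None):
    """Return the path of a java executable, or None if none is found.

    Assumed lookup order (illustrative only): JAVAHOME, JAVA_HOME, then PATH.
    """
    env = os.environ if env is None else env
    exe = "java.exe" if os.name == "nt" else "java"

    for var in ("JAVAHOME", "JAVA_HOME"):
        home = env.get(var)
        if not home:
            continue
        # The variable may point at the JDK root or directly at its bin directory.
        for candidate in (os.path.join(home, "bin", exe), os.path.join(home, exe)):
            if os.path.isfile(candidate):
                return candidate

    # Fall back to the directories listed in PATH (empty PATH means no match).
    return shutil.which("java", path=env.get("PATH", ""))


if __name__ == "__main__":
    print(find_java() or "java not found; check PATH, JAVAHOME or JAVA_HOME")
```

Passing a plain dictionary as `env` makes it possible to try different settings without touching the real environment.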

To locate Java libraries (jar files), NLTK checks the Java CLASSPATH variable. In addition, each dependency usually has its own environment variables, which are also searched.
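As a rough illustration of a CLASSPATH search, the sketch below walks the entries of the variable and looks for a jar by file name. The function `find_jar_on_classpath` is a hypothetical helper written for this page, not part of NLTK, and it ignores the per-dependency variables mentioned above.

```python
import os


def find_jar_on_classpath(jar_name, classpath=None):
    """Return the path of jar_name if a CLASSPATH entry provides it, else None."""
    if classpath is None:
        classpath = os.environ.get("CLASSPATH", "")

    # Entries are separated by ':' on Linux/macOS and ';' on Windows.
    for entry in classpath.split(os.pathsep):
        if not entry:
            continue
        # An entry may name the jar file itself ...
        if os.path.basename(entry) == jar_name and os.path.isfile(entry):
            return entry
        # ... or a directory that contains it.
        candidate = os.path.join(entry, jar_name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print(find_jar_on_classpath("stanford-parser.jar"))
```

A result of `None` usually means the directory holding the jar has not been added to CLASSPATH yet.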

Windows

  • Download and install the JDK from Java's official website: http://www.oracle.com/technetwork/java/javase/downloads/index.html?ssSourceSiteId=otnjp

Linux

It is best to install Java with the distribution's package manager.

Stanford Tagger, NER, Tokenizer and Parser

To install:

  • Make sure Java is installed (version 1.8+)
  • Download & extract the Stanford tokenizer package (contains the Stanford tagger): http://nlp.stanford.edu/software/lex-parser.shtml
  • Download & extract the Stanford NER package: http://nlp.stanford.edu/software/CRF-NER.shtml
  • Download & extract the Stanford POS tagger package: http://nlp.stanford.edu/software/tagger.shtml
  • Download & extract the Stanford Parser package: http://nlp.stanford.edu/software/lex-parser.shtml
  • Add the directories containing stanford-postagger.jar, stanford-ner.jar and stanford-parser.jar to the CLASSPATH environment variable
  • Point the STANFORD_MODELS environment variable to the directories containing the Stanford tokenizer, POS tagger, NER and parser models (e.g. arabic.tagger, arabic-train.tagger, chinese-distsim.tagger, stanford-parser-x.x.x-models.jar, ...)
  • For example:

    export STANFORD_MODELS=/usr/share/stanford-postagger-full-2015-01-30/models:/usr/share/stanford-ner-2015-04-20/classifier
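The same variables can also be set from inside a Python session before any Stanford wrapper is used. The following is a minimal sketch: `join_paths` and `configure_stanford` are hypothetical helpers, and the jar directories in the usage part are assumed to be the parent directories of the model paths shown above, so adjust them to the actual install locations.

```python
import os


def join_paths(*paths):
    """Join non-empty paths with the platform separator (':' or ';')."""
    return os.pathsep.join(p for p in paths if p)


def configure_stanford(jar_dirs, model_dirs, env=None):
    """Append jar_dirs to CLASSPATH and set STANFORD_MODELS in env."""
    env = os.environ if env is None else env
    # Keep whatever is already on CLASSPATH and append the jar directories.
    env["CLASSPATH"] = join_paths(env.get("CLASSPATH", ""), *jar_dirs)
    # STANFORD_MODELS lists every directory that holds model files.
    env["STANFORD_MODELS"] = join_paths(*model_dirs)
    return env


if __name__ == "__main__":
    # Assumed install locations; a plain dict is used so nothing global changes.
    demo_env = configure_stanford(
        jar_dirs=[
            "/usr/share/stanford-postagger-full-2015-01-30",
            "/usr/share/stanford-ner-2015-04-20",
        ],
        model_dirs=[
            "/usr/share/stanford-postagger-full-2015-01-30/models",
            "/usr/share/stanford-ner-2015-04-20/classifier",
        ],
        env={},
    )
    for name, value in demo_env.items():
        print(name, "=", value)
```

Calling `configure_stanford` without the `env` argument modifies `os.environ`, which affects only the current Python process and the Java subprocesses it starts.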