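Lucene Topics: Maintaining the Index

1. Adding to the Index

1.1. Field Properties

Analyzed: whether the field's content is tokenized. Analysis is only worthwhile for fields that we want to search by the words they contain.

Indexed: whether the terms produced by analysis, or the whole field value, are written to the index. Only indexed fields can be searched.

For example, product names and product descriptions are analyzed and then indexed. Order numbers and ID card numbers are not analyzed but are still indexed, because all of these will later serve as query conditions.

Stored: whether the field value is saved in the document. Only stored fields can be retrieved from a Document.

For example, product names and order numbers: any field that will later be read back from a Document must be stored.

Rule of thumb for storing: store a field if its content will be shown to the user.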

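Field class	Data type	Analyzed	Indexed	Stored	Description
StringField(FieldName, FieldValue, Store.YES)	String	N	Y	Y or N	Builds a string field that is not analyzed; the whole string is indexed as a single term (for example, an order number or a person's name). Store.YES or Store.NO decides whether the value is also stored in the document.
LongPoint(String name, long... point)	long	N	Y	N	LongPoint, IntPoint and the other point types make numeric values searchable. No text analyzer is run on them; the value is indexed as a numeric point. Point fields do not store the value, so add a StoredField with the same name if the value must also be retrieved.
StoredField(FieldName, FieldValue)	Overloaded, supports several types	N	N	Y	Builds a field of one of several types that is neither analyzed nor indexed, only stored in the document.
TextField(FieldName, FieldValue, Store.NO) or TextField(FieldName, reader)	String or Reader	Y	Y	Y or N	Analyzed and indexed. With a String value, Store.YES or Store.NO decides whether the value is stored. With a Reader, Lucene assumes the content is large and never stores it.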
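
To make the three flags concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: the class FieldFlagsDemo, its methods, and the whitespace tokenizer are all assumptions made for illustration, and a real analyzer does far more than split on spaces.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical, not Lucene): the three field flags act independently.
final class FieldFlagsDemo {
    // Inverted index: "field:term" -> ids of the documents that contain it.
    static final Map<String, Set<Integer>> index = new HashMap<>();
    // Stored values: document id -> (field name -> original value).
    static final Map<Integer, Map<String, String>> stored = new HashMap<>();

    static void addField(int docId, String field, String value,
                         boolean analyzed, boolean indexed, boolean storeValue) {
        if (indexed) {
            // Analyzed: split into lowercase tokens. Not analyzed: the whole value is one term.
            String[] terms = analyzed ? value.toLowerCase().split("\\s+") : new String[] {value};
            for (String term : terms) {
                index.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docId);
            }
        }
        if (storeValue) {
            stored.computeIfAbsent(docId, k -> new HashMap<>()).put(field, value);
        }
    }

    // Looks up one exact term in one field.
    static Set<Integer> search(String field, String term) {
        return index.getOrDefault(field + ":" + term, Collections.emptySet());
    }

    // Returns the stored value, or null if the field was not stored.
    static String get(int docId, String field) {
        Map<String, String> fields = stored.get(docId);
        return fields == null ? null : fields.get(field);
    }

    public static void main(String[] args) {
        addField(1, "name", "Lucene in Action", true, true, true);  // behaves like TextField + Store.YES
        addField(1, "orderNo", "A-1001", false, true, true);        // behaves like StringField + Store.YES
        addField(1, "path", "d:/temp/1.txt", false, false, true);   // behaves like StoredField
        System.out.println(search("name", "lucene"));        // [1]
        System.out.println(search("orderNo", "A-1001"));     // [1]
        System.out.println(search("path", "d:/temp/1.txt")); // []
        System.out.println(get(1, "path"));                  // d:/temp/1.txt
    }
}
```

The path field can be read back but never found by a search, which is the behaviour the StoredField row of the table describes.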

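1.2. Adding a Document in Code

import java.io.File;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

// Add a document to the index
@Test
public void addDocument() throws Exception {
    // Directory where the index is kept
    Directory directory = FSDirectory.open(new File("D:\\temp\\index").toPath());
    IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
    // Create an IndexWriter
    IndexWriter indexWriter = new IndexWriter(directory, config);
    // Create a Document
    Document document = new Document();
    // Add fields to the document.
    // Different documents may have different fields, and one document
    // may contain several fields with the same name.
    document.add(new TextField("filename", "新添加的文檔", Field.Store.YES));
    document.add(new TextField("content", "新添加的文檔的内容", Field.Store.NO));
    // LongPoint indexes the value so it can be searched
    document.add(new LongPoint("size", 1000L));
    // StoredField stores the same value so it can be read back
    document.add(new StoredField("size", 1000L));
    // A field that does not need to be searched is only stored
    document.add(new StoredField("path", "d:/temp/1.txt"));
    // Add the document to the index
    indexWriter.addDocument(document);
    // Close the IndexWriter (this also commits the change)
    indexWriter.close();
}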

2. Deleting from the Index

2.1. Deleting Everything

// Delete all documents from the index
@Test
public void deleteAllIndex() throws Exception {
    // Directory where the index is kept
    Directory directory = FSDirectory.open(new File("D:\\temp\\index").toPath());
    IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
    // Create an IndexWriter
    IndexWriter indexWriter = new IndexWriter(directory, config);
    // Delete every document
    indexWriter.deleteAll();
    // Close the IndexWriter
    indexWriter.close();
}

Note: this deletes all index data in the index directory. The deletion is permanent and cannot be undone.

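2.2. Deleting by Query

// Additional imports used by this snippet
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Delete the documents that match a query
@Test
public void deleteIndexByQuery() throws Exception {
    // Directory where the index is kept
    Directory directory = FSDirectory.open(new File("D:\\temp\\index").toPath());
    IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
    IndexWriter indexWriter = new IndexWriter(directory, config);
    // Build the query: documents whose filename field contains the term "apache"
    Query query = new TermQuery(new Term("filename", "apache"));
    // Delete every document that matches the query
    indexWriter.deleteDocuments(query);
    // Close the IndexWriter
    indexWriter.close();
}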

2.3. Updating the Index

An update works by deleting first and then adding.

// Update the index
@Test
public void updateIndex() throws Exception {
    Directory directory = FSDirectory.open(new File("D:\\temp\\index").toPath());
    IndexWriterConfig config = new IndexWriterConfig(new IKAnalyzer());
    IndexWriter indexWriter = new IndexWriter(directory, config);
    // Create the replacement Document
    Document document = new Document();
    // Add fields to the document.
    // Different documents may have different fields, and one document
    // may contain several fields with the same name.
    document.add(new TextField("filename", "要更新的文檔", Field.Store.YES));
    document.add(new TextField("content", " Lucene 簡介 Lucene 是一個基于 Java 的全文資訊檢索工具包," +
            "它不是一個完整的搜尋應用程式,而是為你的應用程式提供索引和搜尋功能。",
            Field.Store.YES));
    // Deletes every document whose content field contains the term "java",
    // then adds the new document
    indexWriter.updateDocument(new Term("content", "java"), document);
    // Close the IndexWriter
    indexWriter.close();
}
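
Because the update is a delete-by-term followed by an add, a term that matches several documents removes all of them and leaves a single replacement. The sketch below illustrates this in plain Java. It is a toy model, not Lucene: the class UpdateDemo and its methods are hypothetical, and each document is reduced to a map from field name to a set of terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical, not Lucene): an update is delete-by-term, then add.
final class UpdateDemo {
    // Each document is modeled as field name -> set of terms.
    static final List<Map<String, Set<String>>> docs = new ArrayList<>();

    // Builds a one-field document.
    static Map<String, Set<String>> doc(String field, String... terms) {
        Map<String, Set<String>> d = new HashMap<>();
        d.put(field, new HashSet<>(Arrays.asList(terms)));
        return d;
    }

    // Removes every document whose field contains the term; returns how many were removed.
    static int deleteByTerm(String field, String term) {
        int before = docs.size();
        docs.removeIf(d -> d.getOrDefault(field, Collections.emptySet()).contains(term));
        return before - docs.size();
    }

    // Delete all matches first, then add the single replacement document.
    static int update(String field, String term, Map<String, Set<String>> replacement) {
        int deleted = deleteByTerm(field, term);
        docs.add(replacement);
        return deleted;
    }

    public static void main(String[] args) {
        docs.add(doc("content", "java", "lucene"));
        docs.add(doc("content", "java", "spring"));
        docs.add(doc("content", "python"));
        int deleted = update("content", "java", doc("content", "updated"));
        System.out.println(deleted + " deleted, " + docs.size() + " remain"); // 2 deleted, 2 remain
    }
}
```

To replace exactly one document, the term should therefore identify it uniquely, for example an unanalyzed id field.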

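3. Querying the Lucene Index

To search, build a Query object that describes what you are looking for; Lucene derives the final query syntax from that object. Just as relational databases have SQL, Lucene has its own query syntax. For example, "name:lucene" matches documents whose name field contains the term "lucene".

A query object can be created in two ways:

1) Use one of the Query subclasses that Lucene provides
2) Use QueryParser to parse a query expression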
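
The "field:term" form can be pictured with a few lines of plain Java. This is only a sketch of the simplest case, a single clause with an optional field prefix; the class QuerySyntaxDemo and the method parseClause are hypothetical, and Lucene's real query syntax (boolean operators, ranges, phrases, escaping) is much richer.

```java
// Toy model (hypothetical, not Lucene): splitting one "field:term" clause.
final class QuerySyntaxDemo {
    // Returns {field, term}. A clause without a "field:" prefix uses defaultField,
    // much as a query parser falls back to its default search field.
    static String[] parseClause(String clause, String defaultField) {
        int colon = clause.indexOf(':');
        if (colon < 0) {
            return new String[] {defaultField, clause};
        }
        return new String[] {clause.substring(0, colon), clause.substring(colon + 1)};
    }

    public static void main(String[] args) {
        String[] explicit = parseClause("name:lucene", "content");
        System.out.println(explicit[0] + " -> " + explicit[1]); // name -> lucene
        String[] implicit = parseClause("lucene", "content");
        System.out.println(implicit[0] + " -> " + implicit[1]); // content -> lucene
    }
}
```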

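3.1. TermQuery

// Additional imports used by this snippet
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;

// Search with a TermQuery
@Test
public void testTermQuery() throws Exception {
    Directory directory = FSDirectory.open(new File("D:\\temp\\index").toPath());
    IndexReader indexReader = DirectoryReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);

    // Build the query. A TermQuery is not analyzed, so the term must
    // match an indexed token exactly.
    Query query = new TermQuery(new Term("content", "lucene"));
    // Run the query and keep the top 10 hits
    TopDocs topDocs = indexSearcher.search(query, 10);
    // Total number of matching documents
    System.out.println("Total hits: " + topDocs.totalHits);
    // Iterate over the hits
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        Document document = indexSearcher.doc(scoreDoc.doc);
        System.out.println(document.get("filename"));
        //System.out.println(document.get("content"));
        System.out.println(document.get("path"));
        System.out.println(document.get("size"));
    }
    // Close the IndexReader
    indexSearcher.getIndexReader().close();
}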

3.2. Numeric Range Query

@Test
public void testRangeQuery() throws Exception {
    Directory directory = FSDirectory.open(new File("D:\\temp\\index").toPath());
    IndexReader indexReader = DirectoryReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);

    // Match documents whose size is between 0 and 10000, both ends inclusive
    Query query = LongPoint.newRangeQuery("size", 0L, 10000L);
    printResult(query, indexSearcher);
}

private void printResult(Query query, IndexSearcher indexSearcher) throws Exception {
    // Run the query and keep the top 10 hits
    TopDocs topDocs = indexSearcher.search(query, 10);
    // Total number of matching documents
    System.out.println("Total hits: " + topDocs.totalHits);
    // Iterate over the hits
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        Document document = indexSearcher.doc(scoreDoc.doc);
        System.out.println(document.get("filename"));
        //System.out.println(document.get("content"));
        System.out.println(document.get("path"));
        System.out.println(document.get("size"));
    }
    // Close the IndexReader
    indexSearcher.getIndexReader().close();
}
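
The bounds passed to LongPoint.newRangeQuery in testRangeQuery are inclusive at both ends. The plain-Java sketch below mimics that behaviour on an in-memory map and shows one way to turn an inclusive lower bound into an exclusive one by adding 1. The class RangeDemo and its range method are hypothetical and only illustrate the bound semantics; they say nothing about how Lucene evaluates the query internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, not Lucene): inclusive numeric range matching.
final class RangeDemo {
    // Returns the ids whose size lies in [lower, upper], both ends inclusive.
    static List<Integer> range(Map<Integer, Long> sizes, long lower, long upper) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : sizes.entrySet()) {
            if (e.getValue() >= lower && e.getValue() <= upper) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // TreeMap keeps the document ids in ascending order for stable output.
        Map<Integer, Long> sizes = new TreeMap<>();
        sizes.put(1, 0L);
        sizes.put(2, 1000L);
        sizes.put(3, 10000L);
        sizes.put(4, 10001L);
        System.out.println(range(sizes, 0L, 10000L));                    // [1, 2, 3]
        // Exclusive lower bound: add 1, failing loudly on overflow.
        System.out.println(range(sizes, Math.addExact(0L, 1L), 10000L)); // [2, 3]
    }
}
```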

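3.3. Querying with QueryParser

// Additional import used by this snippet
import org.apache.lucene.queryparser.classic.QueryParser;

@Test
public void testQueryParser() throws Exception {
    Directory directory = FSDirectory.open(new File("D:\\temp\\index").toPath());
    IndexReader indexReader = DirectoryReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    // Create the QueryParser.
    // First argument: the default field to search.
    // Second argument: the analyzer, which should match the one used at index time.
    QueryParser queryParser = new QueryParser("content", new IKAnalyzer());
    // The query text is analyzed into terms before searching
    Query query = queryParser.parse("Lucene是java開發的");
    // Run the query with the printResult helper from section 3.2
    printResult(query, indexSearcher);
}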

