Learning Lucene 5: FunctionQuery

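The first question most readers will have is probably this: Lucene already ships so many Query implementations, so why design a FunctionQuery as well? What was it created for, and what problem does it solve? Let's start with how the source code itself describes FunctionQuery: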

In other words, it returns a score for each document based on a ValueSource (the value-source score). So what is a ValueSource? Here is what the comment in the ValueSource source says:

A ValueSource instantiates FunctionValues for a given IndexReader. And what is FunctionValues?

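The methods it declares show that FunctionValues exposes accessors that return the value of a DocValues field for a given document ID, in various types. What are those returned values used for? Looking through the FunctionQuery source, you will find: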

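From the source excerpts above we can see that a FunctionQuery must be constructed with a ValueSource. Its inner class AllScorer uses that ValueSource to instantiate FunctionValues. When the FunctionQuery score is computed, the DocValues field value is read through FunctionValues and multiplied by the FunctionQuery weight to give the score: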

float score = qWeight * vals.floatVal(doc);

So what role does ValueSource play here? Why not let FunctionQuery build the FunctionValues directly, instead of introducing ValueSource as an intermediary?

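Because FunctionQuery should be thread-safe: a single FunctionQuery instance may be shared across many searches. If FunctionValues depended directly on FunctionQuery, the DocValues field values that one thread obtains through FunctionValues could be modified by another thread, and the correctness of those values affects the final score. So ValueSource was introduced: each FunctionQuery holds one ValueSource, and the ValueSource generates the FunctionValues.

Caching is another reason. Every time FunctionValues loads a DocValues field value it still reads through the IndexReader, which means disk I/O, and the number of disk I/O operations is a performance killer. That is why CachingDoubleValueSource was designed to wrap a ValueSource. CachingDoubleValueSource still seems to live in a contrib module, though, and I don't know whether the next release will consider adding caching to ValueSource itself.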
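The division of labour described above can be sketched in plain Java. This is only an illustration under stated assumptions: `FieldSource` and `SegmentValues` are hypothetical stand-ins, not Lucene classes, and a `float[]` plays the role of one segment's DocValues column. The point is that the shared object is an immutable factory, while every search gets its own per-segment view, and the score is the weight times the value read from that view.

```java
/** Plain-Java stand-ins (hypothetical, not Lucene classes) for the ValueSource / FunctionValues split. */
class ValueSourceSketch {

    /** Per-segment view of a field, playing the role of FunctionValues. */
    interface SegmentValues {
        float floatVal(int doc);
    }

    /** Immutable, shareable factory, playing the role of ValueSource. */
    static final class FieldSource {
        private final String field;

        FieldSource(String field) {
            this.field = field;
        }

        String field() {
            return field;
        }

        /** Each call returns a fresh view bound to one segment's column of values. */
        SegmentValues getValues(float[] segmentColumn) {
            return doc -> segmentColumn[doc];
        }
    }

    /** Same shape as the formula quoted earlier: query weight times the per-document value. */
    static float score(float qWeight, SegmentValues vals, int doc) {
        return qWeight * vals.floatVal(doc);
    }

    public static void main(String[] args) {
        FieldSource source = new FieldSource("publishdate");
        // Two independent views created from the same shared source
        SegmentValues segmentA = source.getValues(new float[] {0.9f, 0.5f});
        SegmentValues segmentB = source.getValues(new float[] {2.0f});
        System.out.println(source.field() + " doc 1 in segment A: " + score(2.0f, segmentA, 1));
        System.out.println(source.field() + " doc 0 in segment B: " + score(2.0f, segmentB, 0));
    }
}
```

Because the factory holds no per-search state, sharing it between threads is harmless; anything mutable lives in the view that each caller creates for itself.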

Constructing a ValueSource is simple, for example:

public DoubleFieldSource(String field) {
    super(field);
}

You only need to supply a field name. Note, however, that the field must be a DocValues field; it cannot be an ordinary StringField, TextField, IntField, FloatField, or LongField.

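What kinds of problems can FunctionQuery solve? An example: suppose you have indexed N products and, when a user searches by keyword, you want recently listed products to appear first, and then products in descending order of how well they match the keyword. In other words, recency should push a result up, and so should a high relevance score.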

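Below is a FunctionQuery usage example that simulates a scenario like this:

The older a book's publication date, the more its weight factor decays as the days pass, so that newer books automatically rank higher.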

package com.yida.framework.lucene5.function;

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.valuesource.FieldCacheSource;

import com.yida.framework.lucene5.util.score.ScoreUtils;

/**
 * Custom ValueSource [computes the date-decay weight factor:
 * the more recent the date, the higher the weight]
 * @author Lanxiaowei
 */
public class DateDampingValueSouce extends FieldCacheSource {
    // Current time, captured once when this ValueSource is created.
    // Held per instance rather than in a static field, so that creating
    // another instance cannot overwrite the value used by this one.
    private final long now;

    public DateDampingValueSouce(String field) {
        super(field);
        // Initialize the current time
        now = System.currentTimeMillis();
    }

    /**
     * The context map holds the IndexSearcher; fetch it with context.get("searcher")
     */
    @Override
    public FunctionValues getValues(Map context, LeafReaderContext leafReaderContext)
            throws IOException {
        // Per-segment doc values of the date field (epoch milliseconds)
        final NumericDocValues numericDocValues = DocValues.getNumeric(leafReaderContext.reader(), field);
        return new FunctionValues() {
            @Override
            public float floatVal(int doc) {
                return ScoreUtils.getNewsScoreFactor(now, numericDocValues, doc);
            }

            @Override
            public int intVal(int doc) {
                return (int) ScoreUtils.getNewsScoreFactor(now, numericDocValues, doc);
            }

            @Override
            public String toString(int doc) {
                return description() + '=' + intVal(doc);
            }
        };
    }
}

package com.yida.framework.lucene5.util.score;

import org.apache.lucene.index.NumericDocValues;

import com.yida.framework.lucene5.util.Constans;

/**
 * Computes the decay factor [in units of days]
 */
public class ScoreUtils {
    /** Decay factors, indexed by age in days */
    private static float[] daysDampingFactor = new float[120];
    /** Demotion multiplier applied at each decay step */
    private static float demoteBoost = 0.9f;

    static {
        daysDampingFactor[0] = 1;
        // First week (days 1-6): demote once per day
        for (int i = 1; i < 7; i++) {
            daysDampingFactor[i] = daysDampingFactor[i - 1] * demoteBoost;
        }
        // Days 7-30: demote once per week
        for (int i = 7; i < 31; i++) {
            daysDampingFactor[i] = daysDampingFactor[i / 7 * 7 - 1]
                    * demoteBoost;
        }
        // Day 31 onward: demote once per 31-day block
        for (int i = 31; i < daysDampingFactor.length; i++) {
            daysDampingFactor[i] = daysDampingFactor[i / 31 * 31 - 1]
                    * demoteBoost;
        }
    }

    // Look up the decay factor for an age in days; ages past the end of the table reuse the last entry
    private static float dayDamping(int delta) {
        float factor = delta < daysDampingFactor.length ? daysDampingFactor[delta]
                : daysDampingFactor[daysDampingFactor.length - 1];
        System.out.println("delta:" + delta + "-->" + "factor:" + factor);
        return factor;
    }

    public static float getNewsScoreFactor(long now, NumericDocValues numericDocValues, int docId) {
        long time = numericDocValues.get(docId);
        return getNewsScoreFactor(now, time);
    }

    // Shared logic: the NumericDocValues overload above delegates here
    public static float getNewsScoreFactor(long now, long time) {
        float factor = 1;
        int day = (int) (time / Constans.DAY_MILLIS);
        int nowDay = (int) (now / Constans.DAY_MILLIS);
        System.out.println(day + ":" + nowDay + ":" + (nowDay - day));
        if (day < nowDay) {
            // The given date is before today: pass the difference in days to dayDamping
            factor = dayDamping(nowDay - day);
        } else if (day > nowDay) {
            // The given date is after today, i.e. a date in the future
            factor = Float.MIN_VALUE;
        } else if (now - time <= Constans.HALF_HOUR_MILLIS && now >= time) {
            // Same day and within the past half hour: the weight factor is doubled
            factor = 2;
        }
        return factor;
    }

    public static float getNewsScoreFactor(long time) {
        long now = System.currentTimeMillis();
        return getNewsScoreFactor(now, time);
    }
}
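To make the step shape of the decay table concrete, here is a stand-alone sketch that rebuilds the same rules without Lucene. Several things in it are assumptions: the class name `DampingTableDemo` is hypothetical, and because the `Constans` class is not shown in the post, `DAY_MILLIS` and `HALF_HOUR_MILLIS` are given the values their names suggest (24 hours and 30 minutes in milliseconds).

```java
/** Stand-alone rebuild of the decay rules; the class name is hypothetical. */
class DampingTableDemo {
    // Assumed values: the Constans class is not shown in the post.
    static final long DAY_MILLIS = 24L * 60 * 60 * 1000;
    static final long HALF_HOUR_MILLIS = 30L * 60 * 1000;

    /** Builds the table with the same three rules as the static initializer in ScoreUtils. */
    static float[] buildTable(int days, float demoteBoost) {
        float[] table = new float[days];
        table[0] = 1f;
        for (int i = 1; i < days; i++) {
            int previous;
            if (i < 7) {
                previous = i - 1;            // days 1-6: one step per day
            } else if (i < 31) {
                previous = i / 7 * 7 - 1;    // days 7-30: one step per week
            } else {
                previous = i / 31 * 31 - 1;  // day 31 onward: one step per 31-day block
            }
            table[i] = table[previous] * demoteBoost;
        }
        return table;
    }

    /** Ages beyond the table reuse the last entry. */
    static float factor(float[] table, int deltaDays) {
        return deltaDays < table.length ? table[deltaDays] : table[table.length - 1];
    }

    /** Same branching as getNewsScoreFactor(now, time), minus the debug output. */
    static float newsScoreFactor(float[] table, long now, long time) {
        int day = (int) (time / DAY_MILLIS);
        int nowDay = (int) (now / DAY_MILLIS);
        if (day < nowDay) {
            return factor(table, nowDay - day);   // published on an earlier day
        } else if (day > nowDay) {
            return Float.MIN_VALUE;               // dated in the future
        } else if (now >= time && now - time <= HALF_HOUR_MILLIS) {
            return 2f;                            // published within the last half hour
        }
        return 1f;                                // earlier today
    }

    public static void main(String[] args) {
        float[] table = buildTable(120, 0.9f);
        for (int d : new int[] {0, 1, 6, 7, 13, 14, 30, 31, 61, 62, 119, 500}) {
            System.out.println("age " + d + " days -> factor " + factor(table, d));
        }
    }
}
```

Printing the table this way shows the plateaus: the factor drops daily during the first week, then holds steady for a week at a time, then for 31 days at a time, and never falls below the last table entry.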

package com.yida.framework.lucene5.function;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.CustomScoreQuery;
import org.apache.lucene.queries.function.FunctionQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * FunctionQuery test
 */
public class FunctionQueryTest {
    // "MM" is month; lowercase "mm" would mean minutes
    private static final DateFormat formate = new SimpleDateFormat("yyyy-MM-dd");

    public static void main(String[] args) throws Exception {
        String indexDir = "c:/lucenedir-functionquery";
        Directory directory = FSDirectory.open(Paths.get(indexDir));
        //System.out.println(0.001953125f * 100000000 * 0.001953125f / 100000000);

        // Build the test index [note: this only needs to run once;
        // comment this line out again before running a second time]
        //createIndex(directory);

        IndexReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        // Create an ordinary TermQuery
        TermQuery termQuery = new TermQuery(new Term("title", "solr"));
        // Create a FunctionQuery from the custom ValueSource that computes the date-decay factor
        FunctionQuery functionQuery = new FunctionQuery(new DateDampingValueSouce("publishdate"));
        // Custom score query [CustomScoreQuery combines an ordinary Query with a FunctionQuery;
        // how the two scores are combined into the final score is up to the user, who can override it]
        // The default implementation multiplies the ordinary query score by the FunctionQuery score
        // to get the final score; you can override that default
        CustomScoreQuery customScoreQuery = new CustomScoreQuery(termQuery, functionQuery);
        // Create the sort [descending by score]
        Sort sort = new Sort(new SortField[] {SortField.FIELD_SCORE});
        TopDocs topDocs = searcher.search(customScoreQuery, null, Integer.MAX_VALUE, sort, true, false);
        ScoreDoc[] docs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc : docs) {
            int docID = scoreDoc.doc;
            Document document = searcher.doc(docID);
            String title = document.get("title");
            String publishDateString = document.get("publishdate");
            System.out.println(publishDateString);
            // The stored value is epoch milliseconds; format it back into a date
            long publishMills = Long.valueOf(publishDateString);
            Date date = new Date(publishMills);
            publishDateString = formate.format(date);
            float score = scoreDoc.score;
            System.out.println(docID + " " + title + " " +
                    publishDateString + " " + score);
        }
        reader.close();
        directory.close();
    }

    /**
     * Creates a Document object
     * @param title book title
     * @param publishDateString publication date of the book
     * @return
     * @throws ParseException
     */
    public static Document createDocument(String title, String publishDateString) throws ParseException {
        Date publishDate = formate.parse(publishDateString);
        Document doc = new Document();
        doc.add(new TextField("title", title, Field.Store.YES));
        // Stored copy, so the date can be read back and printed
        doc.add(new LongField("publishdate", publishDate.getTime(), Store.YES));
        // DocValues copy, which is what the custom ValueSource reads
        doc.add(new NumericDocValuesField("publishdate", publishDate.getTime()));
        return doc;
    }

    // Build the test index
    public static void createIndex(Directory directory) throws ParseException, IOException {
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
        indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
        IndexWriter writer = new IndexWriter(directory, indexWriterConfig);
        // Create the test documents
        Document doc1 = createDocument("lucene in action 2th edition", "2010-05-05");
        Document doc2 = createDocument("lucene progamming", "2008-07-11");
        Document doc3 = createDocument("lucene user guide", "2014-11-24");
        Document doc4 = createDocument("lucene5 cookbook", "2015-01-09");
        Document doc5 = createDocument("apache lucene api 5.0.0", "2015-02-25");
        Document doc6 = createDocument("apache solr 4 cookbook", "2013-10-22");
        Document doc7 = createDocument("administrating solr", "2015-01-20");
        Document doc8 = createDocument("apache solr essentials", "2013-08-16");
        Document doc9 = createDocument("apache solr high performance", "2014-06-28");
        Document doc10 = createDocument("apache solr api 5.0.0", "2015-03-02");
        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.addDocument(doc3);
        writer.addDocument(doc4);
        writer.addDocument(doc5);
        writer.addDocument(doc6);
        writer.addDocument(doc7);
        writer.addDocument(doc8);
        writer.addDocument(doc9);
        writer.addDocument(doc10);
        writer.close();
    }
}
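The code comments in main say that, by default, the ordinary query score and the FunctionQuery score are multiplied together. The effect of that product on ranking can be seen in a small plain-Java sketch. Everything here is hypothetical: `CombinedScoreDemo` and `Hit` are illustrative names, and the two scores per hit are made-up inputs, not values produced by Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Plain-Java illustration of score multiplication; names and numbers are hypothetical. */
class CombinedScoreDemo {

    static final class Hit {
        final String title;
        final float textScore;   // stand-in for the ordinary query score
        final float dateFactor;  // stand-in for the FunctionQuery score

        Hit(String title, float textScore, float dateFactor) {
            this.title = title;
            this.textScore = textScore;
            this.dateFactor = dateFactor;
        }

        /** Product of the two scores, matching the default described in the comments. */
        float combined() {
            return textScore * dateFactor;
        }
    }

    /** Returns a copy of the hits sorted by combined score, highest first. */
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((Hit h) -> h.combined()).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("older book", 1.0f, 0.25f),
                new Hit("newer book", 0.8f, 0.9f));
        for (Hit h : rank(hits)) {
            System.out.println(h.title + " " + h.combined());
        }
    }
}
```

With a product, a strongly decayed date factor can outweigh a better text match, which is exactly the "new books first" behaviour the example is after. A different combination, such as a sum, would weigh the two signals differently.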

Running FunctionQueryTest produces the result shown in the figure:


Reposted from: http://iamyida.iteye.com/blog/2201291
