Learning Lucene 5: FunctionQuery

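I suspect the biggest question on everyone's mind is this: with so many Query implementations already available, why design yet another one called FunctionQuery? What was it designed for, or in other words, what problem does it solve? Let's look at how the source code itself describes FunctionQuery: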

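In other words, it returns a score for each document based on a ValueSource, that is, the value-source score. So what on earth is a ValueSource? Let's look at the comment in the ValueSource source: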

A ValueSource is used to instantiate FunctionValues for a given IndexReader. And what is FunctionValues?

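From the methods it declares, we can see that FunctionValues provides methods for fetching doc-values field values of various types by document ID. What are those returned values used for? Browse the FunctionQuery source and you will find: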

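Putting the pieces together: a FunctionQuery must be constructed with a ValueSource. Its inner class AllScorer uses that ValueSource to instantiate a FunctionValues, and when the score is computed, the doc-values field value is read through FunctionValues and multiplied by the FunctionQuery's weight to produce the FunctionQuery score.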

float score = qWeight * vals.floatVal(doc);
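To make the relationship concrete, here is a minimal sketch in plain Java that mimics the three roles without depending on Lucene. The names `FunctionScoreSketch`, `ValueFactory`, and `DocValuesView` are stand-ins invented for this illustration (they are not Lucene classes), and a `float[]` plays the part of one segment's doc-values column.

```java
/**
 * Simplified model of the FunctionQuery scoring chain.
 * All types here are hypothetical stand-ins, not Lucene classes.
 */
final class FunctionScoreSketch {

    /** Stand-in for FunctionValues: looks up a value by document ID. */
    interface DocValuesView {
        float floatVal(int doc);
    }

    /** Stand-in for ValueSource: builds a fresh view for each search. */
    interface ValueFactory {
        DocValuesView getValues(float[] segmentColumn);
    }

    /** Mirrors the scoring line: query weight times the per-document value. */
    static float score(float qWeight, DocValuesView vals, int doc) {
        return qWeight * vals.floatVal(doc);
    }

    public static void main(String[] args) {
        // Assumed sample data: one stored value per document.
        float[] column = {0.5f, 2.0f, 1.25f};
        ValueFactory source = segment -> doc -> segment[doc];
        DocValuesView vals = source.getValues(column);
        for (int doc = 0; doc < column.length; doc++) {
            System.out.println("doc " + doc + " -> " + score(2.0f, vals, doc));
        }
    }
}
```

Because the factory holds no per-search state, one instance can be shared, while each call to `getValues` hands back a separate view.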

So what role does ValueSource play here? Why not let FunctionQuery build the FunctionValues directly, instead of introducing ValueSource as an intermediary?

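Because FunctionQuery should be thread-safe, meaning that a single FunctionQuery instance can be shared by many searches. If FunctionValues depended directly on FunctionQuery, a doc-values field value that one thread obtained through FunctionValues might be modified by another thread, and the correctness of those values affects the final score. So ValueSource was introduced: each FunctionQuery holds one ValueSource, and the ValueSource in turn creates the FunctionValues. The second reason is caching. Every time a doc-values field value is loaded through FunctionValues it is still read through the IndexReader, which means disk I/O, and the number of disk I/O operations is a performance killer. That is why CachingDoubleValueSource was designed to wrap a ValueSource. CachingDoubleValueSource still seems to live in a contrib module, though, and I don't know whether the next release will consider adding caching to ValueSource itself.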
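The caching idea can be illustrated with a small wrapper that remembers values it has already read. This is only a sketch of the general technique in plain Java: `CachingValuesDemo`, `CachingValues`, `delegateCalls`, and the sample lookup are hypothetical, and nothing here reflects how CachingDoubleValueSource is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntToDoubleFunction;

/** Hypothetical illustration of caching per-document values; not a Lucene class. */
final class CachingValuesDemo {

    /** Wraps a (possibly expensive) per-document lookup and caches each result. */
    static final class CachingValues implements IntToDoubleFunction {
        private final IntToDoubleFunction delegate;
        private final Map<Integer, Double> cache = new HashMap<>();
        /** Counts how often the wrapped lookup was really invoked. */
        int delegateCalls = 0;

        CachingValues(IntToDoubleFunction delegate) {
            this.delegate = delegate;
        }

        @Override
        public double applyAsDouble(int doc) {
            Double cached = cache.get(doc);
            if (cached == null) {
                delegateCalls++;
                cached = delegate.applyAsDouble(doc);
                cache.put(doc, cached);
            }
            return cached;
        }
    }

    public static void main(String[] args) {
        // Assumed stand-in for a read that would normally hit the index.
        CachingValues values = new CachingValues(doc -> doc * 1.5);
        values.applyAsDouble(4);
        values.applyAsDouble(4); // served from the cache
        System.out.println("value=" + values.applyAsDouble(4)
                + ", underlying reads=" + values.delegateCalls);
    }
}
```

Like a per-search FunctionValues, such a wrapper holds mutable state (the HashMap), so each searching thread should get its own instance rather than share one.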

Constructing a ValueSource is simple:

public DoubleFieldSource(String field) {
    super(field);
}

You only need to supply a field name. Note, however, that the field must be a doc-values field; it cannot be an ordinary StringField, TextField, IntField, FloatField, or LongField.

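So what problems can FunctionQuery solve? Here is an example: suppose you have indexed N products, and when a user searches by keyword you want the most recently listed products shown first, and then products ordered by how well they match the keyword, in descending order. In other words, recently listed products should rank higher, and so should products with higher scores.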

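Below is a FunctionQuery usage example that simulates a similar scenario:

The older a book's publication date, the more its weight factor decays, day by day, so that newer books automatically rank higher.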

package com.yida.framework.lucene5.function;

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.valuesource.FieldCacheSource;

import com.yida.framework.lucene5.util.score.ScoreUtils;

/**
 * Custom ValueSource [computes a date-decay weight factor:
 * the more recent the date, the higher the weight]
 * @author lanxiaowei
 */
public class DateDampingValueSouce extends FieldCacheSource {
    // Current time, captured once when this ValueSource is created
    private final long now;

    public DateDampingValueSouce(String field) {
        super(field);
        // Initialize the current time
        now = System.currentTimeMillis();
    }

    /**
     * The context map holds the IndexSearcher; fetch it with context.get("searcher").
     */
    @Override
    public FunctionValues getValues(Map context, LeafReaderContext leafReaderContext)
            throws IOException {
        // Per-segment doc values of the date field (epoch milliseconds)
        final NumericDocValues numericDocValues = DocValues.getNumeric(leafReaderContext.reader(), field);
        return new FunctionValues() {
            @Override
            public float floatVal(int doc) {
                return ScoreUtils.getNewsScoreFactor(now, numericDocValues, doc);
            }

            @Override
            public int intVal(int doc) {
                return (int) ScoreUtils.getNewsScoreFactor(now, numericDocValues, doc);
            }

            @Override
            public String toString(int doc) {
                return description() + '=' + intVal(doc);
            }
        };
    }
}

package com.yida.framework.lucene5.util.score;

import org.apache.lucene.index.NumericDocValues;

import com.yida.framework.lucene5.util.Constans;

/**
 * Computes the decay factor [in units of days]
 */
public class ScoreUtils {
    /** Decay factors, indexed by age in days */
    private static float[] daysDampingFactor = new float[120];
    /** Demotion multiplier applied at each step */
    private static float demoteBoost = 0.9f;

    static {
        daysDampingFactor[0] = 1;
        // First week: demote the weight once per day
        for (int i = 1; i < 7; i++) {
            daysDampingFactor[i] = daysDampingFactor[i - 1] * demoteBoost;
        }
        // Day 7 through day 30: demote once per week
        for (int i = 7; i < 31; i++) {
            daysDampingFactor[i] = daysDampingFactor[i / 7 * 7 - 1]
                    * demoteBoost;
        }
        // Day 31 onward: demote once every 31 days
        for (int i = 31; i < daysDampingFactor.length; i++) {
            daysDampingFactor[i] = daysDampingFactor[i / 31 * 31 - 1]
                    * demoteBoost;
        }
    }

    // Look up the current decay factor for the given difference in days
    private static float dayDamping(int delta) {
        float factor = delta < daysDampingFactor.length ? daysDampingFactor[delta]
                : daysDampingFactor[daysDampingFactor.length - 1];
        System.out.println("delta:" + delta + "-->" + "factor:" + factor);
        return factor;
    }

    public static float getNewsScoreFactor(long now, NumericDocValues numericDocValues, int docId) {
        long time = numericDocValues.get(docId);
        return getNewsScoreFactor(now, time);
    }

    public static float getNewsScoreFactor(long now, long time) {
        float factor = 1;
        int day = (int) (time / Constans.DAY_MILLIS);
        int nowDay = (int) (now / Constans.DAY_MILLIS);
        System.out.println(day + ":" + nowDay + ":" + (nowDay - day));
        // If the given date is earlier than today, compute the difference
        // in days and pass it to dayDamping to get the decay factor
        if (day < nowDay) {
            factor = dayDamping(nowDay - day);
        } else if (day > nowDay) {
            // The given date is later than today, i.e. a future date
            factor = Float.MIN_VALUE;
        } else if (now - time <= Constans.HALF_HOUR_MILLIS && now >= time) {
            // Same day and within the past half hour: the weight factor is 2
            factor = 2;
        }
        return factor;
    }

    public static float getNewsScoreFactor(long time) {
        long now = System.currentTimeMillis();
        return getNewsScoreFactor(now, time);
    }
}
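To see what the decay table actually contains, the sketch below rebuilds it in a standalone class and applies the same branching to a few timestamps. It assumes that `Constans.DAY_MILLIS` is 24 hours and `Constans.HALF_HOUR_MILLIS` is 30 minutes, both in milliseconds (the `Constans` class is not shown in this article), and that the third loop multiplies by the demotion factor just as the second one does. `DampingTableDemo`, `buildTable`, and `scoreFactor` are illustrative names.

```java
/** Standalone re-creation of the day-based decay table; names are illustrative. */
final class DampingTableDemo {
    // Assumed values for the constants referenced as Constans.* in the article.
    static final long DAY_MILLIS = 24L * 60 * 60 * 1000;
    static final long HALF_HOUR_MILLIS = 30L * 60 * 1000;

    static final float[] TABLE = buildTable(120, 0.9f);

    /** Daily steps for days 1-6, weekly steps for days 7-30, then 31-day steps. */
    static float[] buildTable(int size, float demoteBoost) {
        if (size < 31) {
            throw new IllegalArgumentException("size must be at least 31");
        }
        float[] table = new float[size];
        table[0] = 1;
        for (int i = 1; i < 7; i++) {
            table[i] = table[i - 1] * demoteBoost;
        }
        for (int i = 7; i < 31; i++) {
            table[i] = table[i / 7 * 7 - 1] * demoteBoost;
        }
        for (int i = 31; i < size; i++) {
            table[i] = table[i / 31 * 31 - 1] * demoteBoost;
        }
        return table;
    }

    /** Same branching as the article's getNewsScoreFactor, without the debug output. */
    static float scoreFactor(long now, long time) {
        int day = (int) (time / DAY_MILLIS);
        int nowDay = (int) (now / DAY_MILLIS);
        if (day < nowDay) {
            int delta = nowDay - day;
            return delta < TABLE.length ? TABLE[delta] : TABLE[TABLE.length - 1];
        } else if (day > nowDay) {
            return Float.MIN_VALUE; // future date
        } else if (now - time <= HALF_HOUR_MILLIS && now >= time) {
            return 2; // same day, within the last half hour
        }
        return 1; // same day otherwise
    }

    public static void main(String[] args) {
        long now = 100 * DAY_MILLIS + 12 * 60 * 60 * 1000L; // noon on an arbitrary day
        System.out.println("10 minutes ago: " + scoreFactor(now, now - 10 * 60 * 1000L));
        System.out.println("2 hours ago:    " + scoreFactor(now, now - 2 * 60 * 60 * 1000L));
        System.out.println("3 days ago:     " + scoreFactor(now, now - 3 * DAY_MILLIS));
        System.out.println("40 days ago:    " + scoreFactor(now, now - 40 * DAY_MILLIS));
        System.out.println("2 days ahead:   " + scoreFactor(now, now + 2 * DAY_MILLIS));
    }
}
```

Two consequences are worth noticing. The factor never drops below 0.9 to the 13th power (about 0.254), because ages past the end of the 120-entry table reuse the last entry. And since days are bucketed by integer division of the timestamp, an item published ten minutes before a day boundary loses the factor of 2 as soon as the bucket changes.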

// Same package as DateDampingValueSouce, which is used below without an import
package com.yida.framework.lucene5.function;

import java.io.IOException;
import java.nio.file.Paths;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.CustomScoreQuery;
import org.apache.lucene.queries.function.FunctionQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * FunctionQuery test
 */
public class FunctionQueryTest {
    // "MM" is month; lowercase "mm" would be parsed as minutes
    private static final DateFormat formate = new SimpleDateFormat("yyyy-MM-dd");

    public static void main(String[] args) throws Exception {
        String indexDir = "c:/lucenedir-functionquery";
        Directory directory = FSDirectory.open(Paths.get(indexDir));
        // Create the test index [note: it only needs to be created once;
        // uncomment for the first run, then comment it out again]
        //createIndex(directory);
        IndexReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        // Create an ordinary TermQuery
        TermQuery termQuery = new TermQuery(new Term("title", "solr"));
        // Create a FunctionQuery from the custom ValueSource that computes the date-decay factor
        FunctionQuery functionQuery = new FunctionQuery(new DateDampingValueSouce("publishdate"));
        // Custom score query [CustomScoreQuery combines the ordinary query with the FunctionQuery;
        // how the two scores are combined into the final score can be customized by overriding]
        // By default the ordinary query score is multiplied by the FunctionQuery score
        CustomScoreQuery customScoreQuery = new CustomScoreQuery(termQuery, functionQuery);
        // Create the sorter [sort by score, descending]
        Sort sort = new Sort(new SortField[] {SortField.FIELD_SCORE});
        TopDocs topDocs = searcher.search(customScoreQuery, null, Integer.MAX_VALUE, sort, true, false);
        ScoreDoc[] docs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc : docs) {
            int docID = scoreDoc.doc;
            Document document = searcher.doc(docID);
            String title = document.get("title");
            String publishDateString = document.get("publishdate");
            System.out.println(publishDateString);
            long publishMills = Long.valueOf(publishDateString);
            Date date = new Date(publishMills);
            publishDateString = formate.format(date);
            float score = scoreDoc.score;
            System.out.println(docID + " " + title + " " +
                    publishDateString + " " + score);
        }
        reader.close();
        directory.close();
    }

    /**
     * Creates a Document object
     * @param title book title
     * @param publishDateString publication date of the book
     * @return the new document
     * @throws ParseException if the date string cannot be parsed
     */
    public static Document createDocument(String title, String publishDateString) throws ParseException {
        Date publishDate = formate.parse(publishDateString);
        Document doc = new Document();
        doc.add(new TextField("title", title, Field.Store.YES));
        doc.add(new LongField("publishdate", publishDate.getTime(), Store.YES));
        // The doc-values copy of the date is what the ValueSource reads
        doc.add(new NumericDocValuesField("publishdate", publishDate.getTime()));
        return doc;
    }

    // Create the test index
    public static void createIndex(Directory directory) throws ParseException, IOException {
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(analyzer);
        indexWriterConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
        IndexWriter writer = new IndexWriter(directory, indexWriterConfig);
        // Create the test documents
        Document doc1 = createDocument("lucene in action 2th edition", "2010-05-05");
        Document doc2 = createDocument("lucene progamming", "2008-07-11");
        Document doc3 = createDocument("lucene user guide", "2014-11-24");
        Document doc4 = createDocument("lucene5 cookbook", "2015-01-09");
        Document doc5 = createDocument("apache lucene api 5.0.0", "2015-02-25");
        Document doc6 = createDocument("apache solr 4 cookbook", "2013-10-22");
        Document doc7 = createDocument("administrating solr", "2015-01-20");
        Document doc8 = createDocument("apache solr essentials", "2013-08-16");
        Document doc9 = createDocument("apache solr high performance", "2014-06-28");
        Document doc10 = createDocument("apache solr api 5.0.0", "2015-03-02");
        writer.addDocument(doc1);
        writer.addDocument(doc2);
        writer.addDocument(doc3);
        writer.addDocument(doc4);
        writer.addDocument(doc5);
        writer.addDocument(doc6);
        writer.addDocument(doc7);
        writer.addDocument(doc8);
        writer.addDocument(doc9);
        writer.addDocument(doc10);
        writer.close();
    }
}
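By default CustomScoreQuery multiplies the sub-query score by the function score, so an older book needs a noticeably better text match to outrank a newer one. The plain-Java sketch below works through that arithmetic; `CombinedScoreDemo`, the `Hit` class, the titles, and all the score values are made-up illustrations, not output from the test program above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Worked example of "text score times date factor"; all data here is invented. */
final class CombinedScoreDemo {

    static final class Hit {
        final String title;
        final float textScore;  // score of the ordinary query
        final float dateFactor; // score of the function query

        Hit(String title, float textScore, float dateFactor) {
            this.title = title;
            this.textScore = textScore;
            this.dateFactor = dateFactor;
        }

        /** Default combination: the product of the two scores. */
        float finalScore() {
            return textScore * dateFactor;
        }
    }

    /** Returns a copy of the hits sorted by final score, highest first. */
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Float.compare(b.finalScore(), a.finalScore()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("older, better match", 1.5f, 0.3138f), // assumed factor for an older book
                new Hit("newer, weaker match", 1.2f, 0.729f)); // assumed factor for a 3-day-old book
        for (Hit hit : rank(hits)) {
            System.out.println(hit.title + " -> " + hit.finalScore());
        }
    }
}
```

If the product is not the combination you want, Lucene 5 lets you subclass CustomScoreQuery, override getCustomScoreProvider, and return a CustomScoreProvider whose customScore method applies your own formula.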

The output of a test run is shown in the figure:

If you need the demo code, you can download it from the attachment at the very bottom. OK, that's a wrap!

If you have any other questions, add me on QQ: 7-3-6-0-3-1-3-0-5, or join the group so we can learn together!

Reposted from: http://iamyida.iteye.com/blog/2201291
