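Design and Implementation of an Ultra-Lightweight Full-Text Search Framework

Lucene is the most outstanding full-text search engine in the Java world, but its API is fairly complex and it has a strict thread-synchronization model, so it is not easy to use directly. Compass is an OSEM (Object-SearchEngine Mapping) layer that wraps Lucene, much as Hibernate wraps JDBC; however, it is overly complex and only supports older Lucene versions. On www.javaeedev.com I found that Xuefeng has developed a Compass-like framework that wraps Lucene in a much simpler way and supports the latest Lucene version and Java 5 generics. With it, a custom bean can be searched with very little code: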

List<T> list = Searcher.search(Class<T>, String q, Page page);
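To make the role of the Page argument concrete, here is a minimal sketch of how a paged search might slice a full hit list and report the total count back through the page object. DemoPage and PagingSketch are hypothetical names invented for this illustration; the framework's real Page class may have a different constructor, page size, and fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical paging holder; NOT the framework's Page class.
class DemoPage {
    final int pageIndex; // 1-based page number
    final int pageSize;  // maximum items per page
    int totalCount;      // filled in by the search

    DemoPage(int pageIndex, int pageSize) {
        if (pageIndex < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageIndex and pageSize must be >= 1");
        }
        this.pageIndex = pageIndex;
        this.pageSize = pageSize;
    }
}

class PagingSketch {
    // Records the total hit count on the page and returns only the
    // items that fall inside the requested page.
    static <T> List<T> slice(List<T> allHits, DemoPage page) {
        page.totalCount = allHits.size();
        int from = Math.min((page.pageIndex - 1) * page.pageSize, allHits.size());
        int to = Math.min(from + page.pageSize, allHits.size());
        return new ArrayList<>(allHits.subList(from, to));
    }
}
```

The caller passes the page in and reads the total back out afterwards, which mirrors how the examples in this article call page.getTotalCount() after searching.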

Download:

http://code.google.com/p/lightweight-search/downloads/list

A brief introduction to basic usage follows:

1. Define a custom bean, for example Article:

import com.javaeedev.lightweight.search.Index;
import com.javaeedev.lightweight.search.SearchableId;
import com.javaeedev.lightweight.search.SearchableProperty;
import com.javaeedev.lightweight.search.Store;

public class Article {

    private String id;
    private String title;
    private String author;
    private String content;

    // Identifies this bean in the index.
    @SearchableId
    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    // Stored, tokenized, and given the highest boost.
    @SearchableProperty(store=Store.YES, index=Index.TOKENIZED, boost=3.0f)
    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    @SearchableProperty(store=Store.YES, index=Index.TOKENIZED, boost=2.0f)
    public String getAuthor() {
        return author;
    }

    public void setAuthor(String author) {
        this.author = author;
    }

    // No explicit boost, so the annotation's default applies.
    @SearchableProperty(store=Store.YES, index=Index.TOKENIZED)
    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

}

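The notable parts here are @SearchableId and @SearchableProperty.

@SearchableId marks the property that serves as the ID of the entry in the Lucene index.

@SearchableProperty describes a field in the Lucene index.

For details on what these two annotations configure, consult the Lucene documentation, because this framework is essentially just a wrapper around Lucene.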
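As an illustration of how an annotation-driven mapping of this kind can work in general, the sketch below uses reflection to collect annotated getters and their boost values. DemoProperty, DemoArticle, and AnnotationScanner are hypothetical stand-ins written for this example; this is not the framework's actual code, and its real annotations carry more attributes (store, index).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for @SearchableProperty; must be retained at
// runtime so that reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DemoProperty {
    float boost() default 1.0f;
}

// Hypothetical bean annotated in the same style as Article.
class DemoArticle {
    @DemoProperty(boost = 3.0f)
    public String getTitle() { return "title"; }

    @DemoProperty
    public String getContent() { return "content"; }

    // Not annotated, so the scanner skips it.
    public String getInternalNote() { return "ignored"; }
}

class AnnotationScanner {
    // Returns property name -> boost for every annotated public getter.
    static Map<String, Float> scan(Class<?> type) {
        Map<String, Float> fields = new TreeMap<>();
        for (Method m : type.getMethods()) {
            DemoProperty p = m.getAnnotation(DemoProperty.class);
            String n = m.getName();
            if (p != null && n.startsWith("get") && n.length() > 3) {
                // getTitle -> title
                String name = Character.toLowerCase(n.charAt(3)) + n.substring(4);
                fields.put(name, p.boost());
            }
        }
        return fields;
    }
}
```

A wrapper could use a map like this to decide which bean properties become index fields and with what weight, leaving unannotated getters out of the index.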

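2. Perform a full-text search

There are two ways:

    Option 1: use the framework directly, configuring the required parameters in the calling class, for example: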

import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

import org.apache.lucene.analysis.standard.StandardAnalyzer;

import com.javaeedev.lightweight.common.Page;
import com.javaeedev.lightweight.search.Searcher;
import com.javaeedev.lightweight.search.SearcherProvider;

public class Run {

    @SuppressWarnings("unchecked")
    public static void main(String[] args) {
        // define classes that can be searched:
        Set<Class> searchableClasses = new HashSet<Class>();
        searchableClasses.add(Article.class);

        // init a provider:
        SearcherProvider provider = new SearcherProvider("./index-location", searchableClasses);
        provider.setAnalyzer(new StandardAnalyzer());
        provider.setHighlightPre("<b>");
        provider.setHighlightPost("</b>");

        // get a searcher:
        Searcher searcher = provider.get();

        // now do all search work with "searcher" object:
        // first add new Articles:
        Article a = new Article();
        a.setId(UUID.randomUUID().toString());
        a.setTitle("How to use lightweight search?");
        a.setAuthor("Liao Xuefeng");
        a.setContent("Lightweight search project is an open source project that using Lucene but simplify full text search.");
        searcher.index(a);

        // now search:
        Page page = new Page(1);
        List<Article> articles = searcher.search(Article.class, "lightweight", page);
        System.out.println("Results: " + page.getTotalCount());
        for(Article article : articles) {
            System.out.println(article.getTitle() + " by " + article.getAuthor());
        }
    }

}

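(Note: pay attention to the comments in the code.)

    Option 2: use Guice injection (see the Guice documentation for background)

    To use Guice injection, the first step is to configure Guice with a module: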
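The HighlightPre and HighlightPost settings are a prefix and suffix placed around matched terms. The sketch below only illustrates that idea with plain string matching; HighlightSketch is a hypothetical helper, and it is an assumption that the framework itself delegates highlighting to Lucene rather than doing anything like this.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HighlightSketch {
    // Wraps every case-insensitive occurrence of term in text with pre and post.
    static String highlight(String text, String term, String pre, String post) {
        if (term == null || term.isEmpty()) {
            return text;
        }
        // Pattern.quote treats the term as a literal, not as a regex.
        Pattern p = Pattern.compile(Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the tags from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(pre + m.group() + post));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, highlight("How to use lightweight search?", "lightweight", "<b>", "</b>") returns "How to use <b>lightweight</b> search?".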

import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

import com.google.inject.Binder;
import com.google.inject.Key;
import com.google.inject.Module;
import com.google.inject.name.Names;
import com.javaeedev.lightweight.search.Searcher;
import com.javaeedev.lightweight.search.SearcherProvider;

public class MyModule implements Module {

    @SuppressWarnings("unchecked")
    public void configure(Binder binder) {
        // named constants consumed by SearcherProvider:
        binder.bindConstant().annotatedWith(Names.named("HighlightPre")).to("<span class=\"highlight\">");
        binder.bindConstant().annotatedWith(Names.named("HighlightPost")).to("</span>");
        binder.bindConstant().annotatedWith(Names.named("IndexLocation")).to("./index-location");

        // classes that can be searched:
        Set<Class> set = new HashSet<Class>();
        set.add(Article.class);
        binder.bind(new Key<Set<Class>>(Names.named("SearchableClasses")) {}).toInstance(set);

        binder.bind(Analyzer.class).to(StandardAnalyzer.class).asEagerSingleton();
        binder.bind(Searcher.class).toProvider(SearcherProvider.class).asEagerSingleton();
    }

}

Next, write the calling code:

import java.util.List;
import java.util.UUID;

import com.google.inject.Guice;
import com.google.inject.Injector;
import com.javaeedev.lightweight.common.Page;
import com.javaeedev.lightweight.search.Searcher;

public class RunWithGuice {

    public static void main(String[] args) {
        // build the injector from the module and obtain the searcher:
        Injector injector = Guice.createInjector(new MyModule());
        Searcher searcher = injector.getInstance(Searcher.class);

        // now do all search work with "searcher" object:
        // first add new Articles:
        Article a = new Article();
        a.setId(UUID.randomUUID().toString());
        a.setTitle("How to use lightweight search?");
        a.setAuthor("Liao Xuefeng");
        a.setContent("Lightweight search project is an open source project that using Lucene but simplify full text search.");
        searcher.index(a);

        // now search:
        Page page = new Page(1);
        List<Article> articles = searcher.search(Article.class, "lightweight", page);
        System.out.println("Results: " + page.getTotalCount());
        for(Article article : articles) {
            System.out.println(article.getTitle() + " by " + article.getAuthor());
        }
    }

}
