Users' search needs vary widely.
Here are several of Lucene's advanced query types that help meet them.
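When users search, they are often looking not for a single word but for several keywords. These keywords may be tightly connected, forming an exact phrase, or other unrelated words may sit between them. Either way, the user wants those documents found. From a scoring point of view, though, documents in which unrelated words separate the keywords will generally score lower.
PhraseQuery is the Query object Lucene provides for exactly this need. You add the keywords to it and then set the slop parameter, which determines whether unrelated words may appear between the keywords and, if so, how many.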
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
public class testPhraseQuery {
public static Version luceneVersion = Version.LATEST;
public static void indexSearch(){
DirectoryReader reader = null;
try{
Directory directory = FSDirectory.open(Paths.get("index3"));
reader= DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
Term t1=new Term("key2","孙悟空");
Term t2=new Term("key2","猪八戒");
//slop,term...;slop represents the maximum distance between the given terms.reference:
//http://lucene.apache.org/core/6_2_1/core/org/apache/lucene/search/PhraseQuery.html
PhraseQuery query=new PhraseQuery(5,"key2",t1.bytes(),t2.bytes());
String ss=query.toString();
System.out.println(ss);
TopDocs tds = searcher.search(query, 20);
ScoreDoc[] sds = tds.scoreDocs;
int cou=0;
for(ScoreDoc sd:sds){
cou++;
Document d = searcher.doc(sd.doc);
String output=cou+". "+d.get("category2")+"\n"+d.get("skey1")+"\n"+d.get("skey2");
System.out.println(output);
}
}catch(Exception e){
e.printStackTrace();
}finally{
try {
if (reader != null) reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException
{
indexSearch();
}
}
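PhraseQuery has four constructors (see the PhraseQuery documentation). The one used in the demo code is:
PhraseQuery(int slop, String field, BytesRef... terms)
slop is an int. It determines whether unrelated words may appear between the keywords and, if so, how many.
field is a String naming the field to search.
terms are BytesRef values holding the keywords the user wants to search for.
So the first need can be met with PhraseQuery.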
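To make slop more concrete, here is a minimal sketch in plain Java with no Lucene dependency. It assumes a simplified rule for a two-term phrase: the slop needed is the number of positions the second term has to move to sit directly after the first. The names `SlopSketch`, `minSlop`, and `matches` are illustrative only, and Lucene's own sloppy-phrase matching is more general than this.

```java
class SlopSketch {
    // Smallest slop at which the two-term phrase "first second" matches,
    // given the token positions of each term within one document.
    // Returns Integer.MAX_VALUE when either term is absent.
    static int minSlop(int[] firstPositions, int[] secondPositions) {
        int best = Integer.MAX_VALUE;
        for (int p1 : firstPositions) {
            for (int p2 : secondPositions) {
                // In an exact phrase the second term sits at p1 + 1;
                // count how many positions it is away from that slot.
                best = Math.min(best, Math.abs(p2 - (p1 + 1)));
            }
        }
        return best;
    }

    // True when the phrase matches with the given slop.
    static boolean matches(int[] firstPositions, int[] secondPositions, int slop) {
        return minSlop(firstPositions, secondPositions) <= slop;
    }

    public static void main(String[] args) {
        // First term at position 0, second at position 4: three tokens in between.
        System.out.println(minSlop(new int[]{0}, new int[]{4}));     // 3
        System.out.println(matches(new int[]{0}, new int[]{4}, 5));  // true
        // Reversed order: the second term directly before the first.
        System.out.println(minSlop(new int[]{1}, new int[]{0}));     // 2
    }
}
```

Under this model, a slop of 5 as in the demo tolerates up to five intervening tokens, while swapping two adjacent terms costs 2.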
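RangeQuery, implemented in the code below by TermRangeQuery, performs range queries on strings; all terms in the index are kept in lexicographic order. It lets users search within a range, and the start and end terms of the range can each be included or excluded.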
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermRangeQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
public class testRangeQuery {
public static Version luceneVersion = Version.LATEST;
public static void indexSearch(){
DirectoryReader reader = null;
try{
Directory directory = FSDirectory.open(Paths.get("indexrangequery"));
reader= DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
//*************Test 1*******************
// Term begin = new Term("birthday","19980101");
// Term end = new Term("birthday","20040606");
// Query query = new TermRangeQuery("birthday",begin.bytes(),end.bytes(),false,false);
//*************Test 2*******************
Term begin = new Term("lex","ab");
Term end = new Term("lex","cd");
Query query = new TermRangeQuery("lex",begin.bytes(),end.bytes(),false,false);
String ss=query.toString();
System.out.println(ss);
TopDocs tds = searcher.search(query, 20);
ScoreDoc[] sds = tds.scoreDocs;
System.out.println(sds.length);
int cou=0;
for(ScoreDoc sd:sds)
{
cou++;
Document d = searcher.doc(sd.doc);
String output=cou+". "+d.get("sname")+" "+d.get("sbirthday")+" "+d.get("sid")+" "+d.get("slex");
System.out.println(output);
}
}catch(Exception e){
e.printStackTrace();
}finally{
// close the reader
try {
if (reader != null) reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException
{
indexSearch();
}
}
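The constructor is:
TermRangeQuery(String field, BytesRef lowerTerm, BytesRef upperTerm, boolean includeLower, boolean includeUpper)
field names the field to search; lowerTerm and upperTerm are the start and end terms; the last two booleans say whether each end of the interval is closed (inclusive) or open (exclusive).
With that, the user's second need is easily handled.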
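Because the range is evaluated in lexicographic order rather than numeric order, it helps to see the comparison spelled out. The sketch below is plain Java with an illustrative `inRange` helper (not a Lucene method). It uses `String.compareTo`, which agrees with Lucene's byte-wise term order for ASCII terms such as the dates and letters in the demo; that equivalence is an assumption for other character ranges.

```java
class RangeSketch {
    // Lexicographic range check mirroring the five TermRangeQuery arguments.
    static boolean inRange(String term, String lower, String upper,
                           boolean includeLower, boolean includeUpper) {
        int lo = term.compareTo(lower);
        int hi = term.compareTo(upper);
        boolean aboveLower = includeLower ? lo >= 0 : lo > 0;
        boolean belowUpper = includeUpper ? hi <= 0 : hi < 0;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        // Fixed-width dates sort the same way as strings and as numbers.
        System.out.println(inRange("20010315", "19980101", "20040606", false, false)); // true
        // With both ends exclusive, the boundary term itself is not matched.
        System.out.println(inRange("ab", "ab", "cd", false, false)); // false
        // Pitfall: "9" sorts after "20" as a string, so it is outside ["10", "20"].
        System.out.println(inRange("9", "10", "20", true, true));    // false
    }
}
```

This is why numeric or date values are usually indexed in a fixed-width form (such as yyyyMMdd) when they are queried as terms.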
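FuzzyQuery performs fuzzy matching based on edit distance, using the Damerau-Levenshtein algorithm. The edit distance between two strings is the minimum number of operations needed to turn one into the other.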
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
public class testFuzzyQuery {
public static Version luceneVersion = Version.LATEST;
public static void indexSearch(String keywords){
DirectoryReader reader = null;
try{
Directory directory = FSDirectory.open(Paths.get("index3"));
reader= DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
FuzzyQuery query=new FuzzyQuery(new Term("key1",keywords));
String ss=query.toString();
System.out.println(ss);
TopDocs tds = searcher.search(query, 20);
ScoreDoc[] sds = tds.scoreDocs;
int cou=0;
for(ScoreDoc sd:sds){
cou++;
Document d = searcher.doc(sd.doc);
String output=cou+". "+d.get("category2")+"\n"+d.get("skey1");
System.out.println(output);
}
}catch(Exception e){
e.printStackTrace();
}finally{
try {
if (reader != null) reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException
{
String keywords[]={"流星","眼睛","小学生"};
for(int i=0;i<keywords.length;i++)
{
indexSearch(keywords[i]);
}
}
}
The class has four constructors; see the FuzzyQuery documentation. The blog post "Lucene5学习之FuzzyQuery使用" explains the constructors well.
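To illustrate the edit distance that FuzzyQuery is based on, here is a small dynamic-programming sketch of the restricted Damerau-Levenshtein distance (insert, delete, substitute, and swap of two adjacent characters). It is only an illustration: the class name `EditDistanceSketch` is made up, it works on Java `char` values, and Lucene itself does not compute distances this way at query time.

```java
class EditDistanceSketch {
    // Restricted Damerau-Levenshtein (optimal string alignment) distance.
    static int distance(String a, String b) {
        int n = a.length();
        int m = b.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete all of a's prefix
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert all of b's prefix
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // substitution
                // Swap of two adjacent characters counts as one edit.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "lucne")); // 1 (one deletion)
        System.out.println(distance("ab", "ba"));        // 1 (one swap)
        System.out.println(distance("流星", "流星雨"));    // 1 (one insertion)
    }
}
```

FuzzyQuery(Term) accepts terms within a small maximum distance of the query term (2 by default in Lucene 6.x), so terms like the ones printed above would qualify.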
WildcardQuery is a wildcard query: the wildcard "?" stands for exactly one character, while "*" stands for zero or more characters. Usage is simple:
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.WildcardQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
public class testWildCardQuery {
public static Version luceneVersion = Version.LATEST;
public static void indexSearch(String keywords){
DirectoryReader reader = null;
try{
Directory directory = FSDirectory.open(Paths.get("index3"));
reader= DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
WildcardQuery query=new WildcardQuery(new Term("key1",keywords));
String ss=query.toString();
System.out.println(ss);
TopDocs tds = searcher.search(query, 20);
ScoreDoc[] sds = tds.scoreDocs;
int cou=0;
for(ScoreDoc sd:sds){
cou++;
Document d = searcher.doc(sd.doc);
String output=cou+". "+d.get("category2")+"\n"+d.get("skey1");
System.out.println(output);
}
}catch(Exception e){
e.printStackTrace();
}finally{
try {
if (reader != null) reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException
{
String keywords[]={"流?雨","星*","小学*"};
for(int i=0;i<keywords.length;i++)
{
indexSearch(keywords[i]);
}
}
}
For details, see the WildcardQuery documentation.
Because WildcardQuery and FuzzyQuery have to match strings against the terms of the field, search performance suffers somewhat.
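The wildcard rules can be sketched as a small matcher in plain Java. This is an illustration of the "?" and "*" semantics only, with a made-up class name `WildcardSketch`; it ignores escape characters and is not how Lucene evaluates a WildcardQuery against the term dictionary.

```java
class WildcardSketch {
    // '?' matches exactly one character, '*' matches zero or more characters.
    static boolean matches(String pattern, String text) {
        int p = 0, t = 0;
        int star = -1; // index of the most recent '*' in the pattern
        int mark = 0;  // text position where that '*' started matching
        while (t < text.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;   // remember the star, first try matching nothing
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == text.charAt(t))) {
                p++;
                t++;
            } else if (star != -1) {
                p = star + 1; // backtrack: let the star absorb one more character
                t = ++mark;
            } else {
                return false;
            }
        }
        // Only trailing stars may remain in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '*') p++;
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("流?雨", "流星雨")); // true
        System.out.println(matches("流?雨", "流雨"));   // false
        System.out.println(matches("小学*", "小学"));   // true
    }
}
```

Each candidate term has to be tested like this, which gives a feel for why patterns with a leading wildcard are particularly costly.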
PrefixQuery matches documents containing terms that begin with a specified string. Usage is simple:
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
public class testPrefixQuery {
public static Version luceneVersion = Version.LATEST;
public static void indexSearch(){
DirectoryReader reader = null;
try{
Directory directory = FSDirectory.open(Paths.get("index3"));
reader= DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
PrefixQuery query=new PrefixQuery(new Term("key1","中"));
String ss=query.toString();
System.out.println(ss);
TopDocs tds = searcher.search(query, 20);
ScoreDoc[] sds = tds.scoreDocs;
System.out.println(sds.length);
int cou=0;
for(ScoreDoc sd:sds){
cou++;
Document d = searcher.doc(sd.doc);
String output=cou+". "+d.get("category2")+"\n"+d.get("skey1")+"\n"+d.get("skey2");
System.out.println(output);
}
}catch(Exception e){
e.printStackTrace();
}finally{
try {
if (reader != null) reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException
{
indexSearch();
}
}
For details, see the official PrefixQuery documentation.
FuzzyQuery, WildcardQuery, and PrefixQuery above are all inexact queries and can address the user's third need.
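Since the terms of an index are kept in lexicographic order, all terms sharing a prefix sit next to each other. The sketch below shows that idea with a `TreeSet` standing in for the sorted term dictionary; the class and method names are illustrative, and this is not Lucene's actual data structure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class PrefixSketch {
    // Collects every term that starts with the prefix from a sorted term set.
    static List<String> withPrefix(NavigableSet<String> terms, String prefix) {
        List<String> out = new ArrayList<>();
        // Jump to the first term >= prefix, then scan while the prefix still matches.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms =
                new TreeSet<>(Arrays.asList("大学", "中国", "中文"));
        System.out.println(withPrefix(terms, "中")); // [中国, 中文]
    }
}
```

The scan stops at the first non-matching term, so the cost depends on how many terms share the prefix rather than on the size of the whole dictionary.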
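BooleanQuery is a boolean query: it implements logical operations by combining other queries (such as the TermQuery covered in the previous section, PhraseQuery, or other BooleanQuery instances).
The logical operator of each clause is determined by BooleanClause.Occur (see the documentation).
Lucene 6.2.1 no longer provides BooleanFilter; filtering is expressed with BooleanClause.Occur.FILTER instead. So what is the difference between a Filter and a Query?
The various Query and Filter types are in fact very similar and can be converted into one another. The biggest difference is that a Query scores: its result set carries relevance scores. A Filter's result set has no relevance scores, and its results are returned unranked.
The four Occur values (MUST, SHOULD, MUST_NOT, and FILTER) can be combined in endlessly useful ways: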
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.highlight.Fragmenter;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.SimpleSpanFragmenter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import IkAnalyzer.MyIkAnalyzer;
public class testBooleanQuery {
public static Version luceneVersion = Version.LATEST;
public static void indexSearch(String keywords){
DirectoryReader reader = null;
try{
Directory directory = FSDirectory.open(Paths.get("index3"));
reader = DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser parserkey1 = new QueryParser("key1",new MyIkAnalyzer());// the first argument is the field to search
Query querykey1 = parserkey1.parse(keywords);
String ss=querykey1.toString();
System.out.println(ss);
QueryParser parserkey2 = new QueryParser("key2",new MyIkAnalyzer());// the first argument is the field to search
Query querykey2 = parserkey2.parse(keywords);// the text being searched for
System.out.println(keywords);
BooleanQuery query=null;
String cate1="知识";
String cate2="百科";
Query querycate1=new TermQuery(new Term("category1",cate1));
Query querycate2=new TermQuery(new Term("category2",cate2));
query=new BooleanQuery.Builder().add(querycate1,BooleanClause.Occur.FILTER).add(querycate2,BooleanClause.Occur.FILTER).add(querykey1,BooleanClause.Occur.SHOULD).add(querykey2,BooleanClause.Occur.SHOULD).build();
TopDocs tds = searcher.search(query, 20);
ScoreDoc[] sds = tds.scoreDocs;
int cou=0;
for(ScoreDoc sd:sds){
cou++;
Document d = searcher.doc(sd.doc);
String output=cou+". "+d.get("skey1")+"\n"+d.get("skey2");
System.out.println(output);
}
}catch(Exception e){
e.printStackTrace();
}finally{
// close the reader
try {
if (reader != null) reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException
{
String keywords="为什么眼睛会流泪?";
indexSearch(keywords);
}
}
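For Xiao Ming's need, we can design a BooleanQuery as follows. Query one is a TermQuery on the "分类" (category) field with the search term "游戏" (games); query two is some Query that searches for the keywords. Combine the two with BooleanQuery, using FILTER for query one and MUST for query two, and the search returns only matching content in the "游戏" category.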
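How the four Occur values interact can be modelled on plain sets of document ids. The following sketch is a simplified model under stated assumptions, not Lucene's implementation: the `Clause` class and `hits` helper are hypothetical, each clause is given as a map from doc id to score, and the final score is modelled as the plain sum of the matching MUST and SHOULD clauses.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class BooleanSketch {
    enum Occur { MUST, SHOULD, MUST_NOT, FILTER }

    // One clause: its operator plus the documents it matches (doc id -> score).
    static class Clause {
        final Occur occur;
        final Map<Integer, Double> hits;
        Clause(Occur occur, Map<Integer, Double> hits) {
            this.occur = occur;
            this.hits = hits;
        }
    }

    // Helper: every listed doc id matches with the same score.
    static Map<Integer, Double> hits(double score, int... docs) {
        Map<Integer, Double> m = new HashMap<>();
        for (int doc : docs) m.put(doc, score);
        return m;
    }

    // Returns doc id -> combined score for the documents the clauses accept.
    static Map<Integer, Double> evaluate(List<Clause> clauses) {
        Set<Integer> candidates = null;          // intersection of MUST/FILTER clauses
        Set<Integer> optional = new HashSet<>(); // union of SHOULD clauses
        Set<Integer> excluded = new HashSet<>(); // union of MUST_NOT clauses
        for (Clause c : clauses) {
            switch (c.occur) {
                case MUST:
                case FILTER:
                    if (candidates == null) candidates = new HashSet<>(c.hits.keySet());
                    else candidates.retainAll(c.hits.keySet());
                    break;
                case SHOULD:
                    optional.addAll(c.hits.keySet());
                    break;
                case MUST_NOT:
                    excluded.addAll(c.hits.keySet());
                    break;
            }
        }
        // Without any required clause, at least one SHOULD clause has to match.
        if (candidates == null) candidates = optional;
        candidates.removeAll(excluded);

        Map<Integer, Double> result = new TreeMap<>();
        for (int doc : candidates) {
            double score = 0.0;
            for (Clause c : clauses) {
                // FILTER and MUST_NOT restrict the result but never add to the score.
                boolean scoring = c.occur == Occur.MUST || c.occur == Occur.SHOULD;
                if (scoring && c.hits.containsKey(doc)) score += c.hits.get(doc);
            }
            result.put(doc, score);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two category filters plus two optional keyword clauses.
        Map<Integer, Double> r = evaluate(Arrays.asList(
                new Clause(Occur.FILTER, hits(1.0, 1, 2, 3)),
                new Clause(Occur.FILTER, hits(1.0, 2, 3, 4)),
                new Clause(Occur.SHOULD, hits(1.5, 2)),
                new Clause(Occur.SHOULD, hits(0.5, 2, 5))));
        System.out.println(r); // {2=2.0, 3=0.0}
    }
}
```

In this model a document that passes the filters but matches no SHOULD clause is still returned with a score of 0, which is one reason to use MUST for the keyword clause when a keyword match is required.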
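MultiFieldQuery means multi-field search. Suppose the user has this need: a document contains fields such as "title" and "body", and a keyword search should match whether the keyword appears in the title or in the body. This is where multi-field search comes in.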
import java.nio.file.Paths;
import java.io.*;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import IkAnalyzer.MyIkAnalyzer;
public class testMultiFieldQuery {
public static Version luceneVersion = Version.LATEST;
public static void indexSearch(){
DirectoryReader reader = null;
try{
Directory directory = FSDirectory.open(Paths.get("index3"));// open the Directory on disk
reader = DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer analyzer1=new StandardAnalyzer();
Analyzer analyzer2=new SmartChineseAnalyzer();
Analyzer analyzer3=new MyIkAnalyzer();
// //Option 1: use BooleanQuery to combine two TermQuery objects logically
// Term t1=new Term("key1","张飞");
// Term t2=new Term("key2","刘备");
// TermQuery q1=new TermQuery(t1);
// TermQuery q2=new TermQuery(t2);
// BooleanQuery query=new BooleanQuery.Builder().add(q1,BooleanClause.Occur.FILTER).add(q2,BooleanClause.Occur.MUST).build();
//
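//Option 2: the MultiFieldQueryParser class implements multi-field search; it is just a convenience wrapper and still uses BooleanQuery internally
String fields[]={"key1","key2"};
String kws[]={"张飞","刘备"};
//MUST: AND; SHOULD: OR; MUST_NOT: NOT; FILTER: like MUST, but does not contribute to scoring.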
BooleanClause.Occur[] flags=new BooleanClause.Occur[]{BooleanClause.Occur.FILTER,BooleanClause.Occur.MUST};
Query query=MultiFieldQueryParser.parse(kws,fields,flags,analyzer3);
String ss=query.toString();
System.out.println(ss);
TopDocs tds = searcher.search(query, 20);
ScoreDoc[] sds = tds.scoreDocs;
int cou=0;
for(ScoreDoc sd:sds){
cou++;
Document d = searcher.doc(sd.doc);
String output=cou+". "+d.get("skey1")+"\n"+d.get("skey2");
System.out.println(output);
}
}catch(Exception e){
e.printStackTrace();
}finally{
try {
if (reader != null) reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException
{
indexSearch(); // the search terms inside indexSearch() can be changed
}
}
Multi-field search can be implemented in two ways.
Option 1 uses BooleanQuery to apply logical operations across several TermQuery objects:
//Option 1: use BooleanQuery to combine two TermQuery objects logically
Term t1=new Term("key1","张飞");
Term t2=new Term("key2","刘备");
TermQuery q1=new TermQuery(t1);
TermQuery q2=new TermQuery(t2);
BooleanQuery query=new BooleanQuery.Builder().add(q1,BooleanClause.Occur.FILTER).add(q2,BooleanClause.Occur.MUST).build();
Option 2 uses the MultiFieldQueryParser class. It is really just a wrapper: simpler to use, but still implemented with BooleanQuery internally.
//Option 2: the MultiFieldQueryParser class implements multi-field search; it is just a convenience wrapper and still uses BooleanQuery internally
String fields[]={"key1","key2"};
String kws[]={"张飞","刘备"};
//MUST: AND; SHOULD: OR; MUST_NOT: NOT; FILTER: like MUST, but does not contribute to scoring.
BooleanClause.Occur[] flags=new BooleanClause.Occur[]{BooleanClause.Occur.FILTER,BooleanClause.Occur.MUST};
Query query=MultiFieldQueryParser.parse(kws,fields,flags,analyzer3);
Print the query:
String ss=query.toString();
System.out.println(ss);
The code above prints the query object, which gives a direct feel for how Lucene parses the query.
As the output shows, multi-field search is essentially an application of BooleanQuery; see the official documentation for details.
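FieldQuery means fielded search: by entering a query string that follows the query syntax, users can specify which fields a query runs against. For example, if the indexed documents contain two fields, Title and Content, the query "Title: Lucene AND Content: Java" returns all documents that contain Lucene in the Title field and Java in the Content field.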
import java.io.IOException;
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import IkAnalyzer.MyIkAnalyzer;
public class testFieldQuery {
public static Version luceneVersion = Version.LATEST;
public static void indexSearch(String keywords){
DirectoryReader reader = null;
try{
Directory directory = FSDirectory.open(Paths.get("index3"));
reader= DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
QueryParser parser = new QueryParser("key1",new MyIkAnalyzer());// the first argument is the default field to search
org.apache.lucene.search.Query query = parser.parse(keywords);// the text being searched for
String ss=query.toString();
System.out.println(ss);
TopDocs tds = searcher.search(query, 20);
ScoreDoc[] sds = tds.scoreDocs;
int cou=0;
for(ScoreDoc sd:sds){
cou++;
Document d = searcher.doc(sd.doc);
String output=cou+". "+d.get("category2")+"\n"+d.get("skey1")+"\n"+d.get("skey2");
System.out.println(output);
}
}catch(Exception e){
e.printStackTrace();
}finally{
try {
if (reader != null) reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException
{
String keywords="key1:猪八戒 AND key2:孙悟空 ";
indexSearch(keywords);
}
}
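Print the query:
String ss=query.toString();
System.out.println(ss);
Printing the query object shows that Lucene turns the user's query string into the same structure it produces for a BooleanQuery.
MultiSearcher means multi-index search. It can be understood like this:
To keep any single index directory small, an index is often split across many directories that all share the same structure. Take the site search engine for a city website: as time goes on, we might split the index directories by year into 2003, 2004, 2005, and so on. Older index directories are searched less often, so they are separated out; this keeps the newest index directory small and speeds up searches. Sometimes, however, several indexes have to be searched at once because we need the information stored in all of them. To do that, it is enough to search each index directory with an IndexSearcher and then merge the results.
In fact, Lucene 6.2.1 no longer provides the MultiSearcher class; MultiReader is used instead.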
import java.nio.file.Paths;
import java.io.*;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import IkAnalyzer.MyIkAnalyzer;
public class testMultiSearcher {
public static Version luceneVersion = Version.LATEST;
public static void indexSearch(String keywords){
String res = "";
DirectoryReader reader1 = null;
DirectoryReader reader2 = null;
MultiReader mr=null;
try{
Directory directory1 = FSDirectory.open(Paths.get("index1"));
Directory directory2 = FSDirectory.open(Paths.get("index2"));
reader1 = DirectoryReader.open(directory1);
reader2 = DirectoryReader.open(directory2);
mr=new MultiReader(reader1,reader2);
IndexSearcher searcher1 = new IndexSearcher(mr);// searches both indexes together
IndexSearcher searcher2=new IndexSearcher(reader1);
IndexSearcher searcher3=new IndexSearcher(reader2);
QueryParser parser = new QueryParser("key2",new MyIkAnalyzer());// the first argument is the field to search
Query query = parser.parse(keywords);
String ss=query.toString();
System.out.println(ss);
TopDocs tds = searcher1.search(query, 20);
ScoreDoc[] sds = tds.scoreDocs;
int cou=0;
System.out.println("Results from the multi-index search:");
for(ScoreDoc sd:sds){
cou++;
Document d = searcher1.doc(sd.doc);
String output=cou+". "+d.get("category2")+"\n"+d.get("skey1");
System.out.println(output);
}
//*************************************************
System.out.println("Results from the encyclopedia index only:");
tds = searcher2.search(query, 10);
sds = tds.scoreDocs;
cou=0;
for(ScoreDoc sd:sds){
cou++;
Document d = searcher2.doc(sd.doc);
String output=cou+". "+d.get("category2")+"\n"+d.get("skey1");
System.out.println(output);
}
//********************************************
System.out.println("Results from the textbook index only:");
tds = searcher3.search(query, 10);
sds = tds.scoreDocs;
cou=0;
for(ScoreDoc sd:sds){
cou++;
Document d = searcher3.doc(sd.doc);
String output=cou+". "+d.get("category2")+"\n"+d.get("skey1");
System.out.println(output);
}
}catch(Exception e){
e.printStackTrace();
}finally{
try {
if (mr != null) mr.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException
{
String keyword="眼睛";
indexSearch(keyword);
}
}
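In this way we can build separate indexes for "游戏" (games), "课本" (textbooks), and so on. When the user wants to search games, we point the search at the games index directory; when the user wants to search across all categories, we use multi-index search to cover every category.
We might worry whether, with documents spread over several index directories, a search can still produce globally computed relevance scores. In fact this approach does support global scoring: although the index is distributed over several directories, at search time all the index data is brought together for matching and scoring.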
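For comparison, the manual approach of searching each index separately and merging the results can be sketched as follows. This is a hypothetical model in plain Java: the `Hit` class and `merge` method are illustrative names, and Lucene has its own utility for this step (`TopDocs.merge`). Scores produced by separate searchers are computed from each index's own statistics, so they are only roughly comparable, which is the gap that searching one MultiReader closes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MergeSketch {
    // One search hit: which index it came from, its doc id there, and its score.
    static class Hit {
        final int index;
        final int doc;
        final double score;
        Hit(int index, int doc, double score) {
            this.index = index;
            this.doc = doc;
            this.score = score;
        }
        @Override
        public String toString() {
            return "index" + index + "/doc" + doc + "=" + score;
        }
    }

    // Merges per-index hit lists into one list of the topN highest scores.
    static List<Hit> merge(int topN, List<List<Hit>> perIndexHits) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perIndexHits) {
            all.addAll(hits);
        }
        // Highest score first; ties broken by index, then doc id, for a stable order.
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
                .thenComparingInt(h -> h.index)
                .thenComparingInt(h -> h.doc));
        return new ArrayList<>(all.subList(0, Math.min(topN, all.size())));
    }

    public static void main(String[] args) {
        List<Hit> index1 = Arrays.asList(new Hit(1, 7, 2.5), new Hit(1, 3, 1.0));
        List<Hit> index2 = Arrays.asList(new Hit(2, 4, 3.0), new Hit(2, 9, 0.5));
        System.out.println(merge(3, Arrays.asList(index1, index2)));
        // [index2/doc4=3.0, index1/doc7=2.5, index1/doc3=1.0]
    }
}
```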
Multi-index search can also be combined with multithreaded searching; this remains to be explored.
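QueryParser is not a kind of Query, but by setting its parameters you can achieve multi-field search as well as some of the effects of BooleanQuery.
When QueryParser parses several keywords, for example when the user searches for "love China", printing the query object shows that the two keywords are combined with OR.
With either of the following, the two keywords are combined with AND instead: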
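//Option 1:
parser.setDefaultOperator(QueryParser.Operator.AND);
//Option 2:
Query query = parser.createBooleanQuery("key1", keywords, Occur.MUST);
Option 1:
parser.setDefaultOperator(QueryParser.Operator.AND);
The argument QueryParser.Operator.AND means AND; QueryParser.Operator.OR means OR. Set it before calling parser.parse(...).
Option 2:
Query query = parser.createBooleanQuery("key1", keywords, Occur.MUST);
The first argument is the field and the second is the search text. The third accepts only Occur.MUST or Occur.SHOULD: MUST joins the analyzed terms with AND, while SHOULD joins them with OR. The method returns the resulting Query and does not change the parser itself.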
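The effect of the default operator and of the field:term syntax can be illustrated with a toy renderer that prints clauses in the "+field:term" style of a printed query, where a leading "+" marks a required clause. This is a deliberately small sketch with made-up names (`QuerySyntaxSketch`, `render`): it does no analysis, supports neither phrases nor grouping, and does not reproduce how the real parser resolves mixed AND/OR operators.

```java
import java.util.ArrayList;
import java.util.List;

class QuerySyntaxSketch {
    // Renders a whitespace-separated query; defaultAnd plays the role of
    // setDefaultOperator(QueryParser.Operator.AND).
    static String render(String query, String defaultField, boolean defaultAnd) {
        if (query.trim().isEmpty()) {
            return "";
        }
        List<String> terms = new ArrayList<>();
        List<Boolean> required = new ArrayList<>();
        Boolean pending = null; // explicit AND/OR seen since the previous term
        for (String token : query.trim().split("\\s+")) {
            if (token.equals("AND") || token.equals("OR")) {
                pending = token.equals("AND");
                continue;
            }
            if (terms.isEmpty()) {
                required.add(false);
            } else {
                boolean and = pending != null ? pending : defaultAnd;
                if (and) {
                    required.set(required.size() - 1, true); // left neighbour is required too
                }
                required.add(and);
            }
            pending = null;
            // A token without "field:" falls back to the default field.
            terms.add(token.contains(":") ? token : defaultField + ":" + token);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) sb.append(' ');
            if (required.get(i)) sb.append('+');
            sb.append(terms.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("love China", "key1", false)); // key1:love key1:China
        System.out.println(render("love China", "key1", true));  // +key1:love +key1:China
        System.out.println(render("key1:猪八戒 AND key2:孙悟空", "key1", false));
        // +key1:猪八戒 +key2:孙悟空
    }
}
```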
https://blog.xqlee.com/article/2504291640282818.html