Apache Lucene Field / IndexableField: Types and Usage in Detail

Programming tutorial (Java), 2025-04-30 14:37:41

Introduction to Lucene Field

In Apache Lucene, the Field class is the basic building block for storing data in a document. Different Field types handle different kinds of data, such as text, numbers, and binary data.

[Figure: IndexableField]

[Figure: Field]

TextField

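TextField overview

Path: org.apache.lucene.document.TextField
Purpose: indexes text after tokenizing (analyzing) it. With Field.Store.YES, the original value is also stored so it can be returned with search results.
Underlying structure: an Analyzer splits the text into terms. Each term is recorded in the inverted index, which maps the term to the documents that contain it.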
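The inverted-index idea can be sketched in plain Java without Lucene. This is a toy model, not Lucene's on-disk format. The `analyze` method is a hypothetical stand-in for an Analyzer: it only lowercases and splits on non-alphanumeric characters, which is roughly what StandardAnalyzer does for simple English text. The `search` method looks up a single term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: term -> sorted set of doc ids (not Lucene's real format). */
final class ToyInvertedIndex {
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();
    private int nextDocId = 0;

    /** Hypothetical analyzer stand-in: lowercase, then split on anything that is not a letter or digit. */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Adds a document and returns its doc id (0, 1, ... in insertion order). */
    int addDocument(String text) {
        int docId = nextDocId++;
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    /** Single-term lookup: analyzes the word the same way, then reads its posting list. */
    List<Integer> search(String word) {
        List<String> terms = analyze(word);
        if (terms.isEmpty()) {
            return List.of();
        }
        return new ArrayList<>(postings.getOrDefault(terms.get(0), new TreeSet<>()));
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("This is a sample text.");
        index.addDocument("Sample text.");
        System.out.println(analyze("This is a sample text.")); // [this, is, a, sample, text]
        System.out.println(index.search("sample"));            // [0, 1]
        System.out.println(index.search("this"));              // [0]
    }
}
```

After lowercasing, both documents produce the term `sample`, so a lookup for `sample` returns both of them. The same effect lets a TextField query for "sample" match `Sample text.` as well.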

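TextField example

import cn.hutool.json.JSONObject; // Hutool JSON, used only to print the hits
import cn.hutool.json.JSONUtil;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class TextFieldDemo {
    static String path = "src/resources/index/app2";

    public static void main(String[] args) throws IOException, ParseException {
        // Two documents whose "fieldName" text is analyzed (tokenized) and also stored
        Document doc = new Document();
        doc.add(new TextField("fieldName", "This is a sample text.", Field.Store.YES));
        Document doc1 = new Document();
        doc1.add(new TextField("fieldName", "Sample text.", Field.Store.YES));
        // Index directory
        try (FSDirectory directory = FSDirectory.open(Paths.get(path));
             // Analyzer
             Analyzer analyzer = new StandardAnalyzer();
             IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
            // Index the documents
            indexWriter.addDocument(doc);
            indexWriter.addDocument(doc1);
            // Commit so the index is written to disk and visible to a new reader
            indexWriter.commit();

            // Search; the query text is analyzed with the same analyzer
            QueryParser queryParser = new QueryParser("fieldName", analyzer);
            Query query = queryParser.parse("sample");

            try (IndexReader indexReader = DirectoryReader.open(directory)) {
                IndexSearcher indexSearcher = new IndexSearcher(indexReader);
                TopDocs topDocs_10 = indexSearcher.search(query, 10);
                ScoreDoc[] scoreDocArray = topDocs_10.scoreDocs;

                List<JSONObject> list = new ArrayList<>();
                for (ScoreDoc scoreDoc : scoreDocArray) {
                    JSONObject jsonDoc = new JSONObject();
                    int docId = scoreDoc.doc;
                    // Deprecated in newer 9.x releases in favor of indexSearcher.storedFields().document(docId)
                    Document document = indexSearcher.doc(docId);
                    jsonDoc.set("fieldName", document.get("fieldName"));
                    jsonDoc.set("score", scoreDoc.score);
                    jsonDoc.set("shardIndex", scoreDoc.shardIndex);
                    list.add(jsonDoc);
                }
                System.out.println(JSONUtil.toJsonStr(list));
            }

            // Cleanup: delete all documents; applied when the writer closes (commit on close is the default)
            indexWriter.deleteAll();
        }
    }
}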

Results

Query: sample

[Figure: results for the query "sample"]

Query: a sample text

[Figure: results for the query "a sample text"]

StringField

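StringField overview

Path: org.apache.lucene.document.StringField
Purpose: indexes string values that must not be tokenized, such as unique identifiers (IDs) or cover image paths.
Underlying structure: the whole string is indexed as a single term in the inverted index. No tokenization is applied.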
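For contrast, a StringField value is indexed verbatim as one term. The hypothetical plain-Java class below models this behavior: a lookup succeeds only on an exact, case-sensitive match. In real Lucene, a StringField is typically queried with `new TermQuery(new Term("cover_image", "abc.jpg"))`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Toy keyword index: the whole value is one term, with no lowercasing or splitting. */
final class ToyKeywordIndex {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    int addDocument(String value) {
        int docId = nextDocId++;
        // Untokenized: the entire string is indexed verbatim as a single term
        postings.computeIfAbsent(value, v -> new TreeSet<>()).add(docId);
        return docId;
    }

    /** Exact, case-sensitive lookup, similar in spirit to a TermQuery on a StringField. */
    List<Integer> exactMatch(String value) {
        return new ArrayList<>(postings.getOrDefault(value, new TreeSet<>()));
    }

    public static void main(String[] args) {
        ToyKeywordIndex index = new ToyKeywordIndex();
        index.addDocument("abc.jpg");
        index.addDocument("unique_identifier");
        System.out.println(index.exactMatch("abc.jpg")); // [0]
        System.out.println(index.exactMatch("abc"));     // []
        System.out.println(index.exactMatch("ABC.jpg")); // []
    }
}
```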

StringField example

import org.apache.lucene.document.Document;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.Field.Store;
 
Document doc = new Document();
doc.add(new StringField("fieldName", "unique_identifier", Store.YES));
doc.add(new StringField("cover_image", "abc.jpg", Store.YES));

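Numeric types

Numeric types include: IntField, LongField, FloatField, DoubleField

Numeric types overview

Path:
- org.apache.lucene.document.IntField
- org.apache.lucene.document.LongField
- org.apache.lucene.document.FloatField
- org.apache.lucene.document.DoubleField
Purpose: indexes numeric data and supports range queries. Typical uses are auto-increment int/long identifiers (IDs) and date timestamps.
Underlying structure: numeric values are encoded as byte arrays and stored in blocks, which makes range queries efficient.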
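Byte arrays can only support range queries if the encoding makes byte order agree with numeric order. The sketch below models the approach that, as I understand it, Lucene's `NumericUtils.intToSortableBytes` takes for ints: flip the sign bit, then write the bytes big-endian. Treat it as an illustration, not library code. The class name is hypothetical.

```java
import java.util.Arrays;

/** Illustration: encode ints so that unsigned byte order equals numeric order. */
final class SortableIntBytes {
    /** Flip the sign bit, then write the 4 bytes big-endian. */
    static byte[] encode(int value) {
        int flipped = value ^ 0x80000000;
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    /** Reassemble the big-endian bytes and flip the sign bit back. */
    static int decode(byte[] bytes) {
        int flipped = ((bytes[0] & 0xFF) << 24) | ((bytes[1] & 0xFF) << 16)
                | ((bytes[2] & 0xFF) << 8) | (bytes[3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    /** Unsigned lexicographic comparison, the way byte-oriented index structures compare keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        int[] values = {-30, -1, 0, 6, 8, 18};
        for (int i = 0; i + 1 < values.length; i++) {
            // Each encoded value should sort strictly before the encoding of the next larger number
            System.out.println(values[i] + " < " + values[i + 1] + ": "
                    + (compareUnsigned(encode(values[i]), encode(values[i + 1])) < 0));
        }
        System.out.println(decode(encode(-30))); // -30
    }
}
```

Without the sign-bit flip, -1 would encode as FF FF FF FF and sort after 0 (00 00 00 00), which would break range queries over negative numbers.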

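Numeric type example

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

public class LongFieldDemo {
    static String path = "src/resources/index/app1";

    public static void main(String[] args) throws IOException {
        // Index directory
        try (FSDirectory directory = FSDirectory.open(Paths.get(path));
             // Analyzer
             Analyzer analyzer = new StandardAnalyzer();
             IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
            // Index the documents
            addDocument(indexWriter, "Cb Fc Zhang3", 8);
            addDocument(indexWriter, "Cb Fb Lie", 18);
            addDocument(indexWriter, "Cb Fb Wang", 6);
            // Commit so the index is written to disk
            indexWriter.commit();

            // Range query on the numeric field: 8 <= age <= 30
            Query query = LongField.newRangeQuery("age", 8, 30);

            try (IndexReader indexReader = DirectoryReader.open(directory)) {
                IndexSearcher indexSearcher = new IndexSearcher(indexReader);
                TopDocs topDocs_10 = indexSearcher.search(query, 10);

                for (ScoreDoc scoreDoc : topDocs_10.scoreDocs) {
                    Document document = indexSearcher.doc(scoreDoc.doc);
                    System.out.printf("score: %s , name: %s ,age: %s%n", scoreDoc.score, document.get("name"), document.get("age"));
                }
            }

            // Cleanup: delete all documents; applied when the writer closes (commit on close is the default)
            indexWriter.deleteAll();
        }
    }

    public static void addDocument(IndexWriter indexWriter, String name, Integer age) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        // LongField indexes the value for range queries; Field.Store.YES also stores it for retrieval
        doc.add(new LongField("age", age, Field.Store.YES));
        indexWriter.addDocument(doc);
    }
}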

Results

[Figure: results of the LongField range query]

Combining with sorting

[Figure: combining the query with sorting (1)]

[Figure: combining the query with sorting (2)]

Special note: if the same field needs both range queries and sorting, index it as two field types in the document. Use a numeric or point type for the range query, and SortedDocValuesField or NumericDocValuesField for sorting.
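The reason for two field types can be modeled in plain Java. A range query wants values kept in value order, while sorting wants to fetch each matching document's value by doc id. In the toy below, a `TreeMap` stands in for the point index (Lucene actually uses a BKD tree), and a per-document list stands in for doc values. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Toy model: one logical field kept in two structures (hypothetical names throughout). */
final class RangeAndSortModel {
    // Value-ordered index, standing in for points: value -> doc ids with that value
    private final TreeMap<Long, List<Integer>> byValue = new TreeMap<>();
    // Per-document column, standing in for doc values: list position = doc id
    private final List<Long> column = new ArrayList<>();

    /** Adds one document's value to both structures and returns its doc id. */
    int add(long value) {
        int docId = column.size();
        column.add(value);
        byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(docId);
        return docId;
    }

    /** Inclusive range [lower, upper]: only the matching slice of the ordered map is visited. */
    List<Integer> range(long lower, long upper) {
        List<Integer> hits = new ArrayList<>();
        for (List<Integer> docIds : byValue.subMap(lower, true, upper, true).values()) {
            hits.addAll(docIds);
        }
        hits.sort(Comparator.naturalOrder());
        return hits;
    }

    /** Sorting: each hit's value is looked up in the column by doc id. */
    List<Integer> sortByValue(List<Integer> hits, boolean reverse) {
        Comparator<Integer> byColumn = Comparator.comparingLong(docId -> column.get(docId));
        List<Integer> sorted = new ArrayList<>(hits);
        sorted.sort(reverse ? byColumn.reversed() : byColumn);
        return sorted;
    }

    public static void main(String[] args) {
        RangeAndSortModel age = new RangeAndSortModel();
        age.add(8);  // doc 0
        age.add(18); // doc 1
        age.add(6);  // doc 2
        List<Integer> hits = age.range(8, 30);
        System.out.println(hits);                        // [0, 1]
        System.out.println(age.sortByValue(hits, true)); // [1, 0]
    }
}
```

Neither structure is good at the other's job. Sorting with only the value-ordered map would mean searching it for every hit's doc id. A range query with only the column would mean scanning every document.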

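Point types

Point types include: IntPoint, LongPoint, FloatPoint, DoublePoint, BigIntegerPoint. BigIntegerPoint ships in the lucene-sandbox module rather than lucene-core, which is why it is not in the path list below.

Point field overview

Path:
- org.apache.lucene.document.IntPoint
- org.apache.lucene.document.LongPoint
- org.apache.lucene.document.FloatPoint
- org.apache.lucene.document.DoublePoint
Purpose: indexes numeric data for range queries. A point only builds the search index and does not store the original value. To store the value, pair the point with a StoredField.
Underlying structure: numeric values are encoded as byte arrays and stored in blocks, which makes range queries efficient.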

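Point type example

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

public class IntPointDemo {
    static String path = "src/resources/index/app1";

    public static void main(String[] args) throws IOException {
        // Index directory
        try (FSDirectory directory = FSDirectory.open(Paths.get(path));
             // Analyzer
             Analyzer analyzer = new StandardAnalyzer();
             IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
            // Index the documents
            addDocument(indexWriter, "Cb Fc Zhang3", 8);
            addDocument(indexWriter, "Cb Fb Lie", 18);
            addDocument(indexWriter, "Cb Fb Wang", 6);
            // Commit so the index is written to disk
            indexWriter.commit();

            // Point range query: 8 <= age <= 30
            Query query = IntPoint.newRangeQuery("age", 8, 30);

            try (IndexReader indexReader = DirectoryReader.open(directory)) {
                IndexSearcher indexSearcher = new IndexSearcher(indexReader);
                TopDocs topDocs_10 = indexSearcher.search(query, 10);

                for (ScoreDoc scoreDoc : topDocs_10.scoreDocs) {
                    Document document = indexSearcher.doc(scoreDoc.doc);
                    System.out.printf("score: %s , name: %s ,age: %s%n", scoreDoc.score, document.get("name"), document.get("age"));
                }
            }

            // Cleanup: delete all documents; applied when the writer closes (commit on close is the default)
            indexWriter.deleteAll();
        }
    }

    public static void addDocument(IndexWriter indexWriter, String name, Integer age) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        // IntPoint indexes the value for range queries only; it is not stored...
        doc.add(new IntPoint("age", age));
        // ...so a StoredField keeps the original value for retrieval
        doc.add(new StoredField("age", age));
        indexWriter.addDocument(doc);
    }
}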

Example results

[Figure: results of the IntPoint range query]

Only values in the range 8 to 30 are returned, and both bounds are inclusive.
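The inclusive bounds can be checked with a plain-Java filter over the three ages from the example. For exclusive bounds, the Lucene javadoc for the point range queries suggests shifting the ends, as in `IntPoint.newRangeQuery("age", Math.addExact(8, 1), Math.addExact(30, -1))`. The sketch applies the same arithmetic to a toy filter. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy check of range bounds: both ends inclusive, like the point range query above. */
final class RangeBounds {
    static List<Integer> inclusive(int[] values, int lower, int upper) {
        List<Integer> hits = new ArrayList<>();
        for (int v : values) {
            if (v >= lower && v <= upper) { // both ends inclusive
                hits.add(v);
            }
        }
        return hits;
    }

    /** Exclusive (lower, upper): shift both ends by one; addExact throws instead of silently overflowing. */
    static List<Integer> exclusive(int[] values, int lower, int upper) {
        return inclusive(values, Math.addExact(lower, 1), Math.addExact(upper, -1));
    }

    public static void main(String[] args) {
        int[] ages = {8, 18, 6}; // the three documents from the example
        System.out.println(inclusive(ages, 8, 30)); // [8, 18]
        System.out.println(exclusive(ages, 8, 30)); // [18]
    }
}
```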

StoredField

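StoredField overview

Path: org.apache.lucene.document.StoredField
Purpose: stores data that is not indexed and is only returned with search results, for example the original value of a point-type field.
Underlying structure: the value is kept in raw form in the stored fields and is not indexed.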

Example

import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
 
Document doc = new Document();
doc.add(new StoredField("fieldName", "This is the stored content."));

BinaryField

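BinaryField overview

Path: current Lucene releases have no org.apache.lucene.document.BinaryField class. Binary data is stored with org.apache.lucene.document.StoredField, which accepts a byte[] or a BytesRef, as in the example below.
Purpose: stores binary data.
Underlying structure: the bytes are kept as-is in the stored fields and are not indexed.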

Example

import org.apache.lucene.document.Document;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.util.BytesRef;
 
Document doc = new Document();
byte[] byteArray = new byte[] {1, 2, 3, 4, 5};
doc.add(new StoredField("fieldName", new BytesRef(byteArray)));

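Sorting and scoring types

Includes: SortedDocValuesField and NumericDocValuesField

Overview

Path:
- org.apache.lucene.document.SortedDocValuesField
- org.apache.lucene.document.NumericDocValuesField
Purpose: holds the per-document values used for sorting and scoring. SortedDocValuesField takes a BytesRef (for example, a string value), and NumericDocValuesField takes a long. Note that the original value is not stored; pair the field with a StoredField to store it.
Underlying structure: values are stored as doc values in a compact, column-oriented format, which supports efficient sorting and scoring.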
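The Sort used in the following example ranks by relevance first, then by `age` descending. It can be modeled as a chained comparator over a doc-values column. This is a toy sketch, not Lucene's collector. The `Hit` class and the scores are hypothetical. The final tie-break on the lower doc id reflects, as far as I know, how Lucene orders otherwise equal hits.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of new Sort(SortField.FIELD_SCORE, new SortField("age", Type.LONG, reverse)). */
final class CompositeSortModel {
    /** Hypothetical search hit: a doc id plus its relevance score. */
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Orders doc ids by score desc, then age (desc if reverseAge), then doc id asc. */
    static List<Integer> sort(List<Hit> hits, long[] ageColumn, boolean reverseAge) {
        // Doc-values style lookup: the age comes from a column indexed by doc id
        Comparator<Hit> byAge = Comparator.comparingLong(h -> ageColumn[h.docId]);
        if (reverseAge) {
            byAge = byAge.reversed(); // reverse = true behaves like SQL DESC
        }
        Comparator<Hit> order = Comparator.comparingDouble((Hit h) -> h.score)
                .reversed()                      // relevance: higher score first
                .thenComparing(byAge)            // then the age sort key
                .thenComparingInt(h -> h.docId); // ties: lower doc id first
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(order);
        List<Integer> docIds = new ArrayList<>();
        for (Hit h : sorted) {
            docIds.add(h.docId);
        }
        return docIds;
    }

    public static void main(String[] args) {
        long[] ageColumn = {8, 18, 6}; // docs 0, 1, 2 as in the example
        List<Hit> hits = List.of(new Hit(0, 0.1f), new Hit(1, 0.1f), new Hit(2, 0.1f));
        System.out.println(sort(hits, ageColumn, true));  // [1, 0, 2]
        System.out.println(sort(hits, ageColumn, false)); // [2, 0, 1]
    }
}
```

When the scores tie, the age key alone decides the order. Flipping `reverse` turns descending order into ascending order.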

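Example

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

import java.io.IOException;
import java.nio.file.Paths;

public class DocValuesSortDemo {
    static String path = "src/resources/index/app1";

    public static void main(String[] args) throws IOException, ParseException {
        // Index directory
        try (FSDirectory directory = FSDirectory.open(Paths.get(path));
             // Analyzer
             Analyzer analyzer = new StandardAnalyzer();
             IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
            // Index the documents
            addDocument(indexWriter, "Cb Fc Zhang3", 8);
            addDocument(indexWriter, "Cb Fb Lie", 18);
            addDocument(indexWriter, "Cb Fb Wang", 6);
            // Commit so the index is written to disk
            indexWriter.commit();

            // Search: every name contains the term "cb"
            Query query = new QueryParser("name", analyzer).parse("Cb");

            try (IndexReader indexReader = DirectoryReader.open(directory)) {
                IndexSearcher indexSearcher = new IndexSearcher(indexReader);

                // Unsorted variant: TopDocs topDocs_10 = indexSearcher.search(query, 10);

                // Relevance first, then age descending (reverse = true);
                // SortField.Type.LONG reads the NUMERIC doc values written by NumericDocValuesField
                Sort sort = new Sort(SortField.FIELD_SCORE, new SortField("age", SortField.Type.LONG, true));
                TopDocs topDocs_10 = indexSearcher.search(query, 10, sort);

                for (ScoreDoc scoreDoc : topDocs_10.scoreDocs) {
                    Document document = indexSearcher.doc(scoreDoc.doc);
                    System.out.printf("score: %s , name: %s ,age: %s%n", scoreDoc.score, document.get("name"), document.get("age"));
                }
            }

            // Cleanup: delete all documents; applied when the writer closes (commit on close is the default)
            indexWriter.deleteAll();
        }
    }

    public static void addDocument(IndexWriter indexWriter, String name, Integer age) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("name", name, Field.Store.YES));
        // Doc values for sorting; they do not keep a retrievable copy of the value...
        doc.add(new NumericDocValuesField("age", age));
        // ...so the same value is also stored
        doc.add(new StoredField("age", age));
        indexWriter.addDocument(doc);
    }
}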

Results

Plain query, unsorted

[Figure: results of the plain query]

Query + sort (descending)

[Figure: results sorted by age, descending]

Dissecting the `reverse` parameter of SortField

    public SortField(String field, Type type, boolean reverse) {
        this.initFieldType(field, type);
        this.reverse = reverse;
    }

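The constructor takes a reverse parameter. The two-argument constructor SortField(String field, Type type) leaves it at its default of false (see the figure below):

[Figure: SortField source showing the default value of reverse]

Meaning of reverse:

true: like SQL DESC (descending)
false: like SQL ASC (ascending)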

Score - NaN

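Sharp-eyed readers may notice that Score prints NaN after sorting. With a custom Sort, the hits come back as FieldDoc objects. Each FieldDoc has a fields array that holds the sort values in the same order as the SortFields in the Sort. Because SortField.FIELD_SCORE is the first sort key here, the score appears in that array, while ScoreDoc.score itself is left as NaN. If you also need ScoreDoc.score filled in, IndexSearcher has a search(query, n, sort, doDocScores) overload; pass true for doDocScores.

[Figure: the score inside the fields array]

Using the same name for fields of different types

Important:

Apart from StoredField, do not give the different field types above the same field name; doing so causes problems.

See:

[Figure: the problem caused by reusing a field name across types]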


https://blog.xqlee.com/article/250428134000069.html
