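Getting Started with Elasticsearch
This article is an introductory tutorial on the full-text search engine Elasticsearch.
What is Elasticsearch?
Elasticsearch is an open-source full-text search engine developed by Elastic. It can quickly find the documents that contain a given term in a large collection of documents.
Elasticsearch is operated through a RESTful API, and with Elasticsearch SQL you can also write queries as SQL statements.
If you are used to relational databases such as Oracle or MySQL, it may feel hard to get into at first. Don't worry, though: the Elasticsearch API is quite simple.
What is the Elastic Stack?
The Elastic Stack is the collective name for the products built around Elasticsearch. Up to version 2.x it was called "ELK"; starting with version 5.0 it was renamed the "Elastic Stack".
For Logstash, see my separate article "Getting Started with Logstash".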
Environment
- Mac OS X 10.14.6
- Elasticsearch 8.1.0
- Kibana 8.1.0
Installation
Open the Mac terminal and install the required software.
Elasticsearch is downloaded with wget, so install wget first.
$ brew install wget
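Install Elasticsearch and the Japanese analysis plugin "kuromoji".
The latest versions (8.0 and later) support Macs with the M1 chip, but note that the download links differ. The commands below use the Intel package; on an M1 Mac, use the aarch64 file name instead.
• Download link for Intel Macs: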
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.0-darwin-x86_64.tar.gz
• Download link for M1 Macs:
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.0-darwin-aarch64.tar.gz
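If you script the download, note that the two links above differ only in the architecture suffix. Here is a minimal Python sketch that picks the suffix from `platform.machine()`; the helper `elasticsearch_url` is hypothetical, and it assumes the same URL pattern holds for other versions.

```python
import platform
from typing import Optional

# URL pattern of the two download links above; only the architecture
# suffix differs between Intel Macs and M1 (Apple Silicon) Macs.
BASE_URL = "https://artifacts.elastic.co/downloads/elasticsearch"


def elasticsearch_url(version: str, machine: Optional[str] = None) -> str:
    """Return the macOS .tar.gz URL for `version` on the given CPU.

    `machine` defaults to platform.machine(), which reports "x86_64" on
    Intel Macs and "arm64" on Apple Silicon Macs.
    """
    machine = machine or platform.machine()
    arch = "aarch64" if machine in ("arm64", "aarch64") else "x86_64"
    return f"{BASE_URL}/elasticsearch-{version}-darwin-{arch}.tar.gz"
```

For example, `elasticsearch_url("8.1.0")` on an M1 Mac gives the aarch64 link. A Python interpreter running under Rosetta reports "x86_64", so pass `machine` explicitly in that case.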
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.0-darwin-x86_64.tar.gz
$ tar -xzf elasticsearch-8.1.0-darwin-x86_64.tar.gz
$ cd elasticsearch-8.1.0
$ bin/elasticsearch-plugin install analysis-kuromoji
Since version 8.0, security is enabled by default, but this tutorial disables it for now. Edit config/elasticsearch.yml and set "xpack.security.enabled" to false.
# Enable security features
xpack.security.enabled: false
Start Elasticsearch.
$ bin/elasticsearch
Open a browser and go to http://localhost:9200/.
If JSON like the following is displayed, Elasticsearch has started successfully.
{
"name" : "username.local",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "qJnOX-ukSU-nX6hjUViLnA",
"version" : {
"number" : "8.1.0",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "3700f7679f7d95e36da0b43762189bab189bc53a",
"build_date" : "2022-03-03T14:20:00.690422633Z",
"build_snapshot" : false,
"lucene_version" : "9.0.0",
"minimum_wire_compatibility_version" : "7.17.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "You Know, for Search"
}
If you get an "ERR_CONNECTION_REFUSED" error, open config/elasticsearch.yml and check that "xpack.security.enabled" is set to false.
Next, install Kibana.
$ wget https://artifacts.elastic.co/downloads/kibana/kibana-8.1.0-darwin-x86_64.tar.gz
$ tar -xzf kibana-8.1.0-darwin-x86_64.tar.gz
$ cd kibana-8.1.0-darwin-x86_64/
Start Kibana.
$ bin/kibana
Open a browser and go to http://localhost:5601/.
If the Kibana home screen appears, Kibana has started successfully.

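Before starting the tutorial
The commands used in this article are published on GitHub:
https://github.com/nskydiving/elasticsearch_examples
In this tutorial, we operate Elasticsearch through its RESTful API using Kibana's "Dev Tools".
Select "Dev Tools" from the Kibana menu.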

The Dev Tools console opens.

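Enter a command in the left pane of the console and click the run button (the green play button). The selected command is executed and its result appears in the right pane.
Let's try running the following command.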
GET _search
{
"query": {
"match_all": {}
}
}
If the result appears in the right pane, the setup is working.
If you are not using a Mac, see the following pages:
https://www.elastic.co/jp/downloads/elasticsearch
https://www.elastic.co/jp/downloads/kibana
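Every Dev Tools command is an ordinary HTTP request to Elasticsearch, so the same commands can also be sent without Kibana. The Python sketch below (standard library only) turns a console command into a `urllib.request.Request`; the helper `console_request` is hypothetical, and it assumes Elasticsearch at http://localhost:9200 with security disabled, as configured above.

```python
import json
import urllib.request
from typing import Optional

# Assumption: Elasticsearch listens on http://localhost:9200 with security
# disabled, as set up earlier in this article.
ES_URL = "http://localhost:9200"


def console_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build the HTTP request equivalent to a Dev Tools command.

    The request is only constructed here; pass it to urllib.request.urlopen()
    to actually send it.
    """
    if not path.startswith("/"):
        path = "/" + path  # the console accepts both "_search" and "/_search"
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)
```

To send a request, use `urllib.request.urlopen(console_request("GET", "/library/_doc/1"))`. Note that urlopen raises `urllib.error.HTTPError` for 4xx responses, such as the 404 returned for a missing document.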
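CRUD operations (RESTful API)
Elasticsearch uses different terminology from relational databases. Roughly speaking, an index corresponds to a table, a document to a row, and a field to a column.
Mapping types have been phased out since version 6.0, and "_doc" is now used in the path where the type name used to go. (The sample results in this article still contain a "_type" field, which responses from version 8.0 onward no longer include.)
Creating a document
To create a document, send a PUT request to "/index/_doc/document ID" and pass the document content as JSON.
Command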
# +--- Index name
# | +--- Type name
# | | +--- Document ID
# | | |
# V V V
PUT /library/_doc/1
{
"title": "Norwegian Wood",
"name": {
"first": "Haruki",
"last": "Murakami"
},
"publish_date": "1987-09-04T00:00:00+0900",
"price": 19.95
}
Result
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_version" : 15,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 22,
"_primary_term" : 1
}
Getting a document
To get a document, send a GET request to "/index/_doc/document ID".
Command
GET /library/_doc/1
Result
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_version" : 15,
"_seq_no" : 22,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "Norwegian Wood",
"name" : {
"first" : "Haruki",
"last" : "Murakami"
},
"publish_date" : "1987-09-04T00:00:00+0900",
"price" : 19.95
}
}
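Creating a document without a document ID
If you create a document without specifying a document ID, use POST instead of PUT; a document ID is then assigned automatically, and you can find it in the result.
Command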
POST /library/_doc/
{
"title": "Kafka on the Shore",
"name": {
"first": "Haruki",
"last": "Murakami"
},
"publish_date": "2002-09-12T00:00:00+0900",
"price": 19.95
}
Result
{
"_index" : "library",
"_type" : "_doc",
"_id" : "q2aZVmoBFWFSqRl8nY0k",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 23,
"_primary_term" : 1
}
Checking the document created without a document ID
Command
# Specify the id returned by POST /library/_doc/
GET /library/_doc/q2aZVmoBFWFSqRl8nY0k
Result
{
"_index" : "library",
"_type" : "_doc",
"_id" : "q2aZVmoBFWFSqRl8nY0k",
"_version" : 1,
"_seq_no" : 23,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "Kafka on the Shore",
"name" : {
"first" : "Haruki",
"last" : "Murakami"
},
"publish_date" : "2002-09-12T00:00:00+0900",
"price" : 19.95
}
}
Overwriting a document
To overwrite a document, send a PUT request to "/index/_doc/document ID" with the complete new content.
Command
PUT /library/_doc/1
{
"title": "Norwegian Wood",
"name": {
"first": "Haruki",
"last": "Murakami"
},
"publish_date": "1987-09-04T00:00:00+0900",
"price": 29.95
}
Result
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_version" : 18,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 26,
"_primary_term" : 1
}
Checking the overwritten document
Command
GET /library/_doc/1
Result
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_version" : 18,
"_seq_no" : 26,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "Norwegian Wood",
"name" : {
"first" : "Haruki",
"last" : "Murakami"
},
"publish_date" : "1987-09-04T00:00:00+0900",
"price" : 29.95
}
}
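Partially updating a document
To update part of a document, send a POST request to "/index/_update/document ID" and put the fields to change under "doc" in the JSON.
Command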
POST /library/_update/1
{
"doc": {
"price": 10
}
}
Result
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_version" : 19,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 27,
"_primary_term" : 1
}
Checking the partially updated document
Command
GET /library/_doc/1
Result
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_version" : 19,
"_seq_no" : 27,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "Norwegian Wood",
"name" : {
"first" : "Haruki",
"last" : "Murakami"
},
"publish_date" : "1987-09-04T00:00:00+0900",
"price" : 10
}
}
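Adding a field to a document
To add a field to a document, likewise send a POST request to "/index/_update/document ID" and put the new field under "doc" in the JSON.
Command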
POST /library/_update/1
{
"doc": {
"price_jpy": 1800
}
}
Result
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_version" : 20,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 28,
"_primary_term" : 1
}
Checking the added field
Command
GET /library/_doc/1
Result
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_version" : 20,
"_seq_no" : 28,
"_primary_term" : 1,
"found" : true,
"_source" : {
"title" : "Norwegian Wood",
"name" : {
"first" : "Haruki",
"last" : "Murakami"
},
"publish_date" : "1987-09-04T00:00:00+0900",
"price" : 10,
"price_jpy" : 1800
}
}
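The two "_update" examples above behave like merging the "doc" object into the stored "_source": listed fields are replaced or added, and all other fields are kept. The Python sketch below models that on plain dictionaries, assuming that nested objects are merged field by field; `merge_partial` is a hypothetical helper, and it does not model details such as arrays or null values.

```python
import copy


def merge_partial(source: dict, doc: dict) -> dict:
    """Return a copy of `source` with the partial document `doc` merged in.

    Nested dictionaries are merged recursively; any other value in `doc`
    replaces the existing field or adds a new one.
    """
    merged = copy.deepcopy(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


stored = {
    "title": "Norwegian Wood",
    "name": {"first": "Haruki", "last": "Murakami"},
    "publish_date": "1987-09-04T00:00:00+0900",
    "price": 29.95,
}
# Same "doc" bodies as the two POST /library/_update/1 commands above.
after_price = merge_partial(stored, {"price": 10})
after_field = merge_partial(after_price, {"price_jpy": 1800})
```

The original dictionary is left untouched, which mirrors the fact that each update produces a new version of the document rather than editing it in place.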
Deleting a document
To delete a document, send a DELETE request to "/index/_doc/document ID".
Command
DELETE /library/_doc/1
Result
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_version" : 21,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 29,
"_primary_term" : 1
}
Checking the deletion
Command
GET /library/_doc/1
Result
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"found" : false
}
Deleting an index
To delete an index, send a DELETE request to "/index".
Command
DELETE /library
Result
{
"acknowledged" : true
}
Checking the index deletion
Command
GET /library/_doc/2
Result
{
"error" : {
"root_cause" : [
{
"type" : "index_not_found_exception",
"reason" : "no such index [library]",
"resource.type" : "index_expression",
"resource.id" : "library",
"index_uuid" : "_na_",
"index" : "library"
}
],
"type" : "index_not_found_exception",
"reason" : "no such index [library]",
"resource.type" : "index_expression",
"resource.id" : "library",
"index_uuid" : "_na_",
"index" : "library"
},
"status" : 404
}
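Searching documents
Search the documents stored in an index.
To prepare for the following steps, first delete the index and then create the test documents.
To create several documents at once, send a POST request to "/index/_bulk".
Command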
DELETE /library
POST /library/_bulk
{"index": {"_id": 1}}
{"title": "The quick brown fox", "price": 5}
{"index": {"_id": 2}}
{"title": "The quick brown fox jumps over the lazy dog", "price": 15}
{"index": {"_id": 3}}
{"title": "The quick brown fox jumps over the quick dog", "price": 8}
{"index": {"_id": 4}}
{"title": "Brown fox and brown dog", "price": 2}
{"index": {"_id": 5}}
{"title": "Lazy dog", "price": 9}
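The "_bulk" body is newline-delimited JSON: each action line such as `{"index": {"_id": 1}}` is followed by the document to index, and the body must end with a newline. When you send it with a plain HTTP client instead of the console, use the content type `application/x-ndjson`. A minimal Python sketch that builds the body above; `bulk_body` is a hypothetical helper.

```python
import json


def bulk_body(docs: dict) -> str:
    """Build an NDJSON `_bulk` body that indexes each document under its ID."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # document source line
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


docs = {
    1: {"title": "The quick brown fox", "price": 5},
    2: {"title": "The quick brown fox jumps over the lazy dog", "price": 15},
    3: {"title": "The quick brown fox jumps over the quick dog", "price": 8},
    4: {"title": "Brown fox and brown dog", "price": 2},
    5: {"title": "Lazy dog", "price": 9},
}
body = bulk_body(docs)
```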
Searching all documents
To search all documents, send a GET request to "/index/_search".
You can also limit the number of results with the size parameter, as in "/library/_search?size=3".
Command
GET /library/_search
Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "The quick brown fox",
"price" : 5
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"title" : "The quick brown fox jumps over the lazy dog",
"price" : 15
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "The quick brown fox jumps over the quick dog",
"price" : 8
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"title" : "Brown fox and brown dog",
"price" : 2
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"title" : "Lazy dog",
"price" : 9
}
}
]
}
}
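Searching for documents that contain a given word
To search for documents that contain a given word, send a GET request to "/index/_search" and specify a "match" query in the JSON.
Here we search for documents whose title contains "fox".
Command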
GET /library/_search
{
"query": {
"match": {
"title": "fox"
}
}
}
Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 0.32951736,
"hits" : [
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.32951736,
"_source" : {
"title" : "The quick brown fox",
"price" : 5
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.30488566,
"_source" : {
"title" : "Brown fox and brown dog",
"price" : 2
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.23470737,
"_source" : {
"title" : "The quick brown fox jumps over the lazy dog",
"price" : 15
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.23470737,
"_source" : {
"title" : "The quick brown fox jumps over the quick dog",
"price" : 8
}
}
]
}
}
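Searching documents with an OR condition
To search with an OR condition, send a GET request to "/index/_search" and specify a "match" query in the JSON. When the query string contains several words, "match" combines them with OR by default, so a document matches if it contains any of them.
Here we search for documents whose title contains "quick" or "dog".
Command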
GET /library/_search
{
"query": {
"match": {
"title": "quick dog"
}
}
}
Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
},
"max_score" : 0.8762741,
"hits" : [
{
"_index" : "library",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.8762741,
"_source" : {
"title" : "The quick brown fox jumps over the quick dog",
"price" : 8
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.6744513,
"_source" : {
"title" : "The quick brown fox jumps over the lazy dog",
"price" : 15
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6173784,
"_source" : {
"title" : "The quick brown fox",
"price" : 5
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.3930218,
"_source" : {
"title" : "Lazy dog",
"price" : 9
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.30488566,
"_source" : {
"title" : "Brown fox and brown dog",
"price" : 2
}
}
]
}
}
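To see why all five documents match, it helps to model the "match" query on this data. The Python sketch below assumes that the standard analyzer behaves like lowercasing plus splitting into runs of letters and digits (true for these simple titles, not in general) and treats the query words as alternatives, which is the default OR operator of "match". It only decides which documents match; it does not compute scores, and `analyze` and `match_or` are hypothetical helpers.

```python
import re

TITLES = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox and brown dog",
    5: "Lazy dog",
}


def analyze(text: str) -> list:
    # Rough stand-in for the standard analyzer on plain English words.
    return re.findall(r"[a-z0-9]+", text.lower())


def match_or(query: str) -> list:
    """IDs of titles containing at least one of the query terms."""
    terms = set(analyze(query))
    return sorted(doc_id for doc_id, title in TITLES.items() if terms & set(analyze(title)))
```

Here `match_or("fox")` returns the four IDs of the earlier "fox" search, and `match_or("quick dog")` returns all five, matching the hit counts above.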
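Searching for a phrase that contains spaces
To search for words separated by spaces as a single phrase, send a GET request to "/index/_search" and specify a "match_phrase" query in the JSON. The words must appear next to each other and in the given order.
Here we search for documents whose title contains the phrase "quick dog".
Command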
GET /library/_search
{
"query": {
"match_phrase": {
"title": "quick dog"
}
}
}
Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.67445135,
"hits" : [
{
"_index" : "library",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.67445135,
"_source" : {
"title" : "The quick brown fox jumps over the quick dog",
"price" : 8
}
}
]
}
}
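Compared with "match", "match_phrase" also requires the terms to be adjacent and in order, which is why only document 3 is returned. A Python sketch under a simplifying assumption: lowercase runs of letters and digits serve as a rough stand-in for the standard analyzer. It decides only which documents match, not their scores; `analyze` and `match_phrase` are hypothetical helpers.

```python
import re

TITLES = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox and brown dog",
    5: "Lazy dog",
}


def analyze(text: str) -> list:
    # Rough stand-in for the standard analyzer on plain English words.
    return re.findall(r"[a-z0-9]+", text.lower())


def match_phrase(query: str) -> list:
    """IDs of titles containing the query terms consecutively and in order."""
    phrase = analyze(query)
    n = len(phrase)
    hits = []
    for doc_id, title in TITLES.items():
        tokens = analyze(title)
        if any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return sorted(hits)
```

Document 2 contains both "quick" and "dog", but not next to each other, so `match_phrase("quick dog")` leaves it out.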
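Showing the score of a document search
When searching documents, Elasticsearch calculates each document's relevance to the query as a score.
Adding the "explain" parameter to the request shows how each score was calculated: every hit then includes an "_explanation" object, which is omitted from the result below.
Command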
GET /library/_search?explain
{
"query": {
"match": {
"title": "quick"
}
}
}
Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.64156675,
"hits" : [
{
"_index" : "library",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.64156675,
"_source" : {
"title" : "The quick brown fox jumps over the quick dog",
"price" : 8
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6173784,
"_source" : {
"title" : "The quick brown fox",
"price" : 5
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.43974394,
"_source" : {
"title" : "The quick brown fox jumps over the lazy dog",
"price" : 15
}
}
]
}
}
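The scores above can be reproduced by hand with BM25, Elasticsearch's default similarity. The values in this article are consistent with the form (k1 + 1) * idf * tf, with the defaults k1 = 1.2 and b = 0.75, idf = ln(1 + (N - n + 0.5) / (n + 0.5)) and tf = freq / (freq + k1 * (1 - b + b * dl / avgdl)), where N is the number of documents, n the number containing the term, dl the field length and avgdl the average field length. The sketch assumes that every token, including "the", counts toward the length (the standard analyzer removes no stop words by default) and that all five documents sit on the single shard reported in "_shards".

```python
import math

# Token counts of the five titles (the standard analyzer keeps "the").
DOC_LENGTHS = {1: 4, 2: 9, 3: 9, 4: 5, 5: 2}
K1, B = 1.2, 0.75  # default BM25 parameters


def bm25(freq: int, doc_len: int, docs_with_term: int) -> float:
    """BM25 score of one term, written as (k1 + 1) * idf * tf."""
    n_docs = len(DOC_LENGTHS)
    avgdl = sum(DOC_LENGTHS.values()) / n_docs  # 29 / 5 = 5.8
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    tf = freq / (freq + K1 * (1 - B + B * doc_len / avgdl))
    return (K1 + 1) * idf * tf


# "quick" appears in documents 1, 2 and 3 (twice in document 3).
for doc_id, freq in [(3, 2), (1, 1), (2, 1)]:
    print(doc_id, bm25(freq, DOC_LENGTHS[doc_id], docs_with_term=3))
```

The printed values agree with the "_score" values above to about six decimal places; Elasticsearch computes in 32-bit floats, so the last digits can differ. The comparison also shows why document 3 ranks first: its second "quick" outweighs its longer title.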
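Searching documents with an AND condition
To search with an AND condition, send a GET request to "/index/_search" and specify a "bool" query in the JSON, listing every condition that must hold under "must".
Here we search for documents whose "title" contains both "quick" and the phrase "lazy dog".
Command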
GET /library/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "quick"
}
},
{
"match_phrase": {
"title": "lazy dog"
}
}
]
}
}
}
Result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.3887084,
"hits" : [
{
"_index" : "library",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.3887084,
"_source" : {
"title" : "The quick brown fox jumps over the lazy dog",
"price" : 15
}
}
]
}
}
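A "bool" query scores a document by adding up the scores of the clauses it matches. Document 2 can be checked with two numbers from this article: 0.43974394 for "match" "quick" in the explain example above, and 0.9489645 for the phrase "lazy dog" in the boost example below, where that phrase is the only clause document 2 matches. A small Python check of that arithmetic; `bool_score` is a hypothetical helper.

```python
# Clause scores of document 2, copied from the results in this article.
match_quick = 0.43974394     # "match": {"title": "quick"} in the explain example
phrase_lazy_dog = 0.9489645  # "match_phrase": {"title": "lazy dog"} in the boost example


def bool_score(clause_scores):
    """Score of a bool query: the sum of the scores of the matching clauses."""
    return sum(clause_scores)


print(bool_score([match_quick, phrase_lazy_dog]))  # compare with the _score of document 2 above
```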
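Weighting the score of a document search
To weight the scores, send a GET request to "/index/_search" and specify a "boost" parameter on a query clause in the JSON.
Here we search for titles containing "quick dog" or "lazy dog" and halve the score contributed by "quick dog".
Command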
GET /library/_search
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "quick dog",
"boost": 0.5
}
}
},
{
"match_phrase": {
"title": {
"query": "lazy dog"
}
}
}
]
}
}
}
Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.5890584,
"hits" : [
{
"_index" : "library",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.5890584,
"_source" : {
"title" : "Lazy dog",
"price" : 9
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.9489645,
"_source" : {
"title" : "The quick brown fox jumps over the lazy dog",
"price" : 15
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.33722568,
"_source" : {
"title" : "The quick brown fox jumps over the quick dog",
"price" : 8
}
}
]
}
}
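The boost multiplies the score of its clause. Document 3 does not contain "lazy dog", so its whole score comes from the "quick dog" clause: 0.5 times its unboosted "match_phrase" score of 0.67445135 from the earlier phrase example. A small Python check, assuming the matching "should" clauses are summed as in any "bool" query; `should_score` is a hypothetical helper.

```python
def should_score(matched):
    """Sum of boost * score over the "should" clauses a document matched."""
    return sum(boost * score for boost, score in matched)


# Document 3 matches only "quick dog" (boost 0.5); 0.67445135 is its
# unboosted match_phrase score from the earlier example.
doc3 = should_score([(0.5, 0.67445135)])
# Document 2 matches only "lazy dog", which has no boost (equivalent to 1.0).
doc2 = should_score([(1.0, 0.9489645)])
print(doc3, doc2)
```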
Highlighting search results
To highlight search results, send a GET request to "/index/_search" and add a "highlight" section to the JSON.
In the result, the matched terms are wrapped in em tags.
Command
GET /library/_search
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"title": {
"query": "quick dog",
"boost": 0.5
}
}
},
{
"match_phrase": {
"title": {
"query": "lazy dog"
}
}
}
]
}
},
"highlight": {
"fields": {
"title": {}
}
}
}
Result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 1.5890584,
"hits" : [
{
"_index" : "library",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.5890584,
"_source" : {
"title" : "Lazy dog",
"price" : 9
},
"highlight" : {
"title" : [
"<em>Lazy</em> <em>dog</em>"
]
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.9489645,
"_source" : {
"title" : "The quick brown fox jumps over the lazy dog",
"price" : 15
},
"highlight" : {
"title" : [
"The quick brown fox jumps over the <em>lazy</em> <em>dog</em>"
]
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.33722568,
"_source" : {
"title" : "The quick brown fox jumps over the quick dog",
"price" : 8
},
"highlight" : {
"title" : [
"The quick brown fox jumps over the <em>quick</em> <em>dog</em>"
]
}
}
]
}
}
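The highlighter wraps only the terms that actually satisfied the query: in document 3 the first "quick" is left alone because it is not part of the phrase "quick dog". The Python sketch below imitates that for phrase queries on these simple titles. It is an illustration, not how Elasticsearch's highlighters are implemented, and it assumes words are separated by single spaces; `highlight_phrases` is a hypothetical helper.

```python
import re


def highlight_phrases(title: str, phrases: list, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap each word that belongs to an occurrence of one of `phrases`."""
    words = title.split(" ")
    normalized = [re.sub(r"[^a-z0-9]", "", w.lower()) for w in words]
    marked = [False] * len(words)
    for phrase in phrases:
        terms = phrase.lower().split()
        n = len(terms)
        for i in range(len(words) - n + 1):
            if normalized[i:i + n] == terms:
                for j in range(i, i + n):
                    marked[j] = True
    return " ".join(pre + w + post if m else w for w, m in zip(words, marked))
```

Each word is wrapped separately, as in the "<em>Lazy</em> <em>dog</em>" fragment above.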
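Searching documents with a filter
To filter documents, send a GET request to "/index/_search" and specify a "filter" clause inside a "bool" query in the JSON.
Here we search for documents whose price is between 5 and 10 inclusive. A filter does not contribute to scoring, so every hit has a "_score" of 0.0.
Command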
GET /library/_search
{
"query": {
"bool": {
"filter": {
"range": {
"price": {
"gte": 5,
"lte": 10
}
}
}
}
}
}
Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "library",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.0,
"_source" : {
"title" : "The quick brown fox",
"price" : 5
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.0,
"_source" : {
"title" : "The quick brown fox jumps over the quick dog",
"price" : 8
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "5",
"_score" : 0.0,
"_source" : {
"title" : "Lazy dog",
"price" : 9
}
}
]
}
}
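The "range" filter keeps the documents whose value lies within the bounds ("gte" and "lte" are inclusive; "gt" and "lt" are exclusive) and does not score them. The Python sketch below applies the same bounds to the test prices; `range_filter` is a hypothetical helper that covers only "gte" and "lte".

```python
PRICES = {1: 5, 2: 15, 3: 8, 4: 2, 5: 9}  # "price" of each test document


def range_filter(values: dict, gte=None, lte=None) -> list:
    """IDs whose value lies in [gte, lte]; a bound left as None is ignored."""
    return sorted(
        doc_id
        for doc_id, v in values.items()
        if (gte is None or v >= gte) and (lte is None or v <= lte)
    )
```

`range_filter(PRICES, gte=5, lte=10)` gives the three documents above; with only `gte=5`, document 2 (price 15) is included as well.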
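Combining a query with a filter
To combine another query with a filter, send a GET request to "/index/_search" and specify both the query and a "filter" clause in the JSON.
Here we search for documents whose title contains "lazy dog" and whose price is 5 or more.
Command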
GET /library/_search
{
"query": {
"bool": {
"must": [{
"match_phrase": {
"title": "lazy dog"
}
}],
"filter": {
"range": {
"price": {
"gte": 5
}
}
}
}
}
}
Result
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.5890584,
"hits" : [
{
"_index" : "library",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.5890584,
"_source" : {
"title" : "Lazy dog",
"price" : 9
}
},
{
"_index" : "library",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.9489645,
"_source" : {
"title" : "The quick brown fox jumps over the lazy dog",
"price" : 15
}
}
]
}
}
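Mappings
Elasticsearch is schema-less: you do not have to define a mapping in advance, because one is created automatically from the documents you index.
To prepare, first delete the index and then create the test documents.
Command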
DELETE /library
POST /library/_bulk
{"index": {"_id": 1}}
{"title": "The quick brown fox", "price": 5}
{"index": {"_id": 2}}
{"title": "The quick brown fox jumps over the lazy dog", "price": 15}
{"index": {"_id": 3}}
{"title": "The quick brown fox jumps over the quick dog", "price": 8}
{"index": {"_id": 4}}
{"title": "Brown fox and brown dog", "price": 2}
{"index": {"_id": 5}}
{"title": "Lazy dog", "price": 9}
Getting the mapping
Command
GET /library/_mapping
Result
{
"library" : {
"mappings" : {
"properties" : {
"price" : {
"type" : "long"
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
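The mapping above was generated by dynamic mapping when the test documents were indexed. The Python sketch below models the part of the default rules that applies here, as far as I know them: JSON integers become "long", floating-point numbers "float", booleans "boolean", objects get nested "properties", and strings become "text" with a "keyword" sub-field limited by "ignore_above": 256. Date detection for strings and other settings are not modeled; `infer_field` and `infer_mapping` are hypothetical helpers.

```python
def infer_field(value):
    """Approximate dynamic-mapping entry for one JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, dict):
        return {"properties": {k: infer_field(v) for k, v in value.items()}}
    if isinstance(value, str):
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"not modeled: {type(value).__name__}")


def infer_mapping(doc: dict) -> dict:
    return {"properties": {k: infer_field(v) for k, v in doc.items()}}
```

For `{"title": "The quick brown fox", "price": 5}` this yields the same "price" and "title" entries as the result above.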
Adding a mapping
Add a mapping for a new field named "my_new_field".
Command
PUT /library/_mapping
{
"properties": {
"my_new_field": {
"type": "text"
}
}
}
Result
{
"acknowledged" : true
}
Checking the added mapping
Command
GET /library/_mapping
Result
{
"library" : {
"mappings" : {
"properties" : {
"my_new_field" : {
"type" : "text"
},
"price" : {
"type" : "long"
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
Adding a mapping with an analyzer
To set the analyzer of a field, specify "analyzer" in its mapping.
Command
PUT /library/_mapping
{
"properties": {
"english_field": {
"type": "text",
"analyzer": "english"
}
}
}
Result
{
"acknowledged" : true
}
Checking the mapping with the analyzer
Command
GET /library/_mapping
Result
{
"library" : {
"mappings" : {
"properties" : {
"english_field" : {
"type" : "text",
"analyzer" : "english"
},
"my_new_field" : {
"type" : "text"
},
"price" : {
"type" : "long"
},
"title" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
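Mappings cannot be changed
Once a field's mapping has been added, it cannot be changed, so the following command returns an error.
Command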
PUT /library/_mapping
{
"properties": {
"english_field": {
"type": "double"
}
}
}
Result
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "mapper [english_field] of different type, current_type [text], merged_type [double]"
}
],
"type": "illegal_argument_exception",
"reason": "mapper [english_field] of different type, current_type [text], merged_type [double]"
},
"status": 400
}
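How type differences affect search results
We add two documents to "/log" whose "id" values are 234571 and 1392.223, and search with the condition "id greater than 1392".
You might expect both 234571 and 1392.223 to be found, but only 234571 is returned.
Command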
POST /log/_doc
{
"id": 234571
}
POST /log/_doc
{
"id": 1392.223
}
GET /log/_search
{
"query": {
"bool": {
"filter": {
"range": {
"id": {
"gt": 1392
}
}
}
}
}
}
Result
# POST /log/_doc
{
"_index" : "log",
"_type" : "_doc",
"_id" : "r2axVmoBFWFSqRl86Y1N",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
# POST /log/_doc
{
"_index" : "log",
"_type" : "_doc",
"_id" : "sGayVmoBFWFSqRl8S414",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
# GET /log/_search
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "log",
"_type" : "_doc",
"_id" : "r2axVmoBFWFSqRl86Y1N",
"_score" : 0.0,
"_source" : {
"id" : 234571
}
}
]
}
}
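Checking the effect of the type difference
Getting the mapping shows that "id" is of type long: the type was fixed by the first document (234571).
Searching the whole "log" index shows that the document with the "id" 1392.223, which does not match that type, was added all the same. It was indexed as the long value 1392 (numeric fields truncate fractional values by default), which is not greater than 1392, while "_source" still shows the original value.
Command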
GET /log/_mapping
GET /log/_search
Result
# GET /log/_mapping
{
"log" : {
"mappings" : {
"properties" : {
"id" : {
"type" : "long"
}
}
}
}
}
# GET /log/_search
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "log",
"_type" : "_doc",
"_id" : "r2axVmoBFWFSqRl86Y1N",
"_score" : 1.0,
"_source" : {
"id" : 234571
}
},
{
"_index" : "log",
"_type" : "_doc",
"_id" : "sGayVmoBFWFSqRl8S414",
"_score" : 1.0,
"_source" : {
"id" : 1392.223
}
}
]
}
}
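A minimal Python model of this behavior, assuming the default "coerce": true of numeric fields: the value actually indexed for a "long" field has its fraction truncated, while "_source" keeps the original JSON. `index_as_long` and the document names are hypothetical.

```python
import math


def index_as_long(value):
    """Value indexed for a `long` field when coercion is on: the fraction is truncated."""
    return math.trunc(value)


sources = {"doc_a": 234571, "doc_b": 1392.223}  # hypothetical names for the two documents
indexed = {doc: index_as_long(v) for doc, v in sources.items()}

hits_gt = [doc for doc, v in indexed.items() if v > 1392]    # "gt": 1392, as in the query above
hits_gte = [doc for doc, v in indexed.items() if v >= 1392]  # what "gte": 1392 would return in this model
```

Under this model, mapping "id" explicitly as a floating-point type before indexing, rather than relying on the first document, is what keeps 1392.223 searchable as a fraction.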
Analysis
You can check how Elasticsearch analyzes a string.
To prepare, first delete the index and then create the test documents.
Command
DELETE /library
POST /library/_bulk
{"index": {"_id": 1}}
{"title": "The quick brown fox", "price": 5}
{"index": {"_id": 2}}
{"title": "The quick brown fox jumps over the lazy dog", "price": 15}
{"index": {"_id": 3}}
{"title": "The quick brown fox jumps over the quick dog", "price": 8}
{"index": {"_id": 4}}
{"title": "Brown fox and brown dog", "price": 2}
{"index": {"_id": 5}}
{"title": "Lazy dog", "price": 9}
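Showing the analysis of a text string
This shows the analysis result for "Brown fox brown dog".
You can see that the text is split into the four words "Brown", "fox", "brown", and "dog".
Command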
GET /library/_analyze
{
"tokenizer": "standard",
"text": "Brown fox brown dog"
}
Result
{
"tokens" : [
{
"token" : "Brown",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "fox",
"start_offset" : 6,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "brown",
"start_offset" : 10,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "dog",
"start_offset" : 16,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 3
}
]
}
Showing the analysis with a filter
Specifying "lowercase" in "filter" converts the tokens to lowercase.
You can see that the leading "Brown" is now analyzed as "brown".
Command
GET /library/_analyze
{
"tokenizer": "standard",
"filter": ["lowercase"],
"text": "Brown fox brown dog"
}
Result
{
"tokens" : [
{
"token" : "brown",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "fox",
"start_offset" : 6,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "brown",
"start_offset" : 10,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "dog",
"start_offset" : 16,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 3
}
]
}
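Showing the analysis with multiple filters
Specifying "lowercase" and "unique" in "filter" converts the tokens to lowercase and removes duplicate words.
You can see that "Brown" is lowercased and that every "brown" after the first one is removed.
Command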
GET /library/_analyze
{
"tokenizer": "standard",
"filter": ["lowercase","unique"],
"text": "Brown brown brown fox brown dog"
}
Result
{
"tokens" : [
{
"token" : "brown",
"start_offset" : 0,
"end_offset" : 5,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "fox",
"start_offset" : 18,
"end_offset" : 21,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "dog",
"start_offset" : 28,
"end_offset" : 31,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
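An analysis chain is a tokenizer followed by token filters applied in order. The Python sketch below imitates the request above: a regular expression on runs of letters and digits stands in for the standard tokenizer (adequate for this input only), followed by "lowercase" and then "unique", which keeps the first occurrence of each term. Positions are numbered consecutively after filtering, matching the output above. All function names are hypothetical.

```python
import re


def tokenize(text):
    """(token, start_offset, end_offset) for runs of letters and digits."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9]+", text)]


def lowercase(tokens):
    return [(t.lower(), s, e) for t, s, e in tokens]


def unique(tokens):
    """Keep only the first occurrence of each term."""
    seen, out = set(), []
    for t, s, e in tokens:
        if t not in seen:
            seen.add(t)
            out.append((t, s, e))
    return out


def analyze(text, filters):
    tokens = tokenize(text)
    for token_filter in filters:  # filters run in the order given
        tokens = token_filter(tokens)
    return [
        {"token": t, "start_offset": s, "end_offset": e, "position": i}
        for i, (t, s, e) in enumerate(tokens)
    ]
```

Swapping the order to `[unique, lowercase]` would keep both "Brown" and "brown", since "unique" would run before the case is normalized.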
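Showing how the tokenizer changes the result
This compares the results of the "standard" and "letter" tokenizers.
You can see that "quick.brown_fox" is analyzed differently: "standard" keeps it as one token, while "letter" splits it at the non-letter characters (and drops the numbers entirely).
Command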
GET /library/_analyze
{
"tokenizer": "standard",
"filter": ["lowercase"],
"text": "THE quick.brown_FOx Jumped! $19.95 @ 3.0"
}
GET /library/_analyze
{
"tokenizer": "letter",
"filter": ["lowercase"],
"text": "THE quick.brown_FOx Jumped! $19.95 @ 3.0"
}
Result
# GET /library/_analyze
{
"tokens" : [
{
"token" : "the",
"start_offset" : 0,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "quick.brown_fox",
"start_offset" : 4,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "jumped",
"start_offset" : 20,
"end_offset" : 26,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "19.95",
"start_offset" : 29,
"end_offset" : 34,
"type" : "<NUM>",
"position" : 3
},
{
"token" : "3.0",
"start_offset" : 37,
"end_offset" : 40,
"type" : "<NUM>",
"position" : 4
}
]
}
# GET /library/_analyze
{
"tokens" : [
{
"token" : "the",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "quick",
"start_offset" : 4,
"end_offset" : 9,
"type" : "word",
"position" : 1
},
{
"token" : "brown",
"start_offset" : 10,
"end_offset" : 15,
"type" : "word",
"position" : 2
},
{
"token" : "fox",
"start_offset" : 16,
"end_offset" : 19,
"type" : "word",
"position" : 3
},
{
"token" : "jumped",
"start_offset" : 20,
"end_offset" : 26,
"type" : "word",
"position" : 4
}
]
}
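The difference comes from where each tokenizer splits. The letter tokenizer breaks on every character that is not a letter, so digits and punctuation disappear. The standard tokenizer follows the Unicode word-boundary rules (UAX #29), under which "." between letters and "_" do not break a word and "19.95" stays one number. The two regular expressions below are rough approximations that reproduce both outputs for this sentence only; they are not the real rules.

```python
import re

TEXT = "THE quick.brown_FOx Jumped! $19.95 @ 3.0"

# Rough approximations of the two tokenizers, tuned to this sentence only.
STANDARD_LIKE = re.compile(r"[A-Za-z0-9]+(?:[._][A-Za-z0-9]+)*")
LETTER_LIKE = re.compile(r"[A-Za-z]+")


def tokens(pattern, text):
    # Apply the "lowercase" filter after tokenizing, as in the requests above.
    return [(m.group().lower(), m.start(), m.end()) for m in pattern.finditer(text)]


standard = tokens(STANDARD_LIKE, TEXT)
letter = tokens(LETTER_LIKE, TEXT)
```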
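Showing the analysis of Japanese text
Elasticsearch does not include a tokenizer for Japanese by default, so the kuromoji plugin must be installed beforehand, as in the installation step above.
Command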
GET /library/_analyze
{
"tokenizer": "kuromoji_tokenizer",
"text": "記者が汽車で帰社した"
}
Result
{
"tokens" : [
{
"token" : "記者",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
},
{
"token" : "が",
"start_offset" : 2,
"end_offset" : 3,
"type" : "word",
"position" : 1
},
{
"token" : "汽車",
"start_offset" : 3,
"end_offset" : 5,
"type" : "word",
"position" : 2
},
{
"token" : "で",
"start_offset" : 5,
"end_offset" : 6,
"type" : "word",
"position" : 3
},
{
"token" : "帰社",
"start_offset" : 6,
"end_offset" : 8,
"type" : "word",
"position" : 4
},
{
"token" : "し",
"start_offset" : 8,
"end_offset" : 9,
"type" : "word",
"position" : 5
},
{
"token" : "た",
"start_offset" : 9,
"end_offset" : 10,
"type" : "word",
"position" : 6
}
]
}