Getting Started with Elasticsearch

2 years ago

文, 翔


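This article is an introductory tutorial on the full-text search engine Elasticsearch.

What is Elasticsearch?

Elasticsearch is an open-source full-text search engine developed by Elastic. It can quickly find the documents that contain a given term among a large number of documents.

Elasticsearch is operated through a RESTful interface, but queries can also be written as SQL statements with Elasticsearch SQL.

People who are used to relational databases such as Oracle or MySQL may find it hard to get started at first.

However, the Elasticsearch API is simple, so there is no need to worry.

What is the Elastic Stack?

Elastic Stack is the collective name for the products related to Elasticsearch. Up to version 2.x it was called "ELK"; from version 5.0 it was renamed "Elastic Stack".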

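Product        Function
Elasticsearch  Stores and searches documents.
Kibana         Visualizes data.
Logstash       Ingests and transforms data from data sources.
Beats          Ingests data from data sources.
X-Pack         Adds security, monitoring, watch (alerting), reporting, and graph features.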

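For more about Logstash, see the separate article "Getting Started with Logstash".

Environment

Kibana 8.1.0

Installation

Open the Terminal on your Mac and install the required software.

Elasticsearch is downloaded with the wget command, so install wget first.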

$ brew install wget

Install Elasticsearch and "kuromoji", the plugin for Japanese search.

Recent versions (8.0 and later) support M1 Macs, but note that the download links are separate.
• Download link for Intel Macs:
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.0-darwin-x86_64.tar.gz
• Download link for M1 Macs:
https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.0-darwin-aarch64.tar.gz

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.0-darwin-x86_64.tar.gz
$ tar -xzf elasticsearch-8.1.0-darwin-x86_64.tar.gz
$ cd elasticsearch-8.1.0
$ bin/elasticsearch-plugin install analysis-kuromoji
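
As a small aid for picking the right archive, the following Python sketch builds the download URL from the machine architecture. The function name elasticsearch_url is made up for this illustration, the URL pattern is the one in the two links listed above, and the sketch assumes that platform.machine() reports "arm64" on an M1 Mac.

```python
import platform

BASE = "https://artifacts.elastic.co/downloads/elasticsearch"


def elasticsearch_url(machine: str, version: str = "8.1.0") -> str:
    """Return the macOS tarball URL for the given machine architecture."""
    # Assumption: Apple Silicon reports "arm64"; anything else is treated as Intel.
    arch = "aarch64" if machine in ("arm64", "aarch64") else "x86_64"
    return f"{BASE}/elasticsearch-{version}-darwin-{arch}.tar.gz"


if __name__ == "__main__":
    # Print the URL that matches the machine this script runs on.
    print(elasticsearch_url(platform.machine()))
```

On an M1 Mac, the wget and tar commands above would then use the aarch64 file name instead of the x86_64 one.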

Starting with version 8.0, security is enabled by default, but this tutorial temporarily disables it. Edit config/elasticsearch.yml and set "xpack.security.enabled" to false.

# Enable security features
xpack.security.enabled: false

Start Elasticsearch.

$ bin/elasticsearch

Open http://localhost:9200/ in a browser.
If the following JSON is displayed, startup succeeded.

{
  "name" : "username.local",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "qJnOX-ukSU-nX6hjUViLnA",
  "version" : {
    "number" : "8.1.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "3700f7679f7d95e36da0b43762189bab189bc53a",
    "build_date" : "2022-03-03T14:20:00.690422633Z",
    "build_snapshot" : false,
    "lucene_version" : "9.0.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

If you get an "ERR_CONNECTION_REFUSED" error, open config/elasticsearch.yml and check that "xpack.security.enabled" is set to false.

Next, install Kibana.

$ wget https://artifacts.elastic.co/downloads/kibana/kibana-8.1.0-darwin-x86_64.tar.gz
$ tar -xzf kibana-8.1.0-darwin-x86_64.tar.gz
$ cd kibana-8.1.0-darwin-x86_64/

Start Kibana.

$ bin/kibana

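Open http://localhost:5601/ in a browser.
If the Kibana screen is displayed, startup succeeded.

Before starting the tutorial

The commands used in this article are published on GitHub for reference.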
https://github.com/nskydiving/elasticsearch_examples

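This tutorial uses Kibana's "Dev Tools" to operate Elasticsearch through its RESTful interface.

Select "Dev Tools" from the Kibana menu.

The console screen is displayed.

Enter a command in the left pane of the console and click the run button (the green play button). The selected command is executed and its result appears in the right pane.

Let's try running the following command.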

GET _search
{
  "query": {
    "match_all": {}
  }
}

If the result appears in the right pane, it worked.

If you are not on a Mac, see the following links:
https://www.elastic.co/jp/downloads/elasticsearch
https://www.elastic.co/jp/downloads/kibana
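
Because the console simply sends HTTP requests, the "GET _search" command tried earlier in this section can also be issued from a script. The following is a minimal sketch using only the Python standard library; the helper names build_request and send are made up for this illustration, and the URL assumes a local node on port 9200 with security disabled as configured above.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node with security disabled


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request equivalent to a console command."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        ES_URL + "/" + path.lstrip("/"),
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send the request and decode the JSON response (needs a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Equivalent of the console command "GET _search" with a match_all query.
search_all = build_request("GET", "_search", {"query": {"match_all": {}}})
```

Calling send(search_all) while Elasticsearch is running should return the same JSON that the console shows in its right pane.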

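CRUD operations (RESTful API)

Elasticsearch uses different terminology from relational databases, but the terms map roughly as follows.

Elasticsearch  Relational database
Index          Database
Type           Table
Document       Record

Since version 6.0, specifying a "Type" has been deprecated, and "_doc" is used in place of a type name.

Creating a document

To create a document, send a PUT request to "/index/type/document ID" and pass the document content as JSON.

Command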

#    +--- Index name
#    |       +--- Type name
#    |       |     +--- Document ID
#    |       |     |
#    V       V     V
PUT /library/_doc/1
{
  "title": "Norwegian Wood",
  "name": {
    "first": "Haruki",
    "last": "Murakami"
  },
  "publish_date": "1987-09-04T00:00:00+0900",
  "price": 19.95
}

Result

{
  "_index" : "library",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 15,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 22,
  "_primary_term" : 1
}

Getting a document

To get a document, send a GET request to "/index/type/document ID".

Command

GET /library/_doc/1

Result

{
  "_index" : "library",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 15,
  "_seq_no" : 22,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "Norwegian Wood",
    "name" : {
      "first" : "Haruki",
      "last" : "Murakami"
    },
    "publish_date" : "1987-09-04T00:00:00+0900",
    "price" : 19.95
  }
}

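Creating a document without specifying a document ID

If you create a document without specifying a document ID, an ID is assigned automatically. You can check the assigned ID in the result.

Command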

POST /library/_doc/
{
  "title": "Kafka on the Shore",
  "name": {
    "first": "Haruki",
    "last": "Murakami"
  },
  "publish_date": "2002-09-12T00:00:00+0900",
  "price": 19.95
}

Result

{
  "_index" : "library",
  "_type" : "_doc",
  "_id" : "q2aZVmoBFWFSqRl8nY0k",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 23,
  "_primary_term" : 1
}

Confirming the document created without a document ID

Command

# Specify the id returned by POST /library/_doc/
GET /library/_doc/q2aZVmoBFWFSqRl8nY0k

Result

{
  "_index" : "library",
  "_type" : "_doc",
  "_id" : "q2aZVmoBFWFSqRl8nY0k",
  "_version" : 1,
  "_seq_no" : 23,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "Kafka on the Shore",
    "name" : {
      "first" : "Haruki",
      "last" : "Murakami"
    },
    "publish_date" : "2002-09-12T00:00:00+0900",
    "price" : 19.95
  }
}

Overwriting a document

To overwrite a document, send a PUT request to "/index/type/document ID".

Command

PUT /library/_doc/1
{
  "title": "Norwegian Wood",
  "name": {
    "first": "Haruki",
    "last": "Murakami"
  },
  "publish_date": "1987-09-04T00:00:00+0900",
  "price": 29.95
}

Result

{
  "_index" : "library",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 18,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 26,
  "_primary_term" : 1
}

Confirming that the document was overwritten

Command

GET /library/_doc/1

Result

{
  "_index" : "library",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 18,
  "_seq_no" : 26,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "Norwegian Wood",
    "name" : {
      "first" : "Haruki",
      "last" : "Murakami"
    },
    "publish_date" : "1987-09-04T00:00:00+0900",
    "price" : 29.95
  }
}

Partially updating a document

To update part of a document, send a POST request to "/index/_update/document ID" and specify "doc" in the JSON.

Command

POST /library/_update/1
{
  "doc": {
    "price": 10
  }
}

Result

{
  "_index" : "library",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 19,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 27,
  "_primary_term" : 1
}

Confirming the partial update

Command

GET /library/_doc/1

Result

{
  "_index" : "library",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 19,
  "_seq_no" : 27,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "Norwegian Wood",
    "name" : {
      "first" : "Haruki",
      "last" : "Murakami"
    },
    "publish_date" : "1987-09-04T00:00:00+0900",
    "price" : 10
  }
}

Adding a field to a document

To add a field to a document, send a POST request to "/index/_update/document ID" and specify "doc" in the JSON.

Command

POST /library/_update/1
{
  "doc": {
    "price_jpy": 1800
  }
}

Result

{
  "_index" : "library",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 20,
  "result" : "updated",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 28,
  "_primary_term" : 1
}

Confirming that the field was added

Command

GET /library/_doc/1

Result

{
  "_index" : "library",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 20,
  "_seq_no" : 28,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "Norwegian Wood",
    "name" : {
      "first" : "Haruki",
      "last" : "Murakami"
    },
    "publish_date" : "1987-09-04T00:00:00+0900",
    "price" : 10,
    "price_jpy" : 1800
  }
}
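
Both the partial update and the field addition rely on the same behavior: the content of "doc" is merged into the stored document instead of replacing it. The following Python sketch imitates that merge to make the behavior concrete. It is an illustration only, not Elasticsearch's implementation; the function name merge_doc is made up, and the recursive handling of nested objects is an assumption that the examples in this article do not exercise.

```python
import copy


def merge_doc(source, partial):
    """Return a copy of `source` with `partial` merged in, recursing into objects."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged field by field.
            merged[key] = merge_doc(merged[key], value)
        else:
            # Scalars and arrays are replaced; unknown keys become new fields.
            merged[key] = copy.deepcopy(value)
    return merged


stored = {
    "title": "Norwegian Wood",
    "name": {"first": "Haruki", "last": "Murakami"},
    "publish_date": "1987-09-04T00:00:00+0900",
    "price": 29.95,
}
after_price = merge_doc(stored, {"price": 10})             # partial update
after_field = merge_doc(after_price, {"price_jpy": 1800})  # field addition
```

Here after_field has the same shape as the "_source" returned by the last GET: the price is 10, price_jpy is present, and the untouched fields are unchanged.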

Deleting a document

To delete a document, send a DELETE request to "/index/type/document ID".

Command

DELETE /library/_doc/1

Result

{
  "_index" : "library",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 21,
  "result" : "deleted",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 29,
  "_primary_term" : 1
}

Confirming that the document was deleted

Command

GET /library/_doc/1

Result

{
  "_index" : "library",
  "_type" : "_doc",
  "_id" : "1",
  "found" : false
}

Deleting an index

To delete an index, send a DELETE request to "/index".

Command

DELETE /library

Result

{
  "acknowledged" : true
}

Confirming that the index was deleted

Command

GET /library/_doc/2

Result

{
  "error" : {
    "root_cause" : [
      {
        "type" : "index_not_found_exception",
        "reason" : "no such index [library]",
        "resource.type" : "index_expression",
        "resource.id" : "library",
        "index_uuid" : "_na_",
        "index" : "library"
      }
    ],
    "type" : "index_not_found_exception",
    "reason" : "no such index [library]",
    "resource.type" : "index_expression",
    "resource.id" : "library",
    "index_uuid" : "_na_",
    "index" : "library"
  },
  "status" : 404
}

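Searching documents

Search the documents stored in an index.

As preparation, delete the index and then create documents as test data.

To create multiple documents at once, send a POST request to "/index/_bulk".

Command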

DELETE /library

POST /library/_bulk
{"index": {"_id": 1}}
{"title": "The quick brown fox", "price": 5}
{"index": {"_id": 2}}
{"title": "The quick brown fox jumps over the lazy dog", "price": 15}
{"index": {"_id": 3}}
{"title": "The quick brown fox jumps over the quick dog", "price": 8}
{"index": {"_id": 4}}
{"title": "Brown fox and brown dog", "price": 2}
{"index": {"_id": 5}}
{"title": "Lazy dog", "price": 9}
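
The body of a _bulk request is not a single JSON document: each document takes two lines, an action line followed by a source line. The following Python sketch builds such a body from a dictionary, which may help when generating test data from a script. The function name bulk_body is made up for this illustration.

```python
import json


def bulk_body(docs):
    """Build the newline-delimited JSON body for a _bulk request.

    `docs` maps document IDs to document sources.
    """
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # source line
    # Every line, including the last one, is terminated by a newline.
    return "\n".join(lines) + "\n"


body = bulk_body({
    1: {"title": "The quick brown fox", "price": 5},
    2: {"title": "Lazy dog", "price": 9},
})
```

The resulting string has the same line structure as the command above and could be sent as the body of POST /library/_bulk.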

Searching all documents

To search all documents, send a GET request to "/index/_search".

You can also limit the number of results with the size parameter, for example "/library/_search?size=3".

Command

GET /library/_search

Result

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "title" : "The quick brown fox",
          "price" : 5
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "title" : "The quick brown fox jumps over the lazy dog",
          "price" : 15
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "title" : "The quick brown fox jumps over the quick dog",
          "price" : 8
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "title" : "Brown fox and brown dog",
          "price" : 2
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "title" : "Lazy dog",
          "price" : 9
        }
      }
    ]
  }
}

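Searching for documents that contain a given word

To search for documents that contain a given word, send a GET request to "/index/_search" and specify a "match" query in the JSON.

Here we search for documents whose "title" contains "fox".

Command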

GET /library/_search
{
  "query": {
    "match": {
      "title": "fox"
    }
  }
}

Result

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 0.32951736,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.32951736,
        "_source" : {
          "title" : "The quick brown fox",
          "price" : 5
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.30488566,
        "_source" : {
          "title" : "Brown fox and brown dog",
          "price" : 2
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.23470737,
        "_source" : {
          "title" : "The quick brown fox jumps over the lazy dog",
          "price" : 15
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.23470737,
        "_source" : {
          "title" : "The quick brown fox jumps over the quick dog",
          "price" : 8
        }
      }
    ]
  }
}

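Searching with an OR condition

To search with an OR condition, send a GET request to "/index/_search" and specify a "match" query in the JSON. When the query text contains several words, a document matches if it contains any of them.

Here we search for documents whose "title" contains "quick" or "dog".

Command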

GET /library/_search
{
  "query": {
    "match": {
      "title": "quick dog"
    }
  }
}

Result

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 0.8762741,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.8762741,
        "_source" : {
          "title" : "The quick brown fox jumps over the quick dog",
          "price" : 8
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.6744513,
        "_source" : {
          "title" : "The quick brown fox jumps over the lazy dog",
          "price" : 15
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6173784,
        "_source" : {
          "title" : "The quick brown fox",
          "price" : 5
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.3930218,
        "_source" : {
          "title" : "Lazy dog",
          "price" : 9
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.30488566,
        "_source" : {
          "title" : "Brown fox and brown dog",
          "price" : 2
        }
      }
    ]
  }
}

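Searching for a phrase that contains spaces

To search for documents that contain a phrase with spaces in it, send a GET request to "/index/_search" and specify a "match_phrase" query in the JSON.

Here we search for documents whose "title" contains the phrase "quick dog".

Command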

GET /library/_search
{
  "query": {
    "match_phrase": {
      "title": "quick dog"
    }
  }
}

Result

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.67445135,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.67445135,
        "_source" : {
          "title" : "The quick brown fox jumps over the quick dog",
          "price" : 8
        }
      }
    ]
  }
}
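
The difference between "match" and "match_phrase" can be reproduced on the five test titles with a few lines of Python. This is a simplified model for illustration, not how Elasticsearch evaluates queries: it assumes lowercasing plus whitespace splitting as a stand-in for the analyzer, ignores scoring, and uses made-up function names.

```python
TITLES = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox and brown dog",
    5: "Lazy dog",
}


def tokens(text):
    """Lowercase and split on whitespace (a stand-in for the analyzer)."""
    return text.lower().split()


def match(query):
    """IDs of titles that contain at least one query word (OR semantics)."""
    words = set(tokens(query))
    return sorted(i for i, title in TITLES.items() if words & set(tokens(title)))


def match_phrase(query):
    """IDs of titles that contain the query words consecutively and in order."""
    phrase = tokens(query)
    n = len(phrase)
    return sorted(
        i for i, title in TITLES.items()
        if any(tokens(title)[k:k + n] == phrase
               for k in range(len(tokens(title)) - n + 1))
    )
```

With this model, match("quick dog") selects all five documents while match_phrase("quick dog") selects only document 3, which agrees with the hit counts in the two results above.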

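Showing how the search score is calculated

When documents are searched, their relevance to the given words is calculated as a score.

Adding the "explain" parameter to the command lets you check how the score was calculated.

Command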

GET /library/_search?explain
{
  "query": {
    "match": {
      "title": "quick"
    }
  }
}

Result

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.64156675,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.64156675,
        "_source" : {
          "title" : "The quick brown fox jumps over the quick dog",
          "price" : 8
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.6173784,
        "_source" : {
          "title" : "The quick brown fox",
          "price" : 5
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.43974394,
        "_source" : {
          "title" : "The quick brown fox jumps over the lazy dog",
          "price" : 15
        }
      }
    ]
  }
}
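
To make the numbers less opaque, the following Python sketch recomputes the scores of the "quick" search by hand. It assumes that the index uses the BM25 similarity with the parameters k1 = 1.2 and b = 0.75, that all five documents sit in one shard, and that field length is the number of whitespace-separated words; the function name bm25 is made up. It is a sketch under those assumptions rather than a description of the exact internals.

```python
import math

TITLES = {
    1: "The quick brown fox",
    2: "The quick brown fox jumps over the lazy dog",
    3: "The quick brown fox jumps over the quick dog",
    4: "Brown fox and brown dog",
    5: "Lazy dog",
}
K1, B = 1.2, 0.75  # assumed BM25 parameters


def bm25(term, doc_id):
    """Score one single-word query against one title."""
    docs = {i: title.lower().split() for i, title in TITLES.items()}
    n_docs = len(docs)
    n_with_term = sum(1 for words in docs.values() if term in words)
    avg_len = sum(len(words) for words in docs.values()) / n_docs
    freq = docs[doc_id].count(term)
    # Rarer terms get a higher inverse document frequency.
    idf = math.log(1 + (n_docs - n_with_term + 0.5) / (n_with_term + 0.5))
    # Term frequency saturates and is normalized by the field length.
    tf = freq / (freq + K1 * (1 - B + B * len(docs[doc_id]) / avg_len))
    return (K1 + 1) * idf * tf


if __name__ == "__main__":
    for doc_id in (3, 1, 2):
        print(doc_id, round(bm25("quick", doc_id), 6))
```

Under these assumptions the computed values agree with the "_score" values in the result above to about four decimal places: document 3 scores highest because "quick" occurs twice, and document 1 beats document 2 because its title is shorter.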

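Searching with an AND condition

To search with an AND condition, send a GET request to "/index/_search" and specify a "bool" query in the JSON.

Here we search for documents whose "title" contains both the word "quick" and the phrase "lazy dog".

Command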

GET /library/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "quick"
          }
        },
        {
          "match_phrase": {
            "title": "lazy dog"
          }
        }
      ]
    }
  }
}

Result

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.3887084,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.3887084,
        "_source" : {
          "title" : "The quick brown fox jumps over the lazy dog",
          "price" : 15
        }
      }
    ]
  }
}

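Weighting the search score

To weight the search score, send a GET request to "/index/_search" and specify "boost" in the JSON.

Here we halve the score of documents whose "title" contains the phrase "quick dog".

Command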

GET /library/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "title": {
              "query": "quick dog",
              "boost": 0.5
            }
          }
        },
        {
          "match_phrase": {
            "title": {
              "query": "lazy dog"
            }
          }
        }
      ]
    }
  }
}

Result

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.5890584,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.5890584,
        "_source" : {
          "title" : "Lazy dog",
          "price" : 9
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.9489645,
        "_source" : {
          "title" : "The quick brown fox jumps over the lazy dog",
          "price" : 15
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.33722568,
        "_source" : {
          "title" : "The quick brown fox jumps over the quick dog",
          "price" : 8
        }
      }
    ]
  }
}

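Highlighting search results

To highlight the search results, send a GET request to "/index/_search" and specify "highlight" in the JSON.

The matched strings in the results are output wrapped in em tags.

Command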

GET /library/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match_phrase": {
            "title": {
              "query": "quick dog",
              "boost": 0.5
            }
          }
        },
        {
          "match_phrase": {
            "title": {
              "query": "lazy dog"
            }
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}

Result

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.5890584,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.5890584,
        "_source" : {
          "title" : "Lazy dog",
          "price" : 9
        },
        "highlight" : {
          "title" : [
            "<em>Lazy</em> <em>dog</em>"
          ]
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.9489645,
        "_source" : {
          "title" : "The quick brown fox jumps over the lazy dog",
          "price" : 15
        },
        "highlight" : {
          "title" : [
            "The quick brown fox jumps over the <em>lazy</em> <em>dog</em>"
          ]
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.33722568,
        "_source" : {
          "title" : "The quick brown fox jumps over the quick dog",
          "price" : 8
        },
        "highlight" : {
          "title" : [
            "The quick brown fox jumps over the <em>quick</em> <em>dog</em>"
          ]
        }
      }
    ]
  }
}
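
The shape of the "highlight" output can be imitated with a short Python sketch that wraps each word of a matched phrase in em tags. This is only an illustration of the output format seen above; Elasticsearch's highlighters work differently internally, and the function name highlight_phrase is made up.

```python
import re


def highlight_phrase(text, phrase, tag="em"):
    """Wrap each word of every occurrence of `phrase` in <tag>...</tag>."""
    words = list(re.finditer(r"\w+", text))
    wanted = phrase.lower().split()
    hit = set()  # indexes of the words that belong to a matched phrase
    for start in range(len(words) - len(wanted) + 1):
        window = [m.group().lower() for m in words[start:start + len(wanted)]]
        if window == wanted:
            hit.update(range(start, start + len(wanted)))
    out, pos = [], 0
    for index, m in enumerate(words):
        out.append(text[pos:m.start()])  # keep the text between words as-is
        word = m.group()
        out.append(f"<{tag}>{word}</{tag}>" if index in hit else word)
        pos = m.end()
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(highlight_phrase("The quick brown fox jumps over the quick dog", "quick dog"))
```

As in the result for document 3, only the "quick" that is directly followed by "dog" is wrapped; the earlier "quick" is left alone.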

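Searching with a filter

To search documents with a filter, send a GET request to "/index/_search" and specify "filter" in the JSON.

Here we search for documents whose price is between 5 and 10.

Command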

GET /library/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "price": {
            "gte": 5,
            "lte": 10
          }
        }
      }
    }
  }
}

Result

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "title" : "The quick brown fox",
          "price" : 5
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "title" : "The quick brown fox jumps over the quick dog",
          "price" : 8
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.0,
        "_source" : {
          "title" : "Lazy dog",
          "price" : 9
        }
      }
    ]
  }
}

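Searching with another query combined with a filter

To combine another query with a filter, send a GET request to "/index/_search" and specify both the other query and "filter" in the JSON.

Here we search for documents whose "title" contains "lazy dog" and whose price is 5 or more.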

Command

GET /library/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "title": "lazy dog"
          }
        }
      ],
      "filter": {
        "range": {
          "price": {
            "gte": 5
          }
        }
      }
    }
  }
}

Result

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.5890584,
    "hits" : [
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.5890584,
        "_source" : {
          "title" : "Lazy dog",
          "price" : 9
        }
      },
      {
        "_index" : "library",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.9489645,
        "_source" : {
          "title" : "The quick brown fox jumps over the lazy dog",
          "price" : 15
        }
      }
    ]
  }
}
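
The two filter examples can be checked against the test data with a small Python model in which the phrase condition and the price range are applied together. It is a simplified sketch that only decides which documents match and ignores scores; the helper names has_phrase, in_range and search are made up for this illustration.

```python
DOCS = {
    1: {"title": "The quick brown fox", "price": 5},
    2: {"title": "The quick brown fox jumps over the lazy dog", "price": 15},
    3: {"title": "The quick brown fox jumps over the quick dog", "price": 8},
    4: {"title": "Brown fox and brown dog", "price": 2},
    5: {"title": "Lazy dog", "price": 9},
}


def has_phrase(title, phrase):
    """True if the words of `phrase` occur consecutively in `title`."""
    words, wanted = title.lower().split(), phrase.lower().split()
    return any(words[k:k + len(wanted)] == wanted
               for k in range(len(words) - len(wanted) + 1))


def in_range(value, gte=None, lte=None):
    """True if `value` lies within the optional inclusive bounds."""
    return (gte is None or value >= gte) and (lte is None or value <= lte)


def search(phrase=None, **price_range):
    """IDs of documents that satisfy the optional phrase and the price range."""
    return sorted(
        doc_id for doc_id, doc in DOCS.items()
        if (phrase is None or has_phrase(doc["title"], phrase))
        and in_range(doc["price"], **price_range)
    )
```

search(gte=5, lte=10) selects documents 1, 3 and 5, and search("lazy dog", gte=5) selects documents 2 and 5, matching the hits in the two results above.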

Mapping

Elasticsearch is a schema-less database: mappings do not have to be defined in advance.

As preparation, delete the index and then create documents as test data.

Command

DELETE /library

POST /library/_bulk
{"index": {"_id": 1}}
{"title": "The quick brown fox", "price": 5}
{"index": {"_id": 2}}
{"title": "The quick brown fox jumps over the lazy dog", "price": 15}
{"index": {"_id": 3}}
{"title": "The quick brown fox jumps over the quick dog", "price": 8}
{"index": {"_id": 4}}
{"title": "Brown fox and brown dog", "price": 2}
{"index": {"_id": 5}}
{"title": "Lazy dog", "price": 9}

Getting the mapping

Command

GET /library/_mapping

Result

{
  "library" : {
    "mappings" : {
      "properties" : {
        "price" : {
          "type" : "long"
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
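
The mapping above was never declared; it was derived from the indexed documents. The following Python sketch imitates that derivation for a few JSON value types. It is a simplified model: the function names are made up, strings that look like dates are not detected, and arrays and null values are not handled.

```python
def infer_field(value):
    """Guess the mapping a JSON value would receive when it is first indexed."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Simplification: date-like strings are treated as plain text here.
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    if isinstance(value, dict):
        return {"properties": infer_mapping(value)}
    raise TypeError(f"unsupported value: {value!r}")


def infer_mapping(doc):
    """Infer the "properties" section for one document."""
    return {name: infer_field(value) for name, value in doc.items()}


properties = infer_mapping({"title": "The quick brown fox", "price": 5})
```

For the test documents, properties has the same content as the "properties" section returned by GET /library/_mapping.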

Adding a mapping

Add a new mapping for a field named "my_new_field".

Command

PUT /library/_mapping
{
  "properties": {
    "my_new_field": {
      "type": "text"
    }
  }
}

Result

{
  "acknowledged" : true
}

Confirming that the mapping was added

Command

GET /library/_mapping

Result

{
  "library" : {
    "mappings" : {
      "properties" : {
        "my_new_field" : {
          "type" : "text"
        },
        "price" : {
          "type" : "long"
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Adding a mapping with an analyzer

To set an analyzer for a mapping, specify "analyzer".

Command

PUT /library/_mapping
{
  "properties": {
    "english_field": {
      "type": "text",
      "analyzer": "english"
    }
  }
}

Result

{
  "acknowledged" : true
}

Confirming the mapping with the analyzer

Command

GET /library/_mapping

Result

{
  "library" : {
    "mappings" : {
      "properties" : {
        "english_field" : {
          "type" : "text",
          "analyzer" : "english"
        },
        "my_new_field" : {
          "type" : "text"
        },
        "price" : {
          "type" : "long"
        },
        "title" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

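Mappings cannot be changed

The type of a field that has already been mapped cannot be changed, so the following command returns an error.

Command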

PUT /library/_mapping
{
  "properties": {
    "english_field": {
      "type": "double"
    }
  }
}

Result

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "mapper [english_field] of different type, current_type [text], merged_type [double]"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "mapper [english_field] of different type, current_type [text], merged_type [double]"
  },
  "status": 400
}

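How a type mismatch affects search results

Two documents whose "id" values are 234571 and 1392.223 are added to "/log", and a search is run with the condition "id greater than 1392".

We would expect both 234571 and 1392.223 to be found, but only 234571 is actually found.

Command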

POST /log/_doc
{
  "id": 234571
}

POST /log/_doc
{
  "id": 1392.223
}

GET /log/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "id": {
            "gt": 1392
          }
        }
      }
    }
  }
}

Result

# POST /log/_doc
{
  "_index" : "log",
  "_type" : "_doc",
  "_id" : "r2axVmoBFWFSqRl86Y1N",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

# POST /log/_doc
{
  "_index" : "log",
  "_type" : "_doc",
  "_id" : "sGayVmoBFWFSqRl8S414",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 1,
  "_primary_term" : 1
}

# GET /log/_search
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "log",
        "_type" : "_doc",
        "_id" : "r2axVmoBFWFSqRl86Y1N",
        "_score" : 0.0,
        "_source" : {
          "id" : 234571
        }
      }
    ]
  }
}

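Confirming how the type mismatch affects search results

Getting the mapping shows that the type of "id" is long.

Searching the whole "log" index also shows that the document with the mismatched "id" of 1392.223 was added as well. Because the first document caused "id" to be mapped as long, the value 1392.223 is most likely indexed as the integer 1392 (with the default settings the fractional part is dropped), which is why it does not satisfy "greater than 1392" even though "_source" still shows 1392.223.

Command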

GET /log/_mapping

GET /log/_search

Result

# GET /log/_mapping
{
  "log" : {
    "mappings" : {
      "properties" : {
        "id" : {
          "type" : "long"
        }
      }
    }
  }
}

# GET /log/_search
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "log",
        "_type" : "_doc",
        "_id" : "r2axVmoBFWFSqRl86Y1N",
        "_score" : 1.0,
        "_source" : {
          "id" : 234571
        }
      },
      {
        "_index" : "log",
        "_type" : "_doc",
        "_id" : "sGayVmoBFWFSqRl8S414",
        "_score" : 1.0,
        "_source" : {
          "id" : 1392.223
        }
      }
    ]
  }
}
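
A plausible explanation for the missing hit can be sketched in Python. The sketch assumes that a value sent to a long field is indexed with its fractional part truncated while "_source" keeps the original value; that assumption, and the function names, are not taken from the results above.

```python
def index_as_long(value):
    """Value a long field would index, assuming fractions are truncated."""
    return int(value)  # int() truncates toward zero: 1392.223 -> 1392


def range_gt(sources, bound):
    """Return the original values whose indexed form is greater than `bound`."""
    return [value for value in sources if index_as_long(value) > bound]


sources = [234571, 1392.223]     # what _source keeps, unchanged
hits = range_gt(sources, 1392)   # the range query sees the indexed values
```

Under this assumption hits contains only 234571, because 1392.223 is compared as 1392, which is not greater than 1392. Defining the mapping explicitly with a floating-point type before indexing would avoid the surprise.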

Analysis

You can check how Elasticsearch analyzes a string.

As preparation, delete the index and then create documents as test data.

Command

DELETE /library

POST /library/_bulk
{"index": {"_id": 1}}
{"title": "The quick brown fox", "price": 5}
{"index": {"_id": 2}}
{"title": "The quick brown fox jumps over the lazy dog", "price": 15}
{"index": {"_id": 3}}
{"title": "The quick brown fox jumps over the quick dog", "price": 8}
{"index": {"_id": 4}}
{"title": "Brown fox and brown dog", "price": 2}
{"index": {"_id": 5}}
{"title": "Lazy dog", "price": 9}

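Showing the analysis result for a string

Show the analysis result for "Brown fox brown dog".

We can see that it is split into the four words "Brown", "fox", "brown", and "dog".

Command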

GET /library/_analyze
{
  "tokenizer": "standard",
  "text": "Brown fox brown dog"
}

Result

{
  "tokens" : [
    {
      "token" : "Brown",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "fox",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "brown",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "dog",
      "start_offset" : 16,
      "end_offset" : 19,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}

Showing the analysis result with a filter

If "lowercase" is specified in "filter", the tokens are converted to lowercase during analysis.

We can see that the first "Brown" is analyzed as "brown".

Command

GET /library/_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "Brown fox brown dog"
}

Result

{
  "tokens" : [
    {
      "token" : "brown",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "fox",
      "start_offset" : 6,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "brown",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "dog",
      "start_offset" : 16,
      "end_offset" : 19,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}

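Showing the analysis result with multiple filters

If "lowercase" and "unique" are specified in "filter", the tokens are converted to lowercase and duplicate words are removed.

We can see that "Brown" is converted to lowercase and that the repeated "brown" tokens are removed.

Command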

GET /library/_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase","unique"],
  "text": "Brown brown brown fox brown dog"
}

Result

{
  "tokens" : [
    {
      "token" : "brown",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "fox",
      "start_offset" : 18,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "dog",
      "start_offset" : 28,
      "end_offset" : 31,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}
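
The filters are applied one after another to the token list, so their order matters: "unique" only removes the capitalized "Brown" because "lowercase" ran first. The following Python sketch models that pipeline. It is an illustration with made-up function names, and a whitespace split stands in for the tokenizer.

```python
def lowercase(tokens):
    """Convert every token to lowercase."""
    return [token.lower() for token in tokens]


def unique(tokens):
    """Drop repeated tokens, keeping the first occurrence of each."""
    seen = set()
    kept = []
    for token in tokens:
        if token not in seen:
            seen.add(token)
            kept.append(token)
    return kept


def analyze(text, filters):
    """Split on whitespace, then apply the token filters in order."""
    tokens = text.split()
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    print(analyze("Brown brown brown fox brown dog", [lowercase, unique]))
```

With both filters the sketch yields brown, fox, dog, like the result above; with [unique, lowercase] instead, "Brown" and "brown" would be treated as different tokens before lowercasing, and "brown" would appear twice.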

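Showing how the analysis result differs by tokenizer

Compare the analysis results when "standard" and "letter" are specified as the tokenizer.

We can see that the results differ for the "quick.brown_fox" part.

Command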

GET /library/_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase"],
  "text": "THE quick.brown_FOx Jumped! $19.95 @ 3.0"
}

GET /library/_analyze
{
  "tokenizer": "letter",
  "filter": ["lowercase"],
  "text": "THE quick.brown_FOx Jumped! $19.95 @ 3.0"
}

Result

# GET /library/_analyze
{
  "tokens" : [
    {
      "token" : "the",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "quick.brown_fox",
      "start_offset" : 4,
      "end_offset" : 19,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "jumped",
      "start_offset" : 20,
      "end_offset" : 26,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "19.95",
      "start_offset" : 29,
      "end_offset" : 34,
      "type" : "<NUM>",
      "position" : 3
    },
    {
      "token" : "3.0",
      "start_offset" : 37,
      "end_offset" : 40,
      "type" : "<NUM>",
      "position" : 4
    }
  ]
}

# GET /library/_analyze
{
  "tokens" : [
    {
      "token" : "the",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "quick",
      "start_offset" : 4,
      "end_offset" : 9,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "brown",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "fox",
      "start_offset" : 16,
      "end_offset" : 19,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "jumped",
      "start_offset" : 20,
      "end_offset" : 26,
      "type" : "word",
      "position" : 4
    }
  ]
}
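
The behavior of the "letter" tokenizer in the second result is simple enough to approximate with one regular expression: every maximal run of letters becomes a token and everything else is dropped. The following Python sketch is such an approximation with a made-up function name; it is not the actual implementation, and it does not attempt to model the "standard" tokenizer.

```python
import re


def letter_tokenizer(text):
    """Split text into maximal runs of letters, dropping everything else."""
    # [^\W\d_] matches word characters that are neither digits nor underscores.
    return re.findall(r"[^\W\d_]+", text)


tokens = [
    token.lower()
    for token in letter_tokenizer("THE quick.brown_FOx Jumped! $19.95 @ 3.0")
]
```

The resulting list is the, quick, brown, fox, jumped, which matches the tokens in the second result: the dot and the underscore act as separators, and the numbers disappear entirely.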

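Showing the analysis result for text containing Japanese

Elasticsearch does not ship with a tokenizer for Japanese by default, so the kuromoji plugin must be installed beforehand.

Command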

GET /library/_analyze
{
  "tokenizer": "kuromoji_tokenizer",
  "text": "記者が汽車で帰社した"
}

Result

{
  "tokens" : [
    {
      "token" : "記者",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "が",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "汽車",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "で",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "帰社",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "word",
      "position" : 4
    },
    {
      "token" : "し",
      "start_offset" : 8,
      "end_offset" : 9,
      "type" : "word",
      "position" : 5
    },
    {
      "token" : "た",
      "start_offset" : 9,
      "end_offset" : 10,
      "type" : "word",
      "position" : 6
    }
  ]
}
