Importing CSV files into Elasticsearch with the elasticsearch-loader tool
Overview
The following are the steps for uploading a CSV file to Elasticsearch with elasticsearch-loader. Compared with Logstash, this approach is simpler and faster.
What is elasticsearch-loader?
A Python tool for bulk loading data files (json, parquet, csv, tsv) into Elasticsearch.
GitHub –> GitHub Repository.
Supported environments
python \ es   5.6.16   6.8.0   7.1.1
2.7           V        V       V
3.7           V        V       V
Installation
$ sudo pip install elasticsearch-loader
Usage
We will use a CSV file like this:
$ cat test.csv
id,name,age,address
01,taro,12,tokyo
02,hanako,13,kyoto
03,ichiro,16,osaka
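Internally, each CSV row becomes one JSON document. A rough sketch of that conversion, using only the standard csv module (this is an illustration, not the loader's actual code):

```python
import csv
import io

# Sample CSV content, matching test.csv above
CSV_TEXT = """id,name,age,address
01,taro,12,tokyo
02,hanako,13,kyoto
03,ichiro,16,osaka
"""

def csv_to_docs(text):
    """Parse CSV text into the dicts that get bulk-indexed.

    Note: every value stays a string -- the loader does no type
    coercion, so "age" is indexed as "12", not 12.
    """
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

docs = csv_to_docs(CSV_TEXT)
print(docs[0]["name"], docs[0]["age"])
```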
Run the following command to load the CSV file into Elasticsearch.
$ elasticsearch_loader --es-host <host:port> --index <IndexName> --type <TypeName> csv <FileName>
$ elasticsearch_loader --es-host 192.168.1.1:9200 --index student --type type csv test.csv
{'index': u'student', 'bulk_size': 500, 'http_auth': None, 'es_conn': <Elasticsearch([{u'host': u'192.168.1.1', u'port': 9200}])>, 'encoding': u'utf-8', 'keys': [], 'use_ssl': False, 'update': False, 'id_field': None, 'as_child': False, 'index_settings_file': None, 'timeout': 10.0, 'progress': False, 'ca_certs': None, 'with_retry': False, 'verify_certs': False, 'type': u'type', 'es_host': (u'192.168.1.1:9200',), 'delete': False}
[####################################]
Results
If the specified index and type do not exist, they are created automatically and the documents are registered successfully.
$ curl -H "Content-Type: application/json" -XGET 'http://192.168.1.1:9200/student/type/_search?pretty'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [
{
"_index" : "student",
"_type" : "type",
"_id" : "gECMZXABW66WYIIZTexw",
"_score" : 1.0,
"_source" : {
"age" : "12",
"address" : "tokyo",
"id" : "01",
"name" : "taro"
}
},
{
"_index" : "student",
"_type" : "type",
"_id" : "gkCMZXABW66WYIIZTexw",
"_score" : 1.0,
"_source" : {
"age" : "16",
"address" : "osaka",
"id" : "03",
"name" : "ichiro"
}
},
{
"_index" : "student",
"_type" : "type",
"_id" : "gUCMZXABW66WYIIZTexw",
"_score" : 1.0,
"_source" : {
"age" : "13",
"address" : "kyoto",
"id" : "02",
"name" : "hanako"
}
}
]
}
}
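Note in the output above that every field, including age, was indexed as a string, since the loader performs no type coercion. If typed fields are needed, a mapping can be supplied up front via --index-settings-file. A minimal sketch that writes such a settings file (the field types and shard count here are assumptions, and the single-type mapping layout targets ES 6.x; adjust for your cluster version):

```python
import json

# Hypothetical settings file for --index-settings-file: creates the
# index with typed fields instead of letting everything default to text.
index_settings = {
    "settings": {"number_of_shards": 5, "number_of_replicas": 1},
    "mappings": {
        "type": {  # must match the --type name used on the command line
            "properties": {
                "id": {"type": "keyword"},
                "name": {"type": "keyword"},
                "age": {"type": "integer"},
                "address": {"type": "keyword"},
            }
        }
    },
}

with open("student_settings.json", "w") as f:
    json.dump(index_settings, f, indent=2)
```

It would then be passed along with the other options, e.g. `elasticsearch_loader --es-host 192.168.1.1:9200 --index student --type type --index-settings-file student_settings.json csv test.csv`.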
Help
$ elasticsearch_loader -h
Usage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...
Options:
-c, --config-file TEXT Load default configuration file from esl.yml
--bulk-size INTEGER How many docs to collect before writing to
Elasticsearch (default 500)
--es-host TEXT Elasticsearch cluster entry point. (default
http://localhost:9200)
--verify-certs Make sure we verify SSL certificates
(default false)
--use-ssl Turn on SSL (default false)
--ca-certs TEXT Provide a path to CA certs on disk
--http-auth TEXT Provide username and password for basic auth
in the format of username:password
--index TEXT Destination index name [required]
--delete Delete index before import? (default false)
--update Merge and update existing doc instead of
overwrite
--progress Enable progress bar - NOTICE: in order to
show progress the entire input should be
collected and can consume more memory than
without progress bar
--type TEXT Docs type. TYPES WILL BE DEPRECATED IN APIS
IN ELASTICSEARCH 7, AND COMPLETELY REMOVED
IN 8. [required]
--id-field TEXT Specify field name that be used as document
id
--as-child Insert _parent, _routing field, the value is
same as _id. Note: must specify --id-field
explicitly
--with-retry Retry if ES bulk insertion failed
--index-settings-file FILENAME Specify path to json file containing index
mapping and settings, creates index if
missing
--timeout FLOAT Specify request timeout in seconds for
Elasticsearch client
--encoding TEXT Specify content encoding for input files
--keys TEXT Comma separated keys to pick from each
document
-h, --help Show this message and exit.
Commands:
csv
json FILES with the format of [{"a": "1"}, {"b": "2"}]
parquet
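Since the example above let Elasticsearch auto-generate _id values, re-running the import would create duplicate documents. Passing --id-field id makes the IDs deterministic, so re-imports overwrite instead of duplicating. A sketch of the action/source line pairs this produces (an illustration of the Elasticsearch _bulk API format, not the loader's exact internals):

```python
import json

docs = [
    {"id": "01", "name": "taro", "age": "12", "address": "tokyo"},
    {"id": "02", "name": "hanako", "age": "13", "address": "kyoto"},
]

def bulk_lines(docs, index="student", doc_type="type", id_field="id"):
    """Yield the action/source line pairs of the Elasticsearch bulk API.

    With a fixed _id per document, indexing the same row twice updates
    the existing document rather than creating a duplicate.
    """
    for doc in docs:
        yield json.dumps({"index": {"_index": index, "_type": doc_type,
                                    "_id": doc[id_field]}})
        yield json.dumps(doc)

payload = "\n".join(bulk_lines(docs)) + "\n"
print(payload)
```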
That's all.