{"id":40603,"date":"2023-03-03T12:43:14","date_gmt":"2024-02-22T03:09:34","guid":{"rendered":"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/"},"modified":"2024-04-29T17:13:46","modified_gmt":"2024-04-29T09:13:46","slug":"pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c","status":"publish","type":"post","link":"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/","title":{"rendered":"PySpark 1.5.2 + Elasticsearch 2.1.0: Setup Steps and Usage"},"content":{"rendered":"<h2>Introduction<\/h2>\n<ul class=\"post-ul\">I want to work with Elasticsearch from PySpark.<\/ul>\n<h2>Environment<\/h2>\n<ul class=\"post-ul\">\n<li style=\"list-style-type: none;\">\n<ul class=\"post-ul\">Elasticsearch 2.1.0<\/ul>\n<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<ul class=\"post-ul\">Spark 1.5.2<\/ul>\n<h2>Installing Spark<\/h2>\n<p>I will skip this step. Spark 1.6 was released today, but I am using 1.5.2 here.<\/p>\n<h2>Downloading elasticsearch-hadoop<\/h2>\n<ul class=\"post-ul\">As of 2016\/1\/6, Elasticsearch 2.1.0 requires elasticsearch-hadoop-2.2.0-beta1.<\/ul>\n<p>Just download it from the official site and unzip it.<\/p>\n<pre class=\"post-pre\"><code>$ wget http:\/\/download.elastic.co\/hadoop\/elasticsearch-hadoop-2.2.0-beta1.zip\r\n$ unzip elasticsearch-hadoop-2.2.0-beta1.zip\r\n<\/code><\/pre>\n<h2>Launching pyspark with elasticsearch-hadoop<\/h2>\n<pre class=\"post-pre\"><code>\/usr\/local\/share\/spark\/bin\/pyspark --master local[4] 
--driver-class-path=elasticsearch-hadoop-2.2.0-beta1\/dist\/elasticsearch-spark_2.11-2.2.0-beta1.jar\r\n<\/code><\/pre>\n<h2>Creating an RDD<\/h2>\n<pre class=\"post-pre\"><code>&gt;&gt;&gt; conf = {\"es.nodes\" : \"XXX.XXX.XXX.XXX:[port]\", \"es.resource\" : \"[index name]\/[type]\"}\r\n&gt;&gt;&gt; rdd = sc.newAPIHadoopRDD(\"org.elasticsearch.hadoop.mr.EsInputFormat\",\"org.apache.hadoop.io.NullWritable\", \"org.elasticsearch.hadoop.mr.LinkedMapWritable\", conf=conf)\r\n<\/code><\/pre>\n<h2>Basic operations<\/h2>\n<pre class=\"post-pre\"><code>&gt;&gt;&gt; rdd.first()\r\n&gt;&gt;&gt; rdd.count()\r\n&gt;&gt;&gt; rdd.filter(lambda s: 'aaa' in s).count()\r\n<\/code><\/pre>\n<h2>Map\/Reduce<\/h2>\n<pre class=\"post-pre\"><code># Count how many records there are per name\r\ncounts = rdd.map(lambda item: item[1][\"name\"])\r\ncounts = counts.map(lambda name: (name, 1))\r\ncounts = counts.reduceByKey(lambda a, b: a+b)\r\n\r\n# Run\r\n&gt;&gt;&gt; counts.collect()\r\n<\/code><\/pre>\n<h2>Saving to ES<\/h2>\n<pre class=\"post-pre\"><code>write_conf = {\"es.nodes\" : \"XXX.XXX.XXX.XXX:[port]\", \"es.resource\" : \"test\/docs\"}  # PySpark RDDs have no saveToEs(), so write through EsOutputFormat\r\nrdd.saveAsNewAPIHadoopFile(path='-', outputFormatClass=\"org.elasticsearch.hadoop.mr.EsOutputFormat\", keyClass=\"org.apache.hadoop.io.NullWritable\", valueClass=\"org.elasticsearch.hadoop.mr.LinkedMapWritable\", conf=write_conf)\r\n<\/code><\/pre>\n<h2>Pitfalls<\/h2>\n<ul class=\"post-ul\">\n<li style=\"list-style-type: none;\">\n<ul class=\"post-ul\">Be careful with the network settings on the Elasticsearch side. When network.publish_host was not set correctly, I got stuck on \"connection refused\" style errors.<\/ul>\n<\/li>\n<\/ul>\n<p>\"Remote access about Spark and Elasticsearch\" was a helpful reference.<\/p>\n<pre class=\"post-pre\"><code>&lt;snip&gt;\r\nFile \"\/usr\/local\/share\/spark\/python\/lib\/py4j-0.8.2.1-src.zip\/py4j\/protocol.py\", line 300, in get_return_value\r\npy4j.protocol.Py4JJavaError: An error 
occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.\r\n: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [GET] on [_nodes\/http] failed; server[hostname\/XXX.XXX.XXX.XXX:Ports] returned [400|Bad Request:]\r\n&lt;snip&gt;\r\n<\/code><\/pre>\n<h2>References<\/h2>\n<h3>Spark<\/h3>\n<ul class=\"post-ul\">\n<li style=\"list-style-type: none;\">\n<ul class=\"post-ul\">Fun Visualization: elasticsearch Meets Spark Streaming<\/ul>\n<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<ul class=\"post-ul\">Elasticsearch in Apache Spark with Python\u2014Machine Learning Series, Part 2<\/ul>\n<h3>Elasticsearch<\/h3>\n<ul class=\"post-ul\">\n<li style=\"list-style-type: none;\">\n<ul class=\"post-ul\">Apache Spark support<\/ul>\n<\/li>\n<\/ul>\n<p>Remote access about Spark and Elasticsearch<\/p>\n<p>A discovery problem on AWS. The network.publish_host setting is the important part.<\/p>\n<p>elasticsearch.yml settings<br \/>\nElasticsearch Unplugged &#8211; Networking changes in 2.0 (Japanese translation)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction I want to work with Elasticsearch from PySpark. Environment Elasticsearch 2.1 [&hellip;]<\/p>\n","protected":false},"author":7,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-40603","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v21.5 (Yoast SEO v21.5) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ 
-->\n<title>PySpark 1.5.2 + Elasticsearch 2.1.0 \u7684\u5f15\u5165\u6b65\u9aa4\u548c\u6267\u884c - Blog - Silicon Cloud<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-\u7684\u5f15\u5165\u6b65\u9aa4\u548c\u6267\u884c\/\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"PySpark 1.5.2 + Elasticsearch 2.1.0 \u7684\u5f15\u5165\u6b65\u9aa4\u548c\u6267\u884c\" \/>\n<meta property=\"og:description\" content=\"\u9996\u5148 pyspark \u304b\u3089 Elasticsearch\u3092\u89e6\u308a\u305f\u3044 \u73af\u5883\uff08\uff09 Elasticsearch 2.1 [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-\u7684\u5f15\u5165\u6b65\u9aa4\u548c\u6267\u884c\/\" \/>\n<meta property=\"og:site_name\" content=\"Blog - Silicon Cloud\" \/>\n<meta property=\"article:published_time\" content=\"2024-02-22T03:09:34+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-04-29T09:13:46+00:00\" \/>\n<meta name=\"author\" content=\"\u79d1, \u9896\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u4f5c\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"\u79d1, \u9896\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 \u5206\" \/>\n<script type=\"application\/ld+json\" 
class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/\",\"url\":\"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/\",\"name\":\"PySpark 1.5.2 + Elasticsearch 2.1.0 \u7684\u5f15\u5165\u6b65\u9aa4\u548c\u6267\u884c - Blog - Silicon Cloud\",\"isPartOf\":{\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/#website\"},\"datePublished\":\"2024-02-22T03:09:34+00:00\",\"dateModified\":\"2024-04-29T09:13:46+00:00\",\"author\":{\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/#\/schema\/person\/8ca01ba7f7362ad4edb7da206a12f29e\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u9996\u9875\",\"item\":\"https:\/\/www.silicloud.com\/zh\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"PySpark 1.5.2 + Elasticsearch 2.1.0 \u7684\u5f15\u5165\u6b65\u9aa4\u548c\u6267\u884c\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/#website\",\"url\":\"https:\/\/www.silicloud.com\/zh\/blog\/\",\"name\":\"Blog - Silicon 
Cloud\",\"description\":\"\",\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/#\/schema\/person\/8ca01ba7f7362ad4edb7da206a12f29e\",\"name\":\"\u79d1, \u9896\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/8a6fb3cc7ba2f69d2189ba532aec4633ea7ed75ac0af162ec367cb3abc0fb2af?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/8a6fb3cc7ba2f69d2189ba532aec4633ea7ed75ac0af162ec367cb3abc0fb2af?s=96&d=mm&r=g\",\"caption\":\"\u79d1, \u9896\"},\"url\":\"https:\/\/www.silicloud.com\/zh\/blog\/author\/keying\/\"},{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/#local-main-organization-logo\",\"url\":\"\",\"contentUrl\":\"\",\"caption\":\"Blog - Silicon Cloud\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"PySpark 1.5.2 + Elasticsearch 2.1.0 \u7684\u5f15\u5165\u6b65\u9aa4\u548c\u6267\u884c - Blog - Silicon Cloud","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-\u7684\u5f15\u5165\u6b65\u9aa4\u548c\u6267\u884c\/","og_locale":"zh_CN","og_type":"article","og_title":"PySpark 1.5.2 + Elasticsearch 2.1.0 \u7684\u5f15\u5165\u6b65\u9aa4\u548c\u6267\u884c","og_description":"\u9996\u5148 pyspark \u304b\u3089 Elasticsearch\u3092\u89e6\u308a\u305f\u3044 \u73af\u5883\uff08\uff09 Elasticsearch 2.1 [&hellip;]","og_url":"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-\u7684\u5f15\u5165\u6b65\u9aa4\u548c\u6267\u884c\/","og_site_name":"Blog - Silicon Cloud","article_published_time":"2024-02-22T03:09:34+00:00","article_modified_time":"2024-04-29T09:13:46+00:00","author":"\u79d1, \u9896","twitter_card":"summary_large_image","twitter_misc":{"\u4f5c\u8005":"\u79d1, \u9896","\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4":"1 \u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/","url":"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/","name":"PySpark 1.5.2 + Elasticsearch 2.1.0 \u7684\u5f15\u5165\u6b65\u9aa4\u548c\u6267\u884c - Blog - Silicon 
Cloud","isPartOf":{"@id":"https:\/\/www.silicloud.com\/zh\/blog\/#website"},"datePublished":"2024-02-22T03:09:34+00:00","dateModified":"2024-04-29T09:13:46+00:00","author":{"@id":"https:\/\/www.silicloud.com\/zh\/blog\/#\/schema\/person\/8ca01ba7f7362ad4edb7da206a12f29e"},"breadcrumb":{"@id":"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u9996\u9875","item":"https:\/\/www.silicloud.com\/zh\/blog\/"},{"@type":"ListItem","position":2,"name":"PySpark 1.5.2 + Elasticsearch 2.1.0 \u7684\u5f15\u5165\u6b65\u9aa4\u548c\u6267\u884c"}]},{"@type":"WebSite","@id":"https:\/\/www.silicloud.com\/zh\/blog\/#website","url":"https:\/\/www.silicloud.com\/zh\/blog\/","name":"Blog - Silicon Cloud","description":"","inLanguage":"zh-Hans"},{"@type":"Person","@id":"https:\/\/www.silicloud.com\/zh\/blog\/#\/schema\/person\/8ca01ba7f7362ad4edb7da206a12f29e","name":"\u79d1, \u9896","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/www.silicloud.com\/zh\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/8a6fb3cc7ba2f69d2189ba532aec4633ea7ed75ac0af162ec367cb3abc0fb2af?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/8a6fb3cc7ba2f69d2189ba532aec4633ea7ed75ac0af162ec367cb3abc0fb2af?s=96&d=mm&r=g","caption":"\u79d1, 
\u9896"},"url":"https:\/\/www.silicloud.com\/zh\/blog\/author\/keying\/"},{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/www.silicloud.com\/zh\/blog\/pyspark-1-5-2-elasticsearch-2-1-0-%e7%9a%84%e5%bc%95%e5%85%a5%e6%ad%a5%e9%aa%a4%e5%92%8c%e6%89%a7%e8%a1%8c\/#local-main-organization-logo","url":"","contentUrl":"","caption":"Blog - Silicon Cloud"}]}},"_links":{"self":[{"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/posts\/40603","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/comments?post=40603"}],"version-history":[{"count":2,"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/posts\/40603\/revisions"}],"predecessor-version":[{"id":86568,"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/posts\/40603\/revisions\/86568"}],"wp:attachment":[{"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/media?parent=40603"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/categories?post=40603"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/tags?post=40603"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}