{"id":41100,"date":"2023-12-23T04:18:14","date_gmt":"2023-12-07T10:49:31","guid":{"rendered":"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/"},"modified":"2024-05-04T14:56:45","modified_gmt":"2024-05-04T06:56:45","slug":"python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8","status":"publish","type":"post","link":"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/","title":{"rendered":"[Python] Building an event-driven web crawler with AWS serverless architecture"},"content":{"rendered":"<h1>Building a web crawler the serverless, event-driven way<\/h1>\n<p>I am currently learning ElasticSearch and wanted to build something with ES, so I tried writing an event-driven web crawler based on Kinesis + Lambda.<\/p>\n<ul class=\"post-ul\">\n<li style=\"list-style-type: none;\">\n<ul class=\"post-ul\">Environment<\/ul>\n<\/li>\n<\/ul>\n<p>CentOS7<br \/>\npython 2.7<\/p>\n<h2>Workflow<\/h2>\n<div><img decoding=\"async\" class=\"post-images\" title=\"\" src=\"https:\/\/cdn.silicloud.com\/blog-img\/blog\/img\/657d44f637434c4406ca1198\/4-0.png\" alt=\"serverless-crawler.png\" 
\/><\/div>\n<p>The overall flow is as follows:<br \/>\n1. Scrapy (running on ScrapingHub or AWS Lambda) extracts URLs and puts them into a Kinesis stream.<br \/>\n2. The Kinesis stream triggers AWS Lambda.<br \/>\n3. The Lambda function crawls each URL and sends the extracted data to ElasticSearch Service.<\/p>\n<h2>Creating an IAM user<\/h2>\n<p>Using Kinesis and ElasticSearch requires the corresponding permissions, so prepare an access key ID and secret access key that can access each service.<\/p>\n<p>Also make a note of the user's ARN (arn:aws:iam::**********:user\/*********).<\/p>\n<h2>Creating a stream with AWS Kinesis<\/h2>\n<div><img decoding=\"async\" class=\"post-images\" title=\"\" src=\"https:\/\/cdn.silicloud.com\/blog-img\/blog\/img\/657d44f637434c4406ca1198\/10-0.png\" alt=\"\u30b9\u30af\u30ea\u30fc\u30f3\u30b7\u30e7\u30c3\u30c8 2017-04-06 12.45.52.png\" \/><\/div>\n<h2>Creating AWS ElasticSearch Service<\/h2>\n<p>Next, create an ES instance with Amazon ElasticSearch Service.<\/p>\n<h3>Steps on AWS<\/h3>\n<div><img decoding=\"async\" class=\"post-images\" title=\"\" src=\"https:\/\/cdn.silicloud.com\/blog-img\/blog\/img\/657d44f637434c4406ca1198\/14-4.png\" alt=\"\u30b9\u30af\u30ea\u30fc\u30f3\u30b7\u30e7\u30c3\u30c8 2017-04-06 13.30.18.png\" \/><\/div>\n<h3>Creating the index and mapping in ElasticSearch<\/h3>\n<p>Create the mapping used to store articles, with fields for the URL, title, and article contents.<br \/>\nmapping.json<br \/>\n{<br 
\/>\n&quot;mappings&quot;: {<br \/>\n&quot;article&quot;: {<br \/>\n&quot;properties&quot; : {<br \/>\n&quot;url&quot; : {<br \/>\n&quot;type&quot;: &quot;string&quot;,<br \/>\n&quot;index&quot; : &quot;not_analyzed&quot;<br \/>\n},<br \/>\n&quot;title&quot; : {<br \/>\n&quot;type&quot;: &quot;string&quot;,<br \/>\n&quot;index&quot; : &quot;analyzed&quot;<br \/>\n},<br \/>\n&quot;contents&quot; : {<br \/>\n&quot;type&quot;: &quot;string&quot;,<br \/>\n&quot;index&quot; : &quot;analyzed&quot;<br \/>\n}<br \/>\n}<br \/>\n}<br \/>\n}<br \/>\n}<\/p>\n<p>Next, write a script that creates the index and applies the mapping above.<\/p>\n<p>Install the following packages locally beforehand:<br \/>\n$ pip install requests_aws4auth elasticsearch<\/p>\n<pre class=\"post-pre\"><code><span class=\"c1\"># -*- coding: utf-8 -*-\r\n<\/span><span class=\"kn\">import<\/span> <span class=\"nn\">elasticsearch<\/span>\r\n<span class=\"kn\">from<\/span> <span class=\"nn\">requests_aws4auth<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">AWS4Auth<\/span>\r\n<span class=\"kn\">import<\/span> <span class=\"nn\">json<\/span>\r\n\r\n<span class=\"k\">if<\/span> <span class=\"n\">__name__<\/span> <span class=\"o\">==<\/span> <span class=\"s\">'__main__'<\/span><span class=\"p\">:<\/span>\r\n    <span class=\"c1\"># Specify the ES endpoint\r\n<\/span>    <span class=\"n\">host<\/span><span class=\"o\">=<\/span><span class=\"s\">'search-***************.ap-northeast-1.es.amazonaws.com'<\/span>\r\n    <span class=\"n\">awsauth<\/span> <span class=\"o\">=<\/span> <span class=\"n\">AWS4Auth<\/span><span class=\"p\">(<\/span>\r\n            <span class=\"c1\"># Access key ID and secret access key of the AWS user\r\n<\/span> 
           <span class=\"s\">'ACCESS_KEY_ID'<\/span><span class=\"p\">,<\/span>\r\n            <span class=\"s\">'SECRET_ACCESS_KEY'<\/span><span class=\"p\">,<\/span>\r\n            <span class=\"s\">'ap-northeast-1'<\/span><span class=\"p\">,<\/span> <span class=\"s\">'es'<\/span><span class=\"p\">)<\/span>\r\n\r\n    <span class=\"n\">es<\/span> <span class=\"o\">=<\/span> <span class=\"n\">elasticsearch<\/span><span class=\"p\">.<\/span><span class=\"n\">Elasticsearch<\/span><span class=\"p\">(<\/span>\r\n            <span class=\"n\">hosts<\/span><span class=\"o\">=<\/span><span class=\"p\">[{<\/span><span class=\"s\">'host'<\/span><span class=\"p\">:<\/span> <span class=\"n\">host<\/span><span class=\"p\">,<\/span> <span class=\"s\">'port'<\/span><span class=\"p\">:<\/span> <span class=\"mi\">443<\/span><span class=\"p\">}],<\/span>\r\n            <span class=\"n\">http_auth<\/span><span class=\"o\">=<\/span><span class=\"n\">awsauth<\/span><span class=\"p\">,<\/span>\r\n            <span class=\"n\">use_ssl<\/span><span class=\"o\">=<\/span><span class=\"bp\">True<\/span><span class=\"p\">,<\/span>\r\n            <span class=\"n\">verify_certs<\/span><span class=\"o\">=<\/span><span class=\"bp\">True<\/span><span class=\"p\">,<\/span>\r\n            <span class=\"n\">connection_class<\/span><span class=\"o\">=<\/span><span class=\"n\">elasticsearch<\/span><span class=\"p\">.<\/span><span class=\"n\">connection<\/span><span class=\"p\">.<\/span><span class=\"n\">RequestsHttpConnection<\/span>\r\n            <span class=\"p\">)<\/span>\r\n\r\n    <span class=\"n\">f<\/span> <span class=\"o\">=<\/span> <span class=\"nb\">open<\/span><span class=\"p\">(<\/span><span class=\"s\">'mapping.json'<\/span><span class=\"p\">,<\/span> <span class=\"s\">'r'<\/span><span class=\"p\">)<\/span>\r\n    <span class=\"n\">mapping<\/span> <span class=\"o\">=<\/span> <span class=\"n\">json<\/span><span class=\"p\">.<\/span><span class=\"n\">load<\/span><span 
class=\"p\">(<\/span><span class=\"n\">f<\/span><span class=\"p\">)<\/span>\r\n\r\n    <span class=\"n\">es<\/span><span class=\"p\">.<\/span><span class=\"n\">indices<\/span><span class=\"p\">.<\/span><span class=\"n\">create<\/span><span class=\"p\">(<\/span><span class=\"n\">index<\/span><span class=\"o\">=<\/span><span class=\"s\">'website'<\/span><span class=\"p\">)<\/span>\r\n    <span class=\"n\">es<\/span><span class=\"p\">.<\/span><span class=\"n\">indices<\/span><span class=\"p\">.<\/span><span class=\"n\">put_mapping<\/span><span class=\"p\">(<\/span><span class=\"n\">index<\/span><span class=\"o\">=<\/span><span class=\"s\">'website'<\/span><span class=\"p\">,<\/span> <span class=\"n\">doc_type<\/span><span class=\"o\">=<\/span><span class=\"s\">'article'<\/span><span class=\"p\">,<\/span> <span class=\"n\">body<\/span><span class=\"o\">=<\/span><span class=\"n\">mapping<\/span><span class=\"p\">[<\/span><span class=\"s\">'mappings'<\/span><span class=\"p\">])<\/span>\r\n<\/code><\/pre>\n<div><img decoding=\"async\" class=\"post-images\" title=\"\" src=\"https:\/\/cdn.silicloud.com\/blog-img\/blog\/img\/657d44f637434c4406ca1198\/19-0.png\" alt=\"\u30b9\u30af\u30ea\u30fc\u30f3\u30b7\u30e7\u30c3\u30c8 2017-04-06 15.32.52.png\" \/><\/div>\n<h2>Creating AWS Lambda<\/h2>\n<p>With Elasticsearch set up, the next step is to create the Lambda function.<\/p>\n<h3>Creating the Lambda function<\/h3>\n<p>Create the Lambda function locally.<br \/>\n$ mkdir web_crawler<br \/>\n$ cd web_crawler<br \/>\n$ vim lambda_function.py<\/p>\n<pre class=\"post-pre\"><code>\r\n<span class=\"c1\"># 
-*- coding: utf-8 -*-\r\n<\/span><span class=\"kn\">import<\/span> <span class=\"nn\">os<\/span>\r\n<span class=\"kn\">import<\/span> <span class=\"nn\">base64<\/span>\r\n<span class=\"kn\">from<\/span> <span class=\"nn\">readability<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">Document<\/span>\r\n<span class=\"kn\">import<\/span> <span class=\"nn\">html2text<\/span>\r\n<span class=\"kn\">import<\/span> <span class=\"nn\">requests<\/span>\r\n<span class=\"kn\">import<\/span> <span class=\"nn\">elasticsearch<\/span>\r\n<span class=\"kn\">from<\/span> <span class=\"nn\">elasticsearch<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">helpers<\/span>\r\n<span class=\"kn\">from<\/span> <span class=\"nn\">requests_aws4auth<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">AWS4Auth<\/span>\r\n\r\n<span class=\"k\">def<\/span> <span class=\"nf\">lambda_handler<\/span><span class=\"p\">(<\/span><span class=\"n\">event<\/span><span class=\"p\">,<\/span> <span class=\"n\">context<\/span><span class=\"p\">):<\/span>\r\n    <span class=\"n\">host<\/span> <span class=\"o\">=<\/span> <span class=\"n\">os<\/span><span class=\"p\">.<\/span><span class=\"n\">environ<\/span><span class=\"p\">[<\/span><span class=\"s\">'ES_HOST'<\/span><span class=\"p\">]<\/span>\r\n    <span class=\"c1\"># Authenticate to ElasticSearch Service with IAM credentials read from environment variables\r\n<\/span>    <span class=\"n\">awsauth<\/span> <span class=\"o\">=<\/span> <span class=\"n\">AWS4Auth<\/span><span class=\"p\">(<\/span>\r\n            <span class=\"n\">os<\/span><span class=\"p\">.<\/span><span class=\"n\">environ<\/span><span class=\"p\">[<\/span><span class=\"s\">'ACCESS_ID'<\/span><span class=\"p\">],<\/span>\r\n            <span class=\"n\">os<\/span><span class=\"p\">.<\/span><span class=\"n\">environ<\/span><span class=\"p\">[<\/span><span class=\"s\">'SECRET_KEY'<\/span><span class=\"p\">],<\/span> <span 
class=\"s\">'ap-northeast-1'<\/span><span class=\"p\">,<\/span> <span class=\"s\">'es'<\/span><span class=\"p\">)<\/span>\r\n\r\n    <span class=\"n\">es<\/span> <span class=\"o\">=<\/span> <span class=\"n\">elasticsearch<\/span><span class=\"p\">.<\/span><span class=\"n\">Elasticsearch<\/span><span class=\"p\">(<\/span>\r\n            <span class=\"n\">hosts<\/span><span class=\"o\">=<\/span><span class=\"p\">[{<\/span><span class=\"s\">'host'<\/span><span class=\"p\">:<\/span> <span class=\"n\">host<\/span><span class=\"p\">,<\/span> <span class=\"s\">'port'<\/span><span class=\"p\">:<\/span> <span class=\"mi\">443<\/span><span class=\"p\">}],<\/span>\r\n            <span class=\"n\">http_auth<\/span><span class=\"o\">=<\/span><span class=\"n\">awsauth<\/span><span class=\"p\">,<\/span>\r\n            <span class=\"n\">use_ssl<\/span><span class=\"o\">=<\/span><span class=\"bp\">True<\/span><span class=\"p\">,<\/span>\r\n            <span class=\"n\">verify_certs<\/span><span class=\"o\">=<\/span><span class=\"bp\">True<\/span><span class=\"p\">,<\/span>\r\n            <span class=\"n\">connection_class<\/span><span class=\"o\">=<\/span><span class=\"n\">elasticsearch<\/span><span class=\"p\">.<\/span><span class=\"n\">connection<\/span><span class=\"p\">.<\/span><span class=\"n\">RequestsHttpConnection<\/span>\r\n    <span class=\"p\">)<\/span>\r\n\r\n    <span class=\"n\">articles<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[]<\/span>\r\n\r\n    <span class=\"c1\"># Read the events delivered from the Kinesis stream\r\n<\/span>    <span class=\"k\">for<\/span> <span class=\"n\">record<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">event<\/span><span class=\"p\">[<\/span><span class=\"s\">'Records'<\/span><span class=\"p\">]:<\/span>\r\n        <span class=\"n\">payload<\/span> <span class=\"o\">=<\/span> <span class=\"n\">base64<\/span><span class=\"p\">.<\/span><span class=\"n\">b64decode<\/span><span 
class=\"p\">(<\/span><span class=\"n\">record<\/span><span class=\"p\">[<\/span><span class=\"s\">'kinesis'<\/span><span class=\"p\">][<\/span><span class=\"s\">'data'<\/span><span class=\"p\">])<\/span>\r\n        <span class=\"k\">try<\/span><span class=\"p\">:<\/span>\r\n            <span class=\"n\">response<\/span> <span class=\"o\">=<\/span> <span class=\"n\">requests<\/span><span class=\"p\">.<\/span><span class=\"n\">get<\/span><span class=\"p\">(<\/span><span class=\"n\">payload<\/span><span class=\"p\">)<\/span>\r\n            <span class=\"k\">if<\/span> <span class=\"n\">response<\/span><span class=\"p\">.<\/span><span class=\"n\">ok<\/span><span class=\"p\">:<\/span>\r\n                <span class=\"n\">article<\/span> <span class=\"o\">=<\/span> <span class=\"n\">Document<\/span><span class=\"p\">(<\/span><span class=\"n\">response<\/span><span class=\"p\">.<\/span><span class=\"n\">content<\/span><span class=\"p\">).<\/span><span class=\"n\">summary<\/span><span class=\"p\">()<\/span>\r\n                <span class=\"n\">titleText<\/span> <span class=\"o\">=<\/span> <span class=\"n\">html2text<\/span><span class=\"p\">.<\/span><span class=\"n\">html2text<\/span><span class=\"p\">(<\/span><span class=\"n\">Document<\/span><span class=\"p\">(<\/span><span class=\"n\">response<\/span><span class=\"p\">.<\/span><span class=\"n\">content<\/span><span class=\"p\">).<\/span><span class=\"n\">title<\/span><span class=\"p\">())<\/span>\r\n                <span class=\"n\">contentsText<\/span> <span class=\"o\">=<\/span> <span class=\"n\">html2text<\/span><span class=\"p\">.<\/span><span class=\"n\">html2text<\/span><span class=\"p\">(<\/span><span class=\"n\">article<\/span><span class=\"p\">)<\/span>\r\n                <span class=\"n\">res<\/span> <span class=\"o\">=<\/span> <span class=\"n\">es<\/span><span class=\"p\">.<\/span><span class=\"n\">search<\/span><span class=\"p\">(<\/span><span class=\"n\">index<\/span><span class=\"o\">=<\/span><span 
class=\"s\">\"website\"<\/span><span class=\"p\">,<\/span> <span class=\"n\">body<\/span><span class=\"o\">=<\/span><span class=\"p\">{<\/span><span class=\"s\">\"query\"<\/span><span class=\"p\">:<\/span> <span class=\"p\">{<\/span><span class=\"s\">\"match\"<\/span><span class=\"p\">:<\/span> <span class=\"p\">{<\/span><span class=\"s\">\"url\"<\/span><span class=\"p\">:<\/span> <span class=\"n\">payload<\/span><span class=\"p\">}}})<\/span>\r\n                <span class=\"c1\"># Check whether the URL is already registered in ES\r\n<\/span>                <span class=\"k\">if<\/span> <span class=\"n\">res<\/span><span class=\"p\">[<\/span><span class=\"s\">'hits'<\/span><span class=\"p\">][<\/span><span class=\"s\">'total'<\/span><span class=\"p\">]<\/span> <span class=\"o\">==<\/span> <span class=\"mi\">0<\/span><span class=\"p\">:<\/span>\r\n                    <span class=\"n\">doc<\/span> <span class=\"o\">=<\/span> <span class=\"p\">{<\/span>\r\n                        <span class=\"s\">'url'<\/span><span class=\"p\">:<\/span> <span class=\"n\">payload<\/span><span class=\"p\">,<\/span>\r\n                        <span class=\"s\">'title'<\/span><span class=\"p\">:<\/span> <span class=\"n\">titleText<\/span><span class=\"p\">.<\/span><span class=\"n\">encode<\/span><span class=\"p\">(<\/span><span class=\"s\">'utf-8'<\/span><span class=\"p\">),<\/span>\r\n                        <span class=\"s\">'contents'<\/span><span class=\"p\">:<\/span> <span class=\"n\">contentsText<\/span><span class=\"p\">.<\/span><span class=\"n\">encode<\/span><span class=\"p\">(<\/span><span class=\"s\">'utf-8'<\/span><span class=\"p\">)<\/span>\r\n                    <span class=\"p\">}<\/span>\r\n                    <span class=\"n\">articles<\/span><span class=\"p\">.<\/span><span class=\"n\">append<\/span><span class=\"p\">({<\/span><span class=\"s\">'_index'<\/span><span class=\"p\">:<\/span><span class=\"s\">'website'<\/span><span 
class=\"p\">,<\/span> <span class=\"s\">'_type'<\/span><span class=\"p\">:<\/span><span class=\"s\">'scraper'<\/span><span class=\"p\">,<\/span> <span class=\"s\">'_source'<\/span><span class=\"p\">:<\/span><span class=\"n\">doc<\/span><span class=\"p\">})<\/span>\r\n        <span class=\"k\">except<\/span> <span class=\"n\">requests<\/span><span class=\"p\">.<\/span><span class=\"n\">exceptions<\/span><span class=\"p\">.<\/span><span class=\"n\">HTTPError<\/span> <span class=\"k\">as<\/span> <span class=\"n\">err<\/span><span class=\"p\">:<\/span>\r\n            <span class=\"k\">print<\/span><span class=\"p\">(<\/span><span class=\"s\">\"HTTPError: \"<\/span> <span class=\"o\">+<\/span> <span class=\"nb\">str<\/span><span class=\"p\">(<\/span><span class=\"n\">err<\/span><span class=\"p\">))<\/span>\r\n    <span class=\"c1\"># Bulk Insert\r\n<\/span>    <span class=\"n\">helpers<\/span><span class=\"p\">.<\/span><span class=\"n\">bulk<\/span><span class=\"p\">(<\/span><span class=\"n\">es<\/span><span class=\"p\">,<\/span> <span class=\"n\">articles<\/span><span class=\"p\">)<\/span>\r\n<\/code><\/pre>\n<p>After creating the Lambda function, install the required libraries into the same directory.<br \/>\n$ pip install readability-lxml html2text elasticsearch requests_aws4auth requests -t \/path\/to\/web_crawler<br \/>\nThen package everything into a zip file.<br \/>\n$ zip -r web_crawler.zip .<\/p>\n<h3>Deploying the Lambda function to AWS<\/h3>\n<div><img decoding=\"async\" class=\"post-images\" title=\"\" src=\"https:\/\/cdn.silicloud.com\/blog-img\/blog\/img\/657d44f637434c4406ca1198\/27-1.png\" alt=\"\u30b9\u30af\u30ea\u30fc\u30f3\u30b7\u30e7\u30c3\u30c8 2017-04-06 16.24.30.png\" 
\/><\/div>\n<h2>Extracting URLs with Scrapy and sending them to the Kinesis stream<\/h2>\n<p>This is the final step: use Scrapy to extract URLs from a listing page and send them to the Kinesis stream.<\/p>\n<p>For the listing page I used the hot entries on Hatena Bookmark (\u306f\u3066\u306a\u30d6\u30c3\u30af\u30de\u30fc\u30af). Fetching the data from the RSS feed would be easier, but I deliberately chose to scrape the web page with Scrapy. Scrapy is a convenient and powerful framework for building advanced web crawlers, so give it a try if you are interested.<\/p>\n<ul class=\"post-ul\">Scrapy<\/ul>\n<h3>Creating the project<\/h3>\n<p>First, install Scrapy.<br \/>\n$ pip install scrapy<br \/>\n$ scrapy startproject hotentry<br \/>\n$ vim hotentry\/hotentry\/spiders\/hotentry.py<br \/>\nEnter the following code.<\/p>\n<pre class=\"post-pre\"><code><span class=\"c1\"># -*- coding: utf-8 -*-\r\n<\/span><span class=\"kn\">import<\/span> <span class=\"nn\">scrapy<\/span>\r\n<span class=\"kn\">from<\/span> <span class=\"nn\">scrapy.conf<\/span> <span class=\"kn\">import<\/span> <span class=\"n\">settings<\/span>\r\n<span class=\"kn\">import<\/span> <span class=\"nn\">boto3<\/span>\r\n<span class=\"kn\">import<\/span> <span class=\"nn\">json<\/span>\r\n\r\n<span class=\"n\">kinesis<\/span> <span class=\"o\">=<\/span> <span class=\"n\">boto3<\/span><span class=\"p\">.<\/span><span class=\"n\">client<\/span><span class=\"p\">(<\/span>\r\n        <span class=\"s\">'kinesis'<\/span><span class=\"p\">,<\/span>
                                                                                                                               \r\n        <span class=\"n\">aws_access_key_id<\/span><span class=\"o\">=<\/span><span class=\"n\">settings<\/span><span class=\"p\">[<\/span><span class=\"s\">'AWS_ACCESS_KEY_ID'<\/span><span class=\"p\">],<\/span>\r\n        <span class=\"n\">aws_secret_access_key<\/span><span class=\"o\">=<\/span><span class=\"n\">settings<\/span><span class=\"p\">[<\/span><span class=\"s\">'AWS_SECRET_ACCESS_KEY'<\/span><span class=\"p\">],<\/span>\r\n        <span class=\"n\">region_name<\/span><span class=\"o\">=<\/span><span class=\"s\">'ap-northeast-1'<\/span><span class=\"p\">)<\/span>\r\n\r\n<span class=\"k\">class<\/span> <span class=\"nc\">HotEntrySpider<\/span><span class=\"p\">(<\/span><span class=\"n\">scrapy<\/span><span class=\"p\">.<\/span><span class=\"n\">Spider<\/span><span class=\"p\">):<\/span>\r\n    <span class=\"n\">name<\/span> <span class=\"o\">=<\/span> <span class=\"s\">\"hotentry\"<\/span>\r\n    <span class=\"n\">allowed_domains<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"s\">\"b.hatena.ne.jp\"<\/span><span class=\"p\">]<\/span>\r\n    <span class=\"n\">start_urls<\/span> <span class=\"o\">=<\/span> <span class=\"p\">[<\/span><span class=\"s\">'http:\/\/b.hatena.ne.jp\/hotentry\/general'<\/span><span class=\"p\">]<\/span>\r\n\r\n    <span class=\"k\">def<\/span> <span class=\"nf\">parse<\/span><span class=\"p\">(<\/span><span class=\"bp\">self<\/span><span class=\"p\">,<\/span> <span class=\"n\">response<\/span><span class=\"p\">):<\/span>\r\n        <span class=\"k\">for<\/span> <span class=\"n\">sel<\/span> <span class=\"ow\">in<\/span> <span class=\"n\">response<\/span><span class=\"p\">.<\/span><span class=\"n\">css<\/span><span class=\"p\">(<\/span><span class=\"s\">\"li.hb-entry-unit-with-favorites\"<\/span><span class=\"p\">):<\/span>\r\n            <span class=\"n\">url<\/span> <span 
class=\"o\">=<\/span> <span class=\"n\">sel<\/span><span class=\"p\">.<\/span><span class=\"n\">css<\/span><span class=\"p\">(<\/span><span class=\"s\">\"a.entry-link::attr('href')\"<\/span><span class=\"p\">).<\/span><span class=\"n\">extract_first<\/span><span class=\"p\">()<\/span>\r\n            <span class=\"k\">if<\/span> <span class=\"n\">url<\/span> <span class=\"ow\">is<\/span> <span class=\"bp\">None<\/span><span class=\"p\">:<\/span>\r\n                <span class=\"k\">continue<\/span>\r\n            <span class=\"n\">kinesis<\/span><span class=\"p\">.<\/span><span class=\"n\">put_record<\/span><span class=\"p\">(<\/span>\r\n                    <span class=\"n\">StreamName<\/span> <span class=\"o\">=<\/span> <span class=\"s\">\"scraping_url\"<\/span><span class=\"p\">,<\/span>\r\n                    <span class=\"n\">Data<\/span> <span class=\"o\">=<\/span> <span class=\"n\">url<\/span><span class=\"p\">,<\/span>\r\n                    <span class=\"n\">PartitionKey<\/span> <span class=\"o\">=<\/span> <span class=\"s\">\"scraper\"<\/span>\r\n            <span class=\"p\">)<\/span>\r\n<\/code><\/pre>\n<p>Add the Access Key ID and Secret Access Key to hotentry\/hotentry\/settings.py:<\/p>\n<p>AWS_ACCESS_KEY_ID = 'AKI******************'<br \/>\nAWS_SECRET_ACCESS_KEY = '************************************'<\/p>\n<p>With this code, URLs can now be PUT into the Kinesis stream. Let's run it to test.<\/p>\n<p>Scrapy should now send the data to Kinesis, and AWS
Lambda should then forward it to ElasticSearch, which completes the pipeline.<\/p>\n<h3>Deploying Scrapy to Scrapinghub<\/h3>\n<p>Scrapy can now extract URLs and send them to Kinesis, but as it stands it is only a local batch job. So I deployed the Scrapy code to a cloud service called Scrapinghub.<\/p>\n<p>For setup instructions, see the following detailed article:<br \/>\n* Enjoying a comfortable Python crawling and scraping life with Scrapy + Scrapy Cloud<\/p>\n<p>Everything from user registration to deployment is very simple, so I will not go into detail here.<\/p>\n<h2>Closing thoughts<\/h2>\n<p>At first I used SQS and DynamoDB and split the work across several Lambda functions, but it became so complex that I could not trace errors, and the attempt failed. Simple really is best. I hope Lambda triggers will support more services in the future.<\/p>\n<p>This code was written only as an experiment, so error handling is not rigorous. Use it at your own risk.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u5c1d\u8bd5\u4f7f\u7528\u65e0\u670d\u52a1\u5668\u548c\u4e8b\u4ef6\u9a71\u52a8\u7684\u65b9\u5f0f\u521b\u5efa\u7f51\u9875\u722c\u866b\u3002 
\u56e0\u4e3a\u6211\u6b63\u5728\u5b66\u4e60ElasticSearch\uff0c\u6240\u4ee5\u6211\u60f3\u901a\u8fc7\u4f7f\u7528ES [&hellip;]<\/p>\n","protected":false},"author":12,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-41100","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v21.5 (Yoast SEO v21.5) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>[Python] \u5c1d\u8bd5\u4f7f\u7528 AWS \u7684\u65e0\u670d\u52a1\u5668\u67b6\u6784\u6765\u521b\u5efa\u4e00\u4e2a\u4e8b\u4ef6\u9a71\u52a8\u7684Web\u722c\u866b - Blog - Silicon Cloud<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.silicloud.com\/zh\/blog\/python-\u5c1d\u8bd5\u4f7f\u7528-aws-\u7684\u65e0\u670d\u52a1\u5668\u67b6\u6784\u6765\u521b\u5efa\u4e00\u4e2a\u4e8b\u4ef6\u9a71\u52a8\/\" \/>\n<meta property=\"og:locale\" content=\"zh_CN\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"[Python] \u5c1d\u8bd5\u4f7f\u7528 AWS \u7684\u65e0\u670d\u52a1\u5668\u67b6\u6784\u6765\u521b\u5efa\u4e00\u4e2a\u4e8b\u4ef6\u9a71\u52a8\u7684Web\u722c\u866b\" \/>\n<meta property=\"og:description\" content=\"\u5c1d\u8bd5\u4f7f\u7528\u65e0\u670d\u52a1\u5668\u548c\u4e8b\u4ef6\u9a71\u52a8\u7684\u65b9\u5f0f\u521b\u5efa\u7f51\u9875\u722c\u866b\u3002 \u56e0\u4e3a\u6211\u6b63\u5728\u5b66\u4e60ElasticSearch\uff0c\u6240\u4ee5\u6211\u60f3\u901a\u8fc7\u4f7f\u7528ES [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.silicloud.com\/zh\/blog\/python-\u5c1d\u8bd5\u4f7f\u7528-aws-\u7684\u65e0\u670d\u52a1\u5668\u67b6\u6784\u6765\u521b\u5efa\u4e00\u4e2a\u4e8b\u4ef6\u9a71\u52a8\/\" \/>\n<meta property=\"og:site_name\" 
content=\"Blog - Silicon Cloud\" \/>\n<meta property=\"article:published_time\" content=\"2023-12-07T10:49:31+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-05-04T06:56:45+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cdn.silicloud.com\/blog-img\/blog\/img\/657d44f637434c4406ca1198\/4-0.png\" \/>\n<meta name=\"author\" content=\"\u9038, \u79d1\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"\u4f5c\u8005\" \/>\n\t<meta name=\"twitter:data1\" content=\"\u9038, \u79d1\" \/>\n\t<meta name=\"twitter:label2\" content=\"\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 \u5206\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/\",\"url\":\"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/\",\"name\":\"[Python] \u5c1d\u8bd5\u4f7f\u7528 AWS \u7684\u65e0\u670d\u52a1\u5668\u67b6\u6784\u6765\u521b\u5efa\u4e00\u4e2a\u4e8b\u4ef6\u9a71\u52a8\u7684Web\u722c\u866b - Blog - Silicon 
Cloud\",\"isPartOf\":{\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/#website\"},\"datePublished\":\"2023-12-07T10:49:31+00:00\",\"dateModified\":\"2024-05-04T06:56:45+00:00\",\"author\":{\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/#\/schema\/person\/85c1dae56e6ea1e695c73d33c684d487\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/#breadcrumb\"},\"inLanguage\":\"zh-Hans\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"\u9996\u9875\",\"item\":\"https:\/\/www.silicloud.com\/zh\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"[Python] \u5c1d\u8bd5\u4f7f\u7528 AWS \u7684\u65e0\u670d\u52a1\u5668\u67b6\u6784\u6765\u521b\u5efa\u4e00\u4e2a\u4e8b\u4ef6\u9a71\u52a8\u7684Web\u722c\u866b\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/#website\",\"url\":\"https:\/\/www.silicloud.com\/zh\/blog\/\",\"name\":\"Blog - Silicon Cloud\",\"description\":\"\",\"inLanguage\":\"zh-Hans\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/#\/schema\/person\/85c1dae56e6ea1e695c73d33c684d487\",\"name\":\"\u9038, 
\u79d1\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/c94f6d9cbbfbca863fab309840bd690c153c95f8490c290ad2ed54dd693dad16?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/c94f6d9cbbfbca863fab309840bd690c153c95f8490c290ad2ed54dd693dad16?s=96&d=mm&r=g\",\"caption\":\"\u9038, \u79d1\"},\"url\":\"https:\/\/www.silicloud.com\/zh\/blog\/author\/keyi\/\"},{\"@type\":\"ImageObject\",\"inLanguage\":\"zh-Hans\",\"@id\":\"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/#local-main-organization-logo\",\"url\":\"\",\"contentUrl\":\"\",\"caption\":\"Blog - Silicon Cloud\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"[Python] \u5c1d\u8bd5\u4f7f\u7528 AWS \u7684\u65e0\u670d\u52a1\u5668\u67b6\u6784\u6765\u521b\u5efa\u4e00\u4e2a\u4e8b\u4ef6\u9a71\u52a8\u7684Web\u722c\u866b - Blog - Silicon Cloud","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.silicloud.com\/zh\/blog\/python-\u5c1d\u8bd5\u4f7f\u7528-aws-\u7684\u65e0\u670d\u52a1\u5668\u67b6\u6784\u6765\u521b\u5efa\u4e00\u4e2a\u4e8b\u4ef6\u9a71\u52a8\/","og_locale":"zh_CN","og_type":"article","og_title":"[Python] \u5c1d\u8bd5\u4f7f\u7528 AWS \u7684\u65e0\u670d\u52a1\u5668\u67b6\u6784\u6765\u521b\u5efa\u4e00\u4e2a\u4e8b\u4ef6\u9a71\u52a8\u7684Web\u722c\u866b","og_description":"\u5c1d\u8bd5\u4f7f\u7528\u65e0\u670d\u52a1\u5668\u548c\u4e8b\u4ef6\u9a71\u52a8\u7684\u65b9\u5f0f\u521b\u5efa\u7f51\u9875\u722c\u866b\u3002 
\u56e0\u4e3a\u6211\u6b63\u5728\u5b66\u4e60ElasticSearch\uff0c\u6240\u4ee5\u6211\u60f3\u901a\u8fc7\u4f7f\u7528ES [&hellip;]","og_url":"https:\/\/www.silicloud.com\/zh\/blog\/python-\u5c1d\u8bd5\u4f7f\u7528-aws-\u7684\u65e0\u670d\u52a1\u5668\u67b6\u6784\u6765\u521b\u5efa\u4e00\u4e2a\u4e8b\u4ef6\u9a71\u52a8\/","og_site_name":"Blog - Silicon Cloud","article_published_time":"2023-12-07T10:49:31+00:00","article_modified_time":"2024-05-04T06:56:45+00:00","og_image":[{"url":"https:\/\/cdn.silicloud.com\/blog-img\/blog\/img\/657d44f637434c4406ca1198\/4-0.png"}],"author":"\u9038, \u79d1","twitter_card":"summary_large_image","twitter_misc":{"\u4f5c\u8005":"\u9038, \u79d1","\u9884\u8ba1\u9605\u8bfb\u65f6\u95f4":"3 \u5206"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/","url":"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/","name":"[Python] \u5c1d\u8bd5\u4f7f\u7528 AWS \u7684\u65e0\u670d\u52a1\u5668\u67b6\u6784\u6765\u521b\u5efa\u4e00\u4e2a\u4e8b\u4ef6\u9a71\u52a8\u7684Web\u722c\u866b - Blog - Silicon 
Cloud","isPartOf":{"@id":"https:\/\/www.silicloud.com\/zh\/blog\/#website"},"datePublished":"2023-12-07T10:49:31+00:00","dateModified":"2024-05-04T06:56:45+00:00","author":{"@id":"https:\/\/www.silicloud.com\/zh\/blog\/#\/schema\/person\/85c1dae56e6ea1e695c73d33c684d487"},"breadcrumb":{"@id":"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/#breadcrumb"},"inLanguage":"zh-Hans","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"\u9996\u9875","item":"https:\/\/www.silicloud.com\/zh\/blog\/"},{"@type":"ListItem","position":2,"name":"[Python] \u5c1d\u8bd5\u4f7f\u7528 AWS \u7684\u65e0\u670d\u52a1\u5668\u67b6\u6784\u6765\u521b\u5efa\u4e00\u4e2a\u4e8b\u4ef6\u9a71\u52a8\u7684Web\u722c\u866b"}]},{"@type":"WebSite","@id":"https:\/\/www.silicloud.com\/zh\/blog\/#website","url":"https:\/\/www.silicloud.com\/zh\/blog\/","name":"Blog - Silicon Cloud","description":"","inLanguage":"zh-Hans"},{"@type":"Person","@id":"https:\/\/www.silicloud.com\/zh\/blog\/#\/schema\/person\/85c1dae56e6ea1e695c73d33c684d487","name":"\u9038, 
\u79d1","image":{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/www.silicloud.com\/zh\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/c94f6d9cbbfbca863fab309840bd690c153c95f8490c290ad2ed54dd693dad16?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/c94f6d9cbbfbca863fab309840bd690c153c95f8490c290ad2ed54dd693dad16?s=96&d=mm&r=g","caption":"\u9038, \u79d1"},"url":"https:\/\/www.silicloud.com\/zh\/blog\/author\/keyi\/"},{"@type":"ImageObject","inLanguage":"zh-Hans","@id":"https:\/\/www.silicloud.com\/zh\/blog\/python-%e5%b0%9d%e8%af%95%e4%bd%bf%e7%94%a8-aws-%e7%9a%84%e6%97%a0%e6%9c%8d%e5%8a%a1%e5%99%a8%e6%9e%b6%e6%9e%84%e6%9d%a5%e5%88%9b%e5%bb%ba%e4%b8%80%e4%b8%aa%e4%ba%8b%e4%bb%b6%e9%a9%b1%e5%8a%a8\/#local-main-organization-logo","url":"","contentUrl":"","caption":"Blog - Silicon Cloud"}]}},"_links":{"self":[{"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/posts\/41100","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/comments?post=41100"}],"version-history":[{"count":2,"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/posts\/41100\/revisions"}],"predecessor-version":[{"id":98999,"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/posts\/41100\/revisions\/98999"}],"wp:attachment":[{"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/media?parent=41100"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/categories?post=41100"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.silicloud.com\/zh\/blog\/wp-json\/wp\/v2\/t
ags?post=41100"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}