{"id":19785,"date":"2024-03-15T19:27:22","date_gmt":"2024-03-15T19:27:22","guid":{"rendered":"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/"},"modified":"2024-03-21T16:31:51","modified_gmt":"2024-03-21T16:31:51","slug":"how-can-flink-process-hdfs-data-offline","status":"publish","type":"post","link":"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/","title":{"rendered":"How can Flink process HDFS data offline?"},"content":{"rendered":"<p>To perform offline processing in Flink and read data from HDFS, you can follow these steps:<\/p>\n<ol>\n<li>Firstly, make sure to include the necessary dependencies in your Flink application. You can do this by adding the following dependencies in the pom.xml file to import the relevant libraries for Hadoop and HDFS.<\/li>\n<\/ol>\n<pre class=\"post-pre\"><code><span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">dependencies<\/span>&gt;<\/span>\r\n  <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">dependency<\/span>&gt;<\/span>\r\n    <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">groupId<\/span>&gt;<\/span>org.apache.flink<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">groupId<\/span>&gt;<\/span>\r\n    <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">artifactId<\/span>&gt;<\/span>flink-java<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">artifactId<\/span>&gt;<\/span>\r\n    <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">version<\/span>&gt;<\/span>${flink.version}<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">version<\/span>&gt;<\/span>\r\n  <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">dependency<\/span>&gt;<\/span>\r\n  <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">dependency<\/span>&gt;<\/span>\r\n    <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">groupId<\/span>&gt;<\/span>org.apache.flink<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">groupId<\/span>&gt;<\/span>\r\n    <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">artifactId<\/span>&gt;<\/span>flink-streaming-java_${scala.binary.version}<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">artifactId<\/span>&gt;<\/span>\r\n    <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">version<\/span>&gt;<\/span>${flink.version}<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">version<\/span>&gt;<\/span>\r\n  <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">dependency<\/span>&gt;<\/span>\r\n  <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">dependency<\/span>&gt;<\/span>\r\n    <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">groupId<\/span>&gt;<\/span>org.apache.flink<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">groupId<\/span>&gt;<\/span>\r\n    <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">artifactId<\/span>&gt;<\/span>flink-clients_${scala.binary.version}<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">artifactId<\/span>&gt;<\/span>\r\n    <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">version<\/span>&gt;<\/span>${flink.version}<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">version<\/span>&gt;<\/span>\r\n  <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">dependency<\/span>&gt;<\/span>\r\n  <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">dependency<\/span>&gt;<\/span>\r\n    <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">groupId<\/span>&gt;<\/span>org.apache.hadoop<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">groupId<\/span>&gt;<\/span>\r\n    <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">artifactId<\/span>&gt;<\/span>hadoop-hdfs<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">artifactId<\/span>&gt;<\/span>\r\n    <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">version<\/span>&gt;<\/span>${hadoop.version}<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">version<\/span>&gt;<\/span>\r\n  <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">dependency<\/span>&gt;<\/span>\r\n<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">dependencies<\/span>&gt;<\/span>\r\n<\/code><\/pre>\n<p>Please make sure to replace ${flink.version} with the Flink version you are using, and replace ${scala.binary.version} with the Scala version you are using.<\/p>\n<ol>\n<li>Environment for executing data streaming tasks.<\/li>\n<\/ol>\n<pre class=\"post-pre\"><code><span class=\"hljs-type\">StreamExecutionEnvironment<\/span> <span class=\"hljs-variable\">env<\/span> <span class=\"hljs-operator\">=<\/span> StreamExecutionEnvironment.getExecutionEnvironment();\r\n<\/code><\/pre>\n<ol>\n<li>import the contents of a text file<\/li>\n<li>Stream of data<\/li>\n<\/ol>\n<pre class=\"post-pre\"><code>DataStream&lt;String&gt; dataStream = env.readTextFile(<span class=\"hljs-string\">\"hdfs:\/\/path\/to\/file\"<\/span>);\r\n<\/code><\/pre>\n<p>Please replace hdfs:\/\/path\/to\/file with the path of the HDFS file you want to read.<\/p>\n<ol>\n<li>output or display<\/li>\n<\/ol>\n<pre class=\"post-pre\"><code>dataStream.print();\r\n<\/code><\/pre>\n<ol>\n<li>.perform()<\/li>\n<\/ol>\n<pre class=\"post-pre\"><code>env.execute(<span class=\"hljs-string\">\"Read HDFS Data\"<\/span>);\r\n<\/code><\/pre>\n<p>After completing the above steps, your Flink application will be able to read data from HDFS and perform offline processing. You can further process and transform the data according to your own needs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>To perform offline processing in Flink and read data from HDFS, you can follow these steps: Firstly, make sure to include the necessary dependencies in your Flink application. You can do this by adding the following dependencies in the pom.xml file to import the relevant libraries for Hadoop and HDFS. &lt;dependencies&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;\/groupId&gt; &lt;artifactId&gt;flink-java&lt;\/artifactId&gt; &lt;version&gt;${flink.version}&lt;\/version&gt; [&hellip;]<\/p>\n","protected":false},"author":9,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_import_markdown_pro_load_document_selector":0,"_import_markdown_pro_submit_text_textarea":"","footnotes":""},"categories":[1],"tags":[],"class_list":["post-19785","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v21.5 (Yoast SEO v21.5) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How can Flink process HDFS data offline? - Blog - Silicon Cloud<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How can Flink process HDFS data offline?\" \/>\n<meta property=\"og:description\" content=\"To perform offline processing in Flink and read data from HDFS, you can follow these steps: Firstly, make sure to include the necessary dependencies in your Flink application. You can do this by adding the following dependencies in the pom.xml file to import the relevant libraries for Hadoop and HDFS. &lt;dependencies&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;\/groupId&gt; &lt;artifactId&gt;flink-java&lt;\/artifactId&gt; &lt;version&gt;${flink.version}&lt;\/version&gt; [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/\" \/>\n<meta property=\"og:site_name\" content=\"Blog - Silicon Cloud\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/SiliCloudGlobal\/\" \/>\n<meta property=\"article:published_time\" content=\"2024-03-15T19:27:22+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-03-21T16:31:51+00:00\" \/>\n<meta name=\"author\" content=\"Ava Mitchell\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@SiliCloudGlobal\" \/>\n<meta name=\"twitter:site\" content=\"@SiliCloudGlobal\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Ava Mitchell\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/\"},\"author\":{\"name\":\"Ava Mitchell\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/#\/schema\/person\/a3e2658c2cb9fb2be95ae0a8861f4a64\"},\"headline\":\"How can Flink process HDFS data offline?\",\"datePublished\":\"2024-03-15T19:27:22+00:00\",\"dateModified\":\"2024-03-21T16:31:51+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/\"},\"wordCount\":154,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/#organization\"},\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/\",\"url\":\"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/\",\"name\":\"How can Flink process HDFS data offline? - Blog - Silicon Cloud\",\"isPartOf\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/#website\"},\"datePublished\":\"2024-03-15T19:27:22+00:00\",\"dateModified\":\"2024-03-21T16:31:51+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.silicloud.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How can Flink process HDFS data offline?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/#website\",\"url\":\"https:\/\/www.silicloud.com\/blog\/\",\"name\":\"Silicon Cloud Blog\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/#organization\"},\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/#organization\",\"name\":\"Silicon Cloud Blog\",\"url\":\"https:\/\/www.silicloud.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.silicloud.com\/blog\/wp-content\/uploads\/2023\/11\/EN-SILICON-Full.png\",\"contentUrl\":\"https:\/\/www.silicloud.com\/blog\/wp-content\/uploads\/2023\/11\/EN-SILICON-Full.png\",\"width\":1024,\"height\":1024,\"caption\":\"Silicon Cloud Blog\"},\"image\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/SiliCloudGlobal\/\",\"https:\/\/twitter.com\/SiliCloudGlobal\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/#\/schema\/person\/a3e2658c2cb9fb2be95ae0a8861f4a64\",\"name\":\"Ava Mitchell\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/15c63cd0564b4a2e07d611bcdffa296f6ea80e8db07c3091f43a84010514899d?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/15c63cd0564b4a2e07d611bcdffa296f6ea80e8db07c3091f43a84010514899d?s=96&d=mm&r=g\",\"caption\":\"Ava Mitchell\"},\"url\":\"https:\/\/www.silicloud.com\/blog\/author\/avamitchell\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"How can Flink process HDFS data offline? - Blog - Silicon Cloud","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/","og_locale":"en_US","og_type":"article","og_title":"How can Flink process HDFS data offline?","og_description":"To perform offline processing in Flink and read data from HDFS, you can follow these steps: Firstly, make sure to include the necessary dependencies in your Flink application. You can do this by adding the following dependencies in the pom.xml file to import the relevant libraries for Hadoop and HDFS. &lt;dependencies&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;\/groupId&gt; &lt;artifactId&gt;flink-java&lt;\/artifactId&gt; &lt;version&gt;${flink.version}&lt;\/version&gt; [&hellip;]","og_url":"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/","og_site_name":"Blog - Silicon Cloud","article_publisher":"https:\/\/www.facebook.com\/SiliCloudGlobal\/","article_published_time":"2024-03-15T19:27:22+00:00","article_modified_time":"2024-03-21T16:31:51+00:00","author":"Ava Mitchell","twitter_card":"summary_large_image","twitter_creator":"@SiliCloudGlobal","twitter_site":"@SiliCloudGlobal","twitter_misc":{"Written by":"Ava Mitchell","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/#article","isPartOf":{"@id":"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/"},"author":{"name":"Ava Mitchell","@id":"https:\/\/www.silicloud.com\/blog\/#\/schema\/person\/a3e2658c2cb9fb2be95ae0a8861f4a64"},"headline":"How can Flink process HDFS data offline?","datePublished":"2024-03-15T19:27:22+00:00","dateModified":"2024-03-21T16:31:51+00:00","mainEntityOfPage":{"@id":"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/"},"wordCount":154,"commentCount":0,"publisher":{"@id":"https:\/\/www.silicloud.com\/blog\/#organization"},"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/","url":"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/","name":"How can Flink process HDFS data offline? - Blog - Silicon Cloud","isPartOf":{"@id":"https:\/\/www.silicloud.com\/blog\/#website"},"datePublished":"2024-03-15T19:27:22+00:00","dateModified":"2024-03-21T16:31:51+00:00","breadcrumb":{"@id":"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.silicloud.com\/blog\/how-can-flink-process-hdfs-data-offline\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.silicloud.com\/blog\/"},{"@type":"ListItem","position":2,"name":"How can Flink process HDFS data offline?"}]},{"@type":"WebSite","@id":"https:\/\/www.silicloud.com\/blog\/#website","url":"https:\/\/www.silicloud.com\/blog\/","name":"Silicon Cloud Blog","description":"","publisher":{"@id":"https:\/\/www.silicloud.com\/blog\/#organization"},"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.silicloud.com\/blog\/#organization","name":"Silicon Cloud Blog","url":"https:\/\/www.silicloud.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.silicloud.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.silicloud.com\/blog\/wp-content\/uploads\/2023\/11\/EN-SILICON-Full.png","contentUrl":"https:\/\/www.silicloud.com\/blog\/wp-content\/uploads\/2023\/11\/EN-SILICON-Full.png","width":1024,"height":1024,"caption":"Silicon Cloud Blog"},"image":{"@id":"https:\/\/www.silicloud.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/SiliCloudGlobal\/","https:\/\/twitter.com\/SiliCloudGlobal"]},{"@type":"Person","@id":"https:\/\/www.silicloud.com\/blog\/#\/schema\/person\/a3e2658c2cb9fb2be95ae0a8861f4a64","name":"Ava Mitchell","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.silicloud.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/15c63cd0564b4a2e07d611bcdffa296f6ea80e8db07c3091f43a84010514899d?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/15c63cd0564b4a2e07d611bcdffa296f6ea80e8db07c3091f43a84010514899d?s=96&d=mm&r=g","caption":"Ava Mitchell"},"url":"https:\/\/www.silicloud.com\/blog\/author\/avamitchell\/"}]}},"_links":{"self":[{"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/posts\/19785","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/users\/9"}],"replies":[{"embeddable":true,"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/comments?post=19785"}],"version-history":[{"count":1,"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/posts\/19785\/revisions"}],"predecessor-version":[{"id":53545,"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/posts\/19785\/revisions\/53545"}],"wp:attachment":[{"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/media?parent=19785"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/categories?post=19785"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/tags?post=19785"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}