{"id":5424,"date":"2024-03-14T02:49:36","date_gmt":"2024-03-14T02:49:36","guid":{"rendered":"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/"},"modified":"2025-08-01T14:55:55","modified_gmt":"2025-08-01T14:55:55","slug":"what-are-shared-variables-in-spark","status":"publish","type":"post","link":"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/","title":{"rendered":"Spark Shared Variables: Broadcast &#038; Accumulators"},"content":{"rendered":"<p>In Spark, shared variables are variables that can be used across the tasks running on a cluster. By default, each task works on its own copy of the variables captured by the function it runs, and changes to those copies are not sent back to the driver. Spark supports two types of shared variables: broadcast variables and accumulators.<\/p>\n<ol>\n<li>Broadcast Variables: Broadcast variables allow programmers to cache a read-only value on each node in the cluster, so that it can be used by every task that runs there. This avoids shipping a separate copy of the value with each task and improves runtime efficiency.<\/li>\n<\/ol>\n<pre class=\"post-pre\"><code><span class=\"hljs-comment\"># Create a broadcast variable in Python (assumes an existing SparkContext named sc)<\/span>\r\nbroadcast_var = sc.broadcast([<span class=\"hljs-number\">1<\/span>, <span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">3<\/span>])\r\n\r\n<span class=\"hljs-comment\"># Use the broadcast variable in a task; return the products so collect() can gather them<\/span>\r\n<span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">my_func<\/span>(<span class=\"hljs-params\">value<\/span>):\r\n    <span class=\"hljs-keyword\">return<\/span> [num * value <span class=\"hljs-keyword\">for<\/span> num <span class=\"hljs-keyword\">in<\/span> broadcast_var.value]\r\n\r\n<span class=\"hljs-comment\"># Assumes rdd is an existing RDD of numbers<\/span>\r\n<span class=\"hljs-built_in\">print<\/span>(rdd.<span class=\"hljs-built_in\">map<\/span>(my_func).collect())\r\n<\/code><\/pre>\n<ol start=\"2\">\n<li>Accumulators: Accumulators are variables that tasks can only add to, which makes them suitable for counters, sums, and similar aggregations. 
They are typically used to record statistics during task execution, and only the driver program can read their value.<\/li>\n<\/ol>\n<pre class=\"post-pre\"><code><span class=\"hljs-comment\"># Create an accumulator in Python (assumes an existing SparkContext named sc)<\/span>\r\naccum = sc.accumulator(<span class=\"hljs-number\">0<\/span>)\r\n\r\n<span class=\"hljs-comment\"># Use the accumulator in a task<\/span>\r\n<span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title function_\">my_func<\/span>(<span class=\"hljs-params\">value<\/span>):\r\n    accum.add(value)\r\n\r\n<span class=\"hljs-comment\"># foreach is an action, so each task's update is applied exactly once (assumes rdd is an existing RDD of numbers)<\/span>\r\nrdd.foreach(my_func)\r\n<span class=\"hljs-built_in\">print<\/span>(accum.value)\r\n<\/code><\/pre>\n<p>Be careful when using shared variables. The value of a broadcast variable should not be modified after it is broadcast, so that every node sees the same data. Accumulator updates made inside transformations such as map may be applied more than once if a task or stage is re-executed; Spark guarantees that each task's update is applied exactly once only for updates performed inside actions. Because transformations are lazy, an accumulator is also not updated until an action triggers the computation.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In Spark, shared variables are variables that can be used across the tasks running on a cluster. Spark supports two types of shared variables: broadcast variables and accumulators. Broadcast Variables: Broadcast variables allow programmers to cache a read-only variable on all nodes in the cluster, so that it can be used in each task. 
This helps [&hellip;]<\/p>\n","protected":false},"author":11,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_import_markdown_pro_load_document_selector":0,"_import_markdown_pro_submit_text_textarea":"","footnotes":""},"categories":[1],"tags":[5868,964,5867,2138,5866],"class_list":["post-5424","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-accumulators","tag-apache-spark","tag-broadcast-variables","tag-distributed-computing","tag-spark-shared-variables"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v21.5 (Yoast SEO v21.5) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Spark Shared Variables: Broadcast &amp; Accumulators - Blog - Silicon Cloud<\/title>\n<meta name=\"description\" content=\"Learn about Spark shared variables: broadcast variables and accumulators. Understand how they optimize data processing in distributed computing.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Spark Shared Variables: Broadcast &amp; Accumulators\" \/>\n<meta property=\"og:description\" content=\"Learn about Spark shared variables: broadcast variables and accumulators. 
Understand how they optimize data processing in distributed computing.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/\" \/>\n<meta property=\"og:site_name\" content=\"Blog - Silicon Cloud\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/SiliCloudGlobal\/\" \/>\n<meta property=\"article:published_time\" content=\"2024-03-14T02:49:36+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-08-01T14:55:55+00:00\" \/>\n<meta name=\"author\" content=\"Olivia Parker\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@SiliCloudGlobal\" \/>\n<meta name=\"twitter:site\" content=\"@SiliCloudGlobal\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Olivia Parker\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/\"},\"author\":{\"name\":\"Olivia Parker\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/#\/schema\/person\/3ff7b3da0e45ac5dbbef2502f3cea8d9\"},\"headline\":\"Spark Shared Variables: Broadcast &#038; Accumulators\",\"datePublished\":\"2024-03-14T02:49:36+00:00\",\"dateModified\":\"2025-08-01T14:55:55+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/\"},\"wordCount\":152,\"publisher\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/#organization\"},\"keywords\":[\"Accumulators\",\"Apache Spark\",\"Broadcast variables\",\"Distributed computing\",\"Spark shared 
variables\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/\",\"url\":\"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/\",\"name\":\"Spark Shared Variables: Broadcast & Accumulators - Blog - Silicon Cloud\",\"isPartOf\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/#website\"},\"datePublished\":\"2024-03-14T02:49:36+00:00\",\"dateModified\":\"2025-08-01T14:55:55+00:00\",\"description\":\"Learn about Spark shared variables: broadcast variables and accumulators. Understand how they optimize data processing in distributed computing.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.silicloud.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Spark Shared Variables: Broadcast &#038; Accumulators\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/#website\",\"url\":\"https:\/\/www.silicloud.com\/blog\/\",\"name\":\"Silicon Cloud Blog\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/#organization\"},\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/#organization\",\"name\":\"Silicon Cloud 
Blog\",\"url\":\"https:\/\/www.silicloud.com\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.silicloud.com\/blog\/wp-content\/uploads\/2023\/11\/EN-SILICON-Full.png\",\"contentUrl\":\"https:\/\/www.silicloud.com\/blog\/wp-content\/uploads\/2023\/11\/EN-SILICON-Full.png\",\"width\":1024,\"height\":1024,\"caption\":\"Silicon Cloud Blog\"},\"image\":{\"@id\":\"https:\/\/www.silicloud.com\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/SiliCloudGlobal\/\",\"https:\/\/twitter.com\/SiliCloudGlobal\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/#\/schema\/person\/3ff7b3da0e45ac5dbbef2502f3cea8d9\",\"name\":\"Olivia Parker\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.silicloud.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/56c66f189ba32a6f9eb50f31a38fe774e2a725c213d4070835ccc51b8fbbc54b?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/56c66f189ba32a6f9eb50f31a38fe774e2a725c213d4070835ccc51b8fbbc54b?s=96&d=mm&r=g\",\"caption\":\"Olivia Parker\"},\"url\":\"https:\/\/www.silicloud.com\/blog\/author\/oliviaparker\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Spark Shared Variables: Broadcast & Accumulators - Blog - Silicon Cloud","description":"Learn about Spark shared variables: broadcast variables and accumulators. 
Understand how they optimize data processing in distributed computing.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/","og_locale":"en_US","og_type":"article","og_title":"Spark Shared Variables: Broadcast & Accumulators","og_description":"Learn about Spark shared variables: broadcast variables and accumulators. Understand how they optimize data processing in distributed computing.","og_url":"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/","og_site_name":"Blog - Silicon Cloud","article_publisher":"https:\/\/www.facebook.com\/SiliCloudGlobal\/","article_published_time":"2024-03-14T02:49:36+00:00","article_modified_time":"2025-08-01T14:55:55+00:00","author":"Olivia Parker","twitter_card":"summary_large_image","twitter_creator":"@SiliCloudGlobal","twitter_site":"@SiliCloudGlobal","twitter_misc":{"Written by":"Olivia Parker","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/#article","isPartOf":{"@id":"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/"},"author":{"name":"Olivia Parker","@id":"https:\/\/www.silicloud.com\/blog\/#\/schema\/person\/3ff7b3da0e45ac5dbbef2502f3cea8d9"},"headline":"Spark Shared Variables: Broadcast &#038; Accumulators","datePublished":"2024-03-14T02:49:36+00:00","dateModified":"2025-08-01T14:55:55+00:00","mainEntityOfPage":{"@id":"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/"},"wordCount":152,"publisher":{"@id":"https:\/\/www.silicloud.com\/blog\/#organization"},"keywords":["Accumulators","Apache Spark","Broadcast variables","Distributed computing","Spark shared variables"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/","url":"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/","name":"Spark Shared Variables: Broadcast & Accumulators - Blog - Silicon Cloud","isPartOf":{"@id":"https:\/\/www.silicloud.com\/blog\/#website"},"datePublished":"2024-03-14T02:49:36+00:00","dateModified":"2025-08-01T14:55:55+00:00","description":"Learn about Spark shared variables: broadcast variables and accumulators. 
Understand how they optimize data processing in distributed computing.","breadcrumb":{"@id":"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.silicloud.com\/blog\/what-are-shared-variables-in-spark\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.silicloud.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Spark Shared Variables: Broadcast &#038; Accumulators"}]},{"@type":"WebSite","@id":"https:\/\/www.silicloud.com\/blog\/#website","url":"https:\/\/www.silicloud.com\/blog\/","name":"Silicon Cloud Blog","description":"","publisher":{"@id":"https:\/\/www.silicloud.com\/blog\/#organization"},"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.silicloud.com\/blog\/#organization","name":"Silicon Cloud Blog","url":"https:\/\/www.silicloud.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.silicloud.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.silicloud.com\/blog\/wp-content\/uploads\/2023\/11\/EN-SILICON-Full.png","contentUrl":"https:\/\/www.silicloud.com\/blog\/wp-content\/uploads\/2023\/11\/EN-SILICON-Full.png","width":1024,"height":1024,"caption":"Silicon Cloud Blog"},"image":{"@id":"https:\/\/www.silicloud.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/SiliCloudGlobal\/","https:\/\/twitter.com\/SiliCloudGlobal"]},{"@type":"Person","@id":"https:\/\/www.silicloud.com\/blog\/#\/schema\/person\/3ff7b3da0e45ac5dbbef2502f3cea8d9","name":"Olivia 
Parker","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.silicloud.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/56c66f189ba32a6f9eb50f31a38fe774e2a725c213d4070835ccc51b8fbbc54b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/56c66f189ba32a6f9eb50f31a38fe774e2a725c213d4070835ccc51b8fbbc54b?s=96&d=mm&r=g","caption":"Olivia Parker"},"url":"https:\/\/www.silicloud.com\/blog\/author\/oliviaparker\/"}]}},"_links":{"self":[{"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/posts\/5424","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/comments?post=5424"}],"version-history":[{"count":2,"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/posts\/5424\/revisions"}],"predecessor-version":[{"id":150172,"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/posts\/5424\/revisions\/150172"}],"wp:attachment":[{"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/media?parent=5424"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/categories?post=5424"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.silicloud.com\/blog\/wp-json\/wp\/v2\/tags?post=5424"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}