What is an RDD in Spark?

RDD (Resilient Distributed Dataset) is Spark's most fundamental data abstraction: an immutable collection of elements split into partitions, which can be processed in parallel across the nodes of a cluster. An RDD can be created by reading from a data source such as HDFS (or another Hadoop-supported file system), HBase, or Cassandra, or it can be derived from an existing RDD through a transformation. RDDs are fault-tolerant because each RDD records the lineage of transformations that produced it: if a node fails, Spark can recompute the lost partitions from that lineage instead of relying on replicated copies of the data.
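To make these properties concrete, here is a minimal single-process sketch in Python. It is a toy model, not Spark's API: the class `ToyRDD` and the helper `recover_partition` are hypothetical names invented for illustration. It shows immutability (a frozen dataclass, so transformations return new objects), partitioning (each partition is computed independently from its index), and lineage-based recovery (a "lost" partition is rebuilt by re-running its recorded transformations). In real PySpark the analogous calls would be `SparkContext.parallelize`, `RDD.filter`, `RDD.map`, and `RDD.collect`, and `RDD.toDebugString()` prints an RDD's lineage.

```python
import dataclasses
from typing import Any, Callable, Dict, Iterable, List, Optional


@dataclasses.dataclass(frozen=True)
class ToyRDD:
    """Hypothetical single-process model of an RDD (not Spark's API).

    frozen=True makes instances immutable: a transformation never edits an
    existing ToyRDD, it returns a new one that remembers its parent.
    """
    num_partitions: int
    compute: Callable[[int], List[Any]]  # rebuilds partition i from its inputs
    parent: Optional["ToyRDD"] = None    # lineage link used for recovery
    op_name: str = "parallelize"

    @staticmethod
    def parallelize(data: Iterable[Any], num_partitions: int) -> "ToyRDD":
        items = list(data)

        def compute(i: int) -> List[Any]:
            # Split the input into contiguous, roughly equal slices.
            start = i * len(items) // num_partitions
            end = (i + 1) * len(items) // num_partitions
            return items[start:end]

        return ToyRDD(num_partitions, compute)

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # Lazy: only records how to derive each partition from the parent.
        return ToyRDD(self.num_partitions,
                      lambda i: [f(x) for x in self.compute(i)],
                      parent=self, op_name="map")

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self.num_partitions,
                      lambda i: [x for x in self.compute(i) if pred(x)],
                      parent=self, op_name="filter")

    def collect(self) -> List[Any]:
        # Partitions are independent, so a real engine could compute them on
        # different nodes; this toy simply computes them one after another.
        return [x for i in range(self.num_partitions) for x in self.compute(i)]

    def lineage(self) -> List[str]:
        # Walk parent links back to the source, then report oldest first.
        names, node = [], self
        while node is not None:
            names.append(node.op_name)
            node = node.parent
        return names[::-1]


def recover_partition(rdd: ToyRDD, store: Dict[int, List[Any]], i: int) -> None:
    """Simulate rebuilding a lost partition from lineage, not from a copy."""
    store[i] = rdd.compute(i)


if __name__ == "__main__":
    nums = ToyRDD.parallelize(range(10), 3)
    squares = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
    print(squares.collect())   # [0, 4, 16, 36, 64]
    print(squares.lineage())   # ['parallelize', 'filter', 'map']

    store = {i: squares.compute(i) for i in range(squares.num_partitions)}
    del store[1]                         # pretend the node holding it failed
    recover_partition(squares, store, 1)
    print(store[1])                      # [16]
```

The design choice worth noting is that each `ToyRDD` stores a function that recomputes a partition rather than the partition's data. That is what makes recovery possible without replication: as long as the source data and the chain of deterministic transformations are available, any partition can be regenerated on demand.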
