Spark Architecture & Data Exploration

In this class I cover the basics of Spark architecture and parallel processing: how data is partitioned, what makes up a Spark cluster, and how transformations differ from actions. I then show how to display and explore your data using grouped aggregations.
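As a rough illustration of these ideas, here is a minimal sketch in plain Python rather than Spark itself. It assumes a made-up list of (department, salary) rows, and the names `partition`, `keep_high`, and `evaluated` are hypothetical helpers for this example only. It shows data split into partitions, a lazy "transformation" that does no work when declared, and an "action" that triggers the work and merges per-partition counts into a grouped aggregation.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample data: (department, salary) rows.
rows = [("eng", 100), ("ops", 80), ("eng", 120), ("ops", 90), ("hr", 70)]

def partition(data, n):
    """Split rows into n partitions, round-robin."""
    return [data[i::n] for i in range(n)]

partitions = partition(rows, 2)

# Log of rows actually processed, so we can see when work happens.
evaluated = []

def keep_high(part):
    """A lazy 'transformation': a generator that filters rows on demand."""
    for dept, salary in part:
        evaluated.append(dept)
        if salary >= 80:
            yield dept

# Declaring the transformation on each partition runs nothing yet.
lazy = [keep_high(p) for p in partitions]
work_before_action = len(evaluated)

# The 'action': consume each partition, count per department,
# then merge the partial counts into one grouped result.
partial_counts = [Counter(p) for p in lazy]
counts = reduce(lambda a, b: a + b, partial_counts, Counter())

print(work_before_action)  # rows processed before the action
print(dict(counts))        # grouped count per department
```

In PySpark the corresponding pattern would look something like `df.filter(...).groupBy("dept").count()` followed by an action such as `show()`, where `df` and the `dept` column are assumed names for this example.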


Code and Data Available Here

Slides

Suggested Reading:

  • Spark: The Definitive Guide, Chapters 1 and 2 (pp. 3-30) and Chapter 4 (pp. 49-58)
  • Learning Spark, 2nd Edition, Chapters 1 and 2 (pp. 1-42)