In this session we cover PySpark basics: reading in data, filtering, joining, selecting and dropping columns, and creating new columns. We also cover Koalas, which lets you write pandas-style Python code in Databricks while Spark runs it with parallel processing.
Suggested Reading:
- Spark: The Definitive Guide, Chapter 5 (pp. 59-81), Chapter 6 (pp. 83-115), Chapter 7 (pp. 117-137), and Chapter 9 (pp. 153-158)
- Learning Spark, 2nd Edition, Chapter 3 (pp. 43-82)
