Class Overview
Below are the materials for my ongoing Introduction to PySpark course. The class covers the foundational topics of big data analysis with PySpark in Databricks, including:
- Spark architecture
- Basic data transformations
- DataFrame joins and aggregations
- Optimizing PySpark code
- Koalas, or using Python pandas syntax for parallel processing
- Working with Delta Lake
- Modeling with MLlib
- Model deployment and monitoring with MLflow
For the ongoing class schedule, please join my Data Science and Big Data Meetup.
Class GitHub
Code and data associated with each class can be found on GitHub: https://github.com/kelsey-huntzberry/DataAnalysisLab
