I learn by reading, not so much by watching videos or taking classes. Below is a selection of books that I have read on machine learning, coding, and modeling. I list only books that I have read at least a large portion of, and for each one I describe the situations where it will be useful.
Even if you do not know R programming, Machine Learning with R is a great text for picking up the bare-bones basics of machine learning. I recommend it only for absolute beginners. It does not cover enough detail to be your only source, but it is good for a first pass.
Python Machine Learning is good for a “second pass” after the book I suggested above. It is much more detailed and includes far more of the underlying math. It may be difficult to understand as a first resource. I recommend it for readers who have beginner-level knowledge of machine learning and want a more detailed dive.
Deep Learning with Python is a fantastic book that covers the basics of deep learning. Instead of using mathematical notation, it uses Python code, which makes it very accessible for developers. It makes very difficult concepts easy to understand!
The Unsupervised Learning Workshop is an excellent introduction to unsupervised learning. It is by far the most in-depth and easiest-to-understand overview of clustering that I have seen.
Fundamentals of Data Visualization is superb at teaching graphing best practices. I learned many things I did not know, even after graphing for years. The author makes the book available for free online here and the code from the book available here.
This is the best book on coding in PySpark that I have found. I read it cover to cover and, in general, turned to Learning Spark (below) when I needed further clarification. It gives an in-depth overview of programming concepts, Spark architecture, and machine learning modeling and pipelines. I highly recommend this book! This is Databricks’ recommended resource for studying for the Databricks Certified Associate Developer for Apache Spark 3.0 certification exam and the Databricks Certified Associate ML Practitioner for Apache Spark 2.4 exam.
This book covers the fundamentals of coding in Spark, with examples in both PySpark and Scala throughout. It does a great job of explaining not only the code syntax but also Spark architecture and how to write efficient code. It is more high-level than the book above this one. Unlike the book above, it covers Delta Lake and is more up to date with the latest features. This is Databricks’ recommended resource for studying for the Databricks Certified Associate Developer for Apache Spark 3.0 certification exam.
This book is an excellent introduction to machine learning deployment best practices and MLOps. It covers concepts like data organization, deployment options, and model explainability.
Big Data Analytics with R is useful for learning how to code more efficiently, both in R and in general. I found the information about local parallelization very useful for large local R projects.
Practical Time Series Analysis is one of the better books I have read on the basics of time series analysis. It will be a great resource for beginners. One great feature is that it includes code in both Python and R, which suits a broader audience.
Above is an easy-to-understand calculus textbook that can help you understand the fundamental math behind machine learning and deep learning algorithms. It is a great companion to any machine learning or deep learning book if you want to dig into the math behind the algorithms.











