During the honeymoon phase, when the first MOOCs feature Andrew Ng repeating “concretely” ten times in a six-minute video, machine learning seems pretty easy and intuitive. There are plenty of Medium articles and tutorials that we can read quickly, even in the Parisian metro, and still understand what is explained.

But sooner or later, during an interview or while talking with coworkers, we come to realize that there is far more to data science than reading blog articles or following well-designed MOOCs: proper code versioning, clean code habits, advanced machine learning libraries and algorithms, dataviz, advanced probability and statistics… Whether on the theoretical or the practical side, there is a huge gap between readers and newcomers (or sometimes bloggers like me haha) and professional practitioners, the real (data) scientists. Last year, while still in uni, I spent a few hours of my free time trying to bridge that gap, at least on the theoretical side. These are some of the resources I used. As I am always learning, I will add more resources as I discover them.

Course PDFs


These are some books that I spent time reading. They require much more effort than the PDFs shared above.

  1. MLAPP, Machine Learning: A Probabilistic Perspective, by Kevin Murphy
  2. ESL, The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman
  3. Deep Learning, by Goodfellow, Bengio, and Courville

While The Elements of Statistical Learning was the first one I read, I found it too verbose in some parts. Chapters 1 to 4 were really worth my time though (these notes helped me a lot!).
Machine Learning: A Probabilistic Perspective is my favorite: it is more concise and tries (and sometimes fails) to go straight to the point in every chapter. It is easy to miss some steps in the equations, but that is part of the learning process haha!
Finally, I read the Deep Learning book “just for fun”. As deep learning is more experimental than theoretical, with a lot of trial and error, I did not want to spend too much time trying to understand the theory. Understanding the main architectures (MLP, ConvNet, RNN…), backpropagation, and why the LSTM and GRU architectures mitigate the vanishing gradient problem was enough for me.

Other resources

Some other books or PDFs that I found interesting:

