Introduction
We originally wrote this as an answer on Quora and decided to turn it into a formal post here on our official channel. Three months is a little short for gaining a deep understanding, but it is enough to complete one full iteration.
A key challenge with ML is that it touches multiple disciplines. At the same time, you don't need to finish 100% of each one serially before moving to the next.
Resources are freely available, but stitching them into a coherent path is the actual skill. This overflow of information can itself become a challenge, as it confuses a new entrant.
But be clear that this is a difficult task. Don't believe click-bait blogs that claim you can learn ML in 1-2 months. Start with a tough mindset and be ready to show perseverance. Below is the high-level roadmap.
Week-I : Python
Objective - Get your hands wet with Python. Python is at the core of AI and ML, and most of the leading frameworks are based on it.
1. Complete 4 chapters from the official Python docs Link
With this, we are good with Python fundamentals.
2. Practice in Google Colab Link
It's easy and convenient to code in the cloud, and Colab has a quick start-up guide.
Week-II-III : Going Deep into Python, Numpy, and Pandas
Objective - Understand Python data structures, especially lists. Then grasp the two most important libraries, i.e. NumPy and Pandas. Make sure you can slice arrays, write list comprehensions, and do CRUD (create, read, update, delete) operations on a Pandas DataFrame. You will eventually need a bit more than this, but you can come back as and when needed.
1. Follow Python for Data Analysis by Wes McKinney Paid
This is the best curated content on Python for ML. It ranges from very simple to advanced material. Complete chapters #1 to #5. It's OK to skip any heavy material if you feel so.
2. Python Data Science Handbook by Jake VanderPlas Link
This is a to-the-point book for data science. You may also use it as the free alternative to the above. Complete chapters #1 to #3.
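As a quick self-check for this stage, here is a minimal sketch of the three skills named in the objective: a list comprehension, NumPy slicing, and CRUD on a Pandas DataFrame. The data and variable names are made up for illustration and are not taken from either book.

```python
import numpy as np
import pandas as pd

# --- List comprehension: squares of the even numbers in a list ---
numbers = [1, 2, 3, 4, 5, 6]
even_squares = [n * n for n in numbers if n % 2 == 0]

# --- NumPy slicing on a 3x4 array ---
grid = np.arange(12).reshape(3, 4)   # rows: [0..3], [4..7], [8..11]
first_two_rows = grid[:2]            # rows 0 and 1, all columns
last_column = grid[:, -1]            # every row, last column only
inner_block = grid[1:, 1:3]          # rows 1-2, columns 1-2

# --- CRUD on a pandas DataFrame (toy data) ---
# Create
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "age": [28, 35, 41]})
# Read: filter rows with a boolean mask
over_30 = df[df["age"] > 30]
# Update: change one cell with .loc
df.loc[df["name"] == "Bob", "age"] = 36
# Delete: drop a row by condition and rebuild the index
df = df[df["name"] != "Cid"].reset_index(drop=True)
```

If you can predict the contents of each variable before printing it, you are probably ready to move on.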
Week-IV: Understanding and doing EDA and Plotting
Objective - Understand Matplotlib and Seaborn, and skim the statistics you need. Use the two libraries to do basic data exploration. It is good practice to try to figure out the possible patterns and exceptions in a dataset. Keep in mind that EDA is more of an art than a science, so you will keep learning it throughout the journey. Also, none of the books covers it explicitly as a chapter.
1. Python Data Science Handbook by Jake VanderPlas Link
Complete chapter #4 for Matplotlib and Seaborn. With this, you should have all the hands-on plotting practice you need.
2. Quickly review statistics and EDA concepts [chapters #1 to #4] Link
This is more about understanding EDA theoretically.
3. Another blog to grasp EDA Link
This is more directly related to practical EDA.
4. Code
Most EDA-related code is either plotting or simple statistical functions. You can easily do both using Pandas and Seaborn.
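To make "simple statistical functions" concrete, here is a small sketch of the Pandas calls that tend to show up in almost every EDA notebook. The dataset is a toy one invented for illustration, and the 1.5 * IQR outlier rule is just one common convention, not a prescription.

```python
import numpy as np
import pandas as pd

# Toy dataset, made up for illustration: one numeric column with a gap,
# one numeric column without, and one categorical column.
df = pd.DataFrame({
    "price": [10.0, 12.5, np.nan, 11.0, 50.0, 9.5],
    "rooms": [2, 3, 3, 2, 6, 2],
    "city": ["A", "B", "A", "A", "B", "A"],
})

summary = df.describe()                             # count, mean, std, quartiles per numeric column
missing = df.isna().sum()                           # number of NaNs per column
city_counts = df["city"].value_counts()             # frequency of each category
mean_by_city = df.groupby("city")["price"].mean()   # NaNs are skipped by default
correlation = df[["price", "rooms"]].corr()         # pairwise Pearson correlation

# Simple outlier check with the 1.5 * IQR rule on "price"
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)]
```

The same columns can then be visualised with Seaborn, for example with `sns.histplot` for distributions or `sns.boxplot` for spotting outliers.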
Week-V-X: Learning Machine Learning Modelling
Objective - Now you are ready to learn the models. Simply follow this one book and you will be in good shape with both theory and Scikit-Learn coding. As far as datasets are concerned, simply use the ones the book uses.
1. Hands-On Machine Learning by Aurélien Géron Paid
This is one of the best books for ML and DL. It's not that the other books [see the list at the end] are not equally good, but the way it accommodates both Machine Learning and Deep Learning and explains almost every inner working and trick is amazing. It also has more than ample coding examples.
Complete Part I of the book, i.e. chapters #1 to #9.
2. Andrew Ng's course on ML Link
This was a revolutionary course in ML and inspired millions to learn. Use it as a reference to understand any concept. It does not cover every topic, e.g. ensemble modelling, and it is not in Python. But it is the best resource to grasp the underlying concepts.
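For orientation before you start the book, here is a minimal sketch of the basic Scikit-Learn workflow: split, build a pipeline, cross-validate, then score once on the held-out set. The Iris dataset and logistic regression are our own choices for a self-contained example, not something taken from the book or the course.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out a test set that is touched only once, at the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline so it is re-fitted on each CV fold,
# which avoids leaking information from the validation fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv_scores = cross_val_score(model, X_train, y_train, cv=5)  # accuracy per fold
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
```

Most of the models you will meet in these weeks plug into this same skeleton by swapping the final estimator.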
Week-XI-XIII: Observe, Practice, and Iterate
Objective - Now you are ready to apply your learning to real-world problems. But don't assume that you will simply start solving everything. There are still skills that can only be learned by practising, e.g. quickly manipulating a dataset using Pandas. Let's list what we can do about this and also what still remains :-)
What remains
These are the specific topics that are still not covered. You can learn them in parallel with your practice or after this 3-month window. By this time you have decent experience with the core things, so you can also figure out the approach you need.
a. Time series data
This type of data needs some special treatment. Jason Brownlee has many good posts on time series data. You can learn it as and when you need it. Link
b. Handling imbalanced data
This is just a special case of data. You can check any good blog, or the last link, i.e. Machine Learning Mastery.
c. Recommendation System
d. Feature engineering
It's not that you have not done feature engineering (FE) till now, but like EDA, it is more of an art than a science. You will learn it along the journey. See reference [#1] at the end for a great read.
e. Post modeling activities
This is a separate learning domain in itself. You can follow a recent book by Andriy Burkov Link and explore more in parallel.
f. Observe and Practice
Try spending some time with some of the notebooks on Kaggle. Focus on the notebooks about EDA, feature engineering, or the topic you need. For the time being, avoid the notebooks with claims like "How I reached the top 0.1%".
Try to practise on datasets that each teach a different concept, e.g. too many features, lots of NaNs, imbalance, high volume, very small samples, etc.
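One way to judge which concept a candidate dataset will teach is to profile it before you commit to it. The helper below is a hypothetical sketch (the function name `profile_dataset` and the toy data are ours): it reports size, feature count, share of missing cells, share of the rarest class, and the columns that will need encoding.

```python
import numpy as np
import pandas as pd

def profile_dataset(df: pd.DataFrame, target: str) -> dict:
    """Summarise the traits that make a dataset good practice material."""
    features = df.drop(columns=[target])
    class_share = df[target].value_counts(normalize=True)
    return {
        "n_rows": len(df),
        "n_features": features.shape[1],
        # Share of missing cells across all feature columns
        "nan_fraction": float(features.isna().to_numpy().mean()),
        # Share of the rarest class; a small value signals imbalance
        "minority_share": float(class_share.min()),
        # Non-numeric columns, i.e. the ones that will need encoding
        "categorical_columns": features.select_dtypes(exclude="number").columns.tolist(),
    }

# Toy data, made up for illustration
toy = pd.DataFrame({
    "amount": [10.0, np.nan, 30.0, 40.0, 50.0],
    "channel": ["web", "app", None, "web", "web"],
    "is_fraud": [0, 0, 0, 0, 1],
})
report = profile_dataset(toy, target="is_fraud")
```

Note that `minority_share` only makes sense for classification targets; for a regression dataset you would look at the distribution of the target instead.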
Useful datasets to practice -
a. UCI ML repository Link
Search for datasets as per your need, i.e. size, CAT columns, feature count, classification, regression, etc.
b. Another dataset repository Link
Search for datasets as per your need, i.e. size, CAT columns, feature count, classification, regression, etc.
c. Kaggle - Advanced regression Link
d. Kaggle - Credit card fraud Link
e. Kaggle - Craigslist used Car Link
Good reads and your Friends -
As we said at the start, 3 months is not enough to gain all the expertise. So we are listing some content/books that will quench your thirst for knowledge. Many of these do not appear in other lists you can find on the internet, but they are among the best. Use them to look up any specific topic you are interested in.
1. Feature Engineering and Selection: A Practical Approach for Predictive Models, by Max Kuhn and Kjell Johnson Link
If you want to dive deep into the art of feature engineering.
2. Support Vector Machines Succinctly by Alexandre Kowalczyk Link
If you want to understand how SVM and its maths work.
3. Experimental Design and Analysis, Howard J. Seltman Link
This is the complete book behind the reference listed in the EDA portion of this blog. Read it to understand the probability concepts required for ML.
4. STAT 501 online course by PennState Eberly College of Science Link
Linear regression has a lot to tell you if you are looking for interpretability instead of a quick fit/prediction with Scikit-Learn. This content has all you need to comprehend it, i.e. assumptions and interpretation.
5. Understanding Imbalanced dataset, Jeremy Jordan Link
To quickly gain the needed knowledge of imbalanced datasets.
6. An Introduction to Statistical Learning, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani Link
This and the next one are your handy reference books to look for particular concepts. This one is more friendly to start the ML journey.
7. The Elements of Statistical Learning, by Trevor Hastie, Robert Tibshirani, Jerome Friedman Link
Use this to go deep into particular concepts, e.g. trees, ensembles, cross-validation, etc.
8. StackExchange sites i.e. Stats, Datascience and StackOverflow Link Link Link
You will use SO whenever you are stuck, but the other two are also worth treating as reading material, i.e. good answers to confusing questions.
9. StatQuest with Josh Starmer, YouTube channel Link
Another gem of a resource if you want to learn a topic the way a kid would.
10. Introduction to Machine Learning for Coders by Jeremy Howard Link
Jeremy is a great teacher, especially at focusing on thinking beyond what you get in textbooks. Good for diving into ensembles, especially Random Forest, and Python code.
11. Rules of Machine Learning: Best Practices for ML Engineering by Martin Zinkevich Link
Words of wisdom on best practices in ML. This will help you look at the journey holistically.
12. Interpretable Machine Learning by Christoph Molnar Link
Interpretability is still evolving and remains a big challenge in this field. This book has a good summary of the related concepts. It will also teach you new things about the models you already know.
13. Official websites and Paper reading
Blogs and books are great, but scanning the examples given on the official websites is a good way to learn things quickly, especially for understanding the parameters. Keep checking the Scikit-Learn and Seaborn websites. Seaborn, Scikit-learn
You should also try to read the papers, especially the ones for Random Forest and Extremely Randomized Trees. This will add a very different level of knowledge and perspective. Random Forest, Extremely Randomized Trees
Behavioral aspect -
- Don't chase a high score; think in terms of robustness and business value.
- Don't just read/watch; make sure you code everything by typing it yourself. The learning you gain by solving simple errors is unmatched. If you copy/paste, you will never see those issues, e.g. CAT encoding a dataset that has mixed types of features.
- Try to understand the trade-offs of every model. Don't just try to be a RandomForest/GBM ninja :-)
- MLOps and post-modelling work is a completely separate semester in itself. Keep this in mind.
- Maintain a balance between understanding a concept and being able to code it.
- Try to commit to a longer period of learning, ideally lifelong, but at least another 8-9 months.
- Know how the Dunning-Kruger effect works and be conscious of it when you are stuck with anything difficult. The key is to keep moving.
- Beware of shortcut proponents and baits, as there are millions of them on the internet in this field. Learning takes time and effort.
Optional -
We, at 10xAI Learning, offer Instructor-Led online courses in Python, Machine Learning, and Deep Learning.
Check out our website learning.10xai.co for the details and Call/WhatsApp us for any needed information.