Intro to Machine Learning

This post covers Sections 2 and 3 of the ZTM course.

Hello 👋, I'm Shuraim and I'll be writing blog posts covering each section from the ZTM Complete AI, ML and DS Bootcamp.

Here are the main things I learned from these sections.

What is Machine Learning?

Goal of ML:

The goal is to make machines act more and more like humans.

AI vs ML vs DL vs DS

AI - giving machines human-like intelligence

ML - the ability of a machine to perform a task without being explicitly programmed

DL - one of the techniques to implement AI

ML overlaps with DS.

Steps in a full ML project

This framework will be explained later in this blog.

Types of Machine Learning

All of these methods learn from the data they receive and predict something.

What is Machine Learning?

There are many definitions of ML because it covers many different aspects.

In a single sentence,

"Machine Learning is using an algorithm/computer program to learn about different patterns in data and then using that algorithm and what it has learnt to make predictions about the future using similar data."

Normal algorithm vs ML algorithm

The main difference between the two is what they start with.

Normal algorithms - We start with inputs and a set of instructions to get our output.

Machine Learning algorithms - Instead of starting with an input and a set of instructions, we start with an input and an ideal output.

The algorithm looks at the inputs and outputs and tries to figure out the instructions that connect the two.

ML models find patterns in collected data so we can use those patterns for future problems.
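To make the difference concrete, here is a minimal sketch in plain Python. The temperature conversion task, the function names and the use of a least-squares line fit are my own assumptions for illustration, not something from the course. The first function is a normal algorithm (we write the instructions); the second one only sees inputs and ideal outputs and works out the instructions itself.

```python
# Normal algorithm: we write the instructions ourselves.
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


# ML-style algorithm: we only supply inputs and ideal outputs, and a
# least-squares line fit works out the instructions (slope and intercept).
def fit_line(inputs, outputs):
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    variance = sum((x - mean_x) ** 2 for x in inputs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


inputs = [0, 10, 20, 30, 40]            # degrees Celsius (illustrative data)
ideal_outputs = [32, 50, 68, 86, 104]   # matching degrees Fahrenheit

slope, intercept = fit_line(inputs, ideal_outputs)
predicted = slope * 25 + intercept      # use the learned rule on a new input

print(f"learned rule: f = {slope:.2f} * c + {intercept:.2f}")
print(f"hand-coded: {celsius_to_fahrenheit(25)}, learned: {predicted:.1f}")
```

Both routes end up with the same rule here because the data is perfectly linear. With real, noisy data the learned rule would only approximate the pattern.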

Machine Learning and Data Science framework

There are 3 parts in the ML framework:

Data collection
Data modelling
Deployment

Data modelling is further divided into 6 steps:

Problem Definition - Figuring out what problem we're trying to solve.
Data - What kind of data do we have?
Evaluation - What defines success for us? (meaning when is a model good)
Features - What do we already know about the data?
Modelling - Based on our problem and data, what model should we use?
Experimentation - How can we improve the model / what can we try next?

These steps need not be followed in strict order. They are just a rough guide.

These are the questions we need to answer at each step. A more detailed explanation of each step is given below.

1. Problem Definition

"What problem are we trying to solve?"

When shouldn't you use ML?

When a simple hand-coded, instruction-based system works, use it. Don't use ML.

Ex:- When you have all the ingredients and the exact steps to make a chicken dish, then don't use ML.

Main types of ML

Supervised Learning - the data has labelled outputs.
Unsupervised Learning - the data has no labels. The model finds patterns and useful insights in the data.
Transfer Learning - leverages what one ML model has learnt in another ML model.

Ex:- You can take a model that was trained on car images which also include trees, grass and so on in the background. This model already has some idea of what grass, trees, etc. look like, and that knowledge can be applied to a dog breed example.

Reinforcement Learning - the model learns through rewards and penalties, e.g. training a model to play chess.

How do you match your problem?

Supervised Learning - "I know my inputs and outputs"

Unsupervised Learning - "I'm not sure of outputs but I have inputs"

Transfer Learning - "I think my problem may be similar to something else"

2. Data

Structured data - Excel, CSV, JSON files
Unstructured data - images, audio, videos, etc.

Static data - does not change with time.

Streaming data - changes constantly with time.

3. Evaluation

Evaluation metric - a measure of how well an ML algorithm predicts the future.

Different types of metrics:

Classification - accuracy, precision, recall etc
Regression - MAE, MSE, RMSLE etc
Recommendations - precision at k
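The metrics listed above are simple to compute by hand. Here is a minimal sketch in plain Python. The function names and the sample labels are made up by me for illustration, and in practice you would normally use a library implementation instead.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Of everything predicted positive, how much really was positive?"""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    return true_pos / predicted_pos if predicted_pos else 0.0


def recall(y_true, y_pred, positive=1):
    """Of everything that really was positive, how much did we find?"""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


def mae(y_true, y_pred):
    """Mean absolute error for regression."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant (assumes no duplicates)."""
    return len(set(recommended[:k]) & set(relevant)) / k


# Classification example (made-up labels)
labels_true = [1, 0, 1, 1, 0, 1, 0, 0]
labels_pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(accuracy(labels_true, labels_pred),
      precision(labels_true, labels_pred),
      recall(labels_true, labels_pred))

# Regression example (made-up values)
values_true = [3.0, 5.0, 2.0, 7.0]
values_pred = [2.5, 5.0, 4.0, 8.0]
print(mae(values_true, values_pred), mse(values_true, values_pred))

# Recommendation example (made-up items)
print(precision_at_k(["a", "b", "c", "d", "e"], {"a", "c", "f"}, k=3))
```

Which metric to use depends on the problem defined in step 1, so it is worth deciding on it before training anything.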

4. Features

We use feature variables to predict the target variable.

Feature variables can be numerical, categorical or derived.

What features should you use?

Ideally, a feature has a value for every sample, or at least 10% coverage. Feature coverage means how many samples have a value for that feature.
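Here is a rough sketch of how feature coverage could be checked in plain Python. The sample records, the feature names and the helper function are hypothetical, and I'm treating None as a missing value.

```python
def feature_coverage(samples, feature):
    """Share of samples that have a non-missing value for the given feature."""
    filled = sum(1 for sample in samples if sample.get(feature) is not None)
    return filled / len(samples)


# 20 made-up samples: "age" is always filled, "weight" is missing for every
# 4th sample, and "nickname" is filled for a single sample only.
samples = [
    {
        "age": 30 + i,
        "weight": 60.0 + i if i % 4 != 0 else None,
        "nickname": "Sam" if i == 0 else None,
    }
    for i in range(20)
]

MIN_COVERAGE = 0.10  # the 10% rule of thumb mentioned above

coverage = {name: feature_coverage(samples, name) for name in ("age", "weight", "nickname")}
usable_features = [name for name, share in coverage.items() if share >= MIN_COVERAGE]

for name, share in coverage.items():
    print(f"{name}: {share:.0%} coverage")
print("usable features:", usable_features)
```

In this toy data "nickname" falls below the threshold, so it would be dropped before modelling.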

5. Modelling

Modelling has 3 parts:

Choosing and training a model
Tuning a model
Model comparison

Most important concept in ML

Divide the dataset into 3 sets (training, validation and test) before starting to train.

The splits must be kept separate from each other.

Choosing a model

As a rough rule of thumb, if you're working with structured data (in case of problem 1), use XGBoost, RandomForest or CatBoost, and if you're working with unstructured data (in case of problem 2), use Deep Learning and transfer learning.

The chosen model is trained on the training dataset, and the goal is to minimise the time between experiments.

Things to remember

Some models work better than others on different problems.
Try things
Start small and add complexity as needed.

Tuning a model

Models have many hyperparameters that can be adjusted or tuned.

Things to remember

A model's first results aren't its last.
Tuning can take place on the training and/or validation sets.
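As a small illustration of tuning on a validation set, here is a sketch in plain Python. The tiny k-nearest-neighbours classifier, the data points and the candidate values of k are all made up by me. The only point is the loop: try each hyperparameter value, score it on the validation set, and keep the best one.

```python
def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    neighbours = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = [label for _, label in neighbours]
    return max(set(votes), key=votes.count)


def accuracy(train, data, k):
    """Share of (x, label) pairs in data that the k-NN model gets right."""
    correct = sum(knn_predict(train, x, k) == label for x, label in data)
    return correct / len(data)


# (feature, label) pairs, invented for this example
train = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 1), (3.0, 0), (3.5, 1), (4.0, 1), (4.5, 1)]
validation = [(1.2, 0), (2.2, 0), (2.8, 0), (3.3, 1), (4.2, 1)]

# k is the hyperparameter being tuned; odd values avoid tied votes
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)

print("validation accuracy per k:", scores)
print("best k:", best_k)
```

The test set is never touched here. It stays aside until the final evaluation.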

Model comparison

A good model yields similar results on the train, dev (validation) and test sets.

Overfitting and underfitting are both examples of a model not being able to generalise well.

Data leakage of test data into the training data leads to overfitting.

Overfitting and Underfitting

Overfitting leads to great performance on train data and poor generalization on test data. Underfitting leads to poor performance on both train and test data.
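A toy sketch of both failure modes, using made-up data and three deliberately simple models of my own choosing: one memorises the training set (overfit), one always predicts the average (underfit), and one fits a straight line.

```python
def mae(model, data):
    """Mean absolute error of a model (a function x -> prediction) on (x, y) pairs."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)


# Toy data following y = 2x + 1: even x for training, odd x for testing.
train = [(x, 2 * x + 1) for x in range(0, 20, 2)]
test = [(x, 2 * x + 1) for x in range(1, 20, 2)]

# Overfit: memorise every training pair, know nothing about unseen inputs.
lookup = dict(train)

def memoriser(x):
    return lookup.get(x, 0.0)

# Underfit: ignore the input and always predict the average training target.
mean_target = sum(y for _, y in train) / len(train)

def constant_model(x):
    return mean_target

# Reasonable fit: a least-squares straight line.
mean_x = sum(x for x, _ in train) / len(train)
slope = (sum((x - mean_x) * (y - mean_target) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_target - slope * mean_x

def line_model(x):
    return slope * x + intercept

for name, model in [("memoriser", memoriser), ("constant", constant_model), ("line", line_model)]:
    print(f"{name:10s} train MAE = {mae(model, train):5.2f}   test MAE = {mae(model, test):5.2f}")
```

The memoriser is perfect on the training data and poor on the test data, the constant model is poor on both, and the line does well on both, which is the pattern described above.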

Fixes for overfitting and underfitting

Underfitting

Try a more advanced model
Increase model hyperparameters
Reduce amount of features
Train longer

Overfitting

Collect more data
Try a less advanced model.

Things to remember

Avoid overfitting and underfitting
Keep test sets separate at all costs
The best score on one performance metric does not mean the best model.
Ensure the data you're using during experimentation matches up with the data you're using in production.

All experiments should be conducted on different portions of your data:

Training data - used for training the model. 70-80% of your data is standard.
Validation data - used for hyperparameter tuning and experimentation evaluation. 10-15% of your data is standard.
Testing data - used for final model testing and evaluation. 10-15% of your data is standard.

These amounts can fluctuate based on your problem.
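A minimal sketch of such a split in plain Python, assuming a 70/15/15 ratio and a fixed random seed. The function name and defaults are my own choice, and libraries such as scikit-learn offer ready-made helpers for this.

```python
import random


def train_val_test_split(samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle a copy of the samples and cut it into three non-overlapping parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever is left (about 15%)
    return train, val, test


data = list(range(100))                     # stand-in for 100 real samples
train, val, test = train_val_test_split(data)

print(len(train), len(val), len(test))
```

Because each sample lands in exactly one slice, the three sets stay separate from each other.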

6. Experimentation

Once a model is trained, we evaluate it and then try another model as an experiment to get better performance.

Tools we'll use

These are the tools we're going to use in each step.

Conclusion

This was a brief introduction to Machine Learning which I learnt during my ZTM ML course.

I hope you have gained some knowledge about ML through this blog. If you liked it, share it and also give it a like. If you have any questions, ask them in the comments.

Next I'll cover the ML DS environment set up using Conda.
