Machine Learning 101

A beginner's guide to Machine Learning


I'm an aspiring machine learning engineer interested in machine learning and building ML projects.

This post covers Sections 2 and 3 of the ZTM course.

Hello 👋, I'm Shuraim and I'll be writing blog posts covering each section from the ZTM Complete AI, ML and DS Bootcamp.

I learned a lot of things from this section, which are as follows.

What is Machine Learning?

Goal of ML:

To make machines act more and more like humans.

AI vs ML vs DL vs DS

AI - giving machines human-like intelligence

ML - the ability of a machine to perform a task without being explicitly programmed

DL - one of the techniques used to implement AI

ML overlaps with DS.

Steps in a full ML project

This framework will be explained later in this blog.

Types of Machine Learning

All these methods learn from the data they receive and predict something.

What is Machine Learning?

There are many definitions of ML because it contains different aspects.

In a single sentence,

"Machine Learning is using an algorithm/computer program to learn about different patterns in data, and then using that algorithm and what it has learnt to make predictions about the future using similar data."

Normal algorithm vs ML algorithm

The main difference between these two is how they arrive at their instructions.

Normal algorithms - We start with inputs and a set of instructions to get our output.

Machine Learning algorithms - Instead of starting with an input and a set of instructions, we start with an input and an ideal output.

The algorithm looks at the inputs and outputs and tries to figure out the instructions that connect the two.

ML models find patterns in collected data so we can use those patterns for future problems.
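To make the difference concrete, here is a small sketch I put together (it is not from the course). The temperature conversion and the names `celsius_to_fahrenheit` and `fit_line` are my own assumptions for illustration. The first function is a normal algorithm where we write the instructions; the second only sees inputs and ideal outputs and recovers the instructions with a simple least-squares line fit.

```python
# Normal algorithm: we write the instructions ourselves.
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


# ML-style algorithm: we only provide inputs and ideal outputs,
# and the code works out the rule (here a straight line) that links them.
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data: in a real project these outputs would be collected, not computed.
inputs = [-10.0, 0.0, 10.0, 25.0, 40.0]
outputs = [celsius_to_fahrenheit(c) for c in inputs]

slope, intercept = fit_line(inputs, outputs)
print(f"learned rule: fahrenheit = {slope:.2f} * celsius + {intercept:.2f}")
```

The learned slope and intercept should land very close to the hand-written 9/5 and 32, which is the "figuring out the instructions" part.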

Machine Learning and Data Science framework

There are 3 parts in the ML framework:

  1. Data collection

  2. Data modelling

  3. Deployment

Data modelling is further divided into 6 steps:

  1. Problem Definition - Figuring out what problem we're trying to solve.

  2. Data - What kind of data do we have?

  3. Evaluation - What defines success for us? (meaning when is a model good)

  4. Features - What do we already know about the data?

  5. Modelling - Based on our problem and data, what model should we use?

  6. Experimentation - How can we improve the model / what can we try next?

These steps need not be followed in order; they are just a rough guide.

These are the questions we need to answer at each step. A more detailed explanation of each step is provided below.

1. Problem Definition

"What problem are we trying to solve?"

When shouldn't you use ML?

When a simple hand-coded, instruction-based system works, use it. Don't use ML.

Ex:- When you have all the ingredients and the exact steps to make a chicken dish, then don't use ML.

Main types of ML

  1. Supervised Learning - has labelled outputs.

  2. Unsupervised Learning - has data without labels. This finds patterns and useful insights from data.

  3. Transfer Learning - Leverages what one ML model has learnt in another ML model.

Ex:- You can take a model that was trained on car images which also include trees, grass and so on in the background. This model already has some idea of what grass, trees etc. look like, and that knowledge can be applied to a dog breed example.

  4. Reinforcement Learning - e.g. training a model to play chess. It works on a reward-penalty model.

How do you match your problem?

Supervised Learning - "I know my inputs and outputs"

Unsupervised Learning - "I'm not sure of outputs but I have inputs"

Transfer Learning - "I think my problem may be similar to something else"

2. Data

  • Structured data - Excel, CSV, JSON files

  • Unstructured data - images, audio, videos etc.

Static data - data that doesn't change with time.

Streaming data - data that keeps changing with time.

3. Evaluation

Evaluation metric - How well the ML algorithm predicts the future.

Different types of metrics:-

  • Classification - accuracy, precision, recall etc

  • Regression - MAE, MSE, RMSLE etc

  • Recommendations - precision at k
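Here is a rough sketch of how a few of these metrics can be computed by hand in plain Python. The function names and the toy labels are my own assumptions, not something from the course, and in practice a library would usually do this for you.

```python
import math


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Of everything predicted positive, how much really was positive?"""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    return true_pos / predicted_pos if predicted_pos else 0.0


def recall(y_true, y_pred, positive=1):
    """Of everything really positive, how much did we find?"""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmsle(y_true, y_pred):
    """Root mean squared log error (values must be non-negative)."""
    squared_logs = [(math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_logs) / len(squared_logs))


def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that are relevant."""
    return sum(item in relevant for item in recommended[:k]) / k


# Toy classification labels (1 = positive class).
labels = [1, 0, 1, 1, 0, 1]
predictions = [1, 0, 0, 1, 1, 1]
print(accuracy(labels, predictions), precision(labels, predictions), recall(labels, predictions))

# Toy regression targets.
prices = [100.0, 150.0, 200.0]
predicted_prices = [110.0, 140.0, 230.0]
print(mae(prices, predicted_prices), mse(prices, predicted_prices), rmsle(prices, predicted_prices))

# Toy recommendation list.
print(precision_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=3))
```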

4. Features

We use the feature variables to predict the target variable.

Feature variables can be numerical, categorical or derived.

What features should you use?

Ideally, a feature should have a value for every sample, or at least more than 10% coverage. Feature coverage means how many of the samples actually have a value for that feature.
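Below is a minimal sketch of how feature coverage could be checked, assuming each sample is a dictionary and a missing value is stored as `None`. The sample data, the `feature_coverage` helper and the `MIN_COVERAGE` constant are all made up by me for illustration.

```python
def feature_coverage(samples, feature):
    """Fraction of samples that have a non-missing value for `feature`."""
    filled = sum(1 for sample in samples if sample.get(feature) is not None)
    return filled / len(samples)


# Hypothetical samples: 20 records, where "most_eaten_food" is almost never filled in.
samples = [
    {"weight": 60 + i, "heart_rate": 70 + i % 5, "most_eaten_food": None}
    for i in range(20)
]
samples[3]["most_eaten_food"] = "fries"
samples[7]["heart_rate"] = None

MIN_COVERAGE = 0.10  # the 10% rule of thumb mentioned above

coverage = {name: feature_coverage(samples, name) for name in samples[0]}
usable = [name for name, share in coverage.items() if share > MIN_COVERAGE]

print(coverage)
print("features worth keeping:", usable)
```

With this toy data, `most_eaten_food` falls below the 10% mark, so it would be a candidate to drop or to collect more data for.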

5. Modelling

Modelling has 3 parts:

  1. Choosing and training a model

  2. Tuning a model

  3. Model comparison

The most important concept in ML

Divide the dataset into 3 sets (training, validation and test) before starting to train.

The splits must be kept separate from each other.

Choosing a model

Broadly, remember this: if you're working with structured data (as in problem 1), use XGBoost, RandomForest or CatBoost, and if you're working with unstructured data (as in problem 2), use deep learning and transfer learning.

The chosen model is trained on the training dataset, and the goal is to minimise the time between experiments.

Things to remember

  • Some models work better than others on different problems.

  • Try things

  • Start small and add complexity as needed.

Tuning model

Models have many hyperparameters that can be adjusted or tuned.

Things to remember

  • A model's first results aren't its last.

  • Tuning can take place on the training and/or validation sets.
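As a rough illustration of tuning, here is a tiny k-nearest-neighbours classifier written from scratch, where `k` is the hyperparameter. This is my own toy example with made-up data (two of the training labels are deliberately noisy), not the course's method. Each value of `k` is scored on the validation set, and the test set is only touched once at the end.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def score(train, data, k):
    """Accuracy of the k-NN classifier on `data`."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


# Toy 1-D data: values below 5 are "small", the rest "large" (3 and 7 are mislabelled).
train = [(1, "small"), (2, "small"), (3, "large"), (4, "small"),
         (6, "large"), (7, "small"), (8, "large"), (9, "large")]
validation = [(2.9, "small"), (3.2, "small"), (6.8, "large"),
              (7.1, "large"), (1.5, "small"), (8.5, "large")]
test = [(0.5, "small"), (2.2, "small"), (6.5, "large"), (9.5, "large")]

# Tune the hyperparameter on the validation set only.
scores = {k: score(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)  # on a tie, the smallest k tried first wins

# Evaluate once on the untouched test set.
test_score = score(train, test, best_k)
print(scores, best_k, test_score)
```

With `k=1` the model copies the noisy labels, while a larger `k` smooths them out, which is why the validation score changes as the hyperparameter changes.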

Model comparison

A good model yields similar results on the train, dev (validation) and test sets.

Overfitting and underfitting are both examples of a model not being able to generalise well.

Leakage of test data into the training data leads to overfitting.

Overfitting and Underfitting

Overfitting leads to great performance on train data and poor generalization on test data. Underfitting leads to poor performance on both train and test data.

Fixes for overfitting and underfitting

Underfitting

  • Try a more advanced model

  • Increase the model's hyperparameters

  • Reduce the amount of features

  • Train longer

Overfitting

  • Collect more data

  • Try a less advanced model.

Things to remember

  • Avoid overfitting and underfitting

  • Keep test sets separate at all costs

  • The best score on one performance metric does not mean the best model.

  • Ensure the data you're using during experimentation matches the data you'll be using in production.

All experiments should be conducted on different portions of your data:

  • Training data - used for training the model. 70-80% of your data is standard.

  • Validation data - used for hyperparameter tuning and experimentation evaluation. 10-15% of your data is standard.

  • Testing data - used for final model testing and evaluation. 10-15% of your data is standard.

These amounts can fluctuate based on your problem.
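The three-way split can be sketched in a few lines of plain Python. The helper name `train_val_test_split`, the 70/15/15 fractions and the fixed seed are assumptions I chose for this example; libraries offer ready-made helpers for this too.

```python
import random


def train_val_test_split(samples, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle the samples and cut them into three non-overlapping parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever is left over
    return train, val, test


data = list(range(100))  # stand-in for 100 real samples
train_set, val_set, test_set = train_val_test_split(data)
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before cutting keeps each part representative, and slicing one shuffled list makes sure no sample ends up in two sets.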

6. Experimentation

Once the model is trained, we evaluate it and then try another model as an experiment to get better performance.

Tools we'll use

These are the tools we're going to use in each step.

Conclusion

This was a brief introduction to Machine Learning, based on what I learnt during my ZTM ML course.

I hope you have gained some knowledge about ML through this blog. If you liked it, share it and also give it a like. If you have any questions, ask them in the comments.

Next, I'll cover the ML and DS environment setup using Conda.