Introduction to Machine Learning Concepts

         
 MARCH 22, 2018   by SmritiS

Is machine learning really about teaching machines? To answer that, we first need a basic idea of what machine learning is and of the important concepts related to it. This article, the first in our Machine Learning series, covers those basics.

Basics of Machine Learning

Before implementing machine learning algorithms, it is essential to know the terminology and the core concepts of machine learning. This article introduces them, using real-life examples that make the ideas easier to remember and relate to.




Pre-requisites for ML:

Knowledge of basic computer science principles, the ability to write reasonably simple but non-trivial programs, and familiarity with basic probability theory and linear algebra.

Think about the following:

  1. How does your email system distinguish between spam emails and important emails?
  2. How does Google fetch the exact page(s) you are searching for, out of billions of pages?
  3. How does the auto-complete in Google work?

All of this is made possible by machine learning. You can find various definitions of machine learning, but in simple words, it is the art of making computers learn things without explicitly programming them. Computers basically learn by observing data or by being trained on it. There are various approaches to training computers, the prominent ones being supervised learning and unsupervised learning. There is also semi-supervised learning, which is briefly introduced in a later section.




Supervised learning

Supervised learning is further classified into regression problems and classification problems.

This is the most common kind of machine learning, and it is used in many applications.

Consider the following situation: You recently purchased a new car, but you are unhappy with it and wish to sell it. It now counts as a second-hand vehicle. How do you price it? You ask experts for their advice, consult your peers, search the internet and so on.

Surprisingly, machine learning can help you with this. It can help you set a selling price for your car! How? You collect a data set of cars that are similar to yours in terms of size, mileage, color, etc. and plot it: on the x-axis (the horizontal axis) goes a feature of each car, such as the number of months it has been used, while on the y-axis (the vertical axis) goes the price of the car in lakhs.

How do you figure out from this data set how much your car will sell for? A machine learning algorithm can help: it might fit a straight line or a quadratic function to the data and then predict a value for your car. This is called supervised learning.
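The line-fitting idea can be sketched in a few lines of Python. This is only an illustration: the numbers are made up, and the names `months_used`, `price_lakhs` and `predict_price` are hypothetical. It assumes a single feature (months of use) and uses NumPy's least-squares `polyfit` to fit the straight line.

```python
import numpy as np

# Hypothetical data: months of use and selling price (in lakhs) of similar cars.
months_used = np.array([6, 12, 18, 24, 36, 48], dtype=float)
price_lakhs = np.array([7.5, 7.0, 6.4, 6.0, 5.1, 4.2])

# Fit a straight line (degree-1 polynomial) to the data by least squares.
# polyfit returns the coefficients from the highest degree down.
slope, intercept = np.polyfit(months_used, price_lakhs, deg=1)


def predict_price(months):
    """Predict a selling price in lakhs from the number of months used."""
    return slope * months + intercept


# Estimate the price of a car that has been used for 30 months.
my_car_price = predict_price(30)
print(f"Estimated price after 30 months: {my_car_price:.2f} lakhs")
```

Passing `deg=2` to `polyfit` would fit a quadratic function instead, in which case it returns three coefficients rather than two.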


Regression Problem:

Here we give the algorithm a data set together with the right answers for it. To be more specific, this is a regression problem in supervised learning. Regression means trying to predict a continuous value as output, which here is the price of the car. Strictly speaking, prices are discrete rather than continuous, but the steps are so small that price is treated as a continuous value; a prediction can always be rounded off to the nearest thousand or lakh.

For example, Rs 5,95,000 rounded off to the nearest lakh is Rs 6,00,000.


Classification Problem:

Classification is a type of supervised machine learning problem in which the output to be predicted is a discrete value. For example, predicting whether or not it will rain today, based on weather data. A classification problem can have more than two discrete values: today's weather might be rainy, windy, hot, cold, stormy, etc.

In classification, the algorithm learns from the given labeled data set and then uses what it has learned to classify new observations. Some common examples of classification problems are speech recognition, handwriting recognition and document classification.
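As a rough sketch of learning from labeled data and then labeling new observations, here is a tiny k-nearest-neighbours classifier for the rain example. The choice of k-nearest-neighbours, the two features (humidity and cloud cover) and all the numbers are assumptions made for illustration; many other classification algorithms exist.

```python
import numpy as np

# Hypothetical labeled weather data: [humidity %, cloud cover %] for each day.
features = np.array([
    [90, 85], [85, 90], [80, 70],   # days on which it rained
    [40, 20], [35, 10], [50, 30],   # days on which it stayed dry
], dtype=float)
labels = np.array(["rain", "rain", "rain", "dry", "dry", "dry"])


def classify(sample, k=3):
    """Label a new day by majority vote among its k nearest training days."""
    sample = np.asarray(sample, dtype=float)
    # Euclidean distance from the new day to every training day.
    distances = np.linalg.norm(features - sample, axis=1)
    # Labels of the k closest training days.
    nearest = labels[np.argsort(distances)[:k]]
    # Return the most frequent label among them.
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]


print(classify([88, 80]))
print(classify([42, 15]))
```

The output is one of a fixed set of labels rather than a number on a continuous scale, which is what separates classification from regression.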




Unsupervised Learning

Unsupervised learning is further classified into clustering problems and association problems.

In simple words, unsupervised learning is when no labels or other information are provided with the data set. The main reason for working with unlabeled data is that it is easily available, cheap and easy to store, whereas labeling data is expensive and time-consuming and sometimes requires domain expertise.

Without labels, the structure within the data has to be discovered and the data grouped into clusters. This is done by a clustering algorithm, which breaks up the data into separate clusters (or groups). Visit news.google.com and you will find that news stories on similar topics are clustered together, so news about the same topic gets displayed together. Other applications of clustering and unsupervised machine learning include the following:

Organizing large computer clusters:

Finding out which machines work together better, so as to improve the efficiency and performance of those systems.

Analysis of social networks:

Suggesting friends that you might know or showing people who have friend circles in common; all this is identified automatically using unsupervised machine learning algorithms.

Astronomical data analysis:

Clustering algorithms help astronomers form useful theories on how galaxies are formed.
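To make the clustering idea concrete, here is a minimal sketch of k-means, one widely used clustering algorithm. The article does not name a specific algorithm, so k-means is an assumption here, as are the made-up 2-D points and the hypothetical function name `k_means`. Note that no labels are given to the algorithm at any point.

```python
import numpy as np

# Hypothetical unlabeled data: six 2-D points forming two loose groups.
points = np.array([
    [1.0, 1.2], [0.8, 1.0], [1.2, 0.9],
    [5.0, 5.1], [5.2, 4.9], [4.8, 5.3],
])


def k_means(data, k, iterations=10, seed=0):
    """Group the rows of data into k clusters with basic k-means."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iterations):
        # Distance from every point to every centroid.
        distances = np.linalg.norm(
            data[:, None, :] - centroids[None, :, :], axis=2
        )
        # Assign each point to its closest centroid.
        assignment = np.argmin(distances, axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = data[assignment == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return assignment, centroids


assignment, centroids = k_means(points, k=2)
print(assignment)
```

The cluster numbers themselves are arbitrary; what matters is which points end up sharing a number.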

Association problems:

An association problem is an unsupervised machine learning problem in which we need to discover relationships between items in large amounts of data, such as dependencies of the form "X depends on Y" (here X and Y are arbitrary variables).
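One common way to put a number on such a relationship is to compute the support and confidence of a rule "X implies Y". These two measures are not mentioned in the article and are brought in here as an assumption; the shopping baskets and the function names `support` and `confidence` are likewise hypothetical.

```python
# Hypothetical shopping baskets; each set is one customer's purchase.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)


# How strongly does buying bread go together with buying butter?
print(f"support(bread, butter) = {support({'bread', 'butter'}):.2f}")
print(f"confidence(bread -> butter) = {confidence({'bread'}, {'butter'}):.2f}")
```

A rule with high support and high confidence suggests that the two items tend to occur together, which is the kind of dependency an association problem looks for.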




Semi-supervised learning

Semi-supervised learning deals with problems where a huge data set is present but only some of the data is labeled, while the major part is not.

These problems fall between supervised and unsupervised learning, hence the name. Many real-world machine learning problems fall into this category.
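A minimal sketch of one possible approach, often called pseudo-labeling or self-training, is shown below. It is not something the article prescribes: the nearest-centroid rule, the made-up points and the hypothetical function name `self_train` are all assumptions. The few labeled points give a first guess for the unlabeled ones, and those guesses are then fed back in to refine the class centers.

```python
import numpy as np

# Hypothetical data: two labeled points and six unlabeled points.
labeled_x = np.array([[1.0, 1.0], [5.0, 5.0]])
labeled_y = np.array([0, 1])
unlabeled_x = np.array([
    [1.2, 0.8], [0.9, 1.3], [2.0, 1.8],
    [4.8, 5.2], [5.3, 4.7], [4.0, 4.2],
])


def self_train(labeled_x, labeled_y, unlabeled_x, rounds=5):
    """Assign pseudo-labels to unlabeled points with a nearest-centroid rule."""
    classes = np.unique(labeled_y)
    pseudo = None
    for _ in range(rounds):
        if pseudo is None:
            # First round: only the truly labeled points are available.
            all_x, all_y = labeled_x, labeled_y
        else:
            # Later rounds: labeled points plus the current pseudo-labels.
            all_x = np.vstack([labeled_x, unlabeled_x])
            all_y = np.concatenate([labeled_y, pseudo])
        # One centroid per class, computed from everything labeled so far.
        centroids = np.array([all_x[all_y == c].mean(axis=0) for c in classes])
        # Give every unlabeled point the class of its nearest centroid.
        distances = np.linalg.norm(
            unlabeled_x[:, None, :] - centroids[None, :, :], axis=2
        )
        pseudo = classes[np.argmin(distances, axis=1)]
    return pseudo


pseudo_labels = self_train(labeled_x, labeled_y, unlabeled_x)
print(pseudo_labels)
```

Only two of the eight points carry a real label here, which mirrors the situation described above where most of the data is unlabeled.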

