Classification in Machine Learning

Basic Concepts of Classification in Machine Learning


Classifier: An algorithm to map the input data to a specific category

Classification Model: A model that draws conclusions from the input data given for training; it predicts the class or category of new data

Feature: Individual measurable property of the phenomenon being observed


Types of Classification

  • Binary Classification
  • Multi-class Classification

 

Binary Classification: Classification with exactly two outcomes, i.e. a binary outcome such as true/false or 0/1


Multi-class Classification: Classification with more than two classes. Each sample is assigned to exactly one label or target.


Multi-label classification: Each sample is assigned to a set of labels or targets.
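As a quick sketch of how these target shapes differ in code, here is how scikit-learn (assumed installed) encodes multi-label targets next to binary and multi-class ones; the labels are made up for illustration:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Binary: each sample has one of two labels.
y_binary = [0, 1, 1, 0]

# Multi-class: each sample has exactly one of several labels.
y_multiclass = ["cat", "dog", "bird", "cat"]

# Multi-label: each sample has a *set* of labels, which is encoded
# as one indicator column per label (columns sorted alphabetically).
y_multilabel = [{"sports"}, {"news", "sports"}, set()]
encoded = MultiLabelBinarizer().fit_transform(y_multilabel)
print(encoded)  # columns: news, sports
```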

Initialize: Instantiate and configure the classifier to be used for classification.

Train: Fit the model to the training features X and the training labels y.

Predict: Predict the target for unlabelled input data.

Evaluate: Evaluate the model, e.g. by its accuracy on held-out test data.
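The four steps above can be sketched with scikit-learn (assumed installed); the tiny dataset is made up for illustration:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Toy training features X and labels y
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = [0, 0, 1, 1]

clf = KNeighborsClassifier(n_neighbors=1)    # Initialize
clf.fit(X_train, y_train)                    # Train
preds = clf.predict([[0, 0.2], [1, 0.8]])    # Predict
print(accuracy_score([0, 1], preds))         # Evaluate -> prints 1.0
```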

 

Types of Learners

Lazy Learner: Stores the training data and waits until testing data is provided. It takes more time to predict than an eager learner. Examples: K-Nearest Neighbors (KNN), case-based reasoning.

Eager Learner: Constructs a classification model from the given training data before receiving data for prediction. Examples: Decision Tree, Naive Bayes.

 

Classification Algorithms

 Classification is supervised learning that categorizes a set of data into classes, e.g. handwriting detection, speech recognition, face recognition, document classification etc.

  • Logistic Regression
  • Decision Tree
  • K-Nearest Neighbor
  • Support Vector Machine
  • Naive Bayes
  • Stochastic Gradient Descent
  • Random Forest
  • Artificial Neural Network

Logistic Regression

It is a classification algorithm in machine learning that uses one or more independent variables (variables that stand alone and are not changed by the other variables you are trying to measure) to determine an outcome. The outcome is measured with a dichotomous variable (one that takes on only two possible values when observed or measured), meaning there are only two possible outcomes, e.g. 1 or 0. The goal is to find the best-fitting relationship between the dependent variable and the set of independent variables, and it quantitatively explains the factors leading to the classification. It only works when the predicted variable is binary, and it assumes that there is no missing data and that the predictors are independent of each other. Example uses: predicting the risk factor for diseases, word classification, weather prediction, voting applications etc.
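A minimal sketch of logistic regression on a dichotomous outcome, using scikit-learn (assumed installed); the single independent variable and the data are made up for illustration:

```python
from sklearn.linear_model import LogisticRegression

# One independent variable (e.g. hours studied),
# dichotomous outcome (pass = 1, fail = 0)
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression().fit(X, y)
print(model.predict([[1.5], [5.5]]))   # low value -> 0, high value -> 1
print(model.predict_proba([[3.5]]))    # class probabilities near the middle
```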

Naive Bayes

It is a classification algorithm based on Bayes' theorem that assumes independence among predictors. In simple terms, a Naive Bayes classifier assumes that the presence of a feature in a class is unrelated to the presence of any other feature. The model is easy to build and is useful for comparatively large datasets.

Pros

  • Requires a small amount of training data to estimate the necessary parameters
  • Extremely fast compared to other classifiers

Cons

  • Known to be a bad estimator, so its probability outputs should not be taken too seriously

Use cases

Disease detection, spam detection or sentiment analysis.
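A hedged sketch of the spam-detection use case with a multinomial Naive Bayes classifier via scikit-learn (assumed installed); the four toy messages are invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win money now", "free prize win",
         "meeting at noon", "lunch at noon"]
labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()
X = vec.fit_transform(texts)            # word counts as features
clf = MultinomialNB().fit(X, labels)

# Words seen only in spam push the prediction toward spam.
print(clf.predict(vec.transform(["free money"])))  # -> ['spam']
```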

Stochastic Gradient Descent

It is a simple and effective approach for fitting linear models, and it is particularly useful when the number of samples is very large. It supports different loss functions and penalties for classification. It calculates the gradient from each training instance and updates the model parameters immediately.

Pros

  • Ease of implementation
  • Efficiency

Cons

Requires a number of hyperparameters and is very sensitive to feature scaling.

Use Cases

IoT, updating the weights in neural networks, and linear regression.
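A sketch of `SGDClassifier` with a chosen loss function and penalty, using scikit-learn (assumed installed). Note the scaling step, since SGD is sensitive to feature scaling; the data is a toy, well-separated example:

```python
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

X = [[0.0, 0.0], [0.1, 0.2], [10.0, 9.0], [9.5, 10.5]]
y = [0, 0, 1, 1]

# Scale features first, then fit a linear model with hinge loss
# (an SVM-style objective) and an L2 penalty.
clf = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="hinge", penalty="l2", random_state=0),
)
clf.fit(X, y)
print(clf.predict([[0.2, 0.1], [9.9, 9.9]]))
```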

K-Nearest Neighbors

A lazy learning algorithm that stores all instances of the training data in n-dimensional space; it is called lazy because it stores training instances rather than constructing a general internal model. Classification is done by a simple majority vote of the K nearest neighbors of each point. It is supervised learning: it takes a set of labelled points and uses them to label new points. To label a new point, it looks at the labelled points closest to it (its nearest neighbors) and lets them vote, so whichever label appears most often among the neighbors becomes the label of the new point. K is the number of nearest neighbors it checks.

Pros

  • Simple in implementation
  • Robust to noisy training data
  • Very efficient even if the training data is large

Cons

  • Computation cost at prediction time is high
  • The value of K must be determined

Use cases

Handwriting detection, image recognition, Stock Analysis etc.
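The majority-vote idea described above can be written from scratch in a few lines of standard Python; the points and labels below are made up for illustration:

```python
import math
from collections import Counter

def knn_predict(train, new_point, k):
    """train: list of ((x, y), label) pairs. Returns the majority label
    among the k training points nearest to new_point (Euclidean)."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], new_point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_predict(train, (1, 1), k=3))  # two "A" neighbors outvote one "B"
```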

Decision Tree

The decision tree algorithm builds the classification model in the form of a tree structure. It utilizes if-then rules that are mutually exclusive and exhaustive for classification. It breaks the data down into smaller and smaller subsets, incrementally building up the decision tree.

Pros

  • Simple to understand and Visualize
  • Requires very little data preparation

Cons

  • Can create complex trees that may not be efficient in categorization
  • Very unstable model, since a small change can render the whole classification incorrect

Use Cases

Pattern recognition, identify diseases
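A sketch of a decision tree and its learned if-then rules using scikit-learn (assumed installed); the weather-style features and labels are invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# features: [temperature, humidity]; label: 1 = play outside
X = [[30, 80], [25, 90], [20, 40], [18, 35]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned if-then rules as text.
print(export_text(tree, feature_names=["temperature", "humidity"]))
print(tree.predict([[19, 38]]))  # cool, dry day -> class 1
```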

Random Forest Algorithm

Random decision forests, or random forests, are an ensemble learning method for classification, regression etc. The method operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.

Pros

More accurate than a single decision tree

Cons

Complex to implement and slow for real-time prediction

Use cases

Industrial applications (e.g. whether a particular individual applying for a loan is high risk or low risk), mechanical failure of automotive parts, predicting social media share scores, performance scores etc.
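The loan-risk style use case can be sketched with scikit-learn's `RandomForestClassifier` (assumed installed), which takes the mode of many trees' votes; the applicant data is entirely made up:

```python
from sklearn.ensemble import RandomForestClassifier

# features: [income in thousands, debt ratio]; label: 1 = high risk
X = [[20, 0.9], [25, 0.8], [90, 0.1], [85, 0.2]]
y = [1, 1, 0, 0]

# 50 trees vote; the majority class is the prediction.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(forest.predict([[22, 0.85], [88, 0.15]]))
```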

Artificial Neural Networks

A neural network consists of neurons arranged in layers; they take an input vector and convert it into an output. Each neuron takes its input, applies a function (often a non-linear function) to it, and passes the output on to the next layer.

Pros

  • High tolerance to noisy data and able to classify patterns it was not trained on
  • Performs better with continuous inputs and outputs

Cons

  • Poor interpretability

Use cases

Handwriting recognition, colorization of images and captioning photos based on faces.
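The layer-by-layer, non-linear behavior described above can be sketched with scikit-learn's `MLPClassifier` (assumed installed) on XOR-style toy data, which no single straight line can separate:

```python
from sklearn.neural_network import MLPClassifier

# XOR data: not linearly separable, so a non-linear hidden layer is needed.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]

# One hidden layer of 8 ReLU neurons; lbfgs converges well on tiny data.
net = MLPClassifier(hidden_layer_sizes=(8,), activation="relu",
                    solver="lbfgs", max_iter=2000, random_state=1)
net.fit(X, y)
print(net.predict(X))
```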

Support Vector Machine 

The Support Vector Machine is a classifier that represents the training data as points in space, separated into categories by a gap as wide as possible. New points are then mapped into the same space, and their category is predicted by which side of the gap they fall on.

Pros

  • Uses a subset of training information making it memory efficient
  • Very effective in high dimensional spaces

Cons

Doesn't directly provide probability estimates

Use cases

Business application for comparing the performance of a stock over a period of time. Investment suggestions. 
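A hedged sketch of the maximum-margin idea with scikit-learn's `SVC` (assumed installed); the two clusters of points are invented for illustration:

```python
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [4, 4], [5, 5]]
y = [0, 0, 1, 1]

# A linear kernel finds the widest gap between the two categories.
svm = SVC(kernel="linear").fit(X, y)
print(svm.predict([[0.5, 0.5], [4.5, 4.5]]))  # which side of the gap?
print(len(svm.support_vectors_))  # only a subset of points defines the gap
```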

 

Some Machine Learning Concepts

Heuristic: A heuristic is a guiding principle, or best guess, at how to solve a problem. Each attempt made is considered a candidate solution. Sometimes those solutions work, and sometimes they fail.

