Code Chronicle: 2019

Confusion Matrix Clearly Explained for Beginners

A confusion matrix is a table used to describe the performance of a classification model on a set of data whose true values are known.

Consider the following table:

	Actual: Option 1	Actual: Option 2
Predicted: Option 1	TRUE POSITIVE	FALSE POSITIVE
Predicted: Option 2	FALSE NEGATIVE	TRUE NEGATIVE

The rows in a confusion matrix correspond to what the machine learning algorithm predicted.
The columns correspond to the known results, that is, the actual values.

To illustrate, consider the following dataset.

Chest Pain	Blood Circulation	Blocked Arteries	Heart Problems
NO	NO	NO	NO
NO	YES	YES	YES
YES	YES	NO	NO
---	----	----	-----

Suppose that, based on the dataset above, our ML algorithm returns the results placed in the following confusion matrix.

	Actual: Heart Problems	Actual: No Heart Problems
Predicted: Heart Problems	250	20
Predicted: No Heart Problems	40	500

The confusion matrix above shows us the following:

The algorithm correctly predicted that 250 people had heart disease (true positives)
The algorithm correctly predicted that 500 people didn’t have heart disease (true negatives)
The algorithm incorrectly predicted that 20 people had heart disease (false positives)
The algorithm incorrectly predicted that 40 people had no heart disease (false negatives)

A confusion matrix can be an important tool for comparing the results of different algorithms.
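As a rough sketch of how the four cells can be turned into single numbers for comparing algorithms, the snippet below computes accuracy, precision and recall from the heart disease counts. These three metrics are not part of the original post; they are common summaries added here for illustration.

```python
# Counts taken from the heart disease confusion matrix
# (rows = predicted, columns = actual).
tp = 250  # predicted heart disease, actually had it
fp = 20   # predicted heart disease, actually did not
fn = 40   # predicted no heart disease, actually had it
tn = 500  # predicted no heart disease, actually did not

total = tp + fp + fn + tn

accuracy = (tp + tn) / total   # share of all predictions that were correct
precision = tp / (tp + fp)     # of the predicted positives, how many were real
recall = tp / (tp + fn)        # of the real positives, how many were found

print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# prints: accuracy=0.926 precision=0.926 recall=0.862
```

Running the same calculation on the confusion matrix of a second algorithm gives a like-for-like comparison.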

Summarizing Distributions using Statistics

Summary statistics are used to summarize information about a sample.

Important tools in summary statistics are:

Mean
Variance
Effect Size (https://en.wikipedia.org/wiki/Effect_size)

A histogram is a complete description of the distribution of a sample: given the histogram, the values in the sample can be reconstructed (although not their original order).
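A minimal sketch of that idea, using a made-up sample and Python's collections.Counter as the histogram (both are assumptions for illustration, not part of the original post):

```python
from collections import Counter

sample = [1, 2, 2, 3, 5, 5, 5]   # hypothetical sample values

# The histogram maps each value to the number of times it occurs
hist = Counter(sample)

# Rebuild the values from the histogram alone; the original order is lost
rebuilt = sorted(hist.elements())

print(dict(hist))                 # {1: 1, 2: 2, 3: 1, 5: 3}
print(rebuilt == sorted(sample))  # True
```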

A full histogram is often more detail than we need, so descriptive statistics are used to provide a compact summary of a sample.

Some important characteristics of a distribution are:

Central Tendency: Do the values cluster around a central point? This is measured by the mean, median or mode.
Modes: Is there more than one cluster? (A modal value is found by counting the number of occurrences of each value and taking the most frequent one.)
Spread of Data: How much variability is there in the data? Variability can be measured by the range, quartiles, variance, absolute deviation and standard deviation.
Tails: How quickly do the probabilities drop off as we move away from the modes?
Outliers: Extreme values far from the modes, sometimes the result of errors and other times the result of genuinely unusual data.
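The characteristics above can be computed with Python's standard statistics module. The sketch below uses a made-up sample with one deliberate outlier; the data and variable names are assumptions for illustration only.

```python
import statistics

data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 30]   # hypothetical sample; 30 is an outlier

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
modes = statistics.multimode(data)        # every most-frequent value

# Spread
value_range = max(data) - min(data)
variance = statistics.variance(data)      # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)          # sample standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartile cut points

print(mean, median, modes, value_range)   # 7 5.0 [5] 28
```

Note how the single outlier pulls the mean (7) above the median (5), which is one reason to look at more than one measure of central tendency.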

Simple Linear Regression Example in Python Machine Learning

This script is a simple demonstration of machine learning in Python. It uses linear regression to predict pizza prices from pizza diameter.
The LinearRegression class is an estimator. Estimators predict a value based on observed data. In scikit-learn, all estimators implement the fit method (used to learn the parameters of the model) and the predict method (used to predict the value of a response variable).
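The original script is not reproduced here, so the following is only a sketch of how fit and predict might be used for the pizza example. The diameters and prices are made-up illustration values, and the snippet assumes scikit-learn is installed.

```python
from sklearn.linear_model import LinearRegression

# Hypothetical training data: pizza diameters (inches) and prices (dollars)
X = [[6], [8], [10], [14], [18]]   # one row per pizza, one explanatory variable
y = [7, 9, 13, 17.5, 18]           # response variable

model = LinearRegression()
model.fit(X, y)                    # learn the intercept and slope from the data

predicted = model.predict([[12]])  # estimated price of a 12-inch pizza
print(f"A 12-inch pizza should cost about ${predicted[0]:.2f}")
```

Note that fit expects a two-dimensional input (a list of rows), even when there is only one explanatory variable.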

Simple linear regression assumes that a linear relationship exists between the response variable and the explanatory variable; it models this relationship with a linear surface called a hyperplane. A hyperplane is a subspace that has one dimension less than the ambient space that contains it. In simple linear regression, there is one dimension for the response variable and another dimension for the explanatory variable, for a total of two dimensions. The regression hyperplane thus has one dimension; a hyperplane with one dimension is a line.

Linear Regression Equation
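The equation itself did not survive in this copy of the post. The standard form for simple linear regression is y = alpha + beta * x, where alpha is the intercept and beta is the slope. As a sketch, the ordinary least squares estimates of both can be computed directly with NumPy; the pizza data below is a made-up illustration.

```python
import numpy as np

# Hypothetical data: pizza diameters (inches) and prices (dollars)
x = np.array([6, 8, 10, 14, 18], dtype=float)
y = np.array([7, 9, 13, 17.5, 18], dtype=float)

# Least-squares estimates for the line y = alpha + beta * x
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()

print(f"y = {alpha:.3f} + {beta:.3f} * x")   # y = 1.966 + 0.976 * x
```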

Simple NumPy Array Tutorial

A simple introduction to NumPy

NumPy is one of the most widely used Python packages for working with numerical data.

Knowledge of NumPy is a must for data analytics and machine learning. NumPy is a core library for scientific computing in Python. Its tools are used to solve computing problems (specifically mathematical models) in science and engineering.

The most important feature of NumPy is its n-dimensional array, which has significant advantages over Python lists:

More compact
Faster access in reading and writing items
More convenient
More efficient.
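A small sketch of the convenience and compactness points, using made-up values: arithmetic applies to a whole array at once, and every element takes the same fixed number of bytes.

```python
import numpy as np

values = [1, 2, 3, 4, 5]
arr = np.array(values)

# With a list, element-wise arithmetic needs a loop or comprehension
doubled_list = [v * 2 for v in values]

# With an array, the operation applies to every element at once
doubled_arr = arr * 2

print(doubled_list)            # [2, 4, 6, 8, 10]
print(doubled_arr.tolist())    # [2, 4, 6, 8, 10]

# All elements share one type, so storage is a fixed number of bytes each
print(arr.nbytes == arr.size * arr.itemsize)   # True
```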

1. Create a Numpy Array

There are multiple ways of creating a NumPy array:

array()
ones()
zeros()
logspace()
linspace()
arange()

1.1 Creating from a Python List

# Create a one dimensional array from a list
import numpy as np
lst = [0,1,2,3,4]  #Create a List
np_arr = np.array(lst)  #Convert list to np array

1.2. Create a Two Dimensional Array (Matrix)

lst2 = [[0,1,2], [3,4,5], [6,7,8]]
numpy_2darr = np.array(lst2)
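To check what was created, the array's ndim and shape attributes can be inspected, and individual items can be read with a row and column index. A short sketch using the same nested list:

```python
import numpy as np

matrix = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

print(matrix.ndim)     # 2  (number of dimensions)
print(matrix.shape)    # (3, 3)  (rows, columns)
print(matrix[1, 2])    # 5  (row 1, column 2, counting from 0)
```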

1.3. Create a Three Dimensional Array

import numpy as np
arr_3d = np.zeros((2,3,2))  #2 blocks, each with 3 rows and 2 columns, filled with zeros

1.4. Create an array using the array function

import numpy as np
arr = np.array([1,2,3,4,5], dtype=int)  #the dtype argument sets the element type

1.5 Create an array using linspace function

numpy.linspace(start, stop, num = 50, endpoint = True, retstep = False, dtype = None): Returns num evenly spaced numbers over the interval from start to stop.

import numpy as np
arr = np.linspace(0,15,5)  #5 evenly spaced values from 0 to 15, both ends included
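The sketch below shows the result of that call, and how the endpoint and retstep parameters from the signature change it:

```python
import numpy as np

arr = np.linspace(0, 15, 5)
print(arr.tolist())        # [0.0, 3.75, 7.5, 11.25, 15.0]

# endpoint=False leaves out the stop value; retstep=True also returns the spacing
arr_open, step = np.linspace(0, 15, 5, endpoint=False, retstep=True)
print(arr_open.tolist())   # [0.0, 3.0, 6.0, 9.0, 12.0]
print(step)                # 3.0
```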

1.6 Create using arange function

This is not "arrange" with a double r. It's more like "a range".

import numpy as np
arr = np.arange(0,15,5)  #start at 0, step by 5, stop before 15
arr
#will print array([ 0,  5, 10])
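As a quick sketch of the "a range" analogy: arange takes the same start, stop and step arguments as Python's built-in range and also excludes the stop value, but it can step by fractions as well.

```python
import numpy as np

arr = np.arange(0, 15, 5)
print(arr.tolist())             # [0, 5, 10]
print(list(range(0, 15, 5)))    # [0, 5, 10]

# Unlike range, arange also accepts a fractional step
half_steps = np.arange(0, 2, 0.5)
print(half_steps.tolist())      # [0.0, 0.5, 1.0, 1.5]
```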

1.7 Create using logspace function

logspace creates an array of values spaced evenly on a log scale. The first parameter is the starting exponent, the second is the ending exponent (both applied to base 10 by default), and the third is the number of values to generate.

>>> import numpy as np
>>> arr = np.logspace(10,20,3)
>>> arr
array([1.e+10, 1.e+15, 1.e+20])
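One way to read this result is that logspace raises the base to each value that linspace would produce. A short sketch of that equivalence, plus the optional base parameter:

```python
import numpy as np

arr = np.logspace(10, 20, 3)

# Same values: 10 raised to each of the evenly spaced exponents 10, 15, 20
same = 10.0 ** np.linspace(10, 20, 3)
print(np.allclose(arr, same))    # True

# The base parameter switches from powers of 10 to powers of another number
powers_of_two = np.logspace(0, 4, 5, base=2)
print(powers_of_two.tolist())    # [1.0, 2.0, 4.0, 8.0, 16.0]
```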

1.8 Create using ones and zeros functions

>>> import numpy as np
>>> arr = np.ones(10)
>>> arr
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
>>> arr = np.zeros(10)
>>> arr
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

3. Provide a datatype for the Array

numpy_arr_2d = np.array(lst2, dtype='float')
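The dtype attribute shows what type the elements ended up with. The sketch below rebuilds the same nested list so it runs on its own, and also shows astype, which converts an existing array to another type.

```python
import numpy as np

lst2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

as_float = np.array(lst2, dtype='float')
print(as_float.dtype)      # float64
print(as_float[0, 1])      # 1.0

# astype returns a converted copy; the original array is left unchanged
as_int = as_float.astype('int')
print(as_int.tolist())     # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```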
