SVM concept


Reference

Youtube Lecture

Margin

For classification, the margin is the shortest distance between an observation and the threshold.

Example 1: Without outlier, 1D data

Maximum Margin Classifier

It uses the threshold that gives the largest margin to make the classification.

If the threshold is halfway between the two edge observations of the red and green groups, then the margin is the largest.

If the threshold is moved to the left or right, the margin becomes smaller.
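
A minimal sketch of this idea on 1D data (the group values below are made up for illustration):

```python
import numpy as np

# Two well-separated 1-D groups (hypothetical values)
not_obese = np.array([1.0, 1.5, 2.0, 2.8])
obese     = np.array([5.2, 6.0, 6.5, 7.1])

# Maximum Margin Classifier: the threshold halfway between the edge observations
threshold = (not_obese.max() + obese.min()) / 2
margin    = min(obese.min() - threshold, threshold - not_obese.max())

print(threshold, margin)   # threshold = 4.0, margin = 1.2 (the largest possible margin)
```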

Example 2: With outliers, 1D data

If there is an outlier and misclassification of the outlier is NOT allowed, then the Maximum Margin Classifier would NOT classify new observations properly.

The Maximum Margin Classifier is very sensitive to outliers in the training data.

  • Solution: Allow misclassification

If the threshold is placed halfway while ignoring the outlier, it allows a misclassification, but it classifies new observations more sensibly.

A new observation will be classified as obese, which makes sense because it is closer to most of the obese observations.

Soft Margin

The distance between the observations and the threshold when misclassification is allowed.

How do we choose a better soft margin? Use cross-validation: count the misclassifications/correct classifications for different margins to decide how many misclassifications to allow inside the soft margin.
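
A minimal sketch of choosing the soft margin with cross-validation in scikit-learn; the toy data below are made up, and `C` is scikit-learn's soft-margin parameter (small `C` allows more misclassification, i.e. a wider soft margin):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two overlapping 1-D groups (hypothetical data with a few outlier-ish points)
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(2, 1, 20), rng.normal(6, 1, 20)]).reshape(-1, 1)
y = np.array([0] * 20 + [1] * 20)

# Cross-validation counts how well each soft-margin choice classifies held-out data
for C in [0.01, 0.1, 1, 10, 100]:
    scores = cross_val_score(SVC(kernel="linear", C=C), X, y, cv=5)
    print(f"C={C:>6}: mean CV accuracy = {scores.mean():.3f}")
```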

Soft Margin Classifier (Support Vector Classifier)

It uses soft margins to find the best threshold for classification.

Support Vectors: Observations on the edge and within the soft margin


1D, 2D, 3D Support Vector Classifier

The data within the soft margin are misclassified in this example. Cross-validation is used to confirm that allowing this misclassification results in better classification overall.

For 3D data, the Support Vector Classifier is a plane. For 4D or higher, the SVC is a hyperplane.

Support Vector Machine

An SVC may not work for certain data without changing the dimension. Example:

The Support Vector Machine can overcome this limitation, as shown in the following example.

Let's transform the data from 1D to 2D ( y = Dosage^2 ).

Then, a new observation can be classified according to which side of the SVC line it falls on.
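
A minimal sketch of this transformation, assuming hypothetical dosage values and labels (1 = effective dose); the 1D dosages are lifted to (dosage, dosage^2) so that a straight line can separate the middle group:

```python
import numpy as np
from sklearn.svm import SVC

dosage = np.array([1, 2, 3, 8, 9, 10, 4.5, 5, 5.5, 6, 6.5])   # hypothetical dosages
label  = np.array([0, 0, 0, 0, 0, 0,  1,  1,  1,   1,  1 ])   # 1 = effective dose

# Lift 1D -> 2D with y = dosage^2, then fit a linear (nearly hard-margin) SVC
X2d = np.column_stack([dosage, dosage ** 2])
clf = SVC(kernel="linear", C=1000).fit(X2d, label)

# A new observation is transformed the same way and classified by the 2D line
new_dosage = 5.2
print(clf.predict([[new_dosage, new_dosage ** 2]]))   # [1] for this toy data
```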

Support Vector Machine Concept

(1) With low-dimensional data, the classes may not divide nicely. (2) Move the data into a higher dimension using a kernel function. (3) Find an SVC that separates the data in the higher dimension.

Question: which function should we use to move the data into a higher dimension? x^2? x^3?

Use kernel functions to systematically find the SVC.

Commonly used Kernels

  • Polynomial kernel with degree d (d=1 gives a point, d=2 a line, d=3 a plane, etc.)

  • Radial Basis Function (RBF)

Example. Polynomial kernel with d=3
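
A minimal sketch of using these two kernels with scikit-learn (the toy 1D data are made up; note that scikit-learn's polynomial kernel is (gamma·a·b + coef0)^degree, so `coef0` plays the role of r):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 1-D data (hypothetical): the middle dosages are the positive class
X = np.array([[1], [2], [3], [4.5], [5], [5.5], [6], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])

poly_svm = SVC(kernel="poly", degree=3, coef0=1.0).fit(X, y)  # polynomial kernel, d=3
rbf_svm  = SVC(kernel="rbf", gamma=0.5).fit(X, y)             # radial basis function kernel

print(poly_svm.predict([[5.2]]))
print(rbf_svm.predict([[5.2]]))
```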

Radial Basis Function (RBF)

Under construction

Kernel Trick

Kernel functions only calculate the relationships between every pair of points as if they were in the higher dimension; they do not actually do the transformation. This trick is known as the Kernel Trick.

The Kernel Trick reduces the computation required for SVM by avoiding the math that transforms the data from low to high dimension.
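
A minimal sketch of the idea: the kernel is evaluated for every pair of observations to build an n x n matrix of high-dimensional relationships, without ever constructing the high-dimensional coordinates (the observation values and r, d below are made up):

```python
import numpy as np

x = np.array([1.0, 3.0, 5.0, 9.0, 14.0])   # five 1-D observations (hypothetical)

def poly_kernel(a, b, r=0.5, d=2):
    # relationship between a and b "as if" they were in the higher dimension
    return (a * b + r) ** d

# Pairwise kernel matrix: all the relationships the SVM needs, with no transformation done
K = poly_kernel(x[:, None], x[None, :])
print(K.shape)   # (5, 5)
```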

Polynomial Kernel

The polynomial kernel computes the relationship between a pair of observations (a, b). The parameters r and d are determined by cross-validation.

  • a, b are the two observations for which we want to calculate the high-dimensional relationship

  • r is the coefficient of the polynomial

  • d is the degree of the polynomial

The Kernel Trick does not map the data directly into the new high-dimensional space H; instead, it uses the kernel function to obtain the dot product in the space H.

The condition is that the computation in H can be expressed as a dot product.

Example: r=0.5, d=2

Example: r=1, d=2

Using the polynomial kernel, the 1D observations a and b are effectively converted to 2D (the constant z-axis coordinate is ignored).

Applying the polynomial kernel is equivalent to taking the dot product between each pair of transformed points.

For observations a=9, b=14, the kernel gives the value of the high-dimensional relationship without actually transforming the data into the higher dimension.
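
A minimal numeric check of this example with r=0.5, d=2 (the parameter values follow the example heading above): the kernel value equals the dot product of the transformed points (a, a^2, 1/2) and (b, b^2, 1/2).

```python
a, b, r, d = 9.0, 14.0, 0.5, 2

kernel_value = (a * b + r) ** d                      # (9*14 + 0.5)^2 = 16002.25
dot_product  = a * b + (a ** 2) * (b ** 2) + r * r   # (a, a^2, 1/2) . (b, b^2, 1/2)

print(kernel_value, dot_product)                     # both give 16002.25
```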

Radial kernel

Another popular kernel is the Radial Basis Function (RBF) kernel.

The RBF kernel finds an SVC in infinite dimensions, so the process is hard to visualize.

It behaves like a weighted nearest-neighbor model: nearby observations have more influence than observations further away.

The kernel uses the squared difference between a and b, scaled by gamma: e^{-gamma (a-b)^2}.

Example: close observations

a=2.5 and b=4 are two observations that are close to each other.

Example: relatively far observations

a=2.5, b=16

The kernel gives a number very close to zero if the observations are far from each other.

We get the high-dimensional relationship by plugging the values into the radial kernel.
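
A minimal numeric check of the two examples above, assuming gamma = 1 (the notes leave gamma unspecified):

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    return np.exp(-gamma * (a - b) ** 2)

print(rbf_kernel(2.5, 4.0))    # close observations: e^(-2.25)   ~ 0.105 -> noticeable influence
print(rbf_kernel(2.5, 16.0))   # far observations:   e^(-182.25) ~ 0     -> essentially no influence
```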


Intuition

Let's start from the polynomial kernel with r=0.

It shifts the data along the original axis (in the original dimension), WITHOUT moving it into a higher dimension.

For example, when r=0, d=2, (ab+0)^2=(ab)^2.


We can use the polynomial kernel with r=0 to explain the radial kernel.

Let's add polynomial kernels with r=0, with d increasing in each term.

Adding two terms with d=1 and d=2 gives ab + a^2 b^2, which is the dot product of (a, a^2) and (b, b^2).

We can keep adding terms, with r=0 and d going from 1 up to infinity.

Then, it gives a polynomial kernel with an infinite number of dimensions.

Let's go back to the radial kernel.

With gamma = 1/2, the radial kernel factors as e^{-1/2 (a-b)^2} = e^{-1/2 (a^2+b^2)} · e^{ab}. The term e^{ab} can be written as a Taylor series, which is a sum of polynomial kernels with r=0 and d from 0 to infinity.

The Taylor series of e^{ab} is the dot product (1, a, a^2/sqrt(2!), a^3/sqrt(3!), ...) · (1, b, b^2/sqrt(2!), b^3/sqrt(3!), ...).

Thus, the radial kernel becomes e^{-1/2 (a^2+b^2)} multiplied by this infinite-dimensional dot product.


The radial kernel is equal to a dot product with coordinates in an infinite number of dimensions.

Thus, the value we get from the radial kernel is the relationship between the two observations in that infinite-dimensional space.
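
A minimal numeric check of this derivation, assuming gamma = 1/2 (as in the factorization above): truncating the Taylor series of e^{ab} at a finite number of terms already reproduces the radial kernel value.

```python
import math

a, b = 2.5, 4.0
exact = math.exp(-0.5 * (a - b) ** 2)          # radial kernel with gamma = 1/2

# Truncated "infinite-dimensional" dot product: e^{ab} = sum_k (ab)^k / k!
n_terms = 40
series = sum((a * b) ** k / math.factorial(k) for k in range(n_terms))
approx = math.exp(-0.5 * (a ** 2 + b ** 2)) * series

print(exact, approx)   # both ~ e^(-1.125) ≈ 0.3247
```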

An example: the polynomial kernel (ab + r)^d.

How does the radial kernel e^{-gamma (a-b)^2} determine how much influence each observation in the training dataset has on classifying a new observation?

If r=0, then (ab + r)^d = (ab)^d = a^d · b^d.