Probability for Discrete Random Variable

Last updated 3 years ago

Binomial Distribution

Concept

The binomial distribution gives the probability of observing a given number of successes in a fixed number of independent trials.

  • Binomial probability: each trial succeeds with probability p and fails with probability 1-p.

  • A binomial random variable is the number of successes x in n repeated trials.

The required parameters are the success probability p and the number of trials n.

Example 1:

Using the binomial PMF, what is the probability that 20 people will clap next week?

We need the success probability p and the number of trials n.

From past statistics,

  • number of trials n = 1134 visits/week

  • number of successes = 17 claps/week

    • the rate, or expected value of x, is 17 claps/week.

  • success probability p = 17 claps / 1134 visits = 1.5% per visit

Then, what is the probability of x = 20 claps next week? Using the binomial PMF with n = 1134 and p = 17/1134, P(X=20) ≈ 0.0695 (the Poisson approximation with λ = 17, covered in the next section, gives 0.0692).
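As a rough check of the example above, the binomial PMF can be evaluated directly. This is a minimal sketch using only Python's standard library; the helper name `binomial_pmf` is chosen here for illustration and is not part of the original notes.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n = 1134      # number of trials: visits per week
p = 17 / n    # success probability estimated from 17 claps per week

prob_20 = binomial_pmf(20, n, p)
print(f"P(X = 20) = {prob_20:.4f}")
```

Using the exact ratio 17/1134 rather than the rounded 1.5% keeps the expected value n*p at exactly 17.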

Poisson Distribution

Definition:

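The Poisson distribution gives the probability of a given number of events occurring in a fixed interval of time, when the average number of events in that interval is lambda (the rate parameter).

For the full text, read the Medium article "Poisson Distribution Intuition (and derivation)" by @AerinKim.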

Intuition

What is Poisson for? What can Poisson do that Binomial can't?

Binomial Shortcoming

A binomial random variable is binary (0 or 1) per trial, so it cannot represent multiple events within one unit of time.

From the previous example, 17 people/week means 17/(7*24) ≈ 0.1 people per hour.

Then, in one week, there can be multiple events (S, F) across different hours.

This multiple-event issue can be addressed by dividing the time unit into smaller units, from week to hour, so that each unit holds at most one event.

  • For each minute, the rate is 0.1/60 [people/min].

Then multiple events are allowed in one hour, because the time unit is now minutes.

We can keep making the time unit smaller, from hour to minute to second and so on.

Concept:

We can make the binomial random variable handle multiple events by dividing a unit of time into smaller units, i.e. by making n large.

If we make the time unit infinitesimal, we no longer have to worry about more than one event within the same unit of time.

If the expected rate is fixed (i.e. n*p = constant), then as n → ∞ the probability p → 0.
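The limiting argument can be illustrated numerically. The sketch below holds n*p = 17 fixed (the value from the clap example), increases n, and compares the binomial probability of k = 20 with the Poisson probability. The helper names and the chosen values of n are assumptions made for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


lam, k = 17, 20
target = poisson_pmf(k, lam)

gaps = []
for n in (100, 1_000, 10_000, 100_000):
    p = lam / n                      # p shrinks as n grows, n*p stays at lam
    b = binomial_pmf(k, n, p)
    gaps.append(abs(b - target))
    print(f"n={n:>7}  p={p:.6f}  binomial={b:.6f}  poisson={target:.6f}")
```

The gap between the two probabilities shrinks as n grows, which is the behaviour the limit describes.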

Math

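As n → ∞ for a given k, with the expected rate ($$\lambda$$) held fixed (i.e. n*p = $$\lambda$$ = constant), the binomial PMF converges to the Poisson PMF: $$P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}$$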
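The limit can be worked out step by step. The following is a sketch of the standard derivation, substituting $$p = \lambda / n$$ into the binomial PMF; it is a reconstruction, not the original figure from these notes.

$$
\begin{aligned}
\lim_{n\to\infty} \binom{n}{k} p^k (1-p)^{n-k}
&= \lim_{n\to\infty} \frac{n(n-1)\cdots(n-k+1)}{k!} \left(\frac{\lambda}{n}\right)^{k} \left(1-\frac{\lambda}{n}\right)^{n-k} \\
&= \frac{\lambda^k}{k!} \lim_{n\to\infty} \frac{n(n-1)\cdots(n-k+1)}{n^k} \left(1-\frac{\lambda}{n}\right)^{n} \left(1-\frac{\lambda}{n}\right)^{-k} \\
&= \frac{\lambda^k}{k!} \cdot 1 \cdot e^{-\lambda} \cdot 1
= \frac{\lambda^k e^{-\lambda}}{k!}
\end{aligned}
$$

The first factor tends to 1 because k is fixed while n grows, the second tends to $$e^{-\lambda}$$ by the definition of e, and the third tends to 1.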

Example 1:

From the previous example, lambda was 17 people/week.

Unlike the binomial, the Poisson distribution does not require the values of n and p, only their product lambda. Poisson is usually used for rare events (n is a large number), but not always.

As lambda becomes larger, the graph looks more like a normal distribution. The model assumes that lambda is constant, but in real applications it may not be.

It also assumes that the events are independent, which may not hold in real applications.

The Poisson distribution is discrete.

  • For the continuous waiting time between events, use the exponential distribution.
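The Poisson version of the clap example needs only lambda. A minimal sketch, assuming lambda = 17 claps/week and asking for the probability of exactly 20 claps; `poisson_pmf` is a helper name introduced here for illustration.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


lam = 17                             # average claps per week
prob_20 = poisson_pmf(20, lam)
print(f"P(X = 20) = {prob_20:.4f}")

# The PMF is discrete: summing over enough integer counts should give about 1.
total = sum(poisson_pmf(k, lam) for k in range(101))
print(f"sum over k = 0..100: {total:.6f}")
```

Note that n and p never appear in this calculation.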

ML example:

(Under construction)

Exponential Distribution

Definition

The exponential distribution is the probability distribution of the time between events in a Poisson process. Its PDF is λ * e^(−λt) for t ≥ 0.

It is a continuous distribution, unlike the Poisson distribution, which is discrete.

Intuition

It predicts the amount of waiting time until the next event occurs.

  • Example: the time until the OS fails again.

What does X~Exp(0.25) mean? 0.25 events?

  • 0.25 is not a time duration. It is an event rate.

  • X~Exp(lambda), where lambda is the rate parameter of the Poisson process.

    Example: lambda = 17 claps/week is a rate per unit time of 1 week.

In terms of time per event, the mean waiting time is 1/lambda.

  • lambda itself is the rate (decay) parameter; its reciprocal 1/lambda is the mean time between events.

  • 17 claps/week --> (1/17) week per clap.

A rate of 0.25 means 0.25 events per time unit (e.g. per hour), i.e. 4 time units (e.g. hours) on average until the next event occurs.
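The relationship between the rate and the mean waiting time can be checked by simulation. This is a minimal sketch using `random.expovariate` from Python's standard library; the seed and the sample size are arbitrary choices made for reproducibility.

```python
import random

random.seed(0)                       # arbitrary seed, for reproducibility only

rate = 0.25                          # events per time unit (e.g. per hour)
samples = [random.expovariate(rate) for _ in range(100_000)]

mean_wait = sum(samples) / len(samples)
print(f"rate = {rate}, theoretical mean wait = {1 / rate}")
print(f"simulated mean wait = {mean_wait:.3f}")
```

The simulated mean should land close to 1/rate = 4 time units, not 0.25.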

Understanding λ * e^(−λt)

We want to find the time between events in a Poisson process. Waiting until the next event means that no event has happened during the waiting period.

  • Poisson P(X=0) = e^(−λ) for one unit of time

To model the probability that nothing happens during a time duration t, not just during one unit of time, multiply the per-unit probabilities:

P(Nothing happens during t time units)
= P(X=0 in the first time unit) 
  * P(X=0 in the second time unit) 
  * … * P (X=0 in the t-th time unit) 
= e^−λ * e^−λ * … * e^−λ = e^(-λt)

The Poisson distribution assumes that events occur independently of one another.

Therefore, we can calculate the probability of zero success during t units of time by multiplying P(X=0 in a single unit of time) t times.

P(T > t) = P(X=0 in t time units) = e^−λt
* T : the random variable of our interest!
      the random variable for the waiting time until the first event
* X : the # of events in the future which follows the Poisson dist.
* P(T > t) : The probability that the waiting time until the first event is greater than t time units
* P(X = 0 in t time units) : The probability of zero successes in t time units
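The multiplication argument can be checked numerically for a whole number of time units. A small sketch, with the rate 0.25 and t = 6 chosen only for illustration:

```python
from math import exp, isclose

lam = 0.25    # rate per unit time (illustrative value)
t = 6         # whole number of time units (illustrative value)

p_zero_one_unit = exp(-lam)          # Poisson P(X = 0) in a single time unit
product = p_zero_one_unit ** t       # multiply across t independent units
closed_form = exp(-lam * t)          # P(T > t) = e^(-lambda * t)

print(f"product of per-unit probabilities: {product:.6f}")
print(f"closed form e^(-lambda*t):         {closed_form:.6f}")
```

Both expressions agree, which is why P(T > t) can be written directly as e^(−λt).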

A PDF is the derivative of the CDF. Since we already have the CDF of the exponential distribution, 1 - P(T > t) = 1 - e^(−λt), we can get its PDF by differentiating it, which gives λ * e^(−λt).
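Written out, the differentiation step looks as follows (a short sketch, with F denoting the CDF and f the PDF of the waiting time T):

$$
F(t) = P(T \le t) = 1 - P(T > t) = 1 - e^{-\lambda t},
\qquad
f(t) = \frac{d}{dt} F(t) = \lambda e^{-\lambda t}, \quad t \ge 0
$$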

ML example:

(Under construction)

Sum of Exponential Random Variables

For a simple and clear explanation, read the Medium article "Sum of Exponential Random Variables" by @AerinKim.

X1 and X2 are independent exponential random variables with the rate λ.

X1~EXP(λ) X2~EXP(λ)

Let Y=X1+X2.

Question: What is the PDF of Y? Where do we use the distribution of Y?

The Erlang distribution answers the question:

“How long do I have to wait before I see n success events occur?”

The answer is a sum of n independent exponentially distributed random variables, which follows an Erlang(n, λ) distribution. The Erlang distribution is a special case of the Gamma distribution: in a Gamma distribution, n can be a non-integer.

  • Erlang (2, λ) distribution
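For n = 2, the PDF and CDF of Y = X1 + X2 can be obtained by convolving the two exponential densities. The following is a sketch of that standard calculation, not the original figure from these notes:

$$
\begin{aligned}
f_Y(y) &= \int_0^y \lambda e^{-\lambda x} \, \lambda e^{-\lambda (y-x)} \, dx
= \lambda^2 e^{-\lambda y} \int_0^y dx
= \lambda^2 y \, e^{-\lambda y}, \quad y \ge 0 \\
F_Y(y) &= \int_0^y \lambda^2 u \, e^{-\lambda u} \, du
= 1 - e^{-\lambda y} (1 + \lambda y)
\end{aligned}
$$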

Example

[Queuing Theory] You went to Chipotle and joined a line with two people ahead of you. One is being served and the other is waiting. Their service times S1 and S2 are independent, exponential random variables with a mean of 2 minutes. (Thus the mean service rate is 0.5/minute.)

Your conditional time in the queue is T = S1 + S2, given the system state N = 2. T is Erlang distributed.

What is the probability that you wait more than 5 minutes in the queue?

Let’s plug λ = 0.5 and t = 5 into the Erlang(2, λ) CDF: P(T > 5) = 1 - F(5) = e^(−λt) * (1 + λt) = e^(−2.5) * 3.5 ≈ 0.287.

A less-than-30% chance that I’ll wait for more than 5 minutes at Chipotle sounds good to me.
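The queue example can be reproduced with a few lines of code. This sketch uses the fact that the waiting time for the n-th event exceeds t exactly when fewer than n Poisson events occur by time t; the function name `erlang_sf` is an assumption made for illustration.

```python
from math import exp, factorial


def erlang_sf(t: float, n: int, lam: float) -> float:
    """P(T > t) for T ~ Erlang(n, lam).

    Equivalent to the probability that a Poisson(lam * t) count is below n.
    """
    return sum(exp(-lam * t) * (lam * t) ** i / factorial(i) for i in range(n))


lam = 0.5     # service rate per minute (mean service time of 2 minutes)
n = 2         # two people ahead in the line
t = 5         # minutes

p_wait = erlang_sf(t, n, lam)
print(f"P(T > {t} min) = {p_wait:.4f}")
```

With n = 1 the same function reduces to the exponential survival probability e^(−λt).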

Probability density function (PDF) of the exponential distribution

In the Poisson process with rate λ, X1+X2 would represent the time at which the 2nd event happens (the sum of the times x1 and x2).

Examples are from Medium @AerinKim:

  • Poisson Distribution Intuition (and derivation), Medium

  • Sum of Exponential Random Variables, Medium