Base

Alex_Rose · 2018. 3. 5. 12:58

Types of Machine Learning 

 

- Supervised learning - learns from labeled data
- Unsupervised learning - learns from unlabeled data
- Reinforcement learning - the objective is known, but not how to achieve it

Thumbtack Question

- H : the outcome that the thumbtack lands head up
- T : the outcome that the thumbtack lands tail up

 

 

The binomial distribution (repeated Bernoulli experiments) is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, where each success occurs with probability theta, θ.

 

Flips are i.i.d. (independent and identically distributed random variables)

- Independent events
- Identically distributed according to the binomial distribution

 

 

P(H) = θ,  P(T) = 1-θ

P(HHTHT) = θ · θ · (1-θ) · θ · (1-θ) = θ^3 (1-θ)^2
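As a quick check, here is a minimal Python sketch (not from the lecture; the function name is mine) that multiplies out the per-flip probabilities:

def sequence_likelihood(seq, theta):
    # Probability of one specific H/T sequence under i.i.d. flips
    p = 1.0
    for flip in seq:
        p *= theta if flip == "H" else (1 - theta)
    return p

print(sequence_likelihood("HHTHT", 0.6))  # 0.6**3 * 0.4**2 = 0.03456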

The binomial probability mass function is

P(K = k | n, θ) = (n choose k) θ^k (1-θ)^(n-k)

n and p (written θ here) are given as parameters, and the value is calculated by varying k.
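For instance, a small sketch using SciPy (assumed available; binom.pmf is the standard SciPy call) that fixes n and θ and sweeps k:

from scipy.stats import binom

n, theta = 5, 0.6            # parameters are fixed
for k in range(n + 1):       # the PMF is evaluated by varying k
    print(k, binom.pmf(k, n, theta))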


Maximum Likelihood Estimation

 

 

Data : we have observed a sequence of flips D containing a_H heads and a_T tails

Our hypothesis : the thumbtack flipping results follow the binomial distribution with parameter θ

How do we make our hypothesis stronger? By finding the best candidate for θ. What is the condition that makes θ most plausible?

One candidate is the Maximum Likelihood Estimation (MLE) of θ: choose the θ that maximizes the probability of the observed data,

P(D | θ) = θ^a_H (1-θ)^a_T

θ̂ = argmax_θ P(D | θ)

MLE Calculation

Maximizing P(D | θ) directly is awkward, so use the log function (the log is monotone, so the maximizer does not change):

ln P(D | θ) = a_H ln θ + a_T ln (1-θ)

Then, since this is a maximization problem, take the derivative and set it to zero:

d/dθ [ a_H ln θ + a_T ln (1-θ) ] = a_H / θ - a_T / (1-θ) = 0

Solving for θ gives

θ̂ = a_H / (a_H + a_T)

(Figure of the logarithm function omitted; source: Wikipedia, "Logarithm".)
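A small sketch to sanity-check the closed form against direct numerical maximization (SciPy assumed; the variable names are mine):

import numpy as np
from scipy.optimize import minimize_scalar

a_H, a_T = 3, 2  # counts from the HHTHT example above

# Closed form from setting the derivative to zero
theta_mle = a_H / (a_H + a_T)

# Numerical check: minimize the negative log-likelihood
neg_log_lik = lambda t: -(a_H * np.log(t) + a_T * np.log(1 - t))
res = minimize_scalar(neg_log_lik, bounds=(1e-9, 1 - 1e-9), method="bounded")

print(theta_mle, res.x)  # both are ~0.6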


Simple Error Bound

 

 

Let's say theta star (θ*) is the true parameter of the thumbtack flipping. For any error ε > 0,

we have a simple upper bound on the probability of a large error, provided by Hoeffding's inequality:

P( |θ̂ - θ*| ≥ ε ) ≤ 2 e^(-2 N ε^2)

Can you calculate the required number of trials, N? For example, to obtain ε = 0.1 ("approximately") while failing in at most 0.01% of cases ("probably"): this is the Probably Approximately Correct (PAC) framing.
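Rearranging the bound, 2 e^(-2 N ε^2) ≤ δ gives N ≥ ln(2/δ) / (2 ε^2). A sketch of the arithmetic for ε = 0.1 and δ = 0.0001 (0.01%):

import math

eps, delta = 0.1, 0.0001                  # "approximately", "probably"
N = math.log(2 / delta) / (2 * eps**2)    # N >= ln(2/delta) / (2*eps^2)
print(math.ceil(N))                       # about 496 trials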


This is as far as the approximation goes from the MLE viewpoint.

 


Incorporating Prior Knowledge

 

 

 

More formally, from the Bayesian viewpoint:

P(θ | D) = P(D | θ) P(θ) / P(D)

that is, posterior ∝ likelihood × prior. P(D | θ) is the binomial likelihood from before, so we still need a prior P(θ).

Why not use the Beta distribution as the prior?

(Figure of Beta distribution densities for various (α, β) omitted; source: Other References 1.)
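A minimal sketch of the conjugate update with SciPy (the hyperparameter values α = β = 2 are assumed for illustration):

from scipy.stats import beta

alpha, beta_hp = 2, 2          # assumed Beta prior hyperparameters
a_H, a_T = 3, 2                # observed counts

prior = beta(alpha, beta_hp)
posterior = beta(alpha + a_H, beta_hp + a_T)   # the Beta prior is conjugate

print(prior.mean(), posterior.mean())  # 0.5 -> ~0.556 after seeing the data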

Maximum a Posteriori Estimation

With a Beta(α, β) prior, the posterior is

P(θ | D) ∝ θ^(a_H + α - 1) (1-θ)^(a_T + β - 1)

which is again a Beta distribution, Beta(a_H + α, a_T + β). Maximizing the posterior (the same log-and-derivative calculation as for MLE) gives

θ̂ = (a_H + α - 1) / (a_H + α + a_T + β - 2)

The hyperparameters α and β inject prior knowledge; as a_H and a_T grow, their influence fades and MAP approaches MLE.
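A quick sketch comparing the two point estimates on the same data (α = β = 2 again assumed for illustration):

a_H, a_T = 3, 2
alpha, beta_hp = 2, 2   # assumed prior hyperparameters

theta_mle = a_H / (a_H + a_T)
theta_map = (a_H + alpha - 1) / (a_H + alpha + a_T + beta_hp - 2)

print(theta_mle)  # 0.6
print(theta_map)  # 4/7 ~ 0.571, pulled toward the prior mean 0.5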

Probability

A probability P assigns a number to each event E, subject to the axioms:

- P(E) ≥ 0 for every event E
- P(Ω) = 1 for the whole sample space Ω
- For mutually exclusive events E_1, E_2, ... : P(E_1 ∪ E_2 ∪ ...) = Σ P(E_i)

Conditional Probability

The conditional probability of A given B:

P(A | B) = P(A ∩ B) / P(B)

Nice to see that we can switch the condition and the target event (Bayes' theorem):

P(A | B) = P(B | A) P(A) / P(B)

Nice to see that we can recover the target event by summing the conditional probabilities weighted by the priors (law of total probability), for a partition {B_n} of the sample space:

P(A) = Σ_n P(A | B_n) P(B_n)
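These identities are easy to check numerically; a sketch with a made-up joint table over two binary events:

# Made-up joint probabilities for binary events A and B
P = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}  # P[(a, b)]

P_B1 = P[(0, 1)] + P[(1, 1)]          # P(B=1)
P_A1 = P[(1, 0)] + P[(1, 1)]          # P(A=1)

# Conditional probability and Bayes' theorem (switching condition and target)
P_A1_given_B1 = P[(1, 1)] / P_B1
P_B1_given_A1 = P[(1, 1)] / P_A1
assert abs(P_A1_given_B1 - P_B1_given_A1 * P_A1 / P_B1) < 1e-12

# Law of total probability: recover P(A=1) from conditionals and priors
P_B0 = 1 - P_B1
recovered = (P[(1, 0)] / P_B0) * P_B0 + (P[(1, 1)] / P_B1) * P_B1
assert abs(recovered - P_A1) < 1e-12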


Probability Distribution

 

- It assigns a probability to each subset of the potential outcomes (events) of a random trial, experiment, survey, etc.

A probability distribution is a function mapping an event to a probability :: because we call the value a probability, it must keep the characteristics (axioms) of probability above.


Normal Distribution

- Continuous numerical value on (-∞, ∞)

f(x; μ, σ) = 1 / (σ √(2π)) · e^(-(x-μ)^2 / (2σ^2))

with mean μ and standard deviation σ.
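A sketch checking the closed-form density against SciPy's norm.pdf (SciPy assumed; the example point is arbitrary):

import numpy as np
from scipy.stats import norm

mu, sigma, x = 0.0, 1.0, 0.5
closed_form = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
print(closed_form, norm.pdf(x, loc=mu, scale=sigma))  # identical values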

Beta Distribution

Supported on a closed interval

- Continuous numerical value
- [0, 1]
- Very nice characteristic :: it matches the characteristics of a probability, which makes it a natural prior for θ

f(θ; α, β) = θ^(α-1) (1-θ)^(β-1) / B(α, β),  where B(α, β) = Γ(α) Γ(β) / Γ(α+β)
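A sketch confirming that the Beta density integrates to 1 over [0, 1] (α = 3, β = 2 chosen arbitrarily):

from scipy.integrate import quad
from scipy.stats import beta

total, _ = quad(lambda t: beta.pdf(t, 3, 2), 0, 1)
print(total)  # ~1.0, a valid density on the closed interval [0, 1]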


 

Binomial Distribution

The simplest distribution for discrete values

- Bernoulli trial: yes or no / 0 or 1 / selection, switch, ...

f(k; n, θ) = (n choose k) θ^k (1-θ)^(n-k)
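A sketch that simulates Bernoulli trials and compares the empirical head frequency with θ (numpy assumed; seed and sample size are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n, theta = 10_000, 0.6

flips = rng.random(n) < theta   # each entry is one yes/no Bernoulli trial
print(flips.mean())             # empirical frequency, close to theta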


 

Multinomial Distribution

The generalization of the binomial distribution

- Choose among A, B, C, ..., Z / word selection, cluster selection, ...

f(x_1, ..., x_k; n, p_1, ..., p_k) = n! / (x_1! ⋯ x_k!) · p_1^(x_1) ⋯ p_k^(x_k)
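A sketch evaluating the multinomial PMF with SciPy (a 3-category "word selection" example; the probabilities are assumed for illustration):

from scipy.stats import multinomial

n, p = 10, [0.5, 0.3, 0.2]               # assumed category probabilities
print(multinomial.pmf([5, 3, 2], n, p))  # P(x1=5, x2=3, x3=2)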


Basic References:

1) kooc.kaist.ac.kr
2) Christopher M. Bishop, Pattern Recognition and Machine Learning
3) http://norman3.github.io/prml/

Other References:

1) https://datascienceschool.net/view-notebook/70a372b9c14a4e8d9d49737f0b5a3c97/
