1. General Form of EM
   1. Concave functions
   2. Satisfy Jensen's inequality: f(E[t]) >= E[f(t)] for concave f
   3. Kullback-Leibler divergence: measures how different two probability distributions are (numerical sketch after this outline)
      1. KL(q || p) = E_q[log(q(x) / p(x))]
      2. how different the two densities are at each point of the x-axis, on the log scale, then take the expectation
      3. not symmetric
      4. equals 0 when a distribution is compared to itself
      5. always non-negative
   4. EM
      1. Build a lower bound on the log likelihood that depends on theta and on a distribution q, and maximize it
      2. Optimize the lower bound until convergence (a toy Gaussian-mixture example follows the outline)
         1. maximize the lower bound with respect to q (E step)
         2. fix q and maximize the lower bound with respect to theta (M step)
      3. E step
         1. The gap between the log likelihood and the lower bound equals the sum over data points of the KL divergences between q(t_i) and the posterior p(t_i | x_i, theta), so the bound is tight when q is set to the posterior
      4. M step
         1. maximize E_q[log p(X, T | theta)] + const with respect to theta
      5. Convergence is guaranteed, to at least a local maximum or a saddle point
         1. the EM algorithm never makes things worse
         2. on each iteration EM does not decrease the objective
   5. Example for a discrete mixture
   6. Summary
      1. A method for training latent variable models: often more efficient than general-purpose optimization algorithms such as stochastic gradient descent
      2. Handles missing data
      3. Replaces a hard problem (maximizing the marginal log likelihood) with a sequence of simpler tasks (maximizing the lower bound)
      4. Guaranteed to converge (but only to a local maximum or saddle point)
      5. Helps with complicated parameter constraints
      6. Has a number of extensions
2. Applications of EM
   1. E step: set q to the posterior distribution of the latent variable t
   2. M step: maximize the expectation of the joint log likelihood with respect to theta while keeping q fixed
   3. K-means
      1. Randomly initialize the parameters theta
      2. Repeat until convergence: assign each point to its closest centroid, then update the centroids
      3. K-means is a special case of EM (see the sketch after this outline)
         1. with q restricted to delta functions (hard 0/1 assignments)
         2. with fixed spherical covariance matrices (identical circular clusters of a fixed width)
      4. K-means is faster but less flexible than a GMM
   4. Probabilistic PCA (PPCA; an EM sketch follows the outline)
      1. Dimensionality reduction
         1. when two variables are strongly correlated
         2. project the two dimensions onto one dimension while keeping as much information as possible
      2. PCA tries to find the best linear transformation that projects the 2-D data onto 1-D
      3. In the presence of missing values, use probabilistic PCA
      4. The required integral is intractable
      5. The full exact EM algorithm cannot optimize an intractable objective, but it can be approximated to build an approximate EM algorithm
      6. Advantages
         1. handles missing data
         2. straightforward iterative scheme, even for large dimensionalities
         3. can build a mixture of PPCA models
         4. allows hyperparameter tuning
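
Numerical sketch of the KL-divergence properties listed above. The discrete distributions p and q and the helper kl() are my own illustration, not part of the notes.

import numpy as np

def kl(q, p):
    # KL(q || p) for discrete distributions given as probability vectors
    q, p = np.asarray(q, float), np.asarray(p, float)
    return np.sum(q * np.log(q / p))   # E_q[log q(x) - log p(x)]

q = np.array([0.5, 0.3, 0.2])
p = np.array([0.2, 0.5, 0.3])

print(kl(q, p), kl(p, q))   # generally different values: KL is not symmetric
print(kl(q, q))             # 0.0: a distribution compared to itself
print(kl(q, p) >= 0)        # True: KL divergence is always non-negative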
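Toy sketch of the E/M alternation for a two-component 1-D Gaussian mixture, as a concrete instance of the general scheme above. The synthetic data, the initialization, and the variable names are assumptions made for illustration: the E step sets q to the posterior over the latent component, and the M step re-estimates theta = (pi, mu, sigma2) in closed form with q fixed.

import numpy as np

rng = np.random.default_rng(0)
# synthetic 1-D data drawn from two Gaussians (assumed toy example)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 200)])

# initial parameters theta = (mixing weights pi, means mu, variances sigma2)
pi, mu, sigma2 = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

def log_gauss(x, mu, sigma2):
    return -0.5 * (np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)

for _ in range(50):
    # E step: q(t_i = c) = posterior responsibility of component c for point i
    log_r = np.log(pi) + log_gauss(x[:, None], mu, sigma2)   # shape (N, 2)
    log_r -= log_r.max(axis=1, keepdims=True)                # numerical stability
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # M step: maximize E_q[log p(X, T | theta)] in closed form, q fixed
    n_c = r.sum(axis=0)
    pi = n_c / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n_c
    sigma2 = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_c

print(pi, mu, sigma2)   # should approach the parameters of the generating mixture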
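Sketch of K-means written to mirror the two EM steps: a hard (delta-function) assignment plays the role of the E step, and the centroid update plays the role of the M step, with the covariances implicitly fixed and spherical. The toy data and K = 3 are placeholders of my own.

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) + rng.choice([-4.0, 0.0, 4.0], size=(500, 1))  # toy 2-D data
K = 3

centroids = X[rng.choice(len(X), K, replace=False)]   # random initialization of theta
for _ in range(100):
    # "E step": hard assignment -- q is a delta function on the closest centroid
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)   # (N, K) distances
    labels = d.argmin(axis=1)
    # "M step": move each centroid to the mean of the points assigned to it
    new_centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                              else centroids[k] for k in range(K)])
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(centroids)   # one centroid per cluster of the toy data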
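Minimal sketch of EM for probabilistic PCA using the standard closed-form E/M updates (posterior moments of the latent code, then updates of the loading matrix W and noise variance sigma2). The toy correlated 2-D data, the latent dimensionality q = 1, and all variable names are my assumptions; this covers the fully observed case, where exact EM is available, while missing values would require extending the E step as the notes suggest.

import numpy as np

rng = np.random.default_rng(2)
# toy 2-D data in which the two variables are strongly correlated
X = rng.normal(size=(500, 1)) @ np.array([[2.0, 1.5]]) + 0.1 * rng.normal(size=(500, 2))

N, D = X.shape
q = 1                                  # latent dimensionality (project 2-D onto 1-D)
mu = X.mean(axis=0)
Xc = X - mu                            # centered data
W = rng.normal(size=(D, q))            # loading matrix, initialized randomly
sigma2 = 1.0                           # noise variance

for _ in range(100):
    # E step: posterior moments of the latent codes under the current W, sigma2
    M = W.T @ W + sigma2 * np.eye(q)
    Minv = np.linalg.inv(M)
    Ez = Xc @ W @ Minv                              # (N, q): E[z_n]
    Ezz = N * sigma2 * Minv + Ez.T @ Ez             # (q, q): sum_n E[z_n z_n^T]

    # M step: closed-form maximization of the expected joint log likelihood
    W = Xc.T @ Ez @ np.linalg.inv(Ezz)
    sigma2 = (np.sum(Xc ** 2)
              - 2.0 * np.sum(Xc * (Ez @ W.T))
              + np.trace(Ezz @ W.T @ W)) / (N * D)

print(W.ravel(), sigma2)   # W should align with the direction of largest variance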