- 1. A latent variable is a hidden variable: it is observed in neither the training phase nor the testing phase
- 2. Why use a probabilistic model?
    - 1. Quantifies the uncertainty of predictions
    - 2. Handles missing values
    - 3. Introducing latent variables may simplify the model (fewer edges in the graph)
        - 1. Fewer parameters
        - 2. Latent variables are sometimes meaningful
        - 3. They can be harder to work with
- 4. Probabilistic clustering
    - 1. Hard clustering: each point gets a single cluster index, cluster idx = f(x)
    - 2. Soft clustering: a probability p(cluster idx | x) instead of a single index (a small sketch follows this list)
    - 3. Useful for hyperparameter tuning, e.g. choosing the number of clusters by likelihood
    - 4. Gives a generative model of the data
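A minimal sketch of hard vs. soft assignment for a hypothetical two-cluster 1-D example; the weights, means, and query point below are made-up illustration values, not from the notes:

```python
import numpy as np
from scipy.stats import norm

weights = np.array([0.6, 0.4])   # p(t = k), mixture weights (assumed values)
means = np.array([0.0, 4.0])     # cluster means (assumed values)
stds = np.array([1.0, 1.0])      # cluster standard deviations

x = 1.5                          # a query point

# p(x | t = k) for each cluster k
likelihoods = norm.pdf(x, loc=means, scale=stds)

# Soft clustering: posterior p(t = k | x) via Bayes' rule
posterior = weights * likelihoods
posterior /= posterior.sum()

# Hard clustering: collapse to a single index, cluster idx = f(x)
hard_label = np.argmax(posterior)

print(posterior)   # soft assignment, e.g. [0.98, 0.02]
print(hard_label)  # hard assignment, e.g. 0
```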
- 5. Gaussian Mixture Model (GMM)
    - 1. A weighted sum of several Gaussian distributions (a small sketch follows this list)
    - 2. Training a GMM
        - 1. Maximum likelihood estimation (MLE)
        - 2. Hard to fit with a stochastic optimizer
            - 1. Hard to enforce the constraints (weights sum to 1, covariances positive definite)
            - 2. The expectation-maximization (EM) algorithm is much faster and more efficient
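A minimal sketch of the GMM density p(x) = sum_k w_k * N(x | mu_k, Sigma_k) for 2-D data; the parameters below are made-up illustration values. Note the constraints mentioned above: the weights must sum to 1 and each covariance must be positive definite.

```python
import numpy as np
from scipy.stats import multivariate_normal

weights = np.array([0.5, 0.3, 0.2])              # must sum to 1
means = [np.array([0.0, 0.0]),
         np.array([3.0, 3.0]),
         np.array([-2.0, 2.0])]
covs = [np.eye(2), 0.5 * np.eye(2), np.eye(2)]   # must be positive definite

def gmm_pdf(x):
    """Weighted sum of Gaussian densities: the GMM likelihood p(x)."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

print(gmm_pdf(np.array([1.0, 1.0])))
```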
- 6. Training a GMM
    - 1. Latent variable t: the prior p(t = k) is the mixture weight w_k
    - 2. The conditional p(x | t = k) is the k-th Gaussian N(x | mu_k, Sigma_k)
    - 3. EM algorithm (a minimal sketch follows this list)
        - 1. Start with randomly placed Gaussians (e.g. two) with parameters theta
        - 2. Until convergence, repeat updating the Gaussian parameters
    - 4. Finding the global optimum is NP-hard
    - 5. EM is a heuristic: it won't give the global optimum and can get stuck in local optima
    - 6. So choose the best run among several training attempts with different random initializations
    - 7. Pick the run with the highest training log-likelihood, or with the highest validation log-likelihood
    - 8. Summary: EM trains a GMM faster than SGD and handles the complicated constraints naturally, but suffers from local maxima
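A minimal NumPy sketch of EM for a 1-D GMM with random restarts, keeping the run with the highest training log-likelihood as the notes suggest; the function name, data, and all parameter values here are hypothetical illustration choices:

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, k, n_iter=100, rng=None):
    """Fit a k-component 1-D GMM by EM; return (log-likelihood, parameters)."""
    rng = np.random.default_rng(rng)
    # Random initialization of weights, means, and standard deviations
    w = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    sigma = np.full(k, x.std())
    for _ in range(n_iter):
        # E-step: responsibilities r[n, j] = p(t = j | x_n)
        dens = w * norm.pdf(x[:, None], loc=mu, scale=sigma)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form updates that keep the constraints satisfied
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    loglik = np.log(
        (w * norm.pdf(x[:, None], loc=mu, scale=sigma)).sum(axis=1)).sum()
    return loglik, (w, mu, sigma)

# EM only finds a local optimum, so run it several times with different
# random initializations and keep the best run by training log-likelihood.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])
best = max((em_gmm_1d(x, k=2, rng=seed) for seed in range(5)),
           key=lambda run: run[0])
print(best[0])  # highest training log-likelihood among the restarts
```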