Performs Bayesian mixture modeling.
The idea is that we use a probabilistic mixture of a number of models that we use to explain some observed data. The idea here is that each observed data point is assumed to have come from one of the models in the mixture, but we don't know which. The way we deal with that is to use a so-called latent parameter which specifies which model each data point came from.
In addition, since this is a Bayesian clustering algorithm, we don't want to actually commit to any single explanation, but rather to sample from the distribution of models and latent assignments of data points to models given the observed data and the prior distributions of model parameters.
This sampling process is initialized by taking models at random from the prior distribution for models.
Then, we iteratively assign points to the different models using the mixture probabilities and the degree of fit between the point and each model expressed as a probability that the point was generated by that model.
After points are assigned, new parameters for each model are sampled from the posterior distribution for the model parameters considering all of the observed data points that were assigned to the model. Models without any data points are also sampled, but since they have no points assigned, the new samples are effectively taken from the prior distribution for model parameters.
The result is a number of samples that represent mixing probabilities, models and assignment of points to models. If the total number of possible models is substantially larger than the number that ever have points assigned to them, then this algorithm provides a (nearly) non-parametric clustering algorithm.
These samples can give us interesting information that is lacking from a normal clustering that consists of a single assignment of points to clusters. Firstly, by examining the number of models in each sample that actually has any points assigned to it, we can get information about how many models (clusters) that the data support.
Morevoer, by examining how often two points are assigned to the same model, we can get an approximate measure of how likely these points are to be explained by the same model. Such soft membership information is difficult to come by with conventional clustering methods.
Finally, we can get an idea of the stability of how the data can be described. Typically, aspects of the data with lots of data available wind up with stable descriptions while at the edges, there are aspects that are phenomena that we can't really commit to a solid description, but it is still clear that the well supported explanations are insufficient to explain these additional aspects.
One thing that can be difficult about these samples is that we can't always assign a correlation between the models in the different samples. Probably the best way to do this is to look for overlap in the assignments of data observations to the different models.
\theta_i ~ prior() \lambda_i ~ Dirichlet(\alpha_0) z_j ~ Multinomial( \lambda ) x_j ~ model(\theta_i)