Monday, May 11, 2009

Meeting 11 - Topic Models - Latent Dirichlet Allocation

One of the hot topics in the data mining world is how to discover the semantic meaning of a text document. This means moving beyond treating a document as a bag of words and instead discovering its underlying topics and meaning.
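To make the "bag of words" idea concrete, here is a minimal sketch in Python. The function name `bag_of_words` and the two example sentences are made up for illustration, and the whitespace tokenization is a simplifying assumption. It shows that once a document is reduced to word counts, word order is lost.

```python
from collections import Counter

def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each token."""
    return Counter(text.lower().split())

# Two hypothetical documents with the same words in a different order.
doc_a = "the dog chased the cat"
doc_b = "the cat chased the dog"

# The counts are identical, so a bag-of-words representation
# cannot tell these two sentences apart.
print(bag_of_words(doc_a) == bag_of_words(doc_b))
```

Topic models aim to recover structure on top of counts like these, rather than relying on the raw counts alone.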

One of the landmark papers in this area is Latent Dirichlet Allocation by Blei, Ng, and Jordan, which introduces topic models for text classification.

2 comments:

  1. Actually, LDA did use a bag-of-words assumption to justify their hierarchical model, based on the exchangeability of word order within the documents of a corpus. However, the model is also able to infer latent topics.

  2. What is the difference from LSA (Latent Semantic Analysis)?
