One of the hot topics in the data mining world is how to discover the semantic meaning of a text document. Semantic meaning is related to moving beyond treating documents as a bag of words and instead discovering the underlying topics and meaning of the document.
One of the landmark papers is in this area is Latent Dirichelet Allocation by Jordan, Blei, and Ng, which introduces topic models for text classification.
Subscribe to:
Post Comments (Atom)
Actually LDA did use a Bag of Words assumption to justify there hierarchical model based on the exchangeability of word orders in documents in a corpus. However the model is also able to infer latent topics.
ReplyDeleteWhat is difference from LSA(Latent Semantics Analysis)?
ReplyDelete