ACM Transactions on Information Systems (TOIS), Volume 28 Issue 1, January 2010

Probabilistic static pruning of inverted files
Roi Blanco, Alvaro Barreiro
Article No.: 1
DOI: 10.1145/1658377.1658378

Information retrieval (IR) systems typically compress their indexes in order to increase their efficiency. Static pruning is a form of lossy data compression: it removes from the index data that is estimated to be the least important to retrieval...

Statistical lattice-based spoken document retrieval
Tee Kiah Chia, Khe Chai Sim, Haizhou Li, Hwee Tou Ng
Article No.: 2
DOI: 10.1145/1658377.1658379

Recent research efforts on spoken document retrieval have tried to overcome the low quality of 1-best automatic speech recognition transcripts, especially in the case of conversational speech, by using statistics derived from speech lattices...

Semantic clustering of XML documents
Andrea Tagarelli, Sergio Greco
Article No.: 3
DOI: 10.1145/1658377.1658380

Dealing with structure and content semantics underlying semistructured documents is challenging for any task of document management and knowledge discovery conceived for such data. In this work we address the novel problem of clustering...

Learning author-topic models from text corpora
Michal Rosen-Zvi, Chaitanya Chemudugunta, Thomas Griffiths, Padhraic Smyth, Mark Steyvers
Article No.: 4
DOI: 10.1145/1658377.1658381

We propose an unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage stochastic process. An author is represented by a...