ACM Transactions on Information Systems (TOIS), Volume 22 Issue 2, April 2004

A study of smoothing methods for language models applied to information retrieval
Chengxiang Zhai, John Lafferty
Pages: 179-214
DOI: 10.1145/984321.984322
Language modeling approaches to information retrieval are attractive and promising because they connect the problem of retrieval with that of language model estimation, which has been studied extensively in other application areas such as speech...

Multidocument summarization: An added value to clustering in interactive retrieval
Manuel J. Maña-López, Manuel De Buenaga, José M. Gómez-Hidalgo
Pages: 215-241
DOI: 10.1145/984321.984323
A more and more generalized problem in effective information access is the presence in the same corpus of multiple documents that contain similar information. Generally, users may be interested in locating, for a topic addressed by a group of similar...

Anchor text mining for translation of Web queries: A transitive translation approach
Wen-Hsiang Lu, Lee-Feng Chien, Hsi-Jian Lee
Pages: 242-269
DOI: 10.1145/984321.984324
To discover translation knowledge in diverse data resources on the Web, this article proposes an effective approach to finding translation equivalents of query terms and constructing multilingual lexicons through the mining of Web anchor texts and...

Streams, structures, spaces, scenarios, societies (5s): A formal model for digital libraries
Marcos André Gonçalves, Edward A. Fox, Layne T. Watson, Neill A. Kipp
Pages: 270-312
DOI: 10.1145/984321.984325
Digital libraries (DLs) are complex information systems and therefore demand formal foundations lest development efforts diverge and interoperability suffers. In this article, we propose the fundamental abstractions of Streams, Structures, Spaces,...

XIRQL: An XML query language based on information retrieval concepts
Norbert Fuhr, Kai Großjohann
Pages: 313-356
DOI: 10.1145/984321.984326
XIRQL ("circle") is an XML query language that incorporates imprecision and vagueness for both structural and content-oriented query conditions. The corresponding uncertainty is handled by a consistent probabilistic model. The core features of XIRQL...