ACM Transactions on Information Systems (TOIS), Volume 20 Issue 2, April 2002

Peer-to-peer data trading to preserve information
Brian F. Cooper, Hector Garcia-Molina
Pages: 133-170
DOI: 10.1145/506309.506310
Data archiving systems rely on replication to preserve information. This paper discusses how a network of autonomous archiving sites can trade data to achieve the most reliable replication. A series of binary trades among sites produces a...

Collection statistics for fast duplicate document detection
Abdur Chowdhury, Ophir Frieder, David Grossman, Mary Catherine McCabe
Pages: 171-191
DOI: 10.1145/506309.506311
We present a new algorithm for duplicate document detection that uses collection statistics. We compare our approach with the state-of-the-art approach using multiple collections. These collections include a 30 MB collection of 18,577 web documents ...

Burst tries: a fast, efficient data structure for string keys
Steffen Heinz, Justin Zobel, Hugh E. Williams
Pages: 192-223
DOI: 10.1145/506309.506312
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases, a record is held for each distinct word in the text, containing the word itself and information...

Theory of keyblock-based image retrieval
Lei Zhu, Aibing Rao, Aidong Zhang
Pages: 224-257
DOI: 10.1145/506309.506313
The success of text-based retrieval motivates us to investigate analogous techniques which can support the querying and browsing of image data. However, images differ significantly from text both syntactically and semantically in their mode of...