Peer-to-peer data trading to preserve information
Brian F. Cooper, Hector Garcia-Molina
Data archiving systems rely on replication to preserve information. This paper discusses how a network of autonomous archiving sites can trade data to achieve the most reliable replication. A series of binary trades among sites produces a...
Collection statistics for fast duplicate document detection
Abdur Chowdhury, Ophir Frieder, David Grossman, Mary Catherine McCabe
We present a new algorithm for duplicate document detection that uses collection statistics. We compare our approach with the state-of-the-art approach using multiple collections. These collections include a 30 MB 18,577 web document collection ...
Burst tries: a fast, efficient data structure for string keys
Steffen Heinz, Justin Zobel, Hugh E. Williams
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each distinct word in the text, containing the word itself and information...
Theory of keyblock-based image retrieval
Lei Zhu, Aibing Rao, Aidong Zhang
The success of text-based retrieval motivates us to investigate analogous techniques which can support the querying and browsing of image data. However, images differ significantly from text both syntactically and semantically in their mode of...