ACM Transactions on Information Systems (TOIS), Volume 36 Issue 1, June 2017

Comparing the Archival Rate of Arabic, English, Danish, and Korean Language Web Pages
Lulwah M. Alkwai, Michael L. Nelson, Michele C. Weigle
Article No.: 1
DOI: 10.1145/3041656

It has long been suspected that web archives and search engines favor Western and English language webpages. In this article, we quantitatively explore how well indexed and archived Arabic language webpages are as compared to those from other...

Clustered Elias-Fano Indexes
Giulio Ermanno Pibiri, Rossano Venturini
Article No.: 2
DOI: 10.1145/3052773

State-of-the-art encoders for inverted indexes compress each posting list individually. Encoding clusters of posting lists offers the possibility of reducing the redundancy of the lists while maintaining a noticeable query processing...

GVoS: A General System for Near-Duplicate Video-Related Applications on Storm
Jiawei Jiang, Yunhai Tong, Hua Lu, Bin Cui, Kai Lei, Lele Yu
Article No.: 3
DOI: 10.1145/3041657

The exponential increase of online videos greatly enriches the life of users but also brings huge numbers of near-duplicate videos (NDVs) that seriously challenge the video websites. The video websites entail NDV-related applications such as...

Unifying Virtual and Physical Worlds: Learning Toward Local and Global Consistency
Xiang Wang, Liqiang Nie, Xuemeng Song, Dongxiang Zhang, Tat-Seng Chua
Article No.: 4
DOI: 10.1145/3052774

Event-based social networking services, such as Meetup, are capable of linking online virtual interactions to offline physical activities. Compared to mono online social networking services (e.g., Twitter and Google+), such dual networks provide a...

IDF for Word N-grams
Masumi Shirakawa, Takahiro Hara, Shojiro Nishio
Article No.: 5
DOI: 10.1145/3052775

Inverse Document Frequency (IDF) is a widely accepted term weighting scheme whose robustness is supported by many theoretical justifications. However, applying IDF to word N-grams (or simply N-grams) of any length without relying on heuristics has...