ACM Transactions on Information Systems (TOIS)

Latest Articles

Binary Sketches for Secondary Filtering

This article addresses the problem of matching the most similar data objects to a given query object. We adopt a generic model of similarity that involves the domain of objects and metric distance functions only. We examine the case of a large dataset in a complex data space, which makes this problem inherently difficult. Many indexing and...

Efficient Learning-Based Recommendation Algorithms for Top-N Tasks and Top-N Workers in Large-Scale Crowdsourcing Systems

The task and worker recommendation problems in crowdsourcing systems have brought up unique...

Learning to Adaptively Rank Document Retrieval System Configurations

Modern Information Retrieval (IR) systems have become more and more complex, involving a large number of parameters. For example, a system may choose...

Brotli: A General-Purpose Data Compressor

Brotli is an open-source, general-purpose data compressor introduced by Google in late 2013 and now adopted by most major browsers and Web servers. It is publicly available on GitHub, and its data format was submitted as RFC 7932 in July 2016. Brotli is based on the Lempel-Ziv compression scheme and...
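The abstract breaks off before it describes Brotli's internals. The sketch below illustrates the Lempel-Ziv (LZ77) idea it names in generic form: repeated substrings are replaced by (distance, length) back-references into text that has already been seen. It is a naive, quadratic-time parser written only for illustration. It does not implement Brotli's bitstream, entropy coding, or static dictionary as specified in RFC 7932, and the `window` and `min_len` parameters are arbitrary illustrative choices.

```python
from typing import List, Tuple, Union

# A token is either a literal character or a (distance, length) back-reference.
Token = Union[str, Tuple[int, int]]


def lz77_parse(data: str, window: int = 4096, min_len: int = 3) -> List[Token]:
    """Greedy LZ77 parse: at each position, take the longest match in the window."""
    tokens: List[Token] = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Matches may run past position i (overlapping copies are allowed).
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lz77_decode(tokens: List[Token]) -> str:
    """Rebuild the text; copies go one symbol at a time so overlaps work."""
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            out.append(tok)
        else:
            dist, length = tok
            start = len(out) - dist
            for n in range(length):
                out.append(out[start + n])
    return "".join(out)
```

For example, `lz77_parse("abcabcabcabcX")` yields three literals, one back-reference `(3, 9)`, and a final literal. Because the distance (3) is shorter than the length (9), the match overlaps its own output, and the decoder handles this by copying symbol by symbol.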

Product-Based Neural Networks for User Response Prediction over Multi-Field Categorical Data

User response prediction is a crucial component for personalized information retrieval and filtering scenarios, such as recommender systems and web...

From Question to Text: Question-Oriented Feature Attention for Answer Selection

Understanding unstructured texts is an essential skill for human beings, as it enables knowledge acquisition. Although understanding unstructured texts is easy for well-educated humans, it is a great challenge for machines. Recently, with the rapid development of artificial intelligence techniques, researchers have made efforts to teach...

A Deep Bayesian Tensor-Based System for Video Recommendation

With the availability of abundant online multi-relational video information, recommender systems that can effectively exploit these sorts of data and...

Shallow and Deep Syntactic/Semantic Structures for Passage Reranking in Question-Answering Systems

In this article, we extensively study the use of syntactic and semantic structures obtained with...

Seed-Guided Topic Model for Document Filtering and Classification

One important necessity is to filter out the irrelevant information and organize the relevant information into meaningful categories. However,...

Jointly Minimizing the Expected Costs of Review for Responsiveness and Privilege in E-Discovery

Discovery is an important aspect of the civil litigation process in the United States of America, in...

Adversarial Distillation for Efficient Recommendation with External Knowledge

Integrating external knowledge into the recommendation system has attracted increasing attention in both industry and academic communities. Recent...

SMAPH: A Piggyback Approach for Entity-Linking in Web Queries

We study the problem of linking the terms of a web-search query to a semantic representation given by the set of entities (a.k.a. concepts) mentioned in it. We introduce SMAPH, a system that performs this task using the information coming from a web search engine, an approach we call “piggybacking.” We employ search engines to...


Call for Special Issue Proposals

ACM Transactions on Information Systems (TOIS) invites proposals for a special issue of the journal devoted to any topic in information retrieval.

New options for ACM authors to manage rights and permissions for their work


ACM introduces a new publishing license agreement, an updated copyright transfer agreement, and a new author-pays option, which allows for perpetual open access through the ACM Digital Library. For more information, visit the ACM Author Rights webpage.


About TOIS

ACM Transactions on Information Systems (TOIS) is a scholarly journal that publishes previously unpublished high-quality scholarly articles in all areas of information retrieval. TOIS is published quarterly.


Forthcoming Articles

Collective Entity Linking: A Multi-View Aspect

With the large number of name mentions appearing on the web, entity linking has become essential for many information processing applications. To further improve linking accuracy, the relations between entities are usually considered in the linking process. Methods of this kind are called collective entity linking, and they can obtain high-quality results. Two kinds of information help reveal the relations between entities: contextual information and structural information. Most traditional collective entity linking methods consider them separately. In fact, these two kinds of information represent entities from different and specific views. Moreover, if we look into each view closely, it can be separated into sub-views that are more meaningful. For this reason, this paper proposes a multi-view collective entity linking algorithm, which combines several views of entities into an objective function for entity linking. The importance of each view can be weighted, and the linking results are obtained by solving this objective function. Experimental results demonstrate that our linking algorithm achieves higher accuracy than many state-of-the-art entity linking methods. In addition, since we simplify the structure of entities and recast entity linking as a sub-matrix search problem, our algorithm is also highly efficient.

A Context-Aware User-Item Representation Learning for Item Recommendation

Both reviews and user-item interactions have been widely adopted for user rating prediction. However, existing techniques mainly extract the latent representations for users and items in an independent and static manner. In this paper, we propose a novel context-aware user-item representation learning model for rating prediction, named CARL. CARL derives a joint representation for a given user-item pair based on their individual latent features and latent feature interactions. Then, CARL adopts Factorization Machines to further model higher-order feature interactions on the basis of the user-item pair for rating prediction. Specifically, two separate learning components are devised in CARL to exploit review data and interaction data, respectively: review-based feature learning and interaction-based feature learning. In the review-based learning component, convolution operations and an attention mechanism extract the relevant features for a user-item pair by jointly considering their corresponding reviews. However, these features are only review-driven and may not be comprehensive. Hence, the interaction-based learning component further extracts complementary features from interaction data alone, also on the basis of user-item pairs. The final rating score is then derived with a dynamic linear fusion mechanism. Experiments on seven real-world datasets show that CARL achieves significantly better rating prediction accuracy than existing state-of-the-art alternatives.
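The CARL abstract says that Factorization Machines (FMs) model higher-order feature interactions on top of the user-item representation. For readers new to FMs, the sketch below computes a standard second-order FM prediction. It uses the well-known reformulation of the pairwise term, which runs in O(kn) rather than O(kn²), and includes a naive reference version for comparison. It shows generic FMs only, not CARL's architecture, and the toy weights in the usage example are made up.

```python
from typing import Sequence


def fm_predict(w0: float, w: Sequence[float], V: Sequence[Sequence[float]],
               x: Sequence[float]) -> float:
    """Second-order Factorization Machine prediction in O(k*n).

    w0: global bias; w[i]: linear weight of feature i;
    V[i]: k-dimensional latent factor of feature i; x: feature values.
    """
    n = len(x)
    k = len(V[0]) if n else 0
    linear = sum(w[i] * x[i] for i in range(n))
    # Identity: sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(w0: float, w: Sequence[float], V: Sequence[Sequence[float]],
                     x: Sequence[float]) -> float:
    """Reference O(k*n^2) version that loops over every feature pair i < j."""
    n = len(x)
    total = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total
```

For example, `fm_predict(0.1, [0.2, 0.3, -0.1], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [1.0, 0.0, 1.0])` returns approximately 0.7. The parts are the bias 0.1, a linear term of 0.1, and 0.5 from the single active feature pair.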

Risk-Sensitive Learning to Rank with Evolutionary Multi-Objective Feature Selection

Learning to Rank (L2R) is one of the main research lines in Information Retrieval. With ever-increasing data and more complex machine learning algorithms, the effort needed to process all L2R sub-tasks (e.g., on-the-fly feature generation, re-ranking) has increased tremendously. In this context, feature selection (FS) is a way to remove noisy and redundant features, which improves both processing time and ranking effectiveness. Despite that, for years research has focused mostly on effectiveness and feature reduction as the main objectives. However, removing certain features may harm the effectiveness of the learned model for some specific but important queries. Accordingly, in this work we propose to evaluate FS for L2R with an additional objective in mind, namely risk-sensitiveness. We introduce novel single- and multi-objective criteria to optimize feature reduction, effectiveness, and risk-sensitiveness at the same time. We also introduce a new methodology to explore the search space, suggesting effective and efficient extensions of a well-known Evolutionary Algorithm (SPEA2) for FS applied to L2R. Our experiments show that, with many of the proposed objective criteria, we can select feature subsets that are more effective and more risk-sensitive than those selected by state-of-the-art FS methods. We also provide a thorough analysis of our methodology and experimental results.

Top-N Recommendation with Multi-Channel Positive Feedback using Factorization Machines

User interactions can be considered to constitute different feedback channels, e.g., view, click, like, or follow, that provide implicit information on users' preferences. Each implicit feedback channel typically carries a unary, positive-only signal, which collaborative filtering models can exploit to generate lists of personalized recommendations. This paper investigates how a learning-to-rank recommender system can best take advantage of implicit feedback signals from multiple channels. We focus on Factorization Machines (FM) with Bayesian Personalized Ranking (BPR), a pairwise learning-to-rank method, which allows us to experiment with different forms of exploitation. We perform extensive experiments on three datasets with multiple types of feedback to arrive at a series of insights. We compare conventional, direct integration of feedback types with our proposed method, which exploits multiple feedback channels during the sampling process of training. We refer to our method as multi-channel sampling. Our results show that multi-channel feedback sampling outperforms conventional integration, and that sampling with the “relative level” of feedback is always superior to a level-blind sampling approach. We evaluated our method experimentally on three datasets in different domains and found that, with our multi-channel sampler, the accuracy and the item coverage of recommendations can be improved significantly compared to state-of-the-art models.
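The abstract does not spell out the sampler, so the following is only a rough illustration of level-aware BPR sampling under stated assumptions. Each user's feedback is stored as a map from item to its strongest channel. The channels are ranked follow > like > click > view; this ordering is hypothetical, as are the names `CHANNEL_LEVEL`, `level_of`, and `sample_pair`. A training pair (i, j) is drawn so that item i has a strictly higher level than item j, and unobserved items count as the lowest level. The loss is the standard BPR objective, the negative log of the sigmoid of the score difference.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical strength ordering of the channels named in the abstract;
# the paper's actual levels are not given there.
CHANNEL_LEVEL = {"view": 1, "click": 2, "like": 3, "follow": 4}
UNOBSERVED = 0  # items with no feedback rank below every channel


def level_of(user_fb: Dict[str, str], item: str) -> int:
    """Feedback level of an item for one user (user_fb maps item -> strongest channel)."""
    channel = user_fb.get(item)
    return CHANNEL_LEVEL[channel] if channel is not None else UNOBSERVED


def sample_pair(user_fb: Dict[str, str], all_items: List[str],
                rng: random.Random) -> Tuple[str, str]:
    """Draw (i, j) with level(i) > level(j); j may be an observed, lower-level item."""
    valid = []
    for i in user_fb:
        li = level_of(user_fb, i)
        lower = [j for j in all_items if level_of(user_fb, j) < li]
        if lower:
            valid.append((i, lower))
    if not valid:
        raise ValueError("user has no (higher, lower) item pair to sample")
    i, lower = rng.choice(valid)
    return i, rng.choice(lower)


def bpr_loss(score_i: float, score_j: float) -> float:
    """Numerically stable BPR loss: -ln sigmoid(score_i - score_j)."""
    d = score_i - score_j
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))
```

In its simplest form, a level-blind sampler would draw j only from unobserved items. It would then ignore the difference between, say, a click and a like, and that difference is the signal this kind of sampler tries to keep.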

ELSA: A Multilingual Document Summarization Algorithm Based on Frequent Itemsets and Latent Semantic Analysis

Sentence-based summarization aims at extracting concise summaries of collections of textual documents. The most effective multilingual strategies rely either on Latent Semantic Analysis (LSA) or on frequent itemset mining. LSA-based summarizers pick the document sentences that cover the most important concepts. Concepts are modeled as combinations of single document terms and are derived from a term-by-sentence matrix by exploiting the Singular Value Decomposition (SVD). Itemset-based summarizers pick the sentences that contain the largest number of frequent itemsets, which represent combinations of frequently co-occurring terms. The main drawbacks of existing approaches are (i) the inability of LSA to consider the correlation between combinations of multiple document terms and the underlying concepts, and (ii) the inability of itemset-based summarizers to correlate itemsets with the underlying document concepts. To overcome the issues of both of the aforesaid approaches, we propose a new summarization approach. It exploits frequent itemsets to describe all the latent concepts covered by the documents under analysis, and it uses LSA to reduce the potentially redundant set of itemsets to a compact set of uncorrelated concepts. The summarizer selects the sentences that cover the latent concepts with minimal redundancy. We tested the summarization algorithm on both multilingual and English-language benchmark document collections.
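ELSA itself combines itemsets with LSA, which requires an SVD routine. The sketch below covers only the simpler itemset-based baseline that the abstract describes, which ranks sentences by how many frequent itemsets they contain. It is not ELSA. Frequent itemsets are mined by brute force up to size two, and the toy sentences and `min_support` value are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(sentences: List[Set[str]], min_support: int,
                      max_size: int = 2) -> Set[FrozenSet[str]]:
    """Brute-force mining: each sentence is a transaction (a set of terms).
    Returns itemsets of 1..max_size terms contained in >= min_support sentences."""
    counts: Dict[FrozenSet[str], int] = {}
    for terms in sentences:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(terms), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {itemset for itemset, c in counts.items() if c >= min_support}


def itemset_summary(sentences: List[Set[str]], min_support: int, k: int) -> List[int]:
    """Indices of the k sentences that contain the most frequent itemsets
    (ties broken by original position)."""
    freq = frequent_itemsets(sentences, min_support)
    scores = [sum(1 for itemset in freq if itemset <= terms) for terms in sentences]
    ranked = sorted(range(len(sentences)), key=lambda idx: (-scores[idx], idx))
    return ranked[:k]
```

Nothing in this baseline prevents two near-duplicate sentences from both being selected. Avoiding that kind of redundancy is one of the goals the abstract sets for ELSA.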

Spatiotemporal Representation Learning for Translation-based POI Recommendation

The increasing proliferation of location-based social networks brings about a huge volume of user check-in data, which facilitates the recommendation of points of interest (POIs). Time and location are the two most important contextual factors in a user's decision about which POI to visit. In this paper, we focus on spatiotemporal context-aware POI recommendation, which considers the joint effect of time and location. Inspired by recent advances in knowledge graph embedding, we propose a spatiotemporal context-aware and translation-based recommender framework (STA) to model the third-order relationship among users, POIs, and spatiotemporal contexts for large-scale POI recommendation. Specifically, we embed both users and POIs into a “transition space” where spatiotemporal contexts (i.e., ⟨time, location⟩ pairs) are modeled as translation vectors operating on users and POIs. We further develop a series of strategies that exploit various kinds of correlation information to address the data sparsity and cold-start issues for new spatiotemporal contexts, new users, and new POIs. We conduct extensive experiments on two real-world datasets. The experimental results demonstrate that our STA framework achieves superior performance in terms of recommendation accuracy, robustness to data sparsity, and effectiveness in handling the cold-start problem.
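To make the translation idea concrete, here is a TransE-style sketch. A user embedding plus a context translation vector should land near the embedding of a POI that the user visits in that context. Several choices here are assumptions for illustration, since the abstract does not specify STA's exact formulation: the scoring function (negative Euclidean distance), building a context vector as the sum of a time vector and a location vector, and all toy embeddings.

```python
import math
from typing import Dict, List, Sequence


def translation_score(user: Sequence[float], context: Sequence[float],
                      poi: Sequence[float]) -> float:
    """TransE-style plausibility: 0 when user + context == poi, lower as they drift apart."""
    return -math.sqrt(sum((u + c - p) ** 2 for u, c, p in zip(user, context, poi)))


def context_vector(time_vec: Sequence[float], loc_vec: Sequence[float]) -> List[float]:
    """Hypothetical composition of a (time, location) translation as a vector sum."""
    return [t + loc for t, loc in zip(time_vec, loc_vec)]


def recommend(user: Sequence[float], context: Sequence[float],
              pois: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Top-k POIs ranked by translation score under the given context."""
    ranked = sorted(pois, key=lambda name: translation_score(user, context, pois[name]),
                    reverse=True)
    return ranked[:k]
```

For example, take a user at `[0.0, 0.0]` and a context vector `[1.0, 1.0]`. A POI embedded at `[1.0, 1.0]` then scores 0, which is the best possible value. The same user and POI embeddings can score differently under another context, and this context dependence is what a translation-based model is meant to capture.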

Handling Massive N-Gram Datasets Efficiently

This paper deals with the two fundamental problems concerning the handling of large n-gram language models. The first is indexing, that is, compressing the n-gram strings and associated satellite data without compromising their retrieval speed. The second is estimation, that is, computing the probability distribution of the strings from a large textual source. Performing these two tasks efficiently is fundamental for several applications in the fields of Information Retrieval, Natural Language Processing, and Machine Learning. Regarding the problem of indexing, we describe compressed, exact, and lossless data structures that achieve, at the same time, high space reductions and no time degradation with respect to state-of-the-art solutions. In particular, we present a compressed trie data structure in which each word following a context of fixed length k, i.e., its preceding k words, is encoded as an integer whose value is proportional to the number of words that follow such context. Despite the significant savings in space, our technique introduces a negligible penalty at query time. Regarding the problem of estimation, we present a novel algorithm for estimating modified Kneser-Ney language models. The state-of-the-art algorithm uses three sorting steps in external memory; we show an improved construction that requires only one sorting step by exploiting the properties of the extracted n-gram strings.
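The following is a simplified illustration of the context-based encoding that the abstract describes. For every context of k preceding words, the distinct successor words are listed, and a word is encoded as its position in its context's list. Its code is therefore always smaller than the number of words that follow that context. The frequency-based ordering and the helper names (`build_remap`, `encode`, `decode`) are assumptions. The structure the abstract describes stores such codes inside a compressed trie, which this sketch omits.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Context = Tuple[str, ...]


def build_remap(tokens: List[str], k: int) -> Dict[Context, List[str]]:
    """Map each length-k context to its distinct successors,
    most frequent first (ties broken alphabetically)."""
    followers: Dict[Context, Counter] = defaultdict(Counter)
    for pos in range(k, len(tokens)):
        followers[tuple(tokens[pos - k:pos])][tokens[pos]] += 1
    return {ctx: sorted(cnt, key=lambda w: (-cnt[w], w))
            for ctx, cnt in followers.items()}


def encode(remap: Dict[Context, List[str]], context: Context, word: str) -> int:
    """Context-local code: the word's position among the context's successors,
    hence always smaller than the number of words that follow the context."""
    return remap[context].index(word)  # linear scan, fine for a sketch


def decode(remap: Dict[Context, List[str]], context: Context, code: int) -> str:
    """Inverse of encode."""
    return remap[context][code]
```

Global word IDs grow with the vocabulary size. Context-local codes stay small, bounded by the number of successors of each context, and small integers compress well. Ordering successors by frequency is just one way to give common words the smallest codes; the abstract itself only ties the code's value to the number of successors.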

Using Gaussian Processes for Rumour Stance Classification in Social Media

Social media tend to be rife with rumours while new reports are released piecemeal during breaking news. Interestingly, one can mine multiple reactions expressed by social media users in those situations, exploring their stance towards rumours, ultimately enabling the flagging of highly disputed rumours as being potentially false. In this work, we set out to develop an automated, supervised classifier that uses multi-task learning to classify the stance expressed in each individual tweet in a rumourous conversation as either supporting, denying or questioning the rumour. Using a classifier based on Gaussian Processes, and exploring its effectiveness on two datasets with very different characteristics and varying distributions of stances, we show that our approach consistently outperforms competitive baseline classifiers. Our classifier is especially effective in estimating the distribution of different types of stance associated with a given rumour, which we set forth as a desired characteristic for a rumour-tracking system that will warn both ordinary users of Twitter and professional news practitioners when a rumour is being rebutted.
