News recommendation has become an essential way to help readers discover interesting stories. While a growing line of research has focused on modeling reading preferences for news recommendation, it neglects the instability of reader consumption behaviors, i.e., a reader's decision when selecting articles may be influenced by factors other than personal interests. This degrades the recommendation effectiveness of existing methods, especially under the extreme sparsity of the news domain. In this paper, to adapt to this instability, we propose a latent probabilistic generative model, BoRe, that mimics reader consumption behaviors according to three factors: personal interests, reading sequences, and the crowd's interests. Further, the extreme sparsity problem in the news domain severely hinders accurate modeling of reading preferences and the use of reading sequences, which limits BoRe's ability to adapt to the instability. Accordingly, we leverage domain-specific features to model reading preferences under extreme sparsity. Meanwhile, we consider groups of users instead of individuals to capture reading sequences. In addition, we study how to reduce model training time to allow online application. Extensive experiments have been conducted to evaluate the effectiveness and efficiency of BoRe on real-world datasets. The experimental results show the superiority of BoRe over state-of-the-art competing methods.
Product search is one of the most popular methods for customers to discover products online, and product search engines are usually optimized for user transactions so that the profits of e-commerce companies can be maximized. Most existing studies on product search focus on developing effective retrieval models that rank items by their likelihood of being purchased. They, however, ignore the gap between how systems and customers perceive the relevance of items. Without explanations, users may not understand why product search engines retrieve certain items for them, which consequently leads to imperfect user experience and suboptimal system performance in practice. In this work, we tackle this problem by constructing explainable retrieval models for product search. Specifically, we propose a Dynamic Relation Embedding Model that creates a session-dependent knowledge graph based on multi-relational product data and search context. Ranking is conducted based on the relationship between users and items in the latent space, and explanations are generated with logic inferences and entity soft matching on the knowledge graph. Empirical experiments on e-commerce benchmark datasets show that our model significantly outperforms the state-of-the-art product search models and has the ability to produce reasonable explanations for search results.
The paper-reviewer recommendation problem in academia usually refers to recommending experts to comment on the quality of papers. Recommending reviewers for papers effectively and accurately is a meaningful yet still difficult task. Generally, the representation of a paper and of a reviewer is very important for paper-reviewer recommendation. Moreover, a reviewer or a paper often belongs to multiple research fields, which slightly increases the difficulty of paper-reviewer recommendation. In this paper, we propose a Multi-Label Classification method using a HIErarchical and transPArent Representation, named Hiepar-MLC. First, we introduce a HIErarchical and transPArent Representation (Hiepar) to express the semantic information of the reviewer and the paper. Hiepar is learned from a two-level bidirectional gated recurrent unit based network that applies the attention mechanism. It is capable of capturing the two-level hierarchical information (word-sentence-document) and highlighting the elements in reviewers or papers. Further, we transform the paper-reviewer recommendation problem into a Multi-Label Classification (MLC) problem, in which the multiple research labels guide the learning process. The approach is flexible in that any multi-label classification method can be selected to solve the paper-reviewer recommendation problem. Our experiments on a real dataset consisting of papers in the ACM Digital Library show the effectiveness and feasibility of our method.
"Unbiasedness", an important property that ensures users' ratings indeed reflect their true evaluations of products, is vital both in shaping consumer purchase decisions and in providing reliable recommendations in online rating systems. Recent experimental studies showed that distortions from historical ratings would ruin the unbiasedness of subsequent ratings. Our main objectives are to "discover" historical distortions in each single rating (i.e., at the micro-level) and to perform the "debiasing operations". Using 42 million real customer ratings, we first show that users either "assimilate" or "contrast" to historical ratings under different scenarios, which can be further explained by a well-known psychological argument: the "Assimilate-Contrast" theory. This motivates us to propose the Historical Influence Aware Latent Factor Model (HIALF), the "first" model for real rating systems to capture and mitigate historical distortions in each single rating. HIALF allows us to study the influence patterns of historical ratings from a modelling perspective, and it perfectly matches the assimilation and contrast effects observed in experiments. Moreover, HIALF achieves significant improvements in predicting subsequent ratings and in characterizing relationships in ratings. It also contributes to better recommendations, wiser consumer purchase decisions, and a deeper understanding of historical distortions in both honest rating and misbehaving rating settings.
A series of click models have been proposed to extract accurate and unbiased relevance feedback from valuable yet noisy click-through data in search logs. Previous work has shown that users' search behaviors in mobile and desktop scenarios differ in many aspects; therefore, click models designed for desktop search may not be effective in the mobile context. To address this problem, we propose two novel click models for mobile search: 1) the Mobile Click Model (MCM), which models click necessity bias and examination satisfaction bias; 2) the Viewport Time Click Model (VTCM), which further extends MCM by utilizing the viewport time. Extensive experiments on large-scale real mobile search logs show that: 1) MCM and VTCM outperform existing models in predicting users' clicks and estimating result relevance; 2) MCM and VTCM can extract richer information, such as the click necessity of search results and the probability of user satisfaction, from mobile click logs; 3) by modeling the viewport time distributions of heterogeneous results, VTCM brings a significant improvement over MCM in click prediction and relevance estimation tasks. Our proposed click models can help better understand user behavior patterns in mobile search and improve the ranking performance of mobile search engines.
Recommender systems aim to capture user preferences and provide accurate recommendations to users accordingly. A collection of users may have similar preferences and thus form a community. Since such communities may not be explicitly given, they are formally defined and named Implicit Preference Communities (IPCs) in this paper. By enriching user preferences with the information of other users in the communities, the performance of recommender systems can also be enhanced. In this paper, we propose a recommendation model that learns Implicit Preference Communities from user ratings and social connections. To tackle the unsupervised learning limitation of IPC modeling, we propose a semi-supervised Bayesian probabilistic graphical model to capture the IPC structure for recommendation. Meanwhile, following the spirit of transfer learning, both rating behaviors and social connections are introduced into the model by parameter sharing. Moreover, Gibbs sampling based algorithms are proposed for parameter inference of the models. To meet the needs of online scenarios in which data arrives as a stream, a novel online sampling based parameter inference algorithm is also proposed. Extensive experiments on seven real-world datasets have been conducted, with comparisons against fourteen state-of-the-art recommendation algorithms. Statistically significant improvements verify the effectiveness of the proposed IPC-aware recommendation models.
Low-rank matrix approximation (LRMA) has attracted increasing attention in the recommendation community. Even though LRMA-based recommendation methods obtain promising results, they suffer from the complicated structure of the large-scale and sparse rating matrix. The main reason is that they have to predefine important parameters, such as the rank of the rating matrix and the number of submatrices. Moreover, most existing Local LRMA methods are designed in a two-phase separated framework and do not consider the missing mechanisms of the rating matrix. In this paper, a non-parametric unified Bayesian graphical model is proposed for Adaptive Local low-rank Matrix Approximation (ALoMA). ALoMA has the ability to simultaneously identify rating submatrices, determine the optimal rank for each submatrix, learn the submatrix-specific user/item latent factors, and estimate the importance of latent features with missing mechanisms, so that these four parts are seamlessly integrated and enhance each other. We theoretically analyze the model's generalization error bounds and give an approximation guarantee. Furthermore, an efficient Gibbs sampling-based algorithm is designed to infer the proposed model. A series of experiments have been conducted on six real-world datasets. The results demonstrate that ALoMA outperforms the state-of-the-art LRMA-based methods and can provide interpretable recommendation results.
Rank fusion is a powerful technique that allows multiple sources of information to be combined into a single result set. However, to date fusion has not been regarded as being cost-effective in cases where strict per-query efficiency guarantees are required, such as in web search. In this work we propose a novel solution to rank fusion by splitting the computation into two parts: one phase that is carried out offline to generate pre-computed centroid answers for queries with broadly similar information needs, and then a second online phase that uses the corresponding topic centroid to compute a result page for each query. We explore efficiency improvements to classic fusion algorithms whose costs can be amortized as a pre-processing step, and can then be combined with re-ranking approaches to dramatically improve effectiveness in multi-stage retrieval systems with little efficiency overhead at query time. Experimental results using the ClueWeb12B collection and the UQV100 query variations demonstrate that centroid-based approaches allow improved retrieval effectiveness at little or no loss in query throughput or latency, and with reasonable pre-processing requirements. We additionally show that queries that do not match any of the pre-computed clusters can be accurately identified and efficiently processed in our proposed ranking pipeline.
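To make the notion of a classic fusion algorithm concrete, the following minimal sketch fuses several ranked lists with reciprocal rank fusion (RRF). The abstract does not state which fusion algorithms the authors use, so the choice of RRF, the smoothing constant k = 60, the function name, and the toy document identifiers are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids into one list (illustrative RRF sketch).

    Each document receives 1 / (k + rank) from every list it appears in;
    documents are returned by descending fused score, ties broken by id.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Three hypothetical result lists, e.g. from variations of one query.
fused = reciprocal_rank_fusion([
    ["d1", "d2", "d3"],
    ["d2", "d3", "d1"],
    ["d2", "d1", "d4"],
])
print(fused)
```

In a two-phase setting such as the one described, a fusion step of this kind could plausibly be run offline over a cluster of similar queries, leaving only a lookup and re-ranking at query time; that mapping is an interpretation of the abstract, not a description of the authors' implementation.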
Next and next-new Point-of-Interest (POI) recommendation are essential instruments for promoting customer experiences and business operations related to locations. However, due to the sparsity of check-in records, they remain insufficiently studied. In this paper, we propose to utilize personalized latent behavior patterns learned from contextual features, e.g., time of day, day of week, and location category, to improve the effectiveness of the recommendations. Two model variants are developed: GPDM, which learns a fixed pattern distribution for all users, and PPDM, which learns a personalized pattern distribution for each user. In both models, a soft-max function is applied to integrate the personalized Markov chain with the latent patterns, and a sequential Bayesian Personalized Ranking (S-BPR) is applied as the optimization criterion. Expectation Maximization (EM) is then used to find the optimized model parameters. Extensive experiments on three large-scale, commonly adopted real-world LBSN datasets show that the inclusion of location category and latent patterns helps to boost the performance of POI recommendations. Specifically, our models in general significantly outperform other state-of-the-art methods for both next and next-new POI recommendation tasks. Moreover, our models are capable of making accurate recommendations regardless of whether the duration or distance is short or long.
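For readers unfamiliar with Bayesian Personalized Ranking, the sketch below shows the plain pairwise BPR loss for a single (visited, unvisited) item pair. It is not the sequential S-BPR criterion of the abstract, whose exact form is not given here; the function name and the scalar scores are assumptions for illustration only.

```python
import math


def bpr_loss(score_pos, score_neg):
    """Plain BPR pairwise loss: -log(sigmoid(score_pos - score_neg)).

    score_pos is the model score of an observed (e.g. visited) item and
    score_neg the score of an unobserved one. The expression is evaluated
    as softplus(-diff) in a numerically stable form.
    """
    diff = score_pos - score_neg
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# The loss shrinks as the observed item is scored further above the other one.
print(bpr_loss(2.0, 0.0), bpr_loss(0.0, 0.0), bpr_loss(0.0, 2.0))
```

Minimizing this quantity over many sampled pairs pushes observed items above unobserved ones in the ranking, which is the general idea a sequential variant would build on.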
Question answering over knowledge bases (KB-QA) aims to take full advantage of the knowledge in knowledge bases with the ultimate purpose of returning answers to questions. To access the substantial knowledge within the KB, many model architectures are hindered by the bottleneck of accurately predicting the relations that connect subject entities to object entities. To break this bottleneck, this paper presents a novel framework that can be viewed as an extension of APVA-TURBO. Experimental results show that the framework improves on the performance of the APVA-TURBO approach and outperforms other question answering approaches.
Document retrieval methods that utilize relevance feedback often induce a single query model from the set of feedback documents, specifically, the relevant ones. We empirically show that for a few state-of-the-art query-model induction methods, retrieval performance can be significantly improved by constructing the query model from a subset of the relevant documents rather than from all of them. Motivated by this finding, we propose a new approach for relevance-feedback-based retrieval. The approach, derived from the risk minimization framework, is based on utilizing multiple query models induced from all subsets of the given relevant documents. Empirical evaluation shows that the approach is significantly more effective than the standard practice of utilizing a single query model induced from the relevant documents. The approach also substantially outperforms a variety of methods addressing various challenges in using relevance feedback.
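The idea of inducing one query model per subset of the relevant documents can be sketched as follows. This is only an illustration under strong assumptions: each query model is a maximum-likelihood unigram distribution, and the models are combined by uniform averaging, whereas the abstract states that the actual combination is derived from the risk minimization framework and does not detail it. All function names and the toy documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def unigram_model(docs):
    """Maximum-likelihood unigram distribution over the given documents."""
    counts = Counter(token for doc in docs for token in doc.split())
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def subset_query_models(relevant_docs):
    """Induce one query model from every non-empty subset of the documents."""
    models = []
    for size in range(1, len(relevant_docs) + 1):
        for subset in combinations(relevant_docs, size):
            models.append(unigram_model(subset))
    return models


def average_models(models):
    """Uniformly average term probabilities (an assumed combination rule)."""
    merged = Counter()
    for model in models:
        for term, prob in model.items():
            merged[term] += prob / len(models)
    return dict(merged)


docs = ["rank fusion web search", "rank learning", "web search latency"]
models = subset_query_models(docs)
combined = average_models(models)
print(len(models), sorted(combined, key=combined.get, reverse=True)[:3])
```

Note that the number of subsets grows as 2^n - 1 in the number of relevant documents, so exhaustive enumeration like this is only practical for small feedback sets.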
A substantial body of recent research has investigated the combined use of hand-curated knowledge sources and corpus-driven sources to learn effective text representations. The overall learning process can be run either by revising the learning objective online or by refining an originally learned representation offline. The differing impact of each of these learning approaches on the quality of the learned representations has not so far been studied in the literature. This article focuses on the design of comparable offline vs. online knowledge-enhanced document representation learning models and on the comparison of their effectiveness using a set of standard IR and NLP downstream tasks. The results of quantitative and qualitative analyses show that 1) offline and online learning approaches show dissimilar result trends depending on the task as well as on the dataset distribution with regard to the application domain; 2) while considering relational semantics is undoubtedly beneficial, the way relational constraints are expressed can affect semantic inference effectiveness. The findings of this work present opportunities for the design of future representation learning models, and also provide insights into the evaluation of such models.
In personalized POI (Point-Of-Interest, or venue) recommendation, the diversity of recommended POIs is an important aspect. Diversity is especially important when POIs are recommended in the target users' frequently visited areas, because users are likely to revisit such areas. In addition to (POI) category diversity, which is a popular diversification objective in recommendation domains, diversification of recommended POI locations is an interesting subject in itself. Despite its importance, existing POI recommender studies generally focus on and evaluate prediction accuracy. In this paper, we introduce geographical diversification (geo-diversification), a novel diversification concept that aims to increase recommendation coverage of a target user's geographic areas of interest, and from it we propose a method that improves geo-diversity as an addition to existing state-of-the-art POI recommenders. In experiments with datasets from two real Location Based Social Networks (LBSNs), we first analyze the performance of four state-of-the-art POI recommenders from various evaluation perspectives, including category diversity and geo-diversity, which have not been examined previously. The proposed method consistently improves geo-diversity (CPR(geo)@20) by 5 to 17% when combined with the four state-of-the-art POI recommenders with negligible prediction accuracy ([email protected]) loss, and provides 10 to 22% geo-diversity improvement with tolerable prediction accuracy loss (1 to 3%).