We investigate three research questions about the Kendall's tau and AP correlation between evaluation measures: (i) what is the effect of the number of systems and topics? (ii) what is the effect of removing low-performing systems? (iii) what is the effect of the experimental collections? We propose a methodology based on the General Linear Mixed Model and ANalysis Of VAriance (ANOVA) to isolate the effects of the number of topics, the number of systems, and the experimental collections, and to observe expected correlation values, net of these effects, which are stable and reliable. We learned that the effect of the number of topics is more prominent than that of the number of systems, and that 100 topics and 100 systems provide quite stable results, with the typical size of 50 topics and 100 systems already being reasonable. Although it produces different absolute values, removing low-performing systems does not seem to provide information substantially different from not removing them. Finally, we found that both document corpora and topic sets affect the correlation among evaluation measures, with the effect of the latter being more prominent. However, the number of topics seems to influence correlation more than the topic sets themselves.
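A minimal sketch of the kind of experiment underlying the first research question (not the authors' code; the score matrices and measure pair are illustrative placeholders): system rankings induced by two measures are compared with Kendall's tau over repeatedly sampled topic subsets of varying size, and the replicates form the cells a factorial ANOVA would then analyse.

```python
# Sketch: effect of the number of topics on the Kendall's tau
# correlation between two evaluation measures. Assumes two score
# matrices of shape (n_systems, n_topics), e.g. AP and nDCG per
# system/topic; here they are random placeholders.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(42)
n_systems, n_topics = 100, 100
ap = rng.random((n_systems, n_topics))    # placeholder AP scores
ndcg = rng.random((n_systems, n_topics))  # placeholder nDCG scores

def tau_on_topic_subset(m1, m2, topic_idx):
    """Kendall's tau between the system rankings induced by two
    measures, scoring each system by its mean over the topic subset."""
    r1 = m1[:, topic_idx].mean(axis=1)
    r2 = m2[:, topic_idx].mean(axis=1)
    tau, _ = kendalltau(r1, r2)
    return tau

# Factor under study: number of topics; replicates give ANOVA cells.
for t in (10, 25, 50, 100):
    taus = [tau_on_topic_subset(ap, ndcg,
                                rng.choice(n_topics, size=t, replace=False))
            for _ in range(30)]
    print(f"{t:3d} topics: tau = {np.mean(taus):.3f} +/- {np.std(taus):.3f}")
```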
We propose the AWARE stochastic framework, a novel methodology for dealing with multiple crowd assessors, who may be contradictory and/or noisy. By modeling relevance judgements and crowd assessors as sources of uncertainty, AWARE takes the expectation of a generic performance measure, such as AP, composed with these random variables. In this way, it approaches the problem of aggregating different crowd assessors from a new perspective: it directly combines the performance measures computed on the ground truths generated by the crowd assessors, instead of adopting some classification technique to merge the labels they produce. We propose several unsupervised estimators that instantiate the AWARE framework and experiment with them on TREC collections, comparing them against state-of-the-art approaches, i.e. Majority Vote (MV) and Expectation Maximization (EM). We found that the AWARE approaches improve the capability of correctly ranking systems and predicting their actual performance scores.
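To make the shift of perspective concrete, here is a hedged sketch of the AWARE idea (an illustration under our own assumptions, not the authors' implementation): AP is computed separately under each assessor's judgments, and the per-assessor measure values are then aggregated, here with a uniform average standing in for one possible unsupervised estimator.

```python
# Sketch: combine performance measures computed on each crowd
# assessor's ground truth, rather than merging the labels first.
import numpy as np

def average_precision(ranked_docs, qrels):
    """AP of a ranked list given binary relevance labels {doc: 0/1}."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranked_docs, start=1):
        if qrels.get(doc, 0) > 0:
            hits += 1
            precisions.append(hits / i)
    n_rel = sum(1 for v in qrels.values() if v > 0)
    return sum(precisions) / n_rel if n_rel else 0.0

run = ["d3", "d1", "d7", "d2", "d5"]  # a system's ranked list
assessors = [                          # contradictory/noisy judgments
    {"d1": 1, "d2": 0, "d3": 1, "d5": 0, "d7": 0},
    {"d1": 0, "d2": 1, "d3": 1, "d5": 1, "d7": 0},
    {"d1": 1, "d2": 1, "d3": 0, "d5": 0, "d7": 1},
]

aps = [average_precision(run, qrels) for qrels in assessors]
aware_score = np.mean(aps)  # expectation under a uniform assessor prior
print(f"per-assessor AP = {np.round(aps, 3)}, estimate = {aware_score:.3f}")
```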
Cold-start recommendation is one of the most challenging problems in recommender systems. An important approach to this problem is to conduct an interview with new users, called the interview-based approach. Among interview-based methods, Representative-Based Matrix Factorization (RBMF) provides an effective solution with appealing merits: it represents users over representative items, making the recommendations highly intuitive and interpretable. However, RBMF only uses a single global set of representative items to model all users. Such a representation is somewhat too restrictive and may not be flexible enough to capture users' varying interests. To address this problem, we propose a novel interview-based model that dynamically creates meaningful user groups using decision trees and selects local representative items for each group. A two-round interview is performed for each user: in the first round, l1 global questions are issued for group division; in the second round, l2 local, group-specific questions are given to derive a local representation. We collect feedback on the (l1+l2) items to learn user representations. By combining these steps, we develop a joint optimization model, named local representative-based matrix factorization, for new-user recommendation. Extensive experiments on three public datasets demonstrate the effectiveness of the proposed model compared with several competitive baselines.
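A toy sketch of the two-round interview flow (purely illustrative; the thresholded splits below stand in for the paper's learned decision tree, and the variance-based choice of local items is our own simplification): round one partitions users by their answers to l1 global items, and round two represents each user over l2 group-specific items.

```python
# Sketch: group users on global interview items, then represent each
# user over local representative items within the group.
import numpy as np

rng = np.random.default_rng(0)
R = rng.integers(1, 6, size=(200, 50)).astype(float)  # user-item ratings

global_items = [0, 1]  # l1 = 2 global interview items
# Round 1: group users by thresholded answers on the global items
# (a hand-rolled split standing in for a decision tree).
answers = (R[:, global_items] >= 3).astype(int)
group_id = answers @ (2 ** np.arange(len(global_items)))  # 4 groups

for g in np.unique(group_id):
    users = np.where(group_id == g)[0]
    Rg = R[users]
    # Round 2: pick l2 = 3 local representatives; highest-variance
    # items within the group serve as a simple stand-in.
    local_items = np.argsort(Rg.var(axis=0))[-3:]
    B = Rg[:, local_items]                 # answers to local questions
    # Represent users over local items: solve B @ W ~= Rg for W.
    W, *_ = np.linalg.lstsq(B, Rg, rcond=None)
    err = np.abs(B @ W - Rg).mean()
    print(f"group {g}: {len(users)} users, reconstruction MAE = {err:.2f}")
```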
This special issue focuses on research problems in search and mining and their applications on mobile devices. Topics of interest include, but are not limited to, mobile data mining and management, mobile search, personalization and recommendation, mobile user interfaces and human-computer interaction, and new applications in mobile environments. The aim of this special issue is to bring together top experts across multiple disciplines, including information retrieval, data mining, mobile computing, and cyber-physical systems, so that academic and industrial researchers can exchange ideas and share the latest developments on the state of the art and practice of mobile search and mobile data mining.
The interplay between the response latency of web search systems and users' search experience has only recently started to attract research attention, despite its important implications for the monetization of such systems. In this work, we carry out two complementary studies to investigate the impact of response latency on users' searching behaviour. We first conduct a controlled study of users' sensitivity to increasing delays in response latency and show that users of a fast search system are more sensitive to delays than users of a slow search system. Moreover, the study finds that users become very likely to notice response latency delays beyond a certain latency threshold. We then analyse a large number of search queries obtained from Yahoo Web Search and demonstrate a significant change in users' click behaviour as response latency increases. To demonstrate a possible use case for our findings, we devise a machine learning framework that leverages the latency impact, together with other features, to predict whether a user will click on any web search result. As a further extension, we investigate whether this framework can be exploited to help search engines reduce their energy consumption during query processing.
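A hedged sketch of the click-prediction use case (not the paper's framework; the feature set, data, and latency-to-click relationship are synthetic assumptions): a classifier takes response latency alongside other per-query features and predicts whether the query will receive any click.

```python
# Sketch: predict any-click vs. no-click from query features,
# including response latency; all data here is synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n = 5000
latency_ms = rng.gamma(shape=2.0, scale=150.0, size=n)  # response latency
query_len = rng.integers(1, 10, size=n)                 # words in query
hour = rng.integers(0, 24, size=n)                      # time of day
# Synthetic label: higher latency lowers the click probability.
p_click = 1 / (1 + np.exp((latency_ms - 400) / 200))
clicked = rng.random(n) < p_click

X = np.column_stack([latency_ms, query_len, hour])
X_tr, X_te, y_tr, y_te = train_test_split(X, clicked, random_state=7)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)
print(f"AUC = {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.3f}")
```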
Suggesting or recommending venues or points of interest (POIs) has been a hot topic in recent years, especially for tourism applications and mobile users. We propose and evaluate several suggestion methods from an effectiveness, feasibility, efficiency, and privacy perspective. The task is addressed by two content-based methods, namely a Weighted kNN classifier and a Rated Rocchio personalized query, by collaborative filtering methods, as well as by several methods for merging the results (rank-based or rating-based) of different systems. Effectiveness is evaluated on two standard benchmark datasets, provided and used by TREC's Contextual Suggestion Tracks in 2015 and 2016. First, we enrich these datasets with additional information on venues, collected from web services such as Foursquare and Yelp; we make this extra data available for future experimentation. Then, we find that the content-based methods provide state-of-the-art effectiveness, that the collaborative filtering variants mostly suffer from data sparsity problems on the current datasets, and that the merging methods further improve results, mainly by promoting the first relevant suggestion. Concerning mobile feasibility, efficiency, and user privacy, the content-based methods, especially Rated Rocchio, perform best, while collaborative filtering has the worst efficiency and leaks the most private information. Our findings can be very useful for developing effective operational systems that respect user privacy.
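A minimal sketch of a Rated Rocchio profile (our own reading of how such a method can work, not the paper's exact formulation; the vectors, ratings, and weights below are illustrative): venues the user rated are split into liked and disliked sets, and the profile query is a weighted difference of their centroids, against which candidate venues are scored.

```python
# Sketch: rating-weighted Rocchio profile for venue suggestion.
import numpy as np

def rated_rocchio(venue_vecs, ratings, alpha=1.0, beta=0.5, threshold=3):
    """Profile vector from rated venues: liked centroid minus a
    down-weighted disliked centroid."""
    ratings = np.asarray(ratings, dtype=float)
    liked = venue_vecs[ratings >= threshold]
    disliked = venue_vecs[ratings < threshold]
    profile = np.zeros(venue_vecs.shape[1])
    if len(liked):
        profile += alpha * liked.mean(axis=0)
    if len(disliked):
        profile -= beta * disliked.mean(axis=0)
    return profile

rng = np.random.default_rng(1)
rated = rng.random((8, 16))          # term vectors of 8 rated venues
ratings = [5, 4, 2, 1, 3, 5, 2, 4]   # user's 1-5 star ratings
candidates = rng.random((100, 16))   # unseen venues to suggest

profile = rated_rocchio(rated, ratings)
# Rank candidates by cosine similarity to the profile.
scores = candidates @ profile / (
    np.linalg.norm(candidates, axis=1) * np.linalg.norm(profile) + 1e-9)
print("top suggestions:", np.argsort(scores)[::-1][:5])
```

The method needs only the user's own ratings and a vector per venue, which is what makes it attractive for on-device use and privacy in the abstract's comparison.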