Rapid advances in mobile devices and cloud-based music services now allow consumers to enjoy music anytime and anywhere. Consequently, there has been increasing demand for intelligent techniques for context-aware music recommendation. However, one important context that is generally overlooked in current research is the user's venue, which often carries rich semantics that strongly influence music preferences. In this paper, we present a novel location-aware music recommender system that effectively identifies suitable music for various common venues in our daily lives. Towards this goal, we develop a Location-aware Topic Model (LTM) to 1) effectively extract the topics related to different venues and 2) accurately infer the music for a given location in a latent topic space, in which music content can be directly matched with the venue. An extensive experimental study based on two large music test collections demonstrates the advantages of the proposed system.
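To make the venue-music matching idea concrete, here is a minimal illustrative sketch, not the paper's LTM: it fits one standard LDA model (scikit-learn as a stand-in) over toy venue and song descriptions so both live in the same latent topic space, then ranks songs for each venue by topic-mixture similarity. All data, names, and the choice of plain LDA are assumptions.

```python
# Hypothetical sketch: rank songs for a venue by similarity in a shared
# latent topic space. This is NOT the paper's LTM; plain LDA is used as
# a stand-in, and all documents below are illustrative toy data.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpora: textual descriptions of venues and of songs (e.g. tags, lyrics).
venue_docs = ["quiet coffee shop relaxing acoustic ambience",
              "gym workout energetic training fitness"]
song_docs = ["soft acoustic guitar calm mellow",
             "fast electronic beat energetic dance",
             "slow piano relaxing instrumental"]

# Fit one vocabulary and one topic model over both corpora so venues and
# songs share the same topic space.
vec = CountVectorizer()
X = vec.fit_transform(venue_docs + song_docs)
lda = LatentDirichletAllocation(n_components=4, random_state=0).fit(X)
theta = lda.transform(X)                      # document-topic proportions
venue_topics, song_topics = theta[:2], theta[2:]

# Recommend: songs whose topic mixture is closest to the venue's.
sims = cosine_similarity(venue_topics, song_topics)
for v, name in enumerate(["coffee shop", "gym"]):
    ranked = np.argsort(-sims[v])
    print(name, "->", [song_docs[i] for i in ranked])
```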
Several properties of information retrieval (IR) data, e.g. query length, are widely considered to be approximately distributed as a power law. This assumption is common practice, typically invoked to focus on specific characteristics of an empirical probability distribution, e.g. its long/fat tail. Motivated by recent work in the statistical treatment of power law claims, we investigate two research questions: (1) To what extent do power law approximations hold for the following IR data properties: (i) term frequency, (ii) document length, (iii) query length, (iv) citation frequency and (v) syntactic $n$-gram frequency? (2) What is the computational cost (and its practical consequences for operational IR systems) of replacing empirical power law approximations with more accurate distribution fitting? To answer these two questions we study 25 TREC and 3 non-TREC datasets and compare the fit of power laws against that of 15 other standard probability distributions. We find that term frequency is best approximated by a power law for only 5 datasets. The other data properties are better approximated by the inverse Gaussian, generalized extreme value, negative binomial or Yule distributions. We also find that the overhead of replacing power law approximations with more informed distribution fits is negligible for operational IR systems.
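As a hedged illustration of what "more informed distribution fitting" involves in practice, the sketch below fits a power law and several of the alternatives named above to a synthetic heavy-tailed sample by maximum likelihood and compares them with AIC. This is not the paper's methodology; scipy's Pareto stands in for the power law, and the discrete negative binomial and Yule distributions are omitted because scipy does not provide MLE fitting for discrete distributions.

```python
# Illustrative sketch (not the paper's code): compare how well a power law
# and alternative distributions fit a heavy-tailed sample, using maximum
# likelihood and AIC. The sample below is synthetic "term frequency" data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.pareto(1.8, size=5000) + 1.0   # synthetic heavy-tailed sample

candidates = {
    "power law (Pareto)": stats.pareto,
    "lognormal": stats.lognorm,
    "inverse Gaussian": stats.invgauss,
    "generalized extreme value": stats.genextreme,
}

for name, dist in candidates.items():
    params = dist.fit(data)                 # maximum likelihood fit
    loglik = np.sum(dist.logpdf(data, *params))
    aic = 2 * len(params) - 2 * loglik      # lower AIC = better fit
    print(f"{name:28s} AIC = {aic:10.1f}")
```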
Volunteers have always been crucial to nonprofit organizations (NPOs), which urgently need them to sustain their continuing operations. However, recruiting volunteers through traditional approaches is expensive and time-consuming. In the Web 2.0 era, abundant and ubiquitous social media data open the door to automatic volunteer identification. In this paper, we aim to fully explore this possibility by proposing a scheme that predicts users' volunteerism tendency from user-generated content collected from multiple social networks, grounded in a conceptual volunteering decision model. We conducted comprehensive experiments as well as a user study to investigate the effectiveness of the proposed scheme, and further discuss its generalizability and extensibility. This novel interdisciplinary research will potentially inspire more promising and important human-centered applications.
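Purely as an illustration of the prediction step, and under the assumption (not stated in the abstract) that the task reduces to supervised text classification over users' aggregated posts, here is a toy sketch:

```python
# Hypothetical sketch of the general pipeline, not the paper's scheme:
# predict a binary "volunteerism tendency" label from user-generated
# content aggregated across social networks. Data and labels are toys.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# One text blob per user: posts concatenated across networks.
users = [
    "organized charity run donated blood community service",
    "video games streaming all night ranked matches",
    "food bank shift weekend shelter volunteering fundraiser",
    "stock tips crypto trading portfolio gains",
]
labels = [1, 0, 1, 0]   # 1 = shows volunteerism tendency (toy labels)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
print(cross_val_score(clf, users, labels, cv=2).mean())
```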
In recent years there has been a growing trend to use publicly available social media sources within the field of journalism. Breaking news has tight reporting deadlines, measured in minutes rather than days, yet content must still be checked and rumours verified. As such, journalists are turning to automated content analysis to pre-filter large volumes of social media content before applying more traditional manual verification techniques. This paper describes a real-time social media analytics framework for journalists. We extend our previously published geoparsing approach to improve its scalability and efficiency. We develop and evaluate a novel approach to geosemantic feature extraction, classifying evidence in terms of situatedness, timeliness, confirmation and validity. Our approach is not specific to any news story and works for new, unseen event types. We report results from 4 experiments using 5 Twitter datasets crawled during different English-language news events; one of these is the standard TREC 2012 microblog corpus. Our classification results are promising, with F1 scores varying by class from 0.64 to 0.92 for unseen event types. Lastly, we report results from two case studies on spatio-temporal grounding of rumours during real-world news stories, showcasing different ways our system can assist journalists in the verification process.
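To show what per-class evidence classification and F1 scoring look like in code, here is a minimal sketch; it is not the paper's pipeline, and the texts, labels, and the choice of a TF-IDF + linear SVM classifier are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's system): classify tweet text
# into evidence classes such as "situatedness" or "confirmation" and
# report per-class F1, mirroring how the abstract's results are scored.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

train_texts = ["I can see smoke from my window right now",
               "police confirm road closed after incident",
               "heard there might be a fire somewhere?",
               "officials verify two injured at the scene"]
train_labels = ["situatedness", "confirmation", "situatedness", "confirmation"]
test_texts = ["watching the flames from across the street",
              "fire service confirms blaze under control"]
test_labels = ["situatedness", "confirmation"]

vec = TfidfVectorizer()
clf = LinearSVC().fit(vec.fit_transform(train_texts), train_labels)
pred = clf.predict(vec.transform(test_texts))
print(f1_score(test_labels, pred, average=None,
               labels=["situatedness", "confirmation"]))
```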
The stock market is strongly affected by various types of highly interrelated sources of information. To study the impact of information on stock markets, the common strategy in previous studies has been to concatenate the features of different information sources into one super feature vector, which weakens the ability to distinguish the effects of the individual sources. The challenge lies in how to model the complex information space of various sources and study their joint impact on stock markets. For this purpose, we introduce a tensor-based information framework to predict stock price movements. Specifically, our framework first models the complex information environment of investors, and its intrinsic links, with tensors. We then propose a global dimensionality reduction algorithm that captures the intrinsic links among different information sources in terms of the geometric structure of a single tensor and of the entire information tensor stream. Finally, we present a tensor-based predictive model that forecasts stock price movements in response to new information, which is in essence a high-order tensor-regression learning problem. Experiments on CSI 100 stocks demonstrate that TeSlA, a trading system built on our framework, outperforms the classic Top-N trading strategy and two state-of-the-art media-aware traders.
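For readers unfamiliar with tensor regression, the sketch below shows its simplest form, a rank-1, two-mode model fitted by alternating least squares. It conveys only the flavor of the "high-order tensor-regression learning problem" the abstract names; it is not the paper's algorithm, and the data are synthetic.

```python
# Minimal sketch of rank-1 tensor regression via alternating least squares:
# the model y = w1' X_i w2 is linear in w1 when w2 is fixed, and vice versa.
# Not the paper's method; purely a pedagogical example on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2 = 200, 6, 5                  # samples x two information modes
X = rng.normal(size=(n, d1, d2))       # e.g. (source x feature) tensor slices
w1_true, w2_true = rng.normal(size=d1), rng.normal(size=d2)
y = np.einsum("nij,i,j->n", X, w1_true, w2_true) + 0.01 * rng.normal(size=n)

w1, w2 = np.ones(d1), np.ones(d2)
for _ in range(50):
    Z1 = np.einsum("nij,j->ni", X, w2)            # collapse mode 2
    w1, *_ = np.linalg.lstsq(Z1, y, rcond=None)   # solve for w1
    Z2 = np.einsum("nij,i->nj", X, w1)            # collapse mode 1
    w2, *_ = np.linalg.lstsq(Z2, y, rcond=None)   # solve for w2

pred = np.einsum("nij,i,j->n", X, w1, w2)
print("R^2:", 1 - np.var(y - pred) / np.var(y))
```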