We investigate how two factors influence the way people formulate information requests. Our first factor, medium, considers whether the request is produced using text or voice. Our second factor, target, considers whether the request is intended for a search engine or a human search intermediary. In particular, we study how these two factors influence the way people formulate requests in situations where the information need has a specific type of extra-topical dimension. We focus on six extra-topical dimensions: (1) domain knowledge, (2) viewpoint, (3) experiential, (4) venue location, (5) source location, and (6) temporal. We analyze information requests gathered through a crowdsourced study and address three research questions. We study the effects of our two factors (medium and target) on: (RQ1) participants' perceptions about their information requests, (RQ2) the different characteristics of their information requests (e.g., natural language structure, retrieval performance), and (RQ3) participants' strategies for requesting information when the search task has a specific type of extra-topical dimension. We found that both factors influenced participants' perceptions about their own information requests, the characteristics of participants' requests, and the strategies used by participants to request information matching the extra-topical dimension. Our results call for future research on retrieval algorithms that can effectively harness (rather than ignore) extra-topical query terms.
When content consumers explicitly judge content positively, we consider them to be engaged. Unfortunately, explicit user evaluations are difficult to collect, as they require user effort. Therefore, we propose to use device interactions as implicit feedback to detect engagement. We assess the usefulness of swipe interactions on tablets for predicting engagement, and compare them with traditional features based on time spent. We gathered two unique datasets of more than 250,000 swipes, 100,000 unique article visits, and over 35,000 explicitly judged news articles, by modifying two commonly used tablet apps of two newspapers. We tracked all device interactions of 407 experiment participants during one month of habitual news reading. We employed a behavioral metric as a proxy for engagement, because our analysis needed to be scalable to many users, and scanning behavior required us to allow users to indicate engagement quickly. We point out the importance of taking into account content ordering, report the most predictive features, and zoom in on briefly read content and on the most frequently read articles. Our findings demonstrate that fine-grained tablet interactions are useful indicators of engagement for newsreaders on tablets. The best features successfully combine both time-based aspects and swipe interactions.
We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. The addition of NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model yields a statistically significant increase in retrieval effectiveness. Consequently, NVSM adds a complementary relevance signal. Next to semantic matching, we find that NVSM performs well in cases where lexical matching is needed. NVSM learns a notion of term specificity directly from the document collection without feature engineering. We also show that NVSM learns regularities related to Luhn significance. Finally, we give advice on how to deploy NVSM in situations where model selection (e.g., cross-validation) is infeasible. We find that an unsupervised ensemble of multiple models trained with different hyperparameter values performs better than a single cross-validated model. Therefore, NVSM can safely be used for ranking documents without supervised relevance judgments.
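The ranking step described above can be illustrated with a minimal sketch. It assumes, for simplicity, that word and document vectors live in the same space, that a query is composed by averaging its word vectors, and that similarity is cosine; the toy vectors and the names `WORD_VECS`, `DOC_VECS`, and `rank` are hypothetical, and the gradient-descent training that would learn the representations is not shown.

```python
import math

def average(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query_terms, word_vecs, doc_vecs):
    """Rank documents by similarity to the averaged query-term vectors."""
    known = [word_vecs[t] for t in query_terms if t in word_vecs]
    if not known:
        return []  # no query term has a representation
    query_vec = average(known)
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hand-picked toy vectors (assumption: in practice these would be learned).
WORD_VECS = {"election": [0.9, 0.1], "vote": [0.8, 0.2], "football": [0.1, 0.9]}
DOC_VECS = {"politics_story": [1.0, 0.0], "sports_story": [0.0, 1.0]}

ranking = rank(["election", "vote"], WORD_VECS, DOC_VECS)
for doc_id, similarity in ranking:
    print(doc_id, round(similarity, 3))
```

With these toy vectors the politics document is ranked first for the query "election vote", because the averaged query vector points mostly along the first dimension.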
Sentence regression is a type of extractive summarization that achieves state-of-the-art performance. The most challenging task within the sentence regression framework is to identify discriminative features to represent each sentence. In this work, we study the use of sentence relations, e.g., Contextual Sentence Relations (CSR), Title Sentence Relations (TSR), and Query Sentence Relations (QSR), so as to improve the performance of sentence regression. CSR, TSR, and QSR refer to the relations between a main body sentence and its local context, its document title, and a given query, respectively. We propose a deep neural network model, Sentence Relation-based Summarization (SRSum), that consists of five sub-models: PriorSum, CSRSum, TSRSum, QSRSum, and SFSum. PriorSum encodes the latent semantic meaning of a sentence. SFSum encodes the surface information of a sentence. CSRSum evaluates the ability of each sentence to summarize its local context. TSRSum evaluates the semantic closeness of each sentence with respect to its title, which usually reflects the main ideas of a document. QSRSum evaluates the relevance of each sentence to the given query for query-focused summarization. We conduct extensive experiments on six benchmark datasets covering generic multi-document summarization and query-focused multi-document summarization. On both tasks, SRSum achieves performance comparable or superior to state-of-the-art approaches in terms of multiple ROUGE metrics.
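As a rough illustration of the general sentence regression idea (not of SRSum itself), the sketch below scores each sentence from a feature vector and then greedily keeps the highest-scoring sentences that fit a word budget. The linear scorer, the two features, the weights, and the greedy selection rule are all assumptions made for this example; a neural model would replace `score` in practice.

```python
def score(features, weights, bias=0.0):
    """Hypothetical linear regressor: weighted sum of sentence features."""
    return bias + sum(w * f for w, f in zip(weights, features))

def summarize(sentences, features, weights, max_words):
    """Greedily pick top-scoring sentences within a word budget,
    then return them in their original document order."""
    by_score = sorted(
        range(len(sentences)),
        key=lambda i: score(features[i], weights),
        reverse=True,
    )
    chosen, used = [], 0
    for i in by_score:
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    return [sentences[i] for i in sorted(chosen)]

sentences = [
    "The council approved the budget on Monday.",
    "Weather was mild.",
    "The budget raises school funding by ten percent.",
]
# Assumed features per sentence: [overlap with title, position score].
features = [[0.8, 1.0], [0.0, 0.5], [0.9, 0.2]]
weights = [1.0, 0.5]

summary = summarize(sentences, features, weights, max_words=15)
print(summary)
```

Here the first and third sentences are selected: they have the highest scores and together use exactly the 15-word budget, so the low-scoring second sentence no longer fits.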
Location extraction, also called toponym extraction, is a field covering geoparsing, extracting spatial representations from location mentions in text, and geotagging, assigning spatial coordinates to content items. Location extraction typically performs location identification followed by location disambiguation. This paper presents two best-of-class location extraction algorithms, one focussed on geoparsing using an OpenStreetMap database and one on geotagging using a model constructed from a combination of a social media tag dataset and multiple gazetteers. We perform two benchmark evaluations, examining performance of location identification on tweets and location estimation on Flickr posts respectively, and directly compare results to a previously published best-of-class entity recognition technique. We also perform a qualitative evaluation recalling the top-N location mentions from tweets during several major news events. The map-database approach was best (F1 0.90) for location identification and the tags-gazetteer approach best ([email protected] 0.49) for location estimation. In the case study the map-database was strongest ([email protected] 0.60+). We analyze in detail the strengths and weaknesses of each approach, and suggest concrete areas for further research to improve location extraction.
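For readers unfamiliar with the F1 measure quoted above, the following sketch computes precision, recall, and F1 for location identification on a single text. It assumes that predictions and gold annotations are compared as sets of exact location strings; the matching criteria of the actual benchmark may differ, and the example mentions are invented.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 (harmonic mean of the two)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Invented example: three extracted mentions, four annotated ones.
predicted = ["London", "Paris", "Victoria"]
gold = ["London", "Paris", "Berlin", "Rome"]

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Two of the three predictions are correct (precision 2/3) and two of the four gold locations are found (recall 1/2), which gives F1 = 4/7, roughly 0.571.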
This paper presents a framework for speedy video matching and retrieval through detection and measurement of visual similarity. The framework's efficiency stems from its power to encode a given shot's content into a compact fixed-length signature that facilitates robust real-time matching. Separate scene and motion signatures are developed and fused together to fully represent and match respective video shots. The framework works on thumbnail images (DC-images from the MPEG stream). Scene information is captured through the Statistical Dominant Colour Profile (SDCP), while motion information is captured through a graph-based signature, called the Dominant Colour Graph Profile (DCGP). The SDCP is a fixed-length compact signature that statistically encodes the colours' spatio-temporal patterns across video frames. The DCGP is a fixed-length signature that records and tracks blocks' movement across video frames, where the graph's structural properties are used to extract the signature values. Finally, the overall video signature is generated by fusing the individual scene and motion signatures, and matching is done by directly comparing the respective fused video signatures. The signature-based aspect of the proposed framework is the key to its high matching speed (>2000 fps), compared to current techniques that rely on exhaustive processing, such as dense sampling.
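The fuse-then-compare step can be sketched as follows. The sketch assumes that fusion is plain concatenation of the scene and motion signatures and that signatures are compared with an L1 (sum of absolute differences) distance; the abstract does not specify either choice, and the signature values and the names `fuse`, `l1_distance`, and `best_match` are hypothetical. How the SDCP and DCGP values themselves are extracted is not shown.

```python
def fuse(scene_signature, motion_signature):
    """Assumed fusion: concatenate the two fixed-length signatures."""
    return list(scene_signature) + list(motion_signature)

def l1_distance(a, b):
    """Sum of absolute differences between equal-length signatures."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

def best_match(query_signature, database):
    """Return the id of the stored shot closest to the query signature."""
    return min(
        database,
        key=lambda shot_id: l1_distance(query_signature, database[shot_id]),
    )

# Invented signatures: 3 scene values followed by 2 motion values per shot.
database = {
    "shot_a": fuse([0.7, 0.2, 0.1], [0.5, 0.5]),
    "shot_b": fuse([0.1, 0.1, 0.8], [0.9, 0.1]),
}
query = fuse([0.6, 0.3, 0.1], [0.4, 0.6])

match = best_match(query, database)
print(match)
```

Because every shot is reduced to a short fixed-length vector, each comparison costs only a handful of arithmetic operations, which is the kind of property that makes signature-based matching fast compared with exhaustive frame-level processing.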