Miles Efron scite author profile

Big Data: Astronomical or Genomical?

Genomics is a Big Data science and is going to get much bigger, very soon, but it is not known whether the needs of genomics will exceed other Big Data domains. Projecting to the year 2025, we compared genomics with three other major generators of Big Data: astronomy, YouTube, and Twitter. Our estimates show that genomics is a “four-headed beast”—it is either on par with or the most demanding of the domains analyzed here in terms of data acquisition, storage, distribution, and analysis. We discuss aspects of new technologies that will need to be developed to rise up and meet the computational challenges that genomics poses for the near future. Now is the time for concerted, community-wide planning for the “genomical” challenges of the next decade.

Estimation methods for ranking recent information

Efron

Golovchinsky

2011

112

Temporal aspects of documents can impact relevance for certain kinds of queries. In this paper, we build on earlier work on modeling temporal information. We propose an extension to the Query Likelihood Model that incorporates query-specific information to estimate rate parameters, and we introduce a temporal factor into language model smoothing and query expansion using pseudo-relevance feedback. We evaluate these extensions using a Twitter corpus and two newspaper article collections. Results suggest that, compared to prior approaches, our models are more effective at capturing the temporal variability of relevance associated with some topics.
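
As a rough illustration of the kind of model this abstract describes, the sketch below adds a log exponential recency prior to a Dirichlet-smoothed query likelihood score. It is not the authors' implementation: the function names, the smoothing value `mu`, and the fixed `rate` are assumptions made for illustration, and the paper's query-specific estimation of the rate parameter is not attempted here.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, collection_tf, collection_len, mu=100.0):
    """Dirichlet-smoothed log P(query | doc); mu is an assumed value."""
    doc_tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = collection_tf.get(term, 0) / collection_len
        p = (doc_tf[term] + mu * p_coll) / (len(doc) + mu)
        if p == 0.0:
            return float("-inf")  # term unseen anywhere in the collection
        score += math.log(p)
    return score

def log_recency_prior(age_days, rate):
    """Log of an exponential prior: rate * exp(-rate * age_days)."""
    return math.log(rate) - rate * age_days

def rank(query, docs, rate=0.05, mu=100.0):
    """docs: list of (doc_id, tokens, age_days). Returns ids, best first."""
    collection_tf = Counter(t for _, tokens, _ in docs for t in tokens)
    collection_len = sum(collection_tf.values())
    scored = [
        (query_log_likelihood(query, tokens, collection_tf, collection_len, mu)
         + log_recency_prior(age, rate), doc_id)
        for doc_id, tokens, age in docs
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]
```

With a larger `rate`, recent documents are favored more strongly; with a rate near zero, the ranking approaches plain query likelihood. Choosing that rate per query is the part the abstract attributes to the paper.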

Questions are content: A taxonomy of questions in a microblogging environment

Efron

Winget

2010

Proc. Am. Soc. Info. Sci. Tech.

Microblogging services such as twitter.com have become popular venues for informal information interactions. An important aspect of these interactions is question asking. In this paper we report results from an analysis of a large sample of data from Twitter. Our analysis focused on the characteristics and strategies that people bring to asking questions in microblogs. In particular, based on our analysis, we propose a taxonomy of questions asked in microblogs. We find that microblog authors express questions to accomplish a wide variety of social and informational tasks. Some microblog questions seek immediate answers, while others accrue information over time. Our overarching finding is that question asking in microblogs is strongly tied to people's naturalistic interactions, and that the act of asking questions in Twitter is not analogous to information seeking in more traditional information retrieval environments.

Hashtag retrieval in a microblogging environment

Efron

2010

122

Microblog services let users broadcast brief textual messages to people who "follow" their activity. Often these posts contain terms called hashtags, markers of a post's meaning, audience, etc. This poster treats the following problem: given a user's stated topical interest, retrieve useful hashtags from microblog posts. Our premise is that a user interested in topic x might like to find hashtags that are often applied to posts about x. This poster proposes a language modeling approach to hashtag retrieval. The main contribution is a novel method of relevance feedback based on hashtags. The approach is tested on a corpus of data harvested from twitter.com.
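
One way to picture a language modeling approach to hashtag retrieval is sketched below: each hashtag gets a unigram model pooled from the posts that carry it, and hashtags are ranked by smoothed query likelihood. This is an assumption-laden illustration rather than the poster's method; the whitespace tokenization, the smoothing value `mu`, and the function names are all hypothetical, and the hashtag-based relevance feedback that the abstract names as its main contribution is not shown.

```python
import math
from collections import Counter, defaultdict

def build_hashtag_models(posts):
    """Map each hashtag to term counts pooled from the posts that use it."""
    models = defaultdict(Counter)
    for post in posts:
        tokens = post.lower().split()
        tags = [t for t in tokens if t.startswith("#") and len(t) > 1]
        words = [t for t in tokens if not t.startswith("#")]
        for tag in tags:
            models[tag].update(words)
    return models

def rank_hashtags(query, posts, mu=10.0):
    """Rank hashtags by smoothed log-likelihood of the query terms."""
    models = build_hashtag_models(posts)
    background = Counter()
    for counts in models.values():
        background.update(counts)
    bg_total = sum(background.values())
    q_terms = query.lower().split()
    scored = []
    for tag, counts in models.items():
        length = sum(counts.values())
        score = 0.0
        for term in q_terms:
            # Add-one background estimate keeps unseen terms finite.
            p_bg = (background[term] + 1) / (bg_total + len(background) + 1)
            score += math.log((counts[term] + mu * p_bg) / (length + mu))
        scored.append((score, tag))
    return [tag for _, tag in sorted(scored, reverse=True)]
```

For a topic x, the top-ranked hashtags are those whose pooled posts make the words of x most probable, which matches the premise stated in the abstract.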

Improving retrieval of short texts through document expansion

Efron

Organisciak

Fenlon

2012

Collections containing a large number of short documents are becoming increasingly common. As these collections grow in number and size, providing effective retrieval of brief texts presents a significant research problem. We propose a novel approach to improving information retrieval (IR) for short texts based on aggressive document expansion. Starting from the hypothesis that short documents tend to be about a single topic, we submit documents as pseudo-queries and analyze the results to learn about the documents themselves. Document expansion helps in this context because short documents yield little in the way of term frequency information. However, as we show, the proposed technique helps us model not only lexical properties, but also temporal properties of documents. We present experimental results using a corpus of microblog (Twitter) data and a corpus of metadata records from a federated digital library. With respect to established baselines, results of these experiments show that applying our proposed document expansion method yields significant improvements in effectiveness. Specifically, our method improves the lexical representation of documents and the ability to let time influence retrieval.
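
The pseudo-query idea in this abstract can be sketched as follows: a short document is submitted as a query, and the term distributions of the top-ranked results are mixed into its own language model. This is a minimal sketch under stated assumptions, not the paper's method: the overlap-based retrieval score, the mixing weight `lam`, the cutoff `k`, and all function names are hypothetical, and the temporal side of the expansion is omitted.

```python
from collections import Counter

def term_probs(tokens):
    """Maximum-likelihood unigram distribution of a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def overlap_score(query_tokens, doc_tokens):
    """Toy retrieval score: number of distinct shared terms (an assumption)."""
    return len(set(query_tokens) & set(doc_tokens))

def expand_document(doc_id, corpus, k=2, lam=0.5):
    """Return an expanded unigram model for corpus[doc_id].

    The document is submitted as a pseudo-query; the top-k other documents
    are mixed into its model, weighted by normalized retrieval score.
    """
    original = term_probs(corpus[doc_id])
    scored = sorted(
        ((overlap_score(corpus[doc_id], toks), other)
         for other, toks in corpus.items() if other != doc_id),
        reverse=True,
    )
    top = [(s, other) for s, other in scored[:k] if s > 0]
    total = sum(s for s, _ in top)
    if total == 0:
        return original  # nothing retrieved; keep the document as-is
    expanded = {t: lam * p for t, p in original.items()}
    for s, other in top:
        for t, p in term_probs(corpus[other]).items():
            expanded[t] = expanded.get(t, 0.0) + (1 - lam) * (s / total) * p
    return expanded
```

For example, a two-word post about an earthquake would pick up related terms from similar posts, giving a retrieval model more term frequency evidence than the short text alone provides.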
