Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005
DOI: 10.1145/1076034.1076180

Indexing emails and email threads for retrieval

Abstract: Electronic mail poses a number of unusual challenges for the design of information retrieval systems and test collections, including informal expression, conversational structure, variable document granularity (e.g., messages, threads, or longer-term interactions), a naturally occurring integration between free text and structural metadata, and incompletely characterized user needs. This paper reports on initial experiments with a large collection of public mailing lists from the World Wide Web Consortium that …

citations
Cited by 15 publications
(20 citation statements)
references
References 0 publications
“…These heuristics are imperfect, and after we submitted our runs we found one more quotation pattern (lines below "Reply Separator"); additional as-yet undetected patterns probably remain to be found. Regardless, the patterns that we did use provide a reasonable basis for exploring the effect of suppression of (probably duplicated) quoted text within the threads (Wu and Oard, 2005).…”
Section: Detection Of Quoted Text (mentioning)
confidence: 99%
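
The quoted-text suppression described in the statement above can be illustrated with a minimal sketch. The statement names only one pattern explicitly (lines below "Reply Separator"); the ">" line prefix used here is an assumption added for illustration, and the function name strip_quoted is hypothetical. This is not the citing authors' implementation.

```python
import re

# Assumption: lines starting with ">" are quoted text (a common client
# convention; the quoted statement does not list its patterns).
QUOTE_PREFIX = re.compile(r"^\s*>")

# The one pattern the statement names: everything below this marker is quoted.
REPLY_SEPARATOR = "Reply Separator"


def strip_quoted(body: str) -> str:
    """Return the message body with (probably duplicated) quoted text removed."""
    kept = []
    for line in body.splitlines():
        if REPLY_SEPARATOR in line:
            # Drop the separator line and everything after it.
            break
        if QUOTE_PREFIX.match(line):
            # Drop individually quoted lines.
            continue
        kept.append(line)
    return "\n".join(kept)
```

As the statement itself cautions, heuristics of this kind are imperfect, so further patterns would likely be needed on a real collection.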
“…We constructed threads automatically based on subject line repetition patterns that are indicative of the use of the reply function in widely-used email clients. This was done by using Lucene 4 to index the text from the subject field (with a short stopword list consisting of "re:," "fw:," and "fwd:"), searching the collection with every subject line as a query, removing duplicate results, and then sorting the resulting threads in chronological order (Wu and Oard, 2005). Note that this simple process creates a single sequence rather than the richer tree representation normally associated with threading.…”
Section: Document Expansion With Threads (mentioning)
confidence: 99%
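
The subject-based threading described in the statement above can be sketched as follows. This is a simplified illustration, not the method as implemented: exact matching on normalized subject lines stands in for the Lucene index-and-search step, and the reply markers are stripped only at the start of the subject. The names Message, normalize_subject, and build_threads are hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Reply/forward markers named in the quoted statement, treated as stopwords.
# Assumption: they are removed only as a leading run of prefixes.
PREFIX = re.compile(r"^\s*(?:(?:re|fwd|fw):\s*)+", re.IGNORECASE)


@dataclass(frozen=True)
class Message:
    msg_id: str
    subject: str
    sent: datetime


def normalize_subject(subject: str) -> str:
    """Strip leading reply markers, lowercase, and collapse whitespace."""
    stripped = PREFIX.sub("", subject)
    return " ".join(stripped.lower().split())


def build_threads(messages):
    """Group messages by normalized subject, dedupe, sort chronologically."""
    groups = defaultdict(dict)
    for m in messages:
        # Keying by msg_id removes duplicate results within a thread.
        groups[normalize_subject(m.subject)][m.msg_id] = m
    return {
        subject: sorted(by_id.values(), key=lambda m: m.sent)
        for subject, by_id in groups.items()
    }
```

Each thread comes out as a flat chronological list, which matches the statement's remark that the process yields a single sequence rather than a reply tree.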
“…We can clearly see that, the space of all possible solutions is very large. To overcome this difficulty, some prior work [2] [3] completely ignore the tree structure of conversations and find conversations as clusters of emails without considering any tree structure inside them. Finally, they arrange emails that belong to the same conversation in a chronological order, and so the structure of conversations becomes linear.…”
Section: Introduction (mentioning)
confidence: 99%
“…In fact, they consider another search space that is very smaller than the real space. Some work [2][4] also have made some assumptions to restrict the original search space. For example, they considered all of the emails of some conversation to have the same subject line.…”
Section: Introduction (mentioning)
confidence: 99%
“…Not the entire data is useful - for instance, the dev part is rarely used despite its size. While there are not so many near-duplicates in the lists part, only about 60,000 e-mails are single messages and the rest of them belongs to about 21,000 multi-message threads (Wu and Oard, 2005). In contrast, www part contains a lot of "almost near-duplicates", e.g.…”
Section: Evaluation Standards (mentioning)
confidence: 99%