2012
DOI: 10.2197/ipsjjip.20.216

TSUBAKI: An Open Search Engine Infrastructure for Developing Information Access Methodology

Abstract: Due to the explosive growth in the amount of information in the last decade, it is becoming extremely hard to obtain necessary information by conventional information access methods. Hence, the creation of drastically new technology is needed. Developing such new technology requires search engine infrastructures. Although the existing search engine APIs can be regarded as such infrastructures, these APIs have several restrictions, such as a limit on the number of API calls. To help the development of new …

Cited by 18 publications (13 citation statements)
References 12 publications
“…On the StackFAQ dataset, we further report the result of (Sakata et al, 2019), which serves as the strongest supervised baseline. This baseline combines two methods: TSUBAKI (Shinzato et al, 2008) - a search engine for Q-to-q matching; and a supervised fine-tuned BERT model for Q-to-a matching. We put the results of this work (that were available only on the StackFAQ dataset), just to emphasize that our approach can reach the quality of a supervised approach, and not to directly compare with it.…”
Section: Baselines (mentioning)
confidence: 99%
“…Target Pattern Pairs: We extracted our binary patterns from the TSUBAKI corpus (Shinzato et al, 2008) of 600 million Japanese web pages. Binary patterns are defined as sequences of words on the path of dependency relations connecting two nouns in a sentence and have two variables.…”
Section: Target Data and Baseline Classifiers (mentioning)
confidence: 99%
“…We used 100,000 Japanese sentences to evaluate our approach. These sentences were obtained from an open search engine infrastructure TSUBAKI (Shinzato et al, 2008), which included at least one hiragana character and consisted of more than twenty characters…”
Section: Setting (mentioning)
confidence: 99%