Search results generated by searchable databases are served dynamically and are far larger in volume than the static documents on the Web. These results pages have been referred to as the Deep Web [1]. To integrate results across different searchable databases, we need to extract the target data from their results pages. We propose a testbed for information extraction from search results. We chose 100 databases at random from 114,540 pages with search forms, so the chosen databases are well varied. We selected 51 databases whose results pages include URLs and manually identified the target information to be extracted. We also suggest evaluation measures for comparing extraction methods, as well as methods for extending the target data.
Blog articles by tourists contain interesting, personal accounts of where and how they traveled, what they did, and what they thought. Such individual experiences are often more helpful than the general, official information about a tourist resort provided by tourist agents. However, it is not easy to select relevant articles from these unsorted blog articles, and it is harder still to extract the required information from them. This paper proposes a feature extraction technique based on dependency analysis of verbs and objects in sentences that describe tourists' behavior. We applied the method to 7,917,385 blog articles on the Kyushu area and report, as case studies, some analyses of "where and what did they eat".
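As a rough illustration of verb-object feature extraction, the sketch below counts (verb, object) pairs in sentences that have already been dependency-parsed. It is not the authors' implementation: the `Token` structure, the `"obj"` relation label, the `verb_object_pairs` helper, and the sample sentences are all assumptions made for this example, and a real pipeline would obtain the parses from a dependency parser.

```python
from collections import Counter
from typing import NamedTuple


class Token(NamedTuple):
    """One token of a dependency-parsed sentence (hypothetical structure)."""
    lemma: str
    pos: str   # coarse part of speech, e.g. "VERB" or "NOUN"
    head: int  # index of the governing token in the sentence, -1 for the root
    dep: str   # dependency label; "obj" is assumed to mark a direct object


def verb_object_pairs(sentence):
    """Return (verb lemma, object lemma) pairs found in one parsed sentence."""
    pairs = []
    for tok in sentence:
        if tok.dep == "obj" and tok.head >= 0:
            head = sentence[tok.head]
            if head.pos == "VERB":
                pairs.append((head.lemma, tok.lemma))
    return pairs


# Made-up sample data standing in for parsed blog sentences.
sentences = [
    [Token("we", "PRON", 1, "nsubj"), Token("eat", "VERB", -1, "root"),
     Token("ramen", "NOUN", 1, "obj")],
    [Token("I", "PRON", 1, "nsubj"), Token("eat", "VERB", -1, "root"),
     Token("ramen", "NOUN", 1, "obj")],
    [Token("we", "PRON", 1, "nsubj"), Token("visit", "VERB", -1, "root"),
     Token("shrine", "NOUN", 1, "obj")],
]

# Aggregate pair frequencies over all sentences.
counts = Counter(p for s in sentences for p in verb_object_pairs(s))
```

Filtering `counts` by a verb such as "eat" would then give one simple view of what was eaten; linking it to where would require additional location features that this sketch does not model.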
To help researchers build a knowledge foundation in their research fields, which can be a time-consuming process, the authors have developed a Cross Tabulation Search Engine (CTSE). Its purpose is to assist researchers in 1) conducting research surveys, 2) efficiently and effectively retrieving information (such as important researchers, research groups, and keywords), and 3) obtaining analytical information on past and current research trends in a particular field. The CTSE system employs data-processing technologies and emphasizes a “Learn by Searching” learning strategy to help students analyze such research trends. To show the effectiveness of CTSE, a pilot experiment was conducted in which participants were assigned research survey tasks and then answered a questionnaire on the effectiveness and usability of the system. The results showed that the system helped students conduct research surveys and that the research trend transitions presented by the system were effective for producing research trend surveys. Moreover, most students had favorable attitudes toward the usage and usability of the system and were satisfied with gaining more knowledge of a particular research field in a short period.
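To make the idea of cross tabulation concrete, the following minimal sketch counts how often each keyword appears per publication year. It only illustrates the general technique and does not describe how CTSE works internally; the `records` data and the `cross_tabulate` helper are hypothetical.

```python
from collections import Counter

# Hypothetical (keyword, publication year) pairs taken from search results.
records = [
    ("deep web", 2010),
    ("information extraction", 2010),
    ("deep web", 2011),
    ("deep web", 2011),
    ("information extraction", 2012),
]


def cross_tabulate(pairs):
    """Count each (row, column) pair and return row labels, column labels, table."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


rows, cols, table = cross_tabulate(records)

# Print one line per keyword with its yearly counts.
for keyword, row in zip(rows, table):
    print(keyword, dict(zip(cols, row)))
```

Reading a row from left to right shows how a keyword's frequency changes over time, which is one simple way a trend transition could be read off such a table.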
There is a huge demand for multilingual tourism information about Japan because of the increasing number of tourists from foreign countries. Most of them may expect the typical, stereotyped culture, nature, and modern society of Japan. However, people from different backgrounds, cultures, and languages might also expect different aspects of Japan. In this paper, we analyze these differences as cultural tourism preferences for Japan. We propose a machine-learning-based method for identifying the cultural tourism preferences of people from different countries by comparing access logs, in different languages, of a multilingual tourism information site. We focus our discussion on the pages accessed in the Thai and Vietnamese languages. Our results show that for Thai tourists the characteristic features are the famous places in an area and local specialties, whereas Vietnamese tourists pay much more attention to the facilities and location of hotels. This difference was not observable through naive keyword extraction and visualization. The result has been used as a guide for creating further content on the tourism information site.