The authors propose a model for analyzing English sentences that contain coordinate conjunctions such as "and", "or", "but", and equivalent words. Syntactic analysis of English coordinate sentences is one of the most difficult problems for machine translation (MT) systems. The problem is to select, from all possible candidates, the correct syntactic structure formed by an individual coordinate conjunction, i.e., to determine which constituents the conjunction coordinates. Typically, so many candidate structures are produced that MT systems cannot select the correct one, even when the grammar formalism allows the rules to be written in simple notation. This paper presents an English coordinate structure analysis model that provides top-down scope information about the correct syntactic structure by taking advantage of the symmetric patterns of parallelism. The model is based on a balance matching operation over two lists of feature sets, which provides four benefits: reduced analysis cost, improved word disambiguation, interpretation of ellipses, and robust analysis. The model was implemented and incorporated into an English-Japanese MT system, where it achieved about 75% accuracy in practical translation use.
This paper proposes a method by which 5W1H (who, when, where, what, why, how, and predicate) information is used to classify and navigate Japanese-language texts. An access platform built on 5W1H information extracted from text data provides three functions: episodic retrieval, multi-dimensional classification, and overall classification. In a six-month trial, 50 people used the platform to access 6400 newspaper articles. The three functions proved to be effective for office documentation work, and the precision of extraction was approximately 82%.
In an office, understanding the temporal transition and the overall situation of an event from various sources of information requires extracting and abstracting from a large number of documents. This paper proposes two robust methods for generating an extract and an abstract from documents: an episodic extraction method, which generates an extract on the temporal transition of an event, and an overall abstraction method, which generates an abstract of the documents as a whole for survey purposes. The episodic extraction method retrieves documents that include the 5W1H (who, when, where, what, why, how, and predicate) information specifying an event and generates an extract on the temporal transition of that event. The overall abstraction method abstracts documents by replacing the 5W1H elements in each document with their upper categories in a thesaurus. These methods proved to be effective for office work when applied to 10000 news articles and 2500 sales reports.
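The overall abstraction step described above, replacing 5W1H elements with their upper categories in a thesaurus, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy `THESAURUS` mapping, the dictionary representation of an event, and the helper names `abstract_event` and `overall_abstract` are all hypothetical.

```python
from collections import Counter

# Hypothetical toy thesaurus (term -> upper category); not from the paper.
THESAURUS = {
    "NEC": "electronics maker",
    "Fujitsu": "electronics maker",
    "Tokyo": "Japan",
    "Osaka": "Japan",
    "laptop": "computer",
    "server": "computer",
}


def abstract_event(event, thesaurus=THESAURUS):
    """Replace each 5W1H value with its upper category when one is known.

    Values that have no thesaurus entry are kept unchanged.
    """
    return {slot: thesaurus.get(value, value) for slot, value in event.items()}


def overall_abstract(events, thesaurus=THESAURUS):
    """Count generalized events so that similar documents collapse together."""
    generalized = [
        tuple(sorted(abstract_event(event, thesaurus).items()))
        for event in events
    ]
    return Counter(generalized)


# Two hypothetical events, each assumed to be extracted from one document.
events = [
    {"who": "NEC", "where": "Tokyo", "what": "laptop", "predicate": "release"},
    {"who": "Fujitsu", "where": "Osaka", "what": "server", "predicate": "release"},
]

summary = overall_abstract(events)
for generalized_event, count in summary.items():
    print(count, dict(generalized_event))
```

In this sketch the two events differ in every named entity but share the same generalized form, so they collapse into a single summary entry. A real thesaurus would be hierarchical and would need a policy for how many levels to climb, which this flat mapping does not attempt to model.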