Marie-Francine Moens and Rik De Busser
Katholieke Universiteit Leuven, Belgium
Interdisciplinary Centre for Law & IT, Tiensestraat, Leuven, Belgium

ABSTRACT
Topic segmentation is an important initial step in many text-based tasks. A hierarchical representation of a text's topics is useful in retrieval and allows relevance to be judged at different levels of detail. This short paper describes research on generic algorithms for topic detection and segmentation that are applicable to texts of heterogeneous types and domains.
The MOSAIC project investigates a retrieval model for court decisions based on structured and unstructured (natural language) information in legal cases. This paper focuses on how relevant information in court decisions can function as a key for retrieval and on the automated construction of case representations. Techniques of automated concept learning and rhetorical structure identification are among the most promising ones.

Automated retrieval from large document collections was one of the earliest applications of computer science to law. In 1961 the US Air Force contracted with the University of Pittsburgh to build a full-text retrieval system for legal documents. As a result, the Finding Legal Information Through Electronics (FLITE) system saw its first productive use in 1964. A few years later, the US Department of Justice developed the JURIS system, which has been in use since 1971. More recently, commercial systems such as LEXIS-NEXIS and Westlaw, which offer interactive retrieval through terminals at the customer's office, have gained widespread acceptance among legal professionals. In the European Union, there are many databases of court decisions, most of which can be consulted by any citizen via the World Wide Web.

Present-day retrieval systems allow users to express their query as a set of key terms. In some systems, key terms can be combined with Boolean operators. The result of such a search is a list of documents. These are usually sorted by 'relevance', which most of the time is simply computed as a function of the frequency of occurrence of the search terms in the documents. Documents are returned as relevant if they contain the query terms or if keywords that were manually assigned to them match the query terms. Thus, current commercial retrieval systems for searching court decisions rely either on a manually built index of cases or on a full-text search.
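The frequency-based relevance ranking described above can be illustrated with a minimal sketch. It is not the implementation of any of the systems named in the text: the stopword list, the toy documents and the function names `tokenize` and `rank` are assumptions made for illustration, and the score is simply the summed number of occurrences of the query terms.

```python
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "in", "to", "and", "for", "is", "was"}


def tokenize(text):
    """Lowercase the text, split it on non-letters and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def rank(query, documents):
    """Return (doc_id, score) pairs sorted by summed query-term frequency.

    Documents that contain none of the query terms are left out.
    Ties are broken alphabetically by document id.
    """
    terms = set(tokenize(query))
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            results.append((doc_id, score))
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


# Toy collection (invented texts, not real court decisions).
DOCS = {
    "case_1": "The tenant claimed damages for breach of the lease.",
    "case_2": "Damages were awarded; the damages covered lost rent and further damages.",
    "case_3": "The court discussed jurisdiction only.",
}

ranking = rank("damages lease", DOCS)
print(ranking)
```

In this toy collection `case_2` outranks `case_1` only because it repeats the word "damages" more often, even though `case_1` is the only document that mentions both query terms. A score of this kind counts word occurrences and says nothing about what the case is actually about.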
In manual indexing, texts or textual passages are linked to the concepts of a pre-defined thesaurus or classification scheme; case texts are linked by means of citation links or common descriptors. The disadvantages of manual indexing are its huge cost (a problem that will only worsen with the current growth in the number of cases) and the large number of inconsistencies between different indexers. In a full-text search, each term (except for stopwords*) acts as a search key. The major disadvantage of a full-text search is that its retrieval results are quite unreliable, since the occurrence of a particular key word or key phrase in a text is no guarantee that the text is relevant to the search request. This is a fundamental and rather problematic issue, which bears on most retrieval tasks but is especially important when retrieving legal cases. You cannot search for meaning in a case solely by taking word occurrences into account. Another disadvantage of full-text searches is that they often overwhelm users with documents that are not relevant or only marginally relevant.

* Stopwords are small words, like articl...
This paper discusses two major ways in which the introduction of Christianity exerted an important influence on the Bunun language. In the second half of the twentieth century, Christian churches were instrumental in the protection of indigenous languages, including Bunun, against the cultural and linguistic unification policies of the Taiwanese government. In a different way, work on Bible translation in Bunun has resulted in the creation of a pan-dialectal religious vocabulary and led to the creation of a de facto standard variant of the language based on the Isbukun dialect. Today, a complex relationship exists between this written standard and other Bunun dialects.