This paper discusses the basic design of the encoding scheme described by the Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange (TEI document number TEI P3, hereafter simply P3 or the Guidelines). It first reviews the basic design goals of the TEI project and their development during the course of the project. Next, it outlines some basic notions relevant for the design of any markup language and uses those notions to describe the basic structure of the TEI encoding scheme. It also describes briefly the "core" tag set defined in chapter 6 of P3, and the "default text structure" defined in chapter 7 of that work. The final section of the paper attempts an evaluation of P3 in the light of its original design goals, and outlines areas in which further work is still needed.
The British National Corpus (BNC) has been a major influence on the construction of language corpora during the last decade, if only as a major reference point. This corpus may be seen as the culmination of a research tradition going back to the one-million word Brown corpus of 1964, but its constitution and its industrial-scale production techniques look forward to a new world in which language-focussed engineering and software development are at the heart of the information society instead of lurking on its academic fringes. This paper attempts to review the design and management issues and decisions taken during the construction of the BNC and to suggest what lessons have been learned over the last five years about how such corpus building exercises can most usefully be extended into the new century.
The Text Encoding Initiative was born into quite a different world from that of today. In 1987, there was no such thing as the World Wide Web, and construction of the tunnel beneath the English Channel had only just begun. A major political power called the Union of Soviet Socialist Republics still existed, while in the UK, Margaret Thatcher's government had just been reelected for a third time, and in the US the Senate rejected for the first (and so far only) time a presidential nomination to the Supreme Court. In academic life, it was still (just about) possible to finance an undergraduate degree on the basis of government grants. A typical "home computer" cost about 1,500 pounds in the UK, had an Intel 80286 processor and up to 640 Kb of memory, with maybe up to 50 Mb of storage on its internal hard disk, and probably ran some version of Microsoft's ubiquitous MS-DOS, unless of course it was a Macintosh. New machines were beginning to appear on the market, some of them with nearly enough memory and processing power to run Microsoft's new Windows operating system, or IBM's optimistically named "OS/2," also launched in this year. And meanwhile in another part of the forest Steve Jobs was busy imagining the NeXT computer, which would run something like Unix, but with a Windowing interface. However, any serious computing would still be done on your departmental minicomputer (perhaps a VAX or a PDP) or your institutional "mainframe," as the massive energy-hungry arrays of transistors and magnetic storage systems sold by such companies as IBM, Univac, Burroughs, ICL, or Control Data were known. At the same time, much of the work done on those massive machines looks quite familiar today. The process of digitization of the office environment had already begun in some scientific disciplines with software such as TeX, Scribe, or tRoff becoming dominant in