Abstract. The majority of taxonomic descriptions are currently in print format. Most digital descriptions are in formats intended for human readers, such as DOC, HTML, or PDF. These formats do not convey the rich semantics of taxonomic descriptions in a form suitable for computer-aided processing. Newer digital formats, such as XML and RDF, accommodate semantic annotations that allow a computer to process the rich semantics on humans' behalf. This opens up opportunities for a wide range of innovative uses of taxonomic descriptions, including more precise and flexible searching; integration of morphological, genomic, georeference, and other information; automatic generation of taxonomic keys; and knowledge mining and visualization of taxonomic data. This paper reports our experience with the development of an automated semantic markup system named MARTT and discusses the challenging issues involved. To address these issues, a number of utilities were implemented to make MARTT a more operable system. The utilities can be used to speed up the preparation of training examples for MARTT, to facilitate the creation of more comprehensive annotation schemas, and to predict system performance on a new collection of descriptions. MARTT has been tested on several plant and algal taxonomic publications, including Flora of China, Flora of North America, and Flora of North Central Texas.

Key words. Digital formats, morphological descriptions, semantic markup, supervised machine learning, system evaluation, taxonomic descriptions, unsupervised machine learning, XML.

Taxonomic descriptions of living organisms are a major information resource used by systematists and evolutionary biologists.
The majority of such information is in a print or digital format for human readers.

Ongoing and planned digitization projects, such as those initiated by the Global Biodiversity Information Facility (GBIF, 2007) and the Biodiversity Heritage Library (BHL, 2007), will likely increase the volume of taxonomic descriptions in legacy formats (e.g., DOC, HTML, or PDF). These documents will have to be converted to a newer digital format such as XML or RDF to allow for any innovative uses beyond keyword-based search. Given the scale of the problem, automated means of conversion must be sought.

Large volumes of taxonomic descriptions, print or digital, have been produced over the past two hundred years. While descriptions created by trained taxonomists are of high quality and generally provide consistent information, there is no well-defined, widely accepted standard regulating the content of a description. A manual comparison of the descriptions of five plant species found in six well-known floras revealed surprisingly large variations in description content and style (Lydon et al., 2003). Lydon and colleagues found that only 9% of the information was exactly the same across the six sources, over 55% of the information came from a single source, and around 1% of the information contradicted information from another source. Besides the large variation,...