Agreement statistics play an important role in the evaluation of coding schemes for discourse and dialogue. Unfortunately, there is a lack of understanding regarding which agreement measures are appropriate and how their results should be interpreted. In this article we describe the role of agreement measures and argue that only chance-corrected measures that assume a common distribution of labels for all coders are suitable for measuring agreement in reliability studies. We then provide recommendations for how reliability should be inferred from the results of agreement statistics.

Since Jean Carletta (1996) exposed computational linguists to the desirability of using chance-corrected agreement statistics to infer the reliability of data generated by applying coding schemes, their use has become generally accepted within the field. However, misunderstandings concerning agreement statistics and the meaning of reliability still prevail.

Investigation of new dialogue types and genres has revealed phenomena that are ill suited to annotation by current methods, and has also led to annotation schemes that are qualitatively different from those commonly used in dialogue analysis. As annotation schemes become more sophisticated, previously prescribed practices for evaluating them become less applicable. To compensate, we need a greater understanding of reliability statistics and how they should be interpreted. In this article we discuss the purpose of reliability testing, address certain misunderstandings, and make recommendations regarding the way in which coding schemes should be evaluated.
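The distinction between measures that assume a common label distribution and those that do not can be illustrated with a small Python sketch. It compares Scott's pi, which pools both coders' labels into one distribution when estimating chance agreement, with Cohen's kappa, which estimates a separate distribution for each coder. This is a minimal two-coder illustration, not the article's own procedure. The function names and the toy labels `X` and `Y` are hypothetical.

```python
from collections import Counter


def observed_agreement(a, b):
    """Proportion of items to which both coders assigned the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must label the same non-empty list of items")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def _chance_corrected(ao, ae):
    """Standard correction: (observed - expected) / (1 - expected)."""
    if ae == 1:
        raise ValueError("expected agreement is 1; the statistic is undefined")
    return (ao - ae) / (1 - ae)


def scotts_pi(a, b):
    """Expected agreement from ONE label distribution pooled over both coders."""
    n = len(a)
    ao = observed_agreement(a, b)
    pooled = Counter(a) + Counter(b)  # label counts over all 2n judgements
    ae = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return _chance_corrected(ao, ae)


def cohens_kappa(a, b):
    """Expected agreement from a SEPARATE label distribution for each coder."""
    n = len(a)
    ao = observed_agreement(a, b)
    ca, cb = Counter(a), Counter(b)  # Counter returns 0 for unseen labels
    ae = sum((ca[k] / n) * (cb[k] / n) for k in set(ca) | set(cb))
    return _chance_corrected(ao, ae)


# Hypothetical toy data: coder A uses X six times, coder B only four times.
coder_a = ["X"] * 6 + ["Y"] * 4
coder_b = ["X"] * 4 + ["Y"] * 6

print(round(observed_agreement(coder_a, coder_b), 3))  # 0.8
print(round(scotts_pi(coder_a, coder_b), 3))           # 0.6
print(round(cohens_kappa(coder_a, coder_b), 3))        # 0.615
```

Pooling the labels can only raise the expected agreement, so pi is never higher than kappa on the same data. In this example the coders' label distributions differ (6/4 versus 4/6). As a result, kappa credits them with slightly more agreement than pi does, even though their individual biases are what drive the difference.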
Agreement, Reliability, and Coding Schemes

After developing schemes for annotating discourse or dialogue, it is necessary to assess their suitability for the purpose for which they are designed. Although no statistical test can determine whether any form of annotation is worthwhile or how applications will benefit from it, we at least need to show that coders are capable of performing the annotation. This often means assessing reliability on the basis of agreement between annotators applying the scheme. Agreement measures are discussed in detail in section 2.

Much of the confusion regarding which agreement measures to apply and how their results should be interpreted stems from a lack of understanding of what it means to
During Paleogene times up to 15 000 ft (4570 m) of clastic sediment was deposited in the Faeroe Basin, north of the Shetland Islands. A sequence stratigraphic study has shown that Paleogene deposition in the Faeroe Basin was cyclic, with prominent basinward and landward shifts in sedimentation. Correlation of major unconformity surfaces allowed the section to be subdivided into genetically related packages.

The sequence stratigraphic study used approximately 5000 km of seismic data with a line density of approximately 10 × 20 km. Available well control was integrated into the study by means of synthetic seismograms. Limited palaeontological control, based largely on dinocysts and radiolaria, allowed the identification of 11 Paleocene/Eocene bioevents.

The section was subdivided into nine Paleocene and six Eocene sequences, each separated by Type 1 unconformities. Four of the Paleocene packages and one of the Eocene packages showed evidence of multiple Type 1 unconformities, and these are described as sequence sets.

Sequence development has been related to the tectonic subsidence history of the basin. Early in Paleocene times, rapid subsidence resulted in the deposition of thick sequences with distinct shelf, slope and basinal systems: nine sequences were deposited, with a combined maximum thickness of 12 000 ft (3660 m). The following period, late Paleocene to early Eocene, was marked by slower subsidence, during which seven thin sequences with ramp systems were deposited, with a maximum thickness of 2400 ft (730 m). More rapid subsidence during the late Eocene resulted in five sequences with distinct shelf, slope and basinal systems, up to 3500 ft (1070 m) thick. The periods of slower subsidence in the Faeroe Basin may have occurred in response to active rifting in adjacent basins along the Atlantic margin. The distinct basinal systems that developed during times of more rapid subsidence were more likely to develop sand-prone basin-floor deposits.