2021
DOI: 10.7191/jeslib.2021.1211

Plain Text & Character Encoding: A Primer for Data Curators

Abstract: Plain text data consists of a sequence of encoded characters or “code points” from a given standard such as the Unicode Standard. Some of the most common file formats for digital data used in eScience (CSV, XML, and JSON, for example) are built atop plain text standards. Plain text representations of digital data are often preferred because plain text formats are relatively stable, and they facilitate reuse and interoperability. Despite its ubiquity, plain text is not as plain as it may seem. The set of standa…
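
To make the abstract's notion of "encoded characters or code points" concrete, here is a minimal Python sketch. It is an illustration and is not taken from the paper: the sample string and variable names are assumptions chosen for the example. It shows that the same code points can yield different bytes under different encodings, and that decoding with the wrong encoding garbles the text.

```python
# Illustrative sketch only; the sample string is an assumption, not from the paper.
text = "café"

# Each character corresponds to one Unicode code point (an integer).
code_points = [ord(ch) for ch in text]

# The same code points serialize to different byte sequences per encoding.
utf8_bytes = text.encode("utf-8")      # 'é' becomes two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # 'é' becomes one byte: 0xE9

# Decoding UTF-8 bytes as Latin-1 succeeds but silently produces the wrong text.
misread = utf8_bytes.decode("latin-1")

print(code_points)
print(utf8_bytes, latin1_bytes)
print(misread)
```

A file therefore cannot be interpreted reliably from its bytes alone; the encoding has to be recorded or inferred.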

Cited by 1 publication
(1 citation statement)
“…Most of the writings I reviewed in preparation for this report focus on the basics: introducing key standards (e.g., ASCII, Unicode) and the complications of identifying and converting plain text formats. Plain Text & Character Encoding: A Primer for Data Curators by Seth Erickson (2021) provides an excellent introduction to character encoding as it relates to digital preservation. Dig a little deeper and one will inevitably find more ethically and culturally-grounded conversations surrounding the implications of diacritic handling, examining the "racial and Western-centric implications of considering certain characters to be 'illegal'" (Blewer, 2019).…”
Section: Introduction
confidence: 99%