2021
DOI: 10.48550/arxiv.2101.00411
Preprint

Substructure Substitution: Structured Data Augmentation for NLP

Abstract: We study a family of data augmentation methods, substructure substitution (SUB²), for natural language processing (NLP) tasks. SUB² generates new examples by substituting substructures (e.g., subtrees or subsequences) with ones that have the same label, which can be applied to many structured NLP tasks such as part-of-speech tagging and parsing. For more general tasks (e.g., text classification) which do not have explicitly annotated substructures, we present variations of SUB² based on constituency parse trees…
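The abstract describes substitution for structured tasks such as part-of-speech tagging. The sketch below illustrates the subsequence variant under our own assumptions, not the authors' implementation: an example is a token list paired with a tag list, and a span may only be swapped for a span from the training data whose tag sequence is identical, so the original tags remain valid. The names `spans_by_label` and `sub2_sequence` and the fixed span `length` are hypothetical choices made for illustration.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (tokens, tags), one tag per token.
Example = Tuple[List[str], List[str]]


def spans_by_label(data: List[Example], length: int) -> Dict[Tuple[str, ...], List[Tuple[str, ...]]]:
    """Index every contiguous token span of `length` by its tag sequence."""
    index: Dict[Tuple[str, ...], List[Tuple[str, ...]]] = {}
    for tokens, tags in data:
        for i in range(len(tokens) - length + 1):
            key = tuple(tags[i:i + length])
            index.setdefault(key, []).append(tuple(tokens[i:i + length]))
    return index


def sub2_sequence(example: Example, data: List[Example], length: int = 2,
                  rng: Optional[random.Random] = None) -> Example:
    """Swap one span of `example` for a different span with the same tag sequence.

    Hypothetical sketch, not the authors' code. Returns an unchanged copy of
    `example` if no substitutable span exists.
    """
    rng = rng or random.Random()
    tokens, tags = example
    index = spans_by_label(data, length)
    starts = list(range(len(tokens) - length + 1))
    rng.shuffle(starts)  # try span positions in random order
    for i in starts:
        old_span = tuple(tokens[i:i + length])
        candidates = [s for s in index.get(tuple(tags[i:i + length]), []) if s != old_span]
        if candidates:
            new_span = rng.choice(candidates)
            # Tags are kept as-is because the new span carries the same labels.
            return tokens[:i] + list(new_span) + tokens[i + length:], list(tags)
    return list(tokens), list(tags)


data = [(["the", "cat", "sat"], ["DT", "NN", "VBD"]),
        (["a", "dog", "ran"], ["DT", "NN", "VBD"])]
print(sub2_sequence(data[0], data, rng=random.Random(0)))
```

Matching on the full tag sequence of the span is what keeps the augmented example correctly labeled without re-annotation; a real system would likely also vary span length and sample several augmentations per example.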

Cited by 4 publications (5 citation statements)
References 57 publications

“…This is known as structure augmentation (also known as substructure augmentation); in our case, this form of data augmentation divides the research data into two data tree structures (i.e., aerospace and aviation). This structural data augmentation allows us to perform comparative NLP tasks such as text parsing, textual classification, and comparative token analysis (Shi et al, 2021).…”
Section: Methods · mentioning
confidence: 99%

“…proposed a multi-task view of DA. SUB² (Shi et al, 2021) generates new examples by substituting substructures via constituency parse trees. Although these methods are easy to implement, they do not consider controlling data quality and diversity.…”
Section: Rule-based Methods · mentioning
confidence: 99%
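The citation statement above summarizes SUB² as substituting substructures via constituency parse trees. A minimal sketch of that idea follows, under assumptions of our own: trees are nested `(label, children)` tuples with token strings as leaves, any non-root constituent may be replaced by a constituent with the same label taken from a corpus of parsed sentences, and for text classification the sentence-level class label is simply carried over. The helper names (`subtrees`, `leaves`, `replace_at`, `sub2_tree`) are hypothetical, and the truncated abstract does not confirm that this is how the authors' tree-based variants work.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple, Union

# Hypothetical representation: a leaf is a token; an internal node is (label, children).
Tree = Union[str, Tuple[str, list]]
Path = Tuple[int, ...]


def subtrees(tree: Tree, path: Path = ()) -> Iterator[Tuple[Path, Tree]]:
    """Yield (path, node) for every internal node; a path lists child indices from the root."""
    if isinstance(tree, str):
        return
    yield path, tree
    for i, child in enumerate(tree[1]):
        yield from subtrees(child, path + (i,))


def leaves(tree: Tree) -> List[str]:
    """Return the tokens under `tree` from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [tok for child in tree[1] for tok in leaves(child)]


def replace_at(tree: Tree, path: Path, new: Tree) -> Tree:
    """Return a copy of `tree` with the node at `path` replaced; the input is not mutated."""
    if not path:
        return new
    label, children = tree
    children = list(children)
    children[path[0]] = replace_at(children[path[0]], path[1:], new)
    return (label, children)


def sub2_tree(tree: Tree, corpus: List[Tree], rng: Optional[random.Random] = None) -> Tree:
    """Swap one non-root constituent for a different one with the same label.

    Hypothetical sketch, not the authors' code. Returns `tree` itself if no swap is possible.
    """
    rng = rng or random.Random()
    donors: Dict[str, List[Tree]] = {}
    for other in corpus:
        for _, node in subtrees(other):
            donors.setdefault(node[0], []).append(node)
    targets = [(p, n) for p, n in subtrees(tree) if p]  # skip the root
    rng.shuffle(targets)
    for path, node in targets:
        candidates = [d for d in donors.get(node[0], []) if leaves(d) != leaves(node)]
        if candidates:
            return replace_at(tree, path, rng.choice(candidates))
    return tree


t1 = ("S", [("NP", [("DT", ["the"]), ("NN", ["cat"])]), ("VP", [("VBD", ["sat"])])])
t2 = ("S", [("NP", [("DT", ["a"]), ("NN", ["dog"])]), ("VP", [("VBD", ["ran"])])])
print(leaves(sub2_tree(t1, [t1, t2], rng=random.Random(0))))
```

Requiring matching constituent labels is what lets the method work on tasks without annotated substructures: a parser supplies the labels, and the swap keeps the sentence grammatical at the phrase level even though it may change its meaning.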