2018
DOI: 10.48550/arxiv.1809.03056
Preprint

SHOMA at Parseme Shared Task on Automatic Identification of VMWEs: Neural Multiword Expression Tagging with High Generalisation

Shiva Taslimipoor,
Omid Rohanian

Abstract: This paper presents a language-independent deep learning architecture adapted to the task of multiword expression (MWE) identification. We employ a neural architecture comprising convolutional and recurrent layers, with an optional CRF layer at the top. Owing to its use of pre-trained Wikipedia word embeddings, the system participated in the open track of the Parseme shared task on automatic identification of verbal MWEs. It outperformed all participating systems in both open and closed tracks …
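The abstract names the components but not how they fit together, so a minimal sketch may help: a convolutional layer over frozen pre-trained word embeddings feeding a bidirectional recurrent layer, with per-token tag scores on top. This is an illustrative PyTorch reconstruction, not the authors' implementation; the optional CRF layer is replaced by a plain token-wise projection, and the class name, layer sizes, and other hyperparameters are assumptions.

import torch
import torch.nn as nn

class ConvRecurrentTagger(nn.Module):
    """Illustrative CNN + BiGRU tagger over frozen pre-trained embeddings."""
    def __init__(self, pretrained_embeddings: torch.Tensor, n_tags: int,
                 conv_channels: int = 128, kernel_size: int = 3, rnn_hidden: int = 100):
        super().__init__()
        # Frozen pre-trained vectors (e.g. Wikipedia-trained embeddings).
        self.embed = nn.Embedding.from_pretrained(pretrained_embeddings, freeze=True)
        emb_dim = pretrained_embeddings.size(1)
        # 1-D convolution over the token sequence; padding keeps the length unchanged.
        self.conv = nn.Conv1d(emb_dim, conv_channels, kernel_size, padding=kernel_size // 2)
        # Bidirectional recurrent layer over the convolved representations.
        self.rnn = nn.GRU(conv_channels, rnn_hidden, batch_first=True, bidirectional=True)
        # Per-token tag scores; a CRF layer could sit here instead of a plain projection.
        self.out = nn.Linear(2 * rnn_hidden, n_tags)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)                                      # (batch, seq, emb)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)   # (batch, seq, channels)
        x, _ = self.rnn(x)                                             # (batch, seq, 2*hidden)
        return self.out(x)                                             # (batch, seq, n_tags)

# Usage sketch: random vectors stand in for real pre-trained embeddings.
vocab_size, emb_dim, n_tags = 1000, 300, 5
model = ConvRecurrentTagger(torch.randn(vocab_size, emb_dim), n_tags=n_tags)
scores = model(torch.randint(0, vocab_size, (2, 12)))  # shape: (2, 12, 5)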

Cited by 3 publications (6 citation statements). References 15 publications.
“…Such cases are rare in the corpora, and as such do not greatly impact the data. One paper (Walsh et al., 2022) attempts to address this problem of overlapping or shared-token expressions by modifying the BIO-style encoding, while another paper (Taslimipoor and Rohanian, 2018) appends multiple categories separated by a semicolon, similar to the CUPT-style encoding.…”
Section: Corpus and Splits (mentioning)
confidence: 99%
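The semicolon-separated encoding mentioned in the statement above is easy to make concrete. Below is a minimal Python sketch, assuming the published .cupt convention in which the PARSEME:MWE column carries values such as "*" (token belongs to no MWE), "1:VID" (first token of expression 1, category VID), "1" (a later token of expression 1), and "1;2:LVC.full" for a token shared by overlapping expressions; the helper name parse_mwe_tag is ours.

def parse_mwe_tag(tag: str):
    """Return (mwe_id, category-or-None) pairs for one token's PARSEME:MWE value."""
    if tag == "*":
        return []  # token is not part of any annotated MWE
    pairs = []
    for item in tag.split(";"):          # semicolons separate overlapping expressions
        mwe_id, _, category = item.partition(":")
        pairs.append((int(mwe_id), category or None))
    return pairs

print(parse_mwe_tag("1:VID;2"))  # [(1, 'VID'), (2, None)]
print(parse_mwe_tag("*"))        # []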
“…Analyses tended to take one of two forms: example-based analysis reporting individual instances where the model performed better or worse than usual (Klyueva et al., 2017; Walsh et al., 2022), and automatic metrics aggregated across particular properties or phenomena. Among the focused metrics, some papers pay special attention to discontinuities (Björne and Salakoski, 2016; Moreau et al., 2018; Berk et al., 2018a; Rohanian et al., 2019) and seen/unseen MWEs (Maldonado et al., 2017; Zampieri et al., 2018; Taslimipoor and Rohanian, 2018). Some studies analyse the model's features and modules via ablation experiments (Scherbakov et al., 2016; Tang et al., 2016; Stodden et al., 2018; Pasquer et al., 2020a).…”
Section: Error Analysis (mentioning)
confidence: 99%
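The seen/unseen breakdown mentioned in the statement above can be approximated in a few lines. This is a simplified sketch, not the official PARSEME evaluation: an MWE is represented as a (sentence_id, lemma_tuple) pair, "seen" means its lemma tuple occurs among the training MWEs, and exact-match F1 is computed per partition; all names here are illustrative.

def f1(gold: set, pred: set) -> float:
    """Exact-match F1 between two sets of MWE annotations."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def seen_unseen_f1(train_lemma_sets: set, gold: set, pred: set) -> dict:
    """Score predictions separately on MWEs whose lemma tuples were seen in training.

    gold / pred hold (sentence_id, lemma_tuple) pairs; train_lemma_sets holds lemma tuples.
    """
    def split(mwes):
        seen = {m for m in mwes if m[1] in train_lemma_sets}
        return seen, mwes - seen
    gold_seen, gold_unseen = split(gold)
    pred_seen, pred_unseen = split(pred)
    return {"seen": f1(gold_seen, pred_seen), "unseen": f1(gold_unseen, pred_unseen)}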
“…In recent years, deep learning has demonstrated remarkable success in sequence tagging tasks, including MWE identification (Ramisch et al., 2018; Taslimipoor and Rohanian, 2018). RNNs and ConvNets have shown significant progress in this area.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, such techniques have proven effective only for very specific MWE classes [5]. In addition, it should be noted that the models that performed best in the last two editions of the PARSEME shared task, SHOMA [24] and MTLB-STRUCT [23], both based on deep learning, also achieved the best scores on unseen MWEs [17,18].…”
Section: Introduction and State of the Art (mentioning)
confidence: 99%