Sentence embeddings encode information relating to the usage of idioms in a sentence. This paper reports a set of experiments that combine a probing methodology with input masking to analyse where in a sentence this idiomatic information is taken from, and what form it takes. Our results indicate that BERT's idiomatic key is primarily found within an idiomatic expression, but also draws on information from the surrounding context. In addition, BERT can distinguish between the disruption in a sentence caused by missing words and the incongruity caused by idiomatic usage.
This article examines the basis of Natural Language Understanding of transformer-based language models, such as BERT. It does this through a case study on idiom token identification. We use idiom token identification as a basis for our analysis because of the variety of information types that have previously been explored in the literature for this task, including topic, lexical, and syntactic features. This variety of relevant information types means that the task of idiom token identification enables us to explore the forms of linguistic information that a BERT language model captures and encodes in its representations. The core of this article presents three experiments. The first experiment analyses the effectiveness of BERT sentence embeddings for creating a general idiom token identification model, and the results indicate that the BERT sentence embeddings outperform Skip-Thought. In the second and third experiments we use the game theory concept of Shapley Values to rank the usefulness of individual idiomatic expressions for model training and use this ranking to analyse the type of information that the model finds useful. We find that a combination of idiom-intrinsic and topic-based properties contributes to an expression's usefulness in idiom token identification. Overall, our results indicate that BERT efficiently encodes a variety of information, ranging from topic to lexical and syntactic information. Based on these results we argue that, notwithstanding recent criticisms of language-model-based semantics, the ability of BERT to efficiently encode a variety of linguistic information types does represent a significant step forward in natural language understanding.
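To make the Shapley Value ranking mentioned above concrete, here is a minimal sketch of an exact Shapley computation over a tiny set of "players". It is an illustration only, not the procedure used in the article: the function name `shapley_values`, the player labels, and the toy value function are all assumptions. In a setting like the one described, each player would be an idiomatic expression and the value function would be the performance of a model trained on that subset of expressions.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: each player's marginal contribution,
    averaged over every possible ordering of the players."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal gain from adding p to the players that came before it.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical toy value function: each expression has its own weight,
# and expressions "a" and "b" give a bonus of 1.0 when both are present.
weights = {"a": 1.0, "b": 2.0, "c": 3.0}


def toy_value(subset):
    bonus = 1.0 if {"a", "b"} <= subset else 0.0
    return sum(weights[p] for p in subset) + bonus


ranking = sorted(shapley_values(list(weights), toy_value).items(),
                 key=lambda item: item[1], reverse=True)
```

Exact enumeration grows factorially with the number of players, so for more than a handful of expressions a sampled approximation over random orderings would be needed in practice.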
We analyse the time series of solar irradiance measurements using chaos theory. The False Nearest Neighbour (FNN) method, one of the most common methods of chaotic analysis, is used for the analysis. One year of data from the weather station located at Nanyang Technological University (NTU), Singapore, with a temporal resolution of 1 minute, is employed for the study. The data is sampled at 60-minute and 30-minute intervals for the analysis using the FNN method. Our experiments revealed that the optimum embedding dimension for solar irradiance is 4 for both sampling intervals. This indicates that a minimum of 4 dimensions is required to embed the data for the best representation of the input. This study on obtaining the embedding dimension of solar irradiance measurements will greatly assist in fixing the number of previous data points required for solar irradiance forecasting.
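The idea behind the False Nearest Neighbour method can be sketched as follows. This is a simplified illustration using only the distance-ratio criterion, not the study's implementation: the function names, the delay `tau`, and the threshold `rtol=15.0` are assumptions chosen for the example. A neighbour is counted as false when two points that are close in `dim` dimensions move far apart once one more delay coordinate is added.

```python
import numpy as np


def delay_embed(x, dim, tau=1):
    """Rows are delay vectors [x[i], x[i+tau], ..., x[i+(dim-1)*tau]]."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[k * tau : k * tau + n] for k in range(dim)])


def fnn_fraction(x, dim, tau=1, rtol=15.0):
    """Fraction of nearest neighbours in `dim` dimensions that are false,
    i.e. whose separation grows by more than `rtol` times the original
    distance when the (dim+1)-th delay coordinate is added."""
    x = np.asarray(x, dtype=float)
    n = len(x) - dim * tau                  # points that also exist in dim+1
    emb = delay_embed(x, dim, tau)[:n]
    extra = x[dim * tau : dim * tau + n]    # the additional coordinate
    false_count = 0
    for i in range(n):
        dist = np.linalg.norm(emb - emb[i], axis=1)
        dist[i] = np.inf                    # a point is not its own neighbour
        j = int(np.argmin(dist))
        # Guard against division by zero for duplicated delay vectors.
        ratio = abs(extra[i] - extra[j]) / max(dist[j], 1e-12)
        if ratio > rtol:
            false_count += 1
    return false_count / n
```

In use, one would compute `fnn_fraction(series, dim)` for `dim = 1, 2, 3, ...` and take the smallest dimension at which the fraction falls close to zero as the embedding dimension. The pairwise search here is quadratic in the series length, so long records would call for a tree-based neighbour search.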
Solar irradiance is the primary input for all solar energy generation systems. The amount of available solar radiation over time under the local weather conditions helps to decide the optimal location, technology and size of a solar energy project. We study the behaviour of incident solar irradiance on the earth's surface using weather sensors. In this paper, we propose a time-series-based technique to forecast solar irradiance values for short lead times of up to 15 minutes. Our experiments are conducted in a tropical region, viz. Singapore, which receives a large amount of solar irradiance throughout the year. We benchmark our method against two common forecasting techniques, namely the persistence model and the average model, and we obtain good prediction performance. We report a root mean square error of 147 W/m² for a lead time of 15 minutes.
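The two baseline techniques named above can be sketched in a few lines. This is a hedged illustration rather than the paper's benchmark code: the function name `baseline_rmse` and the `window` parameter are assumptions, and the "average model" is taken here to mean the mean of the most recent `window` observations, which may differ from the definition used in the paper.

```python
import numpy as np


def baseline_rmse(series, lead, window):
    """RMSE of two naive forecasts made `lead` steps ahead.

    persistence: the forecast equals the latest observation.
    average:     the forecast equals the mean of the last `window`
                 observations (an assumed definition).
    """
    series = np.asarray(series, dtype=float)
    # Forecast origins that have a full window behind and a target ahead.
    origins = np.arange(window - 1, len(series) - lead)
    targets = series[origins + lead]

    persistence = series[origins]
    average = np.array([series[t - window + 1 : t + 1].mean() for t in origins])

    def rmse(pred):
        return float(np.sqrt(np.mean((pred - targets) ** 2)))

    return {"persistence": rmse(persistence), "average": rmse(average)}
```

With 1-minute data, a 15-minute lead time would correspond to `lead=15`. Scoring both baselines on the same forecast origins keeps the comparison with any proposed method like for like.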