A deep learning‐based image captioning method to automatically generate comprehensive explanations of bridge damage

Chun, Pang‐jo; Yamane, Tatsuro; Maemura, Yu

doi:10.1111/mice.12793

Cited by 76 publications

(42 citation statements)

References 69 publications

“…An example of a study that combines visual and language data is the image captioning model developed for bridge damages by Chun et al. (2022). The present study compared and analyzed the differences between the performances of single‐modal and multimodal prompts.…”

Section: Results and Analysis (mentioning)

confidence: 99%

“…One of the recent solutions to this problem is to deploy multimodal data (both images and texts) or derive one type of data from the other, instead of choosing either one of them. An example of a study that combines visual and language data is the image captioning model developed for bridge damages by Chun et al (2022). The present study compared and analyzed the differences between the performances of single-modal and multimodal prompts.…”

Section: Single-modal Prompts Versus Multimodal Prompts (mentioning)

confidence: 99%

Prompt engineering for zero‐shot and few‐shot defect detection and classification using a visual‐language pretrained model

Yong, Jeon, Gil, et al. 2022. Computer aided Civil Eng

Zero‐shot learning, applied with vision‐language pretrained (VLP) models, is expected to be an alternative to existing deep learning models for defect detection, under insufficient dataset. However, VLP models, including contrastive language‐image pretraining (CLIP), showed fluctuated performance on prompts (inputs), resulting in research on prompt engineering—optimization of prompts for improving performance. Therefore, this study aims to identify the features of a prompt that can yield the best performance in classifying and detecting building defects using the zero‐shot and few‐shot capabilities of CLIP. The results reveal the following: (1) domain‐specific definitions are better than general definitions and images; (2) a complete sentence is better than a set of core terms; and (3) multimodal information is better than single‐modal information. The resulting detection performance using the proposed prompting method outperformed that of existing supervised models.
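As a rough illustration of the zero-shot setup this abstract describes, the sketch below scores one image embedding against several text-prompt embeddings by cosine similarity and picks the best-matching prompt. The vectors are made-up stand-ins for CLIP embeddings, and the function names, the temperature value, and the prompt strings are assumptions for illustration only; none of this is the authors' code or data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_vec, prompt_vecs, temperature=100.0):
    """Pick the prompt whose embedding is closest to the image embedding.

    `temperature` scales similarities before the softmax; the value 100.0
    is an assumed setting, not one taken from the paper.
    """
    labels = list(prompt_vecs)
    sims = [cosine(image_vec, prompt_vecs[label]) for label in labels]
    probs = softmax([temperature * s for s in sims])
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], dict(zip(labels, probs))


# Toy embeddings (hypothetical): in practice these would come from a
# vision-language model's image and text encoders.
image_vec = [0.9, 0.1, 0.2]
prompt_vecs = {
    "a photo of a crack in a concrete wall": [1.0, 0.0, 0.1],
    "a photo of water leakage on a ceiling": [0.0, 1.0, 0.3],
}

label, probs = zero_shot_classify(image_vec, prompt_vecs)
print(label)
```

Comparing prompt styles, as the study does, would amount to swapping the keys and embeddings in `prompt_vecs` (for example, core terms versus full sentences) and measuring how often the predicted label is correct.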

“…An approach to obtain output that takes into account the relationship between member and damage names has been studied using an image captioning model [10]. In that study, a model is developed to output sentences regarding the damage and the member in which the damage occurs from bridge images, making it possible to obtain information that includes the relationships between words.…”

Section: Introduction (mentioning)

confidence: 99%

Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering

Yamane, Chun, Dang, et al. 2023. Preprint

In this paper, a bridge member damage cause estimation framework is proposed by calculating the image position using Structure from Motion (SfM) and acquiring its information via Visual Question Answering (VQA). For this, a VQA model was developed that uses bridge images for dataset creation and outputs the damage or member name and its existence based on the images and questions. In the developed model, the correct answer rate for questions requiring the member's name and the damage's name were 67.4% and 68.9%, respectively. The correct answer rate for questions requiring a yes/no answer was 99.1%. Based on the developed model, a damage cause estimation method was proposed. In the proposed method, the damage causes are narrowed down by inputting new questions to the VQA model, which are determined based on the surrounding images obtained via SfM and the results of the VQA model. Subsequently, the proposed method was then applied to an actual bridge and shown to be capable of determining damage and estimating its cause. The proposed method could be used to prevent damage causes from being overlooked, and practitioners could determine inspection focus areas, which could contribute to the improvement of maintenance techniques. In the future, it is expected to contribute to infrastructure diagnosis automation.

“…In the structural dynamics field, ANN has been successfully adopted in the response prediction and performance evaluation from the vibration signals (Perez‐Ramirez et al., 2019; Y. Xu et al., 2021; Z. Xu et al., 2022). Although these networks have the inherent ability to simulate complex systems with high fidelity (Chun et al., 2022; Hornik et al., 1989; Kuok & Yuen, 2021), there exist challenges in efficient and accurate training deep networks for long‐time dependent problems (e.g., nonlinear flutter behavior; T. Wu & Kareem, 2014b). To address this issue, the long short‐term memory (LSTM) cell (Hochreiter & Schmidhuber, 1997) has been successfully employed in a recurrent neural network (RNN) architecture to simulate nonlinear unsteady aerodynamics (K. Li et al., 2019; T. Li et al., 2020; W. Li et al., 2020).…”

Section: Introduction (mentioning)

confidence: 99%

Modeling nonlinear flutter behavior of long‐span bridges using knowledge‐enhanced long short‐term memory network

2023. Computer aided Civil Eng

The nonlinear characteristics of bridge aerodynamics preclude a closed-form solution of limit-cycle oscillation (LCO) amplitude and frequency in the postflutter stage. To address this issue, a long short-term memory (LSTM) network is utilized as the reduced-order modeling of nonlinear aeroelastic forces on the bridge deck section, and it is repeatedly employed to generate force inputs at spanwise nodes of a three-dimensional (3D) finite element model (FEM) of the long-span bridge (using spatial beam elements). All LSTM networks are dynamically coupled through FEM, and the 3D nonlinear flutter response is accordingly obtained. To improve the simulation accuracy and reduce the required training data of the standard LSTM network, both general knowledge (motivated by the gating mechanism and mathematical models for information processing) and domain knowledge (resulting from the basic understanding of bridge aerodynamics) are leveraged to, respectively, customize the LSTM cell and network architecture. In addition, a fast-training algorithm effectively combining the linear convergence of stochastic gradient descent and superlinear convergence of modified Broyden-Fletcher-Goldfarb-Shanno is developed to improve the training efficiency of the obtained knowledge-enhanced LSTM network. To further advance the computational efficiency of the coupled LSTM-FEM nonlinear flutter analysis, the convolution-based numerical integration is adopted in the finite element modeling of long-span bridge dynamics. A case study of a long-span suspension bridge under strong winds demonstrates the proposed 3D nonlinear flutter analysis presents high simulation efficiency and accuracy and can be utilized to effectively obtain the nonlinear LCO characteristics in a wide range of post-flutter wind speeds.
