2023
DOI: 10.1109/tpami.2023.3284080

C2F-TCN: A Framework for Semi- and Fully-Supervised Temporal Action Segmentation

Abstract: Temporal action segmentation tags action labels for every frame in an input untrimmed video containing multiple actions in a sequence. For this task, we propose an encoder-decoder style architecture named C2F-TCN featuring a "coarse-to-fine" ensemble of decoder outputs. The C2F-TCN framework is enhanced with a novel, model-agnostic temporal feature augmentation strategy based on the computationally inexpensive stochastic max-pooling of segments. It produces more a…
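
The abstract only names the augmentation; as a purely illustrative aid, here is a minimal PyTorch sketch of what a segment-wise stochastic max-pooling of frame features could look like. The function name, the (C, T) feature shape, and the random-boundary sampling scheme are assumptions for illustration, not the paper's exact procedure.

```python
import torch

def stochastic_segment_maxpool(feats: torch.Tensor, num_segments: int) -> torch.Tensor:
    """Hypothetical sketch: max-pool frame features over randomly drawn
    temporal segments, giving a different down-sampled view per call.

    feats: (C, T) frame-wise features, assuming T >= num_segments.
    Returns: (C, num_segments).
    """
    C, T = feats.shape
    # Draw (num_segments - 1) distinct interior cut points and sort them.
    cuts = (torch.randperm(T - 1)[: num_segments - 1] + 1).sort().values
    bounds = torch.cat([torch.tensor([0]), cuts, torch.tensor([T])])
    # Max-pool each [start, end) segment along the temporal axis.
    pooled = [feats[:, s:e].max(dim=1).values
              for s, e in zip(bounds[:-1], bounds[1:])]
    return torch.stack(pooled, dim=1)
```

Because the boundaries are re-drawn on every call, two invocations on the same video yield two distinct pooled views, which is the kind of cheap, model-agnostic temporal augmentation the abstract describes.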

Cited by 7 publications (3 citation statements)
References 80 publications

“…Essentially, the output of each BottleNeck must be preserved and utilized as input for the following BottleNeck. Eventually, all corresponding feature channels are fused, resulting in dimensions of B × (N + 2)C/2 × H × W. After the input of the C2F module undergoes processing via a CBS, the output dimensions are transformed into B, C, H, and W, where B denotes the number of images, C indicates the number of channels, and H and W represent the height and width of the feature map, respectively [20]. The detailed structure is depicted in Figure 5.…”
Section: YOLOv8 Model
confidence: 99%
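
The quoted description fixes the channel bookkeeping of the C2F block but not its internals. The PyTorch sketch below reproduces that bookkeeping (split into two C/2 branches, preserve every BottleNeck output, concatenate (N + 2) · C/2 channels, fuse back to C with a CBS); the CBS and BottleNeck internals (kernel sizes, SiLU activation, residual add) are assumptions, not taken from the cited paper.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU, the 'CBS' unit named in the quote."""
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class BottleNeck(nn.Module):
    """Assumed form: two 3x3 CBS blocks with a residual connection."""
    def __init__(self, c):
        super().__init__()
        self.block = nn.Sequential(CBS(c, c, 3), CBS(c, c, 3))

    def forward(self, x):
        return x + self.block(x)

class C2F(nn.Module):
    """Sketch of a C2F block consistent with the quoted dimensions."""
    def __init__(self, c, n=2):  # c must be even
        super().__init__()
        self.stem = CBS(c, c)
        self.bottlenecks = nn.ModuleList(BottleNeck(c // 2) for _ in range(n))
        self.fuse = CBS((n + 2) * (c // 2), c)  # back to B x C x H x W

    def forward(self, x):
        a, b = self.stem(x).chunk(2, dim=1)   # two C/2-channel branches
        outs = [a, b]
        for m in self.bottlenecks:
            outs.append(m(outs[-1]))          # each output preserved and reused
        # (n + 2) tensors of C/2 channels -> B x (N + 2)C/2 x H x W, then fuse.
        return self.fuse(torch.cat(outs, dim=1))
```

For example, C2F(c=64, n=2) concatenates four 32-channel maps into a 128-channel tensor before the final CBS restores the 64-channel output the quote describes.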
“…Recently, semi-supervised approaches [16]- [18] for this task have attracted increasing attention, with a small percentage of labelled videos in the training set. Iterative-Contrast-Classify (ICC) [16] is the first attempt to explore semi-supervised learning for human action segmentation, which consists of two steps, i.e., unsupervised representation learning based on contrastive learning [19] (Fig.…”
Section: Introduction
confidence: 99%
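
The quote outlines ICC only at the level of its two alternating steps. Below is a schematic Python sketch of that loop, under the assumption that a shared backbone feeds both a contrastive head and a frame classifier; all names and the training structure are illustrative, not the authors' code.

```python
import torch

def icc_style_loop(backbone, classifier, unlabeled_loader, labeled_loader,
                   contrastive_loss, optimizer, num_rounds=3):
    """Hypothetical outline of an Iterative-Contrast-Classify style scheme:
    alternate unsupervised contrastive representation learning on all videos
    with supervised frame classification on the small labeled subset."""
    ce = torch.nn.CrossEntropyLoss()
    for _ in range(num_rounds):
        # Step 1: unsupervised representation learning (contrastive).
        for feats, aug_feats in unlabeled_loader:   # two views per video
            z1, z2 = backbone(feats), backbone(aug_feats)
            loss = contrastive_loss(z1, z2)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # Step 2: frame-wise classification on the labeled fraction.
        for feats, frame_labels in labeled_loader:
            logits = classifier(backbone(feats))    # (T, num_classes)
            loss = ce(logits, frame_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```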