A Multiview Recognition Method of Predefined Objects for Robot Assembly Using Deep Learning and Its Implementation on an FPGA

Lomas-Barrié, Víctor; Silva-Flores, Ricardo; Neme, Antonio; Peña‐Cabrera, Mario

doi:10.3390/electronics11050696

Cited by 4 publications

(3 citation statements)

References 26 publications

(33 reference statements)

“…A neural network, fuzzy ARTMAP, conducted the classification stage of the pieces, and the results were highly precise for all combinations. In a more recent application [12], it was applied in a technique to identify objects from several viewing perspectives. A condensed convolutional neural network model, inspired by LENET-5, was employed for the classification phase.…”

Section: Related Work (mentioning)

confidence: 99%

A New Method for Classifying Scenes for Simultaneous Localization and Mapping Using the Boundary Object Function Descriptor on RGB-D Points

Lomas-Barrie, Suarez-Espinoza, Hernandez-Chavez et al. 2023, Sensors

Scene classification in autonomous navigation is a highly complex task due to variations, such as light conditions and dynamic objects, in the inspected scenes; it is also a challenge for small-factor computers to run modern and highly demanding algorithms. In this contribution, we introduce a novel method for classifying scenes in simultaneous localization and mapping (SLAM) using the boundary object function (BOF) descriptor on RGB-D points. Our method aims to reduce complexity with almost no performance cost. All the BOF-based descriptors from each object in a scene are combined to define the scene class. Instead of traditional image classification methods such as ORB or SIFT, we use the BOF descriptor to classify scenes. Through an RGB-D camera, we capture points and adjust them onto layers that are perpendicular to the camera plane. From each plane, we extract the boundaries of objects such as furniture, ceilings, walls, or doors. The extracted features compose a bag of visual words classified by a support vector machine. The proposed method achieves almost the same accuracy in scene classification as a SIFT-based algorithm and is 2.38× faster. The experimental results demonstrate the effectiveness of the proposed method in terms of accuracy and robustness for the 7-Scenes and SUNRGBD datasets.
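The bag-of-visual-words step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a small, already-learned codebook and made-up 2-D descriptors; these are illustrative stand-ins, not the BOF descriptor itself, and the function names `nearest_word` and `bovw_histogram` are hypothetical. The support vector machine that would consume the histogram is omitted.

```python
def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor
    (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(descriptor, codebook[i]))


def bovw_histogram(descriptors, codebook):
    """Quantize each descriptor to its nearest visual word and return the
    normalized word-frequency histogram that describes the scene."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Hypothetical codebook of three visual words and four scene descriptors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
scene_descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]

hist = bovw_histogram(scene_descriptors, codebook)
print(hist)  # [0.25, 0.5, 0.25]
```

In a full pipeline, one such histogram per scene would serve as the feature vector passed to the classifier.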

“…The proposed RTB-MAXP engine and CMB-MAXP engine were implemented for employment in an FPGA-based CNN accelerator. The target model for the CNN was YOLOv4-CSP-S-Leaky designed for object detection [8]. It consists of 108 layers, including 3 × 3 convolution layers, 1 × 1 convolution layers, residual addition layers, concatenation layers, max-pooling layers, and up-sampling layers.…”

Section: Implementations (mentioning)

confidence: 99%

“…To minimize computational costs and simplify the model, reducing the size of these feature maps is necessary. The max-pooling technique is employed to achieve this while preserving spatial invariance of distinct features within the feature maps [8,9]. Typically, a window of size 2 × 2 is used in max-pooling operations, ensuring spatial overlap of the maximum values and sampling values along the horizontal and vertical axes every two positions [9].…”

Section: Introduction (mentioning)

confidence: 99%

Efficient Two-Stage Max-Pooling Engines for an FPGA-Based Convolutional Neural Network

Hong, Choi, Joo 2023, Electronics

This paper proposes two max-pooling engines, named the RTB-MAXP engine and the CMB-MAXP engine, with a scalable window size parameter for FPGA-based convolutional neural network (CNN) implementation. The max-pooling operation for the CNN can be decomposed into two stages, i.e., a horizontal axis max-pooling operation and a vertical axis max-pooling operation. These two one-dimensional max-pooling operations are performed by tracking the rank of the values within the window in the RTB-MAXP engine and cascading the maximum operations of the values in the CMB-MAXP engine. Both the RTB-MAXP engine and the CMB-MAXP engine were implemented using VHSIC hardware description language (VHDL) and verified by simulations. The implementation results demonstrate that the 16 CMB-MAXP engines achieved a remarkable throughput of about 9 GBPS (gigabytes per second) while utilizing only about 3% of the available resources on the Xilinx Virtex UltraScale+ FPGA XCVU9P. On the other hand, the 16 RTB-MAXP engines exhibited somewhat lower throughput and resource utilization, although they did offer a slightly better latency when compared to the CMB-MAXP engines. In the comparison with existing techniques, the CMB-MAXP engine exhibited comparable implementation results in terms of the resource utilization and maximum operating frequency. It is crucial to note that only the proposed engines provide the features of runtime window scalability and boundary padding capability, which are essential requirements for CNN accelerators. The proposed max-pooling engines were employed and tested in our CNN accelerator targeting the CNN model YOLOv4-CSP-S-Leaky for object detection.
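The two-stage decomposition described in this abstract (a horizontal one-dimensional max-pooling pass followed by a vertical one) can be checked in software with a minimal sketch. This is a plain-Python illustration under stated assumptions, not the RTB-MAXP or CMB-MAXP hardware design: the function names are hypothetical, the feature map is made up, and boundary padding is not handled.

```python
def max_pool_1d(values, window, stride):
    """Slide a window along one axis and keep the maximum at each position."""
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, stride)]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def max_pool_two_stage(fmap, window=2, stride=2):
    """2-D max pooling as two 1-D passes: horizontal first, then vertical."""
    horizontal = [max_pool_1d(row, window, stride) for row in fmap]
    vertical = [max_pool_1d(col, window, stride) for col in transpose(horizontal)]
    return transpose(vertical)


def max_pool_direct(fmap, window=2, stride=2):
    """Reference 2-D max pooling computed over each full window at once."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [max(fmap[r + dr][c + dc] for dr in range(window) for dc in range(window))
         for c in range(0, cols - window + 1, stride)]
        for r in range(0, rows - window + 1, stride)
    ]


# Made-up 4 x 4 feature map for illustration.
feature_map = [
    [1, 3, 2, 0],
    [5, 4, 1, 7],
    [0, 2, 9, 6],
    [8, 1, 3, 4],
]

pooled = max_pool_two_stage(feature_map)
assert pooled == max_pool_direct(feature_map)
print(pooled)  # [[5, 7], [8, 9]]
```

The two passes give the same result as the direct computation because the maximum over a rectangular window equals the maximum of the row-wise maxima within that window.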

Robot Path Recognition and Target Tracking System Based on Computer Vision

Tang 2023, Lecture Notes on Data Engineering and Communications Technologies
