Clownfish: Edge and Cloud Symbiosis for Video Stream Analytics

Nigade, Vinod; Wang, Lin; Bal, Henri E.

doi:10.1109/sec50012.2020.00012

Cited by 17 publications

(6 citation statements)

References 42 publications


“…Like CloudSeg, DDS [4] similarly also sends the video at a low resolution, but then requests additional parts of frames separately when the DNN has low confidence. Clownfish [27] extracts the background content of the video frames, and separately sends only the objects to reduce the amount of data. Reducto [6] and SmartFilter [24] use a set of pixel-level operations to filter out irrelevant frames.…”

Section: Related Work (mentioning)

confidence: 99%

VISTA: Fast and Efficient Traffic Surveillance by Tile Sampling

Chaudhary, Taneja, Singh, et al. 2023

Preprint


With the increasing number of vehicles in modern cities, traffic surveillance via roadside cameras has become an important application. Cities have installed thousands of cameras on roads, which send video feeds to a cloud center that runs computer vision algorithms. This requires high bandwidth. Current techniques reduce the bandwidth requirement either by sending a limited number of frames, pixels, or regions, or by re-encoding the important parts of the video. The latter requires running DNNs to extract the important portions of a frame so that they can be sent again at a higher resolution from the camera to the server. This imposes significant overhead on camera-side compute, as re-encoding is known to be expensive, and makes the system less real-time. In this work, we propose VISTA, a system that uses tile sampling, in which a limited number of rectangular areas within the frames, known as tiles, are sent to the server. We then propose an adaptive tile sampling algorithm that estimates the presence of moving objects by comparing the statistics of the tiles' bitrate (in kbps) and then decides to retain only the necessary tiles, thus eliminating the need for a DNN on the camera side. We evaluate VISTA on different datasets comprising 56 videos in total and show that, on average, our technique reduces the total amount of data sent to the cloud by 17-40% while providing a detection accuracy of over 85%. Furthermore, VISTA runs in real time even on cheap edge devices such as the Raspberry Pi and NVIDIA Jetson Nano, and it requires minimal calibration compared to prior work.
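The abstract describes tile selection only at a high level, so the following is a minimal sketch of the general idea rather than VISTA's actual algorithm. It assumes that a tile is kept when its current bitrate rises well above that tile's own recent average, which is one plausible way of "comparing the statistics of the tiles' bitrate". The function name `select_tiles`, the threshold factor `k`, the floor `min_std`, and the sample numbers are all hypothetical.

```python
from statistics import mean, pstdev


def select_tiles(history, current, k=2.0, min_std=1.0):
    """Return the ids of tiles whose current bitrate looks unusually high.

    history: dict mapping tile id -> list of past bitrates in kbps
    current: dict mapping tile id -> bitrate of the newest segment in kbps
    k:       how many standard deviations above the mean counts as "active"
             (assumed value, not taken from the paper)
    min_std: lower bound on the deviation so that a perfectly static tile
             does not trigger on tiny fluctuations (assumed value)
    """
    kept = []
    for tile_id, past in history.items():
        baseline = mean(past)
        spread = max(pstdev(past), min_std)
        # A bitrate spike suggests motion inside the tile, so keep it.
        if current[tile_id] > baseline + k * spread:
            kept.append(tile_id)
    return sorted(kept)


# Made-up bitrates for three tiles; only tile 1 shows a clear spike.
history = {
    0: [100, 102, 98, 101],
    1: [50, 52, 49, 51],
    2: [200, 198, 205, 201],
}
current = {0: 101, 1: 95, 2: 203}
kept_tiles = select_tiles(history, current)
print(kept_tiles)
```

Because the decision uses only bitrate figures that the encoder already produces, a rule of this kind needs no DNN on the camera, which is the property the abstract emphasizes.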



“…The rapid development of artificial intelligence has rendered deep learning (DL) into a promising solution for audio or video processing in modern mobile applications. Applications like Google Assistant or Apple AR typically employ pre-trained deep neural networks (DNNs) to perform inference tasks such as speech recognition [1], natural language processing [2], [3], and object recognition [4], [5], [6], [7]. Inference tasks take audio or image data as input and use DNNs to generate predictions.…”

Section: Hitdl: High-throughput Deep Learning (mentioning)

confidence: 99%

HiTDL: High-Throughput Deep Learning Inference at the Hybrid Mobile Edge

Wang, Pei, et al. 2022

IEEE Trans. Parallel Distrib. Syst.

Self Cite


“…Cutting back on costly cloud servers with cheaper local edge computation solves this issue, and has been used in existing deep neural network service platforms [34,35]. However, edge solutions are often tightly resource constrained, which results in other implementations utilizing hybrid edge/cloud solutions [38], or optimizing the DNN for performance [40,45].…”

Section: Real-world Constraints (mentioning)

confidence: 99%

Real-World Graph Convolution Networks (RW-GCNs) for Action Recognition in Smart Video Surveillance

Sanchez, Neff, Tabkhi 2022

Preprint


Action recognition is a key algorithmic part of emerging on-the-edge smart video surveillance and security systems. Skeleton-based action recognition is an attractive approach which, instead of using RGB pixel data, relies on human pose information to classify appropriate actions. However, existing algorithms often assume ideal conditions that are not representative of real-world limitations, such as noisy input, latency requirements, and edge resource constraints. To address the limitations of existing approaches, this paper presents Real-World Graph Convolution Networks (RW-GCNs), an architecture-level solution for meeting the domain constraints of Real World Skeleton-based Action Recognition. Inspired by the presence of feedback connections in the human visual cortex, RW-GCNs leverage attentive feedback augmentation on existing near state-of-the-art (SotA) Spatial-Temporal Graph Convolution Networks (ST-GCNs). The ST-GCNs' design choices are derived from information theory-centric principles to address both the spatial and temporal noise typically encountered in end-to-end real-time and on-the-edge smart video systems. Our results demonstrate RW-GCNs' ability to serve these applications by achieving a new SotA accuracy on the NTU-RGB-D-120 dataset at 94.1%, and achieving 32× less latency than baseline ST-GCN applications while still achieving 90.4% accuracy on the Northwestern UCLA dataset in the presence of spatial keypoint noise. RW-GCNs further show system scalability by running on the 10× cost effective NVIDIA Jetson Nano (as opposed to NVIDIA Xavier NX), while still maintaining a respectful range of throughput (15.6 to 5.

