2018
DOI: 10.1109/access.2018.2861223
A 3D Atrous Convolutional Long Short-Term Memory Network for Background Subtraction

Cited by 53 publications (38 citation statements)
References 28 publications
Citation types: 0 supporting, 33 mentioning, 0 contrasting
“…But it may not be the case in certain applications, as pointed out by Hu et al. [82]; 3) each pixel is processed independently, so the foreground mask may contain isolated false positives and false negatives; 4) it is computationally expensive due to the large number of patches extracted from each frame, as remarked by Lim and Keles [114]; 5) it requires preprocessing or post-processing of the data, and hence is not based on an end-to-end learning framework [82]; 6) ConvNets use only a few frames as input and thus cannot capture long-term dependencies of the input video sequences [82]; and 7) a ConvNet is a deep encoder-decoder network, i.e., a generator network, but classical generator networks produce blurry foreground regions and cannot preserve object edges because they minimize classical loss functions (e.g., the Euclidean distance) between the predicted output and the ground truth [117].…”
Section: Convolutional Neural Network (mentioning)
confidence: 99%
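The blur argument in point 7) follows from the per-pixel Euclidean objective. As a minimal illustration (not taken from the cited papers; the tensors and shapes are invented), the PyTorch snippet below computes that classical L2 loss between a predicted foreground map and a binary ground-truth mask. Because every pixel is penalized independently, an uncertain model minimizes this loss by predicting averaged, soft values, which shows up as blur along object edges:

```python
import torch
import torch.nn.functional as F

# Hypothetical example: predicted foreground probabilities and a binary
# ground-truth mask, both of shape (batch, 1, height, width).
pred = torch.rand(4, 1, 240, 320)
target = (torch.rand(4, 1, 240, 320) > 0.5).float()

# The classical per-pixel Euclidean (L2) loss criticized above: each pixel is
# scored independently, so under ambiguity the optimal prediction is the mean
# of plausible masks, i.e., a blurry, edge-smearing output.
l2_loss = F.mse_loss(pred, target)
print(l2_loss.item())
```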
“…In another work, Hu et al. [82] developed a 3D atrous CNN model to learn deep spatio-temporal features without losing resolution information. In addition, this model is combined with two convolutional long short-term memory (ConvLSTM) networks to capture both the short-term and the long-term spatio-temporal information of the input video data.…”
Section: 3D-CNNs (mentioning)
confidence: 99%
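To make the resolution-preserving claim concrete, here is a minimal PyTorch sketch, not the authors' implementation: the channel counts, layer sizes, and the hand-rolled ConvLSTMCell are all illustrative assumptions. A dilated (atrous) 3D convolution with stride 1 enlarges the receptive field while keeping the temporal and spatial resolution of the input intact, and its feature maps are then stepped frame by frame through a ConvLSTM cell:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: all four gates come from one 2D convolution."""
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

# Atrous (dilated) 3D convolution: with stride 1, dilation d, and padding d,
# the (N, C, T, H, W) output keeps the input's temporal and spatial resolution.
atrous3d = nn.Conv3d(3, 16, kernel_size=3, stride=1, padding=2, dilation=2)

x = torch.rand(1, 3, 8, 64, 64)      # 8 RGB frames of 64x64
feats = atrous3d(x)                  # -> (1, 16, 8, 64, 64): no downsampling

cell = ConvLSTMCell(16, 16)
h = torch.zeros(1, 16, 64, 64)
c = torch.zeros(1, 16, 64, 64)
for t in range(feats.shape[2]):      # step the ConvLSTM along the time axis
    h, c = cell(feats[:, :, t], (h, c))
```

Stacking a second ConvLSTMCell on top of the first cell's hidden state h would mirror the two-network arrangement the excerpt describes for combining short-term and long-term information.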
“…Similar decomposition schemes have been used by Denton and Birodkar [18] and by Villegas et al. [19], who developed the MCnet+Res system. These decomposition methods borrow the idea from background subtraction techniques [20] and work well in rather simple scenarios. Hou et al. [10] introduced a bidirectional constraint network, BCnet-D, and used a video adversarial loss function to constrain the motion of predictions.…”
Section: Related Work (mentioning)
confidence: 99%
“…SFEN [21] first extracts semantic maps from a single frame as the input of a ConvLSTM, and an STN model [22] and a CRF [23] are combined to enhance the motion robustness and spatial smoothness of the output mask. Hu et al. [24] apply a 3D atrous convolutional network to multi-frame input before the ConvLSTM. In contrast to such sequence-based methods, the foreground segmentation methods [11], [12] consider only a single frame when segmenting the foreground.…”
Section: Related Work (mentioning)
confidence: 99%