For single-channel speech enhancement, both time-domain and time-frequency-domain methods have their respective pros and cons. In this paper, we present a cross-domain framework named TFT-Net, which takes a time-frequency spectrogram as input and produces a time-domain waveform as output. Such a framework takes advantage of our knowledge about spectrograms while avoiding some of the drawbacks that T-F-domain methods suffer from. In TFT-Net, we design an innovative dual-path attention block (DAB) to fully exploit correlations along the time and frequency axes. We further find that a sample-independent DAB (SDAB) achieves a good tradeoff between enhanced speech quality and complexity. Ablation studies show that both the cross-domain design and the SDAB block bring large performance gains. When logarithmic MSE is used as the training criterion, TFT-Net achieves the highest SDR and SSNR among state-of-the-art methods on two major speech enhancement benchmarks.
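As a rough illustration of the cross-domain idea (spectrogram in, waveform out) and a logarithmic-MSE training criterion, here is a minimal PyTorch sketch. Everything in it is an assumption for illustration: the small conv stack merely stands in for the DAB/SDAB blocks, the learned frame-to-samples head replaces a fixed inverse STFT, and the exact form of the log-MSE loss, `n_fft`, and `hop` are our own choices, not the paper's.

```python
import torch
import torch.nn as nn

def log_mse_loss(est, ref, eps=1e-8):
    """One plausible reading of a 'logarithmic MSE' criterion:
    the log of the time-domain MSE (exact form in the paper may differ)."""
    return torch.log10(torch.mean((est - ref) ** 2) + eps)

class CrossDomainEnhancer(nn.Module):
    """Hypothetical skeleton: T-F spectrogram in, waveform out."""
    def __init__(self, n_fft=512, hop=128):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        freq = n_fft // 2 + 1
        # Placeholder for the dual-path attention blocks (DAB/SDAB).
        self.body = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),
        )
        # Learned mapping from each enhanced T-F frame to `hop` samples,
        # so the network itself produces the time-domain output.
        self.to_wave = nn.Linear(2 * freq, hop)

    def forward(self, wav):                                   # (B, samples)
        spec = torch.stft(wav, self.n_fft, self.hop,
                          window=torch.hann_window(self.n_fft,
                                                   device=wav.device),
                          return_complex=True)                # (B, F, T)
        x = torch.stack([spec.real, spec.imag], dim=1)        # (B, 2, F, T)
        x = self.body(x)                                      # (B, 2, F, T)
        x = x.flatten(1, 2).transpose(1, 2)                   # (B, T, 2F)
        return self.to_wave(x).flatten(1)                     # (B, T*hop)
```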
State-of-the-art object detectors and trackers are developing fast. Trackers are in general more efficient than detectors but bear the risk of drifting. A question hence arises: how can we improve the accuracy of video object detection/tracking by utilizing existing detectors and trackers within a given time budget? A baseline is frame skipping: detecting on every N-th frame and tracking for the frames in between. This baseline, however, is suboptimal, since the detection frequency should depend on the tracking quality. To this end, we propose a scheduler network, which determines whether to detect or track at a certain frame, as a generalization of Siamese trackers. Despite being lightweight and simple in structure, the scheduler network is more effective than the frame-skipping baselines and flow-based approaches, as validated on the ImageNet VID dataset for video object detection/tracking.
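A minimal sketch of the per-frame decision loop, contrasting the fixed-interval frame-skipping baseline with a learned scheduler, might look as follows. `detector`, `tracker`, and `scheduler` are hypothetical callables; the actual scheduler network is a generalization of a Siamese tracker, which this sketch does not model.

```python
from typing import Callable, List, Optional

def run_pipeline(frames: List, detector: Callable, tracker: Callable,
                 scheduler: Optional[Callable] = None, N: int = 10) -> List:
    """Detect-vs-track scheduling over a video.

    detector(frame) returns fresh boxes; tracker(frame, boxes) propagates
    the previous boxes. Without a scheduler this reduces to the
    frame-skipping baseline (detect on every N-th frame); with one,
    scheduler(frame, boxes) makes the call per frame.
    """
    results, boxes = [], None
    for t, frame in enumerate(frames):
        if scheduler is not None:
            # Quality-aware decision: redetect when tracking looks unreliable.
            detect = boxes is None or scheduler(frame, boxes)
        else:
            detect = t % N == 0          # fixed-interval baseline
        boxes = detector(frame) if detect else tracker(frame, boxes)
        results.append(boxes)
    return results
```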
Transformers have sprung up in the field of computer vision. In this work, we explore whether the core self-attention module in the Transformer is the key to achieving excellent performance in image recognition. To this end, we build an attention-free network called sMLPNet based on existing MLP-based vision models. Specifically, we replace the MLP module in the token-mixing step with a novel sparse MLP (sMLP) module. For 2D image tokens, sMLP applies 1D MLPs along the axial directions, with parameters shared among rows or columns. Through sparse connectivity and weight sharing, the sMLP module significantly reduces the number of model parameters and the computational complexity, avoiding the common overfitting problem that plagues the performance of MLP-like models. When trained only on the ImageNet-1K dataset, the proposed sMLPNet achieves 81.9% top-1 accuracy with only 24M parameters, which is much better than most CNNs and vision Transformers under the same model-size constraint. When scaled up to 66M parameters, sMLPNet achieves 83.4% top-1 accuracy, on par with the state-of-the-art Swin Transformer. The success of sMLPNet suggests that the self-attention mechanism is not necessarily a silver bullet in computer vision. The code and models are publicly available at https://github.com/microsoft/SPACH.
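The axial token-mixing idea can be sketched in a few lines of PyTorch. The three-branch layout below (identity, row mixing, column mixing, fused by a 1x1 convolution) reflects our reading of the sMLP module; treat the exact fusion and shapes as assumptions rather than the released implementation.

```python
import torch
import torch.nn as nn

class SparseMLP(nn.Module):
    """Sketch of sparse-MLP token mixing for 2D tokens of shape (B, C, H, W)."""
    def __init__(self, channels: int, h: int, w: int):
        super().__init__()
        self.mix_w = nn.Linear(w, w)   # 1D MLP along width, shared by all rows
        self.mix_h = nn.Linear(h, h)   # 1D MLP along height, shared by all columns
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        x_w = self.mix_w(x)                                   # mix along W
        x_h = self.mix_h(x.transpose(2, 3)).transpose(2, 3)   # mix along H
        return self.fuse(torch.cat([x, x_w, x_h], dim=1))
```

The parameter saving the abstract refers to falls out directly: a dense token-mixing MLP over H x W tokens needs on the order of (HW)^2 weights, whereas the two shared 1D mixers need only H^2 + W^2.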