Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, that is, the ability to efficiently capture precise long-range dependency coupling between output and input. Recent studies have shown the potential of the Transformer to increase prediction capacity. However, several severe issues prevent the Transformer from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and the inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient Transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a ProbSparse self-attention mechanism, which achieves O(L log L) time complexity and memory usage while retaining comparable performance on sequence dependency alignment; (ii) self-attention distilling, which highlights dominating attention by halving the cascading layer input and efficiently handles extremely long input sequences; (iii) a generative-style decoder that, while conceptually simple, predicts long time-series sequences in one forward operation rather than step by step, drastically improving the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
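To make the O(L log L) claim concrete, below is a minimal PyTorch sketch of a ProbSparse-style attention layer. It is an illustration under stated assumptions, not the authors' implementation: the function name `prob_sparse_attention`, the single-head layout, and the default sampling factor are ours, while the max-minus-mean sparsity score and the top-u query selection follow the Informer paper's description.

```python
import math
import torch

def prob_sparse_attention(Q, K, V, factor=5):
    """Sketch of ProbSparse self-attention (Informer, Zhou et al., 2021).

    Q, K, V: (batch, length, d_model). Only the top-u "active" queries
    (u ~ factor * ln L) attend over all keys; the remaining "lazy" queries
    fall back to the mean of V, so the dominant cost is O(L log L)
    rather than O(L^2).
    """
    B, L, d = Q.shape
    u = max(1, int(factor * math.ceil(math.log(L))))          # queries kept
    sample_k = max(1, int(factor * math.ceil(math.log(L))))   # keys sampled per query

    # 1) Estimate each query's sparsity score M(q, K) = max - mean over a
    #    random subset of keys (scoring all keys would already cost O(L^2)).
    idx = torch.randint(0, L, (sample_k,))
    sampled = Q @ K[:, idx, :].transpose(-2, -1) / math.sqrt(d)  # (B, L, sample_k)
    M = sampled.max(dim=-1).values - sampled.mean(dim=-1)        # (B, L)

    # 2) Full attention for the top-u queries only.
    top = M.topk(u, dim=-1).indices                              # (B, u)
    idx_top = top.unsqueeze(-1).expand(-1, -1, d)                # (B, u, d)
    Q_top = torch.gather(Q, 1, idx_top)
    scores = Q_top @ K.transpose(-2, -1) / math.sqrt(d)          # (B, u, L)
    attn_out = scores.softmax(dim=-1) @ V                        # (B, u, d)

    # 3) Lazy queries are approximated by the mean of V (self-attention case).
    out = V.mean(dim=1, keepdim=True).expand(B, L, d).clone()
    out.scatter_(1, idx_top, attn_out)
    return out
```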
Lymph node metastasis, assessed by examining resected lymph nodes, is considered one of the most important prognostic factors for colorectal cancer (CRC). However, it requires careful and comprehensive inspection by expert pathologists. To relieve pathologists' burden and speed up the diagnostic process, in this paper we develop a deep learning system that uses only binary positive/negative labels of the lymph nodes to solve the CRC lymph node classification task. The multi-instance learning (MIL) framework is adopted to handle whole-slide images (WSIs) of gigapixel size at once and to avoid labor-intensive, time-consuming detailed annotations. First, a transformer-based MIL model, DT-DSMIL, is proposed, built on a deformable transformer backbone and the dual-stream MIL (DSMIL) framework. Local-level image features are extracted and aggregated with the deformable transformer, global-level image features are obtained with the DSMIL aggregator, and the final classification decision is based on both. After demonstrating the effectiveness of DT-DSMIL by comparing its performance with its predecessors, we develop a diagnostic system that detects, crops, and finally identifies single lymph nodes within the slides, based on DT-DSMIL and the Faster R-CNN model. The diagnostic model is trained and tested on a clinically collected CRC lymph node metastasis dataset of 843 slides (864 metastatic and 1415 non-metastatic lymph nodes), achieving an accuracy of 95.3% and an area under the receiver operating characteristic curve (AUC) of 0.9762 (95% confidence interval [CI]: 0.9607–0.9891) for single lymph node classification. For lymph nodes with micro-metastasis and macro-metastasis, our diagnostic system achieves AUCs of 0.9816 (95% CI: 0.9659–0.9935) and 0.9902 (95% CI: 0.9787–0.9983), respectively. Moreover, the system shows reliable diagnostic region localization: it consistently identifies the regions most likely to contain metastases, regardless of its final prediction or the manual label, showing great potential for avoiding false negatives and discovering incorrectly labeled slides in actual clinical use.
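For readers unfamiliar with dual-stream MIL, the sketch below shows the general shape of a DSMIL-style aggregator in PyTorch. It is a simplified illustration, not the DT-DSMIL implementation: the deformable-transformer local feature extractor is omitted, and the class name, layer sizes, and score-fusion weighting are placeholders.

```python
import torch
import torch.nn as nn

class DualStreamMILHead(nn.Module):
    """Minimal sketch of a DSMIL-style aggregator (Li et al., 2021).

    Takes a bag of N patch embeddings from one whole-slide image and returns
    a bag-level logit. Stream 1 scores every instance and picks the most
    suspicious ("critical") one; stream 2 attends over all instances using
    their similarity to that critical instance.
    """
    def __init__(self, dim=512):
        super().__init__()
        self.inst_cls = nn.Linear(dim, 1)   # stream 1: per-instance score
        self.q = nn.Linear(dim, 128)        # query projection
        self.v = nn.Linear(dim, dim)        # value projection
        self.bag_cls = nn.Linear(dim, 1)    # bag-level classifier

    def forward(self, x):                   # x: (N, dim), one bag of patches
        inst_logits = self.inst_cls(x)                      # (N, 1)
        crit = x[inst_logits.argmax(dim=0)]                 # (1, dim) critical instance
        attn = (self.q(x) @ self.q(crit).t()).softmax(0)    # (N, 1) similarity weights
        bag_feat = (attn * self.v(x)).sum(dim=0)            # (dim,) weighted pooling
        bag_logit = self.bag_cls(bag_feat)                  # (1,)
        # fuse the max-instance stream and the attention stream
        return 0.5 * (bag_logit + inst_logits.max())
```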
Neural architecture search, which aims to automatically search for architectures (e.g., convolution, max pooling) of neural networks that maximize validation performance, has achieved remarkable progress recently. In many application scenarios, several parties would like to collaboratively search for a shared neural architecture by leveraging data from all parties. However, due to privacy concerns, no party wants its data to be seen by the others. To address this problem, we propose federated neural architecture search (FNAS), where different parties collectively search for a differentiable architecture by exchanging gradients of architecture variables without exposing their data to other parties. To further preserve privacy, we study differentially private FNAS (DP-FNAS), which adds random noise to the gradients of the architecture variables. We provide theoretical guarantees that DP-FNAS achieves differential privacy. Experiments show that DP-FNAS can search highly performant neural architectures while protecting the privacy of individual parties. The code is available at https://github.com/UCSD-AI4H/DP-FNAS .
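The privacy-preserving step described here (noise added to exchanged architecture gradients) can be sketched with the standard clip-and-noise Gaussian mechanism. The sketch below is our illustration under that assumption, not the repository's code; the function name and hyperparameters are hypothetical, and the exact clipping scheme and privacy accounting should be taken from the paper.

```python
import torch

def dp_aggregate_arch_grads(party_grads, clip_norm=1.0, noise_multiplier=1.0):
    """Sketch of the differentially private step in DP-FNAS-style search.

    party_grads: list of per-party gradients of the shared architecture
    variables (one tensor per party). Each gradient is L2-clipped to bound
    its sensitivity, perturbed with Gaussian noise of scale
    noise_multiplier * clip_norm, and the results are averaged server-side.
    """
    noisy = []
    for g in party_grads:
        scale = (clip_norm / (g.norm() + 1e-12)).clamp(max=1.0)      # clip to bound sensitivity
        g = g * scale
        g = g + torch.randn_like(g) * noise_multiplier * clip_norm   # Gaussian mechanism
        noisy.append(g)
    return torch.stack(noisy).mean(dim=0)
```

Each round, the server would apply this averaged noisy gradient to the shared architecture parameters (e.g., `alpha -= lr * dp_aggregate_arch_grads(grads)`); the overall privacy guarantee then follows from composing the Gaussian mechanism across rounds.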
Privacy concerns associated with the use of Large Language Models (LLMs) have grown recently with the development of LLMs such as ChatGPT. Existing work explores Differential Privacy (DP) techniques to mitigate these privacy risks, at the cost of degraded generalization. Our paper reveals that the flatness of a DP-trained model's loss landscape plays an essential role in the trade-off between privacy and generalization. We further propose a holistic framework to enforce appropriate weight flatness, which substantially improves model generalization while maintaining competitive privacy preservation. The framework operates at three levels, from coarse to fine: perturbation-aware min-max optimization of model weights within a layer, flatness-guided sparse prefix-tuning of weights across layers, and weight knowledge distillation between DP and non-DP weight copies. Comprehensive experiments in both black-box and white-box scenarios demonstrate the effectiveness of our proposal in enhancing generalization while maintaining DP guarantees. For instance, on the QNLI text classification dataset, DP-Flat achieves performance similar to non-private full fine-tuning under a privacy budget of ϵ = 3, and even better performance at higher privacy budgets. Code is provided in the supplement.
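The first component, perturbation-aware min-max optimization on weights, resembles sharpness-aware minimization (SAM). The sketch below shows a generic SAM-style step to convey the min-max flavor; it is explicitly not the authors' exact procedure, the function name is ours, and the DP clipping and noising that a DP-trained model would also apply are omitted.

```python
import torch

def sam_style_step(model, loss_fn, batch, rho=0.05):
    """Sketch of a sharpness-aware (min-max) update on model weights.

    Ascends to the worst-case weight perturbation within an L2 ball of
    radius rho, then computes the gradient there; minimizing that gradient
    flattens the loss landscape around the current weights.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) gradient at the current weights
    loss = loss_fn(model, batch)
    grads = torch.autograd.grad(loss, params)

    # 2) ascend: w <- w + rho * g / ||g||  (worst-case perturbation)
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    eps = [rho * g / (grad_norm + 1e-12) for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)

    # 3) gradient at the perturbed weights drives the actual update
    sharp_loss = loss_fn(model, batch)
    sharp_grads = torch.autograd.grad(sharp_loss, params)

    # 4) restore the original weights; the caller applies sharp_grads
    #    (after DP clipping and noising) with its optimizer of choice
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    return sharp_loss.detach(), sharp_grads
```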