In natural listening environments, speech signals are easily distorted by various types of acoustic interference, which reduces speech quality and intelligibility for human listeners and degrades the performance of many speech-related applications, such as automatic speech recognition (ASR). Consequently, many speech enhancement (SE) algorithms have been developed over the past decades. However, most current SE algorithms struggle to capture underlying speech information (e.g., phonemes) during the SE process. As a result, it is challenging to know what specific information is lost or corrupted during enhancement, which limits the applicability of the enhanced speech. For instance, some SE algorithms aimed at improving human listening often degrade ASR performance.

The objective of this dissertation is to develop SE algorithms that can capture various underlying speech representations (information) and improve the quality and intelligibility of noisy speech. This study begins by introducing the hidden Markov model (HMM) into the non-negative matrix factorization (NMF) model (NMF-HMM), because the HMM is a convenient way to find underlying speech information for better SE performance. The key idea is to apply the HMM to capture the underlying temporal dynamics of speech within the NMF model. Additionally, a computationally efficient method is proposed to ensure that the NMF-HMM model can achieve fast online SE.

Although the NMF-HMM captures underlying speech information, it is difficult to explain what detailed information is obtained. In addition, the NMF-HMM cannot represent the underlying information in a vector form, which makes information analysis difficult. To address these problems, we introduce deep representation learning (DRL) for SE. DRL can also improve the SE performance of DNN-based algorithms, since it yields a discriminative speech representation that reduces the requirements on the learning machine to perform a task successfully. Specifically, we propose a Bayesian permutation training variational autoencoder (PVAE) to analyze underlying speech information for SE, which can represent and disentangle the underlying information in noisy speech in a vector form. The experimental results indicate that disentangled signal representations can also help current DNN-based SE algorithms achieve better SE performance. Additionally, based on this PVAE framework, we propose applying the β-VAE and generative adversarial networks to improve PVAE's information disentanglement and signal restoration abilities, respectively.
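To make the NMF-HMM idea above concrete, the following is a minimal illustrative sketch, not the dissertation's actual implementation: each HMM state carries its own non-negative basis matrix, so temporal dynamics are captured by switching bases across frames. For simplicity, the bases are random and the state sequence is fixed; in the real model the states are inferred by the HMM and the bases are learned from training data. All names (`n_states`, `W`, `H`, etc.) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

F, T = 64, 100          # frequency bins, time frames
n_states, K = 3, 8      # HMM states, NMF components per state

V = np.abs(rng.normal(size=(F, T))) + 1e-6   # toy magnitude spectrogram

# One non-negative basis matrix per HMM state.
W = np.abs(rng.normal(size=(n_states, F, K))) + 1e-6
# A toy state sequence; in the real model this comes from HMM inference.
states = rng.integers(0, n_states, size=T)

H = np.abs(rng.normal(size=(K, T))) + 1e-6   # non-negative activations

# Multiplicative updates minimizing the Euclidean NMF cost, with the
# basis for each frame selected by that frame's HMM state.
for _ in range(50):
    for t in range(T):
        Ws = W[states[t]]                    # (F, K) basis for this state
        v, h = V[:, t], H[:, t]              # h is a view: updates H in place
        h *= (Ws.T @ v) / (Ws.T @ (Ws @ h) + 1e-12)

recon = np.stack([W[states[t]] @ H[:, t] for t in range(T)], axis=1)
print("relative reconstruction error:",
      np.linalg.norm(V - recon) / np.linalg.norm(V))
```

Because each frame only touches one state's basis, the per-frame update cost stays low, which is in the spirit of the fast online SE mentioned above; the actual computational shortcut proposed in the dissertation may differ.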
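For reference, the β-VAE objective alluded to in the last paragraph is the standard evidence lower bound with a weighted KL term (how it is combined with the PVAE framework is specific to the dissertation and not reproduced here):

```latex
\mathcal{L}(\theta,\phi;x) =
\mathbb{E}_{q_\phi(z\mid x)}\bigl[\log p_\theta(x\mid z)\bigr]
- \beta \, D_{\mathrm{KL}}\bigl(q_\phi(z\mid x)\,\|\,p(z)\bigr)
```

Setting β = 1 recovers the ordinary VAE; β > 1 trades some reconstruction fidelity for a more disentangled latent representation, which is the property exploited here to improve PVAE's information disentanglement.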