Protein language models have exhibited remarkable representational capabilities in various downstream tasks, notably in the prediction of protein functions. Despite their success, these models traditionally grapple with a critical shortcoming: the absence of explicit protein structure information, which is pivotal for elucidating the relationship between protein sequences and their functionality. Addressing this gap, we introduce DeProt, a Transformer-based protein language model designed to incorporate protein sequences and structures. It was pre-trained on millions of protein structures from diverse natural protein clusters. DeProt first serializes protein structures into residue-level local-structure sequences and use a graph neural network based auto-encoder to vectorized the local structures. Then, these vectors are quantized and formed a discrete structure tokens by a pre-trained codebook. Meanwhile, DeProt utilize disentangled attention mechanisms to effectively integrate residue sequences with structure token sequences. Despite having fewer parameters and less training data, DeProt significantly outperforms other state-of-the-art (SOTA) protein language models, including those that are structure-aware and evolution-based, particularly in the task of zero-shot mutant effect prediction across 217 deep mutational scanning assays. Furthermore, DeProt exhibits robust representational capabilities across a spectrum of supervised-learning downstream tasks. Our comprehensive benchmarks underscore the innovative nature of DeProt's framework and its superior performance, suggesting its wide applicability in the realm of protein deep learning.