Arabic corpora have existed since the last decade of the past century. Although they are constantly increasing, more advanced tools and morpho-syntactically annotated Arabic corpora are still needed for research and teaching. Likewise, parallel and specialised corpora are rare despite the growing need to use them in empirical linguistic investigations of authentic Arabic texts and for language and translation teaching. Therefore, building legal corpora will pave the way for more research in Arabic legal translation, an area which is under-researched worldwide. This paper aims to discuss the building of a collection of specialised parallel and monolingual legal corpora. In particular, it will discuss the building of diachronic corpora, which include all available constitutions of 22 Arabic countries. The aim of building all available versions of these constitutions is two-fold: (1) interdisciplinary corpus-based and socio-cultural investigations and (2) research-led and blended-learning pedagogical approaches to translation teaching and learning. Thus, these corpora are of great value to translation trainers and researchers, law academics and professionals, and governmental, non-governmental and international organisations. The paper will demonstrate the process of building these specialised complex corpora and the challenges encountered throughout this process. Among the challenges faced during the data collection and processing phases are (1) limitations of finding the original constitutions for each Arabic country since some of them date back to 1922; (2) file conversion and the difficulty of choosing one Optical Character Recognition (OCR) tool to rely on for the Arabic language since many lack accuracy, efficiency as well as encoding issues in Arabic.