In this paper, I discuss how sociolinguistic corpora can be compiled so as to document and maximize access to the context of their collection. This is no doubt a murky issue for the coding and categorization enterprise, but it is as critical as demographic information if we are going to be able to compare data sets from different communities, eras, or research projects. However, how far does the researcher go in documenting this type of information? My goal will be to outline what I have found to be 'best practice' in my own research while at the same time highlighting issues and problems I have encountered along the way. I build on the foundations of earlier corpus-building projects and on data arising from my own fieldwork conducted in the UK and Canada between 1995 and 2011.

How can sociolinguistic corpora be compiled so as to document and maximize access to the context of their collection? This is no doubt a murky issue for the coding and categorization enterprise because it is so difficult to decide which information to code and how. Judicious decisions are critical if we are going to be able to compare data sets from different communities, eras, or research projects: for example, sites with different nationalities (e.g. Tagliamonte 2013) or founders (Poplack and Tagliamonte 1991), different points in time (centuries, eras, and decades), or even disparate research projects (e.g. a focus on language contact vs. language change). However, how far does the researcher go in documenting this type of information? My goal in this paper will be to outline what I have found to be 'best practice' in my own research while at the same time highlighting issues and problems I have encountered along the way.
I build on the foundations of earlier corpus-building projects (Sankoff and Cedergren 1971; Sankoff and Sankoff 1973; Poplack 1989; Thibault and Vincent 1990) and on data arising from my own fieldwork conducted in the UK and Canada between 1995 and 2011 (Tagliamonte 1996-1998, 1999-2001, 2001-2003, 2003-2006, 2007-2013).

The original fieldwork situation is a critical component of any research because it determines the nature of the linguistic data. How comparable are data samples? This can only be known if substantial information about the context of data collection has been recorded and is retrievable for later work. Indeed, this fact highlights the extreme importance of the original research goals and practice. At the outset of data collection, the nature of the discourse to be obtained must somehow be planned and then controlled (to the best of the researcher's ability). First, it is necessary to decide on the geographic boundaries, whether suburb, neighborhood, or block. For example, in the Ottawa-Hull French project, Poplack (1989) controlled for border differences, contrasting the province of Ontario with Quebec, and also for varying proportions of French versus English within the same province. This permitted her to disentangle the effects of political boundaries from those of language contact. Further, if obtaining vernacular data ...