Language documentation overview

The terms “language documentation” and “language description” are sometimes used interchangeably to refer to any effort to make a record of a language and/or its use by speakers. When they are distinguished, “language documentation” usually refers to efforts focused on broadening the written and spoken record of the language (by collecting texts, recording and transcribing spoken narratives, etc.), while “language description” is reserved for endeavors that aim to produce a linguistically oriented grammar (or a portion of one), mainly for use by scholars. You can read more about this distinction in Himmelmann 1998 (available through the Mason libraries website).

Many description and documentation projects are undertaken on endangered languages, that is, languages that are estimated to be in danger of having no speakers left. Over 40% of the world’s languages are currently endangered (see, e.g., the UNESCO Atlas of the World’s Languages in Danger), and 50-90% are expected to have become extinct by the end of this century (Austin & Sallabank 2011). Languages become endangered for a number of different reasons. In what we might think of as the best-case scenario, speakers of one language may over time collectively choose to speak a different language or languages for social reasons. In other cases, speakers may be directly or indirectly encouraged or forced to switch to a different language (by individuals in power, or by the social or political situation). Languages can also become endangered or die out when their speakers themselves die in epidemics or via heinous acts like “ethnic cleansing” and genocide.

Other projects (particularly those looking to produce grammars) focus on languages that may not be endangered, but are undescribed or underdescribed in the linguistic literature, meaning that not much has been written about them. 

Projects may be initiated by linguists or other scholars, or by speakers or non-speaker community members. Projects vary depending on the state of the language and the goals of those involved. For instance:

  • A group of non-speaker community members whose heritage language is only spoken by a few elders may aim to collect as many texts and narratives as possible in a short amount of time, while those speakers are still alive.
  • A linguist with an interest in a particular language family may collect texts and elicit sentences from the speakers of a “healthy” but undocumented variety, with the goal of writing a grammar for other scholars to consult.
  • Linguists and speakers may work together to create a better written record of a language that is beginning to undergo attrition in younger generations, with the aim of producing pedagogical materials in the language.

Computer scientists and computational linguists may also wish to build corpora of “low-resource” languages to increase the power and capabilities of the language tools they are working to build. These tools in turn can help language speakers and learners in their efforts to use the language with modern technology, build pedagogical resources, and otherwise maintain and revitalize the language.

You can read more about endangered languages here.