Inferring Prehistory from Language Genealogy

Gradual Diversification

This chart depicts the branching of the Uralic language family into its subfamilies. The numbers at the bottom indicate the number of living languages in each subfamily. The expansion of this family was a gradual process: almost every forking was into just two parts.

Each intermediate branch denotes an independent prehistoric language whose existence can be inferred; of these only Finno-Ugric is labeled. The chart is not ``drawn to scale'': linguists can infer that the Finno-Ugric stage probably lasted for thousands of years.
Several thousand years ago there was a single language, Uralic, which has diversified until today it has 33 living descendants. Several explanations might be supposed for this expansion:

A complete explanation may involve a combination of these reasons. For Uralic, the key reason is probably yet something else:

Rapid Diversification

The Austronesian language family, with 1227 living languages, is much larger than Uralic, yet its overall family structure has fewer major branchpoints. Here the branches aren't named, but the number of languages is shown and their geographic range. (In some cases, the geographic range may be broader than indicated. The subfamily with 144 languages, called the Borneo branch of Western Malayo-Polynesian, includes Malagasy, the national language of Madagascar.) Of the four major branches of Austronesian, three are located only in Formosa; the fourth, called Malayo-Polynesian, comprises languages all the way from Madagascar to Easter Island.

Geographic barriers (the Malayo-Polynesian languages are spoken on many different islands) are a major reason for the rapid diversification of the family, but not the whole explanation. The Philippines were populated before the Austronesian speakers arrived, but their ancient languages have completely disappeared. The Austronesian language expanded rapidly because the Austronesian culture suddenly developed the advantages of agriculture and ocean navigation, and overwhelmed less advanced cultures.

The most interesting thing about this chart is that you can see at a glance that the Austronesian homeland was almost certainly Formosa. Otherwise it would be very unusual that of the four primary branches of such a vast language family, three would be isolated among the aboriginal languages of present-day Taiwan.

It seems exciting that the original source of the ancestors of the Polynesian expansion can be inferred, not from archaeology, but from simple examination of a language tree.
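
The inference can be made concrete with a small sketch. The chart leaves the branches unnamed, so the three Formosan branch labels below are placeholders, and the regions listed for Malayo-Polynesian are only the ones mentioned in the text. The rule used (pick the region that hosts the most primary branches) is a simplified reading of the argument above, not an established algorithm.

```python
from collections import Counter

# Primary branches of Austronesian as described in the text: three found only
# in Formosa, plus Malayo-Polynesian, spread from Madagascar to Easter Island.
# Branch names other than "Malayo-Polynesian" are placeholders.
primary_branches = {
    "Formosan branch 1": {"Formosa"},
    "Formosan branch 2": {"Formosa"},
    "Formosan branch 3": {"Formosa"},
    "Malayo-Polynesian": {"Madagascar", "Borneo", "Philippines", "Easter Island"},
}

def likely_homeland(branches):
    """Return the region where the most primary branches are spoken, with its count."""
    counts = Counter(region for regions in branches.values() for region in regions)
    region, n = counts.most_common(1)[0]
    return region, n

homeland, n_branches = likely_homeland(primary_branches)
print(homeland, n_branches)
```

Counting primary branches rather than individual languages is the point of the exercise: Malayo-Polynesian holds nearly all of the languages, yet it counts only once.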

The Volta-Congo language family in Africa has as many languages as Austronesian; and its chart shows a pattern very similar to that of Austronesian (even though there are no ocean barriers to explain the diversity). Again the explosive form of the language tree would be explained by the rapid expansion of a superior culture, in this case the Iron-Age Bantu farmers. And again a glance at the tree would reveal the original homeland of the Bantu farmers: the valley of the upper Benue River in present-day Cameroon.

Slow Diffusion

There is another major type of language family tree that needs to be mentioned. Unlike the Malayo-Polynesian expansion, or the Iron-Age Bantu revolution, Australia probably experienced no rapid cultural revolution until modern times. Yet its family tree doesn't resemble the gradual diversification model of Uralic either. Instead the Australian languages are all related, but it is unclear which of the relationships are genetic and which result from borrowing. The chart is intended to depict languages migrating and borrowing traits from unrelated languages.
It can be surmised that the linguistic situation in Australia developed roughly in three phases:

Solving Ancient Mysteries

It should be fun to solve ancient riddles by studying such family trees. Unfortunately, the most useful evidence is often missing or controversial. After the examples of Polynesians and Bantus, perhaps the most cited homeland discovery is that of the Algonquins, but this is exciting only to specialists. (In Columbus' time Algonquin languages were prominent along the North Atlantic coast, but the language tree suggests they probably originated much further West.)

Yet one of the most amazing stories of pre-history is hidden in the most-studied language tree.

In Search of the Indo-Europeans

In Search of the Indo-Europeans is the title of a book by Mallory in support of Marija Gimbutas' theory of Indo-European origin. The early Indo-European speakers were the horse-riders of the East European steppes (Ukraine and southern Russia) who invaded Central Europe between 4500 and 2500 BC. There is much evidence supporting the Gimbutas-Mallory theory, such as religious motifs based on cattle and horses which are seen all the way from Ireland to India, and the fact that the center of radiation for the Indo-European languages is near Romania, right on the boundary between the Eurasian steppes and the fertile breadbasket of Central Europe. Nevertheless there is much reluctance to accept Gimbutas' theory. Recently Nature magazine published a paper (though not written by linguists) claiming to have proven that the Indo-European break-up occurred at least 4000 years earlier than in the Gimbutas theory.

Two major theories competing with the Gimbutas theory are the Anatolian hypothesis and the Balkano-Danubian hypothesis. These competing theories, however, can be ruled out by examining the structure of the Indo-European language tree, as I now try to explain.

Before trying to draw an inference from the structure of the family tree, we must agree on the structure. For brevity, the charts to the left show only six branches (or specimens of branches) of the Indo-European family (Hittite, Italic, Greek, Armenian, Sanskrit, Baltic), but even when ten branches are shown the usual structure shown is as in (a) -- Indo-European suddenly exploded, just like Malayo-Polynesian. Obviously, Indo-European was associated with a major cultural revolution.

Indo-European has been very extensively studied; yet the experts have never fully agreed on a substructure other than the sudden split into ten branches. Nevertheless, there have been attempts to identify a substructure. These attempts are confused by areal borrowings. Thus in (b) we indicate similarities between Greek and Armenian, between Armenian and Sanskrit, and between Sanskrit and Baltic, some of which may be due to areal rather than genetic connection.

One universally recognized distinction among Indo-European languages is the split between Centum and Satem languages. In the charts the Satem branches are shown in yellow. It is generally agreed that the ancient Proto-Indo-European was a Centum language; Satem arose from a K-->S sound change.

Professor Ringe and others have used computer software to determine the detailed structure of the I-E family; their result is shown in (c). The branches are seen splitting off from a single core. The time between branchings must be small: if there were more than a few centuries between the first branching and the last, the structure would be clear and well-known, rather than controversial and clarified only after special statistical analysis.

Finally in (d) we pretend that I-E has a normal gradual fanout, like Uralic. We show Centum subfamilies in one branch and Satem subfamilies in another. While the Centum-Satem split (along with an initial split of Hittite from Indo-European proper) may be the closest thing I-E has to a major branching, even it is not given that role by most theorists. Some linguists accept an ``Armeno-Greek hypothesis,'' that Armenian and Greek are particularly close subfamilies, yet Armenian is Satem and Greek is Centum.
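
The difference between the explosive shape (a) and the pretended gradual shape (d) can be expressed numerically. The sketch below encodes both as nested dictionaries, using only the six branches shown in the charts. The internal node names in (d) are illustrative labels, and the two measures (largest fanout at any node, number of branching levels) are just one simple way to summarize tree shape.

```python
# A tree is a dict mapping a child's name to its own subtree; a leaf is {}.

# (a) sudden split into the six branches shown in the charts.
star = {name: {} for name in
        ["Hittite", "Italic", "Greek", "Armenian", "Sanskrit", "Baltic"]}

# (d) pretended gradual fanout: Hittite splits off first, then Centum vs. Satem.
# "IE proper", "Centum" and "Satem" are illustrative internal node labels.
gradual = {
    "Hittite": {},
    "IE proper": {
        "Centum": {"Italic": {}, "Greek": {}},
        "Satem": {"Armenian": {}, "Sanskrit": {}, "Baltic": {}},
    },
}

def max_fanout(tree):
    """Largest number of children at any single node."""
    if not tree:
        return 0
    return max(len(tree), *(max_fanout(child) for child in tree.values()))

def depth(tree):
    """Number of branching levels between the root and the deepest leaf."""
    if not tree:
        return 0
    return 1 + max(depth(child) for child in tree.values())

for label, tree in [("(a) star", star), ("(d) gradual", gradual)]:
    print(label, "max fanout:", max_fanout(tree), "depth:", depth(tree))
```

A high fanout with little depth is the signature of a rapid expansion like Malayo-Polynesian; a low fanout with more depth is the signature of a gradual one like Uralic.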

We will argue that the Anatolian and Balkano-Danubian hypotheses require a definite branching structure, like (d), and are incompatible with either a tree structure like (a), or one like (c).

Gimbutas' theory involves successive waves of invasion from the Ukrainian steppes to the Balkans or Central Europe. This is exactly compatible with diagram (c): ``Kurgan wave 1'' (ca 4300 BC) led to the splitoff of Hittite, ``Kurgan wave 2'' (ca 3600 BC) led to the splitoff of Greek, and ``Kurgan wave 3'' (ca 3000 BC) led to the final separation of the Satem subfamilies. This clearly locates the change from Centum to Satem in time and space: it occurred in the Pit Grave (Yamnaya) culture of the Pontic-Caspian steppes during the middle of the 4th millennium BC.

In the Gimbutas theory the close affinity of Greek and Armenian is no surprise: they each migrated southwest from the Kurgan homeland in Scythia, but a few centuries apart, Greek just before and Armenian just after the K-->S sound change.
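
The chronology above can be checked with a few lines of arithmetic. The wave dates are those given in the text; the single year used for the sound change is an assumption, reading ``the middle of the 4th millennium BC'' as 3500 BC. Armenian is left out because the text gives no date for it beyond ``just after'' the change.

```python
# Dates are years BC, so a larger number means an earlier date.
# Assumption: "middle of the 4th millennium BC" is taken to be 3500 BC.
SOUND_CHANGE_BC = 3500

# Departure dates from the Kurgan homeland, as given in the text.
splits_bc = {
    "Hittite": 4300,            # Kurgan wave 1
    "Greek": 3600,              # Kurgan wave 2
    "Satem subfamilies": 3000,  # Kurgan wave 3
}

def expected_type(split_bc, change_bc=SOUND_CHANGE_BC):
    """A branch that left before the K-->S change keeps the older Centum form."""
    return "Centum" if split_bc > change_bc else "Satem"

for branch, year in splits_bc.items():
    print(branch, year, "BC:", expected_type(year))
```

Under these assumptions Hittite and Greek come out Centum and the wave 3 subfamilies come out Satem, which is the pattern the theory needs.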

Anatolian and Balkano-Danubian Hypotheses

Because the Indo-European family fanned out so rapidly (see chart (a)) into language branches that show up from Ireland to India, one thing certain is that it was associated with a major technological or cultural change. An obvious candidate for such a change is the arrival of agriculture, and both the Anatolian and Balkano-Danubian hypotheses are based on that. That the language explosion didn't occur before the advent of farming is indicated by reconstructed words for farming terms represented across many I-E branches. (Actually there are also such reconstructed words for stockbreeding terms, which suggest the I-E explosion occurred even later.)

Agriculture was invented near the upper Euphrates River, crossed Anatolia and arrived in Greece by 7500 BC and southern France by 6500 BC. In the Anatolian hypothesis, it was these farming colonies that brought Indo-European language to Europe.

There was a delay, while Europe's climate warmed, before farming moved North. In addition to different climate, the different soil conditions required new techniques, like forest-clearing, crop rotation and, probably, manure fertilization by herding animals. From a gestation stage in the Balkans, an early pig/cow/cereal culture arose in Eastern Europe by 6000 BC, and a similar farming culture moved into the Danube Basin 5500 BC, reaching northern France before 4500 BC.

In the Balkano-Danubian hypothesis, the languages of the southern European farmers (possibly including Etruscan) have become extinct and the language of farmers in the Balkans and Danube basin was proto-Indo-European. We will focus on this hypothesis as less implausible than the Anatolian hypothesis.

The Balkano-Danubian Hypothesis is similar to the Anatolian, except that one doesn't bother to push the Homeland back past ca 5500 BC. The Danubian Linear Ware culture and affiliated Balkan cultures like Tripolye-Cucuteni spoke Indo-European in both these hypotheses. Eventually Indo-European was spoken by the Kurgan people of southern Russia, but Balkano-Danubists may differ as to whether the language was adopted in the early 6th millennium (Bug-Dniester pig-breeders lending language to D-Donetz horse-breeders) or early 4th millennium (Tripolye-Cucuteni lending language to Sredny Stog/Pit Grave).

Gimbutist Kurgan Theory

Kurganists would agree that the demic (farming) migrations of the 8th and 7th millennia BC probably led to a major linguistic thrust, from the eastern Mediterranean to southern and then central Europe, but the languages (call them Old European), though they were likely spoken by Lengyel, Tripolye, Cucuteni, have not survived, having been overwhelmed by westward thrusts from the Eurasian steppes.

The westward thrusts include:

Indo-European's westward thrusts (and later those of Phrygian/Thracians, Scythians, Huns, Magyars, Turks, Mongols) fit a historical pattern: from the broad steppes of Central Eurasia, mounted armies can overwhelm Europe's breadbasket. Invasions never go the other way: it would be like trying to push water the wrong way through a funnel.

The only pre-Indo-European language to survive in Europe is Basque, supposed to descend from mesolithic (Solutrean) people.

The Gimbutas Theory fits the facts like a glove. For every migration needed to explain the arrival of an I-E branch at an appropriate place and time, there is archaeological evidence of just such a migration. One doesn't have to guess when the early Greek speakers left the Kurgan homeland: one sees cultures like Usatovo in Romania that share cultural motifs with both Kurgans and the earliest Greeks, and occur at just the right time to fit Gimbutas' Kurgan Wave 2.

For competing theories, the appropriate migrations and intermediate cultures are missing: this means different versions of the Balkanist hypothesis will vary greatly in detail. Let's start by identifying a few facts that are agreed by all serious theorists.

Regarding this last point it should be noted that historic invasions from the steppes into Central Europe are almost too numerous to list: the Scythians, the Slavs, the Huns, the Magyars, the Mongols, the Turks, but there is no example of an invasion in the opposite direction. (Napoleon tried it after the invention of artillery but was defeated.) Moreover, there is much uncontroversial evidence of prehistoric invasions by Kurgans into the North European Plain, and to the West of the Carpathian Mountains, but no evidence of intrusion from the West into the Kurgan homeland.

Defenders of the Anatolian and Balkanist hypotheses base their case on three ideas:

The first point is not one most linguists would take seriously. If, as Gimbutas maintains, I-E overwhelmed a non-IE speaking Europe, it would undergo faster change than an I-E surrounded by I-E speakers. Language changed faster in prehistoric times, before liturgical and written language acted as a brake on change.

The second point underestimates the horse-riding Kurgan culture, with its military superiority, dominating social and religious motifs (even Balkanists have to admit that India was overwhelmed quickly, with the caste system and Hindu religion due to Indo-European intruders), and much greater stress on individual initiative compared with the farming villages of the Danubian culture.

The third point ignores that language replacement does happen. Celtic dominated West and Central Europe at the time of Caesar, yet disappeared from the Continent completely. (The Breton language is not a Continental language, but the result of back migration from Britain.) Anyway, if the Balkanists assume that Italic and Celtic subfamilies diverged soon after the Danubian expansion, good for them! As we will soon see, they will be ``hoist with their own petard!''

We mentioned earlier that Balkanists must assume that the Kurgans adopted Indo-European at some point. When did this happen? Different variations of the Balkan hypothesis can put this date as early as 6000 BC (the earliest East European farmers supplied I-E to the earliest Kurgans) or as late as 3000 BC (Kurgans finally adopted the European lingua franca just in time for the Indo-Iranian expansion). However none of the possible dates will be compatible with the I-E family tree structure (chart (a) or (c) above).

Because the Danubian farmers and Kurgan stockbreeders had completely different cultures and were isolated from each other, any theory that has both cultures speaking I-E before 4500 BC would require a clear division of I-E into 3 to 6 branches (in addition to Danubian, Kurgan and presumably Tocharian, one would postulate a few SE European branches to explain Hittite and perhaps Greek). The I-E tree does not have that character. If this isn't clear, figure out where Armenian and Greek would fit, remembering their close affinity and the close affinity of Armenian and Indo-Iranian. You may end up with a structure where I-E is all Kurgan except Hittite, Celtic and Italic, yet that doesn't match linguistic evidence.

Therefore the Balkanists need to assume that Kurgans adopted I-E after 4500 BC, after the I-E breakup was in progress. A powerful culture might adopt an alien lingua franca but the new language would surely be transformed greatly, preserving an old Kurgan substrate. Again there will be a clear distinction between the Kurgan and non-Kurgan branches of I-E (that is, something like chart (d)), and again this would not match the linguistic evidence. (Nor the cultural evidence, as Celtic preserves Kurgan horse-riding and horse-worship motifs, but must be a non-Kurgan language in any variation of the Balkanist hypothesis.)

Playing with the Balkanist hypothesis to make it fit the tree, one inevitably concludes that most of the I-E fanout occurred near Romania and Bulgaria during the Copper Age and early Bronze Age (during this era, the relatively homogeneous Balkano-Danubist culture of that area was replaced with a variety of new cultures). This is essentially the same time and place as the Gimbutas theory (thereby forfeiting the main raison d'être of the Balkanist theory: to give I-E an earlier more Westerly Homeland). The difference is that in Gimbutas' theory all I-E branches are due to Kurgan intrusion while the Balkanists contend that I-E was already spoken, that Kurgan invaders adopted the Balkan language. This might make sense if they didn't need to fit Indo-Iranian in. And what about Balto-Slavic: it's very close to Indo-Iranian, was it also Kurgan?

We have not yet mentioned Tocharian, the exotic branch of I-E located in China which is now extinct. This language occupies a position similar to Italic or Celtic in the I-E tree, and its culture shares motifs with Celtic. It poses no problem for Gimbutas (part of `Kurgan Wave 2' went East instead of West) but cannot be handled in any reasonable way by the Balkanists. Geographically it belongs with the Kurgan branch of I-E but that doesn't fit linguistic evidence. The Balkanists need to suppose an obscure very early migration from Central Europe to Asia, with no linguistic interaction with East Europe. Although Celtic and Italic were almost adjacent until historic times, with Tocharian many thousands of miles away, Tocharian, Celtic and Italic would be co-equal branches of what might be called ``Western I-E'' (there is only weak support for the so-called Italo-Celtic hypothesis).

Sherlock Holmes once said ``When you have eliminated the impossible, whatever remains, however improbable, must be the truth.'' However bizarre it may seem, the Kurgan horse-riders are indeed the source of the Indo-European languages now spoken all around the world.

