muckefuck | Why I don't know Hindi numbers, or, Non-compositionality FTW!

Despite my best intentions, I'm stalled in Hindi at the moment. I was working my way concurrently through two different textbooks and have hit the chapter on numbers in both of them. True to its memorisation-heavy ethos, the McGregor is like, "Here's the list; learn 'em all." Now, for most languages, this would be a trivial exercise. But when it comes to numbers, Hindi is not most languages.

Two things that virtually all languages have in common when it comes to number systems (Pirahã, please leave the room, you total freak): they all have a certain base (most commonly decimal, though other systems are attested and often there is some mixing), and the names of the numbers are compositional. That is, the names for larger numbers are created by combining the names of smaller ones in a predictable fashion. A few languages are, for all intents and purposes, 100% compositional. Chinese is a good example of such a language. "21" is expressed as "two tens one" (二十一). Where this isn't the case, the exceptions tend to come early. Most Western European languages, for instance, are non-compositional in the ones up until some point in the teens and in the tens up until 100. (I haven't met a language yet that was non-compositional in its hundreds, but I'm sure one exists.)
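To make "compositional" concrete, here is a minimal Python sketch of how the Chinese names for 1 to 99 can be built from nothing but the nine digit words and the word for ten. The function name chinese_number and the table names are my own inventions for illustration, and the sketch assumes the usual convention that the teens are just "ten one", "ten two", etc., without a leading "one".

```python
# Digit words for 1-9; index 0 is empty because a zero in the ones place is not spoken.
DIGITS = ["", "一", "二", "三", "四", "五", "六", "七", "八", "九"]
TEN = "十"


def chinese_number(n: int) -> str:
    """Build the Chinese name of n (1-99) purely by composition."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1 through 99")
    tens, ones = divmod(n, 10)
    if tens == 0:
        prefix = ""                   # 1-9: just the digit word
    elif tens == 1:
        prefix = TEN                  # 10-19: "ten", "ten one", ...
    else:
        prefix = DIGITS[tens] + TEN   # 20-99: "two tens", "three tens", ...
    return prefix + DIGITS[ones]


print(chinese_number(21))  # 二十一, i.e. "two tens one"
```

Ten table entries and one rule cover all ninety-nine names, which is the whole point: nothing above ten has to be memorised separately.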

For instance, Spanish numbers are non-compositional up through quince "15"; dieciséis is nothing more than a respelling of diez y seis "ten and six". Catalan, on the other hand, has setze and then continues on with semi-compositional disset, divuit, dinou (cf. deu "ten"). Whether English is compositional in the teens or not depends on how much leeway you allow. If you're willing to call "teen" just a variant of "ten", then "fifteen" is also the last irregular number; otherwise, they're non-compositional up through 20.

Hindi is unlike every other language I have learned in that every number below 100 is completely non-compositional. That is, knowing the words for "20" (बीस bees) and "1" (एक ek) will not, in any straightforward manner, give you the name of "21" (इक्कीस ikkees). Sure, there are patterns--e.g. all the twenties end in -ीस ees, but then so do all the thirties and forties as well (e.g. इकतीस ikatees "31", इकतालीस iktaalees "41", etc.)--but there are so many irregularities that they're of limited use in remembering any particular number.
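The same point as a small Python sketch, using only the romanised forms quoted above. The naive_compose rule is a deliberately hypothetical "ones + tens" concatenation, not anything a grammar would propose; it is there only to show that gluing the parts together does not give you the attested word, and that the shared ending does not tell you which decade you are in.

```python
# Romanised Hindi forms taken from the paragraph above.
ATTESTED = {
    1: "ek",
    20: "bees",
    21: "ikkees",
    31: "ikatees",
    41: "iktaalees",
}


def naive_compose(tens: int, ones: int) -> str:
    """Hypothetical rule: name of the ones followed by name of the tens."""
    return ATTESTED[ones] + ATTESTED[tens]


guess = naive_compose(20, 1)
print(guess, "vs", ATTESTED[21])  # the guess does not match the real word

# The -ees ending is shared across decades, so it cannot identify the decade.
print(all(ATTESTED[n].endswith("ees") for n in (21, 31, 41)))
```

In practice this means the only reliable "implementation" of Hindi 1 to 99 is a lookup table with an entry for every number.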

What happened here? Well, essentially the same thing that happened in languages like English, only on a larger scale. Our numbers were apparently once 100% compositional as well (or at least closer to it than they are now). The "-teen" of "thirteen" and the "-ty" of "thirty", for instance, are both cognates of Proto-Germanic *teXan which owe their different outcomes to differences in stress and inflection in earlier times, and the "thir-" element is simply a metathesised and shortened variant of "three".

Similarly, the Sanskrit teens are transparent compounds of the first nine numbers with daśa "ten", i.e. ekādaśa, dvādaśa, trayodaśa, caturdaśa, etc. (Although there is already a bit of variance here due to the rules of ablaut, e.g. the independent form of "4" is catvāraḥ.) However, later sound change has regularly turned the /d/ of daśa to /r/ between vowels and weakened the /ś/ to /h/, yielding the opaque modern forms ग्यारह gyaarah, बारह baarah, तेरह terah, चौदह chaudah (where the /d/ survives because it did not stand between vowels).

The only real difference between the two systems is that in the Indo-Aryan languages (Panjabi is the same), these changes took place not just in the teens but in every single number up to 100. Language change is a constant struggle between the forces of regular sound change (which seek to generalise phonological rules to all situations in which they could possibly apply) and those of analogy (which seek to preserve existing patterns of inflexion and derivation). Sometimes analogy wins out and we end up saying "twenty-one", "thirty-one", "forty-one" instead of *"twentiun", *"thirtiun", *"fortiun". But other times it loses, and you have no choice but to memorise ikkees, ikatees, iktaalees.