muckefuck wrote 2008-11-13 09:45 am

Why I don't know Hindi numbers, or, Non-compositionality FTW!

Despite my best intentions, I'm stalled in Hindi at the moment. I was working my way concurrently through two different textbooks and have hit the chapter on numbers in both of them. True to its memorisation-heavy ethos, the McGregor is like, "Here's the list; learn 'em all." Now, for most languages, this would be a trivial exercise. But when it comes to numbers, Hindi is not most languages.

Two things that virtually all languages have in common when it comes to number systems (Pirahã, please leave the room you total freak): They all have a certain base (most commonly decimal, though other systems are attested and often there is some mixing) and the names of the numbers are compositional. That is, the names for larger numbers are created by combining the names of smaller ones in a predictable fashion. A few languages are, for all intents and purposes, 100% compositional. Chinese is a good example of such a language. "21" is expressed as "two tens one" (二十一). Where this isn't the case, the exceptions tend to come early. Most Western European languages, for instance, are non-compositional in the ones up until some point in the teens and in the tens up until 100. (I haven't met a language yet that was non-compositional in its hundreds, but I'm sure one exists.)
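
To make "compositional" concrete, here is a minimal Python sketch of the Chinese pattern for 1 through 99. The function name chinese_number and the 1-99 limit are choices made for this illustration, not anything from a textbook. The one wrinkle it has to handle is that everyday usage drops the leading 一 in the teens.

```python
# Digit names for 1-9; index 0 is empty so a round ten adds nothing after 十.
DIGITS = ["", "一", "二", "三", "四", "五", "六", "七", "八", "九"]
TEN = "十"


def chinese_number(n: int) -> str:
    """Build the Chinese name of n (1-99) purely from digit names plus 十."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only covers 1-99")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return DIGITS[ones]
    # Everyday usage drops the leading 一 for 10-19 (十一, not 一十一).
    prefix = "" if tens == 1 else DIGITS[tens]
    return prefix + TEN + DIGITS[ones]


print(chinese_number(21))  # 二十一, i.e. "two tens one"
```

Ten memorised words and one rule cover all ninety-nine numbers, which is the whole point of a compositional system.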

For instance, Spanish numbers are non-compositional up through quince "15"; dieciséis is nothing more than a respelling of diez y seis "ten and six". Catalan, on the other hand, has setze and then continues on with semi-compositional disset, divuit, dinou (cf. deu "ten"). Whether English is compositional in the teens or not depends on how much leeway you allow. If you're willing to call "teen" just a variant of "ten", then "fifteen" is also the last irregular number; otherwise, they're non-compositional up through 20.

Hindi is unlike every other language I have learned in that every number below 100 is completely non-compositional. That is, knowing the words for "20" (बीस bees) and "1" (एक ek) will not, in any straightforward manner, give you the name of "21" (इक्कीस ikkees). Sure there are patterns--e.g. all the twenties end in -ीस ees, but then so do all the thirties and forties as well (e.g. इकतीस ikatees "31", इकतालीस iktaalees "41", etc.)--but there are so many irregularities, they're of limited use in remembering any particular number.

What happened here? Well, essentially the same thing which happened in languages like English, only on a larger scale. Our numbers were apparently once 100% compositional as well (or at least closer to it than they are now). The "-teen" of "thirteen" and the "-ty" of "thirty", for instance, are both cognates of Proto-Germanic *teXan which owe their different outcomes to differences in stress and inflection in earlier times, and the "thir-" element is simply a metathesised and shortened variant of "three".

Similarly, the Sanskrit teens are transparent compounds of the first nine numbers with daśa "ten", i.e. ekādaśa, dvādaśa, trayodaśa, caturdaśa, etc. (Although there is already a bit of variance here due to the rules of ablaut, e.g. the independent form of "4" is catvāraḥ.) However, later sound change has regularly turned the /d/ of daśa to /r/ between vowels and aspirated the /ś/, yielding the opaque modern forms ग्यारह gyaarah, बारह baarah, तेरह terah, चौदह chaudah.

The only real difference between the two systems is that in the Indo-Aryan languages (Panjabi is the same), these changes took place not just in the teens but in every single number up to 100. Language change is a constant struggle between the forces of regular sound change (which seek to generalise phonological rules to all situations in which they could possibly apply) and those of analogy (which seek to preserve existing patterns of inflexion and derivation). Sometimes analogy wins out and we end up saying "twenty-one", "thirty-one", "forty-one" instead of *"twentiun", *"thirtiun", *"fortiun". But other times it loses, and you have no choice but to memorise ikkees, ikatees, iktaalees.

pne.livejournal.com 2008-11-13 09:55 pm (UTC)
Let's hope they go the Tongan route and chuck their previous system for a compositional one. (ISTR that Welsh also did this, except for telling time, or something. And Tongan was compositional; they just went for an even simpler version -- IIRC, something like "four-tens and seven" to "four seven".)

I remember someone (talinthas?) saying that even native speakers have problems with the numbers in India.

muckefuck.livejournal.com 2008-11-13 10:25 pm (UTC)
Yes, the new system is like the Chinese, e.g. pedwardeg saith "fourten seven" = "47". But I have such fondness for the old one with its vigesimal relics that I'll be saying saith a deugain (not to mention pedwar ar bymtheg ar bedwar hugain) until I die.

Similarly, Modern Irish has seen a revival of (Latin-influenced?) non-vigesimal terms for the tens, e.g. nócha a naoi for older ceithre fichid a naoi déag.

pne.livejournal.com 2008-11-13 09:58 pm (UTC)
(I haven't met a language yet that was non-compositional in its hundreds, but I'm sure one exists.)

I give you quinientos.

(My first thought was Maltese mitejn "200", but that's a fairly straightforward dual, so probably doesn't really count against compositionality, but then I remembered quinientos from, of all things, a Donald Duck comic in German I read many, many years ago, where one of the characters asked for quinientos taleros -- I only found out years later what that number word meant.)

muckefuck.livejournal.com 2008-11-13 10:06 pm (UTC)
Wow, I don't remember ever coming across the word before (although there are some languages where "half thousand" is preferred to "five hundred(s)").

gorkabear.livejournal.com 2008-11-13 11:33 pm (UTC)
Don't quote me on this, but quinientos sounds too close to quinto, to me, which is the ordinal of 5. Add the fact that we use C for the Z (TH) sound because it's etymologically related to the C = K sound somewhat. So cinco / quinto / quinientos is semi-compositional, imho.

pne.livejournal.com 2008-11-14 09:42 am (UTC)
Well, all the Hindi numbers are semi-compositional, too (rather than, say, completely random) -- but they're still a headache to remember simply because they're not *completely* compositional.

muckefuck.livejournal.com 2008-11-14 03:14 pm (UTC)
My test for (full) compositionality is: Given the components (in this instance, cinco and ciento) plus any rules of inflection and phonological adjustment (e.g. ciento -> cientos), can a speaker reliably produce the correct form? That's not the case with quinientos. Given that word, you can work out the relationship to quinto and such, but you'd never guess that form existed based on the information above and the examples of un ciento (not *uniento), doscientos, trescientos, etc.
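
That test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "rule" is the naive one of gluing the unit name onto cientos, the names predicted_hundred and failing_hundreds are invented here, and the attested forms are the standard Spanish hundreds from 200 to 900.

```python
# Unit names a speaker already knows.
UNITS = {2: "dos", 3: "tres", 4: "cuatro", 5: "cinco",
         6: "seis", 7: "siete", 8: "ocho", 9: "nueve"}

# The forms Spanish actually uses.
ATTESTED = {200: "doscientos", 300: "trescientos", 400: "cuatrocientos",
            500: "quinientos", 600: "seiscientos", 700: "setecientos",
            800: "ochocientos", 900: "novecientos"}


def predicted_hundred(multiplier: int) -> str:
    """Naive compositional guess: unit name + 'cientos' (assumed rule)."""
    return UNITS[multiplier] + "cientos"


def failing_hundreds() -> list[int]:
    """Return the hundreds where the naive guess misses the attested form."""
    return [m * 100 for m in sorted(UNITS)
            if predicted_hundred(m) != ATTESTED[m * 100]]


print(failing_hundreds())  # [500, 700, 900]
```

Under this naive rule 700 and 900 fail as well, but only by a predictable vowel change; quinientos is the one a speaker could not reconstruct from cinco at all.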

gorkabear.livejournal.com 2008-11-14 04:19 pm (UTC)
un ciento = cien
dos cientos = doscientos
...
cinco cientos = quinientos

uh oh... This makes this non-compositional then!

gorkabear.livejournal.com 2008-11-13 11:34 pm (UTC)
Hey, how about 70 and 90 in Metropolitan French? soixante-dix & quatre-vingt-dix... and 80 = quatre-vingts

Compositional but on base 20!

(to piss off my bf I use the Belgian and Swiss versions, septante, huitante and nonante, which are like what the rest of us Romance language speakers use)

pklexton.livejournal.com 2008-11-14 05:08 am (UTC)
Danish too - they use base twenty from 50-99.

http://en.wikipedia.org/wiki/Vigesimal

Curiously, the other Scandinavians don't. And even the Danes use "femti" instead of halvtreds (shortened from "halvtredsindstyve" or halfthirdtimestwenty, i.e. 2 and a half times twenty) on the 50 kroner note and in that lowest-common-denominator Scandinavian patois spoken in SAS airport lounges.

richardthinks.livejournal.com 2008-11-14 01:46 pm (UTC)
At least there are no measure words to worry about, right?

muckefuck.livejournal.com 2008-11-14 03:19 pm (UTC)
Honestly, those are easier. Only the Japanese ones really have much in the way of unpredictable adjustments.

richardthinks.livejournal.com 2008-11-14 03:55 pm (UTC)
That's not what my friend who's learning Thai says. He has a big chart on his wall of the categories of objects that take different counters and I'm damned if I can make head or tail of it. The Chinese at least seems reasonably rational.

muckefuck.livejournal.com 2008-11-14 04:09 pm (UTC)
That's because the category structure of classifiers is non-Aristotelian. Lakoff goes into this in great detail in his book Women, Fire, and Dangerous Things (the title is a description of the contents of a noun class in the Dyirbal language). Your friend owes it to himself to read it quite apart from how it might aid him in learning Thai. (I'd assign it to you, too, but you should be writing your goddamned thesis instead.)

richardthinks.livejournal.com 2008-11-14 07:36 pm (UTC)
I've often been tempted by the book: I've even had it on my shelf a few times, but I've always had to, had to, had to read the more directly relevant stuff first. Right now it's TBR after Bruno Latour's We Have Never Been Modern.
God knows how other people manage to cover broad literatures and all that.

talinthas.livejournal.com 2008-11-17 03:53 am (UTC)
Gujarati is similar, but not until the 40s. 1 is ek, and twenty is vees, so 21 is ekvis. threes for 30 is regular as well. But when you get to chalis for 40, you start to run into problems. 41 is ektalis, 42 betalis, 43 tetalis, 44 chumalis (!), 45 petalis, and so on. The sixties and seventies are even more ridiculous.

It was a pain in the ass trying to learn this growing up, let me tell you.

lhn.livejournal.com 2008-11-20 07:59 am (UTC)
(Pirahã, please leave the room you total freak)

Naturally, never having heard of Pirahã before, it was inevitable that I'd see a newspaper story relating to them (and touching on their language and some of its unusual features) within a week.

zompist.livejournal.com 2008-11-22 12:06 am (UTC)
That is really neat.

My collection of number system oddities is here: http://www.zompist.com/families.htm

I think my favorite is Kewa, which is base 24. Many number systems are based on counting fingers... the Kewa start there but keep going, up the arms and head, ending between the eyes.