May. 14th, 2003 02:02 pm
Pet peeves only I have
Here's the "linguistically-themed rant" that I promised weeks ago. You know how it is, I've had so many fascinating things to say lately that I just didn't get to it before now.
So I'm reading a short story in that Korean anthology I bought and one of the characters mentions harvesting a root called todok. I'm curious what this is and I have the big fat definitive Korean-English dictionary to tell me.
Except that I realise from the transcription of Korean names in the table of contents that all diacritics are being heartlessly discarded. So it's not sufficient to look up todok in Martin's dictionary. Either or both of the o's could be ŏ's (that is, o with a breve, representing a sound much like the u in American English tuck) and the t might or might not be aspirated. All told, there are eight possible proper spellings of the word.
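For anyone who wants to check my arithmetic, here's a quick Python sketch of where the eight comes from: two choices for the initial (plain t or aspirated t'), times two for each of the vowels. The names `initials`, `vowels`, and `candidates` are just mine for illustration, and the sketch assumes the d and the k are never in doubt.

```python
from itertools import product

# Each ambiguous slot in the stripped spelling "todok" has two readings.
initials = ["t", "t'"]   # plain vs. aspirated
vowels = ["o", "ŏ"]      # with or without the breve

# Assumption: the medial d and final k are unambiguous, so only
# three slots vary, giving 2 * 2 * 2 candidates.
candidates = [
    f"{initial}{first}d{second}k"
    for initial, first, second in product(initials, vowels, vowels)
]

for word in candidates:
    print(word)
print(len(candidates), "spellings to look up")
```

That's every spelling I'd have to try in the dictionary, whether or not a real word is hiding behind it.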
So what do I do? I look in all eight places. Only half of these possible words actually exist; as luck would have it, the first alphabetically, tŏdŏk, is the one I want. (It's a relative of the bellflower without an English common name, Codonopsis lanceolata.) But that wasn't the first I came to. This was todok "strong poison", followed by todŏk "morality". The fourth near-homonym is t'ŏdok (an onomatopoeic word expressing the notion of plodding).
I don't understand this sloppiness. Sure, any Korean reader rudimentarily acquainted with McCune-Reischauer is going to know which word is intended, just as English speakers can recognise their mother tongue in katakana drag--even when it assumes such outlandish forms as gyaru and kyandee. But what of the reader with a little learning? If she tries to ask a Korean, the result will probably be confusion. "T'odok? We don't eat anything called t'odok."
But it's easier on the editor, printer, and the average reader not to have to worry about diacritics, right? I guess. It'd also be easier for the editor or printer if they didn't have to worry about consistency in romanisation at all or, what the hell, consistency in English spelling. It would also be wrong.
Korean isn't the only language this happens to, of course. Japanese has got a tremendous number of homophones distinguished only by their kanji spellings--which means that they all become homonyms in romanisation. It also has a phonemic contrast between short vowels and long (the latter generally transcribed with a macron in Hepburn). Dump the macron--which plenty of writers do--and you at least double the number of homonyms. And let's not even talk about what chaos eliminating all the hooks, accents, carons, and breves introduces into Vietnamese.
There once was a day when diacritics were a true pain in the ass to reproduce. Non-specialty printers just didn't have the distinctive type necessary. Now, even without the full implementation of Unicode, means abound. I don't understand why a respectable printing house would choose not to employ any of them. But even my beloved Economist is under the impression that diacritics are only for French and German and every other language community--including 400 million Spanish-speakers, for Chrissakes!--can just go screw themselves. This is why it was months before I discovered that Erdogan is really Erdoğan and the prime minister of the Czech Republic is actually Špidla.
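To make the loss concrete, here's a rough Python sketch of what stripping diacritics amounts to. I have no idea how any actual publisher does it; this just uses the standard library's `unicodedata` module to decompose each letter and throw away the combining marks. The helper name `strip_diacritics` is my own invention, and dropping the apostrophe as well is my assumption about what the anthology did.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Decompose each character, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("Erdoğan"))  # the breve on the g disappears
print(strip_diacritics("Špidla"))   # so does the caron on the S

# The four real Korean words from the dictionary search.
words = ["tŏdŏk", "todok", "todŏk", "t'ŏdok"]

# Assumption: the aspiration apostrophe is discarded along with the breves.
flattened = {strip_diacritics(w).replace("'", "") for w in words}
print(flattened)
```

Four distinct words go in and a single spelling comes out, and nothing in the output tells you which one you started with.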
(And this from a girl who, on her first day as one of Monshu's Merrie Bande of Catalogers, had to sheepishly ask what a diacritic was!)