Reading in Hindi and Urdu

(Note that I wrote about a similar topic almost three and a half years ago.)

I’ve been working on learning how to speak Hindi¹ for the last four or five years, and I’m now at a point where I can actually have simple conversations, not just a series of stilted question and answer exchanges. I have a bit of a silly accent, and I still struggle to understand when people speak quickly or use lots of slang. I can understand bits and pieces of dialogue in Bollywood movies, but I generally miss out on details or double-entendres. That said, I am the type of person who struggles to understand dialogue even in English-language movies, so I don’t use my ability to understand Bollywood movies as a reliable measure of my skill as a listener of Hindi.

I can read and write both Devanagari and Arabic scripts as used to write Hindi and Urdu respectively, but I am very slow. For a while I was trying to read Harry Potter in Hindi, and I could read a single page in about two or three minutes. I’m even slower with Urdu. For me, reading Urdu is especially hampered by the fact that it is generally written using the Nastaliq calligraphic form, which is not what I was originally trained to read in my first year at University when I briefly studied Arabic. To me, Nastaliq is the most beautiful script in the entire world. I love its swooping, almost exaggerated lines. I love that it’s written on a bias. Opening the first Urdu book I bought (1984, incidentally), I remember flipping through the pages, tracing my finger over the words, delighted as I painstakingly sounded out words.

Part of the reason why I’m such a slow Hindi/Urdu reader is that I can’t recognize many words on sight, which means that I end up sounding out 90+% of the words on the page. I’m sure I did this at some point when I was learning to read in English, but I genuinely don’t remember a time before I could read. It feels like something that has always been part of me, like walking or speaking (both learned skills, I know!). What’s interesting is that when I started reading a lot in Portuguese (when I was 15 or 16) I could read at about eighty to ninety percent the speed I could read in English. Granted, I am a more proficient Portuguese speaker than Hindi speaker, but it strikes me that sight-reading isn’t as simple as having a big lookup table of word shapes in your head. Could it be that my brain is able to map unfamiliar written word shapes to known words given all my experience with written English?

I wonder if consonant/vowel grouping has anything to do with this. I haven’t done any extensive analysis on this, but I’m pretty sure that English and Portuguese have similar consonant/vowel groupings in words as compared to German or Czech. What I mean by a “consonant/vowel grouping” is the smallest group of consonants and vowels that produce a meaningful sound; take “sha”, “chi”, or “ik” as examples. Polish or Czech will often have several consonants following one another in a way that would never occur in English or Portuguese. My guess is that the way these little consonant/vowel groups cluster together to form words is more similar in Portuguese and English as compared to English and Czech. I wonder if the fact that both English and Portuguese are not highly agglutinative languages helped my Portuguese fluency because my brain was predisposed to seeing shorter words on the page. Moreover, I wonder if the shared Latin roots between many English and Portuguese words gave me an added advantage when learning to read in Portuguese.
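One rough way to start poking at the consonant-cluster hunch is to measure the longest run of consonant letters in a word. The sketch below is only that: `longest_consonant_run` is a hypothetical helper, it works on spelling rather than sound (so a digraph like “sh” counts as two consonants), and treating “y” as a vowel is an assumption.

```python
import unicodedata

VOWELS = set("aeiouy")  # assumption: "y" is treated as a vowel


def longest_consonant_run(word: str) -> int:
    """Return the length of the longest run of consonant letters in a word."""
    # Decompose accented letters so "é" becomes "e" plus a combining accent.
    decomposed = unicodedata.normalize("NFD", word.lower())
    longest = current = 0
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # skip the accent mark itself
        if ch.isalpha() and ch not in VOWELS:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Sample words chosen for illustration only, not a real corpus.
for sample in ["telefone", "window", "zmrzlina"]:
    print(sample, longest_consonant_run(sample))
```

Running something like this over a real word list for each language would be the actual experiment; three hand-picked words prove nothing on their own.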

Knowing how to read both Hindi and Urdu allows for some interesting comparisons between the two registers’ writing systems.

Hindi is orthographically shallow, while Urdu is orthographically deep.

One of the cool things about Devanagari (the writing system used for Hindi) is that there is basically a one-to-one correspondence between how words are written and how they’re pronounced; Hindi is an orthographically “shallow” language. English, on the other hand, is an orthographically “deep” language, meaning that there isn’t always a great correspondence between the word on the page and what comes out of your mouth; take the word “Worcestershire” (pronounced something like “wustasher”) as an example.

I think part of the reason why Hindi is so orthographically shallow is because Devanagari is not an alphabet, but rather an abugida. In alphabets, consonants and vowels are represented as separate letter forms. In abugidas, vowels can stand on their own, but consonants have attached to them some vowel sound². We can also group consonants together to form composite consonant sounds, like “ksh” (“क्श”/“क्ष”), formed from mushing together the letters for “k” (“क”) and “sh” (“श”). Devanagari allows for incredible precision in expressing phonology (how things sound) orthographically, which I think contributes significantly to Hindi’s orthographic shallowness. This is not to say that an abugida is a prerequisite for orthographic shallowness; Spanish is also very orthographically shallow and happily hums along with the Latin alphabet.
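To make the “mushing together” concrete, here is a small sketch of how a conjunct is assembled in Unicode text: consonant, then the virama (halant) sign that suppresses the attached vowel, then the next consonant. The `conjunct` helper is a name made up for illustration; the code points are the standard Unicode ones. Note that the usual “क्ष” is built with the retroflex “ष” rather than “श”.

```python
import unicodedata

KA = "\u0915"      # क
SHA = "\u0936"     # श
SSA = "\u0937"     # ष (the retroflex "sh")
VIRAMA = "\u094D"  # ् removes the vowel attached to the preceding consonant
VOWEL_I = "\u093F"  # ि dependent vowel sign "i"


def conjunct(*consonants: str) -> str:
    """Join consonants with a virama so they render as one cluster."""
    return VIRAMA.join(consonants)


ksha = conjunct(KA, SSA)  # क्ष
ksh_alt = conjunct(KA, SHA)  # क्श

# List the code points hiding inside the single visual cluster.
for ch in ksha:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# A dependent vowel sign replaces the attached vowel: "ka" becomes "ki".
print(KA, "->", KA + VOWEL_I)
```

What looks like one glyph on the page is three code points underneath, and the fonts do the mushing.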

One thing that isn’t generally part of written Hindi is where to put stress in a word. In Spanish and Portuguese we can add accents to indicate where we should put stress. My favorite example is the word “teléfono” in Spanish. By default in Spanish, we put the stress on the penultimate syllable in a word, but that would sound super weird in the case of teléfono! I know that Devanagari has a means of expressing where to put stress in words, but my understanding is that it is only used in liturgical settings.

Urdu, being written using a modified version of the Arabic writing system, is very orthographically deep. In other words, there isn’t always a great correspondence between what’s on the page and what you say. Arabic is not an alphabet but rather an abjad (the name “abjad” actually comes from the first four letters of the Arabic writing system in their traditional order), which means that vowels are not really included. Arabic has diacritics for vowels, but these are only used in either teaching or liturgical settings. For the most part when reading Urdu, you see a series of consonants, with some long vowels. You basically have to know all words by sight, because the written form could be pronounced in any number of different ways. At first glance, this might seem problematic – how are we supposed to know how to pronounce the words on a page? It turns out that you don’t really need to have vowels written out explicitly to know what’s going on in a text (See here for a meme that exemplifies how this might play out in English). Sometimes words with different pronunciation might have the same written form, but it is generally possible to disambiguate based on context.
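Here is a small sketch of that ambiguity, using a pair I’m fairly confident about: “is” (roughly “this”) and “us” (roughly “that”), which differ only in a short vowel mark. `strip_vowel_marks` is a hypothetical helper, and dropping every nonspacing mark is a simplification of what everyday Urdu writing actually omits.

```python
import unicodedata


def strip_vowel_marks(text: str) -> str:
    """Drop nonspacing marks (category Mn), which includes the short-vowel signs."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


ALIF = "\u0627"   # ا
SEEN = "\u0633"   # س
KASRA = "\u0650"  # short "i" mark (zer)
DAMMA = "\u064F"  # short "u" mark (pesh)

word_is = ALIF + KASRA + SEEN  # fully marked "is"
word_us = ALIF + DAMMA + SEEN  # fully marked "us"

# Without the marks, both words collapse into the same two letters.
print(strip_vowel_marks(word_is) == strip_vowel_marks(word_us))
```

On the page both look identical, and it falls to the reader and the surrounding sentence to pick the right one.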

Having spent some time with orthographically shallow and deep languages (or language registers), one thing I’ve wondered is whether or not orthographic depth has any effect on rates of dyslexia in a population. It seems to me that investigating rates of dyslexia in Urdu- and Hindi-speaking populations might help answer this question, provided all sorts of other sociological factors could be controlled for (I’ll leave this experiment up to my social scientist friends!).

In Urdu, retroflex and aspiration are orthographic “add-ons” while in Hindi they’re different letters altogether.

Hindi/Urdu makes use of retroflex and aspirated sounds. In Hindi, we have a “d” sound, similar to English, but we also have three other varieties that either don’t exist in English or we don’t distinguish in English. The base “d”, “द”/“د”, is similar to the “d” in my name, except that you really push the tip of your tongue against the back of your teeth³. The aspirated version “ध”/“دھ” is the same, but you let out a little puff of air at the end. One way to tell if you’re doing it right is to hold your hand in front of your mouth; if you feel a small, hot jet of air hit your hand, you’ve successfully produced an aspirated consonant! In English we don’t distinguish between aspirated and non-aspirated consonants, so it can be very difficult to hear the difference between the two. Here’s an example of when aspiration is important: the words for pale (“गोड़ा”) and horse (“घोड़ा”). The only difference between the two is the fact that the first one starts with an unaspirated “g” sound while the second starts with an aspirated “g”.

Okay, on to retroflex. This is a little trickier, because these are a set of sounds that I don’t generally make when speaking my English. Continuing with the “d” example from above, we can make it retroflex by sort of folding our tongues backwards until the tip is touching the middle of the mouth. In this position, try to make a “da” sound. In Hindi, this is represented with “ड” while in Urdu we have “ڈ”. We also have an aspirated retroflex, this time written with “ढ” and “ڈھ” respectively.

Notice that in Hindi, we represent aspirated/retroflex consonants with entirely different letters, while in Urdu we add either diacritics or an additional letter to signal these properties. I’ll make a little chart to make things clearer:

	Hindi	Urdu
unaspirated, not retroflex	द	د
aspirated, not retroflex	ध	دھ
unaspirated, retroflex	ड	ڈ
aspirated, retroflex	ढ	ڈھ

So, in Urdu, to make a letter aspirated, we just slap on a “ھ” after the letter and to make it retroflex we add a little “ط” on top of the letter.
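The chart above can also be looked at in terms of Unicode code points, which is a minimal sketch rather than a claim about how either script “really” works. One wrinkle: although the retroflex “ڈ” looks like “د” with a small “ط” on top, Unicode encodes it as its own letter, so only aspiration shows up as an extra code point. The names `D_SERIES` and `code_points` are made up for this example.

```python
# Each entry maps a description to (Hindi, Urdu) spellings of the "d" series.
D_SERIES = {
    "unaspirated, not retroflex": ("\u0926", "\u062F"),        # द / dal
    "aspirated, not retroflex": ("\u0927", "\u062F\u06BE"),    # ध / dal + do-chashmi he
    "unaspirated, retroflex": ("\u0921", "\u0688"),            # ड / ddal
    "aspirated, retroflex": ("\u0922", "\u0688\u06BE"),        # ढ / ddal + do-chashmi he
}


def code_points(text: str) -> list[str]:
    """Return the code points of a string in U+XXXX form."""
    return [f"U+{ord(ch):04X}" for ch in text]


for label, (hindi, urdu) in D_SERIES.items():
    print(f"{label}: Hindi {code_points(hindi)} | Urdu {code_points(urdu)}")
```

Every Hindi entry is a single code point, while the aspirated Urdu entries are the base letter followed by U+06BE.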

I should note that early South Asian linguists were acutely aware of how to classify and cluster sounds and the order of the Devanagari abugida reflects this; “ध” comes right after “द” for example.
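Because Unicode’s Devanagari block follows that traditional ordering, the grid of stop consonants can be rebuilt just by counting code points. This is a sketch under that assumption; `varga` and `VARGAS` are names I made up, and the English row labels are the usual phonetic descriptions rather than anything encoded in the script.

```python
def varga(first: str) -> list[str]:
    """Return the five consecutive letters of a row, starting at `first`."""
    start = ord(first)
    return [chr(start + offset) for offset in range(5)]


# Columns within each row: plain, aspirated, voiced, voiced aspirated, nasal.
VARGAS = {
    "velar": varga("\u0915"),      # क ख ग घ ङ
    "palatal": varga("\u091A"),    # च छ ज झ ञ
    "retroflex": varga("\u091F"),  # ट ठ ड ढ ण
    "dental": varga("\u0924"),     # त थ द ध न
    "labial": varga("\u092A"),     # प फ ब भ म
}

for place, letters in VARGAS.items():
    print(f"{place:>9}:", " ".join(letters))
```

Each row is one place in the mouth, and moving one column to the right of “द” or “ड” adds aspiration, which is exactly the pairing described above.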

¹ I’m very loose with the terms “Hindi” and “Urdu”, but generally speaking when I talk about “Hindi” I’m talking about the version of Hindustani/Hindi/Urdu that is written using the Devanagari script and whose vocabulary leans more Sanskrit than Persian/Arabic. “Urdu” is the version that is written using a modified version of the Arabic script and whose vocabulary leans more Persian/Arabic. Depending on the context, I’ll say that I’m speaking Hindi or Urdu without really making any changes to how I’m speaking the language. Other times I’ll hear people speaking Urdu that has so many Persian/Arabic words that I struggle to understand what’s going on – the same is true of Hindi.
² There is a “base case” when a consonant doesn’t have an apparent attached vowel. This means that you add a schwa. In Hindi this means adding a short “a” sound.
³ I should note that I speak midwestern US-American English, so when I’m talking about making sounds in English, this is the place I’m coming from. Your variety of spoken English may involve different tongue placement when producing the same written letter. Whether English is your native language or your third language, the way you speak it is every bit as valid as my way of speaking it, and don’t let anyone tell you different. Fuck cultural hegemonies and fuck imperialism.