Learning Languages

The language of a foreign text can often be identified by looking up characters specific to that language.
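As a rough illustration of this lookup, the sketch below checks a text against a small hand-picked subset of the characters in the chart. The table `HINT_CHARACTERS` and the function `candidate_languages` are hypothetical names introduced here, and the subset is an assumption made for brevity. A match is only a hint, since several characters are shared between languages (the chart lists ñ under Spanish, Basque and Guarani, for example).

```python
import unicodedata

# Illustrative subset of the chart, not the full table (assumption for brevity).
_RAW_HINTS = {
    "German": "ßẞ",
    "Turkish": "ğıİ",
    "Hungarian": "őű",
    "Esperanto": "ĉĝĥĵŝŭ",
    "Spanish": "ñ¡¿",
}

# Normalise to NFC so precomposed and decomposed input compare equal.
HINT_CHARACTERS = {
    lang: set(unicodedata.normalize("NFC", chars))
    for lang, chars in _RAW_HINTS.items()
}


def candidate_languages(text):
    """Return the languages whose hint characters occur in text, sorted by name."""
    chars = set(unicodedata.normalize("NFC", text))
    return sorted(lang for lang, hints in HINT_CHARACTERS.items() if chars & hints)


print(candidate_languages("Straße"))
```

An empty result means only that none of the sampled characters were found, not that the text is in none of these languages.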

ABCDEFGHIJKLMNOPQRSTUVWXYZ (Latin alphabet)

and no other – English, Indonesian, Latin, Malay, Swahili, Zulu

AEIOUHKLMNPW' (Hawaiian alphabet) – Hawaiian

àäèéëïijöü – Dutch (Except for the ligature ij, these letters are very rare in Dutch. Even fairly long Dutch texts often have no diacritics.)

áêéèëïíîôóúû – Afrikaans

êôúû – West Frisian

ÆØÅæøå – Danish, Norwegian

single diacritics, mostly umlauts

ÄÖäö – Finnish (BCDFGQWXZÅbcfgqwxzå are found only in names and loanwords, occasionally also ŠšŽž)

ÅÄÖåäö – Swedish (occasionally é)

ÄÖÕÜäöõü – Estonian (BCDFGQWXYZcfqwxyz are found only in names and loanwords, occasionally also ŠšŽž)

ÄÖÜẞäöüß – German

Circumflexes

ÇÊÎŞÛçêîşû – Kurdish

ĂÂÎȘȚăâîșț – Romanian

ÂÊÎÔÛŴŶÁÉÍÏâêîôûŵŷáéíï – Welsh; (ÓÚẂÝÀÈÌÒÙẀỲÄËÖÜẄŸóúẃýàèìòùẁỳäëöüẅÿ used also but much less commonly)

ĈĜĤĴŜŬĉĝĥĵŝŭ – Esperanto

Three or more types of diacritics

ÇĞİÖŞÜçğıöşü – Turkish

ÁÐÉÍÓÚÝÞÆÖáðéíóúýþæö – Icelandic

ÁÐÍÓÚÝÆØáðíóúýæø – Faroese

ÁÉÍÓÖŐÚÜŰáéíóöőúüű – Hungarian

ÀÇÉÈÍÓÒÚÜÏàçéèíóòúüï· – Catalan

ÀÂÆÇÉÈÊËÎÏÔŒÙÛÜŸàâæçéèêëîïôœùûüÿ – French; (Ÿ and ÿ are found only in certain proper names)

ÁÀÇÉÈÍÓÒÚËÜÏáàçéèíóòúëüï (· only in Gascon dialect) – Occitan

ÁÉÍÓÚÂÊÔÀãõçáéíóúâêôà (ü Brazilian and k, w and y not in native words) – Portuguese

ÁÉÍÑÓÚÜáéíñóúü ¡¿ – Spanish

ÀÉÈÌÒÙàéèìòù – Italian

ÁÉÍÓÚáéíóú (ṗṡḋḟġċḃṁ can also be used instead of séimhú) – Irish

ÁÉÍÓÚÝÃẼĨÕŨỸÑG̃áéíóúýãẽĩõũỹñg̃ - Guarani (the only language to use g̃)

ÁĄĄ́ÉĘĘ́ÍĮĮ́ŁŃ áąą́éęę́íįį́łń (FQRVfqrv not in native words) – Southern Athabaskan languages

’ÓǪǪ́ āą̄ēę̄īį̄óōǫǫ́ǭúū – Western Apache

'ÓǪǪ́ óǫǫ́ – Navajo

’ÚŲŲ́ úųų́ – Chiricahua/Mescalero

ąłńóż – Lechitic languages

ąćęłńóśźż – Polish

ćśůź – Silesian

ãéëòôù – Kashubian

A, Ą, Ã, B, C, D, E, É, Ë, F, G, H, I, J, K, L, Ł, M, N, Ń, O, Ò, Ó, Ô, P, R, S, T, U, Ù, W, Y, Z, Ż – Kashubian

ČŠŽ

and no other – Slovene

ĆĐ – Bosnian, Croatian, Serbian Latin

ÁĎÉĚÍŇÓŘŤÚŮÝáďéěíňóřťúůý – Czech

ÁÄĎÉÍĽĹŇÓÔŔŤÚÝáäďéíľĺňóôŕťúý – Slovak

ĀĒĢĪĶĻŅŌŖŪāēģīķļņōŗū – Latvian; (ŌŖ and ōŗ no longer used in most modern day Latvian)

ĄĘĖĮŲŪąęėįųū – Lithuanian

ĐÀẢÃÁẠĂẰẲẴẮẶÂẦẨẪẤẬÈẺẼÉẸÊỀỂỄẾỆÌỈĨÍỊÒỎÕÓỌÔỒỔỖỐỘƠỜỞỠỚỢÙỦŨÚỤƯỪỬỮỨỰỲỶỸÝỴ đàảãáạăằẳẵắặâầẩẫấậèẻẽéẹêềểễếệìỉĩíịòỏõóọồổỗốơờởỡớợùủũúụưừửữứựỳỷỹýỵ – Vietnamese

ꞗĕŏŭo᷄ơ᷄u᷄ – Middle Vietnamese

ā ē ī ō ū – May be seen in some Japanese texts in Rōmaji or transcriptions (see below) or Hawaiian and Māori texts.

é – Sundanese

ñ - Basque
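The Latin-script groups above are organised by which kinds of diacritics appear (none, mostly umlauts, circumflexes, three or more types). One possible way to find out which kinds a text uses is to decompose it with Unicode normalisation and collect the combining marks. The function name `diacritic_types` is hypothetical, and the approach has a known limit: letters such as ø, ł, đ and ß have no canonical decomposition, so they are not reported.

```python
import unicodedata


def diacritic_types(text):
    """Return the Unicode names of the combining marks found in text."""
    marks = set()
    # NFD splits a letter such as "ä" into "a" plus a combining diaeresis.
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):  # non-zero class means a combining mark
            marks.add(unicodedata.name(ch, "UNKNOWN"))
    return marks


print(sorted(diacritic_types("çğö")))
```

A text that returns a single mark name falls in the "single diacritics" group, while a longer set points toward the "three or more types" group.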

أ ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه ؤ و ئ ى ي ء Arabic script

Arabic, Malay (Jawi), Kurdish (Soranî), Panjabi / Punjabi, Pashto, Sindhi, Urdu, others.

پ چ ژ گ – Persian (Farsi)
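A minimal sketch of the two checks above: first whether the text is in Arabic script at all, then whether it contains any of the four extra letters the chart lists for Persian. The function name and return labels are hypothetical, and the range U+0600 to U+06FF covers only the basic Arabic block. The result is a hint rather than proof, because these four letters are not exclusive to Persian (Urdu also uses them, for example).

```python
# The four letters listed for Persian, written as escapes to avoid
# right-to-left rendering problems in source code: pe, che, zhe, gaf.
PERSIAN_EXTRA = set("\u067e\u0686\u0698\u06af")


def arabic_script_hint(text):
    """Return 'persian-candidate', 'arabic-script', or None."""
    if not any("\u0600" <= ch <= "\u06ff" for ch in text):
        return None  # no characters from the basic Arabic block
    if any(ch in PERSIAN_EXTRA for ch in text):
        return "persian-candidate"
    return "arabic-script"


print(arabic_script_hint("\u0686\u0627\u06cc"))
```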

Brahmic family of scripts

Bengali script

অ আ কা কি কী উ কু ঊ কূ ঋ কৃ এ কে ঐ কৈ ও কো ঔ কৌ ক্ কত্‍ কং কঃ কঁ ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য র ৰ ল ৱ শ ষ স হ য় ড় ঢ় ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯

used to write Bengali and Assamese.

Devanāgarī

अ आ इ ई उ ऊ ऋ ॠ ऌ ॡ ऍ ऎ ए ऐ ऑ ऒ ओ ओ क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ल ळ व श ष स ह ० १ २ ३ ४ ५ ६ ७ ८ ९ प् पँ पं पः प़ पऽ

used to write, either along with other scripts or exclusively, several Indian languages including Sanskrit, Hindi, Maithili, Magahi, Marathi, Kashmiri, Sindhi, Bhili, Konkani and Bhojpuri, as well as Nepali in Nepal.

Gurmukhi

ਅਆਇਈਉਊਏਐਓਔਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਲ਼ਵਸ਼ਸਹ

primarily used to write Punjabi as well as Braj Bhasha, Khariboli (and other Hindustani dialects), Sanskrit and Sindhi.

Gujarati script

અ આ ઇ ઈ ઉ ઊ ઋ ઌ ઍ એ ઐ ઑ ઓ ઔ ક ખ ગ ઘ ઙ ચ છ જ ઝ ઞ ટ ઠ ડ ઢ ણ ત થ દ ધ ન પ ફ બ ભ મ ય ર લ ળ વ શ ષ સ હ ૠ ૡૢૣ

used to write Gujarati and Kachchi

Tibetan script

ཀ ཁ ག ང ཅ ཆ ཇ ཉ ཏ ཐ ད ན པ ཕ བ མ ཙ ཚ ཛ ཝ ཞ ཟ འ ཡ ར ལ ཤ ས ཧ ཨ

used to write Standard Tibetan, Dzongkha (Bhutanese), and Sikkimese

កខគឃងចឆជឈញដឋឌឍណតថទធនបផពភមសហយរលឡអវអ្កអ្ខអ្គអ្ឃអ្ងអ្ចអ្ឆអ្ឈអ្ញអ្ឌអ្ឋអ្ឌអ្ឃអ្ណអ្តអ្ថអ្ទអ្ធអ្នអ្បអ្ផអ្ពអ្ភអ្មអ្សអ្ហអ្យអ្រអ្យអ្លអ្អអ្វ អក្សរខ្មែរ (Khmer alphabet) - Khmer

กขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮฯะา฿เแโใไๅๆ๏๐๑๒๓๔๕๖๗๘๙๚๛ (Thai script) – Thai

ꦄꦅꦆꦇꦈꦉꦊꦋꦌꦍꦎꦏꦐꦑꦒꦓꦔꦕꦖꦗꦘꦙꦚꦛꦜꦝꦞꦟꦠꦡꦢꦣꦤꦥꦦꦧꦨꦩꦪꦫꦬꦭꦮꦯꦰꦱꦲ (Javanese script) – Javanese, which is also written in Arabic and Latin script; the letters are very similar to those of Balinese script

ᮃᮄᮅᮆᮇᮈᮉᮊᮋᮌᮍᮎᮏᮐᮑᮒᮓᮔᮕᮖᮗᮘᮙᮚᮛᮜᮝᮞᮟᮠ (Sundanese script) – Sundanese, which is also written in Arabic and Latin script

ހށނރބޅކއވމފދތލގޏސޑޒޓޔޕޖޗ (Thaana) — Dhivehi
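For scripts like the ones above, the script itself already narrows the choice to a few languages. One way to sketch this, assuming the first word of each character's Unicode name identifies its script, is shown below. The mapping `SCRIPT_LANGUAGES` is an abbreviated, hypothetical table based on the entries above, and `dominant_script` is a name introduced here.

```python
import unicodedata
from collections import Counter

# Abbreviated mapping from Unicode name prefix to the languages listed above.
SCRIPT_LANGUAGES = {
    "BENGALI": "Bengali, Assamese",
    "DEVANAGARI": "Sanskrit, Hindi, Marathi, Nepali and others",
    "GURMUKHI": "Punjabi",
    "GUJARATI": "Gujarati, Kachchi",
    "TIBETAN": "Standard Tibetan, Dzongkha, Sikkimese",
    "KHMER": "Khmer",
    "THAI": "Thai",
    "JAVANESE": "Javanese",
    "SUNDANESE": "Sundanese",
    "THAANA": "Dhivehi",
}


def dominant_script(text):
    """Return (script, languages) for the most frequent known script, or None."""
    counts = Counter()
    for ch in text:
        # Names look like "THAI CHARACTER KO KAI"; the first word is the script.
        prefix = unicodedata.name(ch, "").split(" ")[0]
        if prefix in SCRIPT_LANGUAGES:
            counts[prefix] += 1
    if not counts:
        return None
    script = counts.most_common(1)[0][0]
    return script, SCRIPT_LANGUAGES[script]


print(dominant_script("\u0915\u0916"))
```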

АБВГДЕЖЗИКЛМНОПРСТУФХЦЧШ (Cyrillic alphabet)

ЙЩЬЮЯ

Ъ – Bulgarian

ЁЫЭ

Ў, no Щ, І instead of И (Ґ in some variants) – Belarusian

rarely Ъ – Russian

ҐЄІЇ – Ukrainian

ЉЊЏ, Ј instead of Й (Vuk Karadžić's reform)

ЃЌЅ – Macedonian

ЋЂ – Serbian

ЄꙂꙀЗІЇꙈОуꙊѠЩЪꙐЬѢЮꙖѤѦѨѪѬѮѰѲѴҀ – Old Church Slavonic, Church Slavonic

Ӂ – Romanian in Transnistria (elsewhere in Latin)
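The Cyrillic entries above can be read as a small decision procedure. The sketch below is one possible ordering of those checks, not a complete classifier: the function name `cyrillic_guess` is hypothetical, and the order of the tests is an assumption chosen so that letters shared between languages (such as І, used in both Ukrainian and Belarusian) are not relied on alone.

```python
def cyrillic_guess(text):
    """Guess a language from distinctive Cyrillic letters, or return None."""
    chars = set(text.upper())
    if chars & set("ЋЂ"):
        return "Serbian"
    if chars & set("ЃЌЅ"):
        return "Macedonian"
    if "Ў" in chars:
        return "Belarusian"
    # The chart lists ҐЄІЇ for Ukrainian; only Є and Ї are tested here
    # because І and Ґ also appear under Belarusian.
    if chars & set("ЄЇ"):
        return "Ukrainian"
    if chars & set("ЁЫЭ"):
        return "Russian"
    if "Ъ" in chars:
        return "Bulgarian"
    return None


print(cyrillic_guess("Добър ден"))
```

Short texts may contain none of these letters, in which case the function returns `None` and other clues are needed.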

ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ αβγδεζηθικλμνξοπρσςτυφχψω (Greek Alphabet) – Greek

אבגדהוזחטיכלמנסעפצקרשת (Hebrew alphabet)

and maybe some odd dots and lines above, below, or inside characters – Hebrew

פֿ; dots/lines below letters appearing only with א,י, and ו – Yiddish

no dots or lines around the letters, and more than a few words end with א (i.e., they have it at the leftmost position) – Aramaic

Ladino
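The Hebrew-script rules above can be sketched roughly as follows. The function name is hypothetical, and the threshold `alef_share` is an assumption standing in for "more than a few words"; it is not taken from the chart. Ladino is not covered, since no distinguishing feature is listed for it. Code points are written as escapes to avoid right-to-left rendering problems.

```python
ALEF = "\u05d0"
PE_RAFE = "\u05e4\u05bf"  # pe followed by the rafe mark, listed for Yiddish


def hebrew_script_hint(text, alef_share=0.2):
    """Return 'Yiddish', 'Aramaic', 'Hebrew', or None for non-Hebrew script."""
    words = [w.strip(".,;:!?\"()") for w in text.split()]
    words = [w for w in words if any("\u05d0" <= ch <= "\u05ea" for ch in w)]
    if not words:
        return None
    if PE_RAFE in text:
        return "Yiddish"
    # Vowel points and related marks occupy U+05B0 to U+05C7.
    pointed = any("\u05b0" <= ch <= "\u05c7" for ch in text)
    # In stored (logical) order the last character is the leftmost on screen.
    ends_in_alef = sum(w.endswith(ALEF) for w in words)
    if not pointed and ends_in_alef / len(words) > alef_share:
        return "Aramaic"
    return "Hebrew"


print(hebrew_script_hint("\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"))
```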

漢字文化圈 – Some East Asian Languages

and no other – Chinese

with あいうえおの Hiragana and/or アイウエオノ Katakana – Japanese

위키백과에 (note commonplace ellipses and circles) – Korean

ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏ etc. -- ㄓㄨˋㄧㄣㄈㄨˊㄏㄠˋ (Bopomofo)

ㄪㄫㄬ -- not Mandarin
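The East Asian rules above (Chinese characters only, Chinese characters with kana, Hangul) map naturally onto Unicode ranges. The sketch below assumes the most common blocks only: CJK Unified Ideographs, Hiragana and Katakana, and Hangul syllables and Jamo. It ignores Bopomofo and the rarer extension blocks, and the function name is hypothetical.

```python
def east_asian_guess(text):
    """Return 'Korean', 'Japanese', 'Chinese', or None."""
    has_han = has_kana = has_hangul = False
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:        # Hiragana and Katakana
            has_kana = True
        elif 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF:  # Hangul
            has_hangul = True
        elif 0x4E00 <= cp <= 0x9FFF:      # CJK Unified Ideographs
            has_han = True
    if has_hangul:
        return "Korean"
    if has_kana:
        return "Japanese"   # Chinese characters mixed with kana
    if has_han:
        return "Chinese"    # Chinese characters and no other script
    return None


print(east_asian_guess("日本語のテキスト"))
```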

Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք Օ Ֆ (Armenian alphabet) – Armenian

ა ბ გდ ევ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ (Georgian alphabet) – Georgian

ⴰⴱⴲⴳⴴⴵⴶⴷⴸⴹⴺⴻⴼⴽⴾⴿⵀⵁⵂⵃⵄⵅⵆⵇⵈⵉⵊⵋⵌⵍⵎⵐⵑⵒⵓⵔⵕⵖⵗⵘⵙⵚⵛⵜⵝⵞⵠⵡⵢⵣⵤⵥⵦⵧ Tifinagh, a script used for Tamazight (Berber)

English

words: a, an, and, in, of, on, the, that, to, is, what, I (I is always capitalized when referring to oneself)

letter sequences: th, ch, sh, wh, ough, augh, qu

word endings: -ing, -tion, -ed, -age, -s, -’s, -’ve, -n’t, -’d

vast majority of words end with a consonant, or sometimes with an e. Some common exceptions: who, to, so, no, do, a, and a few names like Julia.

diacritics or accents only in loanwords (piñata)
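A simple way to use the English word list above is to measure what share of a text's words come from it. This is a minimal sketch; the function name is hypothetical and no particular threshold is implied by the chart, so any cut-off is left to the reader.

```python
import re

# The common English words listed above, lowercased.
ENGLISH_WORDS = {"a", "an", "and", "in", "of", "on", "the", "that", "to", "is", "what", "i"}


def english_word_share(text):
    """Return the fraction of words in text that are in ENGLISH_WORDS."""
    words = re.findall(r"[a-z'’]+", text.lower())
    if not words:
        return 0.0
    return sum(w in ENGLISH_WORDS for w in words) / len(words)


print(english_word_share("The cat is on the mat"))
```

The same function works for other languages if the word set is swapped for one of the lists given in their sections.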

Dutch (Nederlands)

letter sequences: ij (capitalized as IJ, and also found as a ligature, Ĳ or ĳ), ei, ou, au, oe, doubled vowels (but not ii), kw, ch, sch, oei, ooi, aai and uw (especially eeuw, ieuw, auw, and ouw).

all consonants, except h, j, q, v, w, x and z can be doubled.

the letters c (except in the sequence (s)ch), q, x and y are almost only found in loanwords.

words: het, op, en, een, voor (and compounds of voor).

word endings: -tje, -sje, -ing, -en, -lijk

at the start of words: z-, v-, ge-

t/m occasionally occurs between two points in time or between numbers (e.g. house numbers).
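Counting how often the Dutch letter sequences listed above occur is one way to turn them into a measurable signal. The sketch below uses a subset of those sequences; both the tuple `DUTCH_SEQUENCES` and the function name are hypothetical, and counts are of non-overlapping occurrences.

```python
# A subset of the sequences listed above (assumption for brevity).
DUTCH_SEQUENCES = ("ij", "eeuw", "ieuw", "sch", "oei", "ooi", "aai")


def sequence_counts(text, sequences=DUTCH_SEQUENCES):
    """Return how many times each sequence occurs in text, ignoring case."""
    lowered = text.lower()
    return {seq: lowered.count(seq) for seq in sequences}


counts = sequence_counts("De sneeuw blijft een eeuw op mijn schoen liggen")
print(counts["ij"], counts["eeuw"], counts["sch"])
```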

West Frisian (Frysk)

letter sequences: ij, ei, oa

words: yn

Afrikaans (Afrikaans)

Words: 'n, as, vir, nie.

Similar to Dutch, but:

the common Dutch letters c and z are rare and used only in loanwords (e.g. chalet);

the common Dutch vowel ij is not used; instead, i and y are used (e.g. -lik, sy);

the common Dutch word ending -en is rare, being replaced by -e.
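The contrasts above suggest a simple tally for telling the two apart: ij and the ending -en count toward Dutch, while the words 'n, vir and nie and the ending -lik count toward Afrikaans. The scoring below is an assumption made for illustration, with equal weights and a hypothetical function name. The word "as" is left out because it is too short to be a reliable signal on its own.

```python
AFRIKAANS_WORDS = {"'n", "vir", "nie"}


def dutch_or_afrikaans(text):
    """Return 'Dutch', 'Afrikaans', or None when the tally is tied."""
    lowered = text.lower().replace("’", "'")
    words = [w.strip(".,;:!?\"()") for w in lowered.split()]
    dutch = lowered.count("ij") + sum(
        len(w) > 2 and w.endswith("en") for w in words
    )
    afrikaans = sum(w in AFRIKAANS_WORDS for w in words) + sum(
        w.endswith("lik") for w in words
    )
    if dutch > afrikaans:
        return "Dutch"
    if afrikaans > dutch:
        return "Afrikaans"
    return None


print(dutch_or_afrikaans("Ek is nie 'n onderwyser nie"))
```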

German (Deutsch)

umlauts (ä, ö, ü), ess-zett (ß)

letter sequences: ch, ck, sch, tsch, tz, ss

common words: der, die, das, den, dem, des, er, sie, es, ist, ich, du, aber

common endings: -en, -er, -ern, -st, -ung, -chen, -tät

rare letters: x, y (except in loanwords)

letter c rarely used except in the sequences listed above and in loanwords

long compound words

a period (.) after ordinal numbers, e.g. 3. Oktober

many capitalized words in the middle of sentences, since German capitalizes all nouns.
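The capitalization of nouns can be measured as the share of capitalized words that are not sentence-initial. The sketch below is a rough approximation with a hypothetical function name: sentences are split naively on ., ! and ?, so a word after an ordinal such as "3. Oktober" is treated as sentence-initial and skipped.

```python
import re


def mid_sentence_capital_share(text):
    """Return the fraction of non-initial words that start with a capital."""
    total = capitalized = 0
    for sentence in re.split(r"[.!?]+\s+", text):
        # Skip the first word of each sentence; it is capitalized anyway.
        for word in sentence.split()[1:]:
            if word[0].isalpha():
                total += 1
                capitalized += word[0].isupper()
    return capitalized / total if total else 0.0


print(mid_sentence_capital_share("Der Hund läuft durch den Garten."))
```

Proper names raise the share in any language, so a high value is suggestive of German rather than conclusive.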

Swedish (Svenska)

letters å, ä, ö, rarely é

common words: och, i, att, det, en, som, är, av, den, på, om, inte, men

common endings: -ning, -lig, -isk, -ande, -ade, -era, -rna

common surname endings: -sson, -berg, -borg, -gren, -lund, -lind, -ström, -kvist/qvist/quist

long compound words

letter sequences: stj, sj, skj, tj, ck, än

w and z are not used except in foreign proper nouns and some loanwords; x is used, unlike in Danish and Norwegian, which replace it with ks

doubling of consonants common, but doubling of vowels very rare

Danish (Dansk)

letters æ, ø, å

common words: af, og, til, er, på, med, det, den;

common endings: -tion, -ing, -else, -hed;

long compound words;

no use of the characters q, w, x and z except for foreign proper nouns and some loanwords;

to distinguish from Norwegian: uses letter combination øj; frequent use of æ; spellings of borrowed foreign words are retained (in particular use of c), such as centralstation.

doubling of consonants common (but not at the end of words, unlike Norwegian and Swedish), but doubling of vowels very rare

pre-1948 orthography: aa was used instead of å; all nouns were capitalized

Norwegian (Norsk)

letters æ, ø, å

common words: av, ble, er, og, en, et, men, i, å, for, eller;

common endings: -sjon, -ing, -else, -het;

long compound words;

no use of the characters c, w, z and x except for foreign proper nouns and some loanwords;

two versions of the language: Bokmål (much closer to Danish) and Nynorsk – for example ikke, lørdag, Norge (Bokmål) vs. ikkje, laurdag, Noreg (Nynorsk); Nynorsk uses the word òg; printed materials almost always published in Bokmål only;

to distinguish from Danish: uses letter combination øy; less frequent use of æ (mainly but not exclusively before r); spellings of borrowed foreign words are ‘Norsified’ (in particular removing use of c), such as sentralstasjon.

doubling of consonants common (including the end of words), but doubling of vowels very rare
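The Danish and Norwegian sections give matching pairs of clues: øj versus øy, -tion versus -sjon, -hed versus -het, and af versus av. A minimal tally over those pairs might look like the sketch below. Equal weights and the function name are assumptions, and the sketch does not try to separate Bokmål from Nynorsk.

```python
def danish_or_norwegian(text):
    """Return 'Danish', 'Norwegian', or None when the tally is tied."""
    lowered = text.lower()
    words = [w.strip(".,;:!?\"()") for w in lowered.split()]
    danish = (
        lowered.count("øj")
        + sum(w.endswith(("tion", "hed")) for w in words)
        + words.count("af")
    )
    norwegian = (
        lowered.count("øy")
        + sum(w.endswith(("sjon", "het")) for w in words)
        + words.count("av")
    )
    if danish > norwegian:
        return "Danish"
    if norwegian > danish:
        return "Norwegian"
    return None


print(danish_or_norwegian("En informasjon av stor viktighet"))
```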

Icelandic (Íslenska)

letters á, ð, é, í, ó, ú, ý, þ, æ, ö

common beginnings: fj-, gj-, hj-, hl-, hr-, hv-, kj-, and sj-

common endings: -ar (especially -nar), -ir (especially -nir), -ur, -nn (especially -inn)

no use of the characters c, q, w, or z except for foreign proper nouns, some loanwords, and, in the case of z, older texts.

doubling of consonants common, but doubling of vowels very rare

Faroese (Føroyskt)

letters á, ð, í, ó, ú, ý, æ, ø

letter combinations: ggj, oy, skt

to distinguish from Icelandic: does not use é or þ, uses ø instead of ö (occasionally rendered as ö on road signs, or even ő).

doubling of consonants common, but doubling of vowels very rare
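The rule for telling Faroese from Icelandic can be sketched as a few character checks. The function name and return labels are hypothetical. Because ø is occasionally rendered as ö on Faroese road signs, ö is treated as weaker evidence and tested only after é, þ and ø.

```python
import unicodedata


def icelandic_or_faroese(text):
    """Return 'Icelandic', 'Faroese', 'Icelandic or Faroese', or None."""
    chars = set(unicodedata.normalize("NFC", text).lower())
    if chars & set("éþ"):       # not used in Faroese
        return "Icelandic"
    if "ø" in chars:            # Faroese uses ø where Icelandic uses ö
        return "Faroese"
    if "ö" in chars:
        return "Icelandic"
    if chars & set("áðíóúýæ"):  # letters shared by both alphabets
        return "Icelandic or Faroese"
    return None


print(icelandic_or_faroese("Føroyar"))
```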