Learning Languages
The language of a foreign text can often be identified by looking for characters specific to that language.
ABCDEFGHIJKLMNOPQRSTUVWXYZ (Latin alphabet)
and no other – English, Indonesian, Latin, Malay, Swahili, Zulu
AEIOUHKLMNPW' (Hawaiian alphabet) – Hawaiian
àäèéëïĳöü – Dutch (Except for the ligature ĳ, these letters are very rare in Dutch. Even fairly long Dutch texts often have no diacritics.)
áêéèëïíîôóúû – Afrikaans
êôúû – West Frisian
ÆØÅæøå – Danish, Norwegian
Single diacritics, mostly umlauts
ÄÖäö – Finnish (BCDFGQWXZÅbcfgqwxzå are found only in names and loanwords, occasionally also ŠšŽž)
ÅÄÖåäö – Swedish (occasionally é)
ÄÖÕÜäöõü – Estonian (BCDFGQWXYZcfqwxyz are found only in names and loanwords, occasionally also ŠšŽž)
ÄÖÜẞäöüß – German
Circumflexes
ÇÊÎŞÛçêîşû – Kurdish
ĂÂÎȘȚăâîșț – Romanian
ÂÊÎÔÛŴŶÁÉÍÏâêîôûŵŷáéíï – Welsh; (ÓÚẂÝÀÈÌÒÙẀỲÄËÖÜẄŸóúẃýàèìòùẁỳäëöüẅÿ are also used, but much less commonly)
ĈĜĤĴŜŬĉĝĥĵŝŭ – Esperanto
Three or more types of diacritics
ÇĞİÖŞÜçğıöşü – Turkish
ÁÐÉÍÓÚÝÞÆÖáðéíóúýþæö – Icelandic
ÁÐÍÓÚÝÆØáðíóúýæø – Faroese
ÁÉÍÓÖŐÚÜŰáéíóöőúüű – Hungarian
ÀÇÉÈÍÓÒÚÜÏàçéèíóòúüï· – Catalan
ÀÂÆÇÉÈÊËÎÏÔŒÙÛÜŸàâæçéèêëîïôœùûüÿ – French; (Ÿ and ÿ are found only in certain proper names)
ÁÀÇÉÈÍÓÒÚËÜÏáàçéèíóòúëüï (· only in Gascon dialect) – Occitan
ÁÉÍÓÚÂÊÔÀãõçáéíóúâêôà (ü in Brazilian Portuguese; k, w and y not in native words) – Portuguese
ÁÉÍÑÓÚÜáéíñóúü ¡¿ – Spanish
ÀÉÈÌÒÙàéèìòù – Italian
ÁÉÍÓÚáéíóú (the dotted letters ṗṡḋḟġċḃṁ can also be used, instead of a following h, to mark séimhiú) – Irish
ÁÉÍÓÚÝÃẼĨÕŨỸÑG̃áéíóúýãẽĩõũỹñg̃ - Guarani (the only language to use g̃)
ÁĄĄ́ÉĘĘ́ÍĮĮ́ŁŃ áąą́éęę́íįį́łń (FQRVfqrv not in native words) – Southern Athabaskan languages
’ÓǪǪ́ āą̄ēę̄īį̄óōǫǫ́ǭúū – Western Apache
'ÓǪǪ́ óǫǫ́ – Navajo
’ÚŲŲ́ úųų́ – Chiricahua/Mescalero
ąłńóż – Lechitic languages
ąćęłńóśźż – Polish
ćśůź – Silesian
ãéëòôù – Kashubian
A, Ą, Ã, B, C, D, E, É, Ë, F, G, H, I, J, K, L, Ł, M, N, Ń, O, Ò, Ó, Ô, P, R, S, T, U, Ù, W, Y, Z, Ż – Kashubian
ČŠŽ
and no other – Slovene
ĆĐ – Bosnian, Croatian, Serbian Latin
ÁĎÉĚÍŇÓŘŤÚŮÝáďéěíňóřťúůý – Czech
ÁÄĎÉÍĽĹŇÓÔŔŤÚÝáäďéíľĺňóôŕťúý – Slovak
ĀĒĢĪĶĻŅŌŖŪāēģīķļņōŗū – Latvian; (ŌŖ and ōŗ no longer used in most modern-day Latvian)
ĄĘĖĮŲŪąęėįųū – Lithuanian
ĐÀẢÃÁẠĂẰẲẴẮẶÂẦẨẪẤẬÈẺẼÉẸÊỀỂỄẾỆÌỈĨÍỊÒỎÕÓỌÔỒỔỖỐỘƠỜỞỠỚỢÙỦŨÚỤƯỪỬỮỨỰỲỶỸÝỴ đàảãáạăằẳẵắặâầẩẫấậèẻẽéẹêềểễếệìỉĩíịòỏõóọồổỗốơờởỡớợùủũúụưừửữứựỳỷỹýỵ – Vietnamese
ꞗĕŏŭo᷄ơ᷄u᷄ – Middle Vietnamese
ā ē ī ō ū – May be seen in some Japanese texts in Rōmaji or transcriptions (see below) or Hawaiian and Māori texts.
é – Sundanese
ñ - Basque
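The Latin-script entries above can be read as per-language "signatures" of special letters. The following is a minimal Python sketch of that idea, assuming precomposed (NFC) text and using only a small illustrative subset of the chart. The names SIGNATURES and candidate_languages are hypothetical, chosen for this example. It returns every listed language whose signature covers all the non-ASCII letters found in the text, so a single shared letter such as ü leaves several candidates.

```python
import unicodedata

# Illustrative subset of the chart above (lowercase letters only).
# The table name and the choice of languages are assumptions for this sketch.
SIGNATURES = {
    "German": set("äöüß"),
    "Spanish": set("áéíñóúü"),
    "Italian": set("àéèìòù"),
    "Turkish": set("çğıöşü"),
    "Polish": set("ąćęłńóśźż"),
    "Hungarian": set("áéíóöőúüű"),
}


def candidate_languages(text):
    """Return the listed languages whose special letters cover those in text."""
    # Compose base letter + combining mark pairs so they match the table.
    normalized = unicodedata.normalize("NFC", text).lower()
    special = {ch for ch in normalized if ch.isalpha() and not ch.isascii()}
    if not special:
        # Plain A-Z text gives no evidence either way.
        return []
    return sorted(lang for lang, sig in SIGNATURES.items() if special <= sig)
```

A short text rarely contains every special letter of its language, so the result is a list of candidates to narrow down with other cues, not a single verdict.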
أ ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه ؤ و ئ ى ي ء Arabic script
Arabic, Malay (Jawi), Kurdish (Soranî), Panjabi / Punjabi, Pashto, Sindhi, Urdu, others.
پ چ ژ گ – Persian (Farsi)
Brahmic family of scripts
Bengali script
অ আ কা কি কী উ কু ঊ কূ ঋ কৃ এ কে ঐ কৈ ও কো ঔ কৌ ক্ কত্ কং কঃ কঁ ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য র ৰ ল ৱ শ ষ স হ য় ড় ঢ় ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯
used to write Bengali and Assamese.
Devanāgarī
अ आ इ ई उ ऊ ऋ ॠ ऌ ॡ ऍ ऎ ए ऐ ऑ ऒ ओ ओ क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ल ळ व श ष स ह ० १ २ ३ ४ ५ ६ ७ ८ ९ प् पँ पं पः प़ पऽ
used to write, either along with other scripts or exclusively, several Indian languages including Sanskrit, Hindi, Maithili, Magahi, Marathi, Kashmiri, Sindhi, Bhili, Konkani, Bhojpuri and Nepali from Nepal.
Gurmukhi
ਅਆਇਈਉਊਏਐਓਔਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਲ਼ਵਸ਼ਸਹ
primarily used to write Punjabi as well as Braj Bhasha, Khariboli (and other Hindustani dialects), Sanskrit and Sindhi.
Gujarati script
અ આ ઇ ઈ ઉ ઊ ઋ ઌ ઍ એ ઐ ઑ ઓ ઔ ક ખ ગ ઘ ઙ ચ છ જ ઝ ઞ ટ ઠ ડ ઢ ણ ત થ દ ધ ન પ ફ બ ભ મ ય ર લ ળ વ શ ષ સ હ ૠ ૡૢૣ
used to write Gujarati and Kachchi
Tibetan script
ཀ ཁ ག ང ཅ ཆ ཇ ཉ ཏ ཐ ད ན པ ཕ བ མ ཙ ཚ ཛ ཝ ཞ ཟ འ ཡ ར ལ ཤ ས ཧ ཨ
used to write Standard Tibetan, Dzongkha (Bhutanese), and Sikkimese
កខគឃងចឆជឈញដឋឌឍណតថទធនបផពភមសហយរលឡអវអ្កអ្ខអ្គអ្ឃអ្ងអ្ចអ្ឆអ្ឈអ្ញអ្ឌអ្ឋអ្ឌអ្ឃអ្ណអ្តអ្ថអ្ទអ្ធអ្នអ្បអ្ផអ្ពអ្ភអ្មអ្សអ្ហអ្យអ្រអ្យអ្លអ្អអ្វ អក្សរខ្មែរ (Khmer alphabet) - Khmer
กขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮฯะา฿เแโใไๅๆ๏๚๛ (Thai script) - Thai
ꦄꦅꦆꦇꦈꦉꦊꦋꦌꦍꦎꦏꦐꦑꦒꦓꦔꦕꦖꦗꦘꦙꦚꦛꦜꦝꦞꦟꦠꦡꦢꦣꦤꦥꦦꦧꦨꦩꦪꦫꦬꦭꦮꦯꦰꦱꦲ Javanese script; Javanese is also written in Arabic and Latin script. The letters are very similar to those of Balinese script.
ᮃᮄᮅᮆᮇᮈᮉᮊᮋᮌᮍᮎᮏᮐᮑᮒᮓᮔᮕᮖᮗᮘᮙᮚᮛᮜᮝᮞᮟᮠ Sundanese script; Sundanese is also written in Arabic and Latin script.
ހށނރބޅކއވމފދތލގޏސޑޒޓޔޕޖޗ (Thaana) — Dhivehi
АБВГДЕЖЗИКЛМНОПРСТУФХЦЧШ (Cyrillic alphabet)
ЙЩЬЮЯ
Ъ – Bulgarian
ЁЫЭ
Ў, no Щ, І instead of И (Ґ in some variants) – Belarusian
rarely Ъ – Russian
ҐЄІЇ – Ukrainian
ЉЊЏ, Ј instead of Й (Vuk Karadžić's reform)
ЃЌЅ – Macedonian
ЋЂ – Serbian
ЄꙂꙀЗІЇꙈОуꙊѠЩЪꙐЬѢЮꙖѤѦѨѪѬѮѰѲѴҀ – Old Church Slavonic, Church Slavonic
Ӂ – Romanian in Transnistria (elsewhere in Latin)
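The Cyrillic entries above amount to a small decision procedure: look for the most distinctive letters first, then fall back to weaker cues. The Python sketch below is one possible ordering of those checks, not a definitive classifier. The function name guess_cyrillic_language is hypothetical, and the sketch assumes the text is already known to be in one of the listed Slavic languages. Because І and Ґ also occur in Belarusian, only Є and Ї are treated as Ukrainian markers here.

```python
def guess_cyrillic_language(text):
    """Guess a Slavic language from its distinctive Cyrillic letters."""
    letters = set(text.upper())
    if letters & set("ЋЂ"):
        return "Serbian"
    if letters & set("ЃЌЅ"):
        return "Macedonian"
    if letters & set("ЄЇ"):
        return "Ukrainian"
    if "Ў" in letters:
        return "Belarusian"
    if letters & set("ЁЫЭ"):
        # Also valid in Belarusian; a short Belarusian text without Ў lands here.
        return "Russian"
    if "Ъ" in letters:
        # Ъ without Ы or Э points to Bulgarian, where Ъ is a common vowel letter.
        return "Bulgarian"
    return "undetermined"
```

Short samples may contain none of the marker letters, in which case the function deliberately reports "undetermined" instead of guessing.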
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ αβγδεζηθικλμνξοπρσςτυφχψω (Greek alphabet) – Greek
אבגדהוזחטיכלמנסעפצקרשת (Hebrew alphabet)
and maybe some odd dots and lines above, below, or inside characters – Hebrew
פֿ; dots/lines below letters appearing only with א,י, and ו – Yiddish
no dots or lines around the letters, and more than a few words end with א (i.e., they have it at the leftmost position) – Aramaic
Ladino
漢字文化圈 – Some East Asian Languages
and no other – Chinese
with あいうえおの Hiragana and/or アイウエオノ Katakana – Japanese
위키백과에 (note commonplace ellipses and circles) – Korean
ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏ etc. -- ㄓㄨˋㄧㄣㄈㄨˊㄏㄠˋ (Bopomofo)
ㄪㄫㄬ -- not Mandarin
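The East Asian rules above (Han characters alone suggest Chinese, Han characters mixed with kana suggest Japanese, Hangul suggests Korean) can be sketched in Python with a few Unicode range checks. The function name guess_cjk_language is hypothetical, the ranges cover only the main blocks (Hiragana and Katakana, Hangul syllables, and the basic CJK Unified Ideographs block), and Bopomofo is left out for brevity.

```python
def guess_cjk_language(text):
    """Guess Chinese, Japanese or Korean from the scripts present in text."""
    # Hiragana (U+3040-309F) and Katakana (U+30A0-30FF).
    has_kana = any("\u3040" <= ch <= "\u30ff" for ch in text)
    # Precomposed Hangul syllables (U+AC00-D7A3).
    has_hangul = any("\uac00" <= ch <= "\ud7a3" for ch in text)
    # Basic CJK Unified Ideographs block (U+4E00-9FFF).
    has_han = any("\u4e00" <= ch <= "\u9fff" for ch in text)

    if has_kana:
        return "Japanese"
    if has_hangul:
        return "Korean"
    if has_han:
        return "Chinese"
    return "undetermined"
```

Kana is tested first because Japanese text normally mixes kana with Han characters, so the presence of Han characters alone does not rule Japanese out.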
Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք Օ Ֆ (Armenian alphabet) – Armenian
ა ბ გ დ ე ვ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ (Georgian alphabet) – Georgian
ⴰⴱⴲⴳⴴⴵⴶⴷⴸⴹⴺⴻⴼⴽⴾⴿⵀⵁⵂⵃⵄⵅⵆⵇⵈⵉⵊⵋⵌⵍⵎⵐⵑⵒⵓⵔⵕⵖⵗⵘⵙⵚⵛⵜⵝⵞⵠⵡⵢⵣⵤⵥⵦⵧ Tifinagh, a script used for Tamazight (Berber)
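The first step implied by the chart above is to work out which script a text is written in. A minimal Python sketch follows, assuming that the first word of each character's Unicode name (for example LATIN, CYRILLIC, HEBREW) is a good enough proxy for its script. That is a shortcut, not the official Unicode Script property, and the function name dominant_script is hypothetical.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most frequent script label among the letters of text."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            # Skip digits, punctuation, spaces and combining marks.
            continue
        name = unicodedata.name(ch, "")
        if name:
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            counts[name.split()[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

Once the script is known, the script-specific lists above narrow the choice down to a language or a small group of languages.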
English
words: a, an, and, in, of, on, the, that, to, is, what, I (I is always capital when talking about oneself)
letter sequences: th, ch, sh, wh, ough, augh, qu
word endings: -ing, -tion, -ed, -age, -s, -’s, -’ve, -n’t, -’d
vast majority of words end with a consonant, or sometimes with an e. Some common exceptions: who, to, so, no, do, a, and a few names like Julia.
diacritics or accents only in loanwords (piñata)
Dutch (Nederlands)
letter sequences: ij (capitalized as IJ, and also found as a ligature, Ĳ or ĳ), ei, ou, au, oe, doubled vowels (but not ii), kw, ch, sch, oei, ooi, aai and uw (especially eeuw, ieuw, auw, and ouw).
all consonants except h, j, q, v, w, x and z can be doubled.
the letters c (except in the sequence (s)ch), q, x and y are almost only found in loanwords.
words: het, op, en, een, voor (and compounds of voor).
word endings: -tje, -sje, -ing, -en, -lijk
at the start of words: z-, v-, ge-
t/m occasionally occurs between two points in time or between numbers (e.g. house numbers).
West Frisian (Frysk)
letter sequences: ij, ei, oa
words: yn
Afrikaans (Afrikaans)
Words: 'n, as, vir, nie.
Similar to Dutch, but:
the common Dutch letters c and z are rare and used only in loanwords (e.g. chalet);
the common Dutch vowel ij is not used; instead, i and y are used (e.g. -lik, sy);
the common Dutch word ending -en is rare, being replaced by -e.
German (Deutsch)
umlauts (ä, ö, ü), ess-zett (ß)
letter sequences: ch, ck, sch, tsch, tz, ss
common words: der, die, das, den, dem, des, er, sie, es, ist, ich, du, aber
common endings: -en, -er, -ern, -st, -ung, -chen, -tät
rare letters: x, y (except in loanwords)
letter c rarely used except in the sequences listed above and in loanwords
long compound words
a period (.) after ordinal numbers, e.g. 3. Oktober
many capitalized words in the middle of sentences, since German capitalizes all nouns.
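The common-word lists given above for English, Dutch and German lend themselves to a simple scoring approach: count how many tokens of a text appear in each list and pick the highest count. The Python sketch below does this with the word lists copied from the sections above. The names COMMON_WORDS, score_by_common_words and best_guess are hypothetical, and the approach is only a rough heuristic, since some words (for example die or den) also occur in more than one language.

```python
import re

# Word lists copied from the English, Dutch and German sections above.
COMMON_WORDS = {
    "English": {"a", "an", "and", "in", "of", "on", "the", "that", "to",
                "is", "what", "i"},
    "Dutch": {"het", "op", "en", "een", "voor"},
    "German": {"der", "die", "das", "den", "dem", "des", "er", "sie", "es",
               "ist", "ich", "du", "aber"},
}


def score_by_common_words(text):
    """Count, per language, how many tokens are on its common-word list."""
    # Tokens are runs of letters; digits and punctuation are dropped.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return {
        lang: sum(tok in words for tok in tokens)
        for lang, words in COMMON_WORDS.items()
    }


def best_guess(text):
    """Return the language with the highest score, or None if all are zero."""
    scores = score_by_common_words(text)
    lang = max(scores, key=scores.get)
    return lang if scores[lang] > 0 else None
```

Longer samples give more reliable counts. For a short phrase, the letter sequences and endings listed in each section are a useful second check.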
Swedish (Svenska)
letters å, ä, ö, rarely é
common words: och, i, att, det, en, som, är, av, den, på, om, inte, men
common endings: -ning, -lig, -isk, -ande, -ade, -era, -rna
common surname endings: -sson, -berg, -borg, -gren, -lund, -lind, -ström, -kvist/qvist/quist
long compound words
letter sequences: stj, sj, skj, tj, ck, än
no use of characters w and z except for foreign proper nouns and some loanwords; x is used, however, unlike in Danish and Norwegian, which replace it with ks
doubling of consonants common, but doubling of vowels very rare
Danish (Dansk)
letters æ, ø, å
common words: af, og, til, er, på, med, det, den;
common endings: -tion, -ing, -else, -hed;
long compound words;
no use of characters q, w, x and z except for foreign proper nouns and some loanwords;
to distinguish from Norwegian: uses letter combination øj; frequent use of æ; spellings of borrowed foreign words are retained (in particular use of c), such as centralstation.
doubling of consonants common (but not at the end of words, unlike Norwegian and Swedish), but doubling of vowels very rare
pre-1948 orthography: aa was used instead of å; all nouns were capitalized
Norwegian (Norsk)
letters æ, ø, å
common words: av, ble, er, og, en, et, men, i, å, for, eller;
common endings: -sjon, -ing, -else, -het;
long compound words;
no use of characters c, w, z and x except for foreign proper nouns and some loanwords;
two versions of the language: Bokmål (much closer to Danish) and Nynorsk – for example ikke, lørdag, Norge (Bokmål) vs. ikkje, laurdag, Noreg (Nynorsk); Nynorsk uses the word òg; printed materials almost always published in Bokmål only;
to distinguish from Danish: uses letter combination øy; less frequent use of æ (mainly but not exclusively before r); spellings of borrowed foreign words are ‘Norsified’ (in particular removing use of c), such as sentralstasjon.
doubling of consonants common (including the end of words), but doubling of vowels very rare
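The cues listed above for telling Danish and Norwegian apart (øj versus øy, -tion versus -sjon, -hed versus -het, af versus av) can be tallied mechanically. The Python sketch below counts each cue and reports whichever side has more evidence. The function name danish_or_norwegian is hypothetical, and the sketch assumes the text is already known to be one of the two languages.

```python
import re


def danish_or_norwegian(text):
    """Tally the distinguishing cues listed above and compare the totals."""
    lower = text.lower()
    tokens = re.findall(r"[^\W\d_]+", lower)

    danish = lower.count("øj")
    norwegian = lower.count("øy")
    for tok in tokens:
        # Endings and words listed in the Danish section.
        if tok.endswith(("tion", "hed")) or tok == "af":
            danish += 1
        # Their counterparts from the Norwegian section.
        if tok.endswith(("sjon", "het")) or tok == "av":
            norwegian += 1

    if danish > norwegian:
        return "Danish"
    if norwegian > danish:
        return "Norwegian"
    return "undetermined"
```

Words shared by both languages, such as og, er and det, are ignored on purpose because they carry no information for this particular distinction.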
Icelandic (Íslenska)
letters á, ð, é, í, ó, ú, ý, þ, æ, ö
common beginnings: fj-, gj-, hj-, hl-, hr-, hv-, kj-, and sj-
common endings: -ar (especially -nar), -ir (especially -nir), -ur, -nn (especially -inn)
no use of characters c, q, w, or z except for foreign proper nouns, some loanwords, and, in the case of z, older texts.
doubling of consonants common, but doubling of vowels very rare
Faroese (Føroyskt)
letters á, ð, í, ó, ú, ý, æ, ø
letter combinations: ggj, oy, skt
to distinguish from Icelandic: does not use é or þ, uses ø instead of ö (occasionally rendered as ö on road signs, or even ő).
doubling of consonants common, but doubling of vowels very rare
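The Icelandic and Faroese letter lists above differ in only a few characters, which makes the distinction easy to sketch in Python. The function name icelandic_or_faroese is hypothetical, and the sketch assumes the text is already known to be one of the two languages. Since ø is occasionally rendered as ö in Faroese, ö is treated as a weaker Icelandic cue than þ or é.

```python
def icelandic_or_faroese(text):
    """Distinguish Icelandic from Faroese by the letters unique to each."""
    letters = set(text.lower())
    strong_icelandic = letters & set("þé")  # never used in Faroese
    weak_icelandic = "ö" in letters          # Faroese ø is sometimes written ö
    faroese = "ø" in letters                 # Icelandic uses ö instead

    if (strong_icelandic or weak_icelandic) and not faroese:
        return "Icelandic"
    if faroese and not strong_icelandic:
        return "Faroese"
    return "undetermined"
```

Letters shared by both alphabets, such as á, ð, í, ó, ú, ý and æ, do not help here, so a text containing only those is reported as "undetermined".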