The Farsi language contains a significant number of Arabic origin words. As any learner of Farsi as a second language knows, these words can be some of the most difficult to remember; the repeated patterns of recurring affixes (م، ت، ا، و، ی، است etc.) can make Arabic origin words blend together in a student’s mind. Fortunately, the morphological patterns that make these words tricky can also be used to aid vocabulary retention. With few exceptions, Arabic origin words are based on a tri-consonant root which is modified by anywhere from zero to three affixes. This is useful because a given set of affixes often has a similar effect on the meaning of different roots. Likewise, words derived from the same root but modified by different affix sets often retain some semantic connection. Of course, this morphological-to-semantic correlation is not absolute and many exceptions exist, but it is pervasive enough to be useful.

Basic Description

The Parsarabica web-app contains several thousand Farsi Arabic origin words, displayed by their tri-consonant root, by their morphological pattern, or by their phonological pattern. When a user enters an Arabic origin word and selects “Root”, Parsarabica returns a page containing the tri-consonant root of the entered word along with its derivatives (co-derivatives). If “Morph” is selected, Parsarabica returns all the words with the same morphological pattern as the entered word (co-morphs). If “Phono” is selected, Parsarabica returns a table with all the words sharing both the same set of affixes and the same short vowels (co-phonos). In all three cases, each word is displayed in a table row with a link to its other entries in Parsarabica, as well as links that query external lookup services, including Google Translate. The left-most column contains a graphic indicating the relative frequency of the word in the Uppsala Persian Corpus.
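The three lookups can be pictured as filters over tagged entries. What follows is a minimal Python sketch, not Parsarabica's actual code: the field names (`root`, `morph`, `vowels`), the slot-style pattern labels such as `M12U3`, the Latin short-vowel tags, and the `lookup` function are all assumptions made for illustration.

```python
# Hypothetical tagged entries. In the "morph" labels, digits mark the three
# root consonant slots and Latin capitals stand for affix letters
# (A = alef, M = mim, U = vav). "vowels" holds the unwritten short vowels
# in Latin transliteration (illustrative values).
ENTRIES = [
    {"word": "علم", "root": "علم", "morph": "123", "vowels": "e"},
    {"word": "حسب", "root": "حسب", "morph": "123", "vowels": "a"},
    {"word": "نظر", "root": "نظر", "morph": "123", "vowels": "a-a"},
    {"word": "عالم", "root": "علم", "morph": "1A23", "vowels": "e"},
    {"word": "ناظر", "root": "نظر", "morph": "1A23", "vowels": "e"},
    {"word": "معلوم", "root": "علم", "morph": "M12U3", "vowels": "a"},
    {"word": "محسوب", "root": "حسب", "morph": "M12U3", "vowels": "a"},
    {"word": "منظور", "root": "نظر", "morph": "M12U3", "vowels": "a"},
]

# Fields that must match the entered word in each mode (assumed mapping).
MODE_KEYS = {
    "Root": ("root",),             # co-derivatives
    "Morph": ("morph",),           # co-morphs
    "Phono": ("morph", "vowels"),  # co-phonos: same affixes and short vowels
}


def lookup(word, mode, entries=ENTRIES):
    """Return every word sharing the mode's fields with the entered word."""
    target = next((e for e in entries if e["word"] == word), None)
    if target is None:
        return []  # the entered word is not in the data set
    keys = MODE_KEYS[mode]
    return [e["word"] for e in entries if all(e[k] == target[k] for k in keys)]


co_derivatives = lookup("معلوم", "Root")
co_morphs = lookup("معلوم", "Morph")
co_phonos = lookup("عالم", "Phono")
```

In this sketch the entered word is included in its own result list, and “Phono” is simply “Morph” with one extra field to match, which mirrors the description above of co-phonos as words sharing both affixes and short vowels.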


There are several ways that Parsarabica may benefit you as a Farsi language learner. The most obvious is that you can use it to familiarize yourself with tri-consonant roots and the affixes that modify them. You are likely already aware of a few rules governing Arabic origin word construction, and the best way to learn these rules is to see them manifest in as many words as possible. Viewing the Root or Morph table for a word and then following the links to other words in the table will allow you to see co-morphs and co-derivatives adjacent to each other. This will help you make distinctions among co-morphs, which are easy to conflate, and make associations between co-derivatives, which can appear unrelated but in fact often share a semantic connection to the root concept.

You can also use Parsarabica to find the root of a word to facilitate retention. By relating the word to its root and co-derivatives (some of which you may already know), you can strengthen your recall of the word by forming multiple connections to it. As you explore the definitions of the co-derivatives, you may find it useful to think of them as falling under an umbrella concept that encompasses most of the words under it.

A third way for you to use Parsarabica is to learn to predict the short vowels of new words. Short vowels are rarely written in Farsi, so pronunciation must generally be memorized. Parsarabica reduces this task by showing how consistently each morphological pattern maps to a single short vowel pattern. A morphological pattern with only one short vowel pattern requires the student only to associate the pronunciation with the morph, while a morph with many short vowel patterns may require memorizing the pronunciation of individual words.
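One way to picture this consistency check is to group words by morphological pattern and count how many distinct short vowel patterns each group contains. The sketch below is an assumption about how such a summary could be computed, not a description of Parsarabica's internals. The sample words are given in Latin transliteration; the morph labels use digits for the root consonant slots and capitals for affix letters (A = alef, M = mim, U = vav); the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (transliteration, morph label, short vowels) triples.
WORDS = [
    ("elm", "123", "e"),
    ("hasb", "123", "a"),
    ("nazar", "123", "a-a"),
    ("alem", "1A23", "e"),
    ("nazer", "1A23", "e"),
    ("ma'lum", "M12U3", "a"),
    ("mahsub", "M12U3", "a"),
]


def vowel_patterns_by_morph(words):
    """Map each morph label to the set of short vowel patterns seen with it."""
    patterns = defaultdict(set)
    for _translit, morph, vowels in words:
        patterns[morph].add(vowels)
    return dict(patterns)


def predictable_morphs(words):
    """Morphs with exactly one short vowel pattern in the sample."""
    by_morph = vowel_patterns_by_morph(words)
    return sorted(m for m, seen in by_morph.items() if len(seen) == 1)


summary = vowel_patterns_by_morph(WORDS)
consistent = predictable_morphs(WORDS)
```

In this toy sample the bare-root morph `123` shows three different vowel patterns, so its pronunciation has to be learned word by word, while `1A23` and `M12U3` each show only one.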

Making Parsarabica

While learning Farsi as a student, I found Arabic origin words to be challenging. Gradually, with more exposure as well as guidance from my professors, I began to create a web of these words in my mind. Whenever I learned a new Arabic origin word, I would figure out what its root was and look for other words with the same root. Unfortunately, the only way I could do this was to guess which other affix sets might modify the root and type each guess into Google Translate or another dictionary. This was tedious and inefficient; I searched for many constructions that don't exist as Farsi words and missed countless others that would have been useful vocabulary to learn. I wanted to find an application that allowed me to type in a word, get the root, and see which other words share this root and which share this pattern of root modification. After searching in vain for such a tool, I eventually decided to create it myself. In my spare time over the past couple of years I’ve learned enough Python, HTML and jQuery to develop Parsarabica using the Flask micro-framework and deploy it to an EC2 instance behind an Nginx reverse proxy, where it is freely available for anyone to use.

The logic of Parsarabica is fairly simple: it loops through a list of roots and generates all possible Arabic constructions by applying each of the existing affix sets to each root. Each construction is then checked against the Uppsala Persian Corpus to see whether it actually exists as a Farsi word and, if so, how frequently it appears in the corpus. Each word is tagged with key-value pairs denoting a variety of attributes and then inserted into a database, which is queried by the user.
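The generate-and-check loop can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the actual Parsarabica source: the affix-set representation, the names `apply_affixes` and `build_entries`, and the tag keys are hypothetical. The corpus is replaced by a small dictionary with invented counts, and a plain list stands in for the database.

```python
from itertools import product

# Hypothetical affix sets: integers are root consonant slots (1 to 3),
# strings are affix letters placed around them.
AFFIX_SETS = [
    (1, 2, 3),
    (1, "ا", 2, 3),
    (1, 2, "ا", 3),
    ("م", 1, 2, "و", 3),
]

# Toy stand-in for the corpus: word -> count. The counts are invented
# for illustration and are not real corpus figures.
TOY_CORPUS = {
    "علم": 80, "عالم": 35, "معلوم": 50,
    "حسب": 12, "حساب": 60, "محسوب": 20,
}


def apply_affixes(root, affix_set):
    """Fill the numbered slots of an affix set with the root's consonants."""
    return "".join(root[p - 1] if isinstance(p, int) else p for p in affix_set)


def build_entries(roots, affix_sets, corpus):
    """Generate every root/affix-set construction and keep the attested ones."""
    entries = []
    for root, affix_set in product(roots, affix_sets):
        word = apply_affixes(root, affix_set)
        freq = corpus.get(word, 0)
        if freq > 0:  # the construction exists as a word in the corpus
            entries.append(
                {"word": word, "root": root, "affixes": affix_set, "freq": freq}
            )
    return entries


entries = build_entries(["علم", "حسب"], AFFIX_SETS, TOY_CORPUS)
```

With two roots and four affix sets the loop produces eight candidate constructions; the two that are missing from the toy corpus are dropped, leaving six tagged entries.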