Modern Turkish spells all words, whether of Turkish, Arabic, or Farsi origin, according to their pronunciation. This makes conversion from one writing system to the other a problem, one that can be addressed with the aid of regular expressions.

For example, in Ottoman a word is spelled mnwr (mim-nun-vav-ra), following Arabic orthography, whereas in Modern Turkish the same word is spelled münevver, reflecting its pronunciation. Since no one-to-one mapping exists between the two writing systems, a regular expression is used to produce a set of possible Ottoman spellings.

When the parser sees münevver, it should convert it to a pattern such as mv?nh?(a1)?ww?h?r. This pattern generates a set of strings, of which mvnha1wwhr is the longest and mnwr the shortest. A dictionary search then confirms that Ottoman has a word spelled mnwr, so that string is selected as the correct spelling.
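The expansion and lookup step can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the decomposition of the pattern into (fragment, optional) pairs is written by hand for this one word, and the names TOKENS, LEXICON, and expand, as well as the toy lexicon entries, are assumptions introduced only for illustration.

```python
import re
from itertools import product

# The pattern from the text, plus a hand-written decomposition of it into
# (fragment, is_optional) pairs. The decomposition is an assumption made
# for illustration; a real parser would derive it from the modern spelling.
PATTERN = "mv?nh?(a1)?ww?h?r"
TOKENS = [
    ("m", False), ("v", True), ("n", False), ("h", True), ("a1", True),
    ("w", False), ("w", True), ("h", True), ("r", False),
]

def expand(tokens):
    """Return every string the pattern can produce, shortest first."""
    # An optional fragment contributes either "" or itself; a required one
    # contributes only itself.
    choices = [("", frag) if optional else (frag,) for frag, optional in tokens]
    strings = {"".join(parts) for parts in product(*choices)}
    return sorted(strings, key=lambda s: (len(s), s))

# Hypothetical toy lexicon of transliterated Ottoman word labels.
LEXICON = {"mnwr", "ktab", "qlm"}

candidates = expand(TOKENS)
matches = [c for c in candidates if c in LEXICON]

print(len(candidates), "candidates; shortest:", candidates[0])
print("dictionary matches:", matches)
```

With five optional fragments the pattern yields 32 candidate strings, and only mnwr survives the lookup against the toy lexicon. When the lexicon is small, an equivalent approach is to skip the enumeration and test each dictionary entry against the pattern directly with re.fullmatch(PATTERN, entry).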

The dictionary in our study consists of word labels. Once a label has been found, the system looks up a set of handwritten word images for it and searches for these images in the document. It can also create a set of candidate images directly from the regular expression by rendering each candidate spelling.