Paper Review: High Performance Layout Analysis for Arabic and Urdu

Posted on 2012-07-25 :: Tags: layout-analysis, arabic, urdu, document-processing, ocr

Authors: Syed Saqib Bukhari, Faisal Shafait, and Thomas M. Breuel

Keywords:

ridge
printed text
non-text segmentation
gaussian-filter bank
reading order

Q1: How is line skew determined?

The Gaussian kernel used to produce the ridges has a $\theta$ parameter, which could serve to detect the skew. Since $\theta$ is constant for an entire page, though, performance will probably degrade when the line skew varies across the page.
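To make the role of $\theta$ concrete, here is a minimal sketch of an oriented anisotropic Gaussian kernel. It is not taken from the paper: the function name, the `sigma_x` and `sigma_y` parameters, and the three-sigma kernel size are assumptions chosen for illustration.

```python
import numpy as np


def oriented_gaussian_kernel(sigma_x, sigma_y, theta, size=None):
    """Anisotropic 2-D Gaussian rotated by theta (radians), normalised to sum to 1.

    Hypothetical helper: sigma_x is the spread along the kernel's long axis,
    sigma_y the spread across it. None of these names come from the paper.
    """
    if size is None:
        # Cover about three standard deviations of the wider axis; keep the size odd.
        size = 2 * int(np.ceil(3 * max(sigma_x, sigma_y))) + 1
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate image coordinates into the kernel's own frame.
    u = x * np.cos(theta) + y * np.sin(theta)
    v = -x * np.sin(theta) + y * np.cos(theta)
    kernel = np.exp(-0.5 * ((u / sigma_x) ** 2 + (v / sigma_y) ** 2))
    return kernel / kernel.sum()


# A kernel stretched along horizontal text lines (theta = 0).
horizontal = oriented_gaussian_kernel(sigma_x=8.0, sigma_y=2.0, theta=0.0)
# The same kernel rotated by 5 degrees, as one would use for a skewed page.
skewed = oriented_gaussian_kernel(sigma_x=8.0, sigma_y=2.0, theta=np.deg2rad(5.0))
```

With a single `theta` the kernel is tuned to one line direction only, which is the limitation noted above. Handling skew that varies within a page would presumably require evaluating the kernel at several angles.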

Q2: How are non-text portions detected?

The paper does not describe the method itself. It cites “S. S. Bukhari, F. Shafait, and T. M. Breuel, ‘Improved document image segmentation algorithm using multiresolution morphology,’ in Proc. SPIE Document Recognition and Retrieval XVIII, San Jose, CA, USA, Jan. 2011” as the source of an improved technique.

Q3: Which heuristics are used in reading order determination?

The paper refers to Breuel’s algorithm in “T. M. Breuel, ‘High performance document layout analysis,’ in Symposium on Document Image Understanding Technology, Greenbelt, MD, USA, April 2003.” It says the authors modified that algorithm for right-to-left scripts. No further details are provided.

Q4: How large is the dataset, and what does it contain?

25 Arabic documents and 20 Urdu documents are used.

Q5: Are there any techniques applicable to divans?

There might be, but none of the techniques is described in enough detail to tell. We already have more sophisticated text line detection techniques. For the others, I’ll need to read the cited works.
