Table of Contents
- Authors: Syed Saqib Bukhari, Faisal Shafait and Thomas M. Breuel
- Keywords:
- Q1: How is line skew determined?
- Q2: How are non-text portions detected?
- Q3: Which heuristics are used in reading order determination?
- Q4: How large is the dataset and what does it contain?
- Q5: Are there any techniques applicable to divans?
Authors: Syed Saqib Bukhari, Faisal Shafait and Thomas M. Breuel
Keywords:
- ridge
- printed text
- non-text segmentation
- gaussian-filter bank
- reading order
Q1: How is line skew determined?
There is a
Q2: How are non-text portions detected?
The paper does not describe the method itself. It cites S. S. Bukhari, F. Shafait, and T. M. Breuel, "Improved document image segmentation algorithm using multiresolution morphology," in Proc. SPIE Document Recognition and Retrieval XVIII, San Jose, CA, USA, Jan. 2011, as the source of an improved technique.
Q3: Which heuristics are used in reading order determination?
The paper refers to Breuel's algorithm in T. M. Breuel, "High performance document layout analysis," in Symposium on Document Image Understanding Technology, Greenbelt, MD, USA, April 2003. It says the authors modified that algorithm for right-to-left scripts, but again gives no details.
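For my own reference, the cited Breuel (2003) approach is usually summarized as two pairwise ordering rules followed by a topological sort. Below is a minimal Python sketch written from that general description, not from this paper. The `rtl` flag, which mirrors the second rule, is only my guess at what the right-to-left modification might look like. `Line`, `x_overlap`, `comes_before`, and `reading_order` are hypothetical names.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass(frozen=True)
class Line:
    name: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # vertical position; smaller means higher on the page


def x_overlap(a: Line, b: Line) -> bool:
    """True if the horizontal extents of a and b overlap."""
    return a.x0 < b.x1 and b.x0 < a.x1


def comes_before(a: Line, b: Line, lines: list[Line], rtl: bool = True) -> bool:
    # Rule 1: horizontally overlapping lines are read top to bottom.
    if x_overlap(a, b):
        return a.y < b.y
    # Rule 2: a precedes b if a lies entirely on the side where reading
    # starts (right for RTL, an ASSUMED mirroring; left for LTR) and no
    # line c sits vertically between them while overlapping both.
    a_on_start_side = a.x0 >= b.x1 if rtl else a.x1 <= b.x0
    if not a_on_start_side:
        return False
    lo, hi = sorted((a.y, b.y))
    return not any(
        lo < c.y < hi and x_overlap(c, a) and x_overlap(c, b) for c in lines
    )


def reading_order(lines: list[Line], rtl: bool = True) -> list[str]:
    """Topologically sort line names by the pairwise 'comes before' relation."""
    ts = TopologicalSorter()
    for b in lines:
        ts.add(b.name)
        for a in lines:
            if a is not b and comes_before(a, b, lines, rtl):
                ts.add(b.name, a.name)  # a must be read before b
    # Raises graphlib.CycleError if the rules contradict each other.
    return list(ts.static_order())
```

On a two-column page with a title spanning both columns, calling `reading_order(lines, rtl=True)` should list the title, then the right column top to bottom, then the left column. With `rtl=False` the columns swap.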
Q4: How large is the dataset and what does it contain?
25 Arabic documents and 20 Urdu documents are used.
Q5: Are there any techniques applicable to divans?
There might be, if any of them were described in detail. We already have more sophisticated text line detection techniques. For the others, I'll need to read the cited works.