Meet Surya: A Multilingual Text Line Detection AI Model for Documents

by Pragati Jhunjhunwala, Artificial Intelligence Category – MarkTechPost

Vik Paruchuri, founder of Dataquest.io, recently announced in a tweet the launch of Surya, a multilingual document OCR toolkit. The framework detects line-level bounding boxes (bboxes) and column breaks in documents, scanned images, and presentations. Existing text detection models such as Tesseract work at the word or character level, whereas this open-source model works at the line level. The biggest challenge in building a text-line detection model is the lack of fully accurate datasets with line-level annotations.

Surya is an encoder-decoder model that takes an image of a document as input and produces a copy of that image with boxes drawn around the detected text lines. The initial layers of the decoder contain SegFormer, a transformer for semantic segmentation, while a 2D convolutional layer with batch-normalization layers forms the end of the decoder network. Before an image or PDF is processed, its pages are split into segments bounded by a maximum image dimension and undergo several pre-processing steps.
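The article does not say how pages are segmented, so the following is only a minimal sketch of one plausible approach: slicing a page into non-overlapping horizontal strips no taller than a maximum height. The function name `split_page` and the parameter `max_height` are hypothetical and are not taken from Surya's code.

```python
def split_page(height: int, max_height: int) -> list[tuple[int, int]]:
    """Return (top, bottom) pixel ranges covering a page of the given height.

    Assumption: strips are non-overlapping and at most `max_height` tall.
    The real toolkit may segment pages differently.
    """
    if height <= 0 or max_height <= 0:
        raise ValueError("height and max_height must be positive")
    return [
        (top, min(top + max_height, height))
        for top in range(0, height, max_height)
    ]


# A 2500-pixel-tall page with a 1000-pixel limit yields three strips.
strips = split_page(2500, 1000)
print(strips)  # [(0, 1000), (1000, 2000), (2000, 2500)]
```

Each strip could then be pre-processed and passed to the model separately, with the detected boxes shifted back by the strip's `top` offset.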

To evaluate bbox accuracy, the researchers used precision and recall based on coverage area instead of the traditional Intersection over Union (IoU) metric. Precision measures how well the predicted bboxes cover the ground-truth bboxes, and recall measures how well the ground-truth bboxes cover the predicted bboxes. In a comparison with Tesseract, experiments suggested that Surya's precision is much higher, while Tesseract's recall is slightly higher; overall, Surya outperforms Tesseract. Another advantage of Surya is that it runs on both CPU and GPU and is much faster than Tesseract.
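To make the coverage-based metric concrete, here is a minimal sketch that follows the definitions given in the paragraph above. It is not the project's actual evaluation code. It assumes boxes are integer `(x1, y1, x2, y2)` tuples, and the helper names `covered_fraction` and `coverage_scores` are hypothetical.

```python
Box = tuple[int, int, int, int]  # (x1, y1, x2, y2), integer pixel coordinates


def covered_fraction(box: Box, others: list[Box]) -> float:
    """Fraction of `box`'s area covered by the union of `others`."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    if area <= 0:
        return 0.0
    covered = set()  # pixel cells of `box` hit by at least one other box
    for ox1, oy1, ox2, oy2 in others:
        for x in range(max(x1, ox1), min(x2, ox2)):
            for y in range(max(y1, oy1), min(y2, oy2)):
                covered.add((x, y))
    return len(covered) / area


def coverage_scores(pred: list[Box], truth: list[Box]) -> tuple[float, float]:
    """Return (precision, recall) as defined in the article.

    precision: how well predicted boxes cover ground-truth boxes
    recall:    how well ground-truth boxes cover predicted boxes
    """
    precision = (
        sum(covered_fraction(t, pred) for t in truth) / len(truth) if truth else 0.0
    )
    recall = (
        sum(covered_fraction(p, truth) for p in pred) / len(pred) if pred else 0.0
    )
    return precision, recall


# One ground-truth line, and a prediction that covers only its left half.
truth = [(0, 0, 10, 2)]
pred = [(0, 0, 5, 2)]
print(coverage_scores(pred, truth))  # (0.5, 1.0)
```

Unlike IoU, which collapses each box pair into a single overlap ratio, the two coverage scores show which side falls short: here the prediction lies entirely inside the ground truth but covers only half of it.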

Surya, named after the Hindu god of the Sun, has worked successfully on multiple languages and is expected to work on almost all languages. Its main limitation is that it is specialized for documents, so it is unlikely to work on photos or other images. Experiments also show that it does not work well on images that look like ads. Despite this limitation, the model is still very useful and could be extended to text detection, table detection, and chart detection.

Announcing surya – a multilingual text line detection model for documents. It gives you accurate line-level bboxes and column breaks.

Find it here – https://t.co/DD2HfwIG9i . pic.twitter.com/HVNkYdCixL

— Vik Paruchuri (@VikParuchuri) January 12, 2024

The post Meet Surya: A Multilingual Text Line Detection AI Model for Documents appeared first on MarkTechPost.
