OCR Text Extraction

International Journal of Engineering and Management Research

 
 
Field Value
 
Title OCR Text Extraction
 
Creator Alan Jiju
Shaun Tuscano
Chetana Badgujar
 
Subject OpenCV
Optical Character Reader (OCR)
Tesseract
Document Detection
 
Description This research seeks a methodology for extracting data from everyday printed bills and invoices. The extracted data can later be put to extensive use, for example in machine learning or statistical analysis. The research focuses on extracting the final bill amount, itinerary, date, and similar data from bills and invoices, as they contain ample information about users' purchases, likes, dislikes, and so on. Optical Character Recognition (OCR) is a technology that recognizes printed or handwritten alphanumeric characters in images. First, OpenCV is used to detect the bill or invoice in the image and to filter out unnecessary noise. The intermediate image is then passed to the Tesseract OCR engine for further processing. Tesseract applies text segmentation to extract written text in various fonts and languages. Our methodology proved highly accurate when tested on a variety of input images of bills and invoices.
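
The description names the final bill amount and the date as target fields but does not say how they are located in the recognized text. The sketch below illustrates one possible post-processing step, assuming the OCR engine has already returned plain text. It is not the authors' method: the sample receipt, the function names `extract_total` and `extract_date`, the "Total" label, and the DD/MM/YYYY date format are all assumptions made for illustration.

```python
import re
from datetime import date

# Hypothetical OCR output; real engine output is usually noisier.
SAMPLE_TEXT = """\
SUPER MART
Date: 30/04/2021
Milk        45.00
Bread       30.50
TOTAL       75.50
"""

# Assumption: the final amount sits on a line that starts with
# "Total" or "Grand Total", followed by a number.
AMOUNT_RE = re.compile(
    r"^\s*(?:grand\s+)?total\b[^\d\n]*(\d+(?:[.,]\d{2})?)\s*$",
    re.IGNORECASE | re.MULTILINE,
)

# Assumption: dates are printed as DD/MM/YYYY or DD-MM-YYYY.
DATE_RE = re.compile(r"\b(\d{1,2})[/-](\d{1,2})[/-](\d{4})\b")


def extract_total(text):
    """Return the last 'total' amount found in the text, or None."""
    matches = AMOUNT_RE.findall(text)
    if not matches:
        return None
    # Treat a decimal comma like a decimal point.
    return float(matches[-1].replace(",", "."))


def extract_date(text):
    """Return the first valid DD/MM/YYYY date in the text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    day, month, year = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Digits matched the pattern but do not form a real date.
        return None


if __name__ == "__main__":
    print(extract_total(SAMPLE_TEXT))
    print(extract_date(SAMPLE_TEXT))
```

Taking the last matching "total" line is a deliberate choice in this sketch, since the final amount tends to appear near the bottom of a bill. Real bills vary widely in layout, so the patterns would need tuning against actual OCR output.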
 
Publisher Vandana Publications
 
Date 2021-04-30
 
Type info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Peer-reviewed Article
 
Format application/pdf
 
Identifier https://www.ijemr.net/ojs/index.php/ojs/article/view/747
10.31033/ijemr.11.2.11
 
Source International Journal of Engineering and Management Research; Vol. 11 No. 2 (2021): April Issue; 83-86
2250-0758
2394-6962
 
Language eng
 
Relation https://www.ijemr.net/ojs/index.php/ojs/article/view/747/837
 
Rights Copyright (c) 2021 International Journal of Engineering and Management Research
https://creativecommons.org/licenses/by-nc-nd/4.0
 
