
Optical Character Recognition

Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text. The source image can be a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo), or subtitle text superimposed on an image (for example, from a television broadcast).

Stack (OS, Programming Language, Framework)

Linux/Windows

Python

Tesseract/EasyOCR
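A minimal inference sketch that puts either engine behind one function. It assumes the `pytesseract` wrapper (plus Pillow and the Tesseract binary) and the `easyocr` package. The names `recognize`, `BACKENDS` and `DEFAULT_LANG` are hypothetical and chosen for illustration. Each engine is imported only when it is used, so a machine with just one of them installed still works.

```python
from functools import lru_cache
from typing import Callable, Dict, Optional


def _tesseract_backend(image_path: str, lang: str) -> str:
    # Imported lazily so this module loads even when only one engine is installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


@lru_cache(maxsize=None)
def _easyocr_reader(lang: str):
    # Creating an EasyOCR Reader loads its models, so reuse one Reader per language.
    import easyocr

    return easyocr.Reader([lang])


def _easyocr_backend(image_path: str, lang: str) -> str:
    # detail=0 makes readtext return only the recognized strings, without boxes or scores.
    return "\n".join(_easyocr_reader(lang).readtext(image_path, detail=0))


BACKENDS: Dict[str, Callable[[str, str], str]] = {
    "tesseract": _tesseract_backend,
    "easyocr": _easyocr_backend,
}
# The two engines name languages differently (Tesseract "eng", EasyOCR "en").
DEFAULT_LANG = {"tesseract": "eng", "easyocr": "en"}


def recognize(image_path: str, backend: str = "tesseract", lang: Optional[str] = None) -> str:
    """Run OCR on one image file with the chosen engine and return the text."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown OCR backend {backend!r}; choose from {sorted(BACKENDS)}")
    return BACKENDS[backend](image_path, lang or DEFAULT_LANG[backend]).strip()
```

Usage would look like `recognize("scan.png")` or `recognize("scan.png", backend="easyocr")`, where `scan.png` is a placeholder path.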

Dataset

A large number of images of text in the target language's font, each paired with a text file containing its transcription, for training the model.
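Before training, check that every image actually has a transcription. The sketch below assumes the naming convention used by Tesseract's tesstrain tool, where an image `name.png` or `name.tif` has its ground truth in `name.gt.txt` in the same folder. The function name `pair_ground_truth` and the accepted extensions are hypothetical choices for illustration.

```python
from pathlib import Path
from typing import List, Tuple

# Hypothetical set of image formats to look for in the dataset folder.
IMAGE_SUFFIXES = {".png", ".tif", ".tiff", ".jpg", ".jpeg"}


def pair_ground_truth(data_dir: str) -> Tuple[List[Tuple[Path, str]], List[Path]]:
    """Return (pairs, missing): images with their transcription, and images that have none."""
    pairs: List[Tuple[Path, str]] = []
    missing: List[Path] = []
    for image in sorted(Path(data_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skips the .gt.txt files themselves and any other non-image files
        gt_file = image.with_name(image.stem + ".gt.txt")
        if gt_file.is_file():
            pairs.append((image, gt_file.read_text(encoding="utf-8").strip()))
        else:
            missing.append(image)
    return pairs, missing
```

Images listed in `missing` should be transcribed or removed before training rather than silently ignored.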

Hardware Resources (Storage, Compute Power, Time)

For training, we need 64 GB of RAM and a current-generation processor.

For inference only, the model runs on minimal hardware.

Workflow (Processing)

1. Collect a dataset of images and their corresponding text files.

2. Generate bounding-box (.box) files and LSTM training (.lstmf) files from the images.

3. Train the model.

4. Test the model and measure its accuracy.

5. Integrate the model with the API.
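Step 4 leaves "accuracy" open. A common OCR metric is the character error rate (CER): the edit distance between the recognized text and the ground truth, divided by the number of ground-truth characters. The pure-Python sketch below assumes `(reference, prediction)` pairs were produced by running the trained model on a held-out test set. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free when characters match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(samples: Iterable[Tuple[str, str]]) -> float:
    """CER = total edit distance / total reference characters over (reference, prediction) pairs."""
    errors = total = 0
    for reference, prediction in samples:
        errors += levenshtein(reference, prediction)
        total += len(reference)
    if total == 0:
        raise ValueError("references contain no characters")
    return errors / total
```

Character accuracy can then be reported as `1 - CER`. Computing it over the whole test set at once weights long lines more than short ones, which matches how errors accumulate in a real document.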

End To End (Development & Integration into a System)

API Development

Environment Setup

Engine Installation and Model Configuration

Deployment Server and Route Setup

Testing

Deployment (Server / API)

Linux/Windows with Python Flask API
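A minimal sketch of the Flask side, assuming Flask 2.0 or later (for the `app.post` shortcut). The route `/ocr`, the multipart field name `image`, and the helpers `handle_ocr_upload` and `create_app` are hypothetical names for illustration. Validation lives outside Flask so it can be tested without a running server, and the OCR engine is passed in as a callable that takes the raw image bytes.

```python
from pathlib import PurePath
from typing import Callable, Dict, Tuple

# Hypothetical whitelist of image types the endpoint accepts.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def handle_ocr_upload(
    filename: str, data: bytes, ocr: Callable[[bytes], str]
) -> Tuple[Dict[str, str], int]:
    """Validate one uploaded image, run OCR on its bytes, and return (JSON body, HTTP status)."""
    if not filename or PurePath(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        return {"error": "unsupported file type"}, 400
    if not data:
        return {"error": "empty file"}, 400
    return {"text": ocr(data)}, 200


def create_app(ocr: Callable[[bytes], str]):
    """Wire the handler into a Flask app exposing POST /ocr (multipart field 'image')."""
    # Flask is imported here so handle_ocr_upload can be tested without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.post("/ocr")
    def ocr_route():
        upload = request.files.get("image")
        if upload is None:
            return jsonify(error="missing 'image' file field"), 400
        body, status = handle_ocr_upload(upload.filename or "", upload.stream.read(), ocr)
        return jsonify(body), status

    return app
```

To serve it, one could pass an engine-backed callable such as `lambda data: pytesseract.image_to_string(Image.open(io.BytesIO(data)))` to `create_app` and call `app.run(host="0.0.0.0", port=5000)` for local testing. A production deployment on Linux or Windows would normally sit behind a WSGI server.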

Applications (General Real-World Use)

Number plate recognition

Digitization of hard copies/books

Customized font creation

Language translation

Use Case (Our Specific Use)

Scanning printed documents into versions that can be edited in a word processor, and extracting text from newspapers and news tickers.
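For editable output, recognized words must be put back into reading order before they can go into a word processor. The sketch below assumes EasyOCR's `readtext` output with `detail=1`: a list of `(bbox, text, confidence)` tuples where `bbox` holds four `[x, y]` corner points. The function name and the `min_confidence` and `line_tolerance` defaults are hypothetical values for illustration. They would need tuning on real scans, and multi-column newspaper layouts are not handled.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]
Detection = Tuple[Sequence[Point], str, float]


def detections_to_lines(
    detections: List[Detection], min_confidence: float = 0.3, line_tolerance: float = 10.0
) -> List[str]:
    """Group word-level detections into text lines, top to bottom and left to right."""
    words = []
    for box, text, confidence in detections:
        if confidence < min_confidence:
            continue  # drop low-confidence detections
        xs = [p[0] for p in box]
        ys = [p[1] for p in box]
        words.append((sum(ys) / len(ys), min(xs), text))  # (vertical centre, left edge, text)
    words.sort()
    lines = []  # each entry: [vertical centre of the line's first word, list of (x, text)]
    for y, x, text in words:
        if lines and abs(y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    return [" ".join(t for _, t in sorted(items)) for _, items in lines]
```

For example, `"\n".join(detections_to_lines(reader.readtext("page.png")))` would give plain text ready to paste into a document, with `reader` an `easyocr.Reader` and `page.png` a placeholder path.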

Task

Optical Character Recognition

  • Strategy

    EasyOCR, Tesseract
