Undergrad Research Project - Text Detection for Object Recognition

Fall 2017

Student: Sue Hur
Advisor: Kris Kitani
Project description

Motivation/Problem Statement: Recognizing objects can be challenging, especially when the objects look alike. For example, telling apart different types of tea sold in boxes of similar size and shape can be difficult if the system is trained to distinguish objects using images of the whole object.

Proposed Research: In this research, with the goal of improving the accuracy of object recognition, I will focus on using text information from 2D images to distinguish objects based on their labels. I will use existing algorithms, such as maximally stable extremal regions (MSER) feature detection [1], to detect text regions. The planned approach is outlined below:
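For reference, the core of MSER is a stability test. As an extremal region grows across increasing intensity thresholds, its stability at threshold i is commonly measured as q(i) = (|Q(i+Δ)| - |Q(i-Δ)|) / |Q(i)|, and regions where q has a local minimum are kept as maximally stable. The Python sketch below applies only that test to a precomputed list of areas for one nested region; building the component tree, the area filters used by full implementations (such as OpenCV's), and the edge enhancement described in [1] are all omitted. The function names and example areas are illustrative.

```python
from typing import List


def stability(areas: List[int], delta: int) -> List[float]:
    """Relative area change q(i) = (areas[i+delta] - areas[i-delta]) / areas[i].

    areas[i] is the pixel count (>= 1) of one nested extremal region at threshold i.
    Indices within `delta` of either end have no defined q and are set to inf.
    """
    n = len(areas)
    q = [float("inf")] * n
    for i in range(delta, n - delta):
        q[i] = (areas[i + delta] - areas[i - delta]) / areas[i]
    return q


def maximally_stable(areas: List[int], delta: int) -> List[int]:
    """Thresholds where q(i) is a strict local minimum among defined neighbors."""
    q = stability(areas, delta)
    result = []
    # Skip the first and last defined index so both neighbors have a real q value.
    for i in range(delta + 1, len(areas) - delta - 1):
        if q[i] < q[i - 1] and q[i] < q[i + 1]:
            result.append(i)
    return result
```

With delta = 1, the areas [2, 3, 10, 11, 12, 14, 30, 60] yield a single maximally stable threshold at index 3, where the region barely grows before it starts merging with its surroundings.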

Still Images.
Text Detection: Check whether the input image contains text (e.g., a single word).
Text Localization: If text is detected, find the bounding box of the text.
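A minimal sketch of both still-image steps, assuming a binary mask of candidate text pixels is already available (for example, from the MSER step): connected components give per-character boxes, a simple geometric filter discards noise, and the union of the surviving boxes serves as the word's bounding box. The thresholds (min_height and the aspect-ratio limits) are hypothetical placeholders, and merging everything into one box assumes a single word per image, matching the example given for text detection.

```python
from collections import deque
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive pixel coordinates


def component_boxes(mask: List[List[int]]) -> List[Box]:
    """Bounding boxes of 4-connected components of nonzero pixels, in scan order."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def detect_and_localize(mask: List[List[int]], min_height: int = 3,
                        min_aspect: float = 0.1, max_aspect: float = 2.0) -> Optional[Box]:
    """Return one box enclosing all character-like components, or None if no text is found.

    The default thresholds are hypothetical and would need tuning on real data.
    """
    kept = []
    for x0, y0, x1, y1 in component_boxes(mask):
        bw, bh = x1 - x0 + 1, y1 - y0 + 1
        if bh >= min_height and min_aspect <= bw / bh <= max_aspect:
            kept.append((x0, y0, x1, y1))
    if not kept:
        return None  # detection step: no text found
    # Localization step: union of the kept character boxes.
    return (min(b[0] for b in kept), min(b[1] for b in kept),
            max(b[2] for b in kept), max(b[3] for b in kept))
```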

Image Streams (video data).
Real-time Text Detection: Check whether there is text in clips of the input video (approx. 30 fps).
Real-time Text Localization: If text is detected, find the bounding box of the text.
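At approximately 30 fps, each frame leaves a budget of about 1000 / 30 ≈ 33 ms. If the still-image detector takes longer than that, one common workaround (assumed here, not specified in the plan) is to run detection only on every k-th frame and reuse the most recent boxes in between, with k the smallest integer such that k frame budgets cover the detector's latency. The sketch below computes k and applies that schedule; `detect` is a placeholder for any per-frame detector, and the function names are illustrative.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def detection_stride(latency_ms: int, fps: int) -> int:
    """Smallest k >= 1 with k * (1000 / fps) >= latency_ms, using exact integer math."""
    return max(1, -(-latency_ms * fps // 1000))  # ceiling division


def run_strided(frames: Iterable[Frame], detect: Callable[[Frame], Result],
                stride: int) -> List[Optional[Result]]:
    """Run `detect` on every `stride`-th frame and carry its result forward to the others."""
    latest: Optional[Result] = None
    results: List[Optional[Result]] = []
    for i, frame in enumerate(frames):
        if i % stride == 0:
            latest = detect(frame)
        results.append(latest)
    return results
```

For instance, a detector that needs 80 ms per frame at 30 fps gives k = ceil(80 × 30 / 1000) = 3. Reusing stale boxes is the simplest option; tracking the boxes between detections would follow moving text more closely, at extra cost.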

Evaluation. I will compare how accurately the following two recognition systems recognize and distinguish objects:
Image-based Object Recognition: uses photos of the objects as input to train the machine learning algorithm.
Text-based Object Recognition: uses the text (labels) on the objects as input to train the machine learning algorithm.
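The plan trains a machine learning algorithm on label text. As a much simpler stand-in (an assumption for illustration, not the planned model), the sketch below classifies an object by matching the text read from its label against a list of known label strings using Levenshtein edit distance, which tolerates small reading errors. It also assumes the detected text has already been converted to a string by some character-recognition step, which the proposal does not specify; the label list in any usage is hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                    # delete ca
                           cur[j - 1] + 1,                 # insert cb
                           prev[j - 1] + int(ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def classify_by_label(read_text: str, known_labels: Sequence[str]) -> str:
    """Return the known label closest to the text read from the object (ties: first listed)."""
    query = read_text.strip().upper()
    return min(known_labels, key=lambda label: levenshtein(query, label.upper()))
```

A real system would need a label vocabulary covering every object class and would likely compare individual words rather than whole strings; the sketch only shows the matching idea.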

Expected Outcomes: A written report or presentation slides presenting an empirical comparison of the two methods in terms of accuracy.
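For the accuracy comparison, a minimal sketch, assuming both systems are evaluated on the same held-out test samples and each returns one predicted class per sample: overall accuracy is the fraction of correct predictions, and counting (true, predicted) pairs among the errors shows which similar-looking objects each system confuses. The function names are illustrative.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def accuracy(predicted: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of samples whose predicted class equals the true class."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("need equal-length, non-empty sequences")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)


def confusions(predicted: Sequence[str], truth: Sequence[str]) -> Counter:
    """Count (true, predicted) pairs over misclassified samples only."""
    return Counter((t, p) for p, t in zip(predicted, truth) if p != t)


def compare(results: Dict[str, Sequence[str]], truth: Sequence[str]) -> Dict[str, float]:
    """Accuracy of each named system on the same ground-truth labels."""
    return {name: accuracy(pred, truth) for name, pred in results.items()}
```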

References: [1] Chen, Huizhong, et al. "Robust Text Detection in Natural Images with Edge-Enhanced Maximally Stable Extremal Regions." 2011 18th IEEE International Conference on Image Processing (ICIP). IEEE, 2011.
