OCR for numbers with a gray noisy background

Question

ItsMe 2014-06-19 at 19:39

I tried to run OCR on several scanned sheets with numbers like the ones in this image (all with the same background, digits only):

[image: scanned numbers on a gray noisy background]

But every attempt failed! I tried offline OCR tools (Gocr, Tesseract) and several online recognizers, and all of them failed COMPLETELY!

What should I do?

3 answers

Vitalik 2014-07-15 at 13:10

I tried to recognise your image with OCR technology by ABBYY: OCR SDK result

You can find more information about ABBYY's products at abbyy.com.
I work for ABBYY and am ready to help if you have questions.

Is there a digits-only mode, to improve the detection rate on scratched images? ItsMe 10 years ago

jram 2018-10-30 at 12:33

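    import cv2
    import pytesseract
    from PIL import Image  # needed for Image.open below

    # Load the scan as a single-channel grayscale image
    im = cv2.imread('noisyNumbers.png', cv2.IMREAD_GRAYSCALE)

    # Preview the result; waitKey keeps the window open until a key is pressed
    cv2.imshow('Gray', im)
    cv2.waitKey(0)

    # Save the grayscale copy, then run Tesseract on it
    cv2.imwrite('noisyNumbers.jpg', im)
    print(pytesseract.image_to_string(Image.open('noisyNumbers.jpg')))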

Welcome to Super User! Can you [edit] your answer to explain the code you gave above? Thanks! bertieb 6 years ago

Accepted Answer · 2014-06-19 20:30:26

First you must tweak those images. I recommend a batch tool like XnViewMP which is free and multiplatform.

It has a file explorer. Select all your images, then go to Tools - Batch convert. Add actions like I did:

XnViewMP - Batch convert - Actions tab

Here are my actions:

HLS - make it grayscale:
- Hue: 0
- Lightness: 0
- Saturation: -127
Levels - lower the white point a bit so that the light gray noise is pushed to white and disappears:
- Black point: 0
- White point: 212 - may vary depending on image
Reduce noise filter
Adjust - increase the contrast:
- Brightness: 0
- Contrast: 127 - this one matters
- Gamma: 1.06
Minimum - make the black strokes thicker:
- Filter size: 5x5 - may vary depending on image
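For anyone who would rather script these steps than click through XnViewMP, here is a rough NumPy sketch of the same idea. It is not XnViewMP's actual implementation: the function names are made up for illustration, the 3x3 median is my stand-in for the "Reduce noise" filter, and the hard threshold at 128 is only an approximation of "Contrast: 127". It assumes the input is already a 2-D uint8 grayscale array (for example from cv2.imread with cv2.IMREAD_GRAYSCALE, as in jram's answer), which covers the HLS step.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def levels(gray, black=0, white=212):
    """Linearly map [black, white] to [0, 255]; values outside are clipped."""
    scaled = (gray.astype(np.float32) - black) * 255.0 / (white - black)
    return np.clip(scaled, 0, 255).astype(np.uint8)


def local_filter(gray, size, reducer):
    """Apply reducer (np.median, np.min, ...) over each size x size window."""
    pad = size // 2
    padded = np.pad(gray, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return reducer(windows, axis=(-2, -1)).astype(np.uint8)


def stretch_contrast(gray, threshold=128):
    """Assumed stand-in for maximum contrast: below threshold -> 0, else 255."""
    return np.where(gray < threshold, 0, 255).astype(np.uint8)


def preprocess(gray):
    out = levels(gray, black=0, white=212)  # light gray noise becomes white
    out = local_filter(out, 3, np.median)   # remove isolated speckles
    out = stretch_contrast(out)             # hard black/white split
    out = local_filter(out, 5, np.min)      # minimum filter thickens dark strokes
    return out
```

The white point (212) and the 5x5 filter size are the values from the list above and will probably need tuning per scan. If you already use OpenCV, cv2.erode with a 5x5 kernel of ones computes the same local minimum on a grayscale image.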

Don't forget to save as TIFF (see the Output tab). After that I run tesseract:

tesseract test.tif text -psm 7

Note that I selected page segmentation mode (PSM) 7, which treats the image as a single text line. If you have multiple lines, you'll probably need mode 6 or 3.

And here are the contents of text.txt output file:

570 394 666 638 043
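If you want to run this over many files from a script, the command above can be wrapped roughly like this. The helper names are hypothetical. Two things here are my additions rather than part of the original command: newer Tesseract releases (4.x and later) spell the flag --psm instead of -psm, so the flag is a parameter, and the digits-only restriction uses the tessedit_char_whitelist config variable, which may not be honoured by every Tesseract version or engine mode.

```python
import subprocess


def tesseract_command(image, output_base, psm=7, digits_only=False,
                      psm_flag="--psm"):
    """Build the argv list for a tesseract CLI call (hypothetical helper)."""
    cmd = ["tesseract", image, output_base, psm_flag, str(psm)]
    if digits_only:
        # Assumption: restrict recognition to digits via a config variable.
        cmd += ["-c", "tessedit_char_whitelist=0123456789"]
    return cmd


def run_ocr(image, output_base, **kwargs):
    """Run tesseract, then return the text it wrote to <output_base>.txt."""
    subprocess.run(tesseract_command(image, output_base, **kwargs), check=True)
    with open(output_base + ".txt", encoding="utf-8") as fh:
        return fh.read().strip()
```

Calling run_ocr("test.tif", "text", psm_flag="-psm") reproduces the exact command from this answer on an old 3.x install; passing digits_only=True is worth trying when stray letters show up in the output.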
