Tesseract 3.03 English language data

Question

tesseract-ocr


MarAja 2014-05-26 at 11:44

Tesseract 3.03 was released recently, and I have just installed it. However, the English language data is not included in the download (from https://launchpad.net/ubuntu/+source/tesseract/3.03.03-1). The Tesseract website has a "Downloads" link, but it only offers "English language data for Tesseract 3.02". Where can I find the data for 3.03?


2 Answers


Alasdair 2014-07-16 at 11:06

You can use the 3.02 language data with the 3.03 RC.

Also note that 3.03 has not been officially released yet; it is a release candidate (RC) build.
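One quick way to confirm that the language data is in place is to check for `eng.traineddata` in the tessdata directory. This is a minimal sketch, assuming the data lives under `/usr/local/share/tessdata` (the usual location for a source build with the default prefix; adjust it if yours differs). `has_lang_data` and `DATA_DIR` are hypothetical names used only for this example and are not part of Tesseract.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if <lang>.traineddata exists in a tessdata directory.
# Usage: has_lang_data <tessdata-dir> <lang>
has_lang_data() {
    [ -f "$1/$2.traineddata" ]
}

# Assumed location (hypothetical variable for this sketch); adjust to your install prefix.
DATA_DIR="${DATA_DIR:-/usr/local/share/tessdata}"

if has_lang_data "$DATA_DIR" eng; then
    echo "eng data found in $DATA_DIR"
else
    echo "eng data missing from $DATA_DIR"
fi
```

If `tesseract` is already on your PATH, `tesseract --list-langs` reports the languages it can see.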

Accepted Answer · 2014-10-01 15:06:59

As others have mentioned, you can use the 3.02 English language pack with 3.03. The steps are:

1. Download the English language pack from here: 1

2. Install the prerequisites and unpack the archive (`apt-fast` is a wrapper around `apt-get`; plain `apt-get` also works):

`sudo apt-fast install -y libicu-dev libpango1.0-dev libcairo2-dev`

`tar xfv tesseract-ocr-3.02.eng.tar.gz`

3. Move Tesseract's English data into the `tessdata` directory inside the `tesseract-3.03` directory. This assumes both archives (the English language data and the Tesseract source `.tar.gz`) are in the same folder. Re-extracting the language pack is harmless if you already unpacked it in step 2:

`tar zxvf tesseract-ocr-3.02.eng.tar.gz`

`mv tesseract-ocr/tessdata/* tesseract-3.03/tessdata/`

4. Go to the Tesseract source directory and finish the installation:

`cd tesseract-3.03`

`./autogen.sh`

`./configure`

`make -j`

`sudo make install LANGS="eng"`

`sudo ldconfig`

5. Test the installation with the sample image in that directory:

`tesseract phototest.tif ans -l eng`

`cat ans.txt`

Output:

This is a lot of 12 point text to test the ocr code and see if it works on all types of file format.

The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox.
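The extraction step in the accepted answer can be sketched as a small function. This is a minimal sketch, assuming the archive unpacks to `tesseract-ocr/tessdata/` as that answer describes. `install_lang_data` is a hypothetical helper name, and it uses `cp -R` from a scratch directory in place of `mv`, so nothing is left behind in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: unpack a language tarball and copy its tessdata files.
# Usage: install_lang_data <tarball> <destination-tessdata-dir>
install_lang_data() {
    tarball=$1
    dest=$2
    workdir=$(mktemp -d) || return 1

    # Unpack into a scratch directory so the current directory stays clean.
    tar -xzf "$tarball" -C "$workdir" || { rm -rf "$workdir"; return 1; }

    # Assumed layout inside the archive: tesseract-ocr/tessdata/
    src="$workdir/tesseract-ocr/tessdata"
    if [ ! -d "$src" ]; then
        echo "unexpected archive layout: $src not found" >&2
        rm -rf "$workdir"
        return 1
    fi

    # The trailing "/." copies the directory's contents, not the directory itself.
    mkdir -p "$dest" && cp -R "$src/." "$dest/"
    status=$?
    rm -rf "$workdir"
    return $status
}
```

For example, `install_lang_data tesseract-ocr-3.02.eng.tar.gz tesseract-3.03/tessdata` would place `eng.traineddata` and its companion files in the source tree before running `make install`.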

NOTE: some lines have wrong formatting...any advice to correct those would be great
