As others have mentioned, you can use the 3.02 English language pack with 3.03. The instructions are below:
- Download and unzip from here: 1
Install the prerequisites and unpack the archive:
`sudo apt-fast install -y libicu-dev libpango1.0-dev libcairo2-dev`
`tar xfv tesseract-ocr-3.02.eng.tar.gz`
Extract Tesseract's English data pack into the tessdata directory inside the tesseract-3.03 directory. This assumes that both .tar.gz files (the English language data and the Tesseract source) are in the same folder:
tar zxvf tesseract-ocr-3.02.eng.tar.gz
mv tesseract-ocr/tessdata/* tesseract-3.03/tessdata/
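Before building, it may be worth confirming that the language data actually landed in the source tree. The sketch below assumes the archive ships a file named `eng.traineddata` under `tessdata/`; `has_langdata` is a hypothetical helper name, not a Tesseract command:

```shell
#!/bin/sh
# Hypothetical helper: succeed if a tessdata directory contains the
# trained-data file for the given language (defaults to eng).
has_langdata() {
    dir="$1"
    lang="${2:-eng}"
    [ -f "$dir/$lang.traineddata" ]
}

# Paths assume the folder layout described in the steps above.
if has_langdata tesseract-3.03/tessdata eng; then
    echo "eng.traineddata is in place"
else
    echo "eng.traineddata is missing; re-check the mv step"
fi
```

If the check reports the file as missing, list the contents of `tesseract-ocr/tessdata/` to see where the archive actually put it.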
4. Go back to Tesseract's directory and finish the installation:
cd tesseract-3.03
./autogen.sh
./configure
make -j
sudo make install LANGS="eng"
sudo ldconfig
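A note on `make -j`: when given without a number, GNU make places no limit on parallel jobs, which can exhaust memory on smaller machines. One way to cap it at the number of CPU cores is sketched below, assuming a POSIX shell; `make_jobs` is a hypothetical helper name, not part of Tesseract's build system:

```shell
#!/bin/sh
# Hypothetical helper: print a job count for make.
# Tries nproc (GNU coreutils), then getconf, then falls back to 1.
make_jobs() {
    n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    echo "$n"
}

# Show the command that could replace the bare "make -j".
echo "make -j$(make_jobs)"
```

Replacing `make -j` with the printed command keeps the build parallel while bounding the number of compiler processes.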
Now test your installation with the test image in the directory:
tesseract phototest.tif ans -l eng
cat ans.txt
Output:
This is a lot of 12 point text to test the ocr code and see if it works on all types of file format.
The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox.
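As an additional check, `tesseract --list-langs` should include `eng` once the language data is installed. A small sketch, assuming the flag is available in your build and that the listing may be printed on stderr (hence the `2>&1`); `lang_listed` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read a language listing on stdin and succeed
# if one line matches the given language code exactly.
lang_listed() {
    grep -qxF "$1"
}

# Only query tesseract if it is actually on PATH.
if command -v tesseract >/dev/null 2>&1; then
    if tesseract --list-langs 2>&1 | lang_listed eng; then
        echo "eng is available"
    else
        echo "eng was not listed; check the tessdata directory"
    fi
else
    echo "tesseract is not on PATH"
fi
```

Matching whole lines avoids false positives from the header line or from longer language codes that merely contain `eng`.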
NOTE: Some lines have wrong formatting; any advice on correcting them would be welcome.