DECA-188 | Fluid Project Issues Archive

Metadata

Source: DECA-188
Type: Bug
Priority: Major
Status: Open
Resolution: N/A
Assignee: N/A
Reporter: Jonathan Hung
Created: 2011-10-31T12:13:01.330-0400
Updated: 2013-01-27T12:21:44.954-0500
Versions: 0.5, 0.6, 0.7
Fixed Versions: Future
Component: genpdf

Description

The quality of the OCR'ed text in Type 2 PDFs is poor when the input is a reasonably well-photographed document. The expectation is that more legible, machine-readable text is generated.

Image 1 - generated from the original computer-generated document (see attached 1-1-1.png).
Image 2 - photograph of Image 1 (see http://source.fluidproject.org/svn/design/decapod/testing-images/2-1-1.png ).
PDF 1 - the Type 2 PDF generated from Image 1 (see attached 1-1-1-t2.pdf).
PDF 2 - the Type 2 PDF generated from Image 2 (see attached 2-1-1[t2].pdf).

Should the Type 2 results for PDF 1 and PDF 2 be of comparable quality?
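
One possible way to make "comparable quality" measurable is to compare the text layers of the two PDFs directly. The sketch below is not part of Decapod or genpdf. It assumes the OCR text has already been extracted from each PDF into a plain string by some external tool, and the function names (normalize, ocr_similarity) and the sample strings are hypothetical. It uses Python's standard difflib module to compute a rough similarity ratio between 0.0 and 1.0.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def ocr_similarity(reference: str, candidate: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two OCR text layers."""
    return SequenceMatcher(None, normalize(reference), normalize(candidate)).ratio()


if __name__ == "__main__":
    # Hypothetical text layers for illustration only. In practice these
    # would be extracted from 1-1-1-t2.pdf and 2-1-1[t2].pdf.
    text_pdf1 = "The quick brown fox jumps over the lazy dog."
    text_pdf2 = "The qu1ck brovvn fox jurnps over the 1azy dog."
    print(f"similarity: {ocr_similarity(text_pdf1, text_pdf2):.2f}")
```

A ratio close to 1.0 would suggest the photographed input yields text close to that of the computer-generated input. What threshold counts as "comparable" is a judgment call that this issue does not define.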

Attachments

Comments

Jonathan Hung commented 2011-10-31T12:15:10.263-0400

Original computer-generated image.

Jonathan Hung commented 2011-10-31T12:59:44.099-0400

Generated PDF of image 1-1-1.png.

Jonathan Hung commented 2011-10-31T13:01:01.362-0400

Generated PDF of image 2-1-1.png (photograph of the computer-generated document).

tamir@tamirhassan.com commented 2013-01-27T12:21:44.954-0500

I tried this on the current (latest) version and could not see any significant difference in OCR quality between the two documents.