DECA-188: Type 2 output quality is poor for well-photographed and scanned documents

Metadata

Source
DECA-188
Type
Bug
Priority
Major
Status
Open
Resolution
N/A
Assignee
N/A
Reporter
Jonathan Hung
Created
2011-10-31T12:13:01.330-0400
Updated
2013-01-27T12:21:44.954-0500
Versions
  1. 0.5
  2. 0.6
  3. 0.7
Fixed Versions
  1. Future
Component
  1. genpdf

Description

The quality of the OCR'ed text in Type 2 PDFs is poor when the input is a reasonably well-photographed document. The expected result is more legible, machine-readable text.

Image 1 - the original computer-generated document (see attached 1-1-1.png).
Image 2 - a photograph of Image 1 (see http://source.fluidproject.org/svn/design/decapod/testing-images/2-1-1.png).
PDF 1 - the Type 2 PDF generated from Image 1 (see attached 1-1-1-t2.pdf).
PDF 2 - the Type 2 PDF generated from Image 2 (see attached 2-1-1[t2].pdf).

Shouldn't the Type 2 results for PDF 1 and PDF 2 be of comparable quality?

Comments

  • Jonathan Hung commented 2011-10-31T12:15:10.263-0400

    Original computer-generated image.

  • Jonathan Hung commented 2011-10-31T12:59:44.099-0400

    Generated PDF of image 1-1-1.png.

  • Jonathan Hung commented 2011-10-31T13:01:01.362-0400

    Generated PDF of image 2-1-1.png (a photograph of the computer-generated document).

  • tamir@tamirhassan.com commented 2013-01-27T12:21:44.954-0500

    I've tried it out on the current (latest) version and can't notice any significant difference between the OCR quality of either document.