PDF OCR

Optical Character Recognition (OCR) is commonly used to recognize text in scanned documents. You can use the OCR API to recognize text in scanned documents, images, and similar files. PDF files can be created from scanned images and pictures of text with little loss of quality relative to the source image. This PDF4me feature applies OCR to such files.

Code sample

Try the API in the language you prefer: C#, Java, JavaScript, PHP, Python, or Ruby. The sample below is in C#.

// setup recognizeDocument object
var recognizeDocument = new RecognizeDocument()
{
    // document
    Document = new Document()
    {
        DocData = File.ReadAllBytes("myPdf.pdf"),
        Name = "myPdf.pdf",
    },
    // action
    OcrAction = new OcrAction()
    {
        OutputType = OcrActionOutputType.PdfSearchable
    },
};

// conversion (await the task; this must run inside an async method)
var res = await Pdf4me.Instance.OcrClient.RecognizeDocumentAsync(recognizeDocument);

// extract the json and write it to disk (File requires using System.IO)
File.WriteAllText("ocrResult.json", res.StructuredDataJson);
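The JSON string taken from StructuredDataJson can be checked before it is written to disk. The following is a minimal sketch, not part of the PDF4me client: the class name OcrResultWriter and its methods are hypothetical helpers, and it relies only on System.Text.Json from the .NET base library. It makes no assumption about the fields inside the JSON; it only verifies that the string parses and re-serializes it with indentation.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Hypothetical helper, not part of the PDF4me client library.
public static class OcrResultWriter
{
    // Parses the JSON string and returns it re-serialized with indentation.
    // Throws JsonException if the input is not valid JSON.
    public static string FormatJson(string rawJson)
    {
        using JsonDocument doc = JsonDocument.Parse(rawJson);
        var options = new JsonSerializerOptions { WriteIndented = true };
        return JsonSerializer.Serialize(doc.RootElement, options);
    }

    // Validates and formats the JSON, then writes it to the given path.
    public static void Save(string rawJson, string path)
    {
        File.WriteAllText(path, FormatJson(rawJson));
    }
}
```

With this helper, the final line of the sample could instead call OcrResultWriter.Save(res.StructuredDataJson, "ocrResult.json"), so that a malformed response fails with an exception rather than producing an unreadable file.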

Important Links

Swagger: PDF OCR