DAExtractPageText

Extraction, Direct access functionality, Page manipulation

Description

This function extracts text from the selected page using one of two algorithms, a standard one and a more accurate one, and returns the result in the format selected by the Options parameter.

The DASetTextExtractionWordGap, DASetTextExtractionOptions and DASetTextExtractionArea functions can be used to adjust the text extraction process.

Syntax

Delphi

function TQuickPDF0813.DAExtractPageText(FileHandle, PageRef, 
  Options: Integer): WideString;

ActiveX

Function QuickPDF0813.PDFLibrary::DAExtractPageText(FileHandle As Long,
  PageRef As Long, Options As Long) As String

DLL

wchar_t * QuickPDFDAExtractPageText(int InstanceID, int FileHandle, int PageRef,
  int Options)

Parameters

FileHandle   A handle returned by the DAOpenFile, DAOpenFileReadOnly or DAOpenFromStream functions

PageRef      A page reference returned by the DAFindPage or DANewPage functions

Options      Using the standard text extraction algorithm:
               0 = Extract text in human readable format
               1 = Deprecated
               2 = Return a CSV string including font, color, size and position of each piece of text on the page
             Using the more accurate text extraction algorithm:
               3 = Return a CSV string for each piece of text on the page with the following format:
                   Font Name, Text Color, Text Size, X1, Y1, X2, Y2, X3, Y3, X4, Y4, Text
                   The co-ordinates are the four points bounding the text, measured in points (1/72 inch) with the bottom-left corner of the page as the origin. Co-ordinate order is anti-clockwise with the bottom left corner first.
               4 = Similar to option 3, but individual words are returned, making searching for words easier
               5 = Similar to option 3 but character widths are output after each line
               6 = Similar to option 4 but character widths are output after each line
               7 = Extract text in human readable format with improved accuracy compared to option 0
               8 = Similar to option 7 but without layout formatting
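
Example

The option 3 line format listed above (Font Name, Text Color, Text Size, four corner points, Text) can be split into fields with a small parser. The following is a minimal sketch in C, not part of the library: the names TextPiece, next_field, strip, parse_text_piece and bounding_box are hypothetical, and it assumes that the font name and color contain no commas, that fields may be wrapped in double quotes, and that everything after the eleventh comma is the text. The quoting and escaping of real output may differ, so check it against actual results before relying on this.

```c
#include <stdlib.h>
#include <string.h>

/* One parsed line of option 3 output (hypothetical helper type). */
typedef struct {
    char   font[128];
    char   color[32];
    double size;
    double x[4], y[4];   /* corners, anti-clockwise, bottom-left first */
    char   text[512];
} TextPiece;

/* Copy the next comma-terminated field into dst and advance the cursor
   past the comma. Returns 0 when no comma is left. */
static int next_field(const char **cursor, char *dst, size_t cap)
{
    const char *comma = strchr(*cursor, ',');
    size_t len;
    if (comma == NULL) return 0;
    len = (size_t)(comma - *cursor);
    if (len >= cap) len = cap - 1;      /* truncate over-long fields */
    memcpy(dst, *cursor, len);
    dst[len] = '\0';
    *cursor = comma + 1;
    return 1;
}

/* Remove surrounding spaces and one pair of enclosing double quotes.
   ASSUMPTION: real output may quote or escape fields differently. */
static void strip(char *s)
{
    size_t start = 0, len = strlen(s);
    while (s[start] == ' ') start++;
    while (len > start && s[len - 1] == ' ') len--;
    if (len - start >= 2 && s[start] == '"' && s[len - 1] == '"') {
        start++;
        len--;
    }
    memmove(s, s + start, len - start);
    s[len - start] = '\0';
}

/* Parse "Font, Color, Size, X1, Y1, ..., X4, Y4, Text".
   Returns 1 on success, 0 if fewer than eleven commas are present. */
int parse_text_piece(const char *line, TextPiece *out)
{
    const char *p = line;
    char num[64];
    int i;

    if (!next_field(&p, out->font, sizeof out->font)) return 0;
    if (!next_field(&p, out->color, sizeof out->color)) return 0;
    if (!next_field(&p, num, sizeof num)) return 0;
    out->size = strtod(num, NULL);

    for (i = 0; i < 4; i++) {
        if (!next_field(&p, num, sizeof num)) return 0;
        out->x[i] = strtod(num, NULL);
        if (!next_field(&p, num, sizeof num)) return 0;
        out->y[i] = strtod(num, NULL);
    }

    /* The remainder is the text; it may itself contain commas. */
    strncpy(out->text, p, sizeof out->text - 1);
    out->text[sizeof out->text - 1] = '\0';

    strip(out->font);
    strip(out->color);
    strip(out->text);
    return 1;
}

/* Axis-aligned box enclosing the four corner points, in points with the
   origin at the bottom-left of the page. */
void bounding_box(const TextPiece *t, double *left, double *bottom,
                  double *right, double *top)
{
    int i;
    *left = *right = t->x[0];
    *bottom = *top = t->y[0];
    for (i = 1; i < 4; i++) {
        if (t->x[i] < *left)   *left   = t->x[i];
        if (t->x[i] > *right)  *right  = t->x[i];
        if (t->y[i] < *bottom) *bottom = t->y[i];
        if (t->y[i] > *top)    *top    = t->y[i];
    }
}
```

Each line of the string returned for option 3 would be passed to parse_text_piece in turn. Because the four points can describe rotated text, bounding_box takes the minimum and maximum of all corners rather than assuming that the first and third points are opposite corners of an upright rectangle.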

Copyright © 2011 Debenu. All rights reserved.