Text Extraction

Extract text using the GetText method of the PdfDocument class.

Unlicensed versions of DynamicPDF Core Suite limit extracted text to 256 characters. Licensed copies of DynamicPDF Core Suite do not have this limit.

To extract text from a particular PDF page, use the GetText method of the PdfPage class. Both GetText methods return the extracted text as a string. Examples of both are provided below.

Keep the following points in mind when using either of the GetText methods described above to extract text from a PDF.

  • Text that is part of an image, a form field, or a note/comment is not extracted.
  • Text is extracted in the order in which the PDF operators occur in the existing PDF, which is not necessarily the visual reading order.
  • During evaluation mode, text extraction is limited to 256 characters.

Example Extracting from Document

The following example illustrates extracting text from an existing PDF document.

// Create the PDF document object
PdfDocument pdfA = new PdfDocument(pdfFilePath);

// Call the GetText method from PDF document object to get the text from the document
string extractedText = pdfA.GetText();

'Create the PDF document object
Dim pdfA As PdfDocument = New PdfDocument(pdfFilePath)

'Call the GetText method from PDF document object to get the text from the document
Dim extractedText As String = pdfA.GetText()    

Refer to the PdfDocument.GetText API documentation for a complete example.
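
As a slightly fuller sketch of the document-level call, the following console program extracts the text of a PDF and saves it to a text file. The ceTe.DynamicPDF.Merger namespace, the command-line argument handling, and the output file name are assumptions made for illustration and are not taken from this topic; adjust them to match your project.

```csharp
using System;
using System.IO;
using ceTe.DynamicPDF.Merger; // assumed namespace for PdfDocument

class ExtractDocumentText
{
    static void Main(string[] args)
    {
        // Hypothetical input: path to an existing PDF passed on the command line
        if (args.Length < 1)
        {
            Console.Error.WriteLine("Usage: ExtractDocumentText <pdfFilePath>");
            return;
        }
        string pdfFilePath = args[0];

        // Create the PDF document object and extract the text of the whole document
        PdfDocument pdfA = new PdfDocument(pdfFilePath);
        string extractedText = pdfA.GetText();

        // Save the text next to the source file (the output name is illustrative)
        string outputPath = Path.ChangeExtension(pdfFilePath, ".txt");
        File.WriteAllText(outputPath, extractedText);

        Console.WriteLine($"Wrote {extractedText.Length} characters to {outputPath}");
    }
}
```

In evaluation mode, the saved text is subject to the 256-character limit noted earlier in this topic.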

Example Extracting from Page

The following code illustrates extracting text from a specific page within a PDF. Indexing the Pages property of the PdfDocument instance returns the corresponding PdfPage, and calling that page's GetText method extracts the text from that page only.

// Create the PDF document object
PdfDocument pdfA = new PdfDocument(pdfFilePath);

// Call the GetText method of the PDF page to get the text from that page
string extractedText = pdfA.Pages[1].GetText();

'Create the PDF document object
Dim pdfA As PdfDocument = New PdfDocument(pdfFilePath)

'Call the GetText method of the PDF page to get the text from that page
Dim extractedText As String = pdfA.Pages.Item(1).GetText()      

Refer to the PdfPage.GetText API documentation for a complete example.

The GetText method is also overloaded to extract text from a specific area within a page.

public string GetText(float x, float y, float width, float height)

The following code illustrates extracting text from a specific area within a page.

// Create the PDF document object
PdfDocument pdfA = new PdfDocument(pdfFilePath);

// Call the GetText method of the PDF page to get the text from the specified area 
string extractedText = pdfA.Pages[1].GetText(x, y, width, height);

' Create the PDF document object
Dim pdfA As PdfDocument = New PdfDocument(pdfFilePath)

' Call the GetText method of the PDF page to get the text from the specified area 
Dim extractedText As String = pdfA.Pages.Item(1).GetText(X, Y, Width, Height)
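
The area overload can also be sketched with concrete values, as in the console program below. The rectangle values, the fallback file name, and the ceTe.DynamicPDF.Merger namespace are assumptions made for illustration. This topic does not state the units or the origin of the coordinates, so confirm both in the PdfPage.GetText API documentation before relying on specific numbers.

```csharp
using System;
using ceTe.DynamicPDF.Merger; // assumed namespace for PdfDocument

class ExtractAreaText
{
    static void Main(string[] args)
    {
        // Hypothetical input path; "input.pdf" is a placeholder
        string pdfFilePath = args.Length > 0 ? args[0] : "input.pdf";

        // Illustrative rectangle; units and origin are not specified in this topic
        float x = 50f;
        float y = 50f;
        float width = 300f;
        float height = 100f;

        // Create the PDF document object
        PdfDocument pdfA = new PdfDocument(pdfFilePath);

        // Extract only the text that falls inside the rectangle on the selected page
        string extractedText = pdfA.Pages[1].GetText(x, y, width, height);

        Console.WriteLine(extractedText);
    }
}
```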
