PdfDocument.GetText

Overloads

GetText() - Gets the text in all the pages.
GetText(TextExtractionOrder) - Gets the text in all the pages in the specified extraction order.

GetText()

Gets the text in all the pages.

public string GetText()
Function GetText() As String

Returns

String

A string containing the text in all the pages.

Licensing Info

This method is a full DynamicPDF Core Suite feature. One of the following is required for non-evaluation usage:

Examples

The following example extracts the entire text of the given PDF document.
Imports System
Imports ceTe.DynamicPDF
Imports ceTe.DynamicPDF.Merger
         
Module MyModule
         
    Sub Main()
         
        ' Create PDF document object
        Dim pdfA As PdfDocument = New PdfDocument("C:\TimeMachine.pdf")

        ' Call GetText method from PDF document object to get the text from the document
        Dim extractedText As String = pdfA.GetText()

    End Sub
End Module

using System;
using ceTe.DynamicPDF.Merger;

public class Example
{
    public static void GetText(string inputPath)
    {
        // Create PDF document object
        PdfDocument pdfA = new PdfDocument(inputPath);

        // Call GetText method from PDF document object to get the text from the document
        string extractedText = pdfA.GetText();
    }
}

Remarks

This method can be used to extract the text in the same order in which the PDF operators are loaded. Text extraction skips characters that are more than 2 bytes. With some .NET runtimes (for example, .NET Core 2.0), text extraction fails with the error "No data is available for encoding 1252. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method." To resolve this error, refer to the user manual page Encoding Considerations.
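The error message above points to Encoding.RegisterProvider. As a minimal sketch of what that registration might look like, the following registers the code-page encodings before any text is extracted. The class and method names (EncodingSetup, RegisterCodePages) are hypothetical, and whether this matches the steps on the Encoding Considerations page is an assumption. On .NET Core, CodePagesEncodingProvider comes from the System.Text.Encoding.CodePages package.

```csharp
using System.Text;

public static class EncodingSetup
{
    // Hypothetical helper (not part of DynamicPDF): call once at startup,
    // before calling PdfDocument.GetText.
    public static Encoding RegisterCodePages()
    {
        // Makes code-page encodings such as Windows-1252 available
        // on runtimes that do not include them by default.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // After registration this lookup no longer throws
        // "No data is available for encoding 1252".
        return Encoding.GetEncoding(1252);
    }
}
```

Calling EncodingSetup.RegisterCodePages() once at application startup should be enough; registering the same provider more than once has no further effect.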

GetText(TextExtractionOrder)

Gets the text in all the pages in the specified extraction order.

public string GetText(TextExtractionOrder textExtractionOrder)
Function GetText(textExtractionOrder As TextExtractionOrder) As String

Parameters

textExtractionOrder
TextExtractionOrder

The order in which the text is extracted.

Returns

String

A string containing the text in all the pages.

Licensing Info

This method is a full DynamicPDF Core Suite feature. One of the following is required for non-evaluation usage:

Examples

The following example extracts the entire text of the given PDF document.
Imports System
Imports ceTe.DynamicPDF
Imports ceTe.DynamicPDF.Merger
         
Module MyModule
         
    Sub Main()
         
        ' Create PDF document object
        Dim pdfA As PdfDocument = New PdfDocument("C:\TimeMachine.pdf")

        ' Call GetText method from PDF document object to get the text from the document
        Dim extractedText As String = pdfA.GetText(TextExtractionOrder.Visible)

    End Sub
End Module

using System;
using ceTe.DynamicPDF.Merger;

public class Example
{
    public static void GetText(string inputPath)
    {
        // Create PDF document object
        PdfDocument pdfA = new PdfDocument(inputPath);

        // Call GetText method from PDF document object to get the text from the document
        string extractedText = pdfA.GetText(TextExtractionOrder.Visible);
    }
}

Remarks

This method can be used to extract the text in the same order in which the PDF operators are loaded. Text extraction skips characters that are more than 2 bytes. With some .NET runtimes (for example, .NET Core 2.0), text extraction fails with the error "No data is available for encoding 1252. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method." To resolve this error, refer to the user manual page Encoding Considerations.
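As a small usage sketch for this overload, the helper below takes the extraction order as a parameter and writes the returned string to a file. The class name, method name, and both path parameters are hypothetical; only PdfDocument, GetText(TextExtractionOrder), and the standard File.WriteAllText call are relied on.

```csharp
using System.IO;
using ceTe.DynamicPDF.Merger;

public static class ExtractToFileExample
{
    // Hypothetical helper: extracts the text of all pages in the requested
    // order and saves it to outputPath.
    public static void SaveText(string inputPath, string outputPath, TextExtractionOrder order)
    {
        // Open the source PDF document
        PdfDocument pdf = new PdfDocument(inputPath);

        // Extract the text of every page using the caller's chosen order
        string text = pdf.GetText(order);

        // File.WriteAllText writes UTF-8 by default
        File.WriteAllText(outputPath, text);
    }
}
```

A call such as ExtractToFileExample.SaveText(inputPath, outputPath, TextExtractionOrder.Visible) would mirror the example above while keeping the order configurable.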

See Also

PdfDocument
ceTe.DynamicPDF.Merger
