PdfPage.GetText

Overloads

GetText() - Gets the text in the page.
GetText(Single, Single, Single, Single) - Gets the text in the specified rectangle of the page.
GetText(Single, Single, Single, Single, TextExtractionOrder) - Gets the text in the specified rectangle of the page.
GetText(TextExtractionOrder) - Gets the text in the page.

GetText()

Gets the text in the page.

public string GetText()
Function GetText() As String

Returns

String

A string containing the text in the page.

Licensing Info

This method is a full DynamicPDF Core Suite feature. One of the following is required for non-evaluation usage:

Examples

The following example extracts all of the text from the first page of a PDF document.

Imports System
Imports ceTe.DynamicPDF
Imports ceTe.DynamicPDF.Merger

Module MyModule

    Sub Main()

        ' Open the source PDF document
        Dim pdfA As PdfDocument = New PdfDocument("C:\Invoice.pdf")

        ' Get the first page (zero-based index) and extract its text
        Dim extractedText As String = pdfA.Pages.Item(0).GetText()

    End Sub
End Module

using System;
using ceTe.DynamicPDF.Merger;

public class Example
{
    public static void GetText(string inputPath)
    {
        // Open the source PDF document
        PdfDocument pdfA = new PdfDocument(inputPath);

        // Get the first page (zero-based index) and extract its text
        string extractedText = pdfA.Pages[0].GetText();
    }
}

Remarks

This method extracts the text in the same order in which the PDF operators are loaded. Text extraction skips characters that are more than 2 bytes long. On some .NET runtimes (for example, .NET Core 2.0), text extraction fails with the error "No data is available for encoding 1252. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.". To resolve this error, refer to the user manual page Encoding Considerations.
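The error message above points at Encoding.RegisterProvider, so one plausible fix is to register the .NET code pages provider before extracting text. The sketch below assumes that this is sufficient for the scenario described; the Encoding Considerations page remains the authoritative guidance. The class and method names are hypothetical, and on older runtimes such as .NET Core 2.0 CodePagesEncodingProvider may require a reference to the System.Text.Encoding.CodePages package.

```csharp
using System.Text;
using ceTe.DynamicPDF.Merger;

// Hypothetical helper class, named for illustration only.
public static class EncodingSetupExample
{
    public static string ExtractFirstPageText(string inputPath)
    {
        // Make code page encodings such as 1252 available on runtimes
        // that do not ship them by default. Registering once at
        // application startup is enough.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Open the document and extract the text of the first page.
        PdfDocument pdf = new PdfDocument(inputPath);
        return pdf.Pages[0].GetText();
    }
}
```

Registering the provider more than once is harmless, but placing the call in application startup code keeps it out of the extraction path.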

GetText(Single, Single, Single, Single)

Gets the text in the specified rectangle of the page.

public string GetText(float x, float y, float width, float height)
Function GetText(x As Single, y As Single, width As Single, height As Single) As String

Parameters

x
Single

X coordinate of the rectangle.

y
Single

Y coordinate of the rectangle.

width
Single

Width of the rectangle.

height
Single

Height of the rectangle.

Returns

String

A string containing the text in the specified rectangle of the page.

Licensing Info

This method is a full DynamicPDF Core Suite feature. One of the following is required for non-evaluation usage:

Remarks

This method extracts the text in the same order in which the PDF operators are loaded. Text extraction skips characters that are more than 2 bytes long.
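As a minimal sketch of this overload, the following extracts the text that falls inside one rectangle on the first page. The rectangle values, the class name, and the method name are hypothetical. This page does not state the coordinate origin or units, so check them against your document before relying on specific numbers.

```csharp
using ceTe.DynamicPDF.Merger;

// Hypothetical helper class, named for illustration only.
public static class RegionTextExample
{
    public static string ExtractRegion(string inputPath)
    {
        // Open the document and take its first page.
        PdfDocument pdf = new PdfDocument(inputPath);
        PdfPage page = pdf.Pages[0];

        // Illustrative rectangle only; these values are assumptions.
        float x = 50f;
        float y = 50f;
        float width = 300f;
        float height = 100f;

        // Returns only the text found inside the rectangle.
        return page.GetText(x, y, width, height);
    }
}
```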

GetText(Single, Single, Single, Single, TextExtractionOrder)

Gets the text in the specified rectangle of the page.

public string GetText(float x, float y, float width, float height, TextExtractionOrder textExtractionOrder)
Function GetText(x As Single, y As Single, width As Single, height As Single, textExtractionOrder As TextExtractionOrder) As String

Parameters

x
Single

X coordinate of the rectangle.

y
Single

Y coordinate of the rectangle.

width
Single

Width of the rectangle.

height
Single

Height of the rectangle.

textExtractionOrder
TextExtractionOrder

Order in which the text is extracted.

Returns

String

A string containing the text in the specified rectangle of the page.

Licensing Info

This method is a full DynamicPDF Core Suite feature. One of the following is required for non-evaluation usage:

Remarks

This method can extract the text in the same order in which the PDF operators are loaded. Text extraction skips characters that are more than 2 bytes long.

GetText(TextExtractionOrder)

Gets the text in the page.

public string GetText(TextExtractionOrder textExtractionOrder)
Function GetText(textExtractionOrder As TextExtractionOrder) As String

Parameters

textExtractionOrder
TextExtractionOrder

Order in which the text is extracted.

Returns

String

A string containing the text in the page.

Licensing Info

This method is a full DynamicPDF Core Suite feature. One of the following is required for non-evaluation usage:

Examples

The following example extracts all of the text from the first page of a PDF document. It calls the parameterless overload; pass a TextExtractionOrder value to GetText to control the extraction order.

Imports System
Imports ceTe.DynamicPDF
Imports ceTe.DynamicPDF.Merger

Module MyModule

    Sub Main()

        ' Open the source PDF document
        Dim pdfA As PdfDocument = New PdfDocument("C:\Invoice.pdf")

        ' Get the first page (zero-based index) and extract its text
        Dim extractedText As String = pdfA.Pages.Item(0).GetText()

    End Sub
End Module

using System;
using ceTe.DynamicPDF.Merger;

public class Example
{
    public static void GetText(string inputPath)
    {
        // Open the source PDF document
        PdfDocument pdfA = new PdfDocument(inputPath);

        // Get the first page (zero-based index) and extract its text
        string extractedText = pdfA.Pages[0].GetText();
    }
}

Remarks

This method can extract the text in the same order in which the PDF operators are loaded. Text extraction skips characters that are more than 2 bytes long. On some .NET runtimes (for example, .NET Core 2.0), text extraction fails with the error "No data is available for encoding 1252. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.". To resolve this error, refer to the user manual page Encoding Considerations.
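To see how the textExtractionOrder argument affects the result, a rough sketch is to run the extraction once per available order and compare the output. This assumes that TextExtractionOrder is an enum in the ceTe.DynamicPDF.Merger namespace. It enumerates the values at run time rather than naming specific members, because this page does not list them. The class and method names are hypothetical.

```csharp
using System;
using ceTe.DynamicPDF.Merger;

// Hypothetical helper class, named for illustration only.
public static class ExtractionOrderExample
{
    public static void PrintTextForEachOrder(string inputPath)
    {
        // Open the document and take its first page.
        PdfDocument pdf = new PdfDocument(inputPath);
        PdfPage page = pdf.Pages[0];

        // Assumes TextExtractionOrder is an enum; iterate over every
        // value it defines instead of hard-coding member names.
        foreach (TextExtractionOrder order in Enum.GetValues(typeof(TextExtractionOrder)))
        {
            string text = page.GetText(order);

            Console.WriteLine($"--- {order} ---");
            Console.WriteLine(text);
        }
    }
}
```

Comparing the printed sections for a multi-column page is a quick way to decide which order suits a given document.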

See Also

PdfPage
ceTe.DynamicPDF.Merger
