PdfPage.GetText

Namespace: ceTe.DynamicPDF.Merger

Assemblies: DynamicPDF.CoreSuite.dll

Overloads

GetText()	Gets the text in the page.
GetText(Single, Single, Single, Single)	Gets the text in the specified rectangle of the page.
GetText(Single, Single, Single, Single, TextExtractionOrder)	Gets the text in the specified rectangle of the page.
GetText(TextExtractionOrder)	Gets the text in the page.

GetText()

Gets the text in the page.

public string GetText()

Function GetText() As String

Returns

String

A string containing the text in the page.

Licensing Info

This method is a full DynamicPDF Core Suite feature. One of the following is required for non-evaluation usage:

An active DynamicPDF Ultimate Subscription
An active DynamicPDF Professional or Professional Plus Subscription with DynamicPDF Core Suite selected.
A DynamicPDF Core Suite for .NET v12.X Developer License.

Examples

The following example extracts all of the text from a single page of the given PDF document.

Imports System
Imports ceTe.DynamicPDF
Imports ceTe.DynamicPDF.Merger
         
Module MyModule
         
    Sub Main()
         
        ' Create a PDF document object
        Dim pdfA As PdfDocument = New PdfDocument("C:\Invoice.pdf")

        ' Call GetText on the first page to extract its text
        Dim extractedText As String = pdfA.Pages.Item(0).GetText()
         		
    End Sub
End Module

using System;
using ceTe.DynamicPDF.Merger;

public class Example 
{
    public static void GetText(string inputPath)
    {
        // Create PDF document object
        PdfDocument pdfA = new PdfDocument(inputPath);

        // Call GetText on the first page to extract its text
        string extractedText = pdfA.Pages[0].GetText();
    }
}

Remarks

This method can be used to extract text in the same order in which the PDF operators are loaded. Text extraction skips characters that are more than 2 bytes. On some .NET runtimes (for example, .NET Core 2.0), text extraction fails with the error "No data is available for encoding 1252. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method." To resolve this error, refer to the Encoding Considerations page of the user manual.
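The encoding 1252 error mentioned above typically means the runtime does not include the Windows-1252 code page by default. The following is a minimal sketch of the registration step that the error message itself points to, assuming the System.Text.Encoding.CodePages package is available to the project and that the call runs once before any text extraction. The class and method names are hypothetical, and the Encoding Considerations page remains the authoritative guidance.

```csharp
using System.Text;

public static class EncodingSetup
{
    // Hypothetical helper: call once at application startup,
    // before the first call to PdfPage.GetText.
    public static void RegisterCodePages()
    {
        // Makes code-page encodings such as Windows-1252 available on
        // runtimes that do not ship them by default.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }
}
```

After registration, Encoding.GetEncoding(1252) should resolve instead of throwing, which is a quick way to confirm the setup before extracting text.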

GetText(Single, Single, Single, Single)

Gets the text in the specified rectangle of the page.

public string GetText(float x, float y, float width, float height)

Function GetText(x As Single, y As Single, width As Single, height As Single) As String

Parameters

x: Single

X coordinate of the rectangle.

y: Single

Y coordinate of the rectangle.

width: Single

Width of the rectangle.

height: Single

Height of the rectangle.

Returns

String

A string containing the text in the specified rectangle of the page.

Licensing Info

This method is a full DynamicPDF Core Suite feature. One of the following is required for non-evaluation usage:

An active DynamicPDF Ultimate Subscription
An active DynamicPDF Professional or Professional Plus Subscription with DynamicPDF Core Suite selected.
A DynamicPDF Core Suite for .NET v12.X Developer License.

Remarks

This method can be used to extract text in the same order in which the PDF operators are loaded. Text extraction skips characters that are more than 2 bytes.
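This overload has no example of its own, so here is a short sketch of how it might be called. It reuses the PdfDocument constructor and Pages indexer from the GetText() example. The rectangle values are placeholders only: the coordinate units and origin are not stated on this page and should be checked against the library documentation. The class and method names are hypothetical.

```csharp
using ceTe.DynamicPDF.Merger;

public class RegionExample
{
    // Hypothetical helper that reads the text inside one rectangle of the first page.
    public static string GetRegionText(string inputPath)
    {
        // Create PDF document object
        PdfDocument pdfA = new PdfDocument(inputPath);

        // Placeholder rectangle: x, y, width, height (units and origin are assumptions)
        return pdfA.Pages[0].GetText(0f, 0f, 612f, 100f);
    }
}
```

Only text that falls within the rectangle is returned, so this form suits fixed-layout documents where a field always appears in the same area of the page.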

GetText(Single, Single, Single, Single, TextExtractionOrder)

Gets the text in the specified rectangle of the page.

public string GetText(float x, float y, float width, float height, TextExtractionOrder textExtractionOrder)

Function GetText(x As Single, y As Single, width As Single, height As Single, textExtractionOrder As TextExtractionOrder) As String

Parameters

x: Single

X coordinate of the rectangle.

y: Single

Y coordinate of the rectangle.

width: Single

Width of the rectangle.

height: Single

Height of the rectangle.

textExtractionOrder: TextExtractionOrder

Order in which text has to be extracted.

Returns

String

A string containing the text in the specified rectangle of the page.

Licensing Info

This method is a full DynamicPDF Core Suite feature. One of the following is required for non-evaluation usage:

An active DynamicPDF Ultimate Subscription
An active DynamicPDF Professional or Professional Plus Subscription with DynamicPDF Core Suite selected.
A DynamicPDF Core Suite for .NET v12.X Developer License.

Remarks

This method can be used to extract text in the same order in which the PDF operators are loaded. Text extraction skips characters that are more than 2 bytes.
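As a sketch of how the extraction order could be supplied, the helper below takes the TextExtractionOrder value as a parameter rather than naming a specific enum member, since the members are not listed on this page. It assumes TextExtractionOrder lives in the ceTe.DynamicPDF.Merger namespace alongside PdfPage. The class name, method name, and rectangle values are hypothetical.

```csharp
using ceTe.DynamicPDF.Merger;

public class OrderedRegionExample
{
    // Hypothetical helper: the caller chooses the extraction order.
    public static string GetRegionText(PdfPage page, TextExtractionOrder order)
    {
        // Placeholder rectangle: x, y, width, height
        return page.GetText(50f, 50f, 300f, 200f, order);
    }
}
```

Passing the order in from the caller keeps the rectangle logic in one place, so the same region can be extracted under different orders and the results compared.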

GetText(TextExtractionOrder)

Gets the text in the page.

public string GetText(TextExtractionOrder textExtractionOrder)

Function GetText(textExtractionOrder As TextExtractionOrder) As String

Parameters

textExtractionOrder: TextExtractionOrder

Order in which text has to be extracted.

Returns

String

A string containing the text in the page.

Licensing Info

This method is a full DynamicPDF Core Suite feature. One of the following is required for non-evaluation usage:

An active DynamicPDF Ultimate Subscription
An active DynamicPDF Professional or Professional Plus Subscription with DynamicPDF Core Suite selected.
A DynamicPDF Core Suite for .NET v12.X Developer License.

Examples

The following example extracts all of the text from a single page of the given PDF document.

Imports System
Imports ceTe.DynamicPDF
Imports ceTe.DynamicPDF.Merger
         
Module MyModule
         
    Sub Main()
         
        ' Create a PDF document object
        Dim pdfA As PdfDocument = New PdfDocument("C:\Invoice.pdf")

        ' Call GetText on the first page to extract its text
        Dim extractedText As String = pdfA.Pages.Item(0).GetText()
         		
    End Sub
End Module

using System;
using ceTe.DynamicPDF.Merger;

public class Example 
{
    public static void GetText(string inputPath)
    {
        // Create PDF document object
        PdfDocument pdfA = new PdfDocument(inputPath);

        // Call GetText on the first page to extract its text
        string extractedText = pdfA.Pages[0].GetText();
    }
}
