PdfDocument.GetText
Overloads
GetText() | Gets the text in all the pages.
GetText(TextExtractionOrder) | Gets the text in all the pages, in the specified extraction order.
GetText()
Gets the text in all the pages.
public string GetText()
Function GetText() As String
Returns
A string containing the text in all the pages.
Licensing Info
This method is a full DynamicPDF Core Suite feature. One of the following is required for non-evaluation usage:
- An active DynamicPDF Ultimate Subscription
- An active DynamicPDF Professional or Professional Plus Subscription with DynamicPDF Core Suite selected.
- A DynamicPDF Core Suite for .NET v12.X Developer License.
Examples
The following example extracts all of the text in the given PDF document.
Imports System
Imports ceTe.DynamicPDF
Imports ceTe.DynamicPDF.Merger
Module MyModule
    Sub Main()
        ' Create a PdfDocument object from the source file
        Dim pdfA As PdfDocument = New PdfDocument("C:\TimeMachine.pdf")
        ' Call GetText on the PdfDocument object to extract the document's text
        Dim extractedText As String = pdfA.GetText()
    End Sub
End Module
using System;
using ceTe.DynamicPDF.Merger;
public class Example
{
    public static void GetText(string inputPath)
    {
        // Create a PdfDocument object from the source file
        PdfDocument pdfA = new PdfDocument(inputPath);
        // Call GetText on the PdfDocument object to extract the document's text
        string extractedText = pdfA.GetText();
    }
}
Remarks
This method can be used to extract the text in the order in which the PDF operators are loaded. Text extraction skips characters that are more than 2 bytes long. With some .NET runtimes (for example, .NET Core 2.0), text extraction fails with the error "No data is available for encoding 1252. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.". To resolve this error, refer to the user manual page Encoding Considerations.
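The encoding error quoted above points to Encoding.RegisterProvider. The following is a minimal sketch of the usual .NET remedy, not necessarily the exact steps on the Encoding Considerations page: it registers CodePagesEncodingProvider once at startup so that encoding 1252 can be resolved. The EncodingSetup class and EnsureCodePage1252 method are hypothetical names used for illustration, and on older .NET Core runtimes the System.Text.Encoding.CodePages package is assumed to be referenced.

```csharp
using System.Text;

public static class EncodingSetup
{
    // Hypothetical helper: call once at application startup, before GetText.
    public static Encoding EnsureCodePage1252()
    {
        // Makes code-page encodings such as Windows-1252 available on runtimes
        // that do not provide them by default (for example, .NET Core).
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Throws if encoding 1252 is still unavailable after registration.
        return Encoding.GetEncoding(1252);
    }
}
```

Calling EncodingSetup.EnsureCodePage1252() before either GetText overload should avoid the error, assuming the missing code-page provider is the cause.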
GetText(TextExtractionOrder)
Gets the text in all the pages, in the specified extraction order.
public string GetText(TextExtractionOrder textExtractionOrder)
Function GetText(textExtractionOrder As TextExtractionOrder) As String
Parameters
- textExtractionOrder (TextExtractionOrder): The order in which the text is extracted.
Returns
A string containing the text in all the pages.
Licensing Info
This method is a full DynamicPDF Core Suite feature. One of the following is required for non-evaluation usage:
- An active DynamicPDF Ultimate Subscription
- An active DynamicPDF Professional or Professional Plus Subscription with DynamicPDF Core Suite selected.
- A DynamicPDF Core Suite for .NET v12.X Developer License.
Examples
The following example extracts all of the text in the given PDF document, using the Visible extraction order.
Imports System
Imports ceTe.DynamicPDF
Imports ceTe.DynamicPDF.Merger
Module MyModule
    Sub Main()
        ' Create a PdfDocument object from the source file
        Dim pdfA As PdfDocument = New PdfDocument("C:\TimeMachine.pdf")
        ' Call GetText with an extraction order to extract the document's text
        Dim extractedText As String = pdfA.GetText(TextExtractionOrder.Visible)
    End Sub
End Module
using System;
using ceTe.DynamicPDF.Merger;
public class Example
{
    public static void GetText(string inputPath)
    {
        // Create a PdfDocument object from the source file
        PdfDocument pdfA = new PdfDocument(inputPath);
        // Call GetText with an extraction order to extract the document's text
        string extractedText = pdfA.GetText(TextExtractionOrder.Visible);
    }
}
Remarks
This method can be used to extract the text in the order in which the PDF operators are loaded. Text extraction skips characters that are more than 2 bytes long. With some .NET runtimes (for example, .NET Core 2.0), text extraction fails with the error "No data is available for encoding 1252. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.". To resolve this error, refer to the user manual page Encoding Considerations.