Text Extraction

The text extraction feature lets you extract text from a PDF document. You can extract text from an entire document (using the GetText method of the PdfDocument class), from a single page (using the GetText method of the PdfPage class), or from a particular area within a page (using the GetText overload of the PdfPage class that takes the position and size of the area). In each case, the GetText method returns the extracted text as a string.

Keep in mind the following points when using the GetText method to extract text from a PDF.

The following example illustrates extracting text from an existing PDF document.

// Create the PDF document object
PdfDocument pdfA = new PdfDocument(pdfFilePath);

// Call the GetText method from PDF document object to get the text from the document
string extractedText = pdfA.GetText();

' Create the PDF document object
Dim pdfA As PdfDocument = New PdfDocument(pdfFilePath)

' Call the GetText method from PDF document object to get the text from the document
Dim extractedText As String = pdfA.GetText()

The following code extracts text from a specified page within a PDF.

// Create the PDF document object
PdfDocument pdfA = new PdfDocument(pdfFilePath);

// Call the GetText method of the PDF page to get the text from that page
string extractedText = pdfA.Pages[1].GetText();

' Create the PDF document object
Dim pdfA As PdfDocument = New PdfDocument(pdfFilePath)

' Call the GetText method of the PDF page to get the text from that page
Dim extractedText As String = pdfA.Pages.Item(1).GetText()

The following code extracts text from the specified area within a page. 

// Create the PDF document object
PdfDocument pdfA = new PdfDocument(pdfFilePath);

// Call the GetText method of the PDF page to get the text from the specified area
string extractedText = pdfA.Pages[1].GetText(x, y, width, height);

' Create the PDF document object
Dim pdfA As PdfDocument = New PdfDocument(pdfFilePath)

' Call the GetText method of the PDF page to get the text from the specified area
Dim extractedText As String = pdfA.Pages.Item(1).GetText(X, Y, Width, Height)
