
DynamicPDF Blog

Parallel Processing Using DynamicPDF HTML Converter

James A. Brannan
Developer Evangelist

DynamicPDF HTML Converter for .NET often takes between 1 and 5 seconds to convert a single HTML document to PDF, depending on factors such as network download time, page complexity, and overall processing requirements. This is expected because the HTML Converter uses the Chromium rendering engine to fully process HTML documents before generating the PDF output.
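To find out where your own documents fall in that range, you can time individual conversions. Below is a minimal sketch that uses `System.Diagnostics.Stopwatch`. `ConversionTimer` and `TimeAsync` are hypothetical names, not part of DynamicPDF. The conversion is passed in as a delegate, so any awaitable call can be measured.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper (not part of DynamicPDF): times any asynchronous operation,
// such as a single HTML-to-PDF conversion.
public static class ConversionTimer
{
    public static async Task<TimeSpan> TimeAsync(Func<Task> conversion)
    {
        var stopwatch = Stopwatch.StartNew();
        await conversion();           // wait until the conversion has completed
        stopwatch.Stop();
        return stopwatch.Elapsed;     // wall-clock time, including download and rendering
    }
}
```

For example, `TimeSpan elapsed = await ConversionTimer.TimeAsync(() => Converter.ConvertAsync(new Uri(path), outputFile));`. Here `path` and `outputFile` are your own input and output paths, and `ConvertAsync` is assumed to return a task, as the `await` in the example suggests.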

When processing large numbers of HTML documents, total conversion time can quickly grow if documents are converted sequentially. For example, converting thousands of HTML files one at a time could easily require hours of processing time. In these scenarios, using parallel processing and batch execution can significantly improve overall throughput and reduce total conversion time.
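The numbers here are purely illustrative. At an assumed 3,000 documents averaging 3 seconds each, a sequential run takes 9,000 seconds, or 2.5 hours. The sketch below computes that estimate, plus an idealized lower bound for running several conversions at once. In practice, speedups are usually smaller because concurrent conversions compete for CPU, memory, and network bandwidth. `ThroughputEstimate` and its methods are hypothetical names.

```csharp
using System;

// Hypothetical back-of-the-envelope calculator for conversion time.
public static class ThroughputEstimate
{
    // Total time when documents are converted one after another.
    public static TimeSpan EstimateSequential(int documentCount, TimeSpan averagePerDocument) =>
        TimeSpan.FromTicks(averagePerDocument.Ticks * documentCount);

    // Idealized lower bound when 'concurrency' conversions always run side by side.
    // Assumes perfect scaling, which real workloads rarely achieve.
    public static TimeSpan EstimateIdealParallel(int documentCount, TimeSpan averagePerDocument, int concurrency)
    {
        if (concurrency < 1)
            throw new ArgumentOutOfRangeException(nameof(concurrency));

        int rounds = (documentCount + concurrency - 1) / concurrency; // ceiling division
        return TimeSpan.FromTicks(averagePerDocument.Ticks * rounds);
    }
}
```

With those assumed numbers, `EstimateSequential(3000, TimeSpan.FromSeconds(3))` returns 2 hours 30 minutes. `EstimateIdealParallel(3000, TimeSpan.FromSeconds(3), 8)` returns 18 minutes 45 seconds, which is a best case that measured runs are unlikely to reach.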

.NET applications often need to process large workloads efficiently, for example when converting files, rendering output, or processing data, and converting large groups of HTML documents to PDF is no exception. One of the most effective ways to improve throughput is parallel processing. Instead of converting documents one at a time in a strictly sequential workflow, .NET lets multiple operations run concurrently using the Task Parallel Library (TPL) and asynchronous programming features such as async and await.

In the following example, each HTML-to-PDF conversion is started as a separate task by calling Task.Run together with Converter.ConvertAsync. This allows several conversions to run at the same time rather than waiting for one conversion to finish before starting the next. Because HTML conversion can involve browser rendering, resource loading, JavaScript execution, and PDF generation, this example illustrates how running conversions in parallel can significantly reduce the total processing time when handling many documents.
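The sketch below shows the difference between the two approaches without using the converter itself. To keep the pattern self-contained and testable, the conversion is passed in as a `Func<string, Task>` delegate. In the example, that delegate would be a lambda that calls `Converter.ConvertAsync` with the document's `Uri` and an output file name. `ConcurrentConversion` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helpers contrasting sequential and concurrent conversion.
public static class ConcurrentConversion
{
    // Sequential: each conversion must finish before the next one starts.
    public static async Task ConvertSequentiallyAsync(IEnumerable<string> paths, Func<string, Task> convert)
    {
        foreach (string path in paths)
            await convert(path);
    }

    // Concurrent: start every conversion on its own task, then wait for all of them.
    // Task.Run mirrors the example's approach of starting each conversion separately.
    public static Task ConvertConcurrentlyAsync(IEnumerable<string> paths, Func<string, Task> convert)
    {
        List<Task> tasks = paths.Select(path => Task.Run(() => convert(path))).ToList();
        return Task.WhenAll(tasks);
    }
}
```

Both helpers return only after every conversion has completed. The concurrent version lets the downloading and rendering of different documents overlap instead of queuing behind one another.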

The code also combines parallel processing with batching. Rather than starting every conversion at once, it divides the workload into smaller groups called batches. The conversions within a batch run in parallel, and the application waits for all tasks in the current batch to finish before the next batch begins.
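The same batching idea can be written more compactly. The sketch below is not the article's code. It assumes .NET 6 or later for `Enumerable.Chunk`, takes the conversion as a `Func<string, Task>` delegate, and uses hypothetical names (`BatchedConversion`, `ConvertInBatchesAsync`). Unlike the example, it calls the delegate directly rather than through `Task.Run`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: runs conversions in fixed-size batches.
public static class BatchedConversion
{
    // Every conversion in a batch runs concurrently; the next batch starts only
    // after the whole current batch has finished. Returns the number of batches.
    public static async Task<int> ConvertInBatchesAsync(
        IReadOnlyList<string> paths, int batchSize, Func<string, Task> convert)
    {
        if (batchSize < 1)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        int batchCount = 0;
        foreach (string[] batch in paths.Chunk(batchSize)) // Enumerable.Chunk requires .NET 6+
        {
            await Task.WhenAll(batch.Select(convert));     // wait for the whole batch
            batchCount++;
        }
        return batchCount;
    }
}
```

If the conversion call does substantial synchronous work before its first `await`, the calls in a batch would effectively start one after another. Wrapping each call in `Task.Run`, as the example does, avoids that.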

Note: Batching helps control resource usage. Even though .NET can manage many concurrent tasks, starting too many conversions at once can put unnecessary pressure on the CPU, memory, temporary file storage, and the Chromium rendering processes that the HTML Converter uses internally. Limiting the number of active conversions at any given time makes the processing workflow more balanced and predictable.
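Fixed batches are not the only way to limit the number of active conversions. A common alternative is to throttle with `SemaphoreSlim`. This is a different technique from the example's batching, shown here only as a sketch. At most `maxConcurrent` conversions run at once, and a new one starts as soon as any slot frees up, so a single slow document does not hold back a whole batch. `ThrottledConversion` and `ConvertThrottledAsync` are hypothetical names, and the conversion is again passed in as a delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: caps concurrency with a semaphore instead of fixed batches.
public static class ThrottledConversion
{
    public static async Task ConvertThrottledAsync(
        IEnumerable<string> paths, int maxConcurrent, Func<string, Task> convert)
    {
        using var gate = new SemaphoreSlim(maxConcurrent, maxConcurrent);

        IEnumerable<Task> tasks = paths.Select(async path =>
        {
            await gate.WaitAsync();   // wait for a free slot
            try
            {
                await convert(path);
            }
            finally
            {
                gate.Release();       // free the slot even if the conversion throws
            }
        });

        // WhenAll enumerates the query, starting every task before waiting.
        await Task.WhenAll(tasks);
    }
}
```

The trade-off is that a sliding window has no natural pause between groups of documents, so there is no obvious point at which to call `Converter.ReleaseResources()`. The batch boundaries in the example provide exactly that point.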

Example Code

The example first gets the paths to the HTML documents to process and adds them to a collection. It then divides the collection into batches and processes one batch at a time. For each HTML document in the current batch, it starts an asynchronous conversion on a separate task, so the conversions in that batch run in parallel. Once every conversion in the batch has finished, the example moves on to the next batch and repeats the process until all documents have been converted.

As each batch completes, the example calls Converter.ReleaseResources() to ensure converter resources are properly released before processing the next batch. This helps manage resource usage more efficiently when processing large numbers of HTML documents.

Warning: Calling Converter.ReleaseResources() too often can significantly increase overall processing time, because the HTML Converter must then reinitialize its internal Chromium-based processing resources before it can continue with more conversions. For better performance, release resources after a batch of conversions has completed rather than after every individual document.
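The sketch below shows release-per-batch placement on its own. `BatchCleanup` and `ConvertWithReleasePerBatchAsync` are hypothetical names, not DynamicPDF code, and the `releaseResources` delegate stands in for `Converter.ReleaseResources()`. The `finally` block is an addition to the example's approach. It ensures cleanup still runs if a conversion throws an exception that is not handled inside the delegate. For 31 documents in batches of 10, the release runs 4 times rather than 31.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: converts in batches and releases resources once per batch.
public static class BatchCleanup
{
    public static async Task ConvertWithReleasePerBatchAsync(
        IReadOnlyList<string> paths, int batchSize,
        Func<string, Task> convert, Action releaseResources)
    {
        if (batchSize < 1)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        for (int start = 0; start < paths.Count; start += batchSize)
        {
            int end = Math.Min(start + batchSize, paths.Count);
            var batch = new List<Task>();

            for (int i = start; i < end; i++)
            {
                string path = paths[i];               // local copy for the lambda
                batch.Add(Task.Run(() => convert(path)));
            }

            try
            {
                await Task.WhenAll(batch);            // wait for the whole batch
            }
            finally
            {
                releaseResources();                   // e.g. () => Converter.ReleaseResources()
            }
        }
    }
}
```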

using ceTe.DynamicPDF.HtmlConverter;
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

namespace html_converter_dotnet_core_cs
{
    internal class ConverterParallelExample
    {
        public static async Task Run()
        {
            // Get the output folder and input HTML resource folder.
            // Util.GetPath is a helper from the example project (not shown here); it is
            // assumed to return an absolute path, which new Uri(...) below requires.
            string outputPath = Util.GetPath("Output");
            string inputFolder = Util.GetPath("Resources/html");

            // Make sure the output folder exists (no effect if it already does).
            Directory.CreateDirectory(outputPath);

            // Get all files from the input folder.
            string[] files = Directory.GetFiles(inputFolder);

            // Create the list of documents to convert.
            List<string> testDocuments = new List<string>();

            // Add each file multiple times to simulate a larger workload.
            foreach (string file in files)
            {
                testDocuments.Add(file);
                testDocuments.Add(file);
                testDocuments.Add(file);
            }

            // Add one extra document to demonstrate remainder batching.
            testDocuments.Add(Util.GetPath("Resources/html/10.html"));
            Console.WriteLine("====================================");
            Console.WriteLine("Total documents: " + testDocuments.Count);

            // Divide the workload into three full-sized batches.
            // Any remaining files are processed in a final smaller batch.
            int batchDivisions = 3;
            int batchSize = testDocuments.Count / batchDivisions;

            if (batchSize == 0)
                batchSize = 1;

            int batchNumber = 1;

            // Process each batch.
            for (int batchStart = 0; batchStart < testDocuments.Count; batchStart += batchSize)
            {
                int batchEnd = batchStart + batchSize;

                if (batchEnd > testDocuments.Count)
                    batchEnd = testDocuments.Count;

                int currentBatchCount = batchEnd - batchStart;

                Console.WriteLine();
                Console.WriteLine("========");
                Console.WriteLine("Starting Batch " + batchNumber);
                Console.WriteLine("Docs In Batch: " + currentBatchCount);
                Console.WriteLine("========");

                // Store the conversion tasks for the current batch.
                List<Task> tasks = new List<Task>();

                // Start one conversion task for each document in the batch.
                for (int i = batchStart; i < batchEnd; i++)
                {
                    // Capture local copies for use inside the task.
                    int documentNumber = i;
                    string documentPath = testDocuments[i];

                    tasks.Add(Task.Run(async () =>
                    {
                        try
                        {
                            // Create a URI from the document path.
                            Uri inputPath = new Uri(documentPath);

                            // Create a unique output file name.
                            string outputFile = Path.Combine(
                                outputPath,
                                "output-" + documentNumber + ".pdf");

                            Console.WriteLine("Converting " + documentNumber);

                            // Convert the HTML document to PDF.
                            await Converter.ConvertAsync(inputPath, outputFile);

                            Console.WriteLine("Finished " + documentNumber);
                        }
                        catch (Exception ex)
                        {
                            // Continue processing even if one document fails.
                            Console.WriteLine("Error converting " + documentNumber);
                            Console.WriteLine(ex.Message);
                        }
                    }));
                }

                // Wait until all conversions in the current batch are complete.
                await Task.WhenAll(tasks);

                // Release converter resources only after the whole batch has finished.
                Converter.ReleaseResources();

                Console.WriteLine();
                Console.WriteLine("========");
                Console.WriteLine("Completed Batch " + batchNumber);
                Console.WriteLine("========");

                batchNumber++;
            }

            Console.WriteLine();
            Console.WriteLine("All batches completed.");
        }
    }
}

Source: ConvertParallelExample.cs
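You can check the example's batch arithmetic in isolation. The sketch below repeats the same calculation (`batchSize = count / batchDivisions`, with a minimum of 1) and returns the resulting batch sizes. `BatchMath` and `GetBatchSizes` are hypothetical names. Suppose the input folder held 10 HTML files (an assumed number). The example would then queue 3 copies of each plus 1 extra, or 31 documents. That gives a batch size of 10 and batches of 10, 10, 10, and 1. "Three batches" therefore holds only when the count divides evenly; otherwise the remainder forms one more, smaller batch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper mirroring the example's batch-size calculation.
public static class BatchMath
{
    // Divide the count by batchDivisions, use at least 1, and let the
    // final batch hold whatever is left over.
    public static List<int> GetBatchSizes(int documentCount, int batchDivisions)
    {
        int batchSize = documentCount / batchDivisions;
        if (batchSize == 0)
            batchSize = 1;

        var sizes = new List<int>();
        for (int start = 0; start < documentCount; start += batchSize)
            sizes.Add(Math.Min(batchSize, documentCount - start));
        return sizes;
    }
}
```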

Summary

By combining asynchronous programming, parallel processing, and batching, you can significantly improve efficiency when performing large-scale HTML-to-PDF conversion with DynamicPDF HTML Converter for .NET. Rather than processing documents sequentially, you can use the Task Parallel Library (TPL) and asynchronous conversion methods to convert multiple HTML documents concurrently, while batching keeps resource usage under control.