Compare Two Documents for Differences


DynamicPDF CoreSuite for .NET (v11) Forum

 Dec 02 2020 10:05 AM
We want to set up unit tests that compare a ceTe document with the previously produced ceTe document, so we can tell whether anything in the document changed. My first thought was to save the byte[] returned by MergeDocument.Draw() to a file or database each time and check against it, but, as you may already know, these byte arrays do not match even when the content is the same.

Is there a way to programmatically compare two Documents that were created with exactly the same data? Comparing either before or after the PDF is created would work for us.
Posted by a ceTe Software moderator
Hi,

It's not possible to compare the PDF byte data or the PDF contents directly with Core Suite, because some values differ each time the PDF is created (the timestamp and document ID, for example). However, you can convert the PDF document to images and compare the images programmatically. You can do this with our Rasterizer product and then compare the image data to your stored image data. This is how we do our own internal testing for product releases.

Thanks,
ceTe Software Support Team
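A minimal sketch of the "compare the image data to your stored image data" step, assuming each page has already been rasterized to a byte array with the Rasterizer product. `PageHashComparer`, `HashPage` and `ChangedPages` are hypothetical helpers, not part of DynamicPDF; they use only .NET 5+ APIs (`MD5.HashData`, `Convert.ToHexString`). Instead of stopping at the first mismatch, the sketch reports every page that changed, including pages added or removed.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical helper (not a DynamicPDF API): compares rasterized page images
// against stored per-page hashes. Requires .NET 5 or later.
public static class PageHashComparer
{
    // Hex-encoded MD5 checksum of one page's image bytes.
    public static string HashPage(byte[] imageBytes)
    {
        return Convert.ToHexString(MD5.HashData(imageBytes));
    }

    // Returns the zero-based indices of pages whose hash differs from the control data.
    // If the page count changed, the pages present on only one side are reported too.
    public static List<int> ChangedPages(IReadOnlyList<byte[]> pageImages, IReadOnlyList<string> controlHashes)
    {
        var changed = new List<int>();
        int pageCount = Math.Max(pageImages.Count, controlHashes.Count);
        for (int i = 0; i < pageCount; i++)
        {
            bool bothPresent = i < pageImages.Count && i < controlHashes.Count;
            // Short-circuit: the indexers are only evaluated when both sides have page i.
            if (!bothPresent ||
                !string.Equals(HashPage(pageImages[i]), controlHashes[i], StringComparison.OrdinalIgnoreCase))
            {
                changed.Add(i);
            }
        }
        return changed;
    }
}
```

The case-insensitive comparison lets a control file written with either upper- or lower-case hex still match, and the returned indices can go straight into a test failure message.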
Thanks for the prompt response. I'm currently using something like the following (it may be helpful to others in the future):

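            using System;
            using System.Collections.Generic;
            using System.IO;
            using System.Linq;
            using System.Security.Cryptography;   // MD5.HashData and Convert.ToHexString need .NET 5 or later
            using ceTe.DynamicPDF.Rasterizer;     // InputPdf, PdfRasterizer, ImageFormat, ImageSize
            using Microsoft.VisualStudio.TestTools.UnitTesting; // assumes MSTest; NUnit's classic Assert.AreEqual uses the same (expected, actual) order

            InputPdf inputPdf = new InputPdf(@"C:\SomePDF.pdf");

            // Create the rasterizer once rather than once per page.
            PdfRasterizer rasterizer = new PdfRasterizer(inputPdf);

            // Capture new hash data here.
            List<string> newData = new List<string>();

            // Control data (one hash per page, one per line); could also be stored in a db.
            List<string> controlData = File.ReadLines(@"C:\ControlFile.txt").ToList();

            // A changed page count is itself a difference.
            Assert.AreEqual(controlData.Count, inputPdf.Pages.Count);

            for (int i = 0; i < inputPdf.Pages.Count; i++)
            {
                // Rasterize the page to PNG and take the MD5 checksum of the image bytes.
                // Hex-encode it: string.Join("", bytes) runs the decimal byte values together, which can be ambiguous.
                byte[] image = rasterizer.Pages[i].Draw(ImageFormat.Png, ImageSize.Dpi150);
                newData.Add(Convert.ToHexString(MD5.HashData(image)));

                // Check each page individually (expected value first).
                Assert.AreEqual(controlData[i], newData[i]);
            }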
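The control file itself has to be produced once from a PDF that is known to be correct. A minimal sketch, assuming a one-hash-per-line text file like ControlFile.txt; `BaselineFile`, `HashPages` and `CheckAgainstBaseline` are hypothetical helpers, not DynamicPDF APIs. The page images come from a delegate so the Rasterizer call can be plugged in without the helper depending on DynamicPDF. Hashes are written as hex strings (`Convert.ToHexString`), so whatever reads the file must encode its hashes the same way.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper (not a DynamicPDF API). Requires .NET 5 or later.
public static class BaselineFile
{
    // Hashes every page image returned by renderPage (page index -> image bytes).
    public static List<string> HashPages(int pageCount, Func<int, byte[]> renderPage)
    {
        return Enumerable.Range(0, pageCount)
            .Select(i => Convert.ToHexString(MD5.HashData(renderPage(i))))
            .ToList();
    }

    // If the control file does not exist yet, writes the current hashes as the new
    // baseline and returns true. Otherwise returns true only if the page count and
    // every page hash match the stored baseline.
    public static bool CheckAgainstBaseline(string controlPath, int pageCount, Func<int, byte[]> renderPage)
    {
        List<string> current = HashPages(pageCount, renderPage);
        if (!File.Exists(controlPath))
        {
            File.WriteAllLines(controlPath, current);
            return true;
        }
        List<string> control = File.ReadLines(controlPath).ToList();
        return control.SequenceEqual(current);
    }
}
```

With the Rasterizer this might be called as `var rasterizer = new PdfRasterizer(inputPdf);` followed by `BaselineFile.CheckAgainstBaseline(@"C:\ControlFile.txt", inputPdf.Pages.Count, i => rasterizer.Pages[i].Draw(ImageFormat.Png, ImageSize.Dpi150))`. Because a missing file is treated as a new baseline, the first run always passes; review the generated file, or the PDF it came from, before committing it.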

All times are US Eastern Standard time.