Image flipped vertically #448

robertovaldesperez · 2024-07-03T09:47:07Z

Hello @jafin @ststeiger @saikatguha @nils-a @HakanL @Bogdancev, I am extracting the images from a PDF. They look fine in the PDF itself, but the extracted images come back vertically flipped.

Here is an example file:
6.335.1 0034220637_tasacion.pdf

Thanks a lot.

robertovaldesperez · 2024-07-26T10:19:26Z

@Bogdancev can you help me?

HakanL · 2024-07-26T14:00:28Z

It seems that the attached PDF is 0 bytes. Also please include a small program that demonstrates the issue.

robertovaldesperez · 2024-07-26T20:47:27Z

Hi @HakanL, here is the example file again:
6.335.1 0034220637_tasacion.pdf

My code:

```csharp
using ImageMagick; // Magick.NET, provides MagickImage
using PdfSharpCore.Pdf;
using PdfSharpCore.Pdf.Advanced;
using PdfSharpCore.Pdf.IO;
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace Guru.Utils.Helper
{
    public static class PdfSharpCoreExtensions
    {
        // Extracts the distinct images embedded in a PDF given as a byte array.
        public static ISet<byte[]> ExtractImages(this byte[] contents)
        {
            using var pdfStream = new MemoryStream(contents);
            try
            {
                var document = PdfReader.Open(pdfStream, PdfDocumentOpenMode.ReadOnly);
                // HashSet<byte[]> compares by reference, so duplicates are
                // detected below with a structural (element-wise) comparison.
                var uniqueImages = new HashSet<byte[]>();
                var images = new HashSet<byte[]>();
                foreach (var page in document.Pages)
                {
                    foreach (var xObject in GetXObjectImages(page))
                    {
                        try
                        {
                            // Raw stream bytes exactly as stored in the PDF
                            // (still encoded by the stream's /Filter).
                            var value = xObject.Stream.Value;
                            if (!uniqueImages.Any(w => StructuralComparisons.StructuralEqualityComparer.Equals(w, value)))
                            {
                                uniqueImages.Add(value);
                                if (xObject.Elements.GetString("/Filter") == "/FlateDecode")
                                {
                                    // TODO
                                }
                                else
                                {
                                    // Other filters (for example /DCTDecode, i.e. JPEG)
                                    // are handed to Magick.NET as-is.
                                    using var image = new MagickImage(value);
                                    images.Add(image.ToByteArray());
                                }
                            }
                        }
                        catch (Exception)
                        {
                            // Do nothing
                        }
                    }
                }
                return images;
            }
            catch (Exception)
            {
                // Do nothing
            }
            return new HashSet<byte[]>();
        }

        // Yields every image XObject reachable from the dictionary's /Resources.
        private static IEnumerable<PdfDictionary> GetXObjectImages(PdfDictionary pdfDictionary)
        {
            var resources = pdfDictionary.Elements.GetDictionary("/Resources");
            if (resources != null)
            {
                var xObjects = resources.Elements.GetDictionary("/XObject");
                if (xObjects != null)
                {
                    foreach (var item in xObjects.Elements.Values)
                    {
                        if (item is PdfReference reference)
                        {
                            if (reference.Value is PdfDictionary xObject)
                            {
                                if (xObject.Elements.GetString(PdfImage.Keys.Subtype) == "/Image")
                                {
                                    yield return xObject;
                                }
                                else
                                {
                                    // Non-image XObjects (such as forms) can carry their
                                    // own /Resources, so recurse into them.
                                    foreach (var xObject1 in GetXObjectImages(xObject))
                                    {
                                        if (xObject1.Elements.GetString(PdfImage.Keys.Subtype) == "/Image")
                                        {
                                            yield return xObject1;
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```

HakanL · 2024-07-26T21:08:21Z

It looks like you just extract the byte array and then use a library called MagickImage for the image processing. My guess is that's where the issue is: it may not correctly handle the way images are stored in a PDF.

robertovaldesperez · 2024-07-26T21:39:23Z

Hi @HakanL, I have also tried the file UVE 01.pdf, and all of its images are extracted correctly.

I don't know if the cause is the way the PDF was saved. Can you debug the PDF (6.335.1 0034220637_tasacion.pdf) internally to see if anything indicates that the image is flipped vertically?

Thanks a lot.

HakanL · 2024-07-26T21:42:37Z

It may be a different format inside the PDF. Unfortunately I don't have a setup to debug this, and I'm not a developer on this project, but the source code is available, so perhaps you can try to analyze it. To my recollection this project doesn't analyze, read, or parse the images, so I believe this is not an issue with the PdfSharpCore project.
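
For anyone who wants to analyze the file: one thing that might be worth checking (this is an assumption, nobody in this thread has confirmed it for the attached PDF) is the matrix the page uses when it paints the image. In PDF, an image is drawn into a unit square that is scaled by the preceding `cm` operator, so a negative vertical scale mirrors the stored image on screen. A file whose image data is stored upside down but drawn with such a matrix looks fine in a viewer and comes out flipped when the raw stream is extracted. Below is a minimal sketch in plain C# that scans an already decoded content stream, passed in as text, for that pattern. `CtmCheck` and `FindFlippedDraws` are hypothetical names, not part of PdfSharpCore, and the scanner is deliberately simplified: it only looks at the most recent `cm`, does not track the `q`/`Q` state stack or concatenated matrices, and ignores rotation or skew.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CtmCheck
{
    // Returns the names of XObjects painted with "a 0 0 d e f cm" where a > 0 and d < 0,
    // i.e. a matrix that mirrors the stored image vertically when it is displayed.
    public static List<string> FindFlippedDraws(string content)
    {
        var flipped = new List<string>();
        var tokens = content.Split(new[] { ' ', '\t', '\r', '\n' },
                                   StringSplitOptions.RemoveEmptyEntries);
        var cm = new double[6];   // operands of the most recent cm operator
        bool haveCm = false;

        for (int i = 0; i < tokens.Length; i++)
        {
            if (tokens[i] == "cm" && i >= 6)
            {
                // The six numbers before "cm" are the matrix entries a b c d e f.
                bool ok = true;
                for (int k = 0; k < 6; k++)
                {
                    ok &= double.TryParse(tokens[i - 6 + k], NumberStyles.Float,
                                          CultureInfo.InvariantCulture, out cm[k]);
                }
                haveCm = ok;
            }
            else if (tokens[i] == "Do" && i >= 1 && haveCm)
            {
                // "Do" paints the XObject named by the previous token.
                if (cm[0] > 0 && cm[3] < 0 && cm[1] == 0 && cm[2] == 0)
                {
                    flipped.Add(tokens[i - 1]);
                }
            }
            else if (tokens[i] == "Q")
            {
                // Simplification: forget the matrix when a graphics state is restored.
                haveCm = false;
            }
        }
        return flipped;
    }
}
```

For example, `CtmCheck.FindFlippedDraws("q 595 0 0 -842 0 842 cm /Im0 Do Q")` reports `/Im0`, while the same fragment with `842` instead of `-842` reports nothing. If the problem file shows this pattern and UVE 01.pdf does not, flipping the extracted image after decoding would be a reasonable workaround.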
