Image flipped vertically #448

Open
robertovaldesperez opened this issue Jul 3, 2024 · 6 comments

Comments

@robertovaldesperez

robertovaldesperez commented Jul 3, 2024

Hello @jafin @ststeiger @saikatguha @nils-a @HakanL @Bogdancev, I am extracting the images from a PDF. They look fine when the PDF is viewed, but the extracted images come out flipped vertically.

Here is an example file:
6.335.1 0034220637_tasacion.pdf

Thanks a lot.

@robertovaldesperez
Author

@Bogdancev can you help me?

@HakanL
Contributor

HakanL commented Jul 26, 2024

It seems that the attached PDF is 0 bytes. Also please include a small program that demonstrates the issue.

@robertovaldesperez
Author

robertovaldesperez commented Jul 26, 2024

Hi @HakanL, here is the example file again:
6.335.1 0034220637_tasacion.pdf

My code:

```csharp
using ImageMagick; // Magick.NET, provides MagickImage
using PdfSharpCore.Pdf;
using PdfSharpCore.Pdf.Advanced;
using PdfSharpCore.Pdf.IO;
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace Guru.Utils.Helper
{
    public static class PdfSharpCoreExtensions
    {
        // Returns the distinct images found in the PDF, or an empty set if the PDF cannot be read.
        public static ISet<byte[]> ExtractImages(this byte[] contents)
        {
            using var pdfStream = new MemoryStream(contents);
            try
            {
                var document = PdfReader.Open(pdfStream, PdfDocumentOpenMode.ReadOnly);
                // Raw stream bytes already seen, used to skip images shared between pages.
                var uniqueImages = new HashSet<byte[]>();
                var images = new HashSet<byte[]>();
                foreach (var page in document.Pages)
                {
                    foreach (var xObject in GetXObjectImages(page))
                    {
                        try
                        {
                            var value = xObject.Stream.Value;
                            // byte[] has reference equality, so compare the contents explicitly.
                            if (!uniqueImages.Any(w => StructuralComparisons.StructuralEqualityComparer.Equals(w, value)))
                            {
                                uniqueImages.Add(value);
                                if (xObject.Elements.GetString("/Filter") == "/FlateDecode")
                                {
                                    // TODO
                                }
                                else
                                {
                                    // Any other filter: hand the raw stream bytes to Magick.NET as-is.
                                    using var image = new MagickImage(value);
                                    images.Add(image.ToByteArray());
                                }
                            }
                        }
                        catch (Exception)
                        {
                            // Do nothing: skip images that cannot be decoded
                        }
                    }
                }
                return images;
            }
            catch (Exception)
            {
                // Do nothing: fall through to the empty result
            }
            return new HashSet<byte[]>();
        }

        // Walks /Resources -> /XObject and yields image XObjects,
        // recursing into non-image XObjects (for example forms) that have their own resources.
        private static IEnumerable<PdfDictionary> GetXObjectImages(PdfDictionary pdfDictionary)
        {
            var resources = pdfDictionary.Elements.GetDictionary("/Resources");
            if (resources != null)
            {
                var xObjects = resources.Elements.GetDictionary("/XObject");
                if (xObjects != null)
                {
                    foreach (var item in xObjects.Elements.Values)
                    {
                        if (item is PdfReference reference)
                        {
                            if (reference.Value is PdfDictionary xObject)
                            {
                                if (xObject.Elements.GetString(PdfImage.Keys.Subtype) == "/Image")
                                {
                                    yield return xObject;
                                }
                                else
                                {
                                    foreach (var xObject1 in GetXObjectImages(xObject))
                                    {
                                        if (xObject1.Elements.GetString(PdfImage.Keys.Subtype) == "/Image")
                                        {
                                            yield return xObject1;
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```

@HakanL
Contributor

HakanL commented Jul 26, 2024

It looks like you just extract the byte array and then use a library called MagickImage for the image processing. My guess is that's where the issue is; it may not correctly handle how images are stored in a PDF.

@robertovaldesperez
Author

robertovaldesperez commented Jul 26, 2024

Hi @HakanL, I have also tried this file, UVE 01.pdf, and all of its images are extracted correctly.

I don't know if the difference is in the way the PDF was saved. Can you debug the PDF (6.335.1 0034220637_tasacion.pdf) internally to see if anything indicates that the image is flipped vertically?

Thanks a lot.

@HakanL
Contributor

HakanL commented Jul 26, 2024

It may be a different format inside the PDF. Unfortunately I don't have a setup to debug this, and I'm not a developer on this project, but the source code is available, so perhaps you can try to analyze it. To my recollection this project doesn't analyze, read, or parse the images, so I believe it's not an issue with the PdfSharpCore project.
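
For anyone analyzing this further, one possible explanation (an assumption, not verified against the attached file) is that the page's content stream paints the image with a transformation matrix whose vertical scale is negative. In that case the stored pixel rows are bottom-up, the viewer flips them at render time, and a raw stream extraction ignores the flip. Below is a minimal sketch of that check, assuming the six operands of each `cm` operator have already been read from the content stream. The `CmMatrixCheck` class and its method names are hypothetical helpers for illustration, not part of PdfSharpCore or Magick.NET.

```csharp
using System;

// Hypothetical helper for illustration; not part of PdfSharpCore or Magick.NET.
public static class CmMatrixCheck
{
    // Concatenates two PDF matrices given as [a, b, c, d, e, f].
    // The "cm" operator premultiplies, so the new CTM is Concat(cmOperands, currentCtm).
    public static double[] Concat(double[] m, double[] ctm)
    {
        Validate(m, nameof(m));
        Validate(ctm, nameof(ctm));
        return new[]
        {
            m[0] * ctm[0] + m[1] * ctm[2],
            m[0] * ctm[1] + m[1] * ctm[3],
            m[2] * ctm[0] + m[3] * ctm[2],
            m[2] * ctm[1] + m[3] * ctm[3],
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5],
        };
    }

    // True when the matrix in effect at the image's "Do" operator has no
    // rotation or skew (b and c are zero), a positive horizontal scale (a)
    // and a negative vertical scale (d), i.e. a pure top-to-bottom mirror.
    public static bool IsFlippedVertically(double[] ctm)
    {
        Validate(ctm, nameof(ctm));
        const double epsilon = 1e-9;
        bool axisAligned = Math.Abs(ctm[1]) < epsilon && Math.Abs(ctm[2]) < epsilon;
        return axisAligned && ctm[0] > 0 && ctm[3] < 0;
    }

    private static void Validate(double[] matrix, string name)
    {
        if (matrix == null || matrix.Length != 6)
            throw new ArgumentException("Expected six operands: a b c d e f.", name);
    }
}
```

For example, an image painted under `100 0 0 -50 10 20 cm` alone would be reported as flipped. If the page had first applied `1 0 0 -1 0 792 cm`, the concatenated matrix would have a vertical scale of 50 and the image would not be reported as flipped, so the matrices need to be combined before checking. If the check does come back true for the affected file, calling `Flip()` on the `MagickImage` before `ToByteArray()` is one remedy that could be tried.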
