This repository has been archived by the owner on Sep 10, 2024. It is now read-only.

Handle 404 from External Resources #32

Open
sealocal opened this issue Oct 6, 2015 · 8 comments

Comments

@sealocal

sealocal commented Oct 6, 2015

I'm trying to create a PDF from a URL that produces an HTML document that looks like the one below.

http://www.example.com/foo/bar/baz.png is a path to an image that does not exist. to_pdf fails for the URL that points to this HTML page. How would I go about generating a PDF with some placeholder like a "broken image" or an "X" icon like a modern browser would display? Or how would I display the alt text instead?

<html>
  <head>
    <title>Demo</title>
  </head>
  <body>
    <img alt="baz.png" src="http://www.example.com/foo/bar/baz.png">
  </body>
</html>
@waghanza

Perhaps you could use a client-side technology (like JavaScript) to work around this behavior.

@sealocal
Author

Care to elaborate?

What event would I listen to or what selector would I use to find image URLs that returned a 404?

Would I do this on document ready? Would Shrimp and Phantom wait for my JS to finish executing on document ready?

@sealocal
Author

Would be excellent if I could somehow pass an error handler to PhantomJS:
http://phantomjs.org/api/phantom/handler/on-error.html

I'm not sure what the limitations would be for that. Impossible to pass a JS function from Shrimp to PhantomJS? Maybe CoffeeScript or Opal.rb would help?

@sealocal
Author

Attempted to use jQuery error handling techniques:

https://api.jquery.com/error/

That page provides an example for exactly this situation, except without the context of PDF generation. However, I cannot see a way to prevent the error event from triggering in the first place, and I believe that event is what causes a Shrimp::RenderingError. Even if I replace the images with text after they've failed to load, the event and the error have already occurred.
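For reference, the alt-text replacement described above might look roughly like the sketch below, written against the plain DOM rather than jQuery. The helper name `replaceWithAltText` and the `[missing image]` fallback are made up for illustration. As noted, this only changes what the page displays; it does not stop the failed request from being reported to PhantomJS.

```javascript
// Swap an <img> that failed to load for a <span> holding its alt text.
// `doc` is the page's document, passed in so the helper has no hidden globals.
// Hypothetical helper name; not part of Shrimp or PhantomJS.
function replaceWithAltText(img, doc) {
  var span = doc.createElement('span');
  span.textContent = img.alt || '[missing image]';
  if (img.parentNode) {
    img.parentNode.replaceChild(span, img);
  }
  return span;
}

// Inside the page itself: image "error" events do not bubble, so listen
// in the capture phase on the document to see them for every <img>.
if (typeof document !== 'undefined') {
  document.addEventListener('error', function (event) {
    var target = event.target;
    if (target && target.tagName === 'IMG') {
      replaceWithAltText(target, document);
    }
  }, true);
}
```

The listener has to be registered before the images start loading (for example from a script in the head), otherwise early failures are missed.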

@nathanbrakken

@sealocal I came across a similar issue with broken images while using PhantomJS 2.0.

I found that the problem is in rasterize.js:52

  page.onResourceError = function (resourceError) {
    error(resourceError.errorString + ' (URL: ' + resourceError.url + ')');
  };

Perhaps the rendering shouldn't be killed when a resource on the page returns a 404? We already know whether the page itself loaded correctly from the handler defined just before it:

  page.onResourceReceived = function (resource) {
    if (resource.url == url) {
      statusCode = resource.status;
    }
  };
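One way to read that observation is that an error only needs to be fatal when the failing resource is the page itself. The sketch below is an assumption about how that could look, not code from rasterize.js: `makeMainResourceErrorHandler` is a hypothetical name, and `mainUrl` and `fail` stand in for the script's `url` variable and `error` function. It also assumes the URL PhantomJS reports matches the requested URL string exactly.

```javascript
// Build a handler for page.onResourceError that aborts only when the
// main document fails, and lets missing sub-resources pass silently.
// Hypothetical helper; `mainUrl` and `fail` are supplied by the caller.
function makeMainResourceErrorHandler(mainUrl, fail) {
  return function (resourceError) {
    if (resourceError.url !== mainUrl) {
      return; // image, stylesheet, etc.: keep rendering
    }
    fail(resourceError.errorString + ' (URL: ' + resourceError.url + ')');
  };
}

// Possible wiring inside the PhantomJS script (not executed here):
//   page.onResourceError = makeMainResourceErrorHandler(url, error);
```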

For a short-term fix I just downgraded to 0.0.2 and it worked. Another workaround is to return early at the top of that handler:

  page.onResourceError = function (resourceError) {
    // Workaround: return before error() is called, so a missing resource no longer aborts rendering.
    return;
    error(resourceError.errorString + ' (URL: ' + resourceError.url + ')');
  };

@sealocal
Author

Thanks, @nathanbrakken! Would the maintainers, @adjust, be interested in a PR with that fix?

It seems like this is the kind of thing that should be exposed as a config.json option, an option to ignore resource errors, so perhaps:

  page.onResourceError = function (resourceError) {
    if (config.ignoreResourceErrors) {
      return;
    }
    error(resourceError.errorString + ' (URL: ' + resourceError.url + ')');
  };
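A self-contained version of that idea, as a minimal sketch: the handler is built by a small factory so it can be exercised outside PhantomJS. The name `makeResourceErrorHandler` is hypothetical, `ignoreResourceErrors` is the proposed option (not an existing Shrimp setting), and `fail` stands in for the script's `error` function.

```javascript
// Build a handler for page.onResourceError whose behavior depends on a
// proposed (not yet existing) `ignoreResourceErrors` config flag.
// `fail` is whatever the script uses to abort, e.g. its error() function.
function makeResourceErrorHandler(config, fail) {
  return function (resourceError) {
    if (config && config.ignoreResourceErrors) {
      return; // option set: tolerate missing resources
    }
    fail(resourceError.errorString + ' (URL: ' + resourceError.url + ')');
  };
}

// Possible wiring inside the PhantomJS script (not executed here):
//   page.onResourceError = makeResourceErrorHandler(config, error);
```

With the flag unset the current hard-fail behavior is preserved, so existing users would see no change unless they opt in.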

@nathanbrakken

@sealocal I'm not sure that it would work to have as a config.json option.

I would suggest removing the onResourceError method from the js file altogether. Is there any benefit of a hard fail if there is a missing resource on the page?

@sealocal
Author

@nathanbrakken I don't really know that it has a technical benefit, but someone put it there thinking that it was a good idea. Perhaps they had a good reason for it. I figure that if someone else wants the error raised onResourceError, and I want it to not be raised, then the only middle ground is an option for the behavior.
