Handle 404 from External Resources #32

sealocal · 2015-10-06T03:13:20Z

I'm trying to create a PDF from a URL that produces an HTML document that looks like the one below.

http://www.example.com/foo/bar/baz.png is a path to an image that does not exist. to_pdf fails for the URL that points to this HTML page. How would I go about generating a PDF with some placeholder like a "broken image" or an "X" icon like a modern browser would display? Or how would I display the alt text instead?

<html>
  <head>
    <title>Demo</title>
  </head>
  <body>
    <img alt="baz.png" src="http://www.example.com/foo/bar/baz.png">
  </body>
</html>

waghanza · 2015-11-20T09:25:21Z

Perhaps you could use a client-side technology (like JavaScript) to avoid this behavior.

sealocal · 2015-11-20T09:38:13Z

Care to elaborate?

What event would I listen to or what selector would I use to find image URLs that returned a 404?

Would I do this on document ready? Would Shrimp and Phantom wait for my JS to finish executing on document ready?

sealocal · 2015-11-20T19:50:20Z

Would be excellent if I could somehow pass an error handler to PhantomJS:
http://phantomjs.org/api/phantom/handler/on-error.html

I'm not sure what the limitations would be for that. Impossible to pass a JS function from Shrimp to PhantomJS? Maybe CoffeeScript or Opal.rb would help?

sealocal · 2015-11-20T23:00:40Z

Attempted to use jQuery error handling techniques:

https://api.jquery.com/error/

That page provides an example for exactly this situation, except without the context of PDF generation. However, I cannot see a way to prevent the error event from triggering in the first place, and I believe that event is what causes a Shrimp::RenderingError. Even if I replace the images with text after they've failed to load, the event and the error have already occurred.
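For reference, here is a minimal sketch of the client-side approach in plain JavaScript, without jQuery. The helper names `replaceWithAltText` and `installAltTextFallback` are hypothetical. As noted above, swapping the element only changes what is rendered; it does not stop the failed request from being reported to PhantomJS, so on its own it would probably not avoid the rendering error.

```javascript
// Hypothetical helper: swap a failed <img> for a <span> holding its alt text.
function replaceWithAltText(img, doc) {
  var span = doc.createElement('span');
  span.textContent = img.getAttribute('alt') || '';
  img.parentNode.replaceChild(span, img);
  return span;
}

// Hypothetical helper: wire the fallback to every image in the document.
function installAltTextFallback(doc) {
  // Copy the live collection first, because replacing images shrinks it.
  var images = Array.prototype.slice.call(doc.getElementsByTagName('img'));
  images.forEach(function (img) {
    // Finished loading with zero width: treat it as having already failed
    // before this script ran.
    if (img.complete && img.naturalWidth === 0) {
      replaceWithAltText(img, doc);
      return;
    }
    img.addEventListener('error', function () {
      replaceWithAltText(img, doc);
    });
  });
}

// Only run automatically when a DOM is available (browser or PhantomJS page).
if (typeof document !== 'undefined') {
  installAltTextFallback(document);
}
```

The script would need to be included in the HTML page itself, ideally at the end of the body so that all img elements exist when it runs.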

nathanbrakken · 2016-01-22T22:32:17Z

@sealocal I came across a similar issue with broken images while using PhantomJS 2.0.

I found that the problem is in rasterize.js:52

  page.onResourceError = function (resourceError) {
    error(resourceError.errorString + ' (URL: ' + resourceError.url + ')');
  };

Perhaps rendering shouldn't be killed when there is a 404 for a resource on the page? The previous handler already tells us whether the page itself loaded without an error:

  page.onResourceReceived = function (resource) {
    if (resource.url == url) {
      statusCode = resource.status;
    }
  };

For a short-term fix I just downgraded to 0.0.2 and it worked. Another workaround is to add an early return at the top of that handler:

  page.onResourceError = function (resourceError) {
    // Workaround: return immediately so the error() call below is never reached.
    return;
    error(resourceError.errorString + ' (URL: ' + resourceError.url + ')');
  };
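A slightly narrower variant of the same idea, sketched here as an assumption rather than anything taken from rasterize.js, is to keep the hard failure for the page itself and ignore only sub-resources. The factory name `makeMainDocumentErrorHandler` and its `mainUrl` and `fail` parameters are hypothetical; `fail` stands in for the script's `error` helper.

```javascript
// Hypothetical factory: builds an onResourceError handler that fails only
// when the resource that errored is the page being rendered.
function makeMainDocumentErrorHandler(mainUrl, fail) {
  return function (resourceError) {
    if (resourceError.url !== mainUrl) {
      // A sub-resource (image, stylesheet, ...) failed; keep rendering.
      return;
    }
    fail(resourceError.errorString + ' (URL: ' + resourceError.url + ')');
  };
}

// Assumed usage inside the PhantomJS script:
//   page.onResourceError = makeMainDocumentErrorHandler(url, error);
```

The URL comparison is a plain string match, like the one in onResourceReceived, so it carries the same caveat: if the reported URL differs from the requested one (for example after a redirect), the main-page failure would be ignored as well.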

sealocal · 2016-01-22T23:23:35Z

Thanks, @nathanbrakken! Would the maintainers, @adjust, be interested in a PR with that fix?

It seems like this is the kind of thing that should be exposed as a config.json option, i.e. an option to ignore resource errors. So perhaps:

  page.onResourceError = function (resourceError) {
    if (config.ignoreResourceErrors) {
      return;
    }
    error(resourceError.errorString + ' (URL: ' + resourceError.url + ')');
  };
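To make the proposed behavior easy to check outside PhantomJS, the same logic can be written as a small factory. This is only a sketch: `ignoreResourceErrors` is the option name proposed above and not an existing setting, and `makeResourceErrorHandler` and `fail` are hypothetical names, with `fail` standing in for the script's `error` helper.

```javascript
// Hypothetical factory: returns an onResourceError handler whose behavior
// depends on the proposed (not yet existing) ignoreResourceErrors option.
function makeResourceErrorHandler(config, fail) {
  return function (resourceError) {
    if (config && config.ignoreResourceErrors) {
      // Option enabled: tolerate the failed resource and keep rendering.
      return;
    }
    // Default: keep the current behavior and abort with a message.
    fail(resourceError.errorString + ' (URL: ' + resourceError.url + ')');
  };
}

// Assumed usage inside the PhantomJS script:
//   page.onResourceError = makeResourceErrorHandler(config, error);
```

With the option unset, the handler behaves exactly as it does today, so existing users who rely on the hard failure would see no change.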

nathanbrakken · 2016-01-23T00:11:28Z

@sealocal I'm not sure that it would work to have as a config.json option.

I would suggest removing the onResourceError handler from the JS file altogether. Is there any benefit to a hard fail when a resource on the page is missing?

sealocal · 2016-01-23T01:08:17Z

@nathanbrakken I don't really know that it has a technical benefit, but someone put it there thinking that it was a good idea. Perhaps they had a good reason for it. I figure that if someone else wants the error raised onResourceError, and I want it to not be raised, then the only middle ground is an option for the behavior.
