This repository has been archived by the owner on Feb 2, 2023. It is now read-only.

Failed on some folded page #8

Open
yangxiaomin08 opened this issue Dec 28, 2016 · 15 comments

@yangxiaomin08

Hi,

As we know, many pages/articles are folded behind a button such as 'show more' that reveals the rest. After the button is clicked, the hidden content is shown. On some websites, however, the hidden content sits at a different place in the DOM, for example at a different level from the content previously marked as 'content'. In that case, the hidden content is not recognized as 'content'.

Take https://m.sohu.com/n/477121843/?wscrid=1137_4 as an example: after clicking the 'show more' button and distilling the page manually, the originally hidden content is not distilled.

@yangxiaomin08
Author

yangxiaomin08 commented Dec 28, 2016

After analyzing the source, my guess is that the failure occurs because the hidden content is wrapped in a div with the id "rest_content", while the normal content that has been marked as content starts with "p" tags. SimilarSiblingContentExpansion fails to recognize the hidden content ("rest_content") as content.

This is only a guess; I haven't found an easy way to debug and verify it yet. If you have any ideas about how to debug it, I would appreciate it if you shared them.

Furthermore, even if my guess is correct, I have no idea how to 'fix' it without mistakenly marking other non-content as content. Any advice is welcome.
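
To make the guess above concrete, here is a minimal Python model of the situation. It is not the actual SimilarSiblingContentExpansion code (which is Java): the `Node` class, both expansion functions, and the sample page are hypothetical, and the only thing taken from the thread is the shape of the DOM (content `p` tags followed by a `div` with the id `rest_content`). It shows why a comparison limited to sibling tag names would skip the wrapper, and one possible way to look inside it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified DOM node (not a dom-distiller class)."""
    tag: str
    text: str = ""
    node_id: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    is_content: bool = False


def expand_flat(siblings: List[Node]) -> None:
    """Mark siblings whose tag matches an already-marked content sibling."""
    content_tags = {n.tag for n in siblings if n.is_content}
    for n in siblings:
        if n.tag in content_tags:
            n.is_content = True


def expand_through_wrappers(siblings: List[Node]) -> None:
    """Assumed variant: also accept the children of a wrapper element
    when every child shares a tag with existing content."""
    content_tags = {n.tag for n in siblings if n.is_content}
    for n in siblings:
        if n.tag in content_tags:
            n.is_content = True
        elif n.children and all(c.tag in content_tags for c in n.children):
            for c in n.children:
                c.is_content = True


def content_text(nodes: List[Node]) -> List[str]:
    """Collect the text of every node marked as content, in document order."""
    out = []
    for n in nodes:
        if n.is_content and n.text:
            out.append(n.text)
        out.extend(content_text(n.children))
    return out


def build_page() -> List[Node]:
    """Sample page: two visible paragraphs, a folded wrapper, and an ad block."""
    return [
        Node("p", "first paragraph", is_content=True),
        Node("p", "second paragraph"),
        Node("div", node_id="rest_content", children=[
            Node("p", "hidden paragraph 1"),
            Node("p", "hidden paragraph 2"),
        ]),
        Node("div", node_id="ads", children=[Node("a", "sponsored link")]),
    ]


flat = build_page()
expand_flat(flat)
flat_result = content_text(flat)

nested = build_page()
expand_through_wrappers(nested)
nested_result = content_text(nested)

print(flat_result)    # the folded paragraphs are missing
print(nested_result)  # the folded paragraphs are included, the ad link is not
```

In this model the ad block stays excluded only because its children are not `p` tags. A real page would likely need stronger signals to avoid marking non-content, which is exactly the concern raised above.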

@wychen
Contributor

wychen commented Dec 28, 2016

Hi,

Would you mind filing a bug on crbug.com, following our README? We usually track bugs and feature requests there. The issue tracker here on GitHub can work as a more free-form Q&A.

Thanks,
Wei-Yin

@yangxiaomin08
Author

Hi Wei-Yin,

Following the previous discussion, I have reverted some code and successfully debugged the Java code. I still have some questions; would you please give me a hand? Thank you.

Is there an example/test case that loads a URL, distills the page, and then displays the content in Chrome? As far as I know, I can do this in Chrome (for example from chrome://dom-distiller, or from the menu if I start Chrome with the --enable-dom-distiller switch), but I'm not sure whether I can debug the Java code that way. In other words, can the Java debugging feature only be used in the local war/xxxx directory?

@wychen
Contributor

wychen commented Dec 29, 2016

It's doable.

After reverting to the state where source map works, do the following:

  • Modify build.xml, and change gwt.args to be the same as gwt.test.args, and add --sourcemaps option in the extractjs target, like extractjs.jstests.
  • Edit java/DomDistiller.gwt.xml and add the source map option, like in javatests/DomDistillerJsTest.gwt.xml.
  • Run "ant package"
  • Edit the last line of out/extension/domdistiller.js to be "//@ sourceMappingURL=../debug/....". That is, add "../" in front of the path.
  • Use the Chrome extension to distill a page. Note that you need to use the button on the toolbar, not the "Profile Extraction" button on the Dom Distiller page.
  • In the Profiles tab, click any function, and you should be taken back to the Java code.
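
The manual edit of the last line could also be scripted. The sketch below is only an illustration under assumptions: the function name `add_parent_prefix` is made up, and the path `debug/example.sourcemap` in the usage note is a placeholder, since the real path is elided in the step above. The function operates on the file's text and prepends "../" to the path in a trailing `sourceMappingURL` comment, leaving anything else untouched.

```python
import re

# Matches a source map comment in either the older "//@" or the newer "//#" form.
_SOURCE_MAP_RE = re.compile(r"^(//[@#] sourceMappingURL=)(.*)$")


def add_parent_prefix(js_text: str) -> str:
    """Prepend "../" to the source map path on the last line of js_text.

    Returns the text unchanged when the last line is not a source map
    comment. A path that already starts with "../" is not prefixed again.
    """
    lines = js_text.rstrip("\n").split("\n")
    match = _SOURCE_MAP_RE.match(lines[-1])
    if match is None:
        return js_text  # no source map comment on the last line
    marker, url = match.groups()
    if not url.startswith("../"):
        lines[-1] = f"{marker}../{url}"
    return "\n".join(lines) + "\n"
```

For example, `add_parent_prefix("var a = 1;\n//@ sourceMappingURL=debug/example.sourcemap\n")` yields the same text with the last line changed to `//@ sourceMappingURL=../debug/example.sourcemap`. To apply it, read out/extension/domdistiller.js, pass its contents through the function, and write the result back after "ant package" has run.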

Let me know if this works for you.

@yangxiaomin08
Author

Hi Wei-Yin,

Thanks. I have modified the code step by step.

I couldn't find instructions on how to test with the Chrome extension, such as how to install the modified package as an extension. Would you please give me some tips? Thanks again.

@wychen
Contributor

wychen commented Dec 30, 2016

@yangxiaomin08
Author

yangxiaomin08 commented Dec 30, 2016

Thanks, I missed it.

I have tried it, and the profile extension is there. The problem is that when I click the profile button, the page becomes an empty white page. I haven't modified any Java code yet; all I have done is follow your instructions on enabling source maps and the instructions above.

My Chrome version is 55.0.2883.87 (64-bit) on Ubuntu.

@yangxiaomin08
Author

yangxiaomin08 commented Dec 30, 2016

I noticed a warning in the latest version of Chrome: I need to change //@ to //#, like this:
//# sourceMappingURL=../debug/domdistiller/src/domdistiller.sourcemap

But I still got the white page.

Messages in the console:
MarkupParser.java:147 DomDistiller debug level: 0
/home/yangxm/codes/dom-distiller/out/extension/extract.js:9 Object {1: "", 2: Object, 3: Object, 5: Object, 6: Object, 7: Object, 8: Object, 9: "auto", 10: Array[0], __proto__: Object}
/home/yangxm/codes/dom-distiller/out/extension/preview.js:2 Uncaught TypeError: Cannot set property 'innerHTML' of null
    at /home/yangxm/codes/dom-distiller/out/extension/preview.js:2

@yangxiaomin08
Author

I have also tried extension mode in a Chrome that I built myself, based on roughly m53.
The console output is still the same.

I'm not sure which step went wrong. My understanding is that your last instructions only cover enabling the source map feature in the Chrome extension. (I still have the local modification that reverts https://bugs.chromium.org/p/chromium/issues/detail?id=617360.) I reverted and tried again, and still got the same console output and white page.

@yangxiaomin08
Author

More information: I followed the "Run in Chrome for desktop" guide.

  1. Copy to chrome/src/...
  2. Touch dom_distiller_resources.grdp.
  3. Build Chrome.
  4. Load a page and distill it from the menu.

The content can be viewed.

Is the white page caused by the extension not decoding the DOM Distiller return value, which is in protocol buffer format?

@wychen
Contributor

wychen commented Dec 30, 2016

Some more background. After installing the extension, there should be one additional icon on the toolbar. In the devtools, there should be one additional tab, named "Dom Distiller", where there is one button, named "Profile Extraction". If you click "Profile Extraction", the current page would be distilled, and the JS Console should contain some debug info. If you click the icon, distillation would be done as above, and in addition, the distilled content would replace the original page.

From the console output, it looks like the extracted content is empty (or at least the title is empty string). Strangely, at preview.js line 2, it seems document.body is null in your case. When you click the icon, does the tab contain an article?

@yangxiaomin08
Author

I did see "Dom Distiller Dev 1.0" after loading the unpacked extension, along with a button named "Profile Extraction".

I clicked the "Profile Extraction" button and got an empty page. The page does have an article. It is https://m.sohu.com/n/477367845/?wscrid=95360_1.

@yangxiaomin08
Author

If I don't locally revert the code for https://bugs.chromium.org/p/chromium/issues/detail?id=617360 (that is, I just keep a clean version of the git repo), it works in extension mode.

It looks like extension mode is broken by that 'local revert' made to support source maps. Would you please help verify this? Thanks.

@wychen
Contributor

wychen commented Jan 3, 2017

If I were you, I'd make sure the reversion is correct first, since it's not a clean revert. Then work on enabling source map in the extension.
