-
Notifications
You must be signed in to change notification settings - Fork 871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix invisible iframe scraping #1723
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on eec6ae0 in 50 seconds
More details
- Looked at
115
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
8
drafted comments based on config settings.
1. skyvern/forge/agent_functions.py:250
- Draft comment:
Changed parameter type from Page|Frame to SkyvernFrame. Ensure all callers are updated accordingly. - Reason this comment was not posted:
Comment did not seem useful: The comment is asking the PR author to ensure that all callers are updated, which violates the rule against asking the author to double-check things or ensure behavior. Therefore, this comment should not be approved.
2. skyvern/forge/agent_functions.py:279
- Draft comment:
Removed multiple locator count check; verify that ambiguity with multiple matching selectors is handled. - Reason this comment was not posted:
Comment did not seem useful: The comment starts with 'verify that', which suggests it is asking the PR author to confirm or ensure something. This violates the rule against asking the author to verify or ensure behavior. Therefore, this comment should not be approved.
3. skyvern/forge/agent_functions.py:305
- Draft comment:
Element blocking check was introduced. Consider logging additional blocking details for improved debugging. - Reason this comment was not posted:
Confidence changes required:50%
None
4. skyvern/forge/agent_functions.py:317
- Draft comment:
Screenshot timeout constant changed from BROWSER_SCREENSHOT_TIMEOUT_MS to BROWSER_ACTION_TIMEOUT_MS. Verify the new timeout is appropriate. - Reason this comment was not posted:
Comment did not seem useful: The comment asks to verify if the new timeout is appropriate, which falls under the rule of not asking the PR author to ensure or verify something. This makes the comment not approved.
5. skyvern/webeye/scraper/scraper.py:474
- Draft comment:
Improved log message indicating failure to retrieve 'unique_id' from frame_element. This should aid in diagnostics. - Reason this comment was not posted:
Comment did not seem useful: This comment is purely informative, as it only describes an improvement made to a log message. It doesn't provide any actionable feedback or suggestions for the PR author.
6. skyvern/forge/agent_functions.py:250
- Draft comment:
Good change switching to 'skyvern_frame' for consistency. However, the removal of the check for multiple locators (i.e., count()>1) might lead to unintended behavior if duplicate elements exist. Consider reintroducing a guard if duplicates are possible. - Reason this comment was not posted:
Marked as duplicate.
7. skyvern/forge/agent_functions.py:317
- Draft comment:
The screenshot timeout constant was changed from BROWSER_SCREENSHOT_TIMEOUT_MS to BROWSER_ACTION_TIMEOUT_MS. Ensure this change is intentional as it affects how long we wait for element screenshots. - Reason this comment was not posted:
Marked as duplicate.
8. skyvern/webeye/scraper/scraper.py:471
- Draft comment:
Adding a check for iframe visibility before obtaining its unique_id is a solid improvement to avoid hangs on invisible iframes. Verify that handling missing unique_id is robust if the attribute is absent. - Reason this comment was not posted:
Marked as duplicate.
Workflow ID: wflow_8iaDH8MkZZA9tmqj
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Reviewed everything up to eec6ae0 in 57 seconds
More details
- Looked at
115
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
7
drafted comments based on config settings.
1. skyvern/forge/agent_functions.py:293
- Draft comment:
Potential bug: After checking element visibility, no return is done on invisibility. Consider adding 'return None' after logging when the element is not visible. - Reason this comment was not posted:
Comment was on unchanged code.
2. skyvern/forge/agent_functions.py:317
- Draft comment:
Verify timeout value for screenshot: using settings.BROWSER_ACTION_TIMEOUT_MS instead of BROWSER_SCREENSHOT_TIMEOUT_MS as before. Confirm this change is intended. - Reason this comment was not posted:
Comment did not seem useful: The comment asks the PR author to confirm their intention regarding the change in timeout value. This violates the rule against asking the author to confirm their intention or ensure the behavior is intended.
3. skyvern/webeye/scraper/scraper.py:472
- Draft comment:
Improved log message for frame element failure ('Unable to get unique_id from frame_element') looks fine and more specific. - Reason this comment was not posted:
Confidence changes required:0%
None
4. skyvern/forge/agent_functions.py:250
- Draft comment:
Changing the parameter type from 'Page | Frame' to 'SkyvernFrame' improves consistency. Ensure that all upstream usages pass a SkyvernFrame instance. - Reason this comment was not posted:
Confidence changes required:0%
None
5. skyvern/forge/agent_functions.py:280
- Draft comment:
Duplicate visibility check in _convert_css_shape_to_string. Consider removing the second 'if not await locater.is_visible(...):' or adding a return after the first check to abort when the element is not visible. - Reason this comment was not posted:
Comment was on unchanged code.
6. skyvern/forge/agent_functions.py:317
- Draft comment:
Intentional change of screenshot timeout to BROWSER_ACTION_TIMEOUT_MS instead of BROWSER_SCREENSHOT_TIMEOUT_MS; ensure this timeout is appropriate for screenshot capture. - Reason this comment was not posted:
Confidence changes required:33%
None
7. skyvern/webeye/scraper/scraper.py:470
- Draft comment:
Good improvement: adding a check for iframe visibility via frame_element.is_visible() avoids potential hangs when evaluating invisible iframes. - Reason this comment was not posted:
Confidence changes required:0%
None
Workflow ID: wflow_o86Ln4uhIzxA26ym
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
<!-- ELLIPSIS_HIDDEN --> > [!IMPORTANT] > Fix invisible iframe scraping by adding timeouts and improving element handling in various modules. > > - **Behavior**: > - Add timeout to `exclude_captcha_frame()` in `filter.py` to prevent hanging on invisible iframes. > - Modify `get_interactable_element_tree_in_frame()` in `scraper.py` to skip invisible iframes. > - **Functions**: > - Update `_convert_css_shape_to_string()` in `agent_functions.py` to use `SkyvernFrame` and handle blocked elements. > - Remove AutoplayStopper extension logging and addition in `special_browsers.py`. > - **Misc**: > - Import `asyncio` and `settings` in `filter.py` for timeout handling. > - Adjust logging in `special_browsers.py` and `scraper.py` for better error handling and debugging. > > <sup>This description was created by </sup>[<img alt="Ellipsis" src="https://img.shields.io/badge/Ellipsis-blue?color=175173">](https://www.ellipsis.dev?ref=Skyvern-AI%2Fskyvern-cloud&utm_source=github&utm_medium=referral)<sup> for d1b18fd2988aa956255a4fa2452af4acc862e88f. It will automatically update as commits are pushed.</sup> <!-- ELLIPSIS_HIDDEN -->
eec6ae0
to
a15fa00
Compare
Important
Fixes invisible iframe scraping by skipping them in
scraper.py
and refactors_convert_css_shape_to_string()
inagent_functions.py
to useSkyvernFrame
.get_interactable_element_tree_in_frame()
inscraper.py
by skipping invisible frames._convert_css_shape_to_string()
inagent_functions.py
to useSkyvernFrame
instead ofPage | Frame
._convert_css_shape_to_string()
inagent_functions.py
._convert_css_shape_to_string()
inagent_functions.py
.This description was created by
for eec6ae0. It will automatically update as commits are pushed.