
REF/FIX: Fix Python 3.11 support and test suite-focused cleanup #573

Merged
merged 31 commits into pcdshub:master on Sep 8, 2023

Conversation

@klauer (Contributor) commented Aug 28, 2023

Description

  • ophyd.Kind, being an enum.IntFlag, is a bit of a special beast. Its behavior when enumerating members differs in Python 3.11, resulting in typhos only picking up a couple of component kinds (see the sketch after this list).
  • Secondarily, using multiprocessing to run a caproto IOC turned out to be problematic on Python 3.11, so I restructured the test suite a bit to run IOCs by test name (simply regenerating the test cases in the parent process and in the IOC).
  • Profiling code is skipped on 3.11.
  • The "all QMenu" escape hatch remains for only one test (the all-screenshots test). It's unclear why, but we get a green checkmark.
  • Vendored pydm's load_ui_file and modified it so we can always get our Display instance back and clean it up if desirable.
  • Ignore deleted Qt objects in SignalConnection.remove_connection.
  • Avoid hundreds of warnings during profiling.
  • Avoid creating subdisplays during hide_subdisplays.
  • Add a big pytest configuration helper to aid in finding dangling widgets.
  • Avoid failing screenshot-taking when widgets are garbage collected at the same time.
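
A minimal sketch of the enumeration change, assuming an IntFlag with the same member values as ophyd.Kind (this standalone class is for illustration only):

    import enum


    class Kind(enum.IntFlag):
        # Values mirror ophyd.Kind; hinted is a composite value (normal | 0b100).
        omitted = 0b000
        normal = 0b001
        config = 0b010
        hinted = 0b101


    # Python 3.10 and earlier: iterating yields all four defined members
    # (omitted, normal, config, hinted).
    # Python 3.11: only canonical single-bit members are yielded (normal,
    # config), so code that enumerates Kind sees far fewer entries.
    print(list(Kind))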

Motivation and Context

Python 3.11 support would be nice eventually

This PR manifested itself like this:

  • Python 3.11 had some obvious incompatibilities with enums, so I fixed that
  • Once that was fixed, the benchmark/profiling test suite was causing major issues, so I refactored it to avoid multiprocessing and isolate the test IOC in its own script (see the sketch after this list)
  • That still wasn't enough to get the test suite to pass. I saw widgets leaking from one test to another, so I added a conftest pytest hook that aimed to remove all stragglers
  • Then I found that Python 3.11 benchmarking still wasn't working great, so I disabled it and moved on
  • ... Then I found a bunch of stragglers and some weird things (listed above as 'avoid' or 'fix')
  • And finally, there was one failing test I couldn't resolve (test_take_top_level_widget_screenshots sees hundreds of QMenu instances at the top level)
  • And now, green checkmark
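
A rough sketch of the launch-by-name approach; the script path and helper name here are hypothetical, not the actual test suite code. Rather than handing dynamically generated device classes to multiprocessing's pickler (the kind of failure shown in the PicklingError under "Screenshots" below), the IOC process regenerates the same test case from its name:

    import subprocess
    import sys

    # Hypothetical script path, for illustration only.
    IOC_SCRIPT = "typhos/tests/benchmark_ioc.py"


    def start_ioc_by_test_name(full_test_name: str) -> subprocess.Popen:
        """Launch the caproto test IOC in a fresh interpreter, identified by test name."""
        return subprocess.Popen(
            [sys.executable, IOC_SCRIPT, full_test_name],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )


    # The parent process regenerates the same test case and then talks to the IOC.
    ioc = start_ioc_by_test_name("some_benchmark_test")
    ...
    ioc.terminate()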

Overall, I think this is in a better spot than before. It's absolutely not perfect, though.

How Has This Been Tested?

Locally and via the test suite

Where Has This Been Documented?

Screenshots (if appropriate):

    def dump(obj, file, protocol=None):
        '''Replacement for pickle.dump() using ForkingPickler.'''
>       ForkingPickler(file, protocol).dump(obj)
E       _pickle.PicklingError: Can't pickle <class 'ophyd.device.FlatConnect'>: attribute lookup FlatConnect on ophyd.device failed

@klauer (Contributor, Author) commented Aug 28, 2023

Well it failed spectacularly on GitHub Actions, whereas it passed locally so 🤷
Going to mark profiling code as xfail temporarily (PR is still draft)

@klauer marked this pull request as ready for review on September 7, 2023 at 17:22
@klauer (Contributor, Author) commented Sep 7, 2023

OK, this is a bit of a mess and there are a ton of changes. I did this over a long period of time with (sometimes) limited attention being paid to what I was doing.

Each change should be really scrutinized!

@klauer changed the title from "WIP: Fix Python 3.11 support" to "REF/FIX: Fix Python 3.11 support" on Sep 7, 2023

if final_widgets:
    if all(isinstance(widget, QtWidgets.QMenu) for widget in final_widgets):
        logger.error("%s: Top level QMenu widgets were not cleaned up. Not failing the test suite.", item.nodeid)
klauer (Contributor, Author):

This actually only happens in one test!
test_take_top_level_widget_screenshots
It doesn't make sense to me why this is the case.
Still, this escape hatch is what is allowing the test suite to pass for all versions of Python.

@klauer linked an issue on Sep 7, 2023 that may be closed by this pull request
@klauer changed the title from "REF/FIX: Fix Python 3.11 support" to "REF/FIX: Fix Python 3.11 support and test suite-focused cleanup" on Sep 7, 2023
@ZLLentz (Member) left a comment:

I'm very 👍 on this
I probably left a bunch of questions as I went through this

except Exception as ex:
    display: Optional[pydm.Display] = getattr(ex, "pydm_display", None)
    if display is not None:
        display.setObjectName("_typhos_test_suite_ignore_")
ZLLentz (Member):

What's the functional effect of renaming these display objects after a failed load?

klauer (Contributor, Author):

Functionally, setting the object name for a pydm.Display does nothing - it's a bit of a magic string for the test suite and for us when looking in with debug tools, is all.

The context of this block is:

  • The new template fails to load
  • typhos notices that and tries to reload the prior template

In typhos master,

  • A pydm.Display is left dangling as a top-level widget and TyphosDeviceDisplay never even sees a reference to it

In this PR,

  • That pydm.Display is seen by TyphosDeviceDisplay and intercepted as a failed load
  • We mark it for deletion and wave goodbye
  • The test suite still picks it up because it's a top-level widget and somehow doesn't get destroyed in time for it to be happy (see below)
  • The test suite sees the "magic" object name and says "OK this is known and expected; let's not fail the test"

Now to be honest, the marked step is fishy and it'd be nice to understand why that's happening. Is there a reference we're missing? It's worth investigating at some point.
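
A sketch of the conftest side of that handshake, assuming a hypothetical helper (the real hook does considerably more bookkeeping):

    from qtpy import QtWidgets

    # The name set on displays that failed to load and are already marked for deletion.
    IGNORE_OBJECT_NAME = "_typhos_test_suite_ignore_"


    def stray_widgets() -> list:
        """Top-level widgets that the current test should be blamed for leaking."""
        app = QtWidgets.QApplication.instance()
        if app is None:
            return []
        return [
            widget
            for widget in app.topLevelWidgets()
            if widget.objectName() != IGNORE_OBJECT_NAME
        ]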

"hinted": True,
"config": True,
"omitted": True,
}
ZLLentz (Member):

Feels bad, but this is definitely unlikely to ever change at this point, so it's probably totally fine; there doesn't seem to be a more direct way to inspect these now.

ZLLentz (Member):

As far as I can tell, the only way to get the names programmatically is to do something similar to this ugly mess:

for val in range(8):
    print(val, Kind(val).name)

0 omitted
1 normal
2 config
3 normal|config
4 None
5 hinted
6 config|4
7 normal|config|hinted

But there's actually no way to know how many bits the bitflag is supposed to have, so there's no way to know when to stop. len() returns 2, which just means "there are names for slots 1 and 2". Wild.

ZLLentz (Member):

Ref: https://docs.python.org/3/whatsnew/3.11.html#enum

Changed Flag to only consider primary values (power of two) canonical while composite values (3, 6, 10, etc.) are considered aliases; inverted flags are coerced to their positive equivalent.

klauer (Contributor, Author):

dir(Kind) / inspect.getmembers / ... yeah, for something simple like this I think typing it out is sadly the best option.

Do we rely on this iterability in our other code bases, I wonder?

ZLLentz (Member):

Neither of those standard functions actually gives you the results, which is wild to me:

>>> 'omitted' in dir(Kind)
False
>>> Kind.omitted
<Kind.omitted: 0>

(I forgot to install ipython in my py311 test branch)

app = QtWidgets.QApplication.instance()
if app is None:
    return []
return [weakref.ref(widget) for widget in app.topLevelWidgets()]
ZLLentz (Member):

You know a PR is going to be good when we bust out the weakref

return res


@pytest.hookimpl(hookwrapper=True, trylast=True)
ZLLentz (Member):

Is there any functional difference for this being implemented as a pytest hook wrapper vs an always-include fixture?

klauer (Contributor, Author):

I tried this as a fixture first without success.
For one, the teardown order of all the fixtures (especially qtbot) is important.
Most importantly, however, a fixture that fails at teardown causes the next test to fail (!). This was really surprising to me.

tangkong (Contributor):

Does a test failing at teardown make all the subsequent tests fail?

  • test passes (fail at teardown) -> test fails -> test passes
    or
  • test passes (fail at teardown) -> test fails -> test fails....

klauer (Contributor, Author):

Fixtures failing at teardown cause only the subsequent test to fail

klauer (Contributor, Author):

For the record:

  • The reason this PR doesn't include a fixture to do the same check is the above: fixture teardown exceptions don't affect the current test, only the (single) subsequent test
  • The pytest hook as implemented in this PR functions as we would hope: the assertion in this hook applies to the active test (and does not interfere with the subsequent test)
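
A trimmed-down sketch of that hook, assuming it wraps pytest_runtest_call (this is not the literal conftest.py code; the real version also applies the QMenu escape hatch shown earlier):

    import weakref

    import pytest
    from qtpy import QtWidgets


    def top_level_widget_refs() -> list:
        """Weak references to the current top-level widgets, if a QApplication exists."""
        app = QtWidgets.QApplication.instance()
        if app is None:
            return []
        return [weakref.ref(widget) for widget in app.topLevelWidgets()]


    @pytest.hookimpl(hookwrapper=True, trylast=True)
    def pytest_runtest_call(item):
        before = {id(ref()) for ref in top_level_widget_refs()}
        yield  # run the test itself
        # Widgets created during the test that are still alive afterward are
        # stragglers; asserting here blames the active test, not the next one.
        final_widgets = [
            ref() for ref in top_level_widget_refs()
            if ref() is not None and id(ref()) not in before
        ]
        assert not final_widgets, f"{item.nodeid}: dangling top-level widgets: {final_widgets}"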

logger.error(failure_text)

try:
    assert not final_widgets, failure_text
ZLLentz (Member):

I like the methodology you used here to track these issues down

@ZLLentz (Member) commented Sep 8, 2023

I notice that you left some of the cleanup for another day; I think that's completely appropriate. You've cleaned up almost all of it already, and the last QMenus seem to be lurking from upstream code.

@tangkong (Contributor) left a comment:

The results speak for themselves; "cleanup" has never belonged more in a PR title.

Morbid curiosity prompts me to look more closely at the test failures, but that shouldn't really hold this up, particularly since we're trying to tag in the near future.

start_ioc: bool,
full_test_name: str,
auto_exit=True,
request=None,
tangkong (Contributor):

As someone looking at this benchmark code for the first time I'd love a type hint here
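
For illustration, a hinted-up version of that signature might look something like this; the function name and the exact types are guesses, not the actual benchmark helper:

    from typing import Optional

    import pytest


    def run_benchmarked_display(  # hypothetical name
        start_ioc: bool,
        full_test_name: str,
        auto_exit: bool = True,
        request: Optional[pytest.FixtureRequest] = None,
    ) -> None:
        ...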

@@ -0,0 +1,78 @@
"""
Helpful functions that don't belong in a more specific submodule.
tangkong (Contributor):

It's probably possible to describe this module more accurately, right? Now that it's mostly IOC-related things.

@@ -94,6 +95,10 @@ def __init__(self, channel, address, protocol=None, parent=None):
# Add listener
self.add_listener(channel)

def __dtor__(self) -> None:
tangkong (Contributor):

Google searches fail me when finding this, is there a docs link you could share?

klauer (Contributor, Author):

Sadly, no... this is a weird (undocumented?) internal part of sip: right before the C++ object gets deleted, this Python method gets called. See: https://github.com/search?q=__dtor__+language%3APython&type=code
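
A bare-bones illustration of the hook; this is a hypothetical stand-in class, not typhos's SignalConnection, and the hook is sip-specific (so presumably PyQt only):

    from qtpy import QtCore


    class ConnectionSketch(QtCore.QObject):
        """Illustrative only: shows where __dtor__ fits, not real typhos code."""

        def __dtor__(self) -> None:
            # sip (PyQt) calls this right before the wrapped C++ object is
            # destroyed; flag ourselves so later cleanup can skip dead Qt
            # objects instead of tripping over "wrapped C/C++ object has
            # been deleted" errors.
            self._destroyed = True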


try:
    widget.isVisible()
    widget.windowTitle()
tangkong (Contributor):

Why is calling both of these necessary?

klauer (Contributor, Author):

I think calling one may be sufficient; my reasoning here is a bit hand-wavy/anecdotal:

  • It seems like simple boolean attributes may not trigger the "is the object still alive?" check (which raises RuntimeError to alert the user when it isn't)
  • So I added windowTitle() as a second check, since it requires an actual C++-to-Python string conversion under the hood

It'd be good to dig into the details one day, but for today I think the double method call is good enough for me...
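
In other words, something like this hypothetical helper (the RuntimeError is what PyQt raises once the wrapped C++ object is gone):

    from qtpy import QtWidgets


    def widget_is_alive(widget: QtWidgets.QWidget) -> bool:
        """Best-effort check that the widget's underlying C++ object still exists."""
        try:
            # Either call should raise RuntimeError if the C++ side is already
            # deleted; windowTitle() forces an actual string conversion under
            # the hood, just to be doubly sure.
            widget.isVisible()
            widget.windowTitle()
        except RuntimeError:
            return False
        return True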

tangkong (Contributor):

I'm totally good with being doubly-sure. Just curious if there was some portion of the widget that doesn't get caught by widget.isVisible() or something

@@ -276,5 +276,12 @@ def test_take_top_level_widget_screenshots(qtbot: pytestqt.qtbot.QtBot):
    widget = QWidget()
    qtbot.addWidget(widget)
    screenshots = list(utils.take_top_level_widget_screenshots(visible_only=False))
tangkong (Contributor):

A potentially dumb question. Why is this generating any QMenus at all? A naive (and honestly idealistic) reading of this test is that we're creating a single QWidget, then taking a screenshot of it. Knowing that widgets aren't really getting cleaned up between tests, I can see why QMenus get involved, but it shouldn't generate any... right?

klauer (Contributor, Author):

I have repeated this dumb question to myself many times over the course of this PR.

I have no answers 😦
That said, there's one thing I didn't try: what if we load up gammaray when this test fails and poke around with it there? It'd be worthwhile, I think.

@klauer (Contributor, Author) commented Sep 8, 2023

🤞

@klauer (Contributor, Author) commented Sep 8, 2023

2 green checkmarks in a row indicate it's probably good enough as-is.
Before this gets any larger, if you guys are good with it, I'd say let's merge. @tangkong @ZLLentz

We may revisit and adjust how strict this is in the future based on how many intermittent failures we get (due to gc/teardown slowness). Increasing the wait time may also suffice...

@ZLLentz (Member) commented Sep 8, 2023

I'll click merge now to make sure it happens

@ZLLentz merged commit 45331c1 into pcdshub:master on Sep 8, 2023
9 checks passed

Successfully merging this pull request may close these issues.

Test Suite Fails Explosively on GHA py3.11
3 participants