REF/FIX: Fix Python 3.11 support and test suite-focused cleanup #573
Conversation
Well it failed spectacularly on GitHub Actions, whereas it passed locally, so 🤷
OK, this is a bit of a mess and there are a ton of changes. I did this over a long period of time with (sometimes) limited attention being paid to what I was doing. Each change should be really scrutinized!
if final_widgets:
    if all(isinstance(widget, QtWidgets.QMenu) for widget in final_widgets):
        logger.error("%s: Top level QMenu widgets were not cleaned up. Not failing the test suite.", item.nodeid)
This actually only happens in one test: test_take_top_level_widget_screenshots. It doesn't make sense to me why this is the case.
Still, this escape hatch is what is allowing the test suite to pass for all versions of Python.
I'm very 👍 on this
I probably left a bunch of questions as I went through this
except Exception as ex:
    display: Optional[pydm.Display] = getattr(ex, "pydm_display", None)
    if display is not None:
        display.setObjectName("_typhos_test_suite_ignore_")
What's the functional effect of renaming these display objects after a failed load?
Functionally, setting the object name for a pydm.Display does nothing - it's a bit of a magic string for the test suite, and for us when looking in with debug tools, is all.
The context of this block is:
- The new template fails to load
- typhos notices that and tries to reload the prior template
In typhos master:
- A pydm.Display is left dangling as a top-level widget and TyphosDeviceDisplay never even sees a reference to it
In this PR:
- That pydm.Display is seen by TyphosDeviceDisplay and intercepted as a failed load
- We mark it for deletion and wave goodbye
- The test suite still picks it up because it's a top-level widget and somehow doesn't get destroyed in time for it to be happy (see below)
- The test suite sees the "magic" object name and says "OK, this is known and expected; let's not fail the test"
Now, to be honest, the marked step (the widget somehow not being destroyed in time) is fishy, and it'd be nice to understand why that's happening. Is there a reference we're missing? It's worth investigating at some point.
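For reference, a minimal sketch of how the test-suite side of this escape hatch can consume the magic object name (the helper name here is hypothetical; the real conftest logic may differ):

# Hypothetical helper mirroring the magic-string filtering described above.
_IGNORED_OBJECT_NAME = "_typhos_test_suite_ignore_"

def filter_expected_leftovers(widgets):
    # Drop widgets that typhos already marked as known/expected leftovers.
    return [
        widget for widget in widgets
        if widget.objectName() != _IGNORED_OBJECT_NAME
    ]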
"hinted": True, | ||
"config": True, | ||
"omitted": True, | ||
} |
Feels bad, but this is definitely unlikely to ever change at this point, so it's probably totally fine; there doesn't seem to be a more appropriate, direct way to inspect these now.
As far as I can tell, the only way to get the names programmatically is to do something similar to this ugly mess:

for val in range(8):
    print(val, Kind(val).name)

0 omitted
1 normal
2 config
3 normal|config
4 None
5 hinted
6 config|4
7 normal|config|hinted

But there's actually no way to know how many bits the bitflag is supposed to have, so there's no way to know when to stop. len(Kind) returns 2, which is just "there are names for slots 1 and 2". Wild.
Ref: https://docs.python.org/3/whatsnew/3.11.html#enum
"Changed Flag to only consider primary values (power of two) canonical while composite values (3, 6, 10, etc.) are considered aliases; inverted flags are coerced to their positive equivalent."
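To make the impact concrete, a small demonstration of that 3.11 change as it applies to ophyd's Kind (whose values are omitted=0, normal=1, config=2, hinted=5, so hinted is a composite):

from ophyd import Kind

# Under Python 3.11, iteration only yields "canonical" (power-of-two) members:
print(list(Kind))  # [<Kind.normal: 1>, <Kind.config: 2>]
print(len(Kind))   # 2 -- only normal and config count as canonical members;
                   # omitted (0) is skipped and hinted (5) is now an alias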
dir(Kind) / inspect.getmembers / ... yeah, for something simple like this, I think typing it out is sadly the best option.
Do we rely on this iterability in our other code bases, I wonder?
Neither of those standard functions actually gives you the results, which is wild to me:

>>> 'omitted' in dir(Kind)
False
>>> Kind.omitted
<Kind.omitted: 0>

(I forgot to install ipython in my py311 test branch)
app = QtWidgets.QApplication.instance()
if app is None:
    return []
return [weakref.ref(widget) for widget in app.topLevelWidgets()]
You know a PR is going to be good when we bust out the weakref
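To spell out why weakrefs fit here (a sketch under assumed names, not the PR's exact code): holding only weak references lets the suite later check which widgets survived garbage collection, without the bookkeeping itself keeping them alive.

import gc
import weakref

from qtpy import QtWidgets

app = QtWidgets.QApplication.instance()  # assumed to exist at this point
refs = [weakref.ref(widget) for widget in app.topLevelWidgets()]  # no strong refs held
gc.collect()  # give Python a chance to destroy unreferenced widgets
final_widgets = [ref() for ref in refs if ref() is not None]  # only the survivors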
return res

@pytest.hookimpl(hookwrapper=True, trylast=True)
Is there any functional difference for this being implemented as a pytest hook wrapper vs an always-include fixture?
I tried this as a fixture first without success.
For one, the teardown order of all the fixtures (especially qtbot) is important.
Most importantly, however, a fixture that fails at teardown causes the next test to fail (!). This was really surprising to me.
Does a test failing at teardown make all the subsequent tests fail? That is, is it:
- test passes (fail at teardown) -> test fails -> test passes
or
- test passes (fail at teardown) -> test fails -> test fails ...
Fixtures failing at teardown cause only the subsequent test to fail
For the record:
- The reason why this PR doesn't include a fixture to do the same check is due to the above: fixture teardown exceptions don't affect the current test but the (single) subsequent test
- The pytest hook as implemented in this PR functions as we would hope: the assertion in this hook applies to the active test and does not interfere with the subsequent test (see the sketch below)
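A minimal sketch of that hookwrapper pattern (the _widget_refs attribute is an assumption for illustration, not the PR's actual bookkeeping):

import pytest

@pytest.hookimpl(hookwrapper=True, trylast=True)
def pytest_runtest_call(item):
    yield  # run the test body first
    # An assertion failure raised here is attributed to the *current* test,
    # unlike a fixture teardown error, which bleeds into the (single) next test.
    leftovers = [ref() for ref in getattr(item, "_widget_refs", []) if ref() is not None]
    assert not leftovers, f"{item.nodeid}: top-level widgets were not cleaned up"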
logger.error(failure_text)

try:
    assert not final_widgets, failure_text
I like the methodology you used here to track these issues down
I notice that you left some of the cleanup for another day; I think that's completely appropriate. You've cleaned up almost all of it already, and the last QMenus seem to be lurking from upstream code.
The results speak for themselves; "cleanup" has never belonged more in a PR title.
Morbid curiosity prompts me to look more closely at the test failures, but that shouldn't really hold this up, particularly since we're trying to tag in the near future.
typhos/benchmark/cases.py (outdated)
start_ioc: bool,
full_test_name: str,
auto_exit=True,
request=None,
As someone looking at this benchmark code for the first time I'd love a type hint here
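For illustration, a hypothetical version of the signature with hints (the function name and exact types are assumptions):

from typing import Optional

import pytest

def run_benchmarks(
    start_ioc: bool,
    full_test_name: str,
    auto_exit: bool = True,
    request: Optional[pytest.FixtureRequest] = None,
) -> None:
    ...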
typhos/benchmark/ioc.py (outdated)
@@ -0,0 +1,78 @@
"""
Helpful functions that don't belong in a more specific submodule.
It's probably possible to describe this module more accurately, right? Now that it's mostly IOC-related things.
@@ -94,6 +95,10 @@ def __init__(self, channel, address, protocol=None, parent=None):
        # Add listener
        self.add_listener(channel)

    def __dtor__(self) -> None:
Google searches fail me when trying to find this; is there a docs link you could share?
Sadly, no... this is a weird (undocumented?) internal part of sip: right before the C++ object gets deleted, this Python method gets called. See: https://github.com/search?q=__dtor__+language%3APython&type=code
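As a sketch of the mechanism (the class name is illustrative; this is PyQt/sip-specific behavior, not something PySide provides):

from qtpy import QtCore

class ListenerShim(QtCore.QObject):
    def __dtor__(self) -> None:
        # sip calls this just before the underlying C++ object is destroyed,
        # giving us a final chance to tear down connections cleanly.
        print("C++ object about to be deleted for:", self)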
try:
    widget.isVisible()
    widget.windowTitle()
Why is calling both of these necessary?
I think calling one may be sufficient, and my reasoning here is a bit hand-wavy/anecdotal:
- It seems like simple boolean attributes may not trigger the "is the object still alive? if not, raise RuntimeError to alert the user" check
- So I added windowTitle() as a second one, since it requires an actual string to Python string conversion under the hood
It'd be good to dig into the details one day, but for today I think the double method call is good enough for me...
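Taken together, the probe amounts to something like this sketch (the helper name is an assumption): PyQt raises RuntimeError when a method is called on a wrapper whose C++ object is already gone, so the calls double as a liveness check.

from qtpy import QtWidgets

def widget_is_alive(widget: QtWidgets.QWidget) -> bool:
    try:
        widget.isVisible()    # a cheap probe that may not touch the C++ side
        widget.windowTitle()  # forces a C++ QString -> Python str conversion
    except RuntimeError:
        # "wrapped C/C++ object of type ... has been deleted"
        return False
    return True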
I'm totally good with being doubly-sure. Just curious if there was some portion of the widget that doesn't get caught by widget.isVisible() or something.
@@ -276,5 +276,12 @@ def test_take_top_level_widget_screenshots(qtbot: pytestqt.qtbot.QtBot):
    widget = QWidget()
    qtbot.addWidget(widget)
    screenshots = list(utils.take_top_level_widget_screenshots(visible_only=False))
A potentially dumb question: why is this generating any QMenus at all? A naive (and honestly idealistic) reading of this test is that we're creating a single QWidget, then taking a screenshot of it. Knowing that widgets aren't really getting cleaned up between tests, I can see why QMenus get involved, but it shouldn't generate any... right?
I have repeated this dumb question to myself many times over the course of this PR.
I have no answers 😦
That said, there's one thing I didn't try: what if we load up GammaRay when this test fails and poke around with it there? It'd be worthwhile, I think.
🤞
Two green checkmarks in a row indicate it's probably good enough as-is. We may revisit and adjust how strict this is in the future, based on how many intermittent failures we get (due to gc/teardown slowness). Increasing the wait time may also suffice...
I'll click merge now to make sure it happens
Description
- ophyd.Kind, an enum.Enum (specifically an IntFlag), is a bit of a special beast. Its functionality in Python 3.11 differs in enumeration of items, resulting in typhos only picking up a couple of component kinds.
- Using multiprocessing to run a caproto IOC ended up being problematic for Python 3.11, so I restructured the test suite a bit to run IOCs by name (simply regenerating the test cases in the parent process and in the IOC).
- load_ui_file was modified so we can always get our Display instance back and clean it up if desirable.
- SignalConnection.remove_connection
Motivation and Context
Python 3.11 support would be nice eventually.
This PR manifested itself like this:
- Move away from multiprocessing and isolate the test IOC in its own script
- Chase down leftover top-level widgets (test_take_top_level_widget_screenshots sees hundreds of QMenu instances at the top level)
Overall, I think this is in a better spot than before. It's absolutely not perfect, though.
How Has This Been Tested?
Locally and in the test suite
Where Has This Been Documented?
Screenshots (if appropriate):