Crash preceded by weird error messages #66

M4rtinK · 2016-05-16T21:54:23Z

I have been reworking how map tiles are displayed in modRana and managed to crash PyOtherSide in the process:

https://gist.github.com/M4rtinK/b634432063f81beb034ae3c37d1a6e7b

It basically starts spamming error messages about properties that are definitely in scope not being available and then crashes. My hunch is that some sort of race condition is going on, possibly triggered by quite large bursts of QML <-> Python traffic the new tile display method causes.

Steps to reproduce
git clone https://github.com/M4rtinK/modrana
cd modrana
git checkout master-pyotherside_crash
cd run
./qt5.sh

If the crash does not happen at once try panning the map or closing modRana and opening it again - that should trigger the crash.

Additional information
My machine runs Fedora 23 and the same crash happens on both the latest stable (1.4) and on latest git master.

Let me know if you need any more info. :)

thp · 2016-05-17T16:41:23Z

In general, using PyOtherSide should not result in a crash, so you've found a bug. Ideally you could compile both Qt5 and PyOtherSide with debugging support and provide a gdb backtrace of all threads running at the time of the crash, that could be really helpful in tracking down the issue.

Do you use threads on the Python side? Do you use QObject passing from/to Python?

The logs point to this, right? https://github.com/M4rtinK/modrana/blob/master-pyotherside_crash/modules/gui_modules/gui_qt5/qml/PinchMap.qml#L608

Have you tried using modelData.tile_x instead? Also, have you tried just using a "var" property as model, with it being a JavaScript Array object instead of a QML ListModel?

Also, I see this:

rWin.python.setHandler("tileDownloaded:" + pinchmap.name, pinchmap.tileDownloadedCB)

How many pinchmaps are created? Once the handlers are set, they are never removed it seems, which might leak references to theses objects if you dynamically create/destroy the objects (sorry, haven't fully checked the rest of the codebase to track this down).

Looking at cornerX, cornerY, tilesX, tilesY as the parameters list, I wonder if you could make it so that you just have a number as the model and then in the delegate calculate the coordinate/data?

I also don't really get the explanation given in the commit message of M4rtinK/modrana@72276f4 about URL switching causing a texture re-upload being worse than loading a new image -- in both cases, texture data has to be uploaded to the GPU, whether the same texture object is used or whether a new one is created shouldn't matter to the application developer (Qt's Scene Graph renderer should just do the right thing, which might involve two textures used in alternating fashion if stalling the pipeline with sub-texture uploads is an issue on certain hardware).

If reducing CPU->GPU bandwidth is a goal, the best would be to keep as many tiles as possible in memory (even if off-screen) and only start removing tiles once GPU memory is exhausted (but then again, the application developer might not have too much control over this, and the Qt Scene Graph renderer might decide to handle texture allocation in any way it sees fit).

If performance is really an issue, just have one big element (e.g. PyGLArea in recent PyOtherSide versions) and a single texture big enough to hold the maximum amount of tiles displayed at any point in time, then render textured quads (one for each tile) with the right offsets depending on panning, and once tiles fall off the screen, allocate a new sub-tile inside the texture, upload that (which can happen asynchronously) and you save yourself the trouble of doing all the QML model housekeeping (bonus points for updating mipmap levels and having smooth, performant zooming without reloads).

Something inspired by this: http://silverspaceship.com/src/svt/, but you probably don't even need the additional layer of indirection for your specific case.

M4rtinK · 2016-05-18T00:05:04Z

In general, using PyOtherSide should not result in a crash, so you've found a bug. Ideally you could compile both Qt5 and PyOtherSide with debugging support and provide a gdb backtrace of all threads running at the time of the crash, that could be really helpful in tracking down the issue.

I haven't done that before but I can certainly give it a try. :) For the time being this is what I have managed to gather with ABRT - including some sort of backtrace (ABRT can download debug sub-packages for the packages involved in the crash and generate a meaningful backtrace from that):
http://modrana.org/tmp/ccpp-2016-05-17-22:42:19-23092.tar.xz

Do you use threads on the Python side?
Yes, there are threads on the Python side - among others a thread pool that handles til download requests.

Do you use QObject passing from/to Python?
No - I just stick to message passing for now. :) But I guess I'll have to check it out eventually to see if it might help in some scenario in modRana and what performance implications it has.

The logs point to this, right? https://github.com/M4rtinK/modrana/blob/master-pyotherside_crash/modules/gui_modules/gui_qt5/qml/PinchMap.qml#L608

Have you tried using modelData.tile_x instead?
This is how it looks like when using modelData.tile_x & modelData.tile_y (without an existing ~/.modrana directory to improve repeatability):

https://gist.github.com/M4rtinK/2e924f9eb5aa34a64acc72926ee92485

The error message basically just switches to:

ReferenceError: modelData is not defined

There is also a few lines of:

modrana/modules/gui_modules/gui_qt5/qml/Tile.qml:45: TypeError: Cannot write property of null

Which is also pretty fishy as that refers to this:

tile.downloading = true

And downloading is clearly defined for the tile element:

property bool downloading : false

There are also some Python errors but nothing of that looks relevant to the crash. Then after some more ReferenceError: modelData is not defined it all crashes.

Maybe this is some sort of scope missmatch - eq. Python/QML/PyOtherSide trying to call something/access properties in a wrong scope/on wrong pointers/references, etc. ? (Sorry for such hazy description I hope you get the idea. :) )

Also, have you tried just using a "var" property as model, with it being a JavaScript Array object instead of a QML ListModel?

I tried that but unfortunately that didn't work out as JavaScript arrays don't trigger property bindings once they are assigned. I need that so that the tiles repeater dynamically discards tiles that "fell off the screen" when their ListItem is removed from the model and adds tiles that became visible when their ListItem is added to the model. More on how this is supposed to work in my answers below. :)

Also, I see this:

rWin.python.setHandler("tileDownloaded:" + pinchmap.name, pinchmap.tileDownloadedCB)

How many pinchmaps are created? Once the handlers are set, they are never removed it seems, which might leak references to theses objects if you dynamically create/destroy the objects
There should be only one pinch map instance at the moment I plan to have more in the future (minimap, layer/overlay preview, track/route reference, etc.) so that's why the pinchmap.name based "namespacing" was added. Of course in such case I will make sure that handlers are removed when the given pinchmap instance is no longer needed.

(sorry, haven't fully checked the rest of the codebase to track this down).
No problem - the modRana codebase is really humongous in some places after all those years. :)

Looking at cornerX, cornerY, tilesX, tilesY as the parameters list, I wonder if you could make it so that you just have a number as the model and then in the delegate calculate the coordinate/data?
I thought about that but came to the conclusion that it would not work - I really need a unique unchanging identifier that "travels" with the tile as long as it visible on the screen.

I even tried it now, just to make sure I haven't missed something:

property int tileX: pinchmap.cornerTileX + index % pinchmap.numTilesX
property int tileY: pinchmap.cornerTileY + index % pinchmap.numTilesY

It does work in the sense that it reliably covers the screen with tile delagates, but results in the delegates changing their coordinates when cornerTileX/Y or numTilesX/Y changes, which of course happens all the time (panning, re-centering, screen rotation/window resize, etc.). This is also basically how it worked before - static number of tile delegates "changing roles" as the viewport moved.

In contrast to that I need delegates that never change their coordinates once instantiated (its a bit more complicated due to zooming but lets leave that out for now ;-) ).

I also don't really get the explanation given in the commit message of M4rtinK/modrana@72276f4 about URL switching causing a texture re-upload being worse than loading a new image -- in both cases, texture data has to be uploaded to the GPU, whether the same texture object is used or whether a new one is created shouldn't matter to the application developer (Qt's Scene Graph renderer should just do the right thing, which might involve two textures used in alternating fashion if stalling the pipeline with sub-texture uploads is an issue on certain hardware).

Indeed - you of course need to upload the texture for an Image element to the GPU at least once:

Image {
    id : "image1"
    source : "image://myimageprovider/foo.png"
}
Image {
    id : "image2"
    source : "image://myimageprovider/bar.png"
}

This will result in the results of "image://myimageprovider/foo.png" and "image://myimageprovider/foo.png" being both cached in the QtQuick image cache in RAM & uploaded to the GPU.

But from what I have understood from my IRC discussion with w00t is that when you do this:

image1.source = "image://myimageprovider/bar.png"
image2.source = "image://myimageprovider/foo.png"

The provider will not be queried again (as the both images are already in the QtQuick image cache in RAM) but both textures will be uploaded again to the GPU. If it really is like that & given how many tiles might be on the screen in modRana (+possibly multiple overlays) I would really like to avoid such overhead if possible.

Also if the cache is flushed (which QML application can't really influence) each URL switch will query the provider + the GPU upload mentioned above. Again given the huge number of tiles cache flushes are likely & do happen (and manifest as map flickering).

Also, to further confuse thing but hopefully explain my main motive I'll try to explain how tile display worked in modRana before & how it is supposed to work now:

Before

A static number of tiles that switch coordinates and this source urls. Lets say a 2x2 grid covers the screen:

tile_1,url="0/0"|tile_2,url="1/0"
tile_3,url="0/1"|tile_4,url="1/1"

When the map is panned to the left:

tile_1,url="1/0"|tile_2,url="2/0"
tile_3,url="1/1"|tile_4,url="2/1"

the "0/0" and "0/1" urls are no longer visible
the "1/0" url moves from tile_2 to tile_1 and the "1/1" url moves from tile_4 to tile_3
- thanks to the image cache this does not trigger a query to provider as it should be served by the cache
- might trigger a GPU reupload according to w00t
- will trigger a query to provider if the image cache was flushed in the meantime - this makes the map flicker during panning
"2/0" and "2/1" urls are set for the tile_2 and tile tile_4, the respective images are loaded from provider and cached

Rinse and repeat for another pan, re-centering, zoom in-out, screen resize, etc. :) BTW, it took me some time to actually understand how AGTL did it and I still find it pretty clever - it just has some unfortunate side effects. :)

The new method

A data model track what tiles should be visible on the screen at the given moment and is updated accordingly when the map state changes (panning, re-centering, screen rotation/window size change). Lets say 4 tiles that again 4 tiles can cover the screen like this:

tile_0/0,url="0/0"|tile_1/0,url="1/0"
tile_0/1,url="0/1"|tile_1/1,url="1/1"

Again, the map is panned to the left:

tile_1/0,url="1/0"|tile_2/0,url="2/0"
tile_1/1,url="1/1"|tile_2/1,url="2/1"

tile_0/0 and tile_0/1 "fall of the screen", so their ListItems are removed from the model and their delegates are destroyed by the Repeater
tile_1/0 and tile_1/1 simply move to the left
ListItems are added to the model for tile_2/0 and tile_2/1, their delegates are created by the Repeater and the images are loaded from the provider

Rinse and repeat for another pan, re-centering, zoom in-out, screen resize, etc.

The main benefit is that modRana no longer needs to care about the QtQuick image cache - panning is just moving the Image elements around and will not flicker even if the cache is flushed as images are not reloaded during panning. It might actually make sense to not use the QtQuick image cache at all - modRana can easily cache the tile data on the Python side if needed and unlike with the QtQuick image cache, it has full control of the cache in this case. It can tune the cache size, flush by LRU, drop the cache if memory pressure is high, avoid caching of dynamic layers, etc.

Also another benefit of this method was already mentioned - as long as the tile images are just moved around and not reloaded needless GPU uploads are avoided.

I hope this wall of text makes some sense. ;-)

If reducing CPU->GPU bandwidth is a goal, the best would be to keep as many tiles as possible in memory (even if off-screen) and only start removing tiles once GPU memory is exhausted (but then again, the application developer might not have too much control over this, and the Qt Scene Graph renderer might decide to handle texture allocation in any way it sees fit).

If performance is really an issue, just have one big element (e.g. PyGLArea in recent PyOtherSide versions) and a single texture big enough to hold the maximum amount of tiles displayed at any point in time, then render textured quads (one for each tile) with the right offsets depending on panning, and once tiles fall off the screen, allocate a new sub-tile inside the texture, upload that (which can happen asynchronously) and you save yourself the trouble of doing all the QML model housekeeping (bonus points for updating mipmap levels and having smooth, performant zooming without reloads).
I think this is quite close to the new method of tile display I am trying to achieve (and I guess the Qt/QML internals might be doing something similar to the texture mapping tricks you describe) - of course minus the QML data model management madness. :)

This might indeed be they way to go eventually, not only to avoid the general object juggling overhead but possibly also to implement advanced map display features (map tilt, advanced animations, 3D elements/3D buildings, etc.). But I would need to learn more about OpenGL before that and generally want to have something working soon so I can finally get a modRana release out of the door. :)

BTW, another option might be to use Canvas sized to cover the screen and using the loadImage() method to asynchronously load tile images. Once rendered the canvas could just be panned until some tiles would "fall off" and some would come into view - that would trigger unloadImage() on the "fallen" tiles and "loadImage()" on the new tiles. Then the the canvas would be panned back and re-rendered.

I think that is also quite close to you suggestion - isn't canvas rendering in this case also just blitting the textures on given coordinates ?

Something inspired by this: http://silverspaceship.com/src/svt/, but you probably don't even need the additional layer of indirection for your specific case.

PS.: Sorry for such a long reply that in the end mostly contains information not directly related to the crash. ;-)

M4rtinK · 2016-05-22T12:19:04Z

I might possible have another data point - looks like the weird broken-scope messages "PinchMap.qml:609:37: Unable to assign [undefined] to int" can show up even without a crash provided that modRana does not try to actually display tile images:

https://github.com/M4rtinK/modrana/tree/master-pyotherside_weird_messages

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Crash preceded by weird error messages #66

Crash preceded by weird error messages #66

M4rtinK commented May 16, 2016

thp commented May 17, 2016

M4rtinK commented May 18, 2016

M4rtinK commented May 22, 2016

Crash preceded by weird error messages #66

Crash preceded by weird error messages #66

Comments

M4rtinK commented May 16, 2016

thp commented May 17, 2016

M4rtinK commented May 18, 2016

M4rtinK commented May 22, 2016