explicit/SCCComputerTarjan.tarjan() crashes to stack overflow for larger models #77

ghost · 2018-05-28T16:04:58Z

Hi.
When using the explicit engine to compute steady-state probabilities for this model, stack overflow exception is thrown and the computation fails, while the GUI gets stuck and pretends the computation continues.

dtmc //random walk along a path

const int qCapacity = 50000;

module main

qSize : [0..qCapacity] init 0;

[step] (qSize < qCapacity & qSize != 0) -> 0.5 : (qSize' = qSize+1) + 0.5 : (qSize' = qSize-1);
[step] (qSize = 0) -> (qSize' = qSize+1);
[step] (qSize = qCapacity) -> (qSize' = qSize-1);

endmodule

I did not explore this bug in much depth, nor I understand this algorithm.
But from what I can tell, the reason is as follows.
SCCComputerTarjan attempts to explore all successors of the initial state recursively.
Each time the computation tries to explore a deeper successor, function tarjan(int i) gets called recursively on top of the previous one, which is apparent from the code.

	private void tarjan(int i) throws PrismException
	{
		...
		SuccessorsIterator it = model.getSuccessors(i);
		while (it.hasNext()) {
			...
			if (n.index == -1) {
				tarjan(e); //#THE RECURSIVE CALL##################################
				...
		}
		....
	}

For queue-like models with many states, it can easily happen that the recursion is so deep that stack overflow occurs before all states are explored. Because of this, it has happened to me that similar queue-like models with as few as 5000 states are already too much to handle.

Thanks!

The text was updated successfully, but these errors were encountered:

kleinj · 2018-05-28T18:37:40Z

Hi,

yes, the current implementation of computing the strongly-connected components via Tarjan's algorithm is implemented recursively in the explicit engine. For "deep" models, this can lead to the Java stack getting exhausted. A quick "fix" is to increase the Java stack size, by tweaking the default value in the assignment

PRISM_JAVASTACKSIZE="-Xss4M"

in the bin/prism startup script (which is also called when you run xprism). The default is a 4MB stack size, if you increase that to something like 40 or 100MB, you should be able to handle larger models.

Some time ago I implemented a non-recursive implementation of Tarjan's algorithm, i.e., one that maintains its own computation stack, but that is not yet available in the PRISM master due to pending testing / polishing / benchmarking.

Thanks for remarking on the GUI's behaviour when the computation thread dies, we should make the handling of that error case more robust.

Cheers,
Joachim

kleinj · 2018-07-18T16:47:59Z

Hi,

we just merged some changes to allow setting the Java stack size via e.g. -javastacksize 40m (similar to -javamaxmem) and improved the error handling when a stack overflow occurs (command-line version and GUI). Thanks again for reporting. A non-recursive version of Tarjan's algorithm should be forthcoming shortly as well.

Cheers,
Joachim

ghost · 2018-07-18T17:51:49Z

That's some good news, thanks!

kleinj mentioned this issue Jun 7, 2018

More graceful stack overflow handling #81

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

explicit/SCCComputerTarjan.tarjan() crashes to stack overflow for larger models #77

explicit/SCCComputerTarjan.tarjan() crashes to stack overflow for larger models #77

ghost commented May 28, 2018

kleinj commented May 28, 2018

kleinj commented Jul 18, 2018

ghost commented Jul 18, 2018

explicit/SCCComputerTarjan.tarjan() crashes to stack overflow for larger models #77

explicit/SCCComputerTarjan.tarjan() crashes to stack overflow for larger models #77

Comments

ghost commented May 28, 2018

kleinj commented May 28, 2018

kleinj commented Jul 18, 2018

ghost commented Jul 18, 2018