You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I would like to ask a question:
The binary search tree (BST) in which flows are stored is not balanced.
The complexity analysis of BST shows that, on average, the insert, delete and search takes O(log n) for n nodes. But in the worst case, they degrade to that of a singly linked list: O(n).
When processing each packet, 3 search operation in the tree are performed (2 calls to ndpi_tfind and 1 call to ndpi_tsearch).
This can cause large delays if the number of flows is large (> 1 million) and the tree is degenerate into a list.
The question is: Why don't you use a balanced tree like a red-black tree where the worst case search cost is O(log n)?
Thank you for any hint
The text was updated successfully, but these errors were encountered:
You analysis is correct.
However there is a missing point: nDPI (the library) doesn't do any flow management; this (important and critical) topic is left to the application. I am sure that any serious applications linking to nDPI is using a proper and efficient data structure for keeping flows information, tailored to the specific application needs. ndpiReader is only a basic example, aiming to show nDPI capabilities and how to use nDPI API: performance is not its goal.
I am not sure because I was not involved with the project at the time, but it is quite likely that ndpiReader is using a simple tree only because that data structure was easily available at the time, or it was easy to implement.
Having said that, any effort from the community to improve (also) ndpiReader performance is welcomed :)
Hello guys,
I would like to ask a question:
The binary search tree (BST) in which flows are stored is not balanced.
The complexity analysis of BST shows that, on average, the insert, delete and search takes O(log n) for n nodes. But in the worst case, they degrade to that of a singly linked list: O(n).
When processing each packet, 3 search operation in the tree are performed (2 calls to ndpi_tfind and 1 call to ndpi_tsearch).
This can cause large delays if the number of flows is large (> 1 million) and the tree is degenerate into a list.
The question is: Why don't you use a balanced tree like a red-black tree where the worst case search cost is O(log n)?
Thank you for any hint
The text was updated successfully, but these errors were encountered: