-
Yes, of course! Please! (Sorry for the late response. I, uhh, took some time off)
-
I'm noticing that some words have very few disjuncts, while others have many. Perhaps parsing would be more efficient if it started with the word that has the fewest disjuncts, rather than starting at the left wall?
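To illustrate the selection step I have in mind, here is a minimal sketch in C. The types and names are hypothetical stand-ins, not the actual LG structures or API:

```c
/* Hypothetical types; the real LG Word/Disjunct structures differ. */
#include <stddef.h>

typedef struct Disjunct_s Disjunct;
struct Disjunct_s
{
    Disjunct *next;   /* linked list of disjuncts on a word */
};

typedef struct
{
    Disjunct *disjuncts;
} Word;

/* Count the disjuncts hanging off one word. */
static size_t num_disjuncts(const Word *w)
{
    size_t n = 0;
    for (const Disjunct *d = w->disjuncts; d != NULL; d = d->next)
        n++;
    return n;
}

/* Return the index of the word with the fewest disjuncts -
 * the proposed starting point instead of the left wall.
 * Assumes nwords >= 1. */
static size_t fewest_disjuncts_word(const Word *words, size_t nwords)
{
    size_t best = 0;
    size_t best_n = num_disjuncts(&words[0]);
    for (size_t i = 1; i < nwords; i++)
    {
        size_t n = num_disjuncts(&words[i]);
        if (n < best_n)
        {
            best_n = n;
            best = i;
        }
    }
    return best;
}
```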
-
Here's how I envision testing your suggestion of starting with the words that have the fewest disjuncts. The current algorithm begins by parsing a range of words that encompasses the entire sentence. It splits the given range into two smaller ranges and then recursively parses each one, always starting with the LHS; if that side is parseable, it proceeds to parse the RHS (I'm ignoring the optimizations in this description). In the LG paper (I can't recall if it was the first or the second one), the authors mentioned that they tried parsing the RHS first but found it didn't, on average, speed things up.

Incorporating your idea, I suggest we first determine the average number of disjuncts per word in each range and start with the range that has the lower average, on the assumption that a smaller number of disjuncts per word is quicker to handle. This would postpone dealing with the words that have the most disjuncts for as long as possible (though I'm not entirely sure), and it doesn't seem too difficult to implement; a sketch follows below.

However, this approach conflicts with a simpler implementation of the ideas I'm currently exploring, which involves viewing the LHS and RHS jets as a trie of two components. My other concept – to leverage "leacons" in addition to and independently from the current "tracons" – also becomes more complex if we don't always start with a specific side. But in principle, there is no reason to avoid combining all the methods that accelerate parsing.
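To make the ordering idea concrete, here is a minimal sketch. Every name is hypothetical; `range_has_parse()` is just a stub standing in for the real recursive counting (what `do_count()` does), so treat this as pseudocode in C form:

```c
/* All names here are hypothetical; range_has_parse() is a stub
 * standing in for the real recursive counting pass. */
#include <stdbool.h>
#include <stddef.h>

typedef struct Disjunct_s Disjunct;
struct Disjunct_s { Disjunct *next; };
typedef struct { Disjunct *disjuncts; } Word;

/* Placeholder: the real code would recurse into the counting pass. */
static bool range_has_parse(const Word *w, int lw, int rw)
{
    (void)w; (void)lw; (void)rw;
    return true;
}

/* Average number of disjuncts per word over [lw, rw]. */
static double avg_disjuncts(const Word *w, int lw, int rw)
{
    size_t total = 0;
    for (int i = lw; i <= rw; i++)
        for (const Disjunct *d = w[i].disjuncts; d != NULL; d = d->next)
            total++;
    return (double)total / (double)(rw - lw + 1);
}

/* Split [lw, rw] at word m and recurse into the side with the
 * lower average disjunct count first, so that a failure on the
 * cheap side skips the expensive side entirely for this split. */
static bool parse_split(const Word *w, int lw, int rw, int m)
{
    if (avg_disjuncts(w, lw, m) <= avg_disjuncts(w, m, rw))
        return range_has_parse(w, lw, m) && range_has_parse(w, m, rw);

    return range_has_parse(w, m, rw) && range_has_parse(w, lw, m);
}
```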
-
I don't have further fixes to the current changes, so no reason to delay the new version.
-
Hello @linas,
I recently revisited the LG project and was surprised to discover several opportunities for significant speed enhancements. For instance, implementing a new data structure for match lists and adapting the counting to leverage it could potentially result in an order of magnitude increase in parsing speed—though the true extent of the gains will only be evident after implementation. Additionally, I've identified areas where caching could be introduced or existing caches optimized.
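To illustrate the kind of caching I mean, here is a deliberately generic sketch: a chained hash table keyed by the word and the ids of the two connectors being matched, caching the match list computed for that combination. Every name in it is hypothetical, and the real data structure would have to fit into the existing counting code:

```c
/* Purely illustrative; all names are made up for this sketch. */
#include <stdint.h>
#include <stddef.h>

typedef struct Disjunct_s Disjunct;
struct Disjunct_s { Disjunct *next; };

typedef struct Mcache_entry_s
{
    uint64_t key;                    /* packed (word, le_id, re_id) */
    Disjunct **mlist;                /* the cached match list */
    size_t mlen;
    struct Mcache_entry_s *next;     /* hash-bucket chain */
} Mcache_entry;

#define MCACHE_BUCKETS 4096
static Mcache_entry *mcache[MCACHE_BUCKETS];

/* Pack the lookup key; assumes the ids fit in 20 bits each. */
static uint64_t mcache_key(unsigned word, unsigned le_id, unsigned re_id)
{
    return ((uint64_t)word << 40) | ((uint64_t)le_id << 20) | (uint64_t)re_id;
}

/* Return the cached entry for this key, or NULL on a miss
 * (the caller would then build the match list and insert it). */
static Mcache_entry *mcache_lookup(uint64_t key)
{
    for (Mcache_entry *e = mcache[key % MCACHE_BUCKETS]; e != NULL; e = e->next)
        if (e->key == key) return e;
    return NULL;
}
```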
Would you be able to review my pull requests if I start submitting them? For context, one of the initial PRs aims to significantly reduce the number of `Parse_set` elements. It addresses the issue where `mk_parse_set()` generates unconnected elements in the absence of a complete parse for a word range, a flaw that was straightforward to rectify (a rough sketch of the guard appears below). This not only conserves memory but also reduces CPU usage. I implemented a similar fix in `do_count()`. In addition, I have some old PRs to send.
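Here is roughly what the guard looks like. The names are placeholders standing in for the real count-table lookup and set builder, not the actual functions:

```c
/* Hypothetical names throughout. The point is only the guard:
 * consult the count before creating any Parse_set element. */
#include <stdbool.h>
#include <stddef.h>

typedef struct Parse_set_s Parse_set;

/* Placeholder for the real count-table lookup. */
static bool range_count_is_nonzero(int lw, int rw)
{
    (void)lw; (void)rw;
    return true;
}

/* Placeholder for the real recursive set builder. */
static Parse_set *build_parse_set(int lw, int rw)
{
    (void)lw; (void)rw;
    return NULL;
}

/* Create Parse_set elements for [lw, rw] only when the range has a
 * complete parse; otherwise create nothing, so no unconnected
 * elements are ever allocated. */
static Parse_set *mk_parse_set_guarded(int lw, int rw)
{
    if (!range_count_is_nonzero(lw, rw))
        return NULL;
    return build_parse_set(lw, rw);
}
```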