-
Yes, of course! Please! (Sorry for the late response. I, uhh, took some time off)
-
I'm noticing that some words have very few disjuncts, while others have many. Perhaps parsing would be more efficient if it started with the word that has the fewest disjuncts, rather than starting at the left wall?
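To illustrate the selection step I have in mind, here is a minimal sketch in C. The types and names are hypothetical stand-ins, not the actual LG structures or API:

```c
/* Hypothetical types; the real LG Word/Disjunct structures differ. */
#include <stddef.h>

typedef struct Disjunct_s Disjunct;
struct Disjunct_s
{
    Disjunct *next;   /* linked list of disjuncts on a word */
};

typedef struct
{
    Disjunct *disjuncts;
} Word;

/* Count the disjuncts hanging off one word. */
static size_t num_disjuncts(const Word *w)
{
    size_t n = 0;
    for (const Disjunct *d = w->disjuncts; d != NULL; d = d->next)
        n++;
    return n;
}

/* Return the index of the word with the fewest disjuncts -
 * the proposed starting point instead of the left wall.
 * Assumes nwords >= 1. */
static size_t fewest_disjuncts_word(const Word *words, size_t nwords)
{
    size_t best = 0;
    size_t best_n = num_disjuncts(&words[0]);
    for (size_t i = 1; i < nwords; i++)
    {
        size_t n = num_disjuncts(&words[i]);
        if (n < best_n)
        {
            best_n = n;
            best = i;
        }
    }
    return best;
}
```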
-
Here's how I envision testing your suggestion of starting with the words that have the fewest disjuncts. The current algorithm begins by parsing a range of words that encompasses the entire sentence. It splits the given range into two smaller ranges and then recursively parses each one, always starting with the LHS; if that side is parseable, it proceeds to parse the RHS (I'm ignoring the optimizations in this description). In the LG paper (I can't recall if it was the first or the second one), the authors mentioned that they tried parsing the RHS first but found it didn't, on average, speed things up.

Incorporating your idea, I suggest we first determine the average number of disjuncts per word in each range and start with the range that has the lower average, on the assumption that a smaller number of disjuncts per word is quicker to handle. This would postpone dealing with the words that have the most disjuncts for as long as possible (though I'm not entirely sure), and it doesn't seem too difficult to implement; a sketch follows below.

However, this approach conflicts with a simpler implementation of the ideas I'm currently exploring, which involves viewing the LHS and RHS jets as a trie of two components. My other concept – to leverage "leacons" in addition to and independently from the current "tracons" – also becomes more complex if we don't always start with a specific side. But in principle, there is no reason to avoid combining all the methods that accelerate parsing.
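To make the ordering idea concrete, here is a minimal sketch. Every name is hypothetical; `range_has_parse()` is just a stub standing in for the real recursive counting (what `do_count()` does), so treat this as pseudocode in C form:

```c
/* All names here are hypothetical; range_has_parse() is a stub
 * standing in for the real recursive counting pass. */
#include <stdbool.h>
#include <stddef.h>

typedef struct Disjunct_s Disjunct;
struct Disjunct_s { Disjunct *next; };
typedef struct { Disjunct *disjuncts; } Word;

/* Placeholder: the real code would recurse into the counting pass. */
static bool range_has_parse(const Word *w, int lw, int rw)
{
    (void)w; (void)lw; (void)rw;
    return true;
}

/* Average number of disjuncts per word over [lw, rw]. */
static double avg_disjuncts(const Word *w, int lw, int rw)
{
    size_t total = 0;
    for (int i = lw; i <= rw; i++)
        for (const Disjunct *d = w[i].disjuncts; d != NULL; d = d->next)
            total++;
    return (double)total / (double)(rw - lw + 1);
}

/* Split [lw, rw] at word m and recurse into the side with the
 * lower average disjunct count first, so that a failure on the
 * cheap side skips the expensive side entirely for this split. */
static bool parse_split(const Word *w, int lw, int rw, int m)
{
    if (avg_disjuncts(w, lw, m) <= avg_disjuncts(w, m, rw))
        return range_has_parse(w, lw, m) && range_has_parse(w, m, rw);

    return range_has_parse(w, m, rw) && range_has_parse(w, lw, m);
}
```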
-
I don't have further fixes to the current changes, so no reason to delay the new version.
-
Hello @linas,
I recently revisited the LG project and was surprised to discover several opportunities for significant speed enhancements. For instance, implementing a new data structure for match lists and adapting the counting to leverage it could potentially result in an order of magnitude increase in parsing speed—though the true extent of the gains will only be evident after implementation. Additionally, I've identified areas where caching could be introduced or existing caches optimized.
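To illustrate the kind of caching I mean, here is a deliberately generic sketch: a chained hash table keyed by the word and the ids of the two connectors being matched, caching the match list computed for that combination. Every name in it is hypothetical, and the real data structure would have to fit into the existing counting code:

```c
/* Purely illustrative; all names are made up for this sketch. */
#include <stdint.h>
#include <stddef.h>

typedef struct Disjunct_s Disjunct;
struct Disjunct_s { Disjunct *next; };

typedef struct Mcache_entry_s
{
    uint64_t key;                    /* packed (word, le_id, re_id) */
    Disjunct **mlist;                /* the cached match list */
    size_t mlen;
    struct Mcache_entry_s *next;     /* hash-bucket chain */
} Mcache_entry;

#define MCACHE_BUCKETS 4096
static Mcache_entry *mcache[MCACHE_BUCKETS];

/* Pack the lookup key; assumes the ids fit in 20 bits each. */
static uint64_t mcache_key(unsigned word, unsigned le_id, unsigned re_id)
{
    return ((uint64_t)word << 40) | ((uint64_t)le_id << 20) | (uint64_t)re_id;
}

/* Return the cached entry for this key, or NULL on a miss
 * (the caller would then build the match list and insert it). */
static Mcache_entry *mcache_lookup(uint64_t key)
{
    for (Mcache_entry *e = mcache[key % MCACHE_BUCKETS]; e != NULL; e = e->next)
        if (e->key == key) return e;
    return NULL;
}
```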
Would you be able to review my pull requests if I start submitting them? For context, one of the initial PRs aims to significantly reduce the number of `Parse_set` elements. It addresses the issue where `mk_parse_set()` generates unconnected elements in the absence of a complete parse for a word range, a flaw that was straightforward to rectify (a rough sketch of the guard appears below). This not only conserves memory but also reduces CPU usage. I implemented a similar fix in `do_count()`. In addition, I have some old PRs to send.
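Here is roughly what the guard looks like. The names are placeholders standing in for the real count-table lookup and set builder, not the actual functions:

```c
/* Hypothetical names throughout. The point is only the guard:
 * consult the count before creating any Parse_set element. */
#include <stdbool.h>
#include <stddef.h>

typedef struct Parse_set_s Parse_set;

/* Placeholder for the real count-table lookup. */
static bool range_count_is_nonzero(int lw, int rw)
{
    (void)lw; (void)rw;
    return true;
}

/* Placeholder for the real recursive set builder. */
static Parse_set *build_parse_set(int lw, int rw)
{
    (void)lw; (void)rw;
    return NULL;
}

/* Create Parse_set elements for [lw, rw] only when the range has a
 * complete parse; otherwise create nothing, so no unconnected
 * elements are ever allocated. */
static Parse_set *mk_parse_set_guarded(int lw, int rw)
{
    if (!range_count_is_nonzero(lw, rw))
        return NULL;
    return build_parse_set(lw, rw);
}
```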