-
Notifications
You must be signed in to change notification settings - Fork 466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve concurrency issues #834
base: master
Are you sure you want to change the base?
Conversation
…/languages concurrently change `settings.DATE_ORDER`
Codecov Report
@@ Coverage Diff @@
## master #834 +/- ##
=======================================
Coverage 98.37% 98.38%
=======================================
Files 231 231
Lines 2591 2595 +4
=======================================
+ Hits 2549 2553 +4
Misses 42 42
Continue to review full report at Codecov.
|
After ~1h playing around with the changes, I think I’m starting to get a grasp of some of it. At the moment there are 2 questions in my mind:
|
|
if settings: | ||
self._updateall(settings.items()) | ||
else: | ||
def __init__(self, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gallaecio @alex-triphonov
would it be possible to keep the settings=None
and, in case it's used, raise a deprecation warning and preserve the old behavior?
Something like:
def __init__(self, settings=None, **kwargs):
if settings:
# raise warning (deprecation warning, not concurrent) + old behavior
else:
# new behavior using kwargs
I know the code won't be nice, but I think in that way we could keep the backward compatibility until the next major version.
(You should do the same for get_key()
and maybe something similar in the other two methods you touched).
Maybe I'm saying something stupid, sorry guys, I didn't have too much time to play with the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this is not possible- settings are still being used, but in kwargs for more convenience, the only way to separate behaviours is to add some new kwarg like new_settings
but is old behaviour with fewer Settings
instances that much essential?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it’s mostly a matter of API. The current change would require to postpone this to dateparser 2.x, as it is backward incompatible. If we can figure a backward compatible implementation, this can go on the next dateparser version.
@alex-triphonov you did a good job here 💪, but as mentioned before we can't merge it as-is. The point is not "how essential it is", the issue here is that when declaring a public class or function (in this case However, fixing multithreading is a must and a priority, so I will check it further to see if I find a way to support both "settings" versions to avoid a really premature thanks! |
Ok I get it. So in order to add backward incompatibility, we can add a new flag to settings, like
This is relatively easy to implement, but then everyone who doesn't care about concurrency |
Sorry, I think I would need to add some tests to fully understand if these changes related to settings creation are breaking changes or not. But for the Couldn't be possible to do something like this?: def replace(self, mod_settings=None, **kwds):
if any(kwarg for kwarg in kwds if kwarg not in ('settings')):
# warning + old behavior
else:
# new behavior |
No I don't think we can do it this way because we have kwargs besides |
Does it mean adding different behaviour for |
Hey @alex-triphonov no, that milestone means that, as of now, the PR couldn't be merged as-is before a It's ok to use the |
merge upstream to master
@noviluni
|
I’ve found two main concurrent issues with dateparser:
I) TypeError: unsupported operand type(s) for -: 'NoneType' and 'relativedelta' #441 #813
Reason: Variable freshness_date_parser is a shared instance and its self.now was set to None by some threads while still being used by others.
Solution: Make a local variable now instead of self.now. There’s already an PR #814 with this solution, but unfortunately it stuck, so I had to add it to my code in order to pass tests.
II) Threads mix up date orders
Reason: Settings instances are shared between locales and while one thread in
_DateLocaleParser._try_parser() was parsing a date with date order picked for specific locale, other threads with the same Settings instances and different locales used shared _settings.DATE_ORDER before previous thread set default _settings.DATE_ORDER back, which is wrong for their locales and thus returned wrong dates.
Solution: Create separate Settings instances not only for different incoming settings, but for all other kwargs e.g. different locales, languages, etc.
other notes:
I noticed that the replace() method changed the original settings dict as it's a mutable object, so I’ve modified it to work with a copy() of it. As settings can be passed to kwargs as a None- added a check for this case.