-
-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trigger Geopoint ES Index on Geospatial Feature Flag Enable #35126
base: master
Are you sure you want to change the base?
Changes from 4 commits
4b32aec
6280c17
7f52f0e
2184e76
d7abcee
5409b02
177b1b3
9deab96
b647463
196ffd3
6906d82
ca4ad45
128757f
edb3a61
a4c7bb5
707a97d
ded63ee
a161b10
a030555
29ec236
037e5f2
0724d43
d96c1d4
e1d82cc
5137009
dd28ee8
1581ca5
cbe08cf
e1ef27a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -133,3 +133,5 @@ | |
} | ||
} | ||
} | ||
|
||
INDEX_ES_TASK_HELPER_BASE_KEY = 'geo_cases_index_cases' |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,24 @@ | ||
from django.utils.translation import gettext as _ | ||
|
||
from corehq.util.decorators import serial_task | ||
|
||
from corehq.apps.celery import task | ||
from corehq.apps.geospatial.utils import CeleryTaskTracker, update_cases_owner | ||
from corehq.apps.geospatial.const import INDEX_ES_TASK_HELPER_BASE_KEY | ||
from corehq.apps.geospatial.utils import ( | ||
get_celery_task_tracker, | ||
CeleryTaskTracker, | ||
update_cases_owner, | ||
get_geo_case_property, | ||
) | ||
from corehq.apps.geospatial.management.commands.index_geolocation_case_properties import ( | ||
_es_case_query, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you're importing a protected function somewhere other than a test, then it's possibly no longer being treated as protected and you should drop the leading underscore. -- This is a guideline, not a rule, so I'm not requesting a change, and I'll leave it to your discretion ... but interesting that the linter didn't flag this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point. I have always been a little unsure as to how strict we should be when it comes to marking functions in Python as private. Is having simply a single external reference enough justification, or do we need to start referencing it a few times before it becomes obvious that this is clearly a public function (I have seen plenty of examples in HQ that use both examples)? I suppose this is the potential downside of having it more as a guideline than a rule, it can be difficult to gauge where the line lies at times. Thinking this through however, the former does make more sense since even a single external reference clearly means its not private/self-contained anymore. Given this, I agree it would make sense to drop the _ here. |
||
get_batch_count, | ||
process_batch, | ||
DEFAULT_QUERY_LIMIT, | ||
DEFAULT_CHUNK_SIZE, | ||
) | ||
|
||
from settings import MAX_GEOSPATIAL_INDEX_DOC_LIMIT | ||
|
||
|
||
@task(queue="background_queue", ignore_result=True) | ||
|
@@ -9,3 +28,38 @@ def geo_cases_reassignment_update_owners(domain, case_owner_updates_dict, task_k | |
finally: | ||
celery_task_tracker = CeleryTaskTracker(task_key) | ||
celery_task_tracker.mark_completed() | ||
|
||
|
||
@serial_task('async-index-es-docs', timeout=30 * 60, queue='background_queue', ignore_result=True) | ||
ajeety4 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
def index_es_docs_with_location_props(domain): | ||
celery_task_tracker = get_celery_task_tracker(domain, INDEX_ES_TASK_HELPER_BASE_KEY) | ||
if celery_task_tracker.is_active(): | ||
return | ||
|
||
geo_case_prop = get_geo_case_property(domain) | ||
query = _es_case_query(domain, geo_case_prop) | ||
doc_count = query.count() | ||
if doc_count > MAX_GEOSPATIAL_INDEX_DOC_LIMIT: | ||
celery_task_tracker.set_message( | ||
_('This domain contains too many cases and so they will not be made available ' | ||
'for use by this feature. Please reach out to support.') | ||
) | ||
return | ||
|
||
celery_task_tracker.mark_requested() | ||
batch_count = get_batch_count(doc_count, DEFAULT_QUERY_LIMIT) | ||
try: | ||
for i in range(batch_count): | ||
progress = (i / batch_count) * 100 | ||
celery_task_tracker.set_message( | ||
_(f'Cases are being made ready for use by this feature. Please be patient. ({progress}%)') | ||
AmitPhulera marked this conversation as resolved.
Show resolved
Hide resolved
|
||
) | ||
process_batch( | ||
domain, | ||
geo_case_prop, | ||
case_type=None, | ||
query_limit=DEFAULT_QUERY_LIMIT, | ||
chunk_size=DEFAULT_CHUNK_SIZE, | ||
) | ||
finally: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thinking of a situation when an exception occurs while processing the cases. This would mark the task as completed in the tracker. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I followed the same logic as the task to async update owners. If we want to be a bit more cautious here, we could do a generic There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes this is more of a safeguard thing. Good point about the task to async update owners. I think it would have been good to have the error marked and stored for that as well. Not very sure what norm is followed in HQ, however a quick search shows handling exception is a case by case basis. I would recommend for this indexing task considering if it throws a exception, the pending cases would never be processed and will not be available for the usage by the feature. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree, handling an exception here more gracefully would be a good idea. Thinking through this however, I'm a little unsure about trying to store a message in the redis cache again. We would need to store the raw string in redis and then translate it when rendering it on the front-end, and I'm not entirely confident or sure whether pulling a string from the redis cache into a variable and then trying to translate it would work correctly. @ajeety4 What do you think of extending the status system to have the ability for custom error statuses instead? For the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Great catch. You are right on this, As per django docs,
This is a good idea. I feel like a cleaner approach would be to use a separate key There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice find! Doing message strings is definitely out then.
Do you mean that the
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes correct. What do you think of returning the
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I like this idea, it would then keep it completely clean from the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Addressed in ded63ee. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. <3 Great discussion. |
||
celery_task_tracker.mark_completed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It feels awkward for the tasks module to be importing from a management command, instead of the other way round (or both importing from somewhere else, but I'd prefer not to throw everything in the utils module, like that second drawer in the kitchen that somehow ends up not only with tongs and skewers, but also clothes pegs, and screws, and one rusty paper clip).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a good alternative would be to move the
es_case_query
function off to thees.py
file as I feel it would fit well there. This can then be renamed to something likees_case_query_for_missing_geopoint_val
or something to that effect.Alternatively, I can also simply create a file in a new "utils" directory to contain just the above helper function. I do feel the first option makes sense enough and is a bit more straightforward, but happy to go in this direction as well if you think it sounds more suitable @kaapstorm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
es
module sounds ideal fores_case_query
! Yeah, I agree, the function is worth renaming too, and you could drop the "es_" prefix because we would know from its module that it would be an Elasticsearch query.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in a161b10.