Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix Post.get_keywords to handle empty keywords, cleanup search_indexes #205

Merged
merged 4 commits into from
Feb 23, 2016

Conversation

skirsdeda
Copy link
Contributor

Tried using search_indexes got some exceptions (like None returned from get_keywords). Ended up cleaning up the entire PostIndex.

@@ -47,32 +47,32 @@ def get_model(self):
return Post

def get_search_data(self, post, language, request):
optional_attributes = []
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this optional_attributes thing was never used :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a zombie code, yeah. The idea behind, if i remember correctly, was to provide optional attribute to extend the search. Maybe we could just remove it and eventually develop the idea in a separated branch.

@skirsdeda
Copy link
Contributor Author

Tests have to be rewritten. I will try to finish this up tomorrow.

@@ -28,7 +25,7 @@ def get_description(self, post):
return post.get_description()

def prepare_pub_date(self, post):
return post.date_published.strftime("%Y-%m-%d %H:%M:%S")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haystack handles datetime objects, strftime is really unnecessary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@yakky yakky self-assigned this Feb 16, 2016
@skirsdeda
Copy link
Contributor Author

This is pretty much finished. But there is an interesting thing going on with date_published. For some reason the prepared datetime differs a few seconds from the one saved. Not sure what to blame yet :)

@yakky
Copy link
Member

yakky commented Feb 16, 2016

@skirsdeda let's ping @fmarco who worked on the first implementation of the haystack indexes IIRC he had time skewes issues as well

@skirsdeda
Copy link
Contributor Author

@yakky I would say that the logic in test_blog_post_is_indexed_using_update_object is wrong. The other one (test_blog_post_is_indexed_using_prepare) always passes.

@skirsdeda
Copy link
Contributor Author

@yakky Actually test_blog_post_is_indexed_using_update_object calls index.update_object which calls backend.update (inside Haystack) which is noop (since HAYSTACK_CONECTIONS is empty hence backend is the default - SimpleSearchBackend). So it should return the date from a previously prepared index and therefore the difference. Now I thought that test function call order is non-deterministic but I guess I was wrong?

@skirsdeda
Copy link
Contributor Author

Also note that _get_post calls Post.objects.create which makes sure we get a new instance (and a new date_published therefore).

@skirsdeda
Copy link
Contributor Author

So I removed the offending test which did nothing more than the one that is left now.

@yakky yakky assigned fmarco and unassigned yakky Feb 17, 2016
@skirsdeda
Copy link
Contributor Author

Also removed get_title from get_search_data because index_title = True. No need to repeat it twice.

@fmarco
Copy link
Contributor

fmarco commented Feb 17, 2016

I'll have a deep look into ASAP @skirsdeda

@@ -47,32 +44,36 @@ def get_model(self):
return Post

def get_search_data(self, post, language, request):
optional_attributes = []
abstract = post.safe_translation_getter('abstract')
text_bits = [post.get_title()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll keep this one, because it returns the meta_title or the title as a fallback, so it's not a real repetition.
See: https://github.com/nephila/djangocms-blog/blob/develop/djangocms_blog/models.py#L228

Btw, we can discuss about it :)!

@skirsdeda
Copy link
Contributor Author

@fmarco Since index_title=True, it gets included anyway. I changed get_title to use post.get_title so that fallback is used when needed.

@@ -15,20 +15,17 @@ class PostIndex(get_index_base()):

author = indexes.CharField(indexed=True, model_attr='get_author')
keywords = indexes.CharField(null=True)
tags = indexes.CharField(null=True)
tags = indexes.CharField(null=True, model_attr='get_tags')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@fmarco
Copy link
Contributor

fmarco commented Feb 19, 2016

@skirsdeda Ok, it's an idea. By the way i think we could keep as a separated default search attribute the title, in the case we want to search for the title or the meta_title. @yakky what do you think?

@skirsdeda
Copy link
Contributor Author

@fmarco Regarding title, I personally use index_title=False and when searching I search both in title and text. And that's how it usually is because title gets bigger boost and documents matching query in title get bigger score.

@@ -232,7 +232,7 @@ def get_title(self):
return title.strip()

def get_keywords(self):
return self.safe_translation_getter('meta_keywords').strip().split(',')
return self.safe_translation_getter('meta_keywords', default='').strip().split(',')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps: btw, how can the error be reproduced? Because here: https://github.com/nephila/djangocms-blog/blob/develop/djangocms_blog/models.py#L133 "meta_keywords" have the '' default.
Just to have an idea of the flow error. Anyway, it's a sane check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmarco Now that is interesting because I have all migrations applied and yet meta_keywords is not null with no default in database. Actually all fields in djangocms_blog_post_translation are not null with no default... So either there were some migration changes after they were first created or it has to do with parler. I have to investigate this a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmarco So Django doesn't save default values in DB schema. Means that all is good with migrations. I bet I am hitting some django-parler bug here. Maybe it doesn't consider field's default value in safe_translation_getter for some reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh! So that's the problem that causes the issue!
So, the stronger check have even more sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmarco Yup, it's django-parler's fault. Filed an issue. But for now it's better to have default='' to make safe_translation_getter really safe :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree :)! 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps: so, as said above let's make the check on the Post model method get_keywords to have a default '' fallback 👍 .

@fmarco
Copy link
Contributor

fmarco commented Feb 19, 2016

@skirsdeda About test_blog_post_is_indexed_using_update_object test, do you any suggestion to make test using a different backend? Otherwise we keep it removed as you did :).

@skirsdeda
Copy link
Contributor Author

@fmarco If you use a different backend you will need to pull in some dependencies for tests. For Solr and elasticsearch you would have to install and run an actual service which is a bit of an overkill for one test. Also it would make Travis builds considerably longer. The only option is using Whoosh backend which just runs as a python lib.

@fmarco
Copy link
Contributor

fmarco commented Feb 19, 2016

@skirsdeda I see, infact i read that someone try to use services (as even Docker containers to run those services), but i agree with you that's an overkill for the moment. Maybe we could integrate a whoosh based test next.

@fmarco
Copy link
Contributor

fmarco commented Feb 19, 2016

@yakky LGTM

@yakky yakky assigned yakky and unassigned fmarco Feb 19, 2016
@yakky
Copy link
Member

yakky commented Feb 23, 2016

@skirsdeda thanks for your work
@fmarco thanks for the review

yakky added a commit that referenced this pull request Feb 23, 2016
fix Post.get_keywords to handle empty keywords, cleanup search_indexes
@yakky yakky merged commit 1941cc5 into nephila:develop Feb 23, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants