Improved search keyword encoding with support for exact phrase #80

Open
akifusenet opened this issue Jun 12, 2020 · 5 comments

Comments

@akifusenet

Issue Template

Description

For example, on Indeed, when you want to search for an exact phrase (multiple words) as a keyword, you put the phrase between double quotes.

When I try to use this feature in funnel, it removes the double quotes and returns the wrong results.

Steps to Reproduce

  1. Use funnel with a multi-word keyword enclosed in double quotes
  2. Example: -kw "Data Distribution Service"

Expected behavior

Normally, when you enter these keywords on the Indeed website, this is the URL that is generated:
https://www.indeed.com/jobs?q=%22data+distribution+service%22&l=Saratoga%2C+CA&radius=25

Actual behavior

But funnel generates this URL:
getting indeed page 0 : http://www.indeed.com/jobs?q=Data Distribution Service&l=Saratoga%2C+CA&radius=25&limit=50&filter=0&start=0
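
For reference, the encoding in the expected URL is what Python's standard `urllib.parse.quote_plus` produces for the quoted phrase. The sketch below is only an illustration: the variable names are assumptions and are not taken from JobFunnel's code.

```python
from urllib.parse import quote_plus

# Keyword as passed on the command line, including the double quotes.
keyword = '"data distribution service"'
location = "Saratoga, CA"

# quote_plus percent-encodes the quotes and the comma, and turns spaces into '+'.
encoded_keyword = quote_plus(keyword)
encoded_location = quote_plus(location)

url = (
    "https://www.indeed.com/jobs"
    f"?q={encoded_keyword}&l={encoded_location}&radius=25"
)
print(url)
```

The unencoded keyword in the actual URL above suggests the spaces and quotes were never passed through an encoder of this kind.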

Environment

* Windows 10 Home

@PaulMcInnis
Owner

Great suggestion for improved usability.

bunsenmurder added a commit to bunsenmurder/JobFunnel that referenced this issue Jun 29, 2020
- For Indeed and Monster, the query string was not properly encoded when a quoted phrase with spaces between words was provided. The fix was to encode all spaces with the proper character (+ or -). This issue and fix also applied to city names.
- For GlassDoorStatic, the query string was encoded for a URL and returned improper results. Since this class searches using a JSON payload, the solution was to join the keywords with spaces instead.
- The old query construction function was moved from GlassDoorBase to GlassDoorDynamic to prevent the dynamic scraper class from breaking.

Fixes issue PaulMcInnis#80.
bunsenmurder added a commit that referenced this issue Jul 12, 2020
* Fixes search issues due to bugs in search query encodings
- For Indeed and Monster, the query string was not properly encoded when a quoted phrase with spaces between words was provided. The fix was to encode all spaces with the proper character (+ or -). This issue and fix also applied to city names.
- For GlassDoorStatic, the query string was encoded for a URL and returned improper results. Since this class searches using a JSON payload, the solution was to join the keywords with spaces instead.
- The old query construction function was moved from GlassDoorBase to GlassDoorDynamic to prevent the dynamic scraper class from breaking.
- Fixes issue #80.

* Radius function cleanup

* Cleaning and networking code adjustments
- Removed unused requests imports
- Changed URL strings that had http in them to https
- Set provider header dictionary as the default headers on the provider's session object. Setting headers on the actual post/get method call is only necessary for temporarily overriding the session headers on an individual request.
- Adjusted search_page_for_job_soups method for GlassDoorStatic class so that it uses GET instead of POST. Sending payload data when we already have the search page URL is unnecessary and can lead to bot detection measures activating more frequently.

* Updated indeed test URL
- Updated test URL to test for https instead of http

* Fixes to asynchronous parsing code 
- Previously futures would be deleted whether they finished parsing or not.
- Added code to delete the HTML page after it's parsed.
- Added code to log any errors during blurb retrieval and parsing.

* Version bump
@bunsenmurder
Collaborator

Hi, @akifusenet, I just added a commit that should fix this issue. Could you pull the latest commit and let us know if it fixed the problem?

@PaulMcInnis
Owner

Assigning to myself because I need to port this fix to new master

@PaulMcInnis PaulMcInnis self-assigned this Sep 12, 2020
@PaulMcInnis PaulMcInnis added this to the 3.0.1 milestone Sep 12, 2020
@PaulMcInnis
Owner

I am thinking it might be wiser if we provide a search config parameter such as --exact-match
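
A rough sketch of how such a flag might behave, using `argparse` from the standard library. Everything here is hypothetical: the `--exact-match` option does not exist in funnel as far as this thread shows, and `build_keyword_query` is an illustrative helper, not a JobFunnel function.

```python
import argparse


def build_keyword_query(keywords, exact_match=False):
    """Join keywords into one phrase; wrap it in double quotes for exact match."""
    phrase = " ".join(keywords)
    return f'"{phrase}"' if exact_match else phrase


parser = argparse.ArgumentParser()
# -kw mirrors the option used in the report above; --exact-match is hypothetical.
parser.add_argument("-kw", nargs="+", dest="keywords", default=[])
parser.add_argument("--exact-match", action="store_true")

args = parser.parse_args(
    ["-kw", "Data", "Distribution", "Service", "--exact-match"]
)
query = build_keyword_query(args.keywords, args.exact_match)
print(query)
```

With a flag like this, the quotes would be added by the tool itself, so the result would not depend on how the user's shell handles quoted arguments.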

@markkvdb
Collaborator

markkvdb commented Oct 2, 2020

I think the simplest way would be to split the search URL into two parts: stem_url and arguments. The stem URL would contain everything up to the arguments, e.g., https://monster.com/jobs/search/, while arguments contains everything after it, such as ?q=%22data+distribution+service%22.

The latter can be simplified and clarified by using urllib.parse.urlencode, to which you pass the arguments as a dictionary. Strings are then automatically percent-encoded for use in URLs.
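
A minimal sketch of this idea, assuming the Indeed parameters from the original report. The names `stem_url` and `arguments` follow the comment above; the rest is illustrative and not JobFunnel's actual code.

```python
from urllib.parse import urlencode

# Everything up to the query string.
stem_url = "https://www.indeed.com/jobs"

# Query arguments as a plain dictionary; values are given unencoded.
arguments = {
    "q": '"data distribution service"',
    "l": "Saratoga, CA",
    "radius": 25,
}

# urlencode applies quote_plus to each key and value by default,
# so quotes become %22, the comma becomes %2C, and spaces become '+'.
search_url = f"{stem_url}?{urlencode(arguments)}"
print(search_url)
```

Keeping the arguments in a dictionary also means each provider could add or drop parameters without any manual string concatenation.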

markkvdb added a commit that referenced this issue Oct 6, 2020
@PaulMcInnis PaulMcInnis modified the milestones: 3.0.1, 3.0.2 Oct 11, 2020
@PaulMcInnis PaulMcInnis changed the title from "How to search for the exact phrase as keyword" to "Ability to search for an exact phrase" Nov 25, 2021
@PaulMcInnis PaulMcInnis changed the title from "Ability to search for an exact phrase" to "Improved search keyword encoding with support for exact phrase" Nov 25, 2021
@PaulMcInnis PaulMcInnis modified the milestones: 3.0.3, 4.0 Nov 25, 2021
4 participants