
Fixes #456 Add Image/Video search support for Bing #448

Merged: 13 commits, Jan 26, 2018
2 changes: 1 addition & 1 deletion .travis.yml
@@ -9,7 +9,7 @@ install:
- pip install -r requirements.txt

before_script:
- flake8 . --count --max-complexity=15 --show-source --statistics
- flake8 . --count --max-complexity=16 --show-source --statistics
script:
- python -m app.server > /dev/null &
- pytest --cov=./
4 changes: 4 additions & 0 deletions app/scrapers/__init__.py
@@ -42,6 +42,10 @@ def feed_gen(query, engine, count=10, qtype=''):
engine = old_names.get(engine, engine)
if engine in ('quora', 'youtube'):
urls = scrapers[engine].search_without_count(query)
elif (engine in ['bing']) and (qtype == 'vid'):
Member:

It would be helpful if we follow the same convention for conditional statements. As line 43 uses a tuple, we should use a tuple for lines 45 and 47 as well, i.e. changing engine in ['bing'] to engine in ('bing') (see the sketch after this hunk).

Contributor:

elif engine in ['bing'] and qtype == 'vid': # Lose the extra parens -- this is not JavaScript

Member:

I was talking about the use of list and tuple in the conditional statements. We should follow one convention to reduce confusion and improve readability.

urls = scrapers[engine].video_search_without_count(query)
elif (engine in ['bing']) and (qtype == 'isch'):
urls = scrapers[engine].image_search_without_count(query)
else:
urls = scrapers[engine].search(query, count, qtype)
return urls
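
For reference, a minimal sketch of how the dispatch could read with the tuple convention discussed in the review thread above (a hypothetical rewrite, not the merged code; the helper name feed_gen_dispatch is made up for illustration):

def feed_gen_dispatch(scrapers, query, engine, count=10, qtype=''):
    # Hypothetical illustration of feed_gen's dispatch, using tuples throughout.
    # A one-element tuple needs a trailing comma: ('bing',); ('bing') is just a string.
    if engine in ('quora', 'youtube'):
        return scrapers[engine].search_without_count(query)
    elif engine in ('bing',) and qtype == 'vid':
        return scrapers[engine].video_search_without_count(query)
    elif engine in ('bing',) and qtype == 'isch':
        return scrapers[engine].image_search_without_count(query)
    return scrapers[engine].search(query, count, qtype)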
40 changes: 40 additions & 0 deletions app/scrapers/bing.py
@@ -8,9 +8,13 @@ class Bing(Scraper):
def __init__(self):
Scraper.__init__(self)
self.url = 'http://www.bing.com/search'
self.videoURL = 'https://www.bing.com/videos/search'
self.imageURL = 'https://www.bing.com/images/search'
self.defaultStart = 1
self.startKey = 'first'
self.name = 'bing'
self.videoKey = 'FORM'
self.imageKey = 'FORM'

def parse_response(self, soup):
""" Parses the reponse and return set of urls
@@ -30,3 +34,39 @@ def parse_response(self, soup):
print('Bing parsed: ' + str(urls))

return urls

def parse_video_response(self, soup):
Member:

This method can be changed to a static method by using @staticmethod and removing self from the positional arguments.

Member Author:

Since it was not useful previously, I am not using it this time either.

Contributor (@cclauss, Jan 20, 2018):

That is a really poor answer. The suggested modification is indeed useful, as the linter is trying to help you avoid unnecessary runtime overhead.

https://www.codacy.com/app/fossasia/query-server/pullRequest?prid=1252888#newIssuesView

Normal: 0.5956953540444374
 Class: 0.38477163994684815
Static: 0.18970419897232205  # Three times faster than the normal method call.
#!/usr/bin/env python3

from timeit import timeit


setup = """
class TestClass:
    def add_normal_method(self, i):
        return i + 1

    @classmethod
    def add_class_method(cls, i):
        return i + 1

    @staticmethod
    def add_static_method(i):
        return i + 1


t = TestClass()"""


print('Normal:', timeit('t.add_normal_method(1)', setup))
print(' Class:', timeit('t.add_class_method(1)', setup))
print('Static:', timeit('t.add_static_method(1)', setup))

""" Parse response and returns the urls

Returns: urls (list)
[[Tile1, url1], [Title2, url2], ...]
"""
urls = []
for a in soup.findAll('a', attrs={'class': 'mc_vtvc_link'}):
title = a.get('aria-label').split(' Duration')[0]
url = 'https://www.bing.com' + a.get('href')
urls.append({
'title': title,
'link': url
})

print('Bing parsed: ' + str(urls))

return urls
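
For illustration, a minimal sketch of the @staticmethod variant the reviewers are asking for (hypothetical, not the merged code). Existing callers such as self.parse_video_response(soup) keep working, since a static method can still be called on an instance:

@staticmethod
def parse_video_response(soup):
    """ Parses the response and returns the urls

    Returns: urls (list)
            [{'title': title1, 'link': url1}, ...]
    """
    urls = []
    for a in soup.findAll('a', attrs={'class': 'mc_vtvc_link'}):
        title = a.get('aria-label').split(' Duration')[0]
        urls.append({'title': title,
                     'link': 'https://www.bing.com' + a.get('href')})
    return urls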

def parse_image_response(self, soup):
Member:

This method can be changed to a static method by using @staticmethod and removing self from the positional arguments. Codacy also suggests the same modification.

Member Author:

Since it was not useful previously, I am not using it this time either.

""" Parse response and returns the urls

Returns: urls (list)
[[url1], [url2], ...]
"""
urls = []
for a in soup.findAll('a', attrs={'class': 'iusc'}):
url = 'https://www.bing.com' + a.get('href')
urls.append({
'link': url
})

print('Bing parsed: ' + str(urls))

return urls
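
As a small self-contained check of what this selector picks up, here is a sketch run against a made-up fragment of Bing image-results markup (the class name 'iusc' comes from the code above; the href value is invented for illustration):

from bs4 import BeautifulSoup

# Hypothetical fragment of a Bing image-results page.
html = '<a class="iusc" href="/images/search?view=detailV2&amp;id=ABC">thumb</a>'
soup = BeautifulSoup(html, 'html.parser')

# Same extraction as parse_image_response above.
urls = [{'link': 'https://www.bing.com' + a.get('href')}
        for a in soup.findAll('a', attrs={'class': 'iusc'})]
print(urls)  # [{'link': 'https://www.bing.com/images/search?view=detailV2&id=ABC'}]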
28 changes: 28 additions & 0 deletions app/scrapers/generalized.py
@@ -81,3 +81,31 @@ def search_without_count(self, query):
soup = BeautifulSoup(response.text, 'html.parser')
urls = self.parse_response(soup)
return urls

def video_search_without_count(self, query):
"""
Search for the query and return set of urls
Returns: list
"""
urls = []
if self.name in ['bing']:
url = self.videoURL
payload = {self.queryKey: query, self.videoKey: 'HDRSC3'}
response = requests.get(url, headers=self.headers, params=payload)
soup = BeautifulSoup(response.text, 'html.parser')
urls = self.parse_video_response(soup)
return urls

def image_search_without_count(self, query):
"""
Search for the query and return set of urls
Returns: list
"""
urls = []
if self.name in ['bing']:
url = self.imageURL
payload = {self.queryKey: query, self.imageKey: 'HDRSC2'}
response = requests.get(url, headers=self.headers, params=payload)
soup = BeautifulSoup(response.text, 'html.parser')
urls = self.parse_image_response(soup)
return urls
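
As a rough usage sketch of the new code paths end to end (a sketch only, assuming the package is importable and Bing is reachable from the machine running it):

from app.scrapers import feed_gen

# Video results for Bing; feed_gen dispatches to video_search_without_count.
videos = feed_gen('fossasia', 'bing', qtype='vid')
# e.g. [{'title': '...', 'link': 'https://www.bing.com/videos/...'}, ...]

# Image results for Bing; feed_gen dispatches to image_search_without_count.
images = feed_gen('fossasia', 'bing', qtype='isch')
# e.g. [{'link': 'https://www.bing.com/images/search?...'}, ...]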
3 changes: 2 additions & 1 deletion app/server.py
@@ -80,7 +80,8 @@ def search(search_engine):
unicode # unicode is undefined in Python 3 so NameError is raised
for line in result:
line['link'] = line['link'].encode('utf-8')
line['title'] = line['title'].encode('utf-8')
if 'desc' in line:
Contributor:

Should be 'title', not 'desc'?

Member Author:

Done 👍

line['title'] = line['title'].encode('utf-8')
if 'desc' in line:
line['desc'] = line['desc'].encode('utf-8')
except NameError:
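
Read straight through, without the review interleave, the hunk amounts to roughly the following Python 2 encoding pass (a sketch; the body of the except branch is not shown in this diff and is assumed here to be a no-op):

try:
    unicode  # unicode is undefined in Python 3 so NameError is raised
    for line in result:
        line['link'] = line['link'].encode('utf-8')
        line['title'] = line['title'].encode('utf-8')
        if 'desc' in line:
            line['desc'] = line['desc'].encode('utf-8')
except NameError:
    pass  # assumption: on Python 3 the strings are left as-is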
51 changes: 51 additions & 0 deletions package-lock.json

Some generated files are not rendered by default.