Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #456 Add Image/Video search support for Bing #448

Merged
merged 13 commits into from
Jan 26, 2018
2 changes: 1 addition & 1 deletion .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ install:
- pip install -r requirements.txt

before_script:
- flake8 . --count --max-complexity=15 --show-source --statistics
- flake8 . --count --max-complexity=16 --show-source --statistics
script:
- python -m app.server > /dev/null &
- pytest --cov=./
Expand Down
4 changes: 4 additions & 0 deletions app/scrapers/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,10 @@ def feed_gen(query, engine, count=10, qtype=''):
engine = old_names.get(engine, engine)
if engine in ('quora', 'youtube'):
urls = scrapers[engine].search_without_count(query)
elif (engine in ['bing']) and (qtype == 'vid'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful if we follow the same convention for conditional statements. As line 43 uses a tuple, we should use tuple for line 45 and 47 as well, i.e. changing engine in ['bing'] to engine in ('bing').

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif engine in ['bing'] and qtype == 'vid': # Lose the extra parens -- this is not JavaScript

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking about the use of list and tuple in the conditional statements. We should follow one convention to reduce confusion and better readability.

urls = scrapers[engine].video_search_without_count(query)
elif (engine in ['bing']) and (qtype == 'isch'):
urls = scrapers[engine].image_search_without_count(query)
else:
urls = scrapers[engine].search(query, count, qtype)
return urls
42 changes: 42 additions & 0 deletions app/scrapers/bing.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,9 +8,13 @@ class Bing(Scraper):
def __init__(self):
Scraper.__init__(self)
self.url = 'http://www.bing.com/search'
self.videoURL = 'https://www.bing.com/videos/search'
self.imageURL = 'https://www.bing.com/images/search'
self.defaultStart = 1
self.startKey = 'first'
self.name = 'bing'
self.videoKey = 'FORM'
self.imageKey = 'FORM'

@staticmethod
def parse_response(soup):
Expand All @@ -31,3 +35,41 @@ def parse_response(soup):
print('Bing parsed: ' + str(urls))

return urls

@staticmethod
def parse_video_response(soup):
""" Parse response and returns the urls

Returns: urls (list)
[[Tile1, url1], [Title2, url2], ...]
"""
urls = []
for a in soup.findAll('a', attrs={'class': 'mc_vtvc_link'}):
title = a.get('aria-label').split(' Duration')[0]
url = 'https://www.bing.com' + a.get('href')
urls.append({
'title': title,
'link': url
})

print('Bing parsed: ' + str(urls))

return urls

@staticmethod
def parse_image_response(soup):
""" Parse response and returns the urls

Returns: urls (list)
[[url1], [url2], ...]
"""
urls = []
for a in soup.findAll('a', attrs={'class': 'iusc'}):
url = 'https://www.bing.com' + a.get('href')
urls.append({
'link': url
})

print('Bing parsed: ' + str(urls))

return urls
28 changes: 28 additions & 0 deletions app/scrapers/generalized.py
Original file line number Diff line number Diff line change
Expand Up @@ -84,3 +84,31 @@ def search_without_count(self, query):
soup = BeautifulSoup(response.text, 'html.parser')
urls = self.parse_response(soup)
return urls

def video_search_without_count(self, query):
"""
Search for the query and return set of urls
Returns: list
"""
urls = []
if self.name in ['bing']:
url = self.videoURL
payload = {self.queryKey: query, self.videoKey: 'HDRSC3'}
response = requests.get(url, headers=self.headers, params=payload)
soup = BeautifulSoup(response.text, 'html.parser')
urls = self.parse_video_response(soup)
return urls

def image_search_without_count(self, query):
"""
Search for the query and return set of urls
Returns: list
"""
urls = []
if self.name in ['bing']:
url = self.imageURL
payload = {self.queryKey: query, self.imageKey: 'HDRSC2'}
response = requests.get(url, headers=self.headers, params=payload)
soup = BeautifulSoup(response.text, 'html.parser')
urls = self.parse_image_response(soup)
return urls
3 changes: 2 additions & 1 deletion app/server.py
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,8 @@ def search(search_engine):
unicode # unicode is undefined in Python 3 so NameError is raised
for line in result:
line['link'] = line['link'].encode('utf-8')
line['title'] = line['title'].encode('utf-8')
if 'desc' in line:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be 'title', not 'desc'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👍

line['title'] = line['title'].encode('utf-8')
if 'desc' in line:
line['desc'] = line['desc'].encode('utf-8')
except NameError:
Expand Down
51 changes: 51 additions & 0 deletions package-lock.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.