Exclude files that user is not allowed to download
Based on the type of ecoinvent license, not all files are actually
available to download even though they're listed in the page source.
Files with the html class 'fileDownloadNotAllowed' are now excluded.
haasad committed Apr 23, 2022
1 parent f758861 commit 6f4430d
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions eidl/core.py
@@ -97,9 +97,11 @@ def get_available_files(self):
             self.handle_connection_timeout()
             raise e
         soup = bs4.BeautifulSoup(files_res.text, 'html.parser')
-        file_list = [l for l in soup.find_all('a', href=True) if
+        all_files = [l for l in soup.find_all('a', href=True) if
                      l['href'].startswith('/File/File?')]
-        link_dict = {f.contents[0]: f['href'] for f in file_list}
+        not_allowed = soup.find_all('a', class_='fileDownloadNotAllowed')
+        available_files = set(all_files).difference(set(not_allowed))
+        link_dict = {f.contents[0]: f['href'] for f in available_files}
         link_dict = {
             k.replace('-', ''):v for k, v in link_dict.items() if k.startswith('ecoinvent ') and
             k.endswith('ecoSpold02.7z') and not 'lc' in k.lower()
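
The filtering idea in this change can be tried in isolation. The sketch below is not the code from eidl/core.py: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra dependencies, and it keeps links in document order instead of going through a set. The class FileLinkParser, the function available_links and the sample HTML (including its file names and ids) are hypothetical, made up for illustration. It assumes, as the commit does, that the 'fileDownloadNotAllowed' class sits on the anchor itself.

```python
from html.parser import HTMLParser


class FileLinkParser(HTMLParser):
    """Collect download links, skipping anchors marked as not allowed."""

    def __init__(self):
        super().__init__()
        self.links = {}      # link text -> href, in document order
        self._href = None    # href of the accepted anchor currently being read
        self._text = []      # text chunks seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        classes = (attrs.get("class") or "").split()
        # Keep only file links that do not carry the "not allowed" class.
        if href.startswith("/File/File?") and "fileDownloadNotAllowed" not in classes:
            self._href = href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links["".join(self._text).strip()] = self._href
            self._href = None


def available_links(html):
    """Return {link text: href} for downloadable file links in html."""
    parser = FileLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment; real file names and query strings will differ.
SAMPLE = """
<a href="/File/File?id=1">ecoinvent 3.8_cutoff_ecoSpold02.7z</a>
<a class="fileDownloadNotAllowed" href="/File/File?id=2">ecoinvent 3.8_apos_ecoSpold02.7z</a>
<a href="/Other">Help</a>
"""

if __name__ == "__main__":
    for name, href in available_links(SAMPLE).items():
        print(name, href)
```

Filtering while parsing avoids the set difference, so the resulting dictionary keeps the order in which links appear on the page. Whether that ordering matters to callers of get_available_files is not stated in the commit.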