HasGand / scihelper
项目设计.md (Project Design)
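Crawler feature requirements:

Batch-export DOIs from WoS (exporting reference information & abstracts) (and import them into EndNote 20)
Fetch article PDFs by DOI, in batch (and import them into EndNote 20)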

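Steps:

Download a PDF from Sci-Hub for a single DOI
Batch download (from a file containing DOIs)
Export DOIs from WoS
Or export RIS from WoS and import it directly into EndNote
Batch-export DOIs (RIS) from WoS
Download PDF & RIS by title
Download papers from arXiv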
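For the batch steps above, the input is either a plain file with one DOI per line or a RIS export from WoS. Below is a minimal sketch that reads both formats. It assumes the RIS file stores each DOI under the standard `DO` tag. The function name `read_dois` is hypothetical and not part of scihelper.

```python
import re
from pathlib import Path

# In RIS exports each DOI sits on its own line under the "DO" tag: "DO  - 10.1000/xyz".
RIS_DOI_LINE = re.compile(r"^DO  - (\S+)\s*$")


def read_dois(path):
    """Return the unique DOIs in a RIS export or a one-DOI-per-line text file, in file order."""
    seen = {}
    # utf-8-sig tolerates the byte-order mark some Windows exports start with.
    for line in Path(path).read_text(encoding="utf-8-sig").splitlines():
        match = RIS_DOI_LINE.match(line)
        doi = match.group(1) if match else line.strip()
        if doi.startswith("10."):  # skips blank lines and every other RIS tag
            seen.setdefault(doi, None)
    return list(seen)
```

A dict is used instead of a set so that duplicates are dropped while the export's original order is kept.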

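Added features & optimizations

Show the download percentage dynamically
Resume interrupted downloads of a single file
Download records for files: the cache class Cache
Decoupling and optimization
Add robots.txt parsing and an interval between downloads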
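For the robots.txt item, Python's standard `urllib.robotparser` can handle the parsing. The sketch below makes these assumptions:

- The robots.txt text has already been fetched.
- The class name `RobotsPolicy` is illustrative, not taken from the project.
- The 1-second fallback delay is an illustrative choice.

```python
import time
from urllib import robotparser


class RobotsPolicy:
    """robots.txt rules plus a minimum interval between requests."""

    def __init__(self, robots_txt, user_agent, default_delay=1.0):
        self.user_agent = user_agent
        self._parser = robotparser.RobotFileParser()
        self._parser.parse(robots_txt.splitlines())
        declared = self._parser.crawl_delay(user_agent)  # None if no Crawl-delay line applies
        self.delay = declared if declared is not None else default_delay
        self._last = None

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch url."""
        return self._parser.can_fetch(self.user_agent, url)

    def wait(self):
        """Sleep just long enough that requests are at least self.delay seconds apart."""
        if self._last is not None:
            remaining = self.delay - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

Call `policy.wait()` before each request to enforce the interval. When a site declares its own `Crawl-delay`, that value takes precedence over the fallback.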

Downloading papers from Sci-Hub

Fetch the HTML source of the page for a DOI: get_html()
Extract pdf_url with a regular expression: get_pdf_url()
Download from pdf_url: download_pdf()
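These three steps map onto three small functions. This is a sketch that uses only the standard library, with the following assumptions:

- The regular expression is a guess: it takes the first `src`/`href` value ending in `.pdf`, because mirror page layouts differ.
- No mirror address is hard-coded, so the caller must supply `base_url`.

The sketch also covers the dynamic percentage display: it prints progress whenever the server sends `Content-Length`.

```python
import re
import urllib.request
from urllib.parse import urljoin

# Assumption: the PDF is the first src/href attribute whose value ends in .pdf
# (optionally followed by ?query or #fragment). Real mirror markup may differ.
PDF_LINK = re.compile(
    r"""(?:src|href)\s*=\s*["']([^"']+?\.pdf(?:[?#][^"']*)?)["']""", re.IGNORECASE
)
HEADERS = {"User-Agent": "Mozilla/5.0"}


def get_html(base_url, doi, timeout=30):
    """Fetch the page the mirror at base_url serves for doi; return (html, page_url)."""
    page_url = f"{base_url.rstrip('/')}/{doi}"
    req = urllib.request.Request(page_url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace"), page_url


def get_pdf_url(html, page_url):
    """Return an absolute PDF URL, or None when the pattern does not match."""
    match = PDF_LINK.search(html)
    # urljoin resolves both "//host/x.pdf" and "/path/x.pdf" against the page URL.
    return urljoin(page_url, match.group(1)) if match else None


def download_pdf(pdf_url, dest, chunk_size=64 * 1024, timeout=60):
    """Stream pdf_url into dest, printing a percentage when the size is known."""
    req = urllib.request.Request(pdf_url, headers=HEADERS)
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(dest, "wb") as out:
        total = int(resp.headers.get("Content-Length") or 0)
        done = 0
        while chunk := resp.read(chunk_size):
            out.write(chunk)
            done += len(chunk)
            if total:
                print(f"\r{done * 100 // total}%", end="", flush=True)
    print()
    return dest
```

Returning `None` from `get_pdf_url()` lets the caller skip or retry a DOI instead of crashing on a failed match.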

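Getting DOIs from WoS (topic search)

The search results page carries no DOI information, so first collect each article's URL: get_url()
Get the DOI from each article's page: get_doi()
Save the DOIs to a file: store_dois()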
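Since the article page is the only source of the DOI, `get_doi()` can be a regular-expression search. The sketch below assumes the page text has already been fetched and uses the pattern Crossref suggests for modern DOIs. Its `store_dois()` appends one DOI per line and skips duplicates, so repeated searches do not grow the file.

```python
import re
from pathlib import Path

# The pattern Crossref suggests for modern DOIs; it matches the large majority of them.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def get_doi(page_text):
    """Return the first DOI found in the article page text, or None."""
    match = DOI_PATTERN.search(page_text)
    # Drop sentence punctuation the pattern can swallow at the end ("... 10.1000/xyz.").
    return match.group(0).rstrip(".;") if match else None


def store_dois(dois, path):
    """Append DOIs not yet in path, one per line; return how many were written."""
    path = Path(path)
    existing = set(path.read_text(encoding="utf-8").split()) if path.exists() else set()
    # dict.fromkeys keeps first-seen order while removing duplicates within this batch.
    new = [doi for doi in dict.fromkeys(dois) if doi and doi not in existing]
    with path.open("a", encoding="utf-8") as f:
        f.writelines(doi + "\n" for doi in new)
    return len(new)
```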

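Cache class

Each time a PDF file downloads successfully, add a key-value pair to the cache object
The already-downloaded items can then be read back from it to resume the previous download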
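Here is a minimal sketch of such a cache. It assumes the key is the DOI and the value is the saved PDF path, persisted as JSON. The default file name `download_cache.json` and the `pending()` helper are hypothetical.

```python
import json
from pathlib import Path


class Cache:
    """Persistent record of finished downloads: DOI -> saved PDF path, stored as JSON."""

    def __init__(self, path="download_cache.json"):
        self.path = Path(path)
        self._data = json.loads(self.path.read_text(encoding="utf-8")) if self.path.exists() else {}

    def __contains__(self, doi):
        return doi in self._data

    def add(self, doi, pdf_path):
        """Record one finished download and write the cache to disk immediately."""
        self._data[doi] = str(pdf_path)
        # Write to a temp file, then rename, so a crash never leaves half-written JSON.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self._data, ensure_ascii=False, indent=2), encoding="utf-8")
        tmp.replace(self.path)

    def pending(self, dois):
        """Return the DOIs not downloaded yet, i.e. where the next session should resume."""
        return [d for d in dois if d not in self._data]
```

Writing after every `add()` costs a little I/O. In exchange, an interrupted batch loses at most the file that was in progress.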

pyinstaller -F -w main.py -p else.py -p else.py --hidden-import User-agent-list.py (the -w flag makes the exe run silently, without a console window)

Fix the problem of get_info dropping information when scraping the two sites
Revise download.py
Create a new main function that calls them
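One way for the new main function to tie the modules together is a small command-line front end. This sketch is hypothetical: the `download` callback stands in for the project's real download routine, and the argument names are illustrative.

```python
import argparse


def main(argv=None, download=print):
    """Hypothetical entry point: gather DOIs from the command line and hand each to download()."""
    parser = argparse.ArgumentParser(prog="scihelper")
    parser.add_argument("dois", nargs="*", help="DOIs to download")
    parser.add_argument("-f", "--file", help="text file with one DOI per line")
    args = parser.parse_args(argv)

    dois = list(args.dois)
    if args.file:
        with open(args.file, encoding="utf-8") as f:
            dois.extend(line.strip() for line in f if line.strip())
    if not dois:
        parser.error("give at least one DOI or --file")  # argparse exits with status 2
    for doi in dois:
        download(doi)  # placeholder: the default just prints the DOI
    return dois
```

A script would call `main(download=...)` with the real downloader, under the usual `if __name__ == "__main__":` guard. Passing the downloader in keeps `main` decoupled from the Sci-Hub and arXiv modules.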

*Fix for missing pandas._libs modules: pyinstaller.exe -F --hidden-import pandas._libs.tslibs.base .\scihelper.spec

PyInstaller packaging explained in detail

The WoS regular expressions often fail to match, which raises errors

Write the WoS page URL into info
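A common cause of such errors is calling `.group()` on the `None` that `re.search` returns when nothing matches. The sketch below returns `None` for missing fields instead and always records the page URL in `info`. It is an illustrative rewrite, not the project's `get_info`, and its patterns are placeholders because the actual WoS markup is not described here.

```python
import re


def search_or_none(pattern, text, flags=0):
    """Return the first capture group of pattern in text, or None instead of raising."""
    match = re.search(pattern, text, flags)
    return match.group(1).strip() if match else None


def get_info(html, page_url):
    """Collect record fields; any field whose pattern fails to match is left as None."""
    # Placeholder patterns: the real WoS record markup is not specified in this document.
    return {
        "title": search_or_none(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL),
        "doi": search_or_none(r"\b(10\.\d{4,9}/[-._;()/:A-Z0-9]+)", html, re.IGNORECASE),
        "url": page_url,  # the record page URL is stored even when other fields are missing
    }
```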

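arXiv can also write its URL into info
New feature: for papers that cannot be downloaded from Sci-Hub, try downloading from arXiv
New feature: download by passing in a DOI or serial number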
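The Sci-Hub-then-arXiv fallback can be written as a loop over download sources. In this sketch each source is a plain function, and the helper names are hypothetical. The only fixed facts used are arXiv's `https://arxiv.org/pdf/<id>` URL form and its `10.48550/arXiv.<id>` DOI prefix.

```python
ARXIV_DOI_PREFIX = "10.48550/arxiv."  # arXiv's DataCite DOIs look like 10.48550/arXiv.<id>


def arxiv_id_from(identifier):
    """Strip the arXiv DOI prefix if present; otherwise return the identifier unchanged."""
    if identifier.lower().startswith(ARXIV_DOI_PREFIX):
        return identifier[len(ARXIV_DOI_PREFIX):]
    return identifier


def arxiv_pdf_url(arxiv_id):
    """arXiv serves PDFs at https://arxiv.org/pdf/<id>."""
    return f"https://arxiv.org/pdf/{arxiv_id}"


def download_with_fallback(identifier, sources):
    """Try each (name, fetch) pair in order and return (name, result) for the first success."""
    errors = []
    for name, fetch in sources:
        try:
            result = fetch(identifier)
        except Exception as exc:  # one failing source must not abort the whole batch
            errors.append(f"{name}: {exc}")
            continue
        if result:
            return name, result
        errors.append(f"{name}: no result")
    raise LookupError(f"all sources failed for {identifier}: " + "; ".join(errors))
```

Adding another source later only means appending one more `(name, fetch)` pair. The collected error messages record why each source failed for that DOI.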

For v3, simply handle the various error messages

v4 is being improved
