Commit 317368a

committed
update readme
1 parent 346a2d5 commit 317368a

3,996 files changed: 702,409 additions and 18 deletions

.idea/.gitignore

Lines changed: 3 additions & 0 deletions

.idea/inspectionProfiles/profiles_settings.xml

Lines changed: 6 additions & 0 deletions

.idea/learn_python3_spider.iml

Lines changed: 11 additions & 0 deletions

.idea/misc.xml

Lines changed: 4 additions & 0 deletions

.idea/modules.xml

Lines changed: 8 additions & 0 deletions

.idea/vcs.xml

Lines changed: 6 additions & 0 deletions

README.md

Lines changed: 16 additions & 10 deletions
@@ -3,9 +3,6 @@
 
 peace.
 
-
-
-
 # Python crawler tutorial: from 0 to 1
 
 ## Before Python crawling: packet capture
@@ -41,27 +38,35 @@ peace.
 - [Python crawler tutorial series 26 | When Python meets MongoDB, storing AV actress data becomes so smooth](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247484520&idx=1&sn=5e2adaa2accb7fd9af35cbe7ceef945e&scene=19#wechat_redirect)
 - [Python crawler tutorial series 27 | It would be a waste not to analyze the data you crawled: data visualization with Python](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247484538&idx=1&sn=d9b614201c96ad283bbad8a867d42082&scene=19#wechat_redirect)
 - [Python crawler tutorial series 28 | A Scrapy example that crawls Qiushibaike shows you how powerful it is!](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247484571&idx=1&sn=e9b1b3cf6e5401ce5bfa0dd3d29f9305&scene=19#wechat_redirect)
-- [Python crawler tutorial series 29 | A Scrapy example that crawls Qiushibaike shows you how powerful it is!](https://fxxkpython.com/python3-web-fxxkpython-spider-tutorial-29.html)
 - [Python crawler tutorial series 30 | Scrapy follow-up: crawl the jokes on Qiushibaike and store them in a database](https://fxxkpython.com/python3-web-fxxkpython-spider-tutorial-30.html)
 - [mitmproxy | The man in the middle: manipulate your web requests directly with Python](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247485104&idx=1&sn=5ee4a04e6ce2854e5507cd320517fd0d&chksm=fc8bbe21cbfc373738d926e0ca3250f44079449a85c1fe88f307805e28a3cc4ada07d9e322bb&token=2085568099&lang=zh_CN#rd)
 - [mitmproxy | How to monitor your phone with mitmproxy](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247485117&idx=1&sn=3819b0d55ec071164b7cabe2477ddc13&scene=19#wechat_redirect)
 
 
-
 ## Advanced Python crawling: dealing with anti-crawling
 
 - [Python anti-crawling | How sites go to extreme lengths with CSS obfuscation so that you cannot crawl their data](http://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247484810&idx=1&sn=ed3297773c1eeb741bdabfb31c3ea00e&chksm=fc8bbd1bcbfc340d6ae0166e035dd8c8e106afae8adc5fc32162a17b68916b69383b0ab67265&scene=27#wechat_redirect)
-
 - [Python anti-anti-crawling | After reading this, you can handle most sites that use CSS font obfuscation!](http://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247484921&idx=1&sn=72a707c5bc67eede144947829cab4dc6&chksm=fc8bbd68cbfc347eca6727ff90f85ef58a4fdd7c2f75a962aee3ccd5e9c4266dbe5f4e6e2262&scene=27#wechat_redirect)
-
 - [Python anti-anti-crawling | How to break JS obfuscation and encryption like that of Youdao Dictionary](http://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247484997&idx=1&sn=b304304aacb3cba31f5f7a6c6bb1ba69&chksm=fc8bbed4cbfc37c29db631c187295757c164ae75ff3e0381dbbf685a9f3d1410098e5b751e33&scene=27#wechat_redirect)
+- [Want to reverse-engineer my JS code? Heh, get past my anti-debugging first!](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247485338&idx=1&sn=5b4d6ed34a27ed5e81a3e5d8ccf8bee9&scene=19&token=464856977&lang=zh_CN#wechat_redirect)
+- [Want to reverse-engineer my JS code? Heh, get past my anti-debugging first!](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247485338&idx=1&sn=5b4d6ed34a27ed5e81a3e5d8ccf8bee9&scene=19&token=464856977&lang=zh_CN#wechat_redirect)
+
+## Python websocket crawler:
+- [Whoa, this data changes like crazy. How do you crawl it?](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247485466&idx=1&sn=1e4db96f3ca1d3a263dd7e075cbd7600&scene=19&token=464856977&lang=zh_CN#wechat_redirect)
+
+## Python distributed crawler
+- [Let's talk about distributed crawlers](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247485718&idx=1&sn=2d42d1c7408b14781ef4c1e97fbac8f6&scene=19&token=464856977&lang=zh_CN#wechat_redirect)
+- [I got hold of a few servers just to show you the whole distributed crawling process](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247485863&idx=1&sn=34f9fb196c77dffdcce4a610b622270d&scene=19&token=464856977&lang=zh_CN#wechat_redirect)
+
 ## Hands-on crawler tutorials
 - [After crawling 200,000 memes with Python, you become a master of WeChat meme battles](https://fxxkpython.com/python-pa-qu-biao-qing-bao.html)
 - [Crawl all original articles from your favorite WeChat official accounts with Python, then turn them into PDFs to read at leisure](http://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247484657&idx=1&sn=998bfcce6cd22b7fedff29e68a46fe3f&chksm=fc8bbc60cbfc3576f117d3566fbea8a042ee573d840bbe6a3d4ec9bffef815c691b7f9a59711&scene=27#wechat_redirect)
 - [When Python meets your WeChat, you find out what your WeChat friends are really like](http://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247484710&idx=1&sn=cf17f2e87405ebffb20edd0ca0a7315b&chksm=fc8bbdb7cbfc34a1389e17d4485b677d5ada497a404dc8f14107914e50382c640e7bd3cb93a4&scene=27#wechat_redirect)
 - [The gaokao is coming: digging into past years' admission scores to calm the nerves](http://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247484745&idx=1&sn=24362e73605d30e06ebe05d1fe7225f2&chksm=fc8bbdd8cbfc34ce100b9461f46c8a1c0008172f101b34b38e146f56323bc40bbd373a127ee8&scene=27#wechat_redirect)
 - [After a sudden shudder, crawling videos of girls on Douyin with Python lost its appeal](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247485150&idx=1&sn=b813993925a1031d4e85eb8841ccdb37&scene=19#wechat_redirect)
-
+- [Crawl all Python Q&A on stackoverflow with Scrapy](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247485754&idx=1&sn=3e52aa0ac13f3a23c6dee2b75424f0f5&scene=19&token=464856977&lang=zh_CN#wechat_redirect)
+- [Crawl all comments on Jay Chou's new song 《说好不哭》 and generate a word cloud](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247485571&idx=1&sn=094517114b22a4684988008aecab2639&scene=19&token=464856977&lang=zh_CN#wechat_redirect)
+- [I got hold of a few servers just to show you the whole distributed crawling process](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247485863&idx=1&sn=34f9fb196c77dffdcce4a610b622270d&scene=19&token=464856977&lang=zh_CN#wechat_redirect)
 
 
 ## Crawler example source code
@@ -76,10 +81,11 @@ peace.
 [6. Stirring things up: crawl your WeChat Moments with Appium](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247484386&idx=1&sn=7f0545f27f095f20d69deedfa9f606a1&scene=19#wechat_redirect) | [source](https://github.com/wistbean/learn_python3_spider/blob/master/wechat_moment.py)
 [7. Crawl Qiushibaike jokes into MongoDB with Scrapy (part 1)](https://fxxkpython.com/python3-web-fxxkpython-spider-tutorial-29.html)[Crawl Qiushibaike jokes into MongoDB with Scrapy (part 2)](https://fxxkpython.com/python3-web-fxxkpython-spider-tutorial-30.html) | [source](https://github.com/wistbean/learn_python3_spider/tree/master/qiushibaike)
 [8. After crawling 200,000 memes with Python, you become a master of WeChat meme battles](https://fxxkpython.com/python-pa-qu-biao-qing-bao.html) | [source](https://github.com/wistbean/learn_python3_spider/tree/master/biaoqingbao)
+[9. Crawl all original articles from your favorite WeChat official accounts with Python, then turn them into PDFs to read at leisure](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247484657&idx=1&sn=998bfcce6cd22b7fedff29e68a46fe3f&scene=19&token=464856977&lang=zh_CN#wechat_redirect) | [source](https://github.com/wistbean/learn_python3_spider/blob/master/wechat_public_account.py)
+[10. When Python meets your WeChat, you find out what your WeChat friends are really like](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247484710&idx=1&sn=cf17f2e87405ebffb20edd0ca0a7315b&scene=19&token=464856977&lang=zh_CN#wechat_redirect) | [--](https://wistbean.github.io)
 > To be continued...
 
 ## Crawler tips
-
 - [A few little-known crawler tips](https://mp.weixin.qq.com/s?__biz=MzU2ODYzNTkwMg==&mid=2247485129&idx=1&sn=56a9aecafa73162c639a873b5bbdf534&chksm=fc8bbe58cbfc374e5c033a37a82b94e8391855d85f1db26975579ddb3cf0882f1157e37f224c&token=2111372640&lang=zh_CN#rd)
 
 ## Python crawler jokes
@@ -96,7 +102,7 @@ peace.
 WeChat search ID: fxxkpython
 Name: 学习python的正确姿势 (The Right Way to Learn Python)
 
-![Scan to follow 学习python的正确姿势](https://fxxkpython.com/images/wxgzh.jpeg)
+![Scan to follow 学习python的正确姿势](https://wistbean.github.io/images/python/J2icnQspGlaJsODs2ibc1aSu5WoajHE4dItZQuTC20wibncMCIHG3X3iajk6ZLeF3yPb6BdHtuhrjICS26d1cEHTNg/640)
 
 ## The road to Python mastery
 小帅b walks you through it step by step: [The road to Python mastery](http://vip.fxxkpython.com/?page_id=18)

stackoverflow/.idea/misc.xml

Lines changed: 1 addition & 1 deletion

stackoverflow/.idea/stackoverflow.iml

Lines changed: 4 additions & 2 deletions

stackoverflow/.idea/vcs.xml

Lines changed: 6 additions & 0 deletions

stackoverflow/.idea/workspace.xml

Lines changed: 39 additions & 3 deletions

stackoverflow/stackoverflow/pipelines.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 
 class StackoverflowPipeline(object):
     def __init__(self):
-        self.connection = pymongo.MongoClient('127.0.0.1', 27017)
+        self.connection = pymongo.MongoClient('68.183.180.71', 27017)
         self.db = self.connection.scrapy
         self.collection = self.db.stackoverflow
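
Review note: this hunk replaces one hard-coded MongoDB address with another, so every checkout of the commit writes to the same public host. A minimal sketch of an alternative follows, assuming hypothetical `MONGO_HOST` and `MONGO_PORT` settings (neither name appears in this commit) and assuming the pipeline stores one document per item with `insert_one`, since the diff does not show `process_item`.

```python
class StackoverflowPipeline(object):
    """Sketch: take the MongoDB address from settings, not from the source."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.connection = None
        self.collection = None

    @classmethod
    def from_crawler(cls, crawler):
        # MONGO_HOST / MONGO_PORT are hypothetical setting names, not part of
        # the commit; the defaults mirror the values before this change.
        settings = crawler.settings
        return cls(
            host=settings.get('MONGO_HOST', '127.0.0.1'),
            port=int(settings.get('MONGO_PORT', 27017)),
        )

    def open_spider(self, spider):
        # Imported lazily so the class can be loaded without the driver installed.
        import pymongo
        self.connection = pymongo.MongoClient(self.host, self.port)
        # Same database and collection names as in the diff.
        self.collection = self.connection.scrapy.stackoverflow

    def close_spider(self, spider):
        if self.connection is not None:
            self.connection.close()

    def process_item(self, item, spider):
        # Assumption: one document per item; the commit does not show this method.
        self.collection.insert_one(dict(item))
        return item
```

With this shape, each worker in a distributed run could point at a different database by overriding the two settings, and the source tree would no longer carry a server address.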

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+pymongo==3.9.0
+redis==3.3.11
+Scrapy==1.7.4
+scrapy-redis==0.6.8
+lxml==4.4.1
+parsel==1.5.2
+
+
+

stackoverflow/stackoverflow/settings.py

Lines changed: 16 additions & 1 deletion
@@ -27,7 +27,7 @@
 # Configure a delay for requests for the same website (default: 0)
 # See https://doc.scrapy.org/en/latest/topics/settings.html#download-delay
 # See also autothrottle settings and docs
-#DOWNLOAD_DELAY = 3
+DOWNLOAD_DELAY = 1
 # The download delay setting will honor only one of:
 #CONCURRENT_REQUESTS_PER_DOMAIN = 16
 #CONCURRENT_REQUESTS_PER_IP = 16
@@ -88,3 +88,18 @@
 #HTTPCACHE_DIR = 'httpcache'
 #HTTPCACHE_IGNORE_HTTP_CODES = []
 #HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
+
+
+# Switch the scheduler to scrapy_redis
+SCHEDULER = 'scrapy_redis.scheduler.Scheduler'
+# Deduplicate requests through Redis
+DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
+# Redis server address
+REDIS_HOST = '68.183.180.0'
+REDIS_PORT = 6379
+
+
+
+
+
+
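
Review note: the added block hard-codes a public Redis address, and the host written here (`68.183.180.0`) differs from the MongoDB host in `pipelines.py` (`68.183.180.71`); the commit does not say whether that is intended. A hedged sketch of the same settings with the address read from the environment follows. Reading `REDIS_HOST` and `REDIS_PORT` from environment variables, and the local fallbacks, are assumptions for illustration and not something the commit does.

```python
import os

# Scheduler and dupefilter classes exactly as added in the diff.
SCHEDULER = 'scrapy_redis.scheduler.Scheduler'
DUPEFILTER_CLASS = 'scrapy_redis.dupefilter.RFPDupeFilter'

# Assumption: each worker exports REDIS_HOST / REDIS_PORT before it starts.
# The fallbacks point at a local Redis and are illustrative only.
REDIS_HOST = os.environ.get('REDIS_HOST', '127.0.0.1')
REDIS_PORT = int(os.environ.get('REDIS_PORT', '6379'))
```

A worker could then be started with something like `REDIS_HOST=<your redis host> scrapy crawl <spider name>`, keeping the shared queue address out of version control.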

stackoverflow/venv/bin/activate

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
+# This file must be used with "source bin/activate" *from bash*
+# you cannot run it directly
+
+deactivate () {
+    # reset old environment variables
+    if [ -n "${_OLD_VIRTUAL_PATH:-}" ] ; then
+        PATH="${_OLD_VIRTUAL_PATH:-}"
+        export PATH
+        unset _OLD_VIRTUAL_PATH
+    fi
+    if [ -n "${_OLD_VIRTUAL_PYTHONHOME:-}" ] ; then
+        PYTHONHOME="${_OLD_VIRTUAL_PYTHONHOME:-}"
+        export PYTHONHOME
+        unset _OLD_VIRTUAL_PYTHONHOME
+    fi
+
+    # This should detect bash and zsh, which have a hash command that must
+    # be called to get it to forget past commands. Without forgetting
+    # past commands the $PATH changes we made may not be respected
+    if [ -n "${BASH:-}" -o -n "${ZSH_VERSION:-}" ] ; then
+        hash -r
+    fi
+
+    if [ -n "${_OLD_VIRTUAL_PS1:-}" ] ; then
+        PS1="${_OLD_VIRTUAL_PS1:-}"
+        export PS1
+        unset _OLD_VIRTUAL_PS1
+    fi
+
+    unset VIRTUAL_ENV
+    if [ ! "$1" = "nondestructive" ] ; then
+        # Self destruct!
+        unset -f deactivate
+    fi
+}
+
+# unset irrelevant variables
+deactivate nondestructive
+
+VIRTUAL_ENV="/home/wistbean/githubproject/learn_python3_spider/stackoverflow/venv"
+export VIRTUAL_ENV
+
+_OLD_VIRTUAL_PATH="$PATH"
+PATH="$VIRTUAL_ENV/bin:$PATH"
+export PATH
+
+# unset PYTHONHOME if set
+# this will fail if PYTHONHOME is set to the empty string (which is bad anyway)
+# could use `if (set -u; : $PYTHONHOME) ;` in bash
+if [ -n "${PYTHONHOME:-}" ] ; then
+    _OLD_VIRTUAL_PYTHONHOME="${PYTHONHOME:-}"
+    unset PYTHONHOME
+fi
+
+if [ -z "${VIRTUAL_ENV_DISABLE_PROMPT:-}" ] ; then
+    _OLD_VIRTUAL_PS1="${PS1:-}"
+    if [ "x(venv) " != x ] ; then
+        PS1="(venv) ${PS1:-}"
+    else
+        if [ "`basename \"$VIRTUAL_ENV\"`" = "__" ] ; then
+            # special case for Aspen magic directories
+            # see http://www.zetadev.com/software/aspen/
+            PS1="[`basename \`dirname \"$VIRTUAL_ENV\"\``] $PS1"
+        else
+            PS1="(`basename \"$VIRTUAL_ENV\"`)$PS1"
+        fi
+    fi
+    export PS1
+fi
+
+# This should detect bash and zsh, which have a hash command that must
+# be called to get it to forget past commands. Without forgetting
+# past commands the $PATH changes we made may not be respected
+if [ -n "${BASH:-}" -o -n "${ZSH_VERSION:-}" ] ; then
+    hash -r
+fi

stackoverflow/venv/bin/activate.csh

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# This file must be used with "source bin/activate.csh" *from csh*.
+# You cannot run it directly.
+# Created by Davide Di Blasi <[email protected]>.
+# Ported to Python 3.3 venv by Andrew Svetlov <[email protected]>
+
+alias deactivate 'test $?_OLD_VIRTUAL_PATH != 0 && setenv PATH "$_OLD_VIRTUAL_PATH" && unset _OLD_VIRTUAL_PATH; rehash; test $?_OLD_VIRTUAL_PROMPT != 0 && set prompt="$_OLD_VIRTUAL_PROMPT" && unset _OLD_VIRTUAL_PROMPT; unsetenv VIRTUAL_ENV; test "\!:*" != "nondestructive" && unalias deactivate'
+
+# Unset irrelevant variables.
+deactivate nondestructive
+
+setenv VIRTUAL_ENV "/home/wistbean/githubproject/learn_python3_spider/stackoverflow/venv"
+
+set _OLD_VIRTUAL_PATH="$PATH"
+setenv PATH "$VIRTUAL_ENV/bin:$PATH"
+
+
+set _OLD_VIRTUAL_PROMPT="$prompt"
+
+if (! "$?VIRTUAL_ENV_DISABLE_PROMPT") then
+    if ("venv" != "") then
+        set env_name = "venv"
+    else
+        if (`basename "VIRTUAL_ENV"` == "__") then
+            # special case for Aspen magic directories
+            # see http://www.zetadev.com/software/aspen/
+            set env_name = `basename \`dirname "$VIRTUAL_ENV"\``
+        else
+            set env_name = `basename "$VIRTUAL_ENV"`
+        endif
+    endif
+    set prompt = "[$env_name] $prompt"
+    unset env_name
+endif
+
+alias pydoc python -m pydoc
+
+rehash
