
Commit 97fdbdc

Merge commit, 2 parents: bd3af68 + 065f238
File tree

2 files changed: +54 −0 lines

moumoubaimifan/README.md

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ Code repository for the Python技术 WeChat official-account articles

## Example code

+ [Full-site girl-image crawler for jandan.net](https://github.com/JustDoPython/python-examples/blob/master/moumoubaimifan/jandan.py)

[Playwright — a crawler tool that automatically records Python scripts!](https://github.com/JustDoPython/python-examples/blob/master/moumoubaimifan/playwright)

[Download high-resolution League of Legends skins with coroutines](https://github.com/JustDoPython/python-examples/blob/master/moumoubaimifan/lol)

moumoubaimifan/jandan.py

Lines changed: 52 additions & 0 deletions
import random
import time

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
}


def get_html(url):
    # Fetch a page and parse it with an explicit parser.
    resp = requests.get(url=url, headers=headers)
    soup = BeautifulSoup(resp.text, 'html.parser')
    return soup


def get_next_page(soup):
    # jandan.net marks the link to the older comment page with the
    # 'previous-comment-page' class; its href is protocol-relative.
    next_page = soup.find(class_='previous-comment-page')
    next_page_href = next_page.get('href')
    return f'http:{next_page_href}'


def get_img_url(soup):
    # Collect the full-size image links ('view_img_link' anchors).
    a_list = soup.find_all(class_='view_img_link')
    urls = []
    for a in a_list:
        href = 'http:' + a.get('href')
        urls.append(href)
    return urls


def save_image(urls):
    # Download each image, keep its original file name, and sleep a few
    # seconds between requests to avoid hammering the site.
    for item in urls:
        name = item.split('/')[-1]
        resp = requests.get(url=item, headers=headers)
        with open('D:/xxoo/' + name, 'wb') as f:
            f.write(resp.content)
        time.sleep(random.randint(2, 5))


if __name__ == "__main__":
    url = 'http://jandan.net/girl'
    while True:
        soup = get_html(url)
        urls = get_img_url(soup)

        if not urls:
            break

        save_image(urls)
        url = get_next_page(soup)
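The script writes to a hard-coded Windows folder ('D:/xxoo/') and fails if that folder does not exist. Below is a minimal sketch of an alternative save_image, assuming you want the output directory created on demand; the SAVE_DIR name is illustrative and not part of the committed script.

# Hypothetical variant of save_image; SAVE_DIR is an assumed, illustrative name.
from pathlib import Path

import requests

SAVE_DIR = Path('D:/xxoo')  # assumption: same target folder as the original script

def save_image(urls, headers):
    SAVE_DIR.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    for item in urls:
        name = item.split('/')[-1]
        resp = requests.get(url=item, headers=headers)
        (SAVE_DIR / name).write_bytes(resp.content)

Swapping pathlib for string concatenation also keeps the path handling portable if the script is run on a non-Windows machine.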
