This post originally appeared on the 吾爱破解 (52pojie) forum.
First, respect to the other crawler experts on the forum. Meizitu scrapers have been posted here before (one written with BeautifulSoup, one multi-threaded), but I promise I didn't write mine by looking at theirs; this is a simple single-threaded scraper I wrote on my own. Since some forum members may not have a Python environment, I also packaged it as an exe. Experts, please go easy on me; fellow beginners, let's learn together.
exe download: https://www.lanzous.com/i57m9wf
I'm not sure whether the exe works on a machine without Python installed; it ran fine in my own tests, so give it a try. Hopefully it gets more people interested in learning web scraping.
[Python]
import os

import requests
from lxml import etree


class Spider(object):
    def headers(self):
        # Request headers: a desktop browser User-Agent plus a Referer pointing back at the site.
        head = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0',
            'Referer': 'https://www.mzitu.com/tag/youhuo/'
        }
        self.first_request(head)

    def first_request(self, head):
        # Fetch the index page and collect every gallery's title and URL.
        url = 'http://www.mzitu.com'
        response = requests.get(url, headers=head)
        html = etree.HTML(response.content.decode('utf-8'))
        Bigtit_list = html.xpath('//ul[@id="pins"]/li/a/img/@alt')
        Bigsrc_list = html.xpath('//ul[@id="pins"]/li/a/@href')
        for Bigtit, Bigsrc in zip(Bigtit_list, Bigsrc_list):
            # One folder per gallery, named after its title.
            if not os.path.exists(Bigtit):
                os.mkdir(Bigtit)
            print(Bigsrc)
            self.second_request(Bigtit, Bigsrc, head)

    def second_request(self, Bigtit, Bigsrc, head):
        # Walk the pages of one gallery; the page count (14) is hardcoded here.
        for i in range(1, 15):
            response = requests.get(Bigsrc + '/' + str(i), headers=head)
            html = etree.HTML(response.content.decode())
            img_name = html.xpath('//div[@class="main-image"]/p/a/img/@alt')
            img_link = html.xpath('//div[@class="main-image"]/p/a/img/@src')
            for name, link in zip(img_name, img_link):
                try:
                    rst = requests.get(link, headers=head)
                    img = rst.content
                    print(link)
                    # Save the image into the gallery folder under its original file name.
                    file_name = os.path.join(Bigtit, link.split('/')[-1])
                    print('Downloading image:', name)
                    with open(file_name, 'wb') as f:
                        f.write(img)
                except Exception as err:
                    print(err)


spyder = Spider()
spyder.headers()
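In second_request the page count per gallery is hardcoded to 14 (range(1, 15)), so longer galleries get cut off and shorter ones waste requests. Below is a minimal sketch of one way to read the real last page from the gallery's pagination bar; the XPath it uses is my guess at the page markup, not something confirmed in the original post, so check it against the live page before relying on it.
[Python]
import requests
from lxml import etree


def gallery_page_count(gallery_url, head):
    # Sketch under an assumed markup: take the largest number shown in the pagination bar.
    response = requests.get(gallery_url, headers=head)
    html = etree.HTML(response.content.decode())
    labels = html.xpath('//div[@class="pagenavi"]//a/span/text()')  # assumed XPath, may need adjusting
    numbers = [int(t) for t in labels if t.isdigit()]
    return max(numbers) if numbers else 1

# Hypothetical usage inside second_request:
#     for i in range(1, gallery_page_count(Bigsrc, head) + 1):
#         ...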
The source below targets Bizhi88 (壁纸88). Yesterday a forum member asked whether I could try scraping it. I gave it a shot, but it only grabs the images the site serves to the public. For the original-quality images, I did capture their download URLs with a packet sniffer, but I'm still too much of a beginner to work out where the signature token in those URLs comes from. Sorry about that.
[Python]
import re

import requests
from lxml import etree


class Spyder(object):
    def first_url(self, page):
        # Walk the listing pages and pull each wallpaper's title and detail-page URL.
        for i in range(1, page + 1):  # include the last requested page
            url = 'http://www.bizhi88.com/s/122/' + str(i) + '.html'
            response = requests.get(url)
            html = response.content.decode()
            mid_tit_list = re.compile('<a class="title" href=".*?" target="_blank" title=".*?">(.*?)</a>').findall(html)
            mid_url_list = re.compile('<a class="title" href="(.*?)" target="_blank" title=".*?">.*?</a>').findall(html)
            for mid_tit, mid_url in zip(mid_tit_list, mid_url_list):
                self.get_url(mid_tit, mid_url)

    def get_url(self, mid_tit, mid_url):
        # Open the detail page and read the image URL out of the page body.
        url = 'http://www.bizhi88.com' + mid_url
        response = requests.get(url)
        html = etree.HTML(response.content.decode())
        new_url = html.xpath('//div[@class="layout wp-con"]/div/img/@src')
        self.data_save(new_url, mid_tit)

    def data_save(self, new_url, mid_tit):
        # Download the image and save it as <title>.jpg in the current directory.
        response = requests.get(new_url[0])
        data = response.content
        print('Downloading image:', mid_tit)
        with open(mid_tit + '.jpg', 'wb') as f:
            f.write(data)


spyder = Spyder()
page = int(input('Enter the number of listing pages to download: '))
spyder.first_url(page)
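One thing to watch in data_save: the scraped title is used directly as the file name, and titles can contain characters that Windows does not allow in file names (such as \ / : * ? " < > |), which would make open() fail. Below is a minimal cleanup sketch; sanitize_filename is a hypothetical helper name I'm introducing for illustration, not part of the original script.
[Python]
import re


def sanitize_filename(title):
    # Replace characters that are invalid in Windows file names with an underscore.
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip()

# Hypothetical usage inside data_save:
#     with open(sanitize_filename(mid_tit) + '.jpg', 'wb') as f:
#         f.write(data)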
Screenshot: 妹纸图 (Meizitu)
Screenshot: 壁纸88 (Bizhi88)