This post comes from the 52pojie forum (吾爱破解论坛).
I recently took on a pro bono request: a scraper for second-hand housing listings on Ganji.com. The script reads the total listing count from the first results page, walks the paginated list 32 listings at a time, and prints one pipe-delimited record per listing.

[Screenshot: 赶集网.png]
GitHub: https://github.com/CcphAmy/pythonDemo
[Python]
import requests
from bs4 import BeautifulSoup


class GanJi():
    """Scraper for Ganji.com second-hand housing listings."""

    def __init__(self):
        super(GanJi, self).__init__()

    def get(self, url):
        user_agent = ('Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36')
        headers = {'User-Agent': user_agent}

        # First page: read the total listing count (the span text ends in "套").
        webData = requests.get(url + 'o1', headers=headers).text
        soup = BeautifulSoup(webData, 'lxml')
        total = soup.find('span', class_="num").text.replace("套", "")

        # 32 listings per page; round the page count up.
        ave = int(total) / 32
        forNum = int(ave)
        if forNum < ave:
            forNum = forNum + 1

        for x in range(forNum):
            webData = requests.get(url + 'o' + str(x + 1), headers=headers).text
            soup = BeautifulSoup(webData, 'lxml')
            find_list = soup.find('div', class_="f-main-list").find_all(
                'div', class_="f-list-item ershoufang-list")
            for dl in find_list:
                print(dl.find('a', class_="js-title value title-font").text, end='|')  # title

                # The five info spans in the middle (rooms, size, floor, ...)
                tempDD = dl.find('dd', class_="dd-item size").find_all('span')
                for tempSpan in tempDD:
                    if not tempSpan.text == '':
                        print(tempSpan.text.replace("\n", ""), end='|')

                print(dl.find('span', class_="area").text.replace(" ", "").replace("\n", ""), end='|')  # address
                print(dl.find('div', class_="price").text.replace(" ", "").replace("\n", ""), end='|')  # total price
                print(dl.find('div', class_="time").text.replace(" ", "").replace("\n", ""), end='|')   # average (unit) price
                print("http://chaozhou.ganji.com" + dl['href'], end='|')  # listing URL (href sits on the list-item div)
                print(str(x + 1))  # page number


if __name__ == '__main__':
    temp = GanJi()
    temp.get("http://chaozhou.ganji.com/fang5/xiangqiao/")
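The script above prints pipe-delimited text to the console. If you want the data in a file instead, here is a minimal sketch of a CSV variant. The function name scrape_to_csv and the output filename are my own; the URL scheme, CSS selectors, and the 32-listings-per-page figure are taken straight from the script above, and Ganji's markup may well have changed since this was posted.

[Python]
import csv
import math

import requests
from bs4 import BeautifulSoup


def scrape_to_csv(base_url, out_path):
    # Same User-Agent trick as the original script.
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) '
                             'AppleWebKit/537.36 (KHTML, like Gecko) '
                             'Chrome/52.0.2743.82 Safari/537.36'}

    # Read the total listing count from the first page, then round the
    # page count up with math.ceil instead of the manual carry above.
    first = BeautifulSoup(requests.get(base_url + 'o1', headers=headers).text, 'lxml')
    total = int(first.find('span', class_='num').text.replace('套', ''))
    pages = math.ceil(total / 32)  # 32 listings per page

    with open(out_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['title', 'address', 'price', 'unit_price', 'link'])
        for page in range(1, pages + 1):
            soup = BeautifulSoup(requests.get(base_url + 'o' + str(page), headers=headers).text, 'lxml')
            items = soup.find('div', class_='f-main-list').find_all(
                'div', class_='f-list-item ershoufang-list')
            for dl in items:
                writer.writerow([
                    dl.find('a', class_='js-title value title-font').text.strip(),
                    dl.find('span', class_='area').text.strip().replace('\n', ''),
                    dl.find('div', class_='price').text.strip().replace('\n', '').replace(' ', ''),
                    dl.find('div', class_='time').text.strip(),
                    'http://chaozhou.ganji.com' + dl['href'],  # href sits on the list-item div
                ])


if __name__ == '__main__':
    scrape_to_csv('http://chaozhou.ganji.com/fang5/xiangqiao/', 'ganji_ershoufang.csv')

math.ceil replaces the manual round-up of the page count, and csv.writer handles quoting, so fields that happen to contain '|' no longer break the record format.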