Baidu Wenku Crawler: Batch Download

piaodoo, Programming Tutorials, 2020-02-22 22:06:54, Python tutorials

Source: the 52pojie (吾爱破解) forum

This post was last edited by hksnow on 2019-8-15 13:13

Preface

I needed to check my answers against a workbook's answer key. I found the answers through Baidu and wanted to download them.
Code
import json
import os

import requests
# from concurrent import futures  # thread pool left disabled by the author

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'}
# Example API request:
# https://wk.baidu.com/bigque/jsonp/naapi/answer/getwapflowanswerview?na_uncheck=1&pn=1&rn=5&answer_id=fa3ab7c30c22590102029d3f&sign=meow
API_URL = 'https://wk.baidu.com/bigque/jsonp/naapi/answer/getwapflowanswerview'


def download_file(file_url, download_path):
    """Save one image as <file_num>.jpg and advance the global counter."""
    global file_num
    response = requests.get(file_url, headers=headers)
    # os.path.join keeps the path portable instead of hard-coding '\\'.
    file_name = os.path.join(download_path, str(file_num) + '.jpg')
    with open(file_name, 'wb') as image_file:
        image_file.write(response.content)
    file_num = file_num + 1


def fetch_page(answer_id, pn):
    """Request one page (up to 5 images) and return the parsed JSON."""
    params = {'na_uncheck': 1, 'pn': pn, 'rn': 5, 'answer_id': answer_id, 'sign': 'meow'}
    response = requests.get(API_URL, params=params, headers=headers)
    return json.loads(response.text)


def start(link):
    # First request: fetch the basic info (image count and title).
    answer_id = link.rstrip('/').split('/')[-1]
    json_data = fetch_page(answer_id, 1)
    img_totals = int(json_data['data']['answer_info']['pages'])
    title = json_data['data']['answer_info']['title']
    page_nums = img_totals // 5             # number of full pages
    last_page_img_totals = img_totals % 5   # images on the final, partial page
    # Second pass: fetch the image URLs page by page.
    path = os.path.join(os.getcwd(), title)
    os.makedirs(path, exist_ok=True)  # do not fail if the folder already exists
    for n in range(0, page_nums + 1):
        json_data = fetch_page(answer_id, n)
        answer_urls_list = json_data['data']['answer_urls']
        if n == page_nums:
            count = last_page_img_totals
            print('Last page!')
        else:
            count = 5
        # Slicing avoids an IndexError if fewer URLs come back than expected.
        for img_url in answer_urls_list[:count]:
            download_file(img_url, path)


if __name__ == "__main__":
    # thread = futures.ThreadPoolExecutor(max_workers=5)
    file_num = 1
    # Placeholder: paste a link of the supported form here.
    url = 'https://wk.baidu.com/bigque/book/xxxxxxxxxx22590102029d3f'
    start(url)
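
The page arithmetic in `start` can be checked without touching the network. The sketch below uses a hypothetical helper, `page_plan`, which is not part of the original script. It assumes, as the loop in the script does, that pages are requested starting from `pn=0` in batches of 5, and it lists how many images each request is expected to contribute.

```python
def page_plan(img_totals, per_page=5):
    """Return (pn, image_count) pairs the way start() walks the pages.

    Hypothetical helper for illustration only; it mirrors the // and %
    arithmetic of the script and makes no network requests.
    """
    full_pages, remainder = divmod(int(img_totals), per_page)
    plan = [(pn, per_page) for pn in range(full_pages)]
    # The script always issues one extra request for the final page,
    # even when the remainder is 0 and that page contributes no images.
    plan.append((full_pages, remainder))
    return plan


if __name__ == "__main__":
    # 12 images: two full pages of 5, then a last page with 2.
    print(page_plan(12))  # [(0, 5), (1, 5), (2, 2)]
```

When the image count is an exact multiple of 5, the last entry has a count of 0, so the final request downloads nothing.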



Only links of the form https://wk.baidu.com/bigque/book/xxxxxxxxxx22590102029d3f are supported.
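
Because only this link shape is supported, it may help to validate a link before any request is made. A minimal sketch follows; `extract_answer_id` is a hypothetical helper, and the checks it performs (host `wk.baidu.com`, a path starting with `/bigque/book/`) are assumptions drawn from the single example link above, not from any documented API.

```python
from urllib.parse import urlparse


def extract_answer_id(link):
    """Return the trailing ID of a supported link, or raise ValueError.

    Hypothetical helper: the host and path prefix checked here are
    assumptions based on the one example link given in the post.
    """
    parsed = urlparse(link.strip())
    prefix = '/bigque/book/'
    if parsed.netloc != 'wk.baidu.com' or not parsed.path.startswith(prefix):
        raise ValueError('unsupported link: ' + link)
    answer_id = parsed.path[len(prefix):].strip('/')
    if not answer_id or '/' in answer_id:
        raise ValueError('no answer ID found in link: ' + link)
    return answer_id


if __name__ == "__main__":
    print(extract_answer_id('https://wk.baidu.com/bigque/book/xxxxxxxxxx22590102029d3f'))
```

Failing early with a `ValueError` gives a clearer message than the `KeyError` that would otherwise surface when the API response lacks the expected fields.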


Article link: https://www.piaodoo.com/7570.html
