Source: the 52pojie forum (吾爱破解论坛)
Last edited by 天空宫阙 on 2019-12-15 15:54
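I recently read a very detailed post on reverse-engineering the JavaScript behind Meipai videos (https://www.52pojie.cn/thread-1068403-1-1.html), so I decided to try it myself.
The Meipai page source contains an attribute such as data-video="2301aHR0tcDovL212dmlkZW8xMS5tZWl0dWRhdGEuY29tLzVkZjBlMDYyNjI1NWNIMjY0V0VCMTgxMDA0X0gyNjRfTVA1ZGYwZjMuo3lsddbXA0". Once decoded, it is the real address of the video.
The decoding routine lives in the site's JavaScript and is not obfuscated, which makes it great practice material. Open Chrome DevTools and capture the traffic.
At first I could not locate the code that decodes the video address. After looking around, I learned that a global search for decode is enough, and that led me to the right JS file.
Seeing something like src: ... told me I was close. Searching further for decodeMp4 pinpointed the decoding function.
The function found this way is the JavaScript that decodes the video address, and it is not obfuscated.
After extracting it by hand, I ended up with the following code: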
[JavaScript]
function atob(input) {
    // Base64 alphabet; '=' sits at index 64 and only ever appears as padding
    var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
    input = input.replace(/=+$/, '');
    if (input.length % 4 == 1) throw new Error("Invalid character: the string is not correctly encoded.");
    for (
        // initialize result and counters
        var bc = 0, bs, buffer, idx = 0, output = '';
        // get next character
        buffer = input.charAt(idx++);
        // character found in table? initialize bit storage and add its ascii value;
        ~buffer && (bs = bc % 4 ? bs * 64 + buffer : buffer,
            // and if not first of each 4 characters,
            // convert the first 8 bits to one ascii character
            bc++ % 4) ? output += String.fromCharCode(255 & bs >> (-2 * bc & 6)) : 0
    ) {
        // try to find character in table (0-63, not found => -1)
        buffer = chars.indexOf(buffer);
    }
    return output;
}
var decodeMp4 = {
    getHex: function (str) {
        return {
            str: str['substring'](4),
            hex: str['substring'](0, 4)['split']('').reverse().join('')
        };
    },
    getDec: function (hex) {
        var dec = parseInt(hex, 16).toString();
        return {
            pre: dec['substring'](0, 2)['split'](''),
            tail: dec['substring'](2)['split']('')
        };
    },
    substr: function (str, pos) {
        var str0 = str['substring'](0, pos[0]);
        var str1 = str['substr'](pos[0], pos[1]);
        return str0 + str['substring'](pos[0])['replace'](str1, '');
    },
    getPos: function (str, pos) {
        pos[0] = str.length - pos[0] - pos[1];
        return pos;
    },
    decode: function (str) {
        var result0 = this.getHex(str);
        var dec = this.getDec(result0.hex);
        var result1 = this['substr'](result0.str, dec.pre);
        return atob(this['substr'](result1, this.getPos(result1, dec.tail)));
    }
};

var origin_str = '0c91Ly9tdnMLmfNZpZGVvMTAubWVpdHVkYXRhLmNvbS81ZGJkNTE0OWU0YTc2N2dnMXNmbmltMTIwMl9IMjY0X01QNWRiZDUAKzLm1wNA==';

function decodeVideo(str) {
    return decodeMp4.decode(str);
}
// console.log(decodeVideo(origin_str));
// console.log(decodeMp4.decode(origin_str));
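The Python library js2py can then run this JS code.
Of course, the logic can also be ported entirely to Python. From a quick read, the JS takes the first four characters (hex digits), uses them to cut some junk characters out of the rest of the string, and then Base64-decodes what is left. The result is the real video address.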
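The steps described above can be sketched in pure Python, without js2py. This is a minimal port of the decodeMp4 logic, not code from the original post: the function name decode_meipai is hypothetical, and the sketch assumes the reversed hex prefix always converts to a four-digit decimal number, as it does for both sample strings in this post.

```python
import base64


def decode_meipai(encoded):
    """Pure-Python sketch of decodeMp4.decode (hypothetical name)."""
    # The first four characters, reversed, form a hex number.
    # Its decimal digits say where the junk characters are.
    hex_part, body = encoded[:4][::-1], encoded[4:]
    digits = str(int(hex_part, 16))
    pre, tail = digits[:2], digits[2:]

    def cut(text, start, length):
        # Remove `length` characters beginning at index `start`.
        return text[:start] + text[start + length:]

    # First cut: offset and length are the first two digits.
    body = cut(body, int(pre[0]), int(pre[1]))
    # Second cut: the offset is measured from the end of the trimmed string.
    start = len(body) - int(tail[0]) - int(tail[1])
    body = cut(body, start, int(tail[1]))
    # Pad to a multiple of four so b64decode accepts unpadded input.
    body += '=' * (-len(body) % 4)
    return base64.b64decode(body).decode('utf-8')


origin_str = '0c91Ly9tdnMLmfNZpZGVvMTAubWVpdHVkYXRhLmNvbS81ZGJkNTE0OWU0YTc2N2dnMXNmbmltMTIwMl9IMjY0X01QNWRiZDUAKzLm1wNA=='
print(decode_meipai(origin_str))
# //mvvideo10.meitudata.com/5dbd5149e4a767gg1sfnim1202_H264_MP5dbd53.mp4
```

For the sample string, the prefix 0c91 reversed is 19c0, which is 6592 in decimal: five characters are removed at offset 6, then two characters are removed starting 11 characters from the end.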
With a way to decode the obfuscated URL in data-video, the rest is simple.
Below is the Python code that crawls every Meipai video a user has published:
[Python]
import re
import time

import js2py
import requests
from bs4 import BeautifulSoup

# Load the extracted JS once instead of re-reading it for every video;
# meipai.js is the file that defines decodeVideo
context = js2py.EvalJs()
with open("meipai.js", "r", encoding="utf-8") as f:
    context.execute(f.read())


# Run the JS decoder from Python to recover the video address
def get_video_url(encoded):
    result = context.decodeVideo(encoded)
    # A decoded address may lack a scheme ("//host/..."), so add one
    if not result.startswith('http'):
        result = 'http:' + result
    return result


headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36'
}


def get_one_page(url):
    data_video_list = []
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'lxml')
        mediasList = soup.select('#mediasList')[0]
        medias = mediasList.select('li.feed-item.pr')
        for media in medias:
            title = media.select('.detail-cover-title.break')
            description = media.select('.feed-description.break')
            description = description[0].get_text().strip() if description else ''
            # Prefer the title, fall back to the description, then to a placeholder
            if title:
                title = title[0].get_text().strip()
            else:
                title = description or 'Untitled'
            # The encoded address is stored in the data-video attribute
            matched = re.search(r'data-video="(.*?)"', str(media), re.S)
            if matched:
                data_video_list.append({
                    'title': title,
                    'origin': matched.group(1),
                })
    return data_video_list


def get_total_page(url):
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        # No pages to crawl if the profile page could not be fetched
        return 0
    soup = BeautifulSoup(response.text, 'lxml')
    num_video = int(soup.select('#rightUser > div.user-num > a.user-num-item.user-hv.first.dbl > span.user-txt.pa')[0].string)
    # 24 videos per page; ceiling division avoids requesting an empty
    # extra page when the count is an exact multiple of 24
    return (num_video + 23) // 24


if __name__ == "__main__":
    # Change the id at the end of the URL to collect all videos of another user
    # base_url = 'https://www.meipai.com/user/35498115'
    base_url = 'https://www.meipai.com/user/30427636'
    total_page = get_total_page(base_url)
    print(total_page)
    for i in range(1, total_page + 1):
        time.sleep(0.5)
        url = base_url + '?p={}'.format(i)
        data_video_list = get_one_page(url)
        for each_video in data_video_list:
            video_url = get_video_url(each_video['origin'])
            print(each_video['title'], video_url)
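Now that you have the download addresses, the download code is left for you to write. At worst, sleep for a while after each download and everything will finish eventually.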
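For readers who want a starting point, here is a minimal sketch of the download step. It is not part of the original post: the names safe_filename and download_video, the videos output folder, and the one-second pause are all assumptions, and it uses the standard library's urllib.request rather than requests so that it runs without extra packages.

```python
import re
import shutil
import time
import urllib.request
from pathlib import Path


def safe_filename(title, max_len=80):
    """Turn a video title into a filesystem-friendly name (hypothetical helper)."""
    # Collapse whitespace and characters that are illegal on common filesystems.
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', '_', title).strip('_')
    return (cleaned or 'untitled')[:max_len]


def download_video(video_url, title, out_dir='videos', pause=1.0):
    """Stream one video to disk, then pause before the caller fetches the next."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    target = Path(out_dir) / (safe_filename(title) + '.mp4')
    request = urllib.request.Request(video_url, headers={'User-Agent': 'Mozilla/5.0'})
    # copyfileobj streams in chunks, so the whole video never sits in memory.
    with urllib.request.urlopen(request, timeout=30) as response, open(target, 'wb') as f:
        shutil.copyfileobj(response, f)
    time.sleep(pause)
    return target
```

In the crawl loop, the call would replace the final print, for example download_video(video_url, each_video['title']).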
Result of a run:
It also collects each video's title or description. Thoughtful, right?
The code is posted here. If it helped you, please give the thread a free rating.
https://www.lanzous.com/i815a2b