Source: the 吾爱破解 (52pojie) forum
This post was last edited by ljl764439816 on 2019-9-4 13:40
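Requires pandas; the script processes CSV files.
Both the input and the output are CSV files.
Example input (before conversion):
after_ip,begain_port,end_port,before_ip // pre-conversion IP, start port, end port, post-conversion IP
3074337170,40960,41215,2111136891
Example output (after conversion):
after_ip,begain_port,end_port,before_ip
['183.62.169.146'],40960,41215,['125.213.100.123']
I have not found a way to get rid of the [' and '] that end up around each converted address.
Expected output (the symbols can be stripped in a second pass with UE):
183.62.169.146,40960,41215,125.213.100.123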
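The integers in the examples are ordinary IPv4 addresses read as one 32-bit number: a.b.c.d is stored as a·256³ + b·256² + c·256 + d. A minimal sketch for checking the sample row by hand, using only the standard library (the helper name `int_to_ipv4` is an illustration, not something from the post):

```python
import ipaddress


def int_to_ipv4(value):
    """Peel off four octets, most significant first, using divmod."""
    octets = []
    remainder = int(value)
    for power in (3, 2, 1, 0):
        octet, remainder = divmod(remainder, 256 ** power)
        octets.append(str(octet))
    return ".".join(octets)


# Sample row from the post
print(int_to_ipv4(3074337170))  # 183.62.169.146
print(int_to_ipv4(2111136891))  # 125.213.100.123

# Cross-check against the standard library's own conversion
assert int_to_ipv4(3074337170) == str(ipaddress.IPv4Address(3074337170))
```

For example, 183·256³ + 62·256² + 169·256 + 146 = 3074337170, which matches the first column of the sample input.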
[Python]
import pandas as pd


def decode_ip_column(ips):
    # Convert a column of integer addresses to dotted-decimal IPv4
    # (used for both after_ip and before_ip)
    ip_list = []
    for ip in ips:
        floor_list = []
        yushu = ip  # remainder left after each octet is taken off
        for i in reversed(range(4)):
            res = divmod(yushu, 256 ** i)
            floor_list.append(str(res[0]))
            yushu = res[1]
        dotted = '.'.join(floor_list)
        # split() wraps the string in a one-element list; this is what
        # puts the [' and '] around each address in the output file
        result_list = dotted.split()
        ip_list.append(result_list)
    return ip_list


def decode_begin_port(ports):
    # Convert the start ports to a list of strings (not called below)
    return [str(port) for port in ports]


def decode_end_port(ports):
    # Convert the end ports to a list of strings (not called below)
    return [str(port) for port in ports]


# Debug settings: nrows = total rows to read, chunksize = rows per chunk
csv_reader = pd.read_csv('explongip.csv',
                         usecols=['after_ip', 'begain_port', 'end_port', 'before_ip'],
                         nrows=200, chunksize=100)
# For the real file, read 1,000,000 rows per chunk instead:
# csv_reader = pd.read_csv('explongip.csv',
#                          usecols=['after_ip', 'begain_port', 'end_port', 'before_ip'],
#                          chunksize=1000000)

csv_file = pd.DataFrame()
count = 0  # number of chunks processed so far
for chunk in csv_reader:  # read the file chunk by chunk
    # pd.concat is used because DataFrame.append no longer exists in pandas 2.0+
    csv_file = pd.concat([csv_file, chunk], ignore_index=True)
    # Pull the columns out of the DataFrame
    after_ip = csv_file.loc[:, 'after_ip']
    begain_port = csv_file.loc[:, 'begain_port']
    end_port = csv_file.loc[:, 'end_port']
    before_ip = csv_file.loc[:, 'before_ip']
    # Build a dict of columns and turn it into a DataFrame
    data = {'after_ip': decode_ip_column(after_ip), 'begain_port': begain_port,
            'end_port': end_port, 'before_ip': decode_ip_column(before_ip)}
    exp_file = pd.DataFrame(data)
    # Append to the output CSV, writing the header row only for the first chunk
    exp_file.to_csv("decode_ip.csv", mode='a', index=False, header=(count == 0))
    count = count + 1
    print("The loop count:", count)
    # Reset csv_file after every chunk; otherwise large inputs raise MemoryError
    csv_file = pd.DataFrame()
    print(csv_file)  # check that it was reset to empty
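The [' and '] in the output come from storing each address as a one-element list (the result of `split()`), which pandas then writes out as the list's text form. One possible way to get the expected output directly is sketched below, assuming every value in the two IP columns is a valid 32-bit integer. The names `int_to_ipv4`, `convert` and `COLUMNS` are illustrative, and the standard-library `ipaddress` module is swapped in for the hand-written divmod loop.

```python
import ipaddress

import pandas as pd

COLUMNS = ['after_ip', 'begain_port', 'end_port', 'before_ip']


def int_to_ipv4(value):
    # Return a plain string, not a list, so to_csv writes no brackets
    return str(ipaddress.IPv4Address(int(value)))


def convert(src, dst, chunksize=1000000):
    """Read src in chunks and write the decoded rows to dst.

    src is a path or file-like object; dst is an open text file.
    """
    reader = pd.read_csv(src, usecols=COLUMNS, chunksize=chunksize)
    for count, chunk in enumerate(reader):
        chunk = chunk[COLUMNS].copy()
        for col in ('after_ip', 'before_ip'):
            chunk[col] = chunk[col].map(int_to_ipv4)
        # Write the header only once, at the top of the output
        chunk.to_csv(dst, index=False, header=(count == 0))


# Usage (file names taken from the post):
# with open('decode_ip.csv', 'w', newline='') as out:
#     convert('explongip.csv', out)
```

Because each chunk is converted and written before the next one is read, only one chunk is held in memory at a time, and no second pass with a text editor is needed to strip symbols.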