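Basics
Data work generally falls into a few stages: data wrangling and cleaning, analysis and modeling, and visualization and tabulation. pandas and NumPy arrays (npArrays) are well suited to all of them.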
import pandas as pd
import numpy as np
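Data structures
Name | Description |
---|---|
Series | Labeled, one-dimensional, homogeneous array |
DataFrame | The structure used most in pandas: a labeled, size-mutable, two-dimensional, heterogeneous table |
numpy.ndarray | The structure used most in NumPy: an unlabeled, fixed-size, N-dimensional, homogeneous array; referred to below as npArrays |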
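As a rough illustration of the three structures, the sketch below builds one of each. The values, index labels, and column names are made up for the example.

```python
import numpy as np
import pandas as pd

# Series: one-dimensional, a single dtype, addressed by index labels
s = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

# DataFrame: two-dimensional, each column may have its own dtype
df = pd.DataFrame({"name": ["x", "y"], "score": [90, 85]})

# ndarray: N-dimensional, a single dtype, addressed by position only
arr = np.array([[1, 2, 3], [4, 5, 6]])

print(s["b"])      # label-based access
print(df.dtypes)   # one dtype per column
print(arr.shape)   # (2, 3)
```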
Data format conversion
CSV to DataFrame
df = pd.read_csv(path,header=None, encoding="gbk")
DataFrame to CSV
df_name.to_csv(path,sep=',',index=False)
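A minimal round-trip sketch of the two CSV calls, assuming a temporary directory and a hypothetical file name `example.csv`. It also shows what `header=None` does: the first line of the file is read as data rather than as column names.

```python
import os
import tempfile

import pandas as pd

df_out = pd.DataFrame({"city": ["北京", "上海"], "count": [3, 5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")  # hypothetical file name

    # index=False keeps the row index out of the file
    df_out.to_csv(path, sep=",", index=False, encoding="gbk")

    # header=None: the first line is read as data, columns are numbered 0, 1, ...
    df_raw = pd.read_csv(path, header=None, encoding="gbk")

    # Default header handling: the first line supplies the column names
    df_named = pd.read_csv(path, encoding="gbk")

print(df_raw.shape)    # (3, 2)
print(df_named.shape)  # (2, 2)
```

The encoding passed to `read_csv` has to match the one the file was written with; `gbk` is used here only because the original snippet uses it.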
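list to DataFrame
# Add a list as a new column of an existing DataFrame
df['column_name'] = List
# Or build a new DataFrame from several lists
c = {"column_1": ls_1, "column_2": ls_2}
df = pd.DataFrame(c, columns=['column_1', 'column_2'])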
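A self-contained version of the list conversions, with hypothetical sample lists so that it runs as written:

```python
import pandas as pd

ls_1 = [1, 2, 3]          # hypothetical sample data
ls_2 = ["a", "b", "c"]

# Build a DataFrame from a dict of equal-length lists
df = pd.DataFrame({"column_1": ls_1, "column_2": ls_2})

# Add another list as a new column; its length must match the row count
df["column_3"] = [0.1, 0.2, 0.3]

# A list of rows also works when the column names are passed separately
rows = [[1, "a"], [2, "b"]]
df_rows = pd.DataFrame(rows, columns=["column_1", "column_2"])

print(df.shape)       # (3, 3)
print(df_rows.shape)  # (2, 2)
```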
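DataFrame to list
First convert the DataFrame to an np.ndarray with np.array(), then convert that array to a list with tolist().
data = np.array(df).tolist()
# as_matrix() was removed in pandas 1.0; select the columns, then call to_numpy()
data = df[['column_1', 'column_2']].to_numpy().tolist()
data = df.values.tolist()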
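Which list shape comes out depends on the call. The sketch below, on a small made-up DataFrame, compares a list of rows, a column subset, a single flat column, and a list of per-row dicts.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

rows = df.values.tolist()                # list of rows
subset = df[["a"]].to_numpy().tolist()   # selected columns only
column = df["b"].tolist()                # one column as a flat list
records = df.to_dict("records")          # list of dicts, one per row

print(rows)     # [[1, 3], [2, 4]]
print(subset)   # [[1], [2]]
print(column)   # [3, 4]
print(records)  # [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]
```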
Series to DataFrame
dict_name = {'column_name': Series.values}
df = pd.DataFrame(dict_name)
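One thing to watch: going through `.values` discards the Series index. If the labels matter, `Series.to_frame()` is an alternative that keeps them. The sketch below uses a made-up Series to show the difference.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["x", "y", "z"], name="value")

# Via .values: the original index labels are discarded
df_values = pd.DataFrame({"column_name": s.values})

# Via to_frame(): the index labels are kept
df_frame = s.to_frame(name="column_name")

print(list(df_values.index))  # [0, 1, 2]
print(list(df_frame.index))   # ['x', 'y', 'z']
```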
npArrays to DataFrame
df = pd.DataFrame(npArrays, columns=['column_name'], index=None)
DataFrame to npArrays
df = pd.read_csv("data.csv")
data_np = np.array(df)
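A small round trip between the two structures, using a made-up array in place of `data.csv`. `columns` needs one name per array column, and a DataFrame with mixed column types converts to a single `object` array.

```python
import numpy as np
import pandas as pd

arr = np.array([[1.0, 2.0], [3.0, 4.0]])

df = pd.DataFrame(arr, columns=["a", "b"])  # one name per array column
back = df.to_numpy()                        # same values as np.array(df)

# Mixed column types collapse to a single object dtype
mixed = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]}).to_numpy()

print(back.dtype)   # float64
print(mixed.dtype)  # object
```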
list to npArrays
npArrays = np.array(List)
npArrays to list
List = npArrays.tolist()
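A quick check of the round trip on a made-up nested list: `np.array()` turns a list of equal-length rows into a 2-D array, and `tolist()` returns plain Python values again.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]

arr = np.array(nested)      # shape (2, 3), integer dtype
round_trip = arr.tolist()   # nested list of plain Python ints

print(arr.shape)                # (2, 3)
print(round_trip == nested)     # True
print(type(round_trip[0][0]))   # <class 'int'>
```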
Processing data
Extracting data from a DataFrame
column_headers = list(df.columns.values)
data = df.values
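Handling NaN values (missing or undefined floating-point values)
# Drop every row that contains a NaN
df.dropna(inplace=True)
# Or replace each NaN with a chosen value x
df.fillna(x, inplace=True)
# Check each column for NaN; df.isna() also works on non-numeric columns, where np.isnan(df) raises
print(df.isna().any())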
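The sketch below runs the same three operations on a small made-up DataFrame, without `inplace=True` so the original stays available for comparison. The fill values are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, "z"]})

# Per-column flag: does the column contain any missing value?
print(df.isna().any())

# Keep only the rows with no missing values
dropped = df.dropna()

# Replace missing values, with a different fill value per column
filled = df.fillna({"a": 0.0, "b": "unknown"})

print(len(dropped))          # 2
print(filled["a"].tolist())  # [1.0, 0.0, 3.0]
```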
Concatenating DataFrames vertically
df = pd.concat([df_1, df_2])
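By default `pd.concat` keeps each input's own index, so the result can contain repeated index labels. Passing `ignore_index=True` renumbers the rows. A small sketch with made-up frames:

```python
import pandas as pd

df_1 = pd.DataFrame({"a": [1, 2]})
df_2 = pd.DataFrame({"a": [3, 4]})

stacked = pd.concat([df_1, df_2])                         # index: 0, 1, 0, 1
renumbered = pd.concat([df_1, df_2], ignore_index=True)   # index: 0, 1, 2, 3

print(list(stacked.index))     # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
```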
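Deleting a column
# Pass axis by keyword; the positional form drop("name", 1) no longer works in pandas 2.0+
df_del = df.drop("column_name", axis=1)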
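An equivalent spelling uses the `columns=` keyword, which some find easier to read than `axis=1`. In both forms `drop` returns a new DataFrame and leaves the original unchanged. The column names below are made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3]})

df_del = df.drop(columns="b")               # drop one column
df_del_many = df.drop(columns=["a", "c"])   # drop several at once

print(list(df_del.columns))       # ['a', 'c']
print(list(df_del_many.columns))  # ['b']
print(list(df.columns))           # ['a', 'b', 'c']
```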
Transposing npArrays
npArrays_T = npArrays.transpose()
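Note that `transpose()` returns a view rather than a copy, so writing to the transposed array also changes the original. The sketch below demonstrates this on a made-up array; call `.copy()` on the result when an independent array is needed.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
arr_t = arr.transpose()            # shape (3, 2); same as arr.T

# The transposed array shares memory with the original
arr_t[0, 1] = 99
print(arr[1, 0])                   # 99

# An independent copy is not affected by later writes to arr
arr_t_copy = arr.transpose().copy()
arr[0, 0] = -1
print(arr_t_copy[0, 0])            # 0
```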
Saving data
np.savetxt("filename.csv", npArrays, delimiter=',', fmt='%s')
df.to_csv(path,sep=',',index=False)
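To check that a file written by `np.savetxt` reads back correctly, one option is `np.loadtxt` with the same delimiter. The sketch below assumes a purely numeric array and writes to a temporary directory under a hypothetical file name.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1.5, 2.0], [3.25, 4.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")  # hypothetical file name

    # fmt='%s' writes each value using its string form
    np.savetxt(path, arr, delimiter=",", fmt="%s")

    # Read the file back with the same delimiter
    loaded = np.loadtxt(path, delimiter=",")

print(np.array_equal(arr, loaded))  # True
```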
Tips: this post will be updated and improved over time.