1. DataFrame
A two-dimensional array with both a row index (index) and a column index (columns).
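import numpy as np
import pandas as pd
stock = [f'股票{i}' for i in range(10)]  # row labels; 股票 means "stock"
date = pd.date_range(start='20211201', periods=5, freq='B')  # 5 business days
stock_change = np.random.normal(0, 1, (10, 5))  # assumed 10x5 array; not defined in the notes
data1 = pd.DataFrame(stock_change, index=stock, columns=date)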
data2 = pd.DataFrame({'month': [1, 2, 3, 45, 5],
                      'year': [1, 2, 3, 4, 5],
                      'sale': [20, 30, 50, 40, 80]})
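# Below, data stands for any DataFrame (for example data1).
data.head()    # first 5 rows
data.tail()    # last 5 rows
data.columns   # column labels
data.index     # row labels
data.values    # underlying NumPy array
data.shape     # (rows, columns)
data.dtypes    # dtype of each column (a DataFrame has no .dtype attribute)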
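stock_ = [f'股票_{i}' for i in range(10)]
data.index = stock_  # replace the whole index; a single label cannot be assigned in place
data.reset_index(drop=False)  # move the index into a column; drop=True discards it instead
data.set_index('股票0', drop=True)  # use the column '股票0' as the new index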
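The creation and index-editing steps above can be sketched end to end as follows. This is a minimal example under stated assumptions: stock_change is random data made up for illustration, the labels are written in English (stock0 instead of 股票0), and the sales frame is a hypothetical stand-in for data2.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 stocks x 5 business days of random daily changes.
rng = np.random.default_rng(0)
stock_change = rng.normal(0, 1, (10, 5))

stock = [f"stock{i}" for i in range(10)]
date = pd.date_range(start="20211201", periods=5, freq="B")
data = pd.DataFrame(stock_change, index=stock, columns=date)

print(data.shape)   # (rows, columns)
print(data.dtypes)  # one dtype per column

# The index is replaced as a whole; an Index object is immutable,
# so assigning to a single position such as data.index[0] fails.
data.index = [f"stock_{i}" for i in range(10)]

# set_index takes a column name and returns a new DataFrame.
sales = pd.DataFrame({"month": [1, 2, 3, 4, 5], "sale": [20, 30, 50, 40, 80]})
by_month = sales.set_index("month", drop=True)

# reset_index(drop=False) moves the index back into an ordinary column.
restored = by_month.reset_index(drop=False)
```

Both set_index and reset_index return new objects, so the result has to be assigned to a name if it is needed later.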
2. Series
A one-dimensional array with a row index.
data1=pd.Series({'red':10,'yellow':20,'black':40})
data2=pd.Series([1,2,3,4,5],index=['r','y','b','k','c'])
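As a small illustration of the two constructors above, the sketch below builds the same Series objects and reads back their index and values. The variable names colors and letters are chosen here for readability and do not come from the notes.

```python
import pandas as pd

# From a dict: the keys become the index.
colors = pd.Series({"red": 10, "yellow": 20, "black": 40})

# From a list plus an explicit index.
letters = pd.Series([1, 2, 3, 4, 5], index=["r", "y", "b", "k", "c"])

print(colors["yellow"])        # label-based lookup
print(letters.index.tolist())  # the row labels
print(letters.values)          # the underlying NumPy array
```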
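3. Basic data operations
- Indexing
data['column_name']  # select one column
data['column_name']['index_name']  # column first, then row
data.iloc[0, 0]  # by position: first row, first column
data.loc['index_name', 'column_name']  # by label: row first, then column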
- Assignment
- Sorting
data.sort_index(ascending=False)  # sort by row labels, descending
data.sort_values(by='column_name', ascending=True)  # sort rows by the values of one column
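The three operations in this section can be tried on a small frame. The sketch below is illustrative only: the column names open, close and volume and the date labels are made up for the example, and the assignment part shows one common pattern (a single loc call, or replacing a whole column), since the notes give no code for it.

```python
import pandas as pd

data = pd.DataFrame(
    {"open": [10.0, 12.0, 11.0], "close": [10.5, 11.5, 11.8]},
    index=["2021-12-01", "2021-12-02", "2021-12-03"],
)

# Indexing: column first then row, or a single loc / iloc call.
a = data["open"]["2021-12-02"]
b = data.loc["2021-12-02", "open"]
c = data.iloc[1, 0]

# Assignment: write one cell through loc, or add / replace a whole column.
data.loc["2021-12-01", "close"] = 10.8
data["volume"] = [100, 300, 200]

# Sorting: by index labels, or by the values of a column.
newest_first = data.sort_index(ascending=False)
by_volume = data.sort_values(by="volume", ascending=True)
```

Writing through a single loc call avoids chained indexing such as data['close']['2021-12-01'] = ..., which may modify a temporary copy instead of the original frame.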
4. DataFrame operations
- Arithmetic: works like arithmetic between a NumPy array and a scalar
data['股票0'] + 1
data['股票1'] * 3
- Logical operations
data['股票0'] > 0
(data['股票0'] > 0) & (data['股票1'] > 0)
data['股票0'].isin([3, 2, 1])
data.query('股票0 > 0 & 股票1 < 0')
- Statistics
data.describe()
data.idxmax(axis=0)  # row label of the maximum in each column
data['股票0'].cumsum()  # running total
- Custom functions
data.apply(lambda x: x.max() - x.min())  # applied to each column by default
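A compact way to see what each of these operations returns is to run them on a tiny frame with known values. The frame below, with columns s0 and s1, is invented for illustration and stands in for the stock columns used above.

```python
import pandas as pd

data = pd.DataFrame({"s0": [1, -2, 3, 0], "s1": [-1, 4, 2, -5]})

# Arithmetic: the scalar is broadcast to every element.
plus_one = data["s0"] + 1

# Logical: comparisons give boolean Series that can filter rows.
mask = (data["s0"] > 0) & (data["s1"] > 0)
both_positive = data[mask]
in_set = data["s0"].isin([3, 2, 1])
queried = data.query("s0 > 0 & s1 < 0")

# Statistics.
summary = data.describe()
top_rows = data.idxmax(axis=0)   # row label of each column's maximum
running = data["s0"].cumsum()    # running total

# Custom function, applied column by column.
spread = data.apply(lambda col: col.max() - col.min())
```

The parentheses around each comparison in the mask are required, because & binds more tightly than > in Python.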
5. Plotting with pandas
data.plot(x="volume", y='turnover', kind='scatter')  # data needs 'volume' and 'turnover' columns
sr.plot(kind='line')  # sr is a Series
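The two plot calls can be sketched as below. This assumes matplotlib is installed, since pandas uses it as its default plotting backend; the numbers are made up, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values for illustration.
data = pd.DataFrame({"volume": [100, 250, 180, 320],
                     "turnover": [1.2, 2.9, 2.0, 3.5]})

# Scatter plot of one column against another; returns a matplotlib Axes.
ax = data.plot(x="volume", y="turnover", kind="scatter")

# Line plot of a Series, drawn on its own figure.
sr = pd.Series([1, 3, 2, 5], name="price")
fig, ax2 = plt.subplots()
sr.plot(kind="line", ax=ax2)
```

Passing ax explicitly keeps the line plot from being drawn onto whichever axes happen to be current.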
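6. Reading and writing files
- CSV
data = pd.read_csv('./1.csv', usecols=['column_name'])  # read only the listed columns
data = pd.read_csv('./2.csv', names=['column_name'])  # supply column names for a file without a header
data.iloc[:, 0].to_csv('test.csv', index=False)
data[0:5].to_csv('test1.csv', columns=['column_name'], index=False)
data = data.drop(['ma', 'column_name'], axis=1)  # drop columns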
- HDF5
data = pd.read_hdf('./1.h5', key='variable_name')
data.iloc[:, 0:5].to_hdf('2.h5', key='variable_name')
- JSON
data = pd.read_json('./1.json', orient='records', lines=True)
data.iloc[:, 1:5].to_json('2.json', orient='records', lines=True)
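A round trip through a temporary directory shows how the CSV and JSON calls fit together. The sketch below uses an invented three-column frame and hypothetical file names. HDF5 is left out because read_hdf and to_hdf additionally need the optional PyTables package.

```python
import tempfile
from pathlib import Path

import pandas as pd

data = pd.DataFrame({"open": [10.0, 12.0],
                     "close": [10.5, 11.5],
                     "ma": [10.2, 11.0]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "test.csv"
    json_path = Path(tmp) / "test.json"

    # index=False keeps the row index out of the file.
    data.to_csv(csv_path, columns=["open", "close"], index=False)
    from_csv = pd.read_csv(csv_path, usecols=["open"])

    # orient="records" with lines=True writes one JSON object per line.
    data.to_json(json_path, orient="records", lines=True)
    from_json = pd.read_json(json_path, orient="records", lines=True)

# drop returns a new DataFrame; axis=1 means the labels are columns.
without_ma = data.drop(["ma"], axis=1)
```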