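Construction:
pd.Series builds a single column of data (a one-dimensional labeled array, not a data frame) with an index attached; np.nan marks a missing value.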
import numpy as np
import pandas as pd
s=pd.Series([1,2,3,np.nan,4,5])
print(s)
Result:
0 1.0
1 2.0
2 3.0
3 NaN
4 4.0
5 5.0
dtype: float64
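The output above shows dtype float64 even though most inputs were integers, because np.nan is a float and forces the whole column to float. A minimal sketch of how that missing value can be detected, dropped, or filled follows; the variable names and the fill value 0 are illustrative choices, not part of the original notes.

```python
import numpy as np
import pandas as pd

# Same data as above; the NaN forces the integers to be stored as float64.
s = pd.Series([1, 2, 3, np.nan, 4, 5])

missing_mask = s.isna()              # boolean Series, True where the value is NaN
n_missing = int(missing_mask.sum())  # number of missing entries
cleaned = s.dropna()                 # drops the NaN row, keeps the original labels
filled = s.fillna(0)                 # replaces NaN with 0 (an arbitrary choice here)

print(n_missing)
print(cleaned.index.tolist())
print(filled.tolist())
```

Note that dropna keeps the original index labels, so label 3 is simply absent from the cleaned Series.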
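Construction:
pd.date_range builds a DatetimeIndex (a sequence of dates, not a data frame); the number of periods and the frequency unit are optional. '20201024' is the start date. freq='D' steps by day, while 'M' and 'Y' produce month-end and year-end dates (recent pandas releases spell these 'ME' and 'YE'); the default is 'D'.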
dates=pd.date_range('20201024',periods=6,freq='D')
print(dates)
Result:
DatetimeIndex(['2020-10-24', '2020-10-25', '2020-10-26', '2020-10-27',
'2020-10-28', '2020-10-29'],
dtype='datetime64[ns]', freq='D')
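As a rough illustration of how the freq argument changes the result, the sketch below compares the default daily frequency with 'MS' (month start), and also builds the same daily range from a start and an end date instead of a period count. 'MS' is used here only because its spelling is the same across pandas versions; the variable names are illustrative.

```python
import pandas as pd

# No freq given: pandas falls back to daily ('D') steps.
default_daily = pd.date_range('20201024', periods=6)

# 'MS' snaps to the first day of each month on or after the start date.
month_starts = pd.date_range('20201024', periods=3, freq='MS')

# A range can also be described by start and end instead of periods.
by_end = pd.date_range(start='20201024', end='20201029')

print(default_daily.freqstr)
print(month_starts.strftime('%Y-%m-%d').tolist())
print(len(by_end))
```

Because 2020-10-24 is not the first day of a month, the month-start range begins at 2020-11-01 rather than at the start date itself.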
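Construction:
Build a data frame together with its index and column names: index sets the row labels and columns sets the column names. np.random.randn(6,4) is a six-row, four-column matrix of samples drawn from the standard normal distribution.
np.arange(12).reshape((3,4)) arranges the integers 0-11 into three rows and four columns.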
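# 6x4 standard-normal samples, indexed by the dates built above
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=['a', 'b', 'c', 'd'])
# Integers 0-11 in 3 rows x 4 columns; index and columns default to 0, 1, 2, ...
df1 = pd.DataFrame(np.arange(12).reshape((3, 4)))
# Dict of columns: scalar values are broadcast to match the 4-row columns
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(['test', 'train', 'test', 'train']),
                    'F': 'fool'})
print(df)
print(df1)
print(df2)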
Result (the values in df are random, so they differ on every run):
                   a         b         c         d
2020-10-24 -1.064660 -0.228393 -1.897609  0.566754
2020-10-25  0.277156  1.308101  0.292550 -0.786283
2020-10-26 -0.090731  0.890201 -0.169900 -0.068995
2020-10-27  1.411326  0.943571  0.797507  1.352421
2020-10-28 -0.403858  1.016165 -0.784167 -2.197176
2020-10-29  0.809139  0.278650  1.226037 -0.384152
   0  1   2   3
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11
     A          B    C  D      E     F
0  1.0 2013-01-02  1.0  3   test  fool
1  1.0 2013-01-02  1.0  3  train  fool
2  1.0 2013-01-02  1.0  3   test  fool
3  1.0 2013-01-02  1.0  3  train  fool
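The dict-based frame mixes several column types, which the printed table does not show directly. A small sketch, assuming the same df2 definition as above, inspects the shape and per-column dtypes; the exact datetime unit and the string column's dtype can vary between pandas versions, so they are checked loosely here.

```python
import numpy as np
import pandas as pd

# Same dict-of-columns construction as above; scalars are broadcast to 4 rows.
df2 = pd.DataFrame({'A': 1.,
                    'B': pd.Timestamp('20130102'),
                    'C': pd.Series(1, index=list(range(4)), dtype='float32'),
                    'D': np.array([3] * 4, dtype='int32'),
                    'E': pd.Categorical(['test', 'train', 'test', 'train']),
                    'F': 'fool'})

print(df2.shape)    # (rows, columns)
print(df2.dtypes)   # one dtype per column

# Column B holds timestamps, whatever resolution this pandas version picks.
b_is_datetime = pd.api.types.is_datetime64_any_dtype(df2['B'])
print(b_is_datetime)
```

Each column keeps the dtype it was given (float32 for C, int32 for D, category for E), which is the main difference from a plain NumPy array, where all elements share one dtype.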