import numpy as np
import pandas as pd
Object creation
- Create a Series by passing a list of values, letting pandas create a default integer index:
s = pd.Series([1,3,5,np.nan,6,8])
s
0 1.0
1 3.0
2 5.0
3 NaN
4 6.0
5 8.0
dtype: float64
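As a quick check of the output above, here is a minimal sketch showing why the Series prints as `float64` and what the default index looks like. The names `s_int` and `s_float` are illustrative and do not appear in the original notebook.

```python
import numpy as np
import pandas as pd

# Without missing values, a list of Python ints is stored with an integer dtype.
s_int = pd.Series([1, 3, 5, 6, 8])

# np.nan is a float, so the whole Series is upcast to float64.
s_float = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s_int.dtype)
print(s_float.dtype)
print(list(s_float.index))   # default index: 0, 1, ..., len - 1
print(s_float.isna().sum())  # number of missing values
```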
- Create a DataFrame by passing a NumPy array, with a datetime index and labeled columns:
dates = pd.date_range("2022-06-07",periods=6)
dates
DatetimeIndex(['2022-06-07', '2022-06-08', '2022-06-09', '2022-06-10',
'2022-06-11', '2022-06-12'],
dtype='datetime64[ns]', freq='D')
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list("ABCD"))
df
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-07 | 0.405263 | 0.465327 | 0.076946 | -0.311546 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 |
- Create a DataFrame by passing a dictionary of objects that can be converted into a series-like structure:
df2 = pd.DataFrame(
{
"A":1.0,
"B":pd.Timestamp("20130102"),
"C":pd.Series(1,index=list(range(4)),dtype="float32"),
"D":np.array([3]*4,dtype="int32"),
"E":pd.Categorical(["test","train","test","train"]),
"F":"foo",
})
df2
| | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
| 1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
| 2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
| 3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
df3 = pd.DataFrame(
{
"A":1.0,
"B":pd.Timestamp("20130102"),
"C":pd.Series(1,index=list(range(5)),dtype="float32"),
"D":np.array([3]*5,dtype="int32"),
"E":pd.Categorical(["test","train","test","train","a"]),
"F":"foo",
})
df3
| | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| 0 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
| 1 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
| 2 | 1.0 | 2013-01-02 | 1.0 | 3 | test | foo |
| 3 | 1.0 | 2013-01-02 | 1.0 | 3 | train | foo |
| 4 | 1.0 | 2013-01-02 | 1.0 | 3 | a | foo |
df2.dtypes
A float64
B datetime64[ns]
C float32
D int32
E category
F object
dtype: object
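The dictionary constructor above mixes scalars with array-like values. The following is a small sketch, using a reduced set of the same columns under the illustrative name `frame`, of how scalars are repeated for every row while each array-like column keeps its own dtype.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {
        "A": 1.0,  # scalar float, repeated for every row
        "C": pd.Series(1, index=range(4), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "F": "foo",  # scalar string, repeated for every row
    }
)

print(frame.shape)   # the row count comes from the array-like columns
print(frame.dtypes)  # one dtype per column
```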
- If you are using IPython, tab completion for column names (as well as public attributes) is enabled automatically.
df2.A
0 1.0
1 1.0
2 1.0
3 1.0
Name: A, dtype: float64
df2.abs
<bound method NDFrame.abs of A B C D E F
0 1.0 2013-01-02 1.0 3 test foo
1 1.0 2013-01-02 1.0 3 train foo
2 1.0 2013-01-02 1.0 3 test foo
3 1.0 2013-01-02 1.0 3 train foo>
df2.add
<bound method flex_arith_method_FRAME.<locals>.f of A B C D E F
0 1.0 2013-01-02 1.0 3 test foo
1 1.0 2013-01-02 1.0 3 train foo
2 1.0 2013-01-02 1.0 3 test foo
3 1.0 2013-01-02 1.0 3 train foo>
df2.all
<bound method NDFrame._add_numeric_operations.<locals>.all of A B C D E F
0 1.0 2013-01-02 1.0 3 test foo
1 1.0 2013-01-02 1.0 3 train foo
2 1.0 2013-01-02 1.0 3 test foo
3 1.0 2013-01-02 1.0 3 train foo>
Viewing data
df.head()
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-07 | 0.405263 | 0.465327 | 0.076946 | -0.311546 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
df.head(3)
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-07 | 0.405263 | 0.465327 | 0.076946 | -0.311546 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
df.tail()
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 |
df.tail(3)
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 |
df.index
DatetimeIndex(['2022-06-07', '2022-06-08', '2022-06-09', '2022-06-10',
'2022-06-11', '2022-06-12'],
dtype='datetime64[ns]', freq='D')
df.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
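- Note: DataFrame.to_numpy() gives a NumPy representation of the underlying data. This can be an expensive operation when your DataFrame has columns with different data types, which comes down to a fundamental difference between pandas and NumPy: a NumPy array has one dtype for the entire array, while a pandas DataFrame has one dtype per column. When you call DataFrame.to_numpy(), pandas finds the NumPy dtype that can hold all of the dtypes in the DataFrame. This may end up being object, which requires casting every value to a Python object.
- For df, our DataFrame of all floating-point values, DataFrame.to_numpy() is fast and does not require copying data: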
df.dtypes
A float64
B float64
C float64
D float64
dtype: object
df.to_numpy()
array([[ 0.40526325, 0.46532668, 0.07694617, -0.3115456 ],
[ 0.06912909, 0.9769407 , -0.28743027, 1.08426954],
[-0.20022708, 1.17280586, 1.34307017, 0.56144631],
[-0.34616439, -1.60996101, 1.18171013, 0.04600243],
[-1.83349661, -0.26301183, 0.36815984, 0.16598165],
[-0.61690579, 0.95554251, -0.60358546, 0.89023561]])
df2.dtypes
A float64
B datetime64[ns]
C float32
D int32
E category
F object
dtype: object
df2.to_numpy()
array([[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo'],
[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'test', 'foo'],
[1.0, Timestamp('2013-01-02 00:00:00'), 1.0, 3, 'train', 'foo']],
dtype=object)
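The dtype behaviour described in the note on DataFrame.to_numpy() can be checked directly. This is a minimal sketch on small stand-in frames; the names `floats` and `mixed` are illustrative and are not the `df` and `df2` objects from the notebook.

```python
import numpy as np
import pandas as pd

# All-float frame: a single float64 array can represent every column.
floats = pd.DataFrame(np.zeros((3, 2)), columns=["A", "B"])

# Mixed frame: float, int and string columns share no common numeric dtype.
mixed = pd.DataFrame({"A": [1.0, 2.0], "D": [3, 4], "F": ["foo", "bar"]})

print(floats.to_numpy().dtype)
print(mixed.to_numpy().dtype)

# Restricting to the numeric columns avoids the object fallback.
print(mixed[["A", "D"]].to_numpy().dtype)
```

Selecting only the columns that are needed before calling to_numpy() is one way to keep the result in a numeric dtype.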
df2.index
Int64Index([0, 1, 2, 3], dtype='int64')
df.describe()
| | A | B | C | D |
|---|---|---|---|---|
| count | 6.000000 | 6.000000 | 6.000000 | 6.000000 |
| mean | -0.420400 | 0.282940 | 0.346478 | 0.406065 |
| std | 0.775990 | 1.062101 | 0.783376 | 0.533062 |
| min | -1.833497 | -1.609961 | -0.603585 | -0.311546 |
| 25% | -0.549220 | -0.080927 | -0.196336 | 0.075997 |
| 50% | -0.273196 | 0.710435 | 0.222553 | 0.363714 |
| 75% | 0.001790 | 0.971591 | 0.978323 | 0.808038 |
| max | 0.405263 | 1.172806 | 1.343070 | 1.084270 |
df.T
| | 2022-06-07 | 2022-06-08 | 2022-06-09 | 2022-06-10 | 2022-06-11 | 2022-06-12 |
|---|---|---|---|---|---|---|
| A | 0.405263 | 0.069129 | -0.200227 | -0.346164 | -1.833497 | -0.616906 |
| B | 0.465327 | 0.976941 | 1.172806 | -1.609961 | -0.263012 | 0.955543 |
| C | 0.076946 | -0.287430 | 1.343070 | 1.181710 | 0.368160 | -0.603585 |
| D | -0.311546 | 1.084270 | 0.561446 | 0.046002 | 0.165982 | 0.890236 |
df2.columns
Index(['A', 'B', 'C', 'D', 'E', 'F'], dtype='object')
df2.describe()
| | A | C | D |
|---|---|---|---|
| count | 4.0 | 4.0 | 4.0 |
| mean | 1.0 | 1.0 | 3.0 |
| std | 0.0 | 0.0 | 0.0 |
| min | 1.0 | 1.0 | 3.0 |
| 25% | 1.0 | 1.0 | 3.0 |
| 50% | 1.0 | 1.0 | 3.0 |
| 75% | 1.0 | 1.0 | 3.0 |
| max | 1.0 | 1.0 | 3.0 |
df2.T
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| A | 1.0 | 1.0 | 1.0 | 1.0 |
| B | 2013-01-02 00:00:00 | 2013-01-02 00:00:00 | 2013-01-02 00:00:00 | 2013-01-02 00:00:00 |
| C | 1.0 | 1.0 | 1.0 | 1.0 |
| D | 3 | 3 | 3 | 3 |
| E | test | train | test | train |
| F | foo | foo | foo | foo |
df.sort_index(axis=1,ascending=False)
| | D | C | B | A |
|---|---|---|---|---|
| 2022-06-07 | -0.311546 | 0.076946 | 0.465327 | 0.405263 |
| 2022-06-08 | 1.084270 | -0.287430 | 0.976941 | 0.069129 |
| 2022-06-09 | 0.561446 | 1.343070 | 1.172806 | -0.200227 |
| 2022-06-10 | 0.046002 | 1.181710 | -1.609961 | -0.346164 |
| 2022-06-11 | 0.165982 | 0.368160 | -0.263012 | -1.833497 |
| 2022-06-12 | 0.890236 | -0.603585 | 0.955543 | -0.616906 |
df.sort_index(axis=0,ascending=False)
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-07 | 0.405263 | 0.465327 | 0.076946 | -0.311546 |
df.sort_index(axis=0,ascending=True)
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-07 | 0.405263 | 0.465327 | 0.076946 | -0.311546 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 |
df.T.sort_index(axis=1,ascending=False)
| | 2022-06-12 | 2022-06-11 | 2022-06-10 | 2022-06-09 | 2022-06-08 | 2022-06-07 |
|---|---|---|---|---|---|---|
| A | -0.616906 | -1.833497 | -0.346164 | -0.200227 | 0.069129 | 0.405263 |
| B | 0.955543 | -0.263012 | -1.609961 | 1.172806 | 0.976941 | 0.465327 |
| C | -0.603585 | 0.368160 | 1.181710 | 1.343070 | -0.287430 | 0.076946 |
| D | 0.890236 | 0.165982 | 0.046002 | 0.561446 | 1.084270 | -0.311546 |
df.sort_values(by="B")
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
| 2022-06-07 | 0.405263 | 0.465327 | 0.076946 | -0.311546 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
"
df.sort_values(
by,
axis: 'Axis' = 0,
ascending=True,
inplace: 'bool' = False,
kind: 'str' = 'quicksort',
na_position: 'str' = 'last',
ignore_index: 'bool' = False,
key: 'ValueKeyFunc' = None,
)
"
df.sort_values(axis=1,by="2022-06-10")
| | B | A | D | C |
|---|---|---|---|---|
| 2022-06-07 | 0.465327 | 0.405263 | -0.311546 | 0.076946 |
| 2022-06-08 | 0.976941 | 0.069129 | 1.084270 | -0.287430 |
| 2022-06-09 | 1.172806 | -0.200227 | 0.561446 | 1.343070 |
| 2022-06-10 | -1.609961 | -0.346164 | 0.046002 | 1.181710 |
| 2022-06-11 | -0.263012 | -1.833497 | 0.165982 | 0.368160 |
| 2022-06-12 | 0.955543 | -0.616906 | 0.890236 | -0.603585 |
df.sort_values(axis=0,by="B")
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
| 2022-06-07 | 0.405263 | 0.465327 | 0.076946 | -0.311546 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
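The sort_values signature quoted above lists several parameters that the calls in this section do not exercise. Here is a hedged sketch of `ascending`, `na_position`, `key`, and `ignore_index` on a small made-up frame (`scores` and its values are illustrative; `key` assumes pandas 1.1 or later).

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame(
    {"name": ["b", "A", "c", "D"], "value": [2.0, np.nan, 1.0, 3.0]}
)

# Descending sort; rows with NaN in the sort column are placed first.
by_value = scores.sort_values(by="value", ascending=False, na_position="first")

# Case-insensitive sort on a string column, with a fresh 0..n-1 index.
by_name = scores.sort_values(
    by="name", key=lambda col: col.str.lower(), ignore_index=True
)

print(by_value)
print(by_name)
```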
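Selection
- While standard Python/NumPy expressions for selecting and setting are intuitive and come in handy for interactive work, for production code we recommend the optimized pandas data access methods .at, .iat, .loc, and .iloc.
Getting
- Selecting a single column, which yields a Series, equivalent to df.A: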
df["A"]
2022-06-07 0.405263
2022-06-08 0.069129
2022-06-09 -0.200227
2022-06-10 -0.346164
2022-06-11 -1.833497
2022-06-12 -0.616906
Freq: D, Name: A, dtype: float64
df["D"]
2022-06-07 -0.311546
2022-06-08 1.084270
2022-06-09 0.561446
2022-06-10 0.046002
2022-06-11 0.165982
2022-06-12 0.890236
Freq: D, Name: D, dtype: float64
df[0:3]
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-07 | 0.405263 | 0.465327 | 0.076946 | -0.311546 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
df["20220608":"20220611"]
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-08 | 0.069129 | 0.976941 | -0.28743 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.34307 | 0.561446 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.18171 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.36816 | 0.165982 |
Selection by label
df.loc[dates[0]]
A 0.405263
B 0.465327
C 0.076946
D -0.311546
Name: 2022-06-07 00:00:00, dtype: float64
df.loc()
<pandas.core.indexing._LocIndexer at 0x162f252a130>
df.loc[dates[1]]
A 0.069129
B 0.976941
C -0.287430
D 1.084270
Name: 2022-06-08 00:00:00, dtype: float64
df.loc[dates[2]]
A -0.200227
B 1.172806
C 1.343070
D 0.561446
Name: 2022-06-09 00:00:00, dtype: float64
df.loc[:,]
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-07 | 0.405263 | 0.465327 | 0.076946 | -0.311546 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 |
df.loc[:,["A","B"]]
| | A | B |
|---|---|---|
| 2022-06-07 | 0.405263 | 0.465327 |
| 2022-06-08 | 0.069129 | 0.976941 |
| 2022-06-09 | -0.200227 | 1.172806 |
| 2022-06-10 | -0.346164 | -1.609961 |
| 2022-06-11 | -1.833497 | -0.263012 |
| 2022-06-12 | -0.616906 | 0.955543 |
df.loc[["20220607","20220609"],["C","D"]]
| | C | D |
|---|---|---|
| 2022-06-07 | 0.076946 | -0.311546 |
| 2022-06-09 | 1.343070 | 0.561446 |
df.loc[["20220607","20220611"],["A","D"]]
| | A | D |
|---|---|---|
| 2022-06-07 | 0.405263 | -0.311546 |
| 2022-06-11 | -1.833497 | 0.165982 |
df.loc[["20220611"],["A","D"]]
| | A | D |
|---|---|---|
| 2022-06-11 | -1.833497 | 0.165982 |
df.loc["20220611",["A","D"]]
A -1.833497
D 0.165982
Name: 2022-06-11 00:00:00, dtype: float64
df.loc[dates[1],"A"]
0.06912908863219207
df.loc[dates[1],"C"]
-0.28743026681864575
df.at[dates[0],"A"]
0.40526325343260083
df.at[dates[1],"C"]
-0.28743026681864575
Selection by position
df.iloc[3]
A -0.346164
B -1.609961
C 1.181710
D 0.046002
Name: 2022-06-10 00:00:00, dtype: float64
df.iloc[4]
A -1.833497
B -0.263012
C 0.368160
D 0.165982
Name: 2022-06-11 00:00:00, dtype: float64
df.iloc[3:5]
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-10 | -0.346164 | -1.609961 | 1.18171 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.36816 | 0.165982 |
df.iloc[3:5,0:2]
| | A | B |
|---|---|---|
| 2022-06-10 | -0.346164 | -1.609961 |
| 2022-06-11 | -1.833497 | -0.263012 |
- By lists of integer positions, similar to the NumPy/Python style:
df.iloc[[1,2,4],[0,2]]
| | A | C |
|---|---|---|
| 2022-06-08 | 0.069129 | -0.28743 |
| 2022-06-09 | -0.200227 | 1.34307 |
| 2022-06-11 | -1.833497 | 0.36816 |
df.iloc[1:3,:]
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-08 | 0.069129 | 0.976941 | -0.28743 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.34307 | 0.561446 |
df.iloc[1:3,2:3]
| | C |
|---|---|
| 2022-06-08 | -0.28743 |
| 2022-06-09 | 1.34307 |
df.iloc[:,1:3]
| | B | C |
|---|---|---|
| 2022-06-07 | 0.465327 | 0.076946 |
| 2022-06-08 | 0.976941 | -0.287430 |
| 2022-06-09 | 1.172806 | 1.343070 |
| 2022-06-10 | -1.609961 | 1.181710 |
| 2022-06-11 | -0.263012 | 0.368160 |
| 2022-06-12 | 0.955543 | -0.603585 |
df.iloc[1:4,1:3]
| | B | C |
|---|---|---|
| 2022-06-08 | 0.976941 | -0.28743 |
| 2022-06-09 | 1.172806 | 1.34307 |
| 2022-06-10 | -1.609961 | 1.18171 |
df.iloc[1,1]
0.9769407016879463
df.iloc[3,3]
0.04600243177073029
df.iat[1,1]
0.9769407016879463
df.iat[3,3]
0.04600243177073029
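One difference between the label-based and position-based accessors is easy to miss in the outputs above: a label slice with .loc includes both endpoints, while a position slice with .iloc excludes the stop position. A minimal sketch on a stand-in frame follows (`idx` and `demo` are illustrative names, and the integer data is made up so the result is deterministic).

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2022-06-07", periods=6)
demo = pd.DataFrame(np.arange(24).reshape(6, 4), index=idx, columns=list("ABCD"))

# Label slice: both "2022-06-08" and "2022-06-10" are included (3 rows).
by_label = demo.loc["2022-06-08":"2022-06-10", ["A", "B"]]

# Position slice: the stop position 3 is excluded (2 rows).
by_position = demo.iloc[1:3, 0:2]

# Scalar access: .at takes labels, .iat takes integer positions.
same_cell = demo.at[idx[1], "B"] == demo.iat[1, 1]

print(len(by_label), len(by_position), same_cell)
```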
Boolean indexing
df["A"]
2022-06-07 0.405263
2022-06-08 0.069129
2022-06-09 -0.200227
2022-06-10 -0.346164
2022-06-11 -1.833497
2022-06-12 -0.616906
Freq: D, Name: A, dtype: float64
df["A"]>0
2022-06-07 True
2022-06-08 True
2022-06-09 False
2022-06-10 False
2022-06-11 False
2022-06-12 False
Freq: D, Name: A, dtype: bool
df[df["A"]>0]
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-07 | 0.405263 | 0.465327 | 0.076946 | -0.311546 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
df[df["B"] < 0]
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-10 | -0.346164 | -1.609961 | 1.18171 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.36816 | 0.165982 |
df3 = df.copy()
df3
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-07 | 0.405263 | 0.465327 | 0.076946 | -0.311546 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 |
df3["E"] = ["zero","one","two","three","four","five"]
df3
| | A | B | C | D | E |
|---|---|---|---|---|---|
| 2022-06-07 | 0.405263 | 0.465327 | 0.076946 | -0.311546 | zero |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 | one |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 | two |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 | three |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 | four |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 | five |
df3["E"]
2022-06-07 zero
2022-06-08 one
2022-06-09 two
2022-06-10 three
2022-06-11 four
2022-06-12 five
Freq: D, Name: E, dtype: object
df3["E"].isin(["two"])
2022-06-07 False
2022-06-08 False
2022-06-09 True
2022-06-10 False
2022-06-11 False
2022-06-12 False
Freq: D, Name: E, dtype: bool
df3["E"].isin(["two","four"])
2022-06-07 False
2022-06-08 False
2022-06-09 True
2022-06-10 False
2022-06-11 True
2022-06-12 False
Freq: D, Name: E, dtype: bool
df3[df3["E"].isin(["two","four"])]
| | A | B | C | D | E |
|---|---|---|---|---|---|
| 2022-06-09 | -0.200227 | 1.172806 | 1.34307 | 0.561446 | two |
| 2022-06-11 | -1.833497 | -0.263012 | 0.36816 | 0.165982 | four |
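The filters in this section each use a single condition. As a short sketch, several conditions can be combined with `&`, `|`, and `~`, as long as each comparison is wrapped in its own parentheses. The frame `demo` and its values are made up for illustration.

```python
import pandas as pd

demo = pd.DataFrame(
    {
        "A": [0.4, 0.1, -0.2, -0.3],
        "B": [0.5, -1.0, 1.2, -1.6],
        "E": ["zero", "one", "two", "three"],
    }
)

# Rows where both A and B are positive.
both_positive = demo[(demo["A"] > 0) & (demo["B"] > 0)]

# Rows where at least one of A or B is negative.
either_negative = demo[(demo["A"] < 0) | (demo["B"] < 0)]

# Rows whose E value is NOT in the given list.
not_listed = demo[~demo["E"].isin(["two", "three"])]

print(both_positive)
print(either_negative)
print(not_listed)
```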
Setting
s1 = pd.Series([1,2,3,4,5,6],index=pd.date_range("20220607",periods=6))
s1
2022-06-07 1
2022-06-08 2
2022-06-09 3
2022-06-10 4
2022-06-11 5
2022-06-12 6
Freq: D, dtype: int64
df.at[dates[0],"A"] = 0
df
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-07 | 0.000000 | 0.465327 | 0.076946 | -0.311546 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 |
df
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-07 | 0.000000 | 0.000000 | 0.000000 | -0.311546 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 |
df.iat[0,3]=0
df
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-07 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 1.084270 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 0.561446 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 0.046002 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 0.165982 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 0.890236 |
df.loc[:,"D"]
2022-06-07 0.000000
2022-06-08 1.084270
2022-06-09 0.561446
2022-06-10 0.046002
2022-06-11 0.165982
2022-06-12 0.890236
Freq: D, Name: D, dtype: float64
df.loc[:,"D"] = np.array([5] * len(df))
df
| | A | B | C | D |
|---|---|---|---|---|
| 2022-06-07 | 0.000000 | 0.000000 | 0.000000 | 5 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 5 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 5 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 5 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 5 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 5 |
df.loc[:,"F"] = np.array([i for i in range(6)])
df
| | A | B | C | D | F |
|---|---|---|---|---|---|
| 2022-06-07 | 0.000000 | 0.000000 | 0.000000 | 5 | 0 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 5 | 1 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 5 | 2 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 5 | 3 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 5 | 4 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 5 | 5 |
df.loc[:,"E"] = np.array([5] * len(df))
df
| | A | B | C | D | F | E |
|---|---|---|---|---|---|---|
| 2022-06-07 | 0.000000 | 0.000000 | 0.000000 | 5 | 0 | 5 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 5 | 1 | 5 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 5 | 2 | 5 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 5 | 3 | 5 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 5 | 4 | 5 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 5 | 5 | 5 |
df4 = df.copy()
df4
| | A | B | C | D | F | E |
|---|---|---|---|---|---|---|
| 2022-06-07 | 0.000000 | 0.000000 | 0.000000 | 5 | 0 | 5 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 5 | 1 | 5 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 5 | 2 | 5 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 5 | 3 | 5 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 5 | 4 | 5 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 5 | 5 | 5 |
df4 > 0
| | A | B | C | D | F | E |
|---|---|---|---|---|---|---|
| 2022-06-07 | False | False | False | True | False | True |
| 2022-06-08 | True | True | False | True | True | True |
| 2022-06-09 | False | True | True | True | True | True |
| 2022-06-10 | False | False | True | True | True | True |
| 2022-06-11 | False | False | True | True | True | True |
| 2022-06-12 | False | True | False | True | True | True |
df4[df4>0]
| | A | B | C | D | F | E |
|---|---|---|---|---|---|---|
| 2022-06-07 | NaN | NaN | NaN | 5 | NaN | 5 |
| 2022-06-08 | 0.069129 | 0.976941 | NaN | 5 | 1.0 | 5 |
| 2022-06-09 | NaN | 1.172806 | 1.34307 | 5 | 2.0 | 5 |
| 2022-06-10 | NaN | NaN | 1.18171 | 5 | 3.0 | 5 |
| 2022-06-11 | NaN | NaN | 0.36816 | 5 | 4.0 | 5 |
| 2022-06-12 | NaN | 0.955543 | NaN | 5 | 5.0 | 5 |
-df4
| | A | B | C | D | F | E |
|---|---|---|---|---|---|---|
| 2022-06-07 | -0.000000 | -0.000000 | -0.000000 | -5 | 0 | -5 |
| 2022-06-08 | -0.069129 | -0.976941 | 0.287430 | -5 | -1 | -5 |
| 2022-06-09 | 0.200227 | -1.172806 | -1.343070 | -5 | -2 | -5 |
| 2022-06-10 | 0.346164 | 1.609961 | -1.181710 | -5 | -3 | -5 |
| 2022-06-11 | 1.833497 | 0.263012 | -0.368160 | -5 | -4 | -5 |
| 2022-06-12 | 0.616906 | -0.955543 | 0.603585 | -5 | -5 | -5 |
df4[df4>0] = -df4
df4
| | A | B | C | D | F | E |
|---|---|---|---|---|---|---|
| 2022-06-07 | 0.000000 | 0.000000 | 0.000000 | -5 | 0 | -5 |
| 2022-06-08 | -0.069129 | -0.976941 | -0.287430 | -5 | -1 | -5 |
| 2022-06-09 | -0.200227 | -1.172806 | -1.343070 | -5 | -2 | -5 |
| 2022-06-10 | -0.346164 | -1.609961 | -1.181710 | -5 | -3 | -5 |
| 2022-06-11 | -1.833497 | -0.263012 | -0.368160 | -5 | -4 | -5 |
| 2022-06-12 | -0.616906 | -0.955543 | -0.603585 | -5 | -5 | -5 |
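The in-place assignment `df4[df4>0] = -df4` can also be written without mutating the frame. The sketch below uses DataFrame.mask and DataFrame.where on a small made-up frame (`demo` is an illustrative name); it is one possible alternative, not the approach used in the notebook.

```python
import pandas as pd

demo = pd.DataFrame({"A": [0.0, 0.5, -0.2], "D": [5, 5, 5]})

# mask() replaces values where the condition is True; this has the same
# effect as `demo[demo > 0] = -demo`, but returns a new frame.
flipped = demo.mask(demo > 0, -demo)

# where() keeps values where the condition holds and puts NaN elsewhere,
# matching the result of `demo[demo > 0]`.
only_positive = demo.where(demo > 0)

print(flipped)
print(only_positive)
print(demo)  # the original frame is unchanged
```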
df6 = df.copy()
df6
| | A | B | C | D | F | E |
|---|---|---|---|---|---|---|
| 2022-06-07 | 0.000000 | 0.000000 | 0.000000 | 5 | 0 | 5 |
| 2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 5 | 1 | 5 |
| 2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 5 | 2 | 5 |
| 2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 5 | 3 | 5 |
| 2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 5 | 4 | 5 |
| 2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 5 | 5 | 5 |
df6>0
| | A | B | C | D | F | E |
|---|---|---|---|---|---|---|
| 2022-06-07 | False | False | False | True | False | True |
| 2022-06-08 | True | True | False | True | True | True |
| 2022-06-09 | False | True | True | True | True | True |
| 2022-06-10 | False | False | True | True | True | True |
| 2022-06-11 | False | False | True | True | True | True |
| 2022-06-12 | False | True | False | True | True | True |
df6[df6>0]
| | A | B | C | D | F | E |
|---|---|---|---|---|---|---|
| 2022-06-07 | NaN | NaN | NaN | 5 | NaN | 5 |
| 2022-06-08 | 0.069129 | 0.976941 | NaN | 5 | 1.0 | 5 |
| 2022-06-09 | NaN | 1.172806 | 1.34307 | 5 | 2.0 | 5 |
| 2022-06-10 | NaN | NaN | 1.18171 | 5 | 3.0 | 5 |
| 2022-06-11 | NaN | NaN | 0.36816 | 5 | 4.0 | 5 |
| 2022-06-12 | NaN | 0.955543 | NaN | 5 | 5.0 | 5 |
df6
| A | B | C | D | F | E |
---|
2022-06-07 | 0.000000 | 0.000000 | 0.000000 | 5 | 0 | 5 |
---|
2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 5 | 1 | 5 |
---|
2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 5 | 2 | 5 |
---|
2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 5 | 3 | 5 |
---|
2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 5 | 4 | 5 |
---|
2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 5 | 5 | 5 |
---|
-df6
| A | B | C | D | F | E |
---|
2022-06-07 | -0.000000 | -0.000000 | -0.000000 | -5 | 0 | -5 |
---|
2022-06-08 | -0.069129 | -0.976941 | 0.287430 | -5 | -1 | -5 |
---|
2022-06-09 | 0.200227 | -1.172806 | -1.343070 | -5 | -2 | -5 |
---|
2022-06-10 | 0.346164 | 1.609961 | -1.181710 | -5 | -3 | -5 |
---|
2022-06-11 | 1.833497 | 0.263012 | -0.368160 | -5 | -4 | -5 |
---|
2022-06-12 | 0.616906 | -0.955543 | 0.603585 | -5 | -5 | -5 |
---|
df6[df6>0] = -df6
df6
| A | B | C | D | F | E |
---|
2022-06-07 | 0.000000 | 0.000000 | 0.000000 | -5 | 0 | -5 |
---|
2022-06-08 | -0.069129 | -0.976941 | -0.287430 | -5 | -1 | -5 |
---|
2022-06-09 | -0.200227 | -1.172806 | -1.343070 | -5 | -2 | -5 |
---|
2022-06-10 | -0.346164 | -1.609961 | -1.181710 | -5 | -3 | -5 |
---|
2022-06-11 | -1.833497 | -0.263012 | -0.368160 | -5 | -4 | -5 |
---|
2022-06-12 | -0.616906 | -0.955543 | -0.603585 | -5 | -5 | -5 |
---|
df6
| A | B | C | D | F | E |
---|
2022-06-07 | 0.000000 | 0.000000 | 0.000000 | -5 | 0 | -5 |
---|
2022-06-08 | -0.069129 | -0.976941 | -0.287430 | -5 | -1 | -5 |
---|
2022-06-09 | -0.200227 | -1.172806 | -1.343070 | -5 | -2 | -5 |
---|
2022-06-10 | -0.346164 | -1.609961 | -1.181710 | -5 | -3 | -5 |
---|
2022-06-11 | -1.833497 | -0.263012 | -0.368160 | -5 | -4 | -5 |
---|
2022-06-12 | -0.616906 | -0.955543 | -0.603585 | -5 | -5 | -5 |
---|
df7 = -df6
df7
| A | B | C | D | F | E |
---|
2022-06-07 | -0.000000 | -0.000000 | -0.000000 | 5 | 0 | 5 |
---|
2022-06-08 | 0.069129 | 0.976941 | 0.287430 | 5 | 1 | 5 |
---|
2022-06-09 | 0.200227 | 1.172806 | 1.343070 | 5 | 2 | 5 |
---|
2022-06-10 | 0.346164 | 1.609961 | 1.181710 | 5 | 3 | 5 |
---|
2022-06-11 | 1.833497 | 0.263012 | 0.368160 | 5 | 4 | 5 |
---|
2022-06-12 | 0.616906 | 0.955543 | 0.603585 | 5 | 5 | 5 |
---|
df7[df7>0]
| A | B | C | D | F | E |
---|
2022-06-07 | NaN | NaN | NaN | 5 | NaN | 5 |
---|
2022-06-08 | 0.069129 | 0.976941 | 0.287430 | 5 | 1.0 | 5 |
---|
2022-06-09 | 0.200227 | 1.172806 | 1.343070 | 5 | 2.0 | 5 |
---|
2022-06-10 | 0.346164 | 1.609961 | 1.181710 | 5 | 3.0 | 5 |
---|
2022-06-11 | 1.833497 | 0.263012 | 0.368160 | 5 | 4.0 | 5 |
---|
2022-06-12 | 0.616906 | 0.955543 | 0.603585 | 5 | 5.0 | 5 |
---|
df7[df7>0]=0
df7
| A | B | C | D | F | E |
---|
2022-06-07 | -0.0 | -0.0 | -0.0 | 0 | 0 | 0 |
---|
2022-06-08 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 |
---|
2022-06-09 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 |
---|
2022-06-10 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 |
---|
2022-06-11 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 |
---|
2022-06-12 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 |
---|
Missing data
df
| A | B | C | D | F | E |
---|
2022-06-07 | 0.000000 | 0.000000 | 0.000000 | 5 | 0 | 5 |
---|
2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 5 | 1 | 5 |
---|
2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 5 | 2 | 5 |
---|
2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 5 | 3 | 5 |
---|
2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 5 | 4 | 5 |
---|
2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 5 | 5 | 5 |
---|
df1 = df.reindex(index=dates[0:4],columns=list(df.columns) + ["E"])
df1.loc[dates[0] :dates[1],"E"] = 1
df1
| A | B | C | D | F | E | E |
---|
2022-06-07 | 0.000000 | 0.000000 | 0.00000 | 5 | 0 | 1 | 1 |
---|
2022-06-08 | 0.069129 | 0.976941 | -0.28743 | 5 | 1 | 1 | 1 |
---|
2022-06-09 | -0.200227 | 1.172806 | 1.34307 | 5 | 2 | 5 | 5 |
---|
2022-06-10 | -0.346164 | -1.609961 | 1.18171 | 5 | 3 | 5 | 5 |
---|
df1.dropna(how="any")
| A | B | C | D | F | E | E |
---|
2022-06-07 | 0.000000 | 0.000000 | 0.00000 | 5 | 0 | 1 | 1 |
---|
2022-06-08 | 0.069129 | 0.976941 | -0.28743 | 5 | 1 | 1 | 1 |
---|
2022-06-09 | -0.200227 | 1.172806 | 1.34307 | 5 | 2 | 5 | 5 |
---|
2022-06-10 | -0.346164 | -1.609961 | 1.18171 | 5 | 3 | 5 | 5 |
---|
df1 = df1[df1>0]
df1
| A | B | C | D | F | E | E |
---|
2022-06-07 | NaN | NaN | NaN | 5 | NaN | 1 | 1 |
---|
2022-06-08 | 0.069129 | 0.976941 | NaN | 5 | 1.0 | 1 | 1 |
---|
2022-06-09 | NaN | 1.172806 | 1.34307 | 5 | 2.0 | 5 | 5 |
---|
2022-06-10 | NaN | NaN | 1.18171 | 5 | 3.0 | 5 | 5 |
---|
df1.dropna(how="any")
df6
| A | B | C | D | F | E |
---|
2022-06-07 | 0.000000 | 0.000000 | 0.000000 | -5 | 0 | -5 |
---|
2022-06-08 | -0.069129 | -0.976941 | -0.287430 | -5 | -1 | -5 |
---|
2022-06-09 | -0.200227 | -1.172806 | -1.343070 | -5 | -2 | -5 |
---|
2022-06-10 | -0.346164 | -1.609961 | -1.181710 | -5 | -3 | -5 |
---|
2022-06-11 | -1.833497 | -0.263012 | -0.368160 | -5 | -4 | -5 |
---|
2022-06-12 | -0.616906 | -0.955543 | -0.603585 | -5 | -5 | -5 |
---|
-df6
| A | B | C | D | F | E |
---|
2022-06-07 | -0.000000 | -0.000000 | -0.000000 | 5 | 0 | 5 |
---|
2022-06-08 | 0.069129 | 0.976941 | 0.287430 | 5 | 1 | 5 |
---|
2022-06-09 | 0.200227 | 1.172806 | 1.343070 | 5 | 2 | 5 |
---|
2022-06-10 | 0.346164 | 1.609961 | 1.181710 | 5 | 3 | 5 |
---|
2022-06-11 | 1.833497 | 0.263012 | 0.368160 | 5 | 4 | 5 |
---|
2022-06-12 | 0.616906 | 0.955543 | 0.603585 | 5 | 5 | 5 |
---|
df8 = -df6[-df6>0]
df8
| A | B | C | D | F | E |
---|
2022-06-07 | NaN | NaN | NaN | 5 | NaN | 5 |
---|
2022-06-08 | 0.069129 | 0.976941 | 0.287430 | 5 | 1.0 | 5 |
---|
2022-06-09 | 0.200227 | 1.172806 | 1.343070 | 5 | 2.0 | 5 |
---|
2022-06-10 | 0.346164 | 1.609961 | 1.181710 | 5 | 3.0 | 5 |
---|
2022-06-11 | 1.833497 | 0.263012 | 0.368160 | 5 | 4.0 | 5 |
---|
2022-06-12 | 0.616906 | 0.955543 | 0.603585 | 5 | 5.0 | 5 |
---|
df8.dropna(how="any")
| A | B | C | D | F | E |
---|
2022-06-08 | 0.069129 | 0.976941 | 0.287430 | 5 | 1.0 | 5 |
---|
2022-06-09 | 0.200227 | 1.172806 | 1.343070 | 5 | 2.0 | 5 |
---|
2022-06-10 | 0.346164 | 1.609961 | 1.181710 | 5 | 3.0 | 5 |
---|
2022-06-11 | 1.833497 | 0.263012 | 0.368160 | 5 | 4.0 | 5 |
---|
2022-06-12 | 0.616906 | 0.955543 | 0.603585 | 5 | 5.0 | 5 |
---|
df1
| A | B | C | D | F | E | E |
---|
2022-06-07 | NaN | NaN | NaN | 5 | NaN | 1 | 1 |
---|
2022-06-08 | 0.069129 | 0.976941 | NaN | 5 | 1.0 | 1 | 1 |
---|
2022-06-09 | NaN | 1.172806 | 1.34307 | 5 | 2.0 | 5 | 5 |
---|
2022-06-10 | NaN | NaN | 1.18171 | 5 | 3.0 | 5 | 5 |
---|
df1.fillna(value=5)
| A | B | C | D | F | E | E |
---|
2022-06-07 | 5.000000 | 5.000000 | 5.00000 | 5 | 5.0 | 1 | 1 |
---|
2022-06-08 | 0.069129 | 0.976941 | 5.00000 | 5 | 1.0 | 1 | 1 |
---|
2022-06-09 | 5.000000 | 1.172806 | 1.34307 | 5 | 2.0 | 5 | 5 |
---|
2022-06-10 | 5.000000 | 5.000000 | 1.18171 | 5 | 3.0 | 5 | 5 |
---|
df1
| A | B | C | D | F | E | E |
---|
2022-06-07 | NaN | NaN | NaN | 5 | NaN | 1 | 1 |
---|
2022-06-08 | 0.069129 | 0.976941 | NaN | 5 | 1.0 | 1 | 1 |
---|
2022-06-09 | NaN | 1.172806 | 1.34307 | 5 | 2.0 | 5 | 5 |
---|
2022-06-10 | NaN | NaN | 1.18171 | 5 | 3.0 | 5 | 5 |
---|
pd.isna(df1)
| A | B | C | D | F | E | E |
---|
2022-06-07 | True | True | True | False | True | False | False |
---|
2022-06-08 | False | False | True | False | False | False | False |
---|
2022-06-09 | True | False | False | False | False | False | False |
---|
2022-06-10 | True | True | False | False | False | False | False |
---|
Operations
Stats
df
| A | B | C | D | F | E |
---|
2022-06-07 | 0.000000 | 0.000000 | 0.000000 | 5 | 0 | 5 |
---|
2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 5 | 1 | 5 |
---|
2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 5 | 2 | 5 |
---|
2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 5 | 3 | 5 |
---|
2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 5 | 4 | 5 |
---|
2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 5 | 5 | 5 |
---|
df.mean()
A -0.487944
B 0.205386
C 0.333654
D 5.000000
F 2.500000
E 5.000000
dtype: float64
df.mean(1)
2022-06-07 1.666667
2022-06-08 1.959773
2022-06-09 2.385941
2022-06-10 2.037597
2022-06-11 2.045275
2022-06-12 2.455842
Freq: D, dtype: float64
df.mean(0)
A -0.487944
B 0.205386
C 0.333654
D 5.000000
F 2.500000
E 5.000000
dtype: float64
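- Operating on objects that have different dimensionality and need to be aligned. In addition, pandas automatically broadcasts along the specified dimension: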
s = pd.Series([1,3,5,np.nan,6,8],index=dates).shift(2)
s
2022-06-07 NaN
2022-06-08 NaN
2022-06-09 1.0
2022-06-10 3.0
2022-06-11 5.0
2022-06-12 NaN
Freq: D, dtype: float64
pd.Series([1,3,5,np.nan,6,8],index=dates).shift(3)
2022-06-07 NaN
2022-06-08 NaN
2022-06-09 NaN
2022-06-10 1.0
2022-06-11 3.0
2022-06-12 5.0
Freq: D, dtype: float64
pd.Series([1,3,5,np.nan,6,8],index=dates).shift(1)
2022-06-07 NaN
2022-06-08 1.0
2022-06-09 3.0
2022-06-10 5.0
2022-06-11 NaN
2022-06-12 6.0
Freq: D, dtype: float64
pd.Series([1,3,5,np.nan,6,8],index=dates)
2022-06-07 1.0
2022-06-08 3.0
2022-06-09 5.0
2022-06-10 NaN
2022-06-11 6.0
2022-06-12 8.0
Freq: D, dtype: float64
dates
DatetimeIndex(['2022-06-07', '2022-06-08', '2022-06-09', '2022-06-10',
'2022-06-11', '2022-06-12'],
dtype='datetime64[ns]', freq='D')
pd.Series([1,3,5,np.nan,7,8],index=dates)
2022-06-07 1.0
2022-06-08 3.0
2022-06-09 5.0
2022-06-10 NaN
2022-06-11 7.0
2022-06-12 8.0
Freq: D, dtype: float64
s = pd.Series([1,3,5,np.nan,7,8],index=dates).shift(2)
s
2022-06-07 NaN
2022-06-08 NaN
2022-06-09 1.0
2022-06-10 3.0
2022-06-11 5.0
2022-06-12 NaN
Freq: D, dtype: float64
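The cells above only build the shifted Series; the alignment and broadcasting step itself might look like the sketch below. The small frame and its values are assumptions chosen so the arithmetic is easy to follow, and `DataFrame.sub` with `axis="index"` is one way to line the Series up with the rows.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2022-06-07", periods=4)
frame = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=dates,
)

# shift(1) moves every value down one row; the first slot becomes NaN.
shifted = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates).shift(1)

# The Series is matched to the frame by row label and then
# subtracted from every column (broadcast across columns).
result = frame.sub(shifted, axis="index")
print(result)
```

Rows where the Series holds NaN come out as NaN in every column, because a missing operand propagates through the subtraction.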
Apply
df
| A | B | C | D | F | E |
---|
2022-06-07 | 0.000000 | 0.000000 | 0.000000 | 5 | 0 | 5 |
---|
2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 5 | 1 | 5 |
---|
2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 5 | 2 | 5 |
---|
2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 5 | 3 | 5 |
---|
2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 5 | 4 | 5 |
---|
2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 5 | 5 | 5 |
---|
np.cumsum
<function numpy.cumsum(a, axis=None, dtype=None, out=None)>
df.apply(np.cumsum)
| A | B | C | D | F | E |
---|
2022-06-07 | 0.000000 | 0.000000 | 0.000000 | 5 | 0 | 5 |
---|
2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 10 | 1 | 10 |
---|
2022-06-09 | -0.131098 | 2.149747 | 1.055640 | 15 | 3 | 15 |
---|
2022-06-10 | -0.477262 | 0.539786 | 2.237350 | 20 | 6 | 20 |
---|
2022-06-11 | -2.310759 | 0.276774 | 2.605510 | 25 | 10 | 25 |
---|
2022-06-12 | -2.927665 | 1.232316 | 2.001924 | 30 | 15 | 30 |
---|
df
| A | B | C | D | F | E |
---|
2022-06-07 | 0.000000 | 0.000000 | 0.000000 | 5 | 0 | 5 |
---|
2022-06-08 | 0.069129 | 0.976941 | -0.287430 | 5 | 1 | 5 |
---|
2022-06-09 | -0.200227 | 1.172806 | 1.343070 | 5 | 2 | 5 |
---|
2022-06-10 | -0.346164 | -1.609961 | 1.181710 | 5 | 3 | 5 |
---|
2022-06-11 | -1.833497 | -0.263012 | 0.368160 | 5 | 4 | 5 |
---|
2022-06-12 | -0.616906 | 0.955543 | -0.603585 | 5 | 5 | 5 |
---|
df.apply(lambda x: x.max() - x.min())
A 1.902626
B 2.782767
C 1.946656
D 0.000000
F 5.000000
E 0.000000
dtype: float64
df.apply(lambda x: x.max() - x.min(),axis=1)
2022-06-07 5.000000
2022-06-08 5.287430
2022-06-09 5.200227
2022-06-10 6.609961
2022-06-11 6.833497
2022-06-12 5.616906
Freq: D, dtype: float64
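- A Series is a one-dimensional data structure made up of an index and values.
- A DataFrame is a two-dimensional structure; in addition to an index and values, it also has columns.
- Relationship:
- A DataFrame is made up of multiple Series; any single row or column taken out on its own is a Series.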
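The relationship described in these bullets can be checked directly. The sketch below uses a tiny made-up frame; the labels `x`, `y`, `z` are illustrative only.

```python
import pandas as pd

frame = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]},
    index=["x", "y", "z"],
)

column = frame["A"]   # one column: a Series indexed by the row labels
row = frame.loc["y"]  # one row: a Series indexed by the column labels

print(type(column).__name__, list(column.index))
print(type(row).__name__, list(row.index))
```

This is also why `df.apply(func)` hands a Series to `func`: a column by default, or a row when `axis=1` is given.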
Histogramming
s = pd.Series(np.random.randint(0,7,size=10))
s
0 2
1 5
2 2
3 4
4 1
5 3
6 0
7 2
8 5
9 4
dtype: int32
s.value_counts()
2 3
5 2
4 2
1 1
3 1
0 1
dtype: int64
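String methods
- A Series comes with a set of string-processing methods under its str attribute, which make it easy to operate on each element of the array, as the code snippet below shows. Note that pattern matching in str generally uses regular expressions by default.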
s = pd.Series(["A","B","C","Aaba","Baca",np.nan,"CABA","dog","cat"])
s
0 A
1 B
2 C
3 Aaba
4 Baca
5 NaN
6 CABA
7 dog
8 cat
dtype: object
s.str
<pandas.core.strings.accessor.StringMethods at 0x26552ed2a90>
s.str.lower()
0 a
1 b
2 c
3 aaba
4 baca
5 NaN
6 caba
7 dog
8 cat
dtype: object
s.str.upper()
0 A
1 B
2 C
3 AABA
4 BACA
5 NaN
6 CABA
7 DOG
8 CAT
dtype: object
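The note about regular expressions matters as soon as a pattern contains a special character. The following is a small sketch with `str.contains`; the pattern `^A` and the shortened word list are assumptions for illustration.

```python
import numpy as np
import pandas as pd

words = pd.Series(["A", "Aaba", "Baca", np.nan, "dog"])

# By default the pattern is a regular expression: "^A" means "starts with A".
# na=False reports missing entries as "no match" instead of leaving a gap.
starts_with_a = words.str.contains("^A", na=False)

# regex=False treats the same pattern as the literal two characters "^A".
literal_caret = words.str.contains("^A", regex=False, na=False)

print(starts_with_a.tolist())
print(literal_caret.tolist())
```

Passing `regex=False` is a reasonable default whenever the search text comes from user input and is not meant to be a pattern.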
Merge
df = pd.DataFrame(np.random.randn(10,4))
df
| 0 | 1 | 2 | 3 |
---|
0 | 0.879358 | -0.162415 | -0.122199 | -1.436661 |
---|
1 | -0.090463 | 0.173721 | -0.425374 | -0.509393 |
---|
2 | -1.155403 | -1.351560 | 0.032734 | 0.085148 |
---|
3 | -0.808055 | -1.637611 | 0.382922 | 0.525315 |
---|
4 | 0.659453 | -0.851103 | 0.214721 | 1.031853 |
---|
5 | 0.532633 | 1.506630 | 1.476901 | -1.016453 |
---|
6 | 0.860219 | 3.015384 | 1.003056 | -2.795348 |
---|
7 | 0.580518 | -2.575408 | 1.470146 | -1.946652 |
---|
8 | -1.104715 | 0.954115 | 0.479431 | 1.001990 |
---|
9 | 0.709469 | -1.613924 | 0.424452 | -0.641368 |
---|
pd.DataFrame(np.random.randn(10,4))
| 0 | 1 | 2 | 3 |
---|
0 | -1.231020 | 0.062966 | 0.248977 | -2.006465 |
---|
1 | -0.121096 | -0.790854 | 1.270002 | 0.437691 |
---|
2 | -1.342012 | -0.213068 | -0.632990 | -0.454876 |
---|
3 | -2.299231 | -0.449179 | 0.799823 | 1.320912 |
---|
4 | -0.214516 | -0.759868 | -0.509929 | 0.125942 |
---|
5 | 1.743264 | -0.047220 | 0.532117 | 0.087455 |
---|
6 | -0.172050 | 0.387625 | 0.903231 | 1.419179 |
---|
7 | 0.610765 | -0.666323 | -0.396873 | 0.956829 |
---|
8 | -0.740147 | 1.397083 | 0.360241 | 0.106912 |
---|
9 | -0.402985 | 1.289189 | -0.202836 | -1.308507 |
---|
df
| 0 | 1 | 2 | 3 |
---|
0 | 0.879358 | -0.162415 | -0.122199 | -1.436661 |
---|
1 | -0.090463 | 0.173721 | -0.425374 | -0.509393 |
---|
2 | -1.155403 | -1.351560 | 0.032734 | 0.085148 |
---|
3 | -0.808055 | -1.637611 | 0.382922 | 0.525315 |
---|
4 | 0.659453 | -0.851103 | 0.214721 | 1.031853 |
---|
5 | 0.532633 | 1.506630 | 1.476901 | -1.016453 |
---|
6 | 0.860219 | 3.015384 | 1.003056 | -2.795348 |
---|
7 | 0.580518 | -2.575408 | 1.470146 | -1.946652 |
---|
8 | -1.104715 | 0.954115 | 0.479431 | 1.001990 |
---|
9 | 0.709469 | -1.613924 | 0.424452 | -0.641368 |
---|
df[:3]
| 0 | 1 | 2 | 3 |
---|
0 | 0.879358 | -0.162415 | -0.122199 | -1.436661 |
---|
1 | -0.090463 | 0.173721 | -0.425374 | -0.509393 |
---|
2 | -1.155403 | -1.351560 | 0.032734 | 0.085148 |
---|
df[3:7]
| 0 | 1 | 2 | 3 |
---|
3 | -0.808055 | -1.637611 | 0.382922 | 0.525315 |
---|
4 | 0.659453 | -0.851103 | 0.214721 | 1.031853 |
---|
5 | 0.532633 | 1.506630 | 1.476901 | -1.016453 |
---|
6 | 0.860219 | 3.015384 | 1.003056 | -2.795348 |
---|
df[7:]
| 0 | 1 | 2 | 3 |
---|
7 | 0.580518 | -2.575408 | 1.470146 | -1.946652 |
---|
8 | -1.104715 | 0.954115 | 0.479431 | 1.001990 |
---|
9 | 0.709469 | -1.613924 | 0.424452 | -0.641368 |
---|
pieces=[df[:3],df[3:7],df[7:]]
pieces
[ 0 1 2 3
0 0.879358 -0.162415 -0.122199 -1.436661
1 -0.090463 0.173721 -0.425374 -0.509393
2 -1.155403 -1.351560 0.032734 0.085148,
0 1 2 3
3 -0.808055 -1.637611 0.382922 0.525315
4 0.659453 -0.851103 0.214721 1.031853
5 0.532633 1.506630 1.476901 -1.016453
6 0.860219 3.015384 1.003056 -2.795348,
0 1 2 3
7 0.580518 -2.575408 1.470146 -1.946652
8 -1.104715 0.954115 0.479431 1.001990
9 0.709469 -1.613924 0.424452 -0.641368]
pd.concat(pieces)
| 0 | 1 | 2 | 3 |
---|
0 | 0.879358 | -0.162415 | -0.122199 | -1.436661 |
---|
1 | -0.090463 | 0.173721 | -0.425374 | -0.509393 |
---|
2 | -1.155403 | -1.351560 | 0.032734 | 0.085148 |
---|
3 | -0.808055 | -1.637611 | 0.382922 | 0.525315 |
---|
4 | 0.659453 | -0.851103 | 0.214721 | 1.031853 |
---|
5 | 0.532633 | 1.506630 | 1.476901 | -1.016453 |
---|
6 | 0.860219 | 3.015384 | 1.003056 | -2.795348 |
---|
7 | 0.580518 | -2.575408 | 1.470146 | -1.946652 |
---|
8 | -1.104715 | 0.954115 | 0.479431 | 1.001990 |
---|
9 | 0.709469 | -1.613924 | 0.424452 | -0.641368 |
---|
a = [df[3:]]
a
[ 0 1 2 3
3 -0.808055 -1.637611 0.382922 0.525315
4 0.659453 -0.851103 0.214721 1.031853
5 0.532633 1.506630 1.476901 -1.016453
6 0.860219 3.015384 1.003056 -2.795348
7 0.580518 -2.575408 1.470146 -1.946652
8 -1.104715 0.954115 0.479431 1.001990
9 0.709469 -1.613924 0.424452 -0.641368]
pd.concat(a)
| 0 | 1 | 2 | 3 |
---|
3 | -0.808055 | -1.637611 | 0.382922 | 0.525315 |
---|
4 | 0.659453 | -0.851103 | 0.214721 | 1.031853 |
---|
5 | 0.532633 | 1.506630 | 1.476901 | -1.016453 |
---|
6 | 0.860219 | 3.015384 | 1.003056 | -2.795348 |
---|
7 | 0.580518 | -2.575408 | 1.470146 | -1.946652 |
---|
8 | -1.104715 | 0.954115 | 0.479431 | 1.001990 |
---|
9 | 0.709469 | -1.613924 | 0.424452 | -0.641368 |
---|
pd
<module 'pandas' from 'D:\\software\\anaconda\\lib\\site-packages\\pandas\\__init__.py'>
df
| 0 | 1 | 2 | 3 |
---|
0 | 0.879358 | -0.162415 | -0.122199 | -1.436661 |
---|
1 | -0.090463 | 0.173721 | -0.425374 | -0.509393 |
---|
2 | -1.155403 | -1.351560 | 0.032734 | 0.085148 |
---|
3 | -0.808055 | -1.637611 | 0.382922 | 0.525315 |
---|
4 | 0.659453 | -0.851103 | 0.214721 | 1.031853 |
---|
5 | 0.532633 | 1.506630 | 1.476901 | -1.016453 |
---|
6 | 0.860219 | 3.015384 | 1.003056 | -2.795348 |
---|
7 | 0.580518 | -2.575408 | 1.470146 | -1.946652 |
---|
8 | -1.104715 | 0.954115 | 0.479431 | 1.001990 |
---|
9 | 0.709469 | -1.613924 | 0.424452 | -0.641368 |
---|
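- [Note]
- Adding a column to a DataFrame is relatively fast. However, adding a row requires a copy and may be expensive. We recommend passing a pre-built list of records to the DataFrame constructor instead of building a DataFrame by iteratively appending records to it.
Join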
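A minimal sketch of the recommended pattern, assuming the records arrive one at a time in a loop (the field names `id` and `square` are hypothetical):

```python
import pandas as pd

# Collect plain dictionaries first; appending to a list is cheap.
records = []
for i in range(5):
    records.append({"id": i, "square": i * i})

# Build the DataFrame once, from the finished list.
frame = pd.DataFrame(records)
print(frame)
```

The list grows without copying any existing rows, so the cost of building the frame is paid a single time at the end.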
left = pd.DataFrame({"key":["foo","foo"],"lval":[1,2]})
left
right = pd.DataFrame({"key":["foo","foo"],"rval":[4,5]})
right
pd.merge(left,right,on="key")
| key | lval | rval |
---|
0 | foo | 1 | 4 |
---|
1 | foo | 1 | 5 |
---|
2 | foo | 2 | 4 |
---|
3 | foo | 2 | 5 |
---|
pd
<module 'pandas' from 'D:\\software\\anaconda\\lib\\site-packages\\pandas\\__init__.py'>
left
right
left = pd.DataFrame({"key":["foo","bar"],"lval":[1,2]})
left
right = pd.DataFrame({"key":["foo","bar"],"rval":[4,5]})
right
pd.merge(left,right,on="key")
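The two merges in this section differ only in whether the key values repeat. The sketch below puts them side by side; the names `dup_left` and `dup_right` are introduced here for illustration.

```python
import pandas as pd

# Unique keys: each left row matches exactly one right row.
left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "bar"], "rval": [4, 5]})
unique_keys = pd.merge(left, right, on="key")

# Repeated keys: every left "foo" pairs with every right "foo".
dup_left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
dup_right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})
dup_keys = pd.merge(dup_left, dup_right, on="key")

print(len(unique_keys), len(dup_keys))
print(unique_keys)
```

With repeated keys the row count is the product of the matches on each side, which explains the four-row result shown earlier for two `foo` rows on both sides.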
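Grouping
- "Group by" refers to a process involving one or more of the following steps:
- Splitting the data into groups based on some criteria
- Applying a function to each group independently
- Combining the results into a data structure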
df = pd.DataFrame({
"A":["foo","bar","foo","bar","foo","bar","foo","foo"],
"B":["zero","one","two","there","four","five","six","seven"],
"C":np.random.randn(8),
"D":np.random.randn(8),
}
)
df
| A | B | C | D |
---|
0 | foo | zero | 0.729545 | 0.301263 |
---|
1 | bar | one | 1.603889 | 0.458280 |
---|
2 | foo | two | 0.633382 | 1.820535 |
---|
3 | bar | there | -0.723170 | 1.917200 |
---|
4 | foo | four | 0.581405 | 0.961305 |
---|
5 | bar | five | -1.414755 | 0.986130 |
---|
6 | foo | six | 0.577222 | 0.851816 |
---|
7 | foo | seven | -1.318073 | 0.757913 |
---|
df.groupby("A")
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002655C66AFD0>
df.groupby("A").sum()
| C | D |
---|
A | | |
---|
bar | -0.534035 | 3.361610 |
---|
foo | 1.203481 | 4.692833 |
---|
df.index
RangeIndex(start=0, stop=8, step=1)
df.columns
Index(['A', 'B', 'C', 'D'], dtype='object')
df.groupby("D").mean()
| C |
---|
D | |
---|
0.301263 | 0.729545 |
---|
0.458280 | 1.603889 |
---|
0.757913 | -1.318073 |
---|
0.851816 | 0.577222 |
---|
0.961305 | 0.581405 |
---|
0.986130 | -1.414755 |
---|
1.820535 | 0.633382 |
---|
1.917200 | -0.723170 |
---|
df.groupby("D").sum()
| C |
---|
D | |
---|
0.301263 | 0.729545 |
---|
0.458280 | 1.603889 |
---|
0.757913 | -1.318073 |
---|
0.851816 | 0.577222 |
---|
0.961305 | 0.581405 |
---|
0.986130 | -1.414755 |
---|
1.820535 | 0.633382 |
---|
1.917200 | -0.723170 |
---|
df.sum()
A foobarfoobarfoobarfoofoo
B zeroonetwotherefourfivesixseven
C 0.669445
D 8.054443
dtype: object
df.groupby("A").sum()
| C | D |
---|
A | | |
---|
bar | -0.534035 | 3.361610 |
---|
foo | 1.203481 | 4.692833 |
---|
df.groupby("B").sum()
| C | D |
---|
B | | |
---|
five | -1.414755 | 0.986130 |
---|
four | 0.581405 | 0.961305 |
---|
one | 1.603889 | 0.458280 |
---|
seven | -1.318073 | 0.757913 |
---|
six | 0.577222 | 0.851816 |
---|
there | -0.723170 | 1.917200 |
---|
two | 0.633382 | 1.820535 |
---|
zero | 0.729545 | 0.301263 |
---|
- Grouping by multiple columns forms a hierarchical index, and we can again apply the sum() function:
df.groupby(["A","B"]).sum()
| | C | D |
---|
A | B | | |
---|
bar | five | -1.414755 | 0.986130 |
---|
one | 1.603889 | 0.458280 |
---|
there | -0.723170 | 1.917200 |
---|
foo | four | 0.581405 | 0.961305 |
---|
seven | -1.318073 | 0.757913 |
---|
six | 0.577222 | 0.851816 |
---|
two | 0.633382 | 1.820535 |
---|
zero | 0.729545 | 0.301263 |
---|
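The split, apply, and combine steps listed at the start of this section can be spelled out by hand and compared with the one-line form. The frame below (columns `team` and `points`) is a made-up example.

```python
import pandas as pd

frame = pd.DataFrame({"team": ["a", "b", "a", "b"], "points": [1, 2, 3, 4]})

# Split: iterating a groupby yields (key, sub-frame) pairs.
# Apply: reduce each sub-frame to one number.
manual = {key: int(group["points"].sum()) for key, group in frame.groupby("team")}

# Combine: groupby(...).sum() performs all three steps and returns a Series.
combined = frame.groupby("team")["points"].sum()

print(manual)
print(combined)
```

Selecting the numeric column before calling `sum()` keeps text columns out of the aggregation.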
Reshaping
Stack
tuples = list(
zip(
*[
["bar","bar","baz","baz","foo","foo","qux","que"],
["one","two","one","two","one","two","one","there"]
]
)
)
tuples
[('bar', 'one'),
('bar', 'two'),
('baz', 'one'),
('baz', 'two'),
('foo', 'one'),
('foo', 'two'),
('qux', 'one'),
('que', 'there')]
index = pd.MultiIndex.from_tuples(tuples,names=["first","second"])
index
MultiIndex([('bar', 'one'),
('bar', 'two'),
('baz', 'one'),
('baz', 'two'),
('foo', 'one'),
('foo', 'two'),
('qux', 'one'),
('que', 'there')],
names=['first', 'second'])
df = pd.DataFrame(np.random.randn(8,2),index=index,columns=["A","B"])
df
| | A | B |
---|
first | second | | |
---|
bar | one | -0.484529 | 1.071371 |
---|
two | -0.295187 | 1.255282 |
---|
baz | one | -0.341345 | 0.790919 |
---|
two | -1.297284 | -1.285871 |
---|
foo | one | -0.598789 | -0.624319 |
---|
two | -0.297379 | -2.395637 |
---|
qux | one | -0.194991 | 0.614675 |
---|
que | there | 0.866328 | 0.098695 |
---|
df2 = df[:4]
df2
| | A | B |
---|
first | second | | |
---|
bar | one | -0.484529 | 1.071371 |
---|
two | -0.295187 | 1.255282 |
---|
baz | one | -0.341345 | 0.790919 |
---|
two | -1.297284 | -1.285871 |
---|
- The stack() method "compresses" a level in the DataFrame's columns:
stacked = df2.stack()
stacked
first second
bar one A -0.484529
B 1.071371
two A -0.295187
B 1.255282
baz one A -0.341345
B 0.790919
two A -1.297284
B -1.285871
dtype: float64
- For a "stacked" DataFrame or Series (one that has a MultiIndex as its index), the inverse operation of stack() is unstack(), which by default unstacks the last level:
stacked
first second
bar one A -0.484529
B 1.071371
two A -0.295187
B 1.255282
baz one A -0.341345
B 0.790919
two A -1.297284
B -1.285871
dtype: float64
stacked.unstack()
| | A | B |
---|
first | second | | |
---|
bar | one | -0.484529 | 1.071371 |
---|
two | -0.295187 | 1.255282 |
---|
baz | one | -0.341345 | 0.790919 |
---|
two | -1.297284 | -1.285871 |
---|
stacked.unstack().unstack()
| A | B |
---|
second | one | two | one | two |
---|
first | | | | |
---|
bar | -0.484529 | -0.295187 | 1.071371 | 1.255282 |
---|
baz | -0.341345 | -1.297284 | 0.790919 | -1.285871 |
---|
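- stack() turns a two-dimensional table into a one-dimensional one
- unstack() is the inverse of stack(), that is, it turns a one-dimensional table back into a two-dimensional one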
stacked.unstack(1)
| second | one | two |
---|
first | | | |
---|
bar | A | -0.484529 | -0.295187 |
---|
B | 1.071371 | 1.255282 |
---|
baz | A | -0.341345 | -1.297284 |
---|
B | 0.790919 | -1.285871 |
---|
stacked.unstack(2)
| | A | B |
---|
first | second | | |
---|
bar | one | -0.484529 | 1.071371 |
---|
two | -0.295187 | 1.255282 |
---|
baz | one | -0.341345 | 0.790919 |
---|
two | -1.297284 | -1.285871 |
---|
stacked.unstack(0)
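To see which level each call moves, it can help to repeat the round trip on a frame with easy-to-track values. This is a sketch under the assumption of a two-level row index like the one above; the numbers are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("bar", "two"), ("baz", "one"), ("baz", "two")],
    names=["first", "second"],
)
frame = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [5.0, 6.0, 7.0, 8.0]},
    index=index,
)

stacked = frame.stack()        # column labels become the innermost index level
restored = stacked.unstack()   # default: move the last level back to columns
by_first = stacked.unstack(0)  # move level 0 ("first") to the columns instead

print(stacked.index.nlevels)
print(by_first)
```

`unstack` accepts either a level position or a level name, so `stacked.unstack("first")` selects the same level as `stacked.unstack(0)` here.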
Pivot tables
df = pd.DataFrame({
"A":["one","one","two","three"] * 3,
"B":["A","B","C"] * 4,
"C":["foo","foo","foo","bar","bar","bar"]*2,
"D": np.random.randn(12),
"E": np.random.randn(12),
})
df
| A | B | C | D | E |
---|
0 | one | A | foo | 0.013846 | 0.444128 |
---|
1 | one | B | foo | 1.785051 | -0.880777 |
---|
2 | two | C | foo | 2.020651 | -1.403231 |
---|
3 | three | A | bar | -0.623111 | -0.053250 |
---|
4 | one | B | bar | -0.022848 | -0.821333 |
---|
5 | one | C | bar | -0.962751 | 0.691853 |
---|
6 | two | A | foo | 0.991734 | -1.796295 |
---|
7 | three | B | foo | -0.326107 | -1.437360 |
---|
8 | one | C | foo | 1.634899 | -0.036184 |
---|
9 | one | A | bar | -0.099110 | 1.219143 |
---|
10 | two | B | bar | 0.140044 | 2.462987 |
---|
11 | three | C | bar | 1.043458 | -0.416262 |
---|
pd.pivot_table(df,values="D",index=["A","B"],columns=["C"])
| C | bar | foo |
---|
A | B | | |
---|
one | A | -0.099110 | 0.013846 |
---|
B | -0.022848 | 1.785051 |
---|
C | -0.962751 | 1.634899 |
---|
three | A | -0.623111 | NaN |
---|
B | NaN | -0.326107 |
---|
C | 1.043458 | NaN |
---|
two | A | NaN | 0.991734 |
---|
B | 0.140044 | NaN |
---|
C | NaN | 2.020651 |
---|
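Time series
- pandas has simple, powerful, and efficient functionality for performing resampling operations during frequency conversion (for example, converting one-second data into 5-minute data). This is extremely common in, but not limited to, financial applications.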
rng = pd.date_range("6/8/2022",periods=100,freq="S")
rng
DatetimeIndex(['2022-06-08 00:00:00', '2022-06-08 00:00:01',
'2022-06-08 00:00:02', '2022-06-08 00:00:03',
'2022-06-08 00:00:04', '2022-06-08 00:00:05',
'2022-06-08 00:00:06', '2022-06-08 00:00:07',
'2022-06-08 00:00:08', '2022-06-08 00:00:09',
'2022-06-08 00:00:10', '2022-06-08 00:00:11',
'2022-06-08 00:00:12', '2022-06-08 00:00:13',
'2022-06-08 00:00:14', '2022-06-08 00:00:15',
'2022-06-08 00:00:16', '2022-06-08 00:00:17',
'2022-06-08 00:00:18', '2022-06-08 00:00:19',
'2022-06-08 00:00:20', '2022-06-08 00:00:21',
'2022-06-08 00:00:22', '2022-06-08 00:00:23',
'2022-06-08 00:00:24', '2022-06-08 00:00:25',
'2022-06-08 00:00:26', '2022-06-08 00:00:27',
'2022-06-08 00:00:28', '2022-06-08 00:00:29',
'2022-06-08 00:00:30', '2022-06-08 00:00:31',
'2022-06-08 00:00:32', '2022-06-08 00:00:33',
'2022-06-08 00:00:34', '2022-06-08 00:00:35',
'2022-06-08 00:00:36', '2022-06-08 00:00:37',
'2022-06-08 00:00:38', '2022-06-08 00:00:39',
'2022-06-08 00:00:40', '2022-06-08 00:00:41',
'2022-06-08 00:00:42', '2022-06-08 00:00:43',
'2022-06-08 00:00:44', '2022-06-08 00:00:45',
'2022-06-08 00:00:46', '2022-06-08 00:00:47',
'2022-06-08 00:00:48', '2022-06-08 00:00:49',
'2022-06-08 00:00:50', '2022-06-08 00:00:51',
'2022-06-08 00:00:52', '2022-06-08 00:00:53',
'2022-06-08 00:00:54', '2022-06-08 00:00:55',
'2022-06-08 00:00:56', '2022-06-08 00:00:57',
'2022-06-08 00:00:58', '2022-06-08 00:00:59',
'2022-06-08 00:01:00', '2022-06-08 00:01:01',
'2022-06-08 00:01:02', '2022-06-08 00:01:03',
'2022-06-08 00:01:04', '2022-06-08 00:01:05',
'2022-06-08 00:01:06', '2022-06-08 00:01:07',
'2022-06-08 00:01:08', '2022-06-08 00:01:09',
'2022-06-08 00:01:10', '2022-06-08 00:01:11',
'2022-06-08 00:01:12', '2022-06-08 00:01:13',
'2022-06-08 00:01:14', '2022-06-08 00:01:15',
'2022-06-08 00:01:16', '2022-06-08 00:01:17',
'2022-06-08 00:01:18', '2022-06-08 00:01:19',
'2022-06-08 00:01:20', '2022-06-08 00:01:21',
'2022-06-08 00:01:22', '2022-06-08 00:01:23',
'2022-06-08 00:01:24', '2022-06-08 00:01:25',
'2022-06-08 00:01:26', '2022-06-08 00:01:27',
'2022-06-08 00:01:28', '2022-06-08 00:01:29',
'2022-06-08 00:01:30', '2022-06-08 00:01:31',
'2022-06-08 00:01:32', '2022-06-08 00:01:33',
'2022-06-08 00:01:34', '2022-06-08 00:01:35',
'2022-06-08 00:01:36', '2022-06-08 00:01:37',
'2022-06-08 00:01:38', '2022-06-08 00:01:39'],
dtype='datetime64[ns]', freq='S')
ts = pd.Series(np.random.randint(0,500,len(rng)),index=rng)
ts
2022-06-08 00:00:00 355
2022-06-08 00:00:01 109
2022-06-08 00:00:02 457
2022-06-08 00:00:03 481
2022-06-08 00:00:04 220
...
2022-06-08 00:01:35 104
2022-06-08 00:01:36 461
2022-06-08 00:01:37 176
2022-06-08 00:01:38 37
2022-06-08 00:01:39 26
Freq: S, Length: 100, dtype: int32
ts.resample("5Min").sum()
2022-06-08 24384
Freq: 5T, dtype: int32
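With only 100 seconds of data, a 5-minute resample collapses everything into a single bin, which hides how the binning works. The sketch below uses constant values and a shorter bin width so the bin boundaries are visible; the 180-sample length and the all-ones values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# 180 one-second samples, all equal to 1, so each sum equals the bin size.
rng = pd.date_range("2022-06-08", periods=180, freq="s")
ts = pd.Series(np.ones(len(rng), dtype="int64"), index=rng)

per_minute = ts.resample("1min").sum()  # three bins of 60 samples each
per_five = ts.resample("5min").sum()    # a single bin holding all 180 samples

print(per_minute)
print(per_five)
```

Each bin is labelled by its left edge, so the second one-minute bin carries the timestamp 00:01:00.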
pd.date_range(
start=None,
end=None,
periods=None,
freq=None,
tz=None,
normalize=False,
name=None,
closed=None,
**kwargs,
)
rng = pd.date_range("6/8/2022 14:30",periods=3,freq="D")
rng
DatetimeIndex(['2022-06-08 14:30:00', '2022-06-09 14:30:00',
'2022-06-10 14:30:00'],
dtype='datetime64[ns]', freq='D')
pd.date_range("6/8/2022 14:30",periods=6,freq="D")
DatetimeIndex(['2022-06-08 14:30:00', '2022-06-09 14:30:00',
'2022-06-10 14:30:00', '2022-06-11 14:30:00',
'2022-06-12 14:30:00', '2022-06-13 14:30:00'],
dtype='datetime64[ns]', freq='D')
pd.date_range("6/8/2022 14:30",periods=6,freq="T")
DatetimeIndex(['2022-06-08 14:30:00', '2022-06-08 14:31:00',
'2022-06-08 14:32:00', '2022-06-08 14:33:00',
'2022-06-08 14:34:00', '2022-06-08 14:35:00'],
dtype='datetime64[ns]', freq='T')
rng = pd.date_range("6/8/2022 14:30",periods=6,freq="D")
rng
DatetimeIndex(['2022-06-08 14:30:00', '2022-06-09 14:30:00',
'2022-06-10 14:30:00', '2022-06-11 14:30:00',
'2022-06-12 14:30:00', '2022-06-13 14:30:00'],
dtype='datetime64[ns]', freq='D')
ts = pd.Series(np.random.randn(len(rng)),rng)
ts
2022-06-08 14:30:00 -1.849361
2022-06-09 14:30:00 1.354631
2022-06-10 14:30:00 0.412876
2022-06-11 14:30:00 1.465844
2022-06-12 14:30:00 0.665059
2022-06-13 14:30:00 2.036140
Freq: D, dtype: float64
ts_utc = ts.tz_localize("UTC")
ts_utc
2022-06-08 14:30:00+00:00 -1.849361
2022-06-09 14:30:00+00:00 1.354631
2022-06-10 14:30:00+00:00 0.412876
2022-06-11 14:30:00+00:00 1.465844
2022-06-12 14:30:00+00:00 0.665059
2022-06-13 14:30:00+00:00 2.036140
Freq: D, dtype: float64
ts_utc.tz_convert("US/Eastern")
2022-06-08 10:30:00-04:00 -1.849361
2022-06-09 10:30:00-04:00 1.354631
2022-06-10 10:30:00-04:00 0.412876
2022-06-11 10:30:00-04:00 1.465844
2022-06-12 10:30:00-04:00 0.665059
2022-06-13 10:30:00-04:00 2.036140
Freq: D, dtype: float64
rng = pd.date_range("6/8/2022",periods=5,freq="M")
rng
DatetimeIndex(['2022-06-30', '2022-07-31', '2022-08-31', '2022-09-30',
'2022-10-31'],
dtype='datetime64[ns]', freq='M')
pd.date_range("6/8/2022",periods=8,freq="M")
DatetimeIndex(['2022-06-30', '2022-07-31', '2022-08-31', '2022-09-30',
'2022-10-31', '2022-11-30', '2022-12-31', '2023-01-31'],
dtype='datetime64[ns]', freq='M')
ts = pd.Series(np.random.randn(len(rng)),index=rng)
ts
2022-06-30 1.211478
2022-07-31 0.086780
2022-08-31 -0.373740
2022-09-30 -1.602854
2022-10-31 0.428028
Freq: M, dtype: float64
ps = ts.to_period()
ps
2022-06 1.211478
2022-07 0.086780
2022-08 -0.373740
2022-09 -1.602854
2022-10 0.428028
Freq: M, dtype: float64
ps.to_timestamp()
2022-06-01 1.211478
2022-07-01 0.086780
2022-08-01 -0.373740
2022-09-01 -1.602854
2022-10-01 0.428028
Freq: MS, dtype: float64
prng = pd.period_range("2021Q1","2022Q4",freq="Q-NOV")
prng
PeriodIndex(['2021Q1', '2021Q2', '2021Q3', '2021Q4', '2022Q1', '2022Q2',
'2022Q3', '2022Q4'],
dtype='period[Q-NOV]')
ts = pd.Series(np.random.randn(len(prng)),prng)
ts
2021Q1 -0.506796
2021Q2 -0.481430
2021Q3 -0.078390
2021Q4 -0.080919
2022Q1 0.057916
2022Q2 -0.151808
2022Q3 -0.936490
2022Q4 -0.320068
Freq: Q-NOV, dtype: float64
ts.index = (prng.asfreq("M","e") + 1).asfreq("H","s") + 9
ts.index
PeriodIndex(['2021-03-01 09:00', '2021-06-01 09:00', '2021-09-01 09:00',
'2021-12-01 09:00', '2022-03-01 09:00', '2022-06-01 09:00',
'2022-09-01 09:00', '2022-12-01 09:00'],
dtype='period[H]')
ts.head()
2021-03-01 09:00 -0.506796
2021-06-01 09:00 -0.481430
2021-09-01 09:00 -0.078390
2021-12-01 09:00 -0.080919
2022-03-01 09:00 0.057916
Freq: H, dtype: float64
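Categoricals
- pandas can include categorical data in a DataFrame. For full documentation, see the pandas documentation on categorical data.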
df = pd.DataFrame(
{"id":[1,2,3,4,5,6],"raw_grade":["a","b","b","a","a","e"]}
)
df
| id | raw_grade |
---|
0 | 1 | a |
---|
1 | 2 | b |
---|
2 | 3 | b |
---|
3 | 4 | a |
---|
4 | 5 | a |
---|
5 | 6 | e |
---|
df["grade"] = df["raw_grade"].astype("category")
df["grade"]
0 a
1 b
2 b
3 a
4 a
5 e
Name: grade, dtype: category
Categories (3, object): ['a', 'b', 'e']
df["grade"] = df["grade"].cat.rename_categories(["very good","good","very bad"])
df
| id | raw_grade | grade |
---|
0 | 1 | a | very good |
---|
1 | 2 | b | good |
---|
2 | 3 | b | good |
---|
3 | 4 | a | very good |
---|
4 | 5 | a | very good |
---|
5 | 6 | e | very bad |
---|
- Reorder the categories and simultaneously add the missing categories (methods under Series.cat return a new Series by default):
df["grade"] = df["grade"].cat.set_categories(
["very bad","bad","medium","good","very good"]
)
df["grade"]
0 very good
1 good
2 good
3 very good
4 very good
5 very bad
Name: grade, dtype: category
Categories (5, object): ['very bad', 'bad', 'medium', 'good', 'very good']
df
| id | raw_grade | grade |
---|
0 | 1 | a | very good |
---|
1 | 2 | b | good |
---|
2 | 3 | b | good |
---|
3 | 4 | a | very good |
---|
4 | 5 | a | very good |
---|
5 | 6 | e | very bad |
---|
df.sort_values(by="grade")
| id | raw_grade | grade |
---|
5 | 6 | e | very bad |
---|
1 | 2 | b | good |
---|
2 | 3 | b | good |
---|
0 | 1 | a | very good |
---|
3 | 4 | a | very good |
---|
4 | 5 | a | very good |
---|
df.groupby("grade").size()
grade
very bad 1
bad 0
medium 0
good 2
very good 3
dtype: int64
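The renaming, reordering, and sorting steps of this section can be condensed into one runnable sketch. It works on a standalone Series rather than the `df` above, and uses `value_counts` as a stand-in for the grouped size.

```python
import pandas as pd

grades = pd.Series(["a", "b", "b", "a", "a", "e"]).astype("category")

# rename_categories returns a new Series; names map positionally onto a, b, e.
grades = grades.cat.rename_categories(["very good", "good", "very bad"])

# set_categories fixes the order and adds categories that have no rows yet.
grades = grades.cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"]
)

ordered = grades.sort_values()            # follows category order, not the alphabet
counts = grades.value_counts(sort=False)  # empty categories appear with count 0

print(ordered.tolist())
print(counts)
```

Because every `Series.cat` method returns a new object, each result has to be assigned back, as done here and in the cells above.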
Plotting
- We use the standard convention for referencing the matplotlib API:
import matplotlib.pyplot as plt
plt.close("all")
ts = pd.Series(np.random.randn(1000),index=pd.date_range("6/8/2022",periods=1000))
ts = ts.cumsum()
ts
2022-06-08 -0.416538
2022-06-09 -1.186893
2022-06-10 -0.974144
2022-06-11 -0.929173
2022-06-12 0.371832
...
2025-02-27 14.514577
2025-02-28 15.186525
2025-03-01 15.595083
2025-03-02 16.554780
2025-03-03 17.165945
Freq: D, Length: 1000, dtype: float64
ts.plot()
<AxesSubplot:>
- When running under Jupyter Notebook, the plot appears as soon as plot() is called. Otherwise, use matplotlib.pyplot.show to display it or matplotlib.pyplot.savefig to write it to a file.
plt.show()
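Outside a notebook, for example in a script on a machine without a display, writing the figure to a file is often the more practical option. The following is a minimal sketch under stated assumptions: the non-interactive `Agg` backend, the seeded random generator, the name `ts_demo` and the output file name are all choices made for this illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run, so use a non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# A random walk similar to the series above (seeded so the run is repeatable).
rng = np.random.default_rng(0)
ts_demo = pd.Series(
    rng.standard_normal(1000),
    index=pd.date_range("2022-06-08", periods=1000),
).cumsum()

# Series.plot() returns the matplotlib Axes it drew on.
ax = ts_demo.plot()

# Hypothetical output location; savefig infers the format from the extension.
out_path = os.path.join(tempfile.gettempdir(), "ts_demo.png")
ax.get_figure().savefig(out_path)

# Close the figure to release its memory once the file is written.
plt.close(ax.get_figure())
```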
- On a DataFrame, the plot() method is a convenient way to plot all of the columns with labels:
ts.index
DatetimeIndex(['2022-06-08', '2022-06-09', '2022-06-10', '2022-06-11',
'2022-06-12', '2022-06-13', '2022-06-14', '2022-06-15',
'2022-06-16', '2022-06-17',
...
'2025-02-22', '2025-02-23', '2025-02-24', '2025-02-25',
'2025-02-26', '2025-02-27', '2025-02-28', '2025-03-01',
'2025-03-02', '2025-03-03'],
dtype='datetime64[ns]', length=1000, freq='D')
df = pd.DataFrame(
np.random.randn(1000,4),index=ts.index,columns=["A","B","C","D"]
)
df
|   | A | B | C | D |
|---|---|---|---|---|
| 2022-06-08 | -1.411527 | -0.124331 | -0.748194 | 0.795625 |
| 2022-06-09 | 0.327356 | 1.127876 | -0.176681 | -0.140429 |
| 2022-06-10 | -0.546087 | 0.056621 | 0.879618 | 0.111533 |
| 2022-06-11 | -0.723865 | -1.197658 | -0.134488 | 0.762858 |
| 2022-06-12 | -0.584152 | -0.205798 | -0.457109 | 0.613583 |
| ... | ... | ... | ... | ... |
| 2025-02-27 | 0.952618 | 0.809016 | -1.256770 | 0.544052 |
| 2025-02-28 | -0.325551 | -1.333431 | -2.593479 | 0.753844 |
| 2025-03-01 | 0.072350 | 0.950298 | 1.112801 | 0.644935 |
| 2025-03-02 | -0.149229 | -0.704682 | -1.647990 | 0.780895 |
| 2025-03-03 | 0.944789 | 0.680362 | 0.892620 | -1.074460 |

1000 rows × 4 columns
df.cumsum()
|   | A | B | C | D |
|---|---|---|---|---|
| 2022-06-08 | -1.411527 | -0.124331 | -0.748194 | 0.795625 |
| 2022-06-09 | -1.084171 | 1.003545 | -0.924875 | 0.655196 |
| 2022-06-10 | -1.630259 | 1.060166 | -0.045257 | 0.766730 |
| 2022-06-11 | -2.354124 | -0.137492 | -0.179745 | 1.529588 |
| 2022-06-12 | -2.938276 | -0.343290 | -0.636854 | 2.143171 |
| ... | ... | ... | ... | ... |
| 2025-02-27 | 5.425325 | 33.513824 | 33.972694 | 0.586048 |
| 2025-02-28 | 5.099775 | 32.180394 | 31.379215 | 1.339892 |
| 2025-03-01 | 5.172124 | 33.130692 | 32.492016 | 1.984828 |
| 2025-03-02 | 5.022895 | 32.426009 | 30.844026 | 2.765723 |
| 2025-03-03 | 5.967683 | 33.106372 | 31.736647 | 1.691263 |

1000 rows × 4 columns
# Plot the cumulative sums and set the legend on the Axes that DataFrame.plot() returns.
# A separate plt.figure() / plt.legend() call acts on a different, empty figure,
# which is what triggers the "No artists with labels found" warning.
ax = df.cumsum().plot()
ax.legend(loc='best')
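`DataFrame.plot()` labels each line with its column name and returns the matplotlib Axes it drew on, so the legend can be controlled through that object instead of through pyplot's current figure. The sketch below assumes a headless `Agg` backend and a small seeded frame named `frame`; both are illustrative choices rather than part of the original notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, so use a non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# A small frame with the same column names as above (seeded for repeatability).
rng = np.random.default_rng(0)
frame = pd.DataFrame(
    rng.standard_normal((100, 4)),
    index=pd.date_range("2022-06-08", periods=100),
    columns=["A", "B", "C", "D"],
)

# Create the figure and axes explicitly, then tell pandas to draw on them.
fig, ax = plt.subplots()
frame.cumsum().plot(ax=ax)

# The legend is built from the labelled lines on this specific Axes.
legend = ax.legend(loc="best")
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)

plt.close(fig)
```

Passing `ax=ax` keeps the figure, the lines and the legend on the same object, which also works when the calls are spread across several notebook cells.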
Getting data in/out
CSV
df.to_csv("foo.csv")
pd.read_csv("foo.csv")
|   | Unnamed: 0 | A | B | C | D |
|---|---|---|---|---|---|
| 0 | 2022-06-08 | -1.411527 | -0.124331 | -0.748194 | 0.795625 |
| 1 | 2022-06-09 | 0.327356 | 1.127876 | -0.176681 | -0.140429 |
| 2 | 2022-06-10 | -0.546087 | 0.056621 | 0.879618 | 0.111533 |
| 3 | 2022-06-11 | -0.723865 | -1.197658 | -0.134488 | 0.762858 |
| 4 | 2022-06-12 | -0.584152 | -0.205798 | -0.457109 | 0.613583 |
| ... | ... | ... | ... | ... | ... |
| 995 | 2025-02-27 | 0.952618 | 0.809016 | -1.256770 | 0.544052 |
| 996 | 2025-02-28 | -0.325551 | -1.333431 | -2.593479 | 0.753844 |
| 997 | 2025-03-01 | 0.072350 | 0.950298 | 1.112801 | 0.644935 |
| 998 | 2025-03-02 | -0.149229 | -0.704682 | -1.647990 | 0.780895 |
| 999 | 2025-03-03 | 0.944789 | 0.680362 | 0.892620 | -1.074460 |

1000 rows × 5 columns
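The `Unnamed: 0` column in the result above is the original date index: `to_csv` writes it as the first column, and `read_csv` reads it back as ordinary data unless told otherwise. As a minimal sketch, passing `index_col=0` and `parse_dates=True` restores it as a `DatetimeIndex`. The in-memory buffer and the names `original` and `restored` are assumptions made so the example does not touch the file system.

```python
import io

import numpy as np
import pandas as pd

# A small frame with a date index, similar in shape to the one above.
rng = np.random.default_rng(0)
original = pd.DataFrame(
    rng.standard_normal((5, 4)),
    index=pd.date_range("2022-06-08", periods=5),
    columns=list("ABCD"),
)

# Write to an in-memory text buffer instead of foo.csv.
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)

# index_col=0 uses the first column as the index; parse_dates=True parses it as dates.
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)

print(restored.index)
print(list(restored.columns))
```

The same two arguments work when reading from a file path such as `"foo.csv"`.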
HDF5
# Pass the key by keyword; recent pandas versions warn when it is given positionally.
df.to_hdf("foo.h5", key="df")
pd.read_hdf("foo.h5","df")
|   | A | B | C | D |
|---|---|---|---|---|
| 2022-06-08 | -1.411527 | -0.124331 | -0.748194 | 0.795625 |
| 2022-06-09 | 0.327356 | 1.127876 | -0.176681 | -0.140429 |
| 2022-06-10 | -0.546087 | 0.056621 | 0.879618 | 0.111533 |
| 2022-06-11 | -0.723865 | -1.197658 | -0.134488 | 0.762858 |
| 2022-06-12 | -0.584152 | -0.205798 | -0.457109 | 0.613583 |
| ... | ... | ... | ... | ... |
| 2025-02-27 | 0.952618 | 0.809016 | -1.256770 | 0.544052 |
| 2025-02-28 | -0.325551 | -1.333431 | -2.593479 | 0.753844 |
| 2025-03-01 | 0.072350 | 0.950298 | 1.112801 | 0.644935 |
| 2025-03-02 | -0.149229 | -0.704682 | -1.647990 | 0.780895 |
| 2025-03-03 | 0.944789 | 0.680362 | 0.892620 | -1.074460 |

1000 rows × 4 columns
Excel
Reading from and writing to MS Excel files
df.to_excel("foo.xlsx",sheet_name="Sheet1")
pd.read_excel("foo.xlsx","Sheet1",index_col=None,na_values=['NA'])
|   | Unnamed: 0 | A | B | C | D |
|---|---|---|---|---|---|
| 0 | 2022-06-08 | -1.411527 | -0.124331 | -0.748194 | 0.795625 |
| 1 | 2022-06-09 | 0.327356 | 1.127876 | -0.176681 | -0.140429 |
| 2 | 2022-06-10 | -0.546087 | 0.056621 | 0.879618 | 0.111533 |
| 3 | 2022-06-11 | -0.723865 | -1.197658 | -0.134488 | 0.762858 |
| 4 | 2022-06-12 | -0.584152 | -0.205798 | -0.457109 | 0.613583 |
| ... | ... | ... | ... | ... | ... |
| 995 | 2025-02-27 | 0.952618 | 0.809016 | -1.256770 | 0.544052 |
| 996 | 2025-02-28 | -0.325551 | -1.333431 | -2.593479 | 0.753844 |
| 997 | 2025-03-01 | 0.072350 | 0.950298 | 1.112801 | 0.644935 |
| 998 | 2025-03-02 | -0.149229 | -0.704682 | -1.647990 | 0.780895 |
| 999 | 2025-03-03 | 0.944789 | 0.680362 | 0.892620 | -1.074460 |

1000 rows × 5 columns