Difference between df["column"] and df.column
import numpy as np
import pandas as pd
columns = ["name","id"]
names = ["wangfang","zhangsan","jeff","peter","hell"]
ids = [i ** 2 for i in range(5)]
# zip keeps ids as integers; np.array([names, ids]).T would turn every value into a string
df = pd.DataFrame(data=list(zip(names, ids)), columns=columns)
df
|   | name | id |
|---|---|---|
| 0 | wangfang | 0 |
| 1 | zhangsan | 1 |
| 2 | jeff | 4 |
| 3 | peter | 9 |
| 4 | hell | 16 |
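- Both df.column and df["column"] return the values of a column
- But df.column only works under certain conditions: the column name must be a valid Python identifier, and it must not clash with an existing DataFrame attribute or method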
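The two conditions above can be checked mechanically. The sketch below uses a hypothetical helper, can_use_attribute (not a pandas API), that combines str.isidentifier, keyword.iskeyword and a hasattr check on the DataFrame class. pandas has a few more edge cases, so treat it as an approximation of the rules described in the pandas documentation on attribute access.

```python
import keyword

import pandas as pd


def can_use_attribute(df: pd.DataFrame, col) -> bool:
    """Hypothetical helper: roughly, would df.<col> return the column col?"""
    return (
        col in df.columns
        and isinstance(col, str)
        and col.isidentifier()              # "年龄~是" or "my col" fail here
        and not keyword.iskeyword(col)      # "class" cannot follow a dot
        and not hasattr(pd.DataFrame, col)  # "add" resolves to the DataFrame.add method
    )


df = pd.DataFrame({"name": ["jeff"], "add": [1], "年龄~是": [24]})
for col in df.columns:
    print(col, can_use_attribute(df, col))
# prints: name True / add False / 年龄~是 False
```

For the "add" column, df.add returns the bound method DataFrame.add, so the column is only reachable as df["add"].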
Demo with a valid column name
df.name
0 wangfang
1 zhangsan
2 jeff
3 peter
4 hell
Name: name, dtype: object
df["name"]
0 wangfang
1 zhangsan
2 jeff
3 peter
4 hell
Name: name, dtype: object
Demo with an invalid column name
series_ = pd.Series([24,38,66,52,31])
df["年龄~是"]= series_
df
|   | name | id | 年龄~是 |
|---|---|---|---|
| 0 | wangfang | 0 | 24 |
| 1 | zhangsan | 1 | 38 |
| 2 | jeff | 4 | 66 |
| 3 | peter | 9 | 52 |
| 4 | hell | 16 | 31 |
df["年龄~是"]
0 24
1 38
2 66
3 52
4 31
Name: 年龄~是, dtype: int64
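Because "年龄~是" is not a valid identifier, writing df.年龄~是 is not even valid Python: the parser rejects it before pandas is involved, since ~ is only a unary operator. A minimal sketch that uses compile to show the syntax error and DataFrame.rename to give the column a valid name (the new name "age" is an illustrative choice, not from the original):

```python
import pandas as pd

df = pd.DataFrame({"name": ["wangfang", "zhangsan"], "年龄~是": [24, 38]})

# Bracket access works for any column label.
print(df["年龄~是"].tolist())  # [24, 38]

# Attribute access with this label cannot even be parsed.
try:
    compile("df.年龄~是", "<demo>", "eval")
    is_valid_syntax = True
except SyntaxError:
    is_valid_syntax = False
print(is_valid_syntax)  # False

# rename returns a new DataFrame; after renaming, attribute access works.
renamed = df.rename(columns={"年龄~是": "age"})
print(renamed.age.tolist())  # [24, 38]
```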
df.column cannot create a new column, nor modify a column whose name clashes with a DataFrame method (such as add)
df["add"] = [1,2,3,4,5]
df
|   | name | id | 年龄~是 | add |
|---|---|---|---|---|
| 0 | wangfang | 0 | 24 | 1 |
| 1 | zhangsan | 1 | 38 | 2 |
| 2 | jeff | 4 | 66 | 3 |
| 3 | peter | 9 | 52 | 4 |
| 4 | hell | 16 | 31 | 5 |
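# Chained assignment: df["add"] returns a Series and [0] writes into it, hence the warning below.
# With Copy-on-Write enabled (newer pandas versions) it no longer updates df; df.loc[0, "add"] = 100 works in all versions.
df["add"][0] = 100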
df
D:\Anaconda3\envs\data\lib\site-packages\ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
|   | name | id | 年龄~是 | add |
|---|---|---|---|---|
| 0 | wangfang | 0 | 24 | 100 |
| 1 | zhangsan | 1 | 38 | 2 |
| 2 | jeff | 4 | 66 | 3 |
| 3 | peter | 9 | 52 | 4 |
| 4 | hell | 16 | 31 | 5 |
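The SettingWithCopyWarning comes from chained assignment: df["add"] first returns a Series, and [0] = 100 then writes into that intermediate object, which pandas cannot guarantee is a view of df. A minimal sketch of the usual alternative, a single .loc call with the row label and the column name (the small DataFrame here is illustrative):

```python
import warnings

import pandas as pd

df = pd.DataFrame({"name": ["wangfang", "zhangsan"], "add": [1, 2]})

with warnings.catch_warnings():
    warnings.simplefilter("error")  # turn any warning into an exception for this demo
    df.loc[0, "add"] = 100          # one indexing call: row label 0, column "add"

print(df["add"].tolist())  # [100, 2]
```

Because the row and column are selected in one step, pandas writes directly into df, so no warning is raised.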
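df.add = [1,6,4,2,2]   # "add" is a DataFrame method: this sets a plain attribute that shadows df.add(); the "add" column is unchanged
df.add[0] = 10         # modifies that Python list, not the DataFrame
df.add_ = [0,0,0,0,0]  # not an existing column: creates a plain attribute (pandas emits a UserWarning), no new column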
df
|   | name | id | 年龄~是 | add |
|---|---|---|---|---|
| 0 | wangfang | 0 | 24 | 100 |
| 1 | zhangsan | 1 | 38 | 2 |
| 2 | jeff | 4 | 66 | 3 |
| 3 | peter | 9 | 52 | 4 |
| 4 | hell | 16 | 31 | 5 |
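Whether attribute assignment reaches the DataFrame depends on the name. According to the pandas indexing documentation on attribute access, assigning to an existing column with a valid, non-clashing name does update that column; only brand-new names and names that collide with DataFrame methods (like add above) end up as plain Python attributes. A small sketch with illustrative column names score and add:

```python
import warnings

import pandas as pd

df = pd.DataFrame({"score": [1, 2, 3], "add": [4, 5, 6]})

# Existing column, valid name, no clash with a DataFrame attribute: the column is updated.
df.score = [7, 8, 9]

# "add" is a DataFrame method, so this only sets an instance attribute that shadows it.
df.add = [0, 0, 0]

# A brand-new name creates a plain attribute, not a column (pandas warns about this).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    df.bonus = [1, 1, 1]

print(df["score"].tolist())   # [7, 8, 9]
print(df["add"].tolist())     # [4, 5, 6]
print("bonus" in df.columns)  # False
```

In every case, bracket assignment (df["bonus"] = ...) is the unambiguous way to create or overwrite a column.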
Index labels can repeat but cannot be modified individually
Index labels can be repeated
data = np.array([1,2,4,5,8])
series = pd.Series(data=data,index=['a','a','b','c','c'])
series
a 1
a 2
b 4
c 5
c 8
dtype: int32
series["a"]
a 1
a 2
dtype: int32
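With duplicate labels, the type of the result depends on the label: a repeated label returns a Series, while a unique one returns a scalar. A short sketch on the same data, also using Index.is_unique and Index.duplicated to detect the repeats:

```python
import pandas as pd

series = pd.Series([1, 2, 4, 5, 8], index=["a", "a", "b", "c", "c"])

print(type(series["a"]).__name__)  # Series: "a" appears twice
print(series["b"])                 # 4: "b" is unique, so a scalar comes back

print(series.index.is_unique)              # False
print(series.index.duplicated().tolist())  # [False, True, False, False, True]
```

Code that indexes by label on a possibly non-unique index should therefore be prepared for either return type.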
Individual index labels cannot be modified
series.index[0] = "4"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-22-a68b0d9a21b2> in <module>
----> 1 series.index[0] = "4"
D:\Anaconda3\envs\data\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
4082
4083 def __setitem__(self, key, value):
-> 4084 raise TypeError("Index does not support mutable operations")
4085
4086 def __getitem__(self, key):
TypeError: Index does not support mutable operations
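Since an Index is immutable, changing one label means building a new index and assigning it back. Two common approaches, sketched on the same data: edit a list copy of the labels by position, or call Series.rename with a mapping (which renames every occurrence of a repeated label):

```python
import pandas as pd

series = pd.Series([1, 2, 4, 5, 8], index=["a", "a", "b", "c", "c"])

# Option 1: copy the labels into a list, edit by position, assign the whole index back.
labels = series.index.tolist()
labels[0] = "4"
series.index = labels
print(series.index.tolist())  # ['4', 'a', 'b', 'c', 'c']

# Option 2: rename by label; returns a new Series and renames both "c" entries.
renamed = series.rename({"c": "z"})
print(renamed.index.tolist())  # ['4', 'a', 'b', 'z', 'z']
```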
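The index can be replaced as a whole
- But the new index array must have the same length as the data
- reindex does not need a matching length: it realigns the data by label, so labels missing from the original index are filled with NaN (and original labels absent from the new index are dropped)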
series.index = [str(i) for i in range(5)]
series
0 1
1 2
2 4
3 5
4 8
dtype: int32
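The length requirement is enforced: assigning an index of the wrong length raises a ValueError instead of truncating or padding. A minimal check:

```python
import pandas as pd

series = pd.Series([1, 2, 4, 5, 8])

try:
    series.index = ["x", "y", "z"]  # 3 labels for 5 values
    mismatch_rejected = False
except ValueError as exc:
    mismatch_rejected = True
    print("ValueError:", exc)  # the message reports the length mismatch

series.index = [str(i) for i in range(5)]  # matching length: accepted
print(series.index.tolist())  # ['0', '1', '2', '3', '4']
```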
series.reindex([str(i) for i in range(6)])
0 1.0
1 2.0
2 4.0
3 5.0
4 8.0
5 NaN
dtype: float64
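reindex does not relabel the existing values; it looks each new label up in the old index. The order can therefore change, unknown labels become NaN, and labels left out of the new list disappear. For the same reason, reindex refuses an index with duplicate labels, so it could not have been called on the original series with its repeated "a" and "c". A sketch:

```python
import pandas as pd

series = pd.Series([1, 2, 4, 5, 8], index=[str(i) for i in range(5)])

# Values follow their labels: "4" -> 8, "0" -> 1, "9" is unknown -> NaN; "1".."3" are dropped.
print(series.reindex(["4", "0", "9"]).tolist())  # [8.0, 1.0, nan]

# With repeated labels, reindex cannot decide which value to take and raises.
dup = pd.Series([1, 2, 4, 5, 8], index=["a", "a", "b", "c", "c"])
try:
    dup.reindex(["a", "b"])
    dup_rejected = False
except ValueError as exc:
    dup_rejected = True
    print("ValueError:", exc)
```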
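- These NaN values can be filled through the method parameter; "ffill" fills each new label with the value of the preceding label (this requires a monotonic index, and NaN values already present in the data are not filled)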
series.reindex([str(i) for i in range(6)],method="ffill")
0 1
1 2
2 4
3 5
4 8
5 8
dtype: int32
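To see exactly what each option fills, it helps to start from data that already contains a NaN. In this sketch (values chosen for illustration), the original labels are 0, 2 and 4, and label 2 holds NaN. method and fill_value only act on the new labels 1, 3 and 5, so the original NaN survives every variant, and forward fill from label 2 even copies that NaN to label 3:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0], index=[0, 2, 4])
target = list(range(6))

print(s.reindex(target, method="ffill").tolist())  # [1.0, 1.0, nan, nan, 3.0, 3.0]
print(s.reindex(target, method="bfill").tolist())  # [1.0, nan, nan, 3.0, 3.0, nan]
print(s.reindex(target, fill_value=0).tolist())    # [1.0, 0.0, nan, 0.0, 3.0, 0.0]
```

To also fill the NaN values that were already in the data, a separate step such as fillna or ffill on the result is needed.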
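Common reindex parameters
- method: how to fill labels missing from the original index: "ffill" (forward fill) or "bfill" (backward fill); "nearest" is also accepted
- fill_value: a scalar used for missing labels instead of NaN
- limit: with forward or backward fill, the maximum number of consecutive labels to fill
- tolerance: for inexact matches, the maximum distance between a new label and the original label it takes its value from (needs a numeric or datetime-like index)
- level: for a MultiIndex, match the new labels against the given level
- copy: whether to return a new object even if the new index is the same as the old one (ignored when Copy-on-Write is enabled in recent pandas versions)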
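limit and tolerance both restrict forward/backward fill, but they measure different things: limit counts how many consecutive new labels may be filled after an original label, while tolerance caps the distance between a new label and the original label it copies from. A sketch with an illustrative integer index:

```python
import pandas as pd

s = pd.Series([10, 20], index=[0, 10])
target = [0, 1, 2, 3, 10, 11]

# limit=2: at most two consecutive new labels after each original label are filled.
print(s.reindex(target, method="ffill", limit=2).tolist())
# [10.0, 10.0, 10.0, nan, 20.0, 20.0]

# tolerance=1: a new label is filled only if its source label is at most 1 away.
print(s.reindex(target, method="ffill", tolerance=1).tolist())
# [10.0, 10.0, nan, nan, 20.0, 20.0]
```

The integer values come back as floats because the result contains NaN.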