Learning pandas (2): Filtering and Sorting Data
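- drop_duplicates() removes duplicate rows from a dataset
- .loc[] selects by label (or boolean mask); integers are treated as labels, never as positions
- .iloc[] selects by integer position; labels are not accepted
- .isin() checks, element by element, whether each value is among the specified values
- .str.startswith() selects strings that begin with a given prefix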
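As a quick illustration of the five tools listed above, here is a minimal sketch on a made-up DataFrame. The column names and values are hypothetical and are not taken from the datasets used below.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Ann", "Gus"], "score": [3, 5, 3, 8]},
    index=["a", "b", "c", "d"],
)

# Rows "a" and "c" hold identical values, so the later one is dropped
deduped = df.drop_duplicates()

# The same cell, reached by label and by position
by_label = df.loc["b", "score"]
by_position = df.iloc[1, 1]

# Element-wise membership test; the result is a boolean Series
mask = df["name"].isin(["Ann", "Gus"])

# Rows whose name starts with "A"
starts_with_a = df[df["name"].str.startswith("A")]
```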
1. Dataset 1
Import the data
1.1 How many products cost more than $10.00?
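# Drop the first and last characters (the "$" sign and the trailing character), then convert to float
prices = [float(value[1 : -1]) for value in chipo.item_price]
chipo.item_price = prices
# Keep one row per (item_name, quantity, choice_description) combination
chipo_filtered = chipo.drop_duplicates(['item_name','quantity','choice_description'])
# Keep only rows where a single unit was ordered, so item_price is the unit price
chipo_one_prod = chipo_filtered[chipo_filtered.quantity == 1]
chipo_one_prod
# Rows priced above $10
chipo_one_prod[chipo_one_prod['item_price']>10]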
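The same steps can be tried without the data file. The sketch below uses a few made-up rows that borrow the column names of chipo (the values are hypothetical) and ends with `nunique()` to turn the filtered table into a count. That last step is an addition, not part of the code above.

```python
import pandas as pd

# Hypothetical rows mimicking the chipo columns; prices are strings like "$2.39 "
toy = pd.DataFrame({
    "item_name": ["Chips", "Chicken Bowl", "Chicken Bowl", "Steak Burrito"],
    "quantity": [1, 1, 1, 1],
    "choice_description": [None, "[Salsa]", "[Salsa]", "[Rice]"],
    "item_price": ["$2.39 ", "$10.98 ", "$10.98 ", "$11.75 "],
})

# Remove surrounding whitespace and the "$" sign, then convert to float
toy["item_price"] = [float(v.strip().lstrip("$")) for v in toy["item_price"]]

# One row per distinct (item, quantity, choice) combination
unique_rows = toy.drop_duplicates(
    subset=["item_name", "quantity", "choice_description"]
)

# Single-unit orders priced above 10
single = unique_rows[unique_rows["quantity"] == 1]
over_ten = single[single["item_price"] > 10]

# Count distinct product names among those rows
n_products = over_ten["item_name"].nunique()
```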
1.2 What is the unit price of each item?
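# Keep one row per (item_name, quantity) pair
chipo_filtered = chipo.drop_duplicates(['item_name','quantity'])
# Example: look up the rows where a single Chicken Bowl was ordered
chipo[(chipo['item_name'] == 'Chicken Bowl') & (chipo['quantity'] == 1)]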
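One possible way to turn this into a price list, assuming item_price is already numeric: keep the single-unit rows, drop repeated items, and show name and price side by side. The data below is made up, and the approach is a sketch rather than the only reasonable answer. In a real dataset the same item may appear at several prices, in which case the first one found is kept.

```python
import pandas as pd

# Hypothetical rows; item_price is assumed to be float already
toy = pd.DataFrame({
    "item_name": ["Chicken Bowl", "Chicken Bowl", "Chips", "Chips"],
    "quantity": [1, 2, 1, 1],
    "item_price": [8.75, 17.50, 2.15, 2.15],
})

# Single-unit rows only, one row per item
one_each = toy[toy["quantity"] == 1].drop_duplicates(
    subset=["item_name", "quantity"]
)

# Name and price, most expensive first
price_per_item = one_each[["item_name", "item_price"]].sort_values(
    by="item_price", ascending=False
)
```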
1.3 Sort by item_name
chipo.item_name.sort_values()
chipo.sort_values(by = "item_name")
1.4 What was the quantity of the most expensive item ordered?
chipo.sort_values(by = "item_price", ascending = False).head(1)
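The line above returns the whole row. To read off just the quantity, one option is to pick that column from the sorted result, as in this sketch on made-up rows (the values are hypothetical).

```python
import pandas as pd

# Hypothetical order lines
toy = pd.DataFrame({
    "item_name": ["Chips", "Steak Burrito", "Chicken Bowl"],
    "quantity": [15, 1, 2],
    "item_price": [44.25, 11.75, 17.50],
})

# Row with the highest item_price
most_expensive = toy.sort_values(by="item_price", ascending=False).head(1)

# Quantity recorded on that row
quantity_ordered = most_expensive["quantity"].iloc[0]
```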
1.5 Use len() to count how many times Veggie Salad Bowl was ordered
chipo_salad = chipo[chipo.item_name == "Veggie Salad Bowl"]
len(chipo_salad)
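Note that len() counts order lines, not units. The sketch below, on hypothetical rows, shows the row count next to the summed quantity so the difference is visible.

```python
import pandas as pd

# Hypothetical order lines
toy = pd.DataFrame({
    "item_name": ["Veggie Salad Bowl", "Chips", "Veggie Salad Bowl"],
    "quantity": [1, 1, 2],
})

is_salad = toy["item_name"] == "Veggie Salad Bowl"

# Number of order lines containing the item
n_rows = len(toy[is_salad])

# Same count, taken directly from the boolean mask
n_rows_alt = int(is_salad.sum())

# Total units ordered, which can differ from the number of lines
n_units = int(toy.loc[is_salad, "quantity"].sum())
```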
2. Dataset 2
Import the data
2.1 View only the "Team", "Yellow Cards", and "Red Cards" columns and assign them to a DataFrame called discipline
discipline = euro12[['Team', 'Yellow Cards', 'Red Cards']]
discipline
2.2 Sort by Red Cards, then by Yellow Cards
discipline.sort_values(['Red Cards', 'Yellow Cards'], ascending = False)
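When a list of columns is passed, the first column is the primary sort key and the second only breaks ties. A small sketch with hypothetical teams and card counts:

```python
import pandas as pd

# Hypothetical teams and card counts
toy = pd.DataFrame({
    "Team": ["A", "B", "C", "D"],
    "Yellow Cards": [9, 4, 7, 5],
    "Red Cards": [0, 1, 1, 0],
})

# Red Cards decides the order; Yellow Cards breaks ties between B and C
ranked = toy.sort_values(["Red Cards", "Yellow Cards"], ascending=False)
```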
2.3 Compute the average number of yellow cards per team
round(discipline['Yellow Cards'].mean())
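Calling round() with no second argument rounds the mean to the nearest integer. If decimals matter, a digit count can be passed, as in this sketch on made-up numbers.

```python
import pandas as pd

# Hypothetical card counts
toy = pd.DataFrame({"Team": ["A", "B", "C"], "Yellow Cards": [9, 4, 7]})

mean_cards = toy["Yellow Cards"].mean()   # 20 / 3

rounded = round(mean_cards)               # nearest integer
two_decimals = round(mean_cards, 2)       # keep two decimal places
```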
2.4 Select the teams that scored more than 6 goals
euro12[euro12.Goals > 6]
2.5 Select the teams whose names start with G
euro12[euro12.Team.str.startswith('G')]
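Both filters are boolean masks, so they can be combined with `&`. The sketch below uses hypothetical team names and goal counts; each condition combined this way should be wrapped in parentheses when it contains a comparison.

```python
import pandas as pd

# Hypothetical teams and goal counts
toy = pd.DataFrame({
    "Team": ["Gamma", "Gold", "Silver"],
    "Goals": [10, 5, 12],
})

high_scoring = toy[toy["Goals"] > 6]
starts_with_g = toy[toy["Team"].str.startswith("G")]

# Teams that satisfy both conditions at once
both = toy[(toy["Goals"] > 6) & toy["Team"].str.startswith("G")]
```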
2.6 Select the first 7 columns
euro12.iloc[: , 0:7]
2.7 Select all columns except the last 3
euro12.iloc[: , :-3]
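In `.iloc[rows, columns]` the first slot addresses rows and the second addresses columns, so `[:, 0:7]` picks columns, not rows. A minimal sketch on a hypothetical five-column frame:

```python
import pandas as pd

# Hypothetical 2 x 5 frame
toy = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]],
    columns=list("abcde"),
)

first_two_cols = toy.iloc[:, 0:2]     # all rows, columns at positions 0 and 1
all_but_last_two = toy.iloc[:, :-2]   # all rows, every column except the last two
first_row = toy.iloc[0:1, :]          # first row, all columns
```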
2.8 Show only the Shooting Accuracy for England, Italy and Russia
euro12.Team.isin(['England', 'Italy', 'Russia'])
euro12[euro12.Team.isin(['England', 'Italy', 'Russia'])]
euro12.loc[euro12.Team.isin(['England', 'Italy', 'Russia']), ['Team','Shooting Accuracy']]
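The three lines above build up in stages: a boolean mask, the rows it selects, and finally rows plus chosen columns through `.loc`. The sketch below repeats the last stage on a toy frame; the numbers are placeholders, not real tournament statistics.

```python
import pandas as pd

# Placeholder values, not real statistics
toy = pd.DataFrame({
    "Team": ["England", "Spain", "Italy"],
    "Shooting Accuracy": ["50%", "42%", "43%"],
    "Goals": [5, 12, 6],
})

# True where Team is one of the listed names; names absent from the data are simply ignored
mask = toy["Team"].isin(["England", "Italy", "Russia"])

# Row filter and column selection in a single .loc call
subset = toy.loc[mask, ["Team", "Shooting Accuracy"]]
```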
3. Dataset 3
3.1 Set origin as the DataFrame's index
army.set_index('origin', inplace=True)
army.head()
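With `inplace=True` the frame is modified and nothing is returned. Leaving `inplace` at its default returns a new frame instead, which some find easier to chain. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows, loosely shaped like the army frame
toy = pd.DataFrame({
    "regiment": ["Nighthawks", "Dragoons", "Scouts"],
    "deaths": [523, 43, 62],
    "origin": ["Arizona", "Texas", "Oregon"],
})

# Returns a new frame; toy itself keeps origin as an ordinary column
indexed = toy.set_index("origin")
```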
3.2 Select rows 3 through 7 and columns 3 through 6
army.iloc[2:7, 2:6]
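Positions in `.iloc` start at 0 and the end of a slice is excluded, so "rows 3 through 7" becomes `2:7` and "columns 3 through 6" becomes `2:6`. The sketch below checks this on a hypothetical 10 x 8 grid of integers.

```python
import numpy as np
import pandas as pd

# Hypothetical 10 x 8 frame filled with 0..79
toy = pd.DataFrame(np.arange(80).reshape(10, 8))

# Positions 2..6 on the rows, positions 2..5 on the columns
block = toy.iloc[2:7, 2:6]
```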
3.3 Select the rows where deaths is greater than 50
army[army["deaths"] > 50]
3.4 Select all rows whose regiment is not "Dragoons"
army[army["regiment"] != "Dragoons"]
3.5 Select the rows labeled Texas and Arizona
army.loc[["Texas", "Arizona"], :]
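Label selection like this works because origin is the index. The rows come back in the order the labels are listed, not in their original order, as this sketch on hypothetical rows shows.

```python
import pandas as pd

# Hypothetical rows, already indexed by origin
toy = pd.DataFrame(
    {"regiment": ["Nighthawks", "Dragoons", "Scouts"], "deaths": [523, 43, 62]},
    index=pd.Index(["Arizona", "Texas", "Oregon"], name="origin"),
)

# Rows are returned in the requested label order
selected = toy.loc[["Texas", "Arizona"], :]
```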
3.6 Select the third cell in the column named deaths
army.loc[:, ["deaths"]].iloc[2]
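Because the column is selected with a list (`["deaths"]`), the expression above returns a one-entry Series rather than a bare value. If a scalar is wanted, selecting the column by name first is one alternative, sketched here on hypothetical data.

```python
import pandas as pd

# Hypothetical deaths column indexed by origin
toy = pd.DataFrame(
    {"deaths": [523, 52, 25, 616]},
    index=["Arizona", "California", "Texas", "Florida"],
)

# One-entry Series: index is ["deaths"], name is the row label
as_series = toy.loc[:, ["deaths"]].iloc[2]

# Plain scalar taken from the column
as_scalar = toy["deaths"].iloc[2]

# Purely positional route to the same scalar
also_scalar = toy.iloc[2, toy.columns.get_loc("deaths")]
```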