IT数码 购物 网址 头条 软件 日历 阅读 图书馆
TxT小说阅读器
↓语音阅读,小说下载,古典文学↓
图片批量下载器
↓批量下载图片,美女图库↓
图片自动播放器
↓图片自动播放器↓
一键清除垃圾
↓轻轻一点,清除系统垃圾↓
开发: C++知识库 Java知识库 JavaScript Python PHP知识库 人工智能 区块链 大数据 移动开发 嵌入式 开发工具 数据结构与算法 开发测试 游戏开发 网络协议 系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑 笔记本 显卡 显示器 固态硬盘 硬盘 耳机 手机 iphone vivo oppo 小米 华为 单反 装机 图拉丁
 
   -> Python知识库 -> 第二章:第一节数据清洗及特征处理-作业 -> 正文阅读

[Python知识库]第二章:第一节数据清洗及特征处理-作业

【回顾&引言】前面一章的内容大家可以感觉到我们主要是对基础知识做一个梳理,让大家了解数据分析的一些操作,主要做了数据的各个角度的观察。那么在这里,我们主要是做数据分析的流程性学习,主要是包括了数据清洗以及数据的特征处理,数据重构以及数据可视化。这些内容是为数据分析最后的建模和模型评价做一个铺垫。

开始之前,导入numpy、pandas包和数据

#加载所需的库
import numpy as np
import pandas as pd
#加载数据train.csv
df = pd.read_csv(r"C:\Users\Administrator\Desktop\hands-on-data-analysis-master\hands-on-data-analysis-master\第二章项目集合\train.csv")
df.head()
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS

2 第二章:数据清洗及特征处理

我们拿到的数据通常是不干净的,所谓的不干净,就是数据中有缺失值,有一些异常点等,需要经过一定的处理才能继续做后面的分析或建模,所以拿到数据的第一步是进行数据清洗,本章我们将学习缺失值、重复值、字符串和数据转换等操作,将数据清洗成可以分析或建模的亚子。

2.1 缺失值观察与处理

我们拿到的数据经常会有很多缺失值,比如我们可以看到Cabin列存在NaN,那其他列还有没有缺失值,这些缺失值要怎么处理呢

2.1.1 任务一:缺失值观察

(1) 请查看每个特征缺失值个数
(2) 请查看Age, Cabin, Embarked列的数据
以上方式都有多种方式,所以大家多多益善

#写入代码
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
#写入代码
df[df["Age"] == None] = 0
# df.head()
df

PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
.......................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
88888903Johnston, Miss. Catherine Helen "Carrie"femaleNaN12W./C. 660723.4500NaNS
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ

891 rows × 12 columns

#写入代码
df[df["Age"].isnull()] = 0
df
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
.......................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
888000000.00000.000000
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ

891 rows × 12 columns

df[df["Age"] == np.nan] = 0
df
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
.......................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
888000000.00000.000000
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ

891 rows × 12 columns

2.1.2 任务二:对缺失值进行处理

(1)处理缺失值一般有几种思路

(2) 请尝试对Age列的数据的缺失值进行处理

(3) 请尝试使用不同的方法直接对整张表的缺失值进行处理

#处理缺失值的一般思路:
#提醒:可使用的函数有--->dropna函数与fillna函数
df["Age"].dropna(inplace = True)#将 DataFrame 与同一变量中的有效条目保持一致
df

PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
.......................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
888000000.00000.000000
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ

891 rows × 12 columns

#写入代码
df["Age"].fillna(0)#用 0 替换所有 NaN 元素。


0      22.0
1      38.0
2      26.0
3      35.0
4      35.0
       ... 
886    27.0
887    19.0
888     0.0
889    26.0
890    32.0
Name: Age, Length: 891, dtype: float64
#写入代码
df.fillna(0)


PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.25000S
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.92500S
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.05000S
.......................................
88688702Montvila, Rev. Juozasmale27.00021153613.00000S
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
888000000.00000.000000
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.75000Q

891 rows × 12 columns

#写入代码
df.dropna()

PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
5000000.00000.000000
6701McCarthy, Mr. Timothy Jmale54.0001746351.8625E46S
101113Sandstrom, Miss. Marguerite Rutfemale4.011PP 954916.7000G6S
.......................................
878000000.00000.000000
87988011Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)female56.0011176783.1583C50C
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
888000000.00000.000000
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C

360 rows × 12 columns

【思考1】dropna和fillna有哪些参数,分别如何使用呢?

【思考】检索空缺值用np.nan,None以及.isnull()哪个更好,这是为什么?如果其中某个方式无法找到缺失值,原因又是为什么?

#思考回答



【参考】https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html

【参考】https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html

2.2 重复值观察与处理

由于这样那样的原因,数据中会不会存在重复值呢,如果存在要怎样处理呢

2.2.1 任务一:请查看数据中的重复值

#写入代码
df[df.duplicated()]#查看重复值
df

PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
.......................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
88888903Johnston, Miss. Catherine Helen "Carrie"femaleNaN12W./C. 660723.4500NaNS
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ

891 rows × 12 columns

2.2.2 任务二:对重复值进行处理

(1)重复值有哪些处理方式呢?

(2)处理我们数据的重复值

方法多多益善

#重复值有哪些处理方式:
df = df.drop_duplicates()#删除重复项
df

PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
.......................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
88888903Johnston, Miss. Catherine Helen "Carrie"femaleNaN12W./C. 660723.4500NaNS
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ

891 rows × 12 columns

2.2.3 任务三:将前面清洗的数据保存为csv格式

#写入代码
df.to_csv("train_clear.csv")


2.3 特征观察与处理

我们对特征进行一下观察,可以把特征大概分为两大类:
数值型特征:Survived ,Pclass, Age ,SibSp, Parch, Fare,其中Survived, Pclass为离散型数值特征,Age,SibSp, Parch, Fare为连续型数值特征
文本型特征:Name, Sex, Cabin,Embarked, Ticket,其中Sex, Cabin, Embarked, Ticket为类别型文本特征,数值型特征一般可以直接用于模型的训练,但有时候为了模型的稳定性及鲁棒性会对连续变量进行离散化。文本型特征往往需要转换成数值型特征才能用于建模分析。

2.3.1 任务一:对年龄进行分箱(离散化)处理

(1) 分箱操作是什么?

(2) 将连续变量Age平均分箱成5个年龄段,并分别用类别变量12345表示

(3) 将连续变量Age划分为[0,5) [5,15) [15,30) [30,50) [50,80)五个年龄段,并分别用类别变量12345表示

(4) 将连续变量Age按10% 30% 50% 70% 90%五个年龄段,并用分类变量12345表示

(5) 将上面的获得的数据分别进行保存,保存为csv格式

#分箱操作是什么:



#写入代码 
#将连续变量Age平均分箱成5个年龄段,并分别用类别变量12345表示
df["Ageband"] = pd.cut(df["Age"],5,labels=[1,2,3,4,5])

df.head()
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedAgeband
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS2
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C3
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS2
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S3
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS3
#写入代码
df.to_csv("train_avg.csv")

#写入代码
#将连续变量Age划分为(0,5] (5,15] (15,30] (30,50] (50,80]五个年龄段,并分别用类别变量12345表示
df["AgeBand"] = pd.cut(df["Age"],[0,5,15,30,50,80],labels=[1,2,3,4,5])
df
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedAgeBand
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS3
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C4
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS3
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S4
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS4
..........................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS3
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S3
88888903Johnston, Miss. Catherine Helen "Carrie"femaleNaN12W./C. 660723.4500NaNSNaN
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C3
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ4

891 rows × 13 columns

df["Ageaand"] = pd.cut(df["Age"],5,labels = [1,2,3,4,5])
df
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedAgeBandAgeaand
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS32
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C43
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS32
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S43
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS43
.............................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS32
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S32
88888903Johnston, Miss. Catherine Helen "Carrie"femaleNaN12W./C. 660723.4500NaNSNaNNaN
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C32
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ42

891 rows × 14 columns

df['Farebands'] = pd.cut(df['Fare'],5,labels = [1,2,3,4,5])
df
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedAgeBandAgeaandFarebands
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS321
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C431
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS321
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S431
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS431
................................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS321
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S321
88888903Johnston, Miss. Catherine Helen "Carrie"femaleNaN12W./C. 660723.4500NaNSNaNNaN1
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C321
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ421

891 rows × 15 columns

【参考】https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html

【参考】https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html

2.3.2 任务二:对文本变量进行转换

(1) 查看文本变量名及种类
(2) 将文本变量Sex, Cabin ,Embarked用数值变量12345表示
(3) 将文本变量Sex, Cabin, Embarked用one-hot编码表示

#写入代码
#(1) 查看文本变量名及种类
df['Sex'].value_counts()


male      577
female    314
Name: Sex, dtype: int64
#写入代码

df['Cabin'].value_counts()

B96 B98        4
C23 C25 C27    4
G6             4
C22 C26        3
E101           3
              ..
E12            1
F G63          1
D49            1
C87            1
A16            1
Name: Cabin, Length: 147, dtype: int64
#写入代码
df["Embarked"].value_counts()


S    644
C    168
Q     77
Name: Embarked, dtype: int64
df["Sex"].unique() #去重 并查看去重后元素种类
array(['male', 'female'], dtype=object)
df['Sex'].nunique()#去重 显示去重后元素个数
2
#(2) 将文本变量Sex, Cabin ,Embarked用数值变量12345表示
#replace 旧字符串赋值为新字符串
df['Sex_num'] = df["Sex"].replace(["male","female"],[1,2])
df
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedAgeBandAgeaandFarebandsSex_num
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS3211
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C4312
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS3212
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S4312
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS4311
...................................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS3211
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S3212
88888903Johnston, Miss. Catherine Helen "Carrie"femaleNaN12W./C. 660723.4500NaNSNaNNaN12
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C3211
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ4211

891 rows × 16 columns

#map 指定旧字符串 赋值给指定新字符串,用字典形式传入  注意  map()
df["Sex_nums"] = df['Sex'].map({"male":1,"female":2})
df
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedAgeBandAgeaandFarebandsSex_numSex_nums
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS32111
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C43122
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS32122
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S43122
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS43111
......................................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS32111
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S32122
88888903Johnston, Miss. Catherine Helen "Carrie"femaleNaN12W./C. 660723.4500NaNSNaNNaN122
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C32111
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ42111

891 rows × 17 columns

#将类别文本转换为one-hot编码

#方法一: OneHotEncoder
for feat in ["Age", "Embarked"]:
#     x = pd.get_dummies(df["Age"] // 6)
#     x = pd.get_dummies(pd.cut(df['Age'],5))
    x = pd.get_dummies(df[feat], prefix=feat)
    df = pd.concat([df, x], axis=1)
    #df[feat] = pd.get_dummies(df[feat], prefix=feat)

2.3.3 任务三:从纯文本Name特征里提取出Titles的特征(所谓的Titles就是Mr,Miss,Mrs等)

#extract 对于数据进行提取
# s = pd.Series(['a1', 'b2', 'c3'])
# s.str.extract(r'([ab])(\d)')
#写入代码

df["title"] = df.Name.str.extract("([A-Za-z]+)\.",expand = False)
df
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarkedAgeBandAgeaandFarebandsSex_numSex_numstitle
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS32111Mr
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C43122Mrs
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS32122Miss
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S43122Mrs
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS43111Mr
.........................................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS32111Rev
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S32122Miss
88888903Johnston, Miss. Catherine Helen "Carrie"femaleNaN12W./C. 660723.4500NaNSNaNNaN122Miss
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C32111Mr
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ42111Mr

891 rows × 18 columns

#保存最终你完成的已经清理好的数据
df.to_csv('train_deep.csv')
  Python知识库 最新文章
Python中String模块
【Python】 14-CVS文件操作
python的panda库读写文件
使用Nordic的nrf52840实现蓝牙DFU过程
【Python学习记录】numpy数组用法整理
Python学习笔记
python字符串和列表
python如何从txt文件中解析出有效的数据
Python编程从入门到实践自学/3.1-3.2
python变量
上一篇文章      下一篇文章      查看所有文章
加:2021-07-15 16:07:41  更:2021-07-15 16:10:07 
 
开发: C++知识库 Java知识库 JavaScript Python PHP知识库 人工智能 区块链 大数据 移动开发 嵌入式 开发工具 数据结构与算法 开发测试 游戏开发 网络协议 系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑 笔记本 显卡 显示器 固态硬盘 硬盘 耳机 手机 iphone vivo oppo 小米 华为 单反 装机 图拉丁

360图书馆 购物 三丰科技 阅读网 日历 万年历 2024年12日历 -2024/12/25 14:54:20-

图片自动播放器
↓图片自动播放器↓
TxT小说阅读器
↓语音阅读,小说下载,古典文学↓
一键清除垃圾
↓轻轻一点,清除系统垃圾↓
图片批量下载器
↓批量下载图片,美女图库↓
  网站联系: qq:121756557 email:121756557@qq.com  IT数码
数据统计