1 Quick notes before we start

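A Series is one-dimensional, while a DataFrame is two-dimensional.
A DataFrame has both a row index and a column index:
The row index labels the rows: index, axis 0, axis=0
The column index labels the columns: columns, axis 1, axis=1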
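A small sketch of these ideas, using made-up sample values: it checks the dimensionality of a Series and a DataFrame, shows where the row and column labels live, and shows how axis=0 and axis=1 change the direction of an aggregation such as sum().

```python
import pandas as pd

# A Series is one-dimensional: a single labelled sequence of values
s = pd.Series([1, 2, 3], name="a")
print(s.ndim)  # 1

# A DataFrame is two-dimensional, with a row index and a column index
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
print(df.ndim)              # 2
print(df.index.tolist())    # row labels (axis 0): [0, 1, 2]
print(df.columns.tolist())  # column labels (axis 1): ['a', 'b']

# axis=0 aggregates down the rows, giving one result per column
print(df.sum(axis=0).tolist())  # [6, 60]
# axis=1 aggregates across the columns, giving one result per row
print(df.sum(axis=1).tolist())  # [11, 22, 33]
```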

2 read_csv() parameters

What pandas.read_csv does: it reads a CSV text file into a DataFrame.

pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, skip_footer=0, doublequote=True, delim_whitespace=False, as_recarray=None, compact_ints=None, use_unsigned=None, low_memory=True, buffer_lines=None, memory_map=False, float_precision=None)

This signature comes from an older pandas release. Several of these parameters (for example squeeze, as_recarray, skip_footer and error_bad_lines) have since been deprecated or removed, so check the documentation for the version you have installed.

The most commonly used parameters are described below.

filepath_or_buffer: a string holding the file path; a URL or an open file-like object is also accepted.

import pandas as pd

filepath = "heart.csv"
trainsets = pd.read_csv(filepath)
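print(trainsets.head())  # print the first few rows (5 by default)
print(trainsets.info())  # print a summary of the table; info() prints it itself and returns None, hence the trailing "None" below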

Output:

   age  sex  cp  trestbps  chol  fbs  ...  exang  oldpeak  slope  ca  thal  target
0   63    1   3       145   233    1  ...      0      2.3      0   0     1       1
1   37    1   2       130   250    0  ...      0      3.5      0   0     2       1
2   41    0   1       130   204    0  ...      0      1.4      2   0     2       1
3   56    1   1       120   236    0  ...      0      0.8      2   0     2       1
4   57    0   0       120   354    0  ...      1      0.6      2   0     2       1

[5 rows x 14 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       303 non-null    int64  
 1   sex       303 non-null    int64  
 2   cp        303 non-null    int64  
 3   trestbps  303 non-null    int64  
 4   chol      303 non-null    int64  
 5   fbs       303 non-null    int64  
 6   restecg   303 non-null    int64  
 7   thalach   303 non-null    int64  
 8   exang     303 non-null    int64  
 9   oldpeak   303 non-null    float64
 10  slope     303 non-null    int64  
 11  ca        303 non-null    int64  
 12  thal      303 non-null    int64  
 13  target    303 non-null    int64  
dtypes: float64(1), int64(13)
memory usage: 33.3 KB
None
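If heart.csv is not at hand, the same call can be tried on an in-memory buffer, since filepath_or_buffer also accepts file-like objects. This is a minimal sketch: io.StringIO stands in for a file, and the three columns and rows are made-up sample values, not the real heart.csv data.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk
csv_text = "age,sex,target\n63,1,1\n37,1,1\n41,0,1\n"

# read_csv accepts any file-like object, not only a path string
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)            # (3, 3)
print(df["age"].tolist())  # [63, 37, 41]
```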

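sep and delimiter: the character (or string) that separates the fields on each line; the default is ',', and delimiter is an alias for sep.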

pd.read_csv(file, sep=',')
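To see why sep matters, here is a small sketch on semicolon-separated sample text (made-up data, read through io.StringIO rather than a real file). With the default sep=',' each line stays a single field; passing sep=';' (or the alias delimiter=';') splits it correctly.

```python
import io
import pandas as pd

# Semicolon-separated sample text, as some spreadsheet exports produce
text = "name;score\nalice;90\nbob;85\n"

# With the default sep=',' nothing is split: one column per line
wrong = pd.read_csv(io.StringIO(text))
print(wrong.shape)  # (2, 1)

# Passing the right separator splits the fields correctly
right = pd.read_csv(io.StringIO(text), sep=";")
print(right.columns.tolist())  # ['name', 'score']

# delimiter is an alias for sep and gives the same result
same = pd.read_csv(io.StringIO(text), delimiter=";")
print(same.equals(right))  # True
```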

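header: the row that holds the column names (it is not part of the data).
Usage:
① If header is not set (the default header='infer'), the first row is used as the header, unless column names are passed via names;
② If the data has a header row, use header=0 (the same as the default, so it can be left out);
③ If the data has no header row, you must set header=None; otherwise the first data row will be taken as the header!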
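Case ③ is easy to get wrong, so here is a short sketch on headerless sample data (made-up values read through io.StringIO). With the default, the first data row silently becomes the column names; header=None keeps it as data, and names can then supply labels.

```python
import io
import pandas as pd

# Sample data with no header row
text = "63,1,0\n37,1,1\n"

# Default header='infer' consumes the first data row as column names
bad = pd.read_csv(io.StringIO(text))
print(len(bad))  # 1, one row of data has been lost to the header

# header=None keeps every row as data and numbers the columns 0, 1, 2
good = pd.read_csv(io.StringIO(text), header=None)
print(good.columns.tolist())  # [0, 1, 2]
print(len(good))              # 2

# names supplies column labels for headerless data
named = pd.read_csv(io.StringIO(text), header=None, names=["age", "sex", "target"])
print(named.columns.tolist())  # ['age', 'sex', 'target']
```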
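index_col: the column to use as the row labels (row index); the default is None, in which case pandas numbers the rows 0, 1, 2, ...
① If the first column of the data (column A in a spreadsheet) does not hold row labels, leave it unset;
② If the first column does hold row labels, set index_col=0; that column then becomes the index and is no longer counted among the data columns.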
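A minimal sketch of case ②, using made-up sample data whose first column (id) holds row labels: without index_col that column is ordinary data, while index_col=0 moves it into the row index so rows can be looked up by label.

```python
import io
import pandas as pd

# Sample data whose first column holds row labels
text = "id,age,target\np1,63,1\np2,37,1\n"

# Without index_col, 'id' is an ordinary column and rows are numbered 0, 1, ...
plain = pd.read_csv(io.StringIO(text))
print(plain.columns.tolist())  # ['id', 'age', 'target']

# index_col=0 turns the first column into the row index
indexed = pd.read_csv(io.StringIO(text), index_col=0)
print(indexed.columns.tolist())  # ['age', 'target']
print(indexed.index.tolist())    # ['p1', 'p2']
print(indexed.loc["p2", "age"])  # 37
```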