Pitfalls I ran into when comparing data with Pandas DataFrames:
1. After a string table is converted into a DataFrame, the column dtype ends up as object
| cnt |
| 461 |
expect_table = request.context[entity]
logger.info(f"EXPECTED DATAFRAME: {expect_table}")
for field_count, field in enumerate(datatable.query_table.rows):  # pylint: disable=W0612
    actualdf = actualdf.append(field, ignore_index=True)
2021-09-13 17:48:49,841 - test.step_defs.test_datalake_steps - INFO - EXPECTED DATAFRAME: cnt
0 461
2021-09-13 17:48:49,862 - test.step_defs.test_datalake_steps - INFO - Actual DATAFRAME: cnt
0 461
Attribute "dtype" are different
E [left]: object
E [right]: int64
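The mismatch can be reproduced outside the test suite. The following is a minimal sketch, not the original test code: `actual_df`, `expected_df`, and the sample row are hypothetical stand-ins, and `dtype=object` is passed explicitly so the column matches the dtype reported in the error above.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical stand-in for the parsed text table: every cell is a string.
# dtype=object pins the column to the dtype reported in the error above.
actual_df = pd.DataFrame([{"cnt": "461"}], dtype=object)

# Hypothetical stand-in for the expected frame, built from a real integer.
expected_df = pd.DataFrame({"cnt": [461]})

# The two frames print the same way, but their dtypes differ.
print(actual_df.dtypes)
print(expected_df.dtypes)

try:
    assert_frame_equal(actual_df, expected_df)
    mismatch_message = None
except AssertionError as exc:
    # Keep the message so it can be inspected or logged.
    mismatch_message = str(exc)

print(mismatch_message)
```

Because both frames look identical in the log, printing `dtypes` next to the frames is a quick way to spot this kind of failure.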
2. Fix it with pandas' type-conversion function pd.to_numeric:
import pandas as pd
from pandas.testing import assert_frame_equal

datatable.query_table = parse_str_table(table_with_header)
expect_table = request.context[entity]
logger.info(f"EXPECTED DATAFRAME: {expect_table}")
# Build the actual frame row by row from the parsed string table.
actualdf = pd.DataFrame()
for field_count, field in enumerate(datatable.query_table.rows):  # pylint: disable=W0612
    actualdf = actualdf.append(field, ignore_index=True)
# Convert columns that hold numeric strings; leave the others as they are.
actualdf = actualdf.apply(pd.to_numeric, errors='ignore')
logger.info(f"Actual DATAFRAME: {actualdf}")
assert_frame_equal(actualdf, expect_table)
assert actualdf.equals(expect_table)
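One caveat for newer environments: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so the loop above only runs on older versions. Below is a rough sketch of the same fix without `append`. It assumes each row can be represented as a dict of strings; `rows`, `actual_df`, and `expected_df` are hypothetical names, not part of the original test code.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical stand-in for datatable.query_table.rows: one dict per row,
# with every value still a string.
rows = [{"cnt": "461"}]

# Build the frame in a single call instead of appending row by row.
actual_df = pd.DataFrame(rows, dtype=object)

# Convert the string column to a numeric dtype.
actual_df["cnt"] = pd.to_numeric(actual_df["cnt"])

# Hypothetical stand-in for the expected frame.
expected_df = pd.DataFrame({"cnt": [461]})

# Raises AssertionError if values or dtypes differ.
assert_frame_equal(actual_df, expected_df)
frames_match = actual_df.equals(expected_df)
```

Building the frame from a list of rows also avoids copying the whole frame on every iteration, which is what row-by-row appending did.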
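For details on how to use to_numeric, see:
pd.to_numeric - 晓尘 - 博客园
With errors='ignore', a column is converted only if its values parse as numbers; a column that fails to parse, such as one holding date/time strings, is returned unchanged.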
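As far as I know, `errors='ignore'` has been deprecated since pandas 2.2, so on recent versions it emits a warning. The sketch below imitates the same "convert if possible, otherwise leave alone" behaviour with an explicit try/except. The helper `to_numeric_if_possible` and the sample columns are hypothetical and only meant to illustrate the idea.

```python
import pandas as pd


def to_numeric_if_possible(column: pd.Series) -> pd.Series:
    """Return the column as numeric if every value parses, else unchanged."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column


# Hypothetical sample: all cells are strings, as after parsing a text table.
raw = pd.DataFrame(
    {
        "cnt": ["461", "12"],
        "name": ["a", "b"],
        "day": ["2021-09-13", "2021-09-14"],
    },
    dtype=object,
)

# Apply the helper column by column.
converted = raw.apply(to_numeric_if_possible)

print(converted.dtypes)
```

Only the `cnt` column becomes numeric here; the text and date columns keep their original string values, so date strings still need a separate `pd.to_datetime` call if the expected frame holds real timestamps.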