1. Parsing JSON without nesting
2. Parsing JSON with nesting
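Before the full pipeline in section 3, here is a minimal sketch of both cases using made-up sample records (not data from the query below). It assumes pandas >= 1.0, where json_normalize is exposed as pd.json_normalize; on older versions it is imported from pandas.io.json instead.

import pandas as pd

# Case 1: no nesting -- every key maps straight to a column.
flat_records = [
    {"apply_id": "A001", "score": 0.87, "city": "Beijing"},
    {"apply_id": "A002", "score": 0.45, "city": "Shanghai"},
]
print(pd.json_normalize(flat_records))
#   apply_id  score      city
# 0     A001   0.87   Beijing
# 1     A002   0.45  Shanghai

# Case 2: nesting -- nested dicts become dotted columns (user.name, user.age),
# and nested lists are expanded to one row per element via record_path= / meta=.
nested_records = [{
    "apply_id": "A001",
    "user": {"name": "Tom", "age": 30},
    "loans": [{"amt": 100}, {"amt": 200}],
}]
print(pd.json_normalize(nested_records))   # user.name / user.age columns; 'loans' stays a raw list column
print(pd.json_normalize(nested_records, record_path="loans", meta=["apply_id"]))
#    amt apply_id
# 0  100     A001
# 1  200     A001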
Background: JSON is a very widely used format for transmitting and exchanging data; it appears in databases and in API response payloads. It is compact and easy for machines to read, but not convenient for people to read or analyse further, so JSON data usually has to be converted into tabular form after it is fetched. The two common JSON layouts both store data as key-value pairs and differ only in how the data is wrapped (a JSON object, for example, wraps its key-value pairs in {}).

References:
- "The Pandas function for parsing JSON data that you must know: json_normalize()" (Roach007, CSDN): https://blog.csdn.net/Roach007/article/details/119529772
- pandas.io.json.json_normalize, pandas 0.17.0 documentation: https://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.io.json.json_normalize.html
- "json.dumps() and json.loads()" (hjianhui, cnblogs): https://www.cnblogs.com/hjianhui/p/10387057.html
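As a quick reminder of the two standard-library functions covered in the last reference (nothing here is specific to this pipeline, it is just the plain json module):

import json

record = {"apply_id": "A001", "score": 0.87, "flag": True, "note": None}

# json.dumps(): Python object -> JSON string (True/None are written as true/null)
s = json.dumps(record, sort_keys=True, indent=4, separators=(',', ': '))
print(s)

# json.loads(): JSON string -> Python object (here a dict again)
back = json.loads(s)
print(back["apply_id"], type(back))   # A001 <class 'dict'>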
3. Example: parsing JSON without nesting
import pandas as pd
from presto_cli import presto_client
import numpy as np
import datetime
# deprecated in pandas >= 1.0; use pd.json_normalize there instead
from pandas.io.json import json_normalize
import time
import json
import joblib
vsql ="""
SELECT DISTINCT apply_id
,name
,var_data
FROM
xxx
WHERE dt >= '2021-10-26' and xxx in ('xxx')
"""
class connectHiv(object):
    """Thin wrapper around a Presto connection to Hive."""
    def __init__(self, sql, path="xxx", port=xxx, username="xxx", source="xxx"):
        self.path = path
        self.port = port
        self.username = username
        self.source = source
        self.sql = sql
        self.getCursor()

    def getCursor(self):
        # open a cursor on the Presto/Hive connection
        self.CURSOR = presto_client.connect(self.path, port=self.port, username=self.username,
                                            group="xxx", password='xxx', catalog="hive", schema="xxx",
                                            ).cursor()

    def querySQL(self):
        # run the query and return all rows
        self.CURSOR.execute(self.sql)
        result = self.CURSOR.fetchall()
        return result

    def getSql(self):
        print(self.sql)
# get a database connection and run the query
cursor = connectHiv(sql=vsql)
result = cursor.querySQL()

# store the result as a DataFrame; column names follow the SELECT list
df11111 = pd.DataFrame(result)
df11111.columns = ['apply_id', 'name', 'var_data']
# drop any header rows that came back as data
df_to_use = df11111[df11111['apply_id'] != 'apply_id']
df_to_use = df_to_use.reset_index(drop=True)

# only needed if eval() were used on the JSON strings; json.loads() already
# handles true/false/null, so these placeholders are just a safety net
global false, null, true
false = null = true = ''

all_df = None
order_id_list = df_to_use['apply_id'].values.tolist()  # loop over apply_id (or order_id)
for ai in range(len(order_id_list)):
    try:
        # parse this row's JSON string; wrap it in a list for json_normalize
        data_dump = [json.loads(df_to_use['var_data'][ai])]
        # jsonformat = json.dumps(data_dump, sort_keys=True, indent=4, separators=(',', ': '))
        df1 = json_normalize(data_dump)   # one flat row with the JSON fields as columns
        dforder = pd.DataFrame()
        dforder['apply_id'] = [df_to_use['apply_id'][ai]]
        # dforder['business_code'] = [df_to_use['business_code'][ai]]
        # attach the apply_id to the flattened row (both frames have index 0)
        df = pd.merge(dforder, df1, left_index=True, right_index=True)
        if all_df is None:
            all_df = df
        else:
            all_df = pd.concat([all_df, df])
        print(ai)   # progress
    except Exception:
        # JSON parsing failed: print the apply_id and append it to the failure log
        print(df_to_use['apply_id'][ai])
        f = "/失败订单记录.txt"   # failed-order log
        false_order = str(df_to_use['apply_id'][ai])
        with open(f, "a") as file:
            file.write(false_order + "\n")
# drop duplicate flattened rows and join them back onto the original query result
all_df = all_df.drop_duplicates()
df_to_use_dup = df_to_use.drop_duplicates('apply_id')
final_df = pd.merge(df_to_use_dup, all_df, how='left', on='apply_id')
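A note on the design: the row-by-row loop is easy to debug and lets you log each failing apply_id, but the same flattening can also be done column-wise. The sketch below is an assumed alternative, not the original pipeline: the names safe_loads and final_df_alt are my own, it assumes pandas >= 1.0, that df_to_use has been built as above, and that each var_data cell holds a JSON object; rows whose var_data cannot be parsed simply end up with NaN in the JSON columns instead of being written to the failure log.

import json
import pandas as pd

def safe_loads(s):
    # parse one var_data cell; return None instead of raising on bad JSON
    try:
        return json.loads(s)
    except (TypeError, ValueError):
        return None

parsed = df_to_use['var_data'].map(safe_loads)
ok = parsed.notna()

flat = pd.json_normalize(parsed[ok].tolist())   # list of dicts -> one flat DataFrame
flat.index = parsed[ok].index                   # keep row alignment with df_to_use

# same shape of result as final_df: the original columns plus the flattened JSON fields
final_df_alt = df_to_use.join(flat, rsuffix='_json')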