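This was my first attempt at connecting to a ClickHouse database from Python, and I hit quite a few pitfalls along the way. I'm writing them down here so that others can avoid the same mistakes!
Environment: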
- python 3.8.3
- clickhouse_driver==0.2.3
- clickhouse_sqlalchemy==0.2.0
- sqlalchemy==1.4.32
Two ways to connect with clickhouse_driver
1. Client
Following an approach found online:
from clickhouse_driver import Client
client = Client(host=host, port=8123, database=database, user=user, password=pw)
sql = 'SHOW TABLES'
res = client.execute(sql)
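Error: UnexpectedPacketFromServerError: Code: 102. Cause: wrong port. ClickHouse exposes an HTTP interface (default port 8123) and a native TCP interface (default port 9000). Python's clickhouse_driver speaks the native TCP protocol, so it needs port 9000. DBeaver connects over HTTP, which is why port 8123 works there.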
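To keep the two interfaces straight, the mapping described above can be written down as a small lookup. This is only an illustrative sketch: the names `DEFAULT_PORTS`, `CLIENT_INTERFACE`, and `default_port_for` are made up for this post, and the port numbers are ClickHouse defaults that a server administrator may have changed.

```python
# Default ports of the two ClickHouse interfaces mentioned above.
# A given server may be configured with different values.
DEFAULT_PORTS = {"http": 8123, "native": 9000}

# Which interface each client in this post talks to (as observed above).
CLIENT_INTERFACE = {
    "clickhouse_driver": "native",
    "DBeaver": "http",
}


def default_port_for(client: str) -> int:
    """Return the default port a given client should be pointed at."""
    return DEFAULT_PORTS[CLIENT_INTERFACE[client]]
```

For example, `default_port_for("clickhouse_driver")` gives the native port, not the HTTP port that DBeaver uses.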
After the change:
from clickhouse_driver import Client
client = Client(host=host, port=9000, database=database, user=user, password=pw)
sql = 'SHOW TABLES'
res = client.execute(sql)
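Error: SocketTimeoutError: Code: 209. Cause: according to the fix the driver's author posted on GitHub (link), this error also comes down to port 9000 not being set up? That left me confused, since I had just switched to that port. Presumably port 9000 on the server was not open or not reachable from my machine. So I gave up on Client and tried the other connection method.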
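One way to narrow down a timeout like this is to check whether the port accepts TCP connections at all before involving any driver. The following is a minimal sketch using only the standard library; `port_is_reachable` is a hypothetical helper name, and a successful connection only shows that something is listening, not that it is the ClickHouse native interface.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts and refusals)
        # when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example usage (host is a placeholder):
# print(port_is_reachable("xx.xxx.xx.xxx", 9000))
```

If this returns False for port 9000 but True for 8123, the native port is probably blocked by a firewall or not exposed by the server, and no change to the Python code will fix it.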
2. connect
from clickhouse_driver import connect
conn = connect(f'clickhouse://{user}:{pw}@{host}:9000/{database}')
cursor = conn.cursor()
cursor.execute('SHOW TABLES')
Same error again. In the end I gave up on clickhouse_driver and got things working with clickhouse_sqlalchemy and sqlalchemy instead.
clickhouse_sqlalchemy
Here is the code that connected successfully.
from clickhouse_sqlalchemy import make_session
from sqlalchemy import create_engine
import pandas as pd
conf = {
"user": "xxx",
"password": "xxx",
"server_host": "xx.xxx.xx.xxx",
"port": "8123",
"db": "xxx"
}
connection = 'clickhouse://{user}:{password}@{server_host}:{port}/{db}'.format(**conf)
engine = create_engine(connection, pool_size=100, pool_recycle=3600, pool_timeout=20)
sql = 'SHOW TABLES'
session = make_session(engine)
cursor = session.execute(sql)
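try:
    # Result.keys() is the public way to get the column names
    fields = cursor.keys()
    df = pd.DataFrame([dict(zip(fields, item)) for item in cursor.fetchall()])
finally:
    # Release the result and the session even if building the DataFrame fails
    cursor.close()
    session.close()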
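One more thing that can go wrong with the connection string above: if the user name or password contains characters such as `@`, `/` or `:`, the URL no longer parses the way you expect. A possible way around this, sketched here with the standard library's `quote_plus`, is to escape those parts before formatting. The function name `build_clickhouse_url` is made up for illustration and is not part of clickhouse_sqlalchemy.

```python
from urllib.parse import quote_plus


def build_clickhouse_url(user: str, password: str, host: str, port, db: str) -> str:
    """Build a clickhouse:// URL, escaping special characters in the credentials."""
    return "clickhouse://{}:{}@{}:{}/{}".format(
        quote_plus(user),      # escape characters such as '@' or ':'
        quote_plus(password),
        host,
        port,
        db,
    )
```

The result can be passed to `create_engine` in place of the `connection` string built with `.format(**conf)`.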