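Reading massive data with a Python generator
Suppose a large file consists of a single line whose fields are split by a special separator. Because the whole file is one line, both read and readline would try to load all of it into memory at once, which is impractical for a file of, say, 500 GB. It is worth wrapping chunked reading in a function.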
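def read_big_file_in_one_line(f, separator):
    """Generator that yields the separator-delimited fields of a file.

    :param f: file object opened for reading in text mode
    :param separator: separator string
    """
    buf = ""
    while True:
        # Yield every complete field already in the buffer.
        while separator in buf:
            pos = buf.index(separator)
            yield buf[:pos]
            buf = buf[pos + len(separator):]
        # Read the next 1024 characters; an empty result means end of file.
        chunk = f.read(1024)
        if not chunk:
            # Whatever is left in the buffer is the final field.
            yield buf
            break
        buf += chunk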
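Calling a generator function does not run its body right away. It returns a generator object, and each iteration step runs the body up to the next yield and hands back that value, so only one buffer's worth of data is in memory at a time.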
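To make this lazy behavior concrete, here is a minimal sketch. The generator traced_items and the log list are hypothetical names invented for this illustration; they only record when the generator body actually runs.

```python
def traced_items(log):
    """Toy generator that records in `log` when its body executes."""
    log.append("started")
    yield "a"
    log.append("resumed")
    yield "b"


log = []
gen = traced_items(log)    # creates the generator; the body has not run yet
log_before = list(log)     # still empty

first = next(gen)          # runs the body up to the first yield
log_after_first = list(log)

second = next(gen)         # resumes after the first yield, stops at the second
log_after_second = list(log)
```

Nothing is appended to log until the first next call, which is the same reason read_big_file_in_one_line reads from the file only as the for loop asks for the next field.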
Small sample data, demodata.txt:
jsjdljdjlf,dkjdljlksfjkds,dsddd,dsd
Reading example:
with open("demodata.txt") as fid:
    for line in read_big_file_in_one_line(fid, ","):
        print(line)
Result:
jsjdljdjlf
dkjdljlksfjkds
dsddd
dsd
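The same idea can be checked without a file on disk. The sketch below is a variant, not the function above: split_stream and its chunk_size parameter are hypothetical names, and io.StringIO stands in for a real file. A deliberately tiny chunk size shows that a multi-character separator is still found when it straddles two reads, because unconsumed text stays in the buffer.

```python
import io


def split_stream(f, separator, chunk_size=1024):
    """Yield separator-delimited pieces of f, reading chunk_size characters at a time."""
    buf = ""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            # End of input: the remaining buffer is the last piece.
            yield buf
            return
        buf += chunk
        # Emit every complete piece currently in the buffer.
        pos = buf.find(separator)
        while pos != -1:
            yield buf[:pos]
            buf = buf[pos + len(separator):]
            pos = buf.find(separator)


# With chunk_size=4 the second "||" is split across two reads ("eta|" and "|gam").
sample = io.StringIO("alpha||beta||gamma")
pieces = list(split_stream(sample, "||", chunk_size=4))
# pieces == ["alpha", "beta", "gamma"]
```

As in the original function, input that ends with the separator produces a final empty string, matching what str.split would return.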