PyTorch Load Dataset: Multi-Worker Data Loading
Loading with a single worker
Taking the AG News dataset as an example, with num_workers=1 the loading times are:
Load Test Data Spends 12.183895587921143 seconds Load Test Data Spends 200.42685055732727 seconds
With DataLoader(dataset, num_workers=2, collate_fn=collate_fn):
Load Test Data Spends 11.577017307281494 seconds Load Train Data Spends 199.58622908592224 seconds
With DataLoader(dataset, num_workers=4, collate_fn=collate_fn):
Load Test Data Spends 11.68491816520691 seconds Load Train Data Spends 183.27479600906372 seconds
With DataLoader(dataset, num_workers=8, collate_fn=collate_fn):
Load Test Data Spends 11.205335140228271 seconds Load Train Data Spends 183.1354115009308 seconds
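The timings above could be collected with a loop along the following lines. This is only a minimal sketch: `ToyTextDataset`, its sizes, and the body of `collate_fn` are hypothetical stand-ins (the original AG News dataset class and collate function are not shown in this note), so absolute numbers will differ from the ones reported here.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class ToyTextDataset(Dataset):
    """Hypothetical stand-in for AG News: random token ids with class labels."""

    def __init__(self, size=2000, seq_len=64, vocab_size=1000, num_classes=4):
        gen = torch.Generator().manual_seed(0)
        self.tokens = torch.randint(0, vocab_size, (size, seq_len), generator=gen)
        self.labels = torch.randint(0, num_classes, (size,), generator=gen)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.tokens[idx], self.labels[idx]


def collate_fn(batch):
    # Assumed collate logic: stack per-sample tensors into batch tensors.
    tokens, labels = zip(*batch)
    return torch.stack(tokens), torch.stack(labels)


def time_loading(dataset, num_workers, batch_size=64):
    """Return (elapsed seconds, samples seen) for one full pass over the dataset."""
    loader = DataLoader(dataset, batch_size=batch_size,
                        num_workers=num_workers, collate_fn=collate_fn)
    start = time.time()
    seen = 0
    for tokens, labels in loader:
        seen += labels.size(0)
    return time.time() - start, seen


if __name__ == "__main__":
    dataset = ToyTextDataset()
    for workers in (1, 2, 4, 8):
        seconds, seen = time_loading(dataset, workers)
        print(f"num_workers={workers}: Load Data Spends {seconds} seconds")
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes are started with `spawn`, since each worker re-imports the script.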
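At this point the bottleneck is the GPU, that is, the pipeline that moves the matrices loaded by the CPU onto the GPU (CPU → GPU). The processing time of this pipeline is fixed, so adding more workers no longer helps.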
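One way to check where the time goes is to measure separately how long the loop waits for the next batch and how long the copy to the device takes. The sketch below is an illustration under assumptions, not the measurement used in this note: `split_timing` is a hypothetical helper, the data is random, and the code falls back to the CPU when no GPU is available.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def split_timing(loader, device):
    """Return (seconds waiting for batches, seconds copying them to device)."""
    wait, copy = 0.0, 0.0
    t0 = time.perf_counter()
    for (batch,) in loader:
        t1 = time.perf_counter()
        wait += t1 - t0          # time spent waiting on the DataLoader
        batch = batch.to(device)
        if device.type == "cuda":
            torch.cuda.synchronize()  # wait for the copy before reading the clock
        t0 = time.perf_counter()
        copy += t0 - t1          # time spent on the CPU -> device transfer
    return wait, copy


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    data = torch.randn(4096, 128)  # random placeholder data
    loader = DataLoader(TensorDataset(data), batch_size=256, num_workers=2)
    wait, copy = split_timing(loader, device)
    print(f"waiting for data: {wait:.4f} s, copying to {device}: {copy:.4f} s")
```

If the waiting time is already small compared with the copy time, raising `num_workers` further is unlikely to shorten the total.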
Multiple GPUs: