IT数码 购物 网址 头条 软件 日历 阅读 图书馆
TxT小说阅读器
↓语音阅读,小说下载,古典文学↓
图片批量下载器
↓批量下载图片,美女图库↓
图片自动播放器
↓图片自动播放器↓
一键清除垃圾
↓轻轻一点,清除系统垃圾↓
开发: C++知识库 Java知识库 JavaScript Python PHP知识库 人工智能 区块链 大数据 移动开发 嵌入式 开发工具 数据结构与算法 开发测试 游戏开发 网络协议 系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑 笔记本 显卡 显示器 固态硬盘 硬盘 耳机 手机 iphone vivo oppo 小米 华为 单反 装机 图拉丁
 
   -> 大数据 -> DGL分布式流程 -> 正文阅读

[大数据]DGL分布式流程

官网document

dgl distributed training user guide

interacting processes

dgl分布式中主要有server、sampler、trainer
在这里插入图片描述

  • Server processes run on each machine that stores a graph partition (this includes the graph structure and node/edge features). These servers work together to serve the graph data to trainers. Note that one machine may run multiple server processes simultaneously to parallelize computation as well as network communication.也就是说server不仅管数据,也管通信

  • Sampler processes interact with the servers and sample nodes and edges to generate mini-batches for training.

  • Trainers contain multiple classes to interact with servers. It has DistGraph to get access to partitioned graph data and has DistEmbedding and DistTensor to access the node/edge features/embeddings. It has DistDataLoader to interact with samplers to get mini-batches.
    注意DistEmbedding、DisGraph、DisTensor、DisDataLoader等这几个分布式API

API

initialize

this API builds connections with DGL servers and creates sampler processes

DisGraph

Each machine is responsible for one and only one partition. It loads the partition data (the graph structure and the node data and edge data in the partition) and makes it accessible to all trainers in the cluster.
注意这里DisGraph即有单机版也有分布式版
单机版:测试开发,可以测试下单机版DisGraph
分布式版:DistGraph connects with the servers in the cluster of machines and access them through the network. 说明server之间还是通信来传输图信息的啊

DisTensor

Currently, DGL does not provide protection for concurrent writes from multiple trainers when a machine runs multiple servers. This may result in data corruption. One way to avoid concurrent writes to the same row of data is to run one server process on a machine.

怎样在一台机器上运行一个server process?

DisEmbedding

Internally, distributed embeddings are built on top of distributed tensors, and, thus, has very similar behaviors to distributed tensors. For example, when embeddings are created, they are sharded and stored across all machines in the cluster. It can be uniquely identified by a name.

embedding所有机器共享的话,如果实现通信?Distensor是不是也是共享?

DisSampling

有两种level,但不论哪种level

low-level

需要自己写代码定义如何sample
dgl.sampling.sample_neighbors()
For the lower-level sampling API, it provides sample_neighbors() for distributed neighborhood sampling on DistGraph.

所以DisGraph是整张图而DisSampling是采样为minibatch

high-level

经典算法NodeDataLoader和EdgeDataLoader

异构

关于异构图的描述
Below is an example adjancency matrix of a heterogeneous graph showing the homogeneous ID assignment. Here, the graph has two types of nodes (T0 and T1 ), and four types of edges (R0, R1, R2, R3 ). There are a total of 400 nodes in the graph and each type has 200 nodes. Nodes of T0 have IDs in [0,200), while nodes of T1 have IDs in [200, 400). In this example, if we use a tuple to identify the nodes, nodes of T0 are identified as (T0, type-wise ID), where type-wise ID falls in [0, 200); nodes of T1 are identified as (T1, type-wise ID), where type-wise ID also falls in [0, 200).
在这里插入图片描述

DGL分布式脚本文件

copy_files.py

复制切割图和training脚本到指定机器上(via ip_config)
但是需要每台机器之间ssh无密访问

  • 这种方式是不是不需要NFS?
    在使用copy_files.py的时候发现需要.npy文件,但是之前切割的图里面并没有.npy文件,于是在使用partition_graph的时候,将reshuffle改成false,发现.npy文件出来了,可见下面👇
    If reshuffle=False, node IDs and edge IDs of a partition do not fall into contiguous
    ID ranges. In this case, DGL stores node/edge mappings (from
    node/edge IDs to partition IDs) in separate files (node_map.npy and edge_map.npy).
    The node/edge mappings are stored in numpy files.

launch.py

具体请参考我的博客
默认端口号是30050,当出现端口占用的时候需要kill掉相关进程

  大数据 最新文章
实现Kafka至少消费一次
亚马逊云科技:还在苦于ETL?Zero ETL的时代
初探MapReduce
【SpringBoot框架篇】32.基于注解+redis实现
Elasticsearch:如何减少 Elasticsearch 集
Go redis操作
Redis面试题
专题五 Redis高并发场景
基于GBase8s和Calcite的多数据源查询
Redis——底层数据结构原理
上一篇文章      下一篇文章      查看所有文章
加:2021-12-03 13:06:19  更:2021-12-03 13:07:06 
 
开发: C++知识库 Java知识库 JavaScript Python PHP知识库 人工智能 区块链 大数据 移动开发 嵌入式 开发工具 数据结构与算法 开发测试 游戏开发 网络协议 系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑 笔记本 显卡 显示器 固态硬盘 硬盘 耳机 手机 iphone vivo oppo 小米 华为 单反 装机 图拉丁

360图书馆 购物 三丰科技 阅读网 日历 万年历 2025年1日历 -2025/1/17 14:09:08-

图片自动播放器
↓图片自动播放器↓
TxT小说阅读器
↓语音阅读,小说下载,古典文学↓
一键清除垃圾
↓轻轻一点,清除系统垃圾↓
图片批量下载器
↓批量下载图片,美女图库↓
  网站联系: qq:121756557 email:121756557@qq.com  IT数码