[Big Data] ZooKeeper Source Code Analysis: Viewing and Analyzing Snapshot Logs

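Preface:

The previous article analyzed the transaction log. ZooKeeper has another important kind of log: the snapshot log.

A snapshot log is essentially a point-in-time copy of all ZooKeeper node information, written from memory to disk.

1. Viewing the snapshot log

By default, snapshot logs are stored under the %ZOOKEEPER_DIR%/data/ folder. In my case, a snapshot file named snapshot.123c2 was generated in that directory.

Like the transaction log, this is a binary file and cannot be read directly. ZooKeeper provides a viewer class for it, org.apache.zookeeper.server.SnapshotFormatter: pass the path of the snapshot file as the argument to its main() method. Running it against my snapshot.123c2 file produced the following output: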

ZNode Details (count=74685):
----
/
  cZxid = 0x00000000000000
  ctime = Thu Jan 01 08:00:00 GMT+08:00 1970
  mZxid = 0x00000000000000
  mtime = Thu Jan 01 08:00:00 GMT+08:00 1970
  pZxid = 0x000000000123c2
  cversion = 74681
  dataVersion = 0
  aclVersion = 0
  ephemeralOwner = 0x00000000000000
  dataLength = 0
----
/hello24507
  cZxid = 0x00000000004c3b
  ctime = Tue Oct 05 17:15:30 GMT+08:00 2021
  mZxid = 0x00000000004c3b
  mtime = Tue Oct 05 17:15:30 GMT+08:00 2021
  pZxid = 0x00000000004c3b
  cversion = 0
  dataVersion = 0
  aclVersion = 0
  ephemeralOwner = 0x00000000000000
  dataLength = 10
----
/hello24508
  cZxid = 0x00000000004c3c
  ctime = Tue Oct 05 17:15:30 GMT+08:00 2021
  mZxid = 0x00000000004c3c
  mtime = Tue Oct 05 17:15:30 GMT+08:00 2021
  pZxid = 0x00000000004c3c
  cversion = 0
  dataVersion = 0
  aclVersion = 0
  ephemeralOwner = 0x00000000000000
  dataLength = 10
----
...

As shown above, each entry contains the basic metadata of one node (the node's data value itself is not displayed).
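
Since the file is binary, a quick sanity check before running the formatter is to read the file header. The following is a minimal sketch, assuming the ZooKeeper 3.4.x layout in which the header consists of a 4-byte magic number (the ASCII bytes "ZKSN"), a 4-byte version, and an 8-byte dbId, all big-endian. The class and method names (SnapshotHeaderPeek, describeHeader) are hypothetical and not part of ZooKeeper.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class SnapshotHeaderPeek {
    // Assumption: the magic number is the big-endian int built from the bytes "ZKSN".
    static final int SNAP_MAGIC =
            ByteBuffer.wrap("ZKSN".getBytes(StandardCharsets.US_ASCII)).getInt();

    /** Reads magic, version and dbId from the start of the stream and describes them. */
    static String describeHeader(InputStream raw) {
        try {
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();    // 4 bytes
            int version = in.readInt();  // 4 bytes
            long dbId = in.readLong();   // 8 bytes
            if (magic != SNAP_MAGIC) {
                throw new IllegalArgumentException(
                        "unexpected magic 0x" + Integer.toHexString(magic));
            }
            return "version=" + version + ", dbId=" + dbId;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SnapshotHeaderPeek <path to snapshot file>
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(describeHeader(in));
        }
    }
}
```

This only inspects the first 16 bytes; decoding the nodes themselves is what SnapshotFormatter is for.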

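2. Entry point for snapshot log generation

Where is the snapshot log generated? As noted in the earlier article on viewing and analyzing the transaction log, it is generated in SyncRequestProcessor. The code is as follows: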

public class SyncRequestProcessor extends ZooKeeperCriticalThread implements RequestProcessor {
 
    // Defaults to 100000; used below
    private static int snapCount = ZooKeeperServer.getSnapCount();
    
    // Assigned later, in run()
    private static int randRoll;
    
    public void run() {
        try {
            int logCount = 0;

            // Initialize randRoll to a random value in [0, snapCount/2)
            setRandRoll(r.nextInt(snapCount/2));
            while (true) {
                Request si = null;
                if (toFlush.isEmpty()) {
                    si = queuedRequests.take();
                } else {
                    si = queuedRequests.poll();
                    if (si == null) {
                        flush(toFlush);
                        continue;
                    }
                }
                if (si == requestOfDeath) {
                    break;
                }
                if (si != null) {
                    // track the number of records written to the log
                    if (zks.getZKDatabase().append(si)) {
                        logCount++;
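                        // After appending to the transaction log, check whether the number of appended
                        // transactions exceeds snapCount / 2 + randRoll. With the default snapCount of
                        // 100000, a snapshot is taken only after more than 50000 transactions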
                        if (logCount > (snapCount / 2 + randRoll)) {
                            setRandRoll(r.nextInt(snapCount/2));
                            // roll the log
                            zks.getZKDatabase().rollLog();
                            // take a snapshot
                            if (snapInProcess != null && snapInProcess.isAlive()) {
                                LOG.warn("Too busy to snap, skipping");
                            } else {
                                snapInProcess = new ZooKeeperThread("Snapshot Thread") {
                                        public void run() {
                                            try {
                                                // Runs in a dedicated child thread that generates the snapshot
                                                zks.takeSnapshot();
                                            } catch(Exception e) {
                                                LOG.warn("Unexpected exception", e);
                                            }
                                        }
                                    };
                                snapInProcess.start();
                            }
                            logCount = 0;
                        }
                    } else if (toFlush.isEmpty()) {
                        // optimization for read heavy workloads
                        // iff this is a read, and there are no pending
                        // flushes (writes), then just pass this to the next
                        // processor
                        if (nextProcessor != null) {
                            nextProcessor.processRequest(si);
                            if (nextProcessor instanceof Flushable) {
                                ((Flushable)nextProcessor).flush();
                            }
                        }
                        continue;
                    }
                    toFlush.add(si);
                    if (toFlush.size() > 1000) {
                        flush(toFlush);
                    }
                }
            }
        } catch (Throwable t) {
            handleException(this.getName(), t);
            running = false;
        }
        LOG.info("SyncRequestProcessor exited!");
    }
}

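As shown, SyncRequestProcessor is the entry point for snapshot generation, and it starts a separate thread to do the work. snapCount can be customized (for example through the zookeeper.snapCount system property); I lowered it while testing, because otherwise it is hard to observe a snapshot log being generated.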
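
To make the trigger condition concrete, here is a small sketch that isolates only the counting logic of run(): a snapshot fires once logCount exceeds snapCount / 2 + randRoll, and randRoll is re-drawn after every snapshot. The class SnapshotTrigger and its methods are hypothetical names for illustration; the sketch assumes snapCount is at least 2 and leaves out the queueing, flushing, and threading of the real processor.

```java
import java.util.Random;

final class SnapshotTrigger {
    private final int snapCount;   // assumed >= 2, otherwise nextInt(0) would throw
    private final Random random;
    private int randRoll;
    private int logCount;

    SnapshotTrigger(int snapCount, Random random) {
        this.snapCount = snapCount;
        this.random = random;
        this.randRoll = random.nextInt(snapCount / 2);
    }

    /** Records one appended transaction; returns true when a snapshot would be taken. */
    boolean onTxnAppended() {
        logCount++;
        if (logCount > snapCount / 2 + randRoll) {
            randRoll = random.nextInt(snapCount / 2); // re-draw for the next round
            logCount = 0;
            return true;
        }
        return false;
    }

    /** Counts how many transactions are appended until the next snapshot fires. */
    int txnsUntilNextSnapshot() {
        int n = 0;
        do {
            n++;
        } while (!onTxnAppended());
        return n;
    }

    public static void main(String[] args) {
        SnapshotTrigger trigger = new SnapshotTrigger(100000, new Random());
        for (int i = 0; i < 5; i++) {
            System.out.println("snapshot after " + trigger.txnsUntilNextSnapshot() + " txns");
        }
    }
}
```

With an even snapCount, every interval falls between snapCount / 2 + 1 and snapCount transactions. The random offset presumably exists so that the servers of an ensemble do not all snapshot at the same moment.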

3. ZooKeeperServer.takeSnapshot(): generating the snapshot log

public class ZooKeeperServer implements SessionExpirer, ServerStats.Provider {
    public void takeSnapshot() {
        try {
            // Delegates directly to FileTxnSnapLog.save(); see 3.1
            txnLogFactory.save(zkDb.getDataTree(), zkDb.getSessionWithTimeOuts());
        } catch (IOException e) {
            LOG.error("Severe unrecoverable error, exiting", e);
            // This is a severe error that we cannot recover from,
            // so we need to exit
            System.exit(10);
        }
    }
}

3.1 FileTxnSnapLog.save()

public class FileTxnSnapLog {
    public void save(DataTree dataTree,
            ConcurrentHashMap<Long, Integer> sessionsWithTimeouts)
        throws IOException {
        // The last processed zxid determines the snapshot file name
        long lastZxid = dataTree.lastProcessedZxid;
        File snapshotFile = new File(snapDir, Util.makeSnapshotName(lastZxid));
        LOG.info("Snapshotting: 0x{} to {}", Long.toHexString(lastZxid),
                snapshotFile);
        // Serialize the in-memory DataTree (and the session table) into that file
        snapLog.serialize(dataTree, sessionsWithTimeouts, snapshotFile);
    }
}
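
The file name therefore encodes the zxid, which is why the snapshot.123c2 file seen earlier matches the root node's pZxid of 0x123c2. Below is a minimal sketch of that naming rule and its inverse, assuming the name is the prefix "snapshot", a dot, and the zxid in lowercase hex. SnapshotNames and zxidFromName are hypothetical helpers written for this article, not ZooKeeper's Util class.

```java
final class SnapshotNames {
    static final String PREFIX = "snapshot";

    /** Builds a file name such as "snapshot.123c2" from a zxid. */
    static String makeSnapshotName(long zxid) {
        return PREFIX + "." + Long.toHexString(zxid);
    }

    /** Recovers the zxid from a snapshot file name; rejects anything else. */
    static long zxidFromName(String name) {
        String[] parts = name.split("\\.");
        if (parts.length != 2 || !parts[0].equals(PREFIX)) {
            throw new IllegalArgumentException("not a snapshot file name: " + name);
        }
        // parseUnsignedLong mirrors toHexString, which prints the unsigned value
        return Long.parseUnsignedLong(parts[1], 16);
    }

    public static void main(String[] args) {
        System.out.println(makeSnapshotName(0x123c2L));
        System.out.println(zxidFromName("snapshot.123c2"));
    }
}
```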

3.2 FileSnap.serialize(): the actual serialization

public class FileSnap implements SnapShot {
    public synchronized void serialize(DataTree dt, Map<Long, Integer> sessions, File snapShot)
            throws IOException {
        if (!close) {
            OutputStream sessOS = new BufferedOutputStream(new FileOutputStream(snapShot));
            CheckedOutputStream crcOut = new CheckedOutputStream(sessOS, new Adler32());
            //CheckedOutputStream cout = new CheckedOutputStream()
            OutputArchive oa = BinaryOutputArchive.getArchive(crcOut);
            // As with the transaction log, create the file header first
            FileHeader header = new FileHeader(SNAP_MAGIC, VERSION, dbId);
            // Write the header, then the sessions and the DataTree; see 3.2.1
            serialize(dt,sessions,oa, header);
            // Append the Adler-32 checksum of everything written so far
            long val = crcOut.getChecksum().getValue();
            oa.writeLong(val, "val");
            oa.writeString("/", "path");
            sessOS.flush();
            crcOut.close();
            sessOS.close();
        }
    }
}
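
The checksum handling above follows a common pattern: route all writes through a CheckedOutputStream, then append the accumulated checksum as a trailer so a reader can detect corruption. A minimal stdlib-only sketch of that pattern follows. ChecksummedPayload is a hypothetical class, and DataOutputStream stands in for BinaryOutputArchive, so the byte layout is illustrative rather than the real snapshot format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Adler32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

final class ChecksummedPayload {
    /** Writes length + payload, then the Adler-32 checksum of those bytes as a trailer. */
    static byte[] write(byte[] payload) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            CheckedOutputStream crcOut = new CheckedOutputStream(sink, new Adler32());
            DataOutputStream out = new DataOutputStream(crcOut);
            out.writeInt(payload.length);
            out.write(payload);
            // Capture the checksum before the trailer itself passes through the stream
            long checksum = crcOut.getChecksum().getValue();
            out.writeLong(checksum);
            out.flush();
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the payload back and fails if the stored checksum does not match. */
    static byte[] readVerified(byte[] stored) {
        try {
            CheckedInputStream crcIn =
                    new CheckedInputStream(new ByteArrayInputStream(stored), new Adler32());
            DataInputStream in = new DataInputStream(crcIn);
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            long computed = crcIn.getChecksum().getValue(); // taken before reading the trailer
            long expected = in.readLong();
            if (computed != expected) {
                throw new IOException("checksum mismatch: data is corrupted");
            }
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The key detail on both sides is ordering: the checksum value is sampled before the trailer is written or read, so the trailer never contributes to its own checksum.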

3.2.1 SerializeUtils.serializeSnapshot()

public class SerializeUtils {
    public static void serializeSnapshot(DataTree dt,OutputArchive oa,
            Map<Long, Integer> sessions) throws IOException {
        HashMap<Long, Integer> sessSnap = new HashMap<Long, Integer>(sessions);
        // First write the session table: the entry count, then each sessionId -> timeout pair
        oa.writeInt(sessSnap.size(), "count");
        for (Entry<Long, Integer> entry : sessSnap.entrySet()) {
            oa.writeLong(entry.getKey().longValue(), "id");
            oa.writeInt(entry.getValue().intValue(), "timeout");
        }

        // Then delegate to DataTree's own serialization method
        dt.serialize(oa, "tree");
    }
}
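
The session table layout is simple enough to reproduce with plain Java streams: an int count followed by (long id, int timeout) pairs. The sketch below round-trips such a table. SessionTableCodec is a hypothetical class, and it uses DataOutputStream and DataInputStream directly instead of ZooKeeper's archive classes, so treat it as an illustration of the layout only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

final class SessionTableCodec {
    /** Encodes the table as: count (int), then id (long) + timeout (int) per session. */
    static byte[] encode(Map<Long, Integer> sessions) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(sink);
            out.writeInt(sessions.size());
            for (Map.Entry<Long, Integer> entry : sessions.entrySet()) {
                out.writeLong(entry.getKey());
                out.writeInt(entry.getValue());
            }
            out.flush();
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes a table produced by encode(). */
    static Map<Long, Integer> decode(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            int count = in.readInt();
            Map<Long, Integer> sessions = new HashMap<>();
            for (int i = 0; i < count; i++) {
                long id = in.readLong();
                int timeout = in.readInt();
                sessions.put(id, timeout);
            }
            return sessions;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing the count first lets the reader know exactly how many pairs to consume before the tree data begins.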

3.2.2 DataTree.serialize(): serializing the DataTree contents

public class DataTree {
    public void serialize(OutputArchive oa, String tag) throws IOException {
        scount = 0;
        aclCache.serialize(oa);
        serializeNode(oa, new StringBuilder(""));
        // / marks end of stream
        // we need to check if clear had been called in between the snapshot.
        if (root != null) {
            oa.writeString("/", "path");
        }
    }
}

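The details of DataTree serialization are not covered further here; interested readers can consult the source. In short, every node in the DataTree is serialized to the file one by one, including its path, value, ACL, and related information.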
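
For readers who want a feel for that traversal without opening the source, here is a simplified sketch: each node's path and data are written parent-first, the children follow recursively, and a final "/" marks the end of the stream (the root itself is written with the empty path, as in the serializeNode(oa, new StringBuilder("")) call above). TreeSnapshotSketch is a hypothetical stand-in for DataTree; it omits ACLs and stat fields and uses writeUTF rather than ZooKeeper's own string encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TreeSnapshotSketch {
    private static final String END_MARKER = "/";
    // Hypothetical storage: child names per parent path, and data per node path. Root is "".
    private final Map<String, List<String>> children = new HashMap<>();
    private final Map<String, byte[]> data = new HashMap<>();

    TreeSnapshotSketch() {
        data.put("", new byte[0]);
    }

    void addNode(String parentPath, String name, byte[] value) {
        children.computeIfAbsent(parentPath, k -> new ArrayList<>()).add(name);
        data.put(parentPath + "/" + name, value);
    }

    /** Serializes all nodes parent-first, then appends the end-of-stream marker. */
    byte[] serialize() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(sink);
            serializeNode(out, "");
            out.writeUTF(END_MARKER);
            out.flush();
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void serializeNode(DataOutputStream out, String path) throws IOException {
        byte[] value = data.get(path);
        out.writeUTF(path);
        out.writeInt(value.length);
        out.write(value);
        for (String child : children.getOrDefault(path, Collections.<String>emptyList())) {
            serializeNode(out, path + "/" + child);
        }
    }

    /** Reads node paths back in stored order, stopping at the end marker. */
    static List<String> readPaths(byte[] snapshot) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(snapshot));
            List<String> paths = new ArrayList<>();
            for (String path = in.readUTF(); !path.equals(END_MARKER); path = in.readUTF()) {
                byte[] value = new byte[in.readInt()];
                in.readFully(value); // skip over the node data
                paths.add(path);
            }
            return paths;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because no real node path is ever exactly "/" in this scheme (the root is the empty string), the marker is unambiguous and the reader needs no node count up front.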