[大数据] HDFS 自定义实现函数将文件追加到末尾的问题

开发: C++知识库 Java知识库 JavaScript Python PHP知识库人工智能区块链大数据移动开发嵌入式开发工具数据结构与算法开发测试游戏开发网络协议系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑笔记本显卡显示器固态硬盘硬盘耳机手机 iphone vivo oppo 小米华为单反装机图拉丁

-> 大数据 -> HDFS 自定义实现函数将文件追加到末尾的问题 -> 正文阅读

[大数据]HDFS 自定义实现函数将文件追加到末尾的问题

HDFS 自定义实现函数将文件追加到末尾的问题：

在这里插入图片描述

一、实验环境：

Ubuntu16.04
Hadoop2.7.1 伪分布式（只有一个DN）
Eclipse

二、解决方案

Java代码：

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.*;

public class HDFSApi {
    /**
     * 判断路径是否存在
     */
    public static boolean test(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }

    /**
     * 复制文件到指定路径
     * 若路径已存在，则进行覆盖
     */
    public static void copyFromLocalFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path localPath = new Path(localFilePath);
        Path remotePath = new Path(remoteFilePath);
        /* fs.copyFromLocalFile 第一个参数表示是否删除源文件，第二个参数表示是否覆盖 */
        fs.copyFromLocalFile(false, true, localPath, remotePath);
        fs.close();
    }

    /**
     * 追加文件内容
     */
    public static void appendToFile(Configuration conf, String localFilePath, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        /* 创建一个文件读入流 */
        FileInputStream in = new FileInputStream(localFilePath);
        /* 创建一个文件输出流，输出的内容将追加到文件末尾 */
        FSDataOutputStream out = fs.append(remotePath);
        /* 读写文件内容 */
        byte[] data = new byte[1024];
        int read = -1;
        while ( (read = in.read(data)) > 0 ) {
            out.write(data, 0, read);
        }
        out.close();
        in.close();
        fs.close();
    }

    /**
     * 主函数
     */
    public static void main(String[] args) {
        Configuration conf = new Configuration();
    	conf.set("fs.default.name","hdfs://localhost:9000");
        String localFilePath = "/home/hadoop/text.txt";    // 本地路径
        String remoteFilePath = "/user/hadoop/text.txt";    // HDFS路径
        String choice = "append";    // 若文件存在则追加到文件末尾
//      String choice = "overwrite";    // 若文件存在则覆盖

        try {
            /* 判断文件是否存在 */
            Boolean fileExists = false;
            if (HDFSApi.test(conf, remoteFilePath)) {
                fileExists = true;
                System.out.println(remoteFilePath + " 已存在.");
            } else {
                System.out.println(remoteFilePath + " 不存在.");
            }
            /* 进行处理 */
            if ( !fileExists) { // 文件不存在，则上传
                HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " 已上传至 " + remoteFilePath);
            } else if ( choice.equals("overwrite") ) {    // 选择覆盖
                HDFSApi.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " 已覆盖 " + remoteFilePath);
            } else if ( choice.equals("append") ) {   // 选择追加
                HDFSApi.appendToFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " 已追加至 " + remoteFilePath);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

报错信息：Failed to replace a bad datanode the existing pipeline to no more good datanodes begin g available to try.

在这里插入图片描述

直观判定为文件在pineline传输中DN被认为是坏的数据节点，需要新的好的数据节点来确保文件在pineline中传输正常。

官网说明：hdfs-default.xml配置文件

如果写入管道中存在数据节点/网络故障，DFSClient 将尝试从管道中删除失败的数据节点，然后继续使用其余数据节点进行写入。因此，管道中的数据节点数会减少。该功能是向管道添加新的数据节点。这是用于启用/禁用该功能的站点范围的属性(dfs.client.block.write.replace-datanode-on-failure.policy)。当集群大小非常小时（例如 3 个节点或更少），集群管理员可能希望在默认配置文件中将策略设置为 NEVER 或禁用此功能。否则，用户可能会遇到异常高的管道故障率，因为无法找到新的数据节点进行替换。

而且，仅当 dfs.client.block.write.replace-datanode-on-failure.enable 的值为 true 时，才使用此属性。ALWAYS：删除现有数据节点时，始终添加新的数据节点。NEVER：从不添加新的数据节点。默认值：让 r 作为复制编号。设 n 为现有数据节点的数量。仅当 r 大于或等于 3 且（1） floor（r/2）大于或等于 n 时，才添加新的数据节点;或（2） r 大于 n，并且块被hflushed/appended。

在这里插入图片描述

方法一：在Java代码main函数中加入以下两行代码：

conf.set("dfs.client.block.write.replace-datanode-on-failure.policy","NEVER"); 
conf.set("dfs.client.block.write.replace-datanode-on-failure.enable","true");

方法二：在hdfs-site.xml中加入以下代码：

<property>
	<name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
	<value>NEVER</value>
</property>

三、注意点

一般来说，如果集群中DN个数小于等于3 （本机器采用伪分布式模式，只有一个DN，但是为了测试方便，直接开启即可）都不建议开启

创作打卡挑战赛

赢取流量/现金/CSDN周边激励大奖

大数据最新文章

实现Kafka至少消费一次

亚马逊云科技：还在苦于ETL？Zero ETL的时代

初探MapReduce

【SpringBoot框架篇】32.基于注解+redis实现

Elasticsearch：如何减少 Elasticsearch 集

Go redis操作

Redis面试题

专题五 Redis高并发场景

基于GBase8s和Calcite的多数据源查询

Redis——底层数据结构原理

加:2022-05-08 08:10:56 更:2022-05-08 08:11:11

360图书馆购物三丰科技阅读网日历万年历 2025年7日历

-2025/7/6 11:10:28-

图片自动播放器
↓图片自动播放器↓

TxT小说阅读器
↓语音阅读,小说下载,古典文学↓

一键清除垃圾
↓轻轻一点,清除系统垃圾↓

图片批量下载器
↓批量下载图片,美女图库↓

网站联系: qq:121756557 email:121756557@qq.com IT数码