IT数码 购物 网址 头条 软件 日历 阅读 图书馆
TxT小说阅读器
↓语音阅读,小说下载,古典文学↓
图片批量下载器
↓批量下载图片,美女图库↓
图片自动播放器
↓图片自动播放器↓
一键清除垃圾
↓轻轻一点,清除系统垃圾↓
开发: C++知识库 Java知识库 JavaScript Python PHP知识库 人工智能 区块链 大数据 移动开发 嵌入式 开发工具 数据结构与算法 开发测试 游戏开发 网络协议 系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑 笔记本 显卡 显示器 固态硬盘 硬盘 耳机 手机 iphone vivo oppo 小米 华为 单反 装机 图拉丁
 
   -> 人工智能 -> 【神经网络并行训练】基于MapReduce的BP算法实现 -> 正文阅读

[人工智能]【神经网络并行训练】基于MapReduce的BP算法实现

前言

最近看了一些基于MapReduce的神经网络并行训练方面的论文,老师让我自己去实现一下,更深入的体会其中的原理。

MapReduce是基于java语言的框架,于是一开始想用java写深度学习代码。但是dl4j框架实在太难用了,而且网络上的深度学习教程都是基于python的,所以最终还是决定用python去实现基于MapReduce框架的神经网络。如果行不通的话,后面再考虑用java实现神经网络。

目前大致的学习步骤如下:
1、Ubuntu下安装python3
参考链接
2、Python实现最简单的MapReduce例子,如Wordcount
参考链接1:pycharm运行
参考链接2:命令行运行
3、Pycharm(linux)+Hadoop+Spark(环境搭建)
参考链接
4、MapReduce实现BP算法并行化
5、复现论文代码

一、Ubuntu下安装Python3

输入python -V查看版本,发现python命令不可用

root@ubuntu:/# python -V

Command 'python' not found, did you mean:

  command 'python3' from deb python3
  command 'python' from deb python-is-python3

进入/usr/bin,输入ls -l | grep python,发现python3链接到python3.8

root@ubuntu:/usr/bin# ls -l | grep python 
lrwxrwxrwx 1 root root          23 Jun  2 03:49 pdb3.8 -> ../lib/python3.8/pdb.py
lrwxrwxrwx 1 root root          31 Sep 23  2020 py3versions -> ../share/python3/py3versions.py
lrwxrwxrwx 1 root root           9 Sep 23  2020 python3 -> python3.8
-rwxr-xr-x 1 root root     5490352 Jun  2 03:49 python3.8
lrwxrwxrwx 1 root root          33 Jun  2 03:49 python3.8-config -> x86_64-linux-gnu-python3.8-config
lrwxrwxrwx 1 root root          16 Mar 13  2020 python3-config -> python3.8-config
-rwxr-xr-x 1 root root         384 Mar 27  2020 python3-futurize
-rwxr-xr-x 1 root root         388 Mar 27  2020 python3-pasteurize
-rwxr-xr-x 1 root root        3241 Jun  2 03:49 x86_64-linux-gnu-python3.8-config
lrwxrwxrwx 1 root root          33 Mar 13  2020 x86_64-linux-gnu-python3-config -> x86_64-linux-gnu-python3.8-config

输入python3 -V,发现Ubuntu自带python3,并且版本为3.8.10

root@ubuntu:/# python3 -V
Python 3.8.10

输入pip3 -V查看版本

root@ubuntu:/# pip3 -V
pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.8)

输入pip3 list查看包

root@ubuntu:/# pip3 list
Package                Version             
---------------------- --------------------
apturl                 0.5.2               
bcrypt                 3.1.7               
blinker                1.4                 
Brlapi                 0.7.0               
certifi                2019.11.28          
chardet                3.0.4               
Click                  7.0                 
colorama               0.4.3               
command-not-found      0.3                 
cryptography           2.8

至此,说明Ubuntu具备python3环境。

二、python实现WordCount

1、代码
mapper.py文件

#!/usr/bin/env python3
import sys
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print("%s\t%s" % (word, 1))

reducer.py文件

#!/usr/bin/env python3
from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    #word, count = line.split()
    try:
        count = int(count)
    except ValueError:  #count如果不是数字的话,直接忽略掉
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print("%s\t%s" % (current_word, current_count))
        current_count = count
        current_word = word

if word == current_word:  #不要忘记最后的输出
    print("%s\t%s" % (current_word, current_count))

注意#!/usr/bin/env python3,因为本机没有配置环境变量,python命令不生效,所以这里必须写python3
2、在本机上测试代码
在/opt/PycharmProjects/MapReduce/WordCount文件夹下
为了保险起见可以先赋予运行权限

root@ubuntu:/opt/PycharmProjects/MapReduce/WordCount# chmod +x mapper.py
root@ubuntu:/opt/PycharmProjects/MapReduce/WordCount# chmod +x reducer.py

测试mapper.py程序

root@ubuntu:/opt/PycharmProjects/MapReduce/WordCount# echo "aa bb cc dd aa cc" | python3 mapper.py
aa	1
bb	1
cc	1
dd	1
aa	1
cc	1

测试reducer.py程序

root@ubuntu:/opt/PycharmProjects/MapReduce/WordCount# echo "foo foo quux labs foo bar quux" | python3 mapper.py | sort -k1,1 | python3 reducer.py
bar	1
foo	3
labs	1
quux	2

三、在Hadoop上运行Python程序

1、下载文本文件

root@ubuntu:/home/wuwenjing/Downloads/dataset/guteberg# wget http://www.gutenberg.org/files/5000/5000-8.txt
root@ubuntu:/home/wuwenjing/Downloads/dataset/guteberg# wget http://www.gutenberg.org/cache/epub/20417/pg20417.txt

2、在HDFS上创建文件夹,把这两本书传到HDFS上

root@ubuntu:/hadoop/hadoop-2.9.2/sbin# hdfs dfs -mkdir /user/MapReduce/input
root@ubuntu:/hadoop/hadoop-2.9.2/sbin# hdfs dfs -put /home/wuwenjing/Downloads/dataset/gutenberg/*.txt /user/MapReduce/input

3、寻找streaming的jar文件存放地址,找到share文件夹中的hadoop-straming*.jar文件

root@ubuntu:/hadoop# find ./ -name "*streaming*.jar"
./hadoop-2.9.2/share/hadoop/tools/lib/hadoop-streaming-2.9.2.jar
./hadoop-2.9.2/share/hadoop/tools/sources/hadoop-streaming-2.9.2-sources.jar
./hadoop-2.9.2/share/hadoop/tools/sources/hadoop-streaming-2.9.2-test-sources.jar

4、由于通过streaming接口运行的脚本太长了,因此直接建立一个shell名称为run.sh来运行:

root@ubuntu:/hadoop/hadoop-2.9.2# vim run.sh
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
-file /opt/PycharmProjects/MapReduce/WordCount/mapper.py -mapper  /opt/PycharmProjects/MapReduce/WordCount/mapper.py \
-file /opt/PycharmProjects/MapReduce/WordCount/reducer.py -reducer  /opt/PycharmProjects/MapReduce/WordCount/reducer.py \
-input /user/MapReduce/input/*.txt -output /user/MapReduce/output
root@ubuntu:/hadoop/hadoop-2.9.2# source run.sh

这里报错,报错信息如下:

21/08/30 23:24:34 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
21/08/30 23:24:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
packageJobJar: [/opt/PycharmProjects/MapReduce/WordCount/mapper.py, /opt/PycharmProjects/MapReduce/WordCount/reducer.py] [] /tmp/streamjob4294615261368991400.jar tmpDir=null
21/08/30 23:24:35 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
21/08/30 23:24:35 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
21/08/30 23:24:35 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
21/08/30 23:24:35 INFO mapred.FileInputFormat: Total input files to process : 2
21/08/30 23:24:35 INFO mapreduce.JobSubmitter: number of splits:2
21/08/30 23:24:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1574514939_0001
21/08/30 23:24:36 INFO mapred.LocalDistributedCacheManager: Localized file:/opt/PycharmProjects/MapReduce/WordCount/mapper.py as file:/hadoop/hadoop-2.9.2/tmp/mapred/local/1630391076183/mapper.py
21/08/30 23:24:36 INFO mapred.LocalDistributedCacheManager: Localized file:/opt/PycharmProjects/MapReduce/WordCount/reducer.py as file:/hadoop/hadoop-2.9.2/tmp/mapred/local/1630391076184/reducer.py
21/08/30 23:24:36 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
21/08/30 23:24:36 INFO mapreduce.Job: Running job: job_local1574514939_0001
21/08/30 23:24:36 INFO mapred.LocalJobRunner: OutputCommitter set in config null
21/08/30 23:24:36 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
21/08/30 23:24:36 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
21/08/30 23:24:36 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
21/08/30 23:24:36 INFO mapred.LocalJobRunner: Waiting for map tasks
21/08/30 23:24:36 INFO mapred.LocalJobRunner: Starting task: attempt_local1574514939_0001_m_000000_0
21/08/30 23:24:36 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
21/08/30 23:24:36 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
21/08/30 23:24:36 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
21/08/30 23:24:36 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/MapReduce/input/5000-8.txt:0+1428843
21/08/30 23:24:36 INFO mapred.MapTask: numReduceTasks: 1
21/08/30 23:24:36 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
21/08/30 23:24:36 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
21/08/30 23:24:36 INFO mapred.MapTask: soft limit at 83886080
21/08/30 23:24:36 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
21/08/30 23:24:36 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
21/08/30 23:24:36 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
21/08/30 23:24:36 INFO streaming.PipeMapRed: PipeMapRed exec [/hadoop/hadoop-2.9.2/sbin/./mapper.py]
21/08/30 23:24:36 INFO Configuration.deprecation: mapred.work.output.dir is deprecated. Instead, use mapreduce.task.output.dir
21/08/30 23:24:36 INFO Configuration.deprecation: map.input.start is deprecated. Instead, use mapreduce.map.input.start
21/08/30 23:24:36 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
21/08/30 23:24:36 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
21/08/30 23:24:36 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
21/08/30 23:24:36 INFO Configuration.deprecation: mapred.local.dir is deprecated. Instead, use mapreduce.cluster.local.dir
21/08/30 23:24:36 INFO Configuration.deprecation: map.input.file is deprecated. Instead, use mapreduce.map.input.file
21/08/30 23:24:36 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
21/08/30 23:24:36 INFO Configuration.deprecation: map.input.length is deprecated. Instead, use mapreduce.map.input.length
21/08/30 23:24:36 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
21/08/30 23:24:36 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
21/08/30 23:24:36 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
21/08/30 23:24:36 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
21/08/30 23:24:36 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
21/08/30 23:24:36 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
21/08/30 23:24:36 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
21/08/30 23:24:36 INFO streaming.PipeMapRed: Records R/W=2820/1
Traceback (most recent call last):
  File "/hadoop/hadoop-2.9.2/sbin/./mapper.py", line 3, in <module>
    for line in sys.stdin:
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdc in position 2092: invalid continuation byte
21/08/30 23:24:36 INFO streaming.PipeMapRed: MRErrorThread done
21/08/30 23:24:36 INFO streaming.PipeMapRed: R/W/S=2820/5129/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 HOST=null
USER=wuwenjing
HADOOP_USER=null
last tool output: |Books,	1ed	1-Have	1ement	111|

java.io.IOException: Broken pipe
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:326)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
	at java.io.DataOutputStream.write(DataOutputStream.java:107)
	at org.apache.hadoop.streaming.io.TextInputWriter.writeUTF8(TextInputWriter.java:72)
	at org.apache.hadoop.streaming.io.TextInputWriter.writeValue(TextInputWriter.java:51)
	at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:106)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/08/30 23:24:36 WARN streaming.PipeMapRed: java.io.IOException: Stream closed
21/08/30 23:24:36 INFO streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
	at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:120)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/08/30 23:24:36 WARN streaming.PipeMapRed: java.io.IOException: Stream closed
21/08/30 23:24:36 INFO streaming.PipeMapRed: PipeMapRed failed!
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/08/30 23:24:36 INFO mapred.LocalJobRunner: Starting task: attempt_local1574514939_0001_m_000001_0
21/08/30 23:24:36 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
21/08/30 23:24:36 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
21/08/30 23:24:36 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
21/08/30 23:24:36 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/MapReduce/input/pg20417.txt:0+674570
21/08/30 23:24:36 INFO mapred.MapTask: numReduceTasks: 1
21/08/30 23:24:36 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
21/08/30 23:24:36 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
21/08/30 23:24:36 INFO mapred.MapTask: soft limit at 83886080
21/08/30 23:24:36 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
21/08/30 23:24:36 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
21/08/30 23:24:36 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
21/08/30 23:24:36 INFO streaming.PipeMapRed: PipeMapRed exec [/hadoop/hadoop-2.9.2/sbin/./mapper.py]
21/08/30 23:24:36 INFO mapred.LineRecordReader: Found UTF-8 BOM and skipped it
21/08/30 23:24:36 INFO streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
21/08/30 23:24:36 INFO streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
21/08/30 23:24:37 INFO streaming.PipeMapRed: R/W/S=100/0/0 in:NA [rec/s] out:NA [rec/s]
21/08/30 23:24:37 INFO streaming.PipeMapRed: R/W/S=1000/0/0 in:NA [rec/s] out:NA [rec/s]
21/08/30 23:24:37 INFO streaming.PipeMapRed: Records R/W=2788/1
21/08/30 23:24:37 INFO streaming.PipeMapRed: R/W/S=10000/49444/0 in:NA [rec/s] out:NA [rec/s]
21/08/30 23:24:37 INFO streaming.PipeMapRed: MRErrorThread done
21/08/30 23:24:37 INFO streaming.PipeMapRed: mapRedFinished
21/08/30 23:24:37 INFO mapred.LocalJobRunner: 
21/08/30 23:24:37 INFO mapred.MapTask: Starting flush of map output
21/08/30 23:24:37 INFO mapred.MapTask: Spilling map output
21/08/30 23:24:37 INFO mapred.MapTask: bufstart = 0; bufend = 866856; bufvoid = 104857600
21/08/30 23:24:37 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 25775024(103100096); length = 439373/6553600
21/08/30 23:24:37 INFO mapred.MapTask: Finished spill 0
21/08/30 23:24:37 INFO mapred.Task: Task:attempt_local1574514939_0001_m_000001_0 is done. And is in the process of committing
21/08/30 23:24:37 INFO mapred.LocalJobRunner: Records R/W=2788/1
21/08/30 23:24:37 INFO mapred.Task: Task 'attempt_local1574514939_0001_m_000001_0' done.
21/08/30 23:24:37 INFO mapred.LocalJobRunner: Finishing task: attempt_local1574514939_0001_m_000001_0
21/08/30 23:24:37 INFO mapred.LocalJobRunner: map task executor complete.
21/08/30 23:24:37 WARN mapred.LocalJobRunner: job_local1574514939_0001
java.lang.Exception: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:491)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:551)
Caused by: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:538)
	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:459)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
21/08/30 23:24:37 INFO mapreduce.Job: Job job_local1574514939_0001 running in uber mode : false
21/08/30 23:24:37 INFO mapreduce.Job:  map 100% reduce 0%
21/08/30 23:24:37 INFO mapreduce.Job: Job job_local1574514939_0001 failed with state FAILED due to: NA
21/08/30 23:24:37 INFO mapreduce.Job: Counters: 22
	File System Counters
		FILE: Number of bytes read=2069
		FILE: Number of bytes written=1568742
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=809738
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=7
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=1
	Map-Reduce Framework
		Map input records=12760
		Map output records=109844
		Map output bytes=866856
		Map output materialized bytes=1086550
		Input split bytes=106
		Combine input records=0
		Spilled Records=109844
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=357564416
	File Input Format Counters 
		Bytes Read=674570
21/08/30 23:24:37 ERROR streaming.StreamJob: Job not successful!
Streaming Command Failed!

仔细看了一下,最早的错误出现在这里:

Traceback (most recent call last):
  File "/hadoop/hadoop-2.9.2/sbin/./mapper.py", line 3, in <module>
    for line in sys.stdin:
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdc in position 2092: invalid continuation byte

貌似是因为编码问题
于是我自己创建了一个test.txt上传到HDFS请添加图片描述

root@ubuntu:/hadoop/hadoop-2.9.2/sbin# hdfs dfs -cat /user/MapReduce/output/*
21/08/31 00:02:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
a	1
hello	2
map	4
reduce	3
world	1

运行成功

  人工智能 最新文章
2022吴恩达机器学习课程——第二课(神经网
第十五章 规则学习
FixMatch: Simplifying Semi-Supervised Le
数据挖掘Java——Kmeans算法的实现
大脑皮层的分割方法
【翻译】GPT-3是如何工作的
论文笔记:TEACHTEXT: CrossModal Generaliz
python从零学(六)
详解Python 3.x 导入(import)
【答读者问27】backtrader不支持最新版本的
上一篇文章      下一篇文章      查看所有文章
加:2021-08-31 15:27:21  更:2021-08-31 15:29:21 
 
开发: C++知识库 Java知识库 JavaScript Python PHP知识库 人工智能 区块链 大数据 移动开发 嵌入式 开发工具 数据结构与算法 开发测试 游戏开发 网络协议 系统运维
教程: HTML教程 CSS教程 JavaScript教程 Go语言教程 JQuery教程 VUE教程 VUE3教程 Bootstrap教程 SQL数据库教程 C语言教程 C++教程 Java教程 Python教程 Python3教程 C#教程
数码: 电脑 笔记本 显卡 显示器 固态硬盘 硬盘 耳机 手机 iphone vivo oppo 小米 华为 单反 装机 图拉丁

360图书馆 购物 三丰科技 阅读网 日历 万年历 2025年1日历 -2025/1/11 20:44:42-

图片自动播放器
↓图片自动播放器↓
TxT小说阅读器
↓语音阅读,小说下载,古典文学↓
一键清除垃圾
↓轻轻一点,清除系统垃圾↓
图片批量下载器
↓批量下载图片,美女图库↓
  网站联系: qq:121756557 email:121756557@qq.com  IT数码