IntelliJ IDEA Hadoop Development Environment: A Series of Follow-up Problems
Leo · 2018-01-31
Summary: a series of problems encountered when running WordCount after setting up a Hadoop development environment in IntelliJ IDEA.
Prerequisites
1. Install JDK 1.8 and configure it (set the environment variables).
2. Download and install IntelliJ IDEA (no environment variables needed).
3. Download Hadoop and extract it.
Walkthrough:
1. Create a Maven project.
Tip: if the SDK is not found automatically, click 'New' and point it at your JDK installation path.

2. Create a Java class named WordCount in any folder under src; here I use src/main/java.

3. Copy the following code into WordCount:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1) per token.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.out.println(args[0]); // echo the input path as a quick sanity check
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
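To make the data flow concrete: args[0] is the input directory and args[1] the output directory (these match the Program arguments set in step 6 below). For example, if input/ contains a text file with the two lines

hello world
hello hadoop

then after a successful run, output/part-r-00000 holds one tab-separated count per word, sorted by key:

hadoop	1
hello	2
world	1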
4. There is no need to modify the dependencies in pom.xml (doing so is where many of the errors come from). Instead, we can pull the Hadoop jars in directly: click File -> Project Structure.

As shown above, add a new 'Jars or directories' entry, browse to the Hadoop extraction path, and add the following five folders:
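(The exact folders depend on your Hadoop version; as an assumption for a typical Hadoop 2.x layout, they are usually the jar directories below, so verify against your own extraction path:)

share/hadoop/common
share/hadoop/common/lib
share/hadoop/hdfs
share/hadoop/mapreduce
share/hadoop/yarn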


6. Go to Run -> Edit Configurations, create a new Application configuration, and set Main class: WordCount and Program arguments: input/ output/
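One extra pitfall before running: FileOutputFormat refuses to start a job whose output directory already exists, so a second run fails until you delete output/ by hand. A minimal sketch of a guard you could add to main() before submitting the job (assuming an extra import of org.apache.hadoop.fs.FileSystem):

// Remove a leftover output directory from a previous run; otherwise the
// job aborts with FileAlreadyExistsException.
FileSystem fs = FileSystem.get(conf);
Path out = new Path(args[1]);
if (fs.exists(out)) {
    fs.delete(out, true); // true = delete recursively
}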

7. Now try running it. You will hit the following series of errors.
Error 1: HADOOP_HOME and hadoop.home.dir are unset
Fix: HADOOP_HOME has not been set, so configure the environment variables for Hadoop:
HADOOP_HOME: the Hadoop extraction path
Path: append %HADOOP_HOME%\bin
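If you prefer not to touch the system environment variables (or IDEA was started before you set them and has not been restarted), hadoop.home.dir can also be set from code. A sketch, where C:\hadoop is a placeholder for your actual extraction path:

// Alternative to the HADOOP_HOME environment variable: set the property
// before any Hadoop class is used, e.g. as the first line of main().
// The path below is a placeholder; substitute your own.
System.setProperty("hadoop.home.dir", "C:\\hadoop");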
Error 2: Exception in thread "main" java.lang.NullPointerException at java.lang.ProcessBuilder.start(Unknown Source)
Fix: since Hadoop 2, the official package no longer ships winutils.exe, the helper that lets Windows emulate the Hadoop environment for testing. Download one and put it in the bin directory under the Hadoop extraction path.
For more background, see https://wiki.apache.org/hadoop/WindowsProblems
You can download the version you need straight from GitHub: https://github.com/steveloughran/winutils
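To confirm the file ended up where Hadoop will look for it, a quick plain-Java sanity check (assuming HADOOP_HOME is already set as in Error 1):

// Check that winutils.exe sits under %HADOOP_HOME%\bin.
java.io.File winutils = new java.io.File(System.getenv("HADOOP_HOME"), "bin\\winutils.exe");
System.out.println("winutils.exe found: " + winutils.exists());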
Error 3: Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
Fix: hadoop.dll is missing from C:\Windows\System32; copy that file into C:\Windows\System32 and the error disappears. It can be downloaded from the same GitHub link given under Error 2.
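If the UnsatisfiedLinkError persists, the JVM still cannot locate hadoop.dll. A one-line diagnostic that prints the directories searched for native libraries (on Windows this normally includes System32 and everything on PATH):

// hadoop.dll must sit in one of the directories printed here,
// or be supplied explicitly via -Djava.library.path.
System.out.println(System.getProperty("java.library.path"));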