Reducing MapReduce output files

dytcc23 Registered member
2023-01-25 08:59

If you want to sum word counts across all files, you don't need to combine the output files. Instead, you can call addInputPath once per input file, for example through the MultipleInputs class, so that a single job reads all of them.

Alternatively, you should be able to pass the input folder as an argument so that the job reads all files within it.

If you want to find the word with the minimum count per file, you'll need a second reducer.

You already have the output location as a variable:

Path output1 = new Path(args[3]);
FileOutputFormat.setOutputPath(job, output1);

So create another job that uses that location as its input.

However, you might be able to use only one job if you use a Combiner to do the word count and use the filename as your key.
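
To make the single-job idea more concrete, here is a rough sketch in plain Java (no Hadoop classes) of what the reduce step could look like when the filename is the key. It assumes each value is a (word, partial count) pair. The class name MinWordPerFile, the reduce signature and the alphabetical tie-break are all made up for illustration, not taken from your code. Hadoop may run a combiner zero or more times, so the sketch sums the partial counts again before picking the minimum.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Hadoop-free model of one reduce call keyed by filename.
class MinWordPerFile {

    /**
     * Each value is an assumed (word, partialCount) pair emitted by the
     * mapper or by a combiner. Returns the least frequent word with its
     * total count, or null when there are no values.
     */
    static Map.Entry<String, Integer> reduce(List<Map.Entry<String, Integer>> values) {
        // Sum partial counts per word, so the totals are the same whether
        // or not a combiner already pre-aggregated some of them.
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> v : values) {
            totals.merge(v.getKey(), v.getValue(), Integer::sum);
        }

        // Pick the smallest total; ties go to the alphabetically first word
        // (an arbitrary choice that keeps the result deterministic).
        Map.Entry<String, Integer> min = null;
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            boolean smaller = min == null
                    || e.getValue() < min.getValue()
                    || (e.getValue().equals(min.getValue())
                        && e.getKey().compareTo(min.getKey()) < 0);
            if (smaller) {
                min = Map.entry(e.getKey(), e.getValue());
            }
        }
        return min;
    }

    public static void main(String[] args) {
        // Values that might arrive for the key "a.txt" (sample data only).
        List<Map.Entry<String, Integer>> valuesForFile = List.of(
                Map.entry("apple", 2),  // e.g. already summed by a combiner
                Map.entry("pear", 1),
                Map.entry("apple", 1),
                Map.entry("pear", 1),
                Map.entry("fig", 1));

        Map.Entry<String, Integer> min = reduce(valuesForFile);
        System.out.println("a.txt\t" + min.getKey() + "\t" + min.getValue());
        // prints: a.txt	fig	1
    }
}
```

In a real job the same logic would go inside a Reducer, with the mapper getting the filename from its input split. That wiring is left out here because it depends on how your existing job is set up.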