crunch.MaxTemperature.java Source code

Java tutorial

Introduction

Here is the source code for crunch.MaxTemperature.java

Source

    /* XXX Chapter 2. Data Flow
     *
     * http://my.safaribooksonline.com/book/software-engineering-and-development/9781449328917/hdfs-concepts/
     * id3668231#X2ludGVybmFsX0h0bWxWaWV3P3htbGlkPTk3ODE0NDkzMjg5MTclMkZpZDM1ODU5NTEmcXVlcnk9cGFydGl0aW9u
     */

    /*
     * Hadoop Definitive Guide Chapter 6 -- Anatomy of a MapReduce job run
     * 
     * Classic MapReduce (MapReduce 1)
     
     * A job run in classic MapReduce is illustrated in Figure 6-1. At the highest level, there are four independent entities:
    
     *   The client, which submits the MapReduce job.
     *   The jobtracker, which coordinates the job run. The jobtracker is a Java application whose main class is JobTracker.
     *   The tasktrackers, which run the tasks that the job has been split into. Tasktrackers are Java applications whose main class is TaskTracker.
     *   The distributed filesystem (normally HDFS, covered in Chapter 3), which is used for sharing job files between the other entities.
    
    
     * Job submission
    
    
     * The submit() method on Job creates an internal JobSubmitter instance and calls
     * submitJobInternal() on it (step 1 in Figure 6-1). Having submitted the job, waitForCompletion()
     * polls the job's progress once per second and reports the progress to the console if it has
     * changed since the last report. When the job completes successfully, the job counters are
     * displayed. Otherwise, the error that caused the job to fail is logged to the console.
    
     * The job submission process implemented by JobSubmitter does the following:
    
     *    Asks the jobtracker for a new job ID (by calling getNewJobId() on JobTracker) (step 2).
    
     *   Checks the output specification of the job. For example, if the output directory has not
     *   been specified or it already exists, the job is not submitted and an error is thrown to the
     *   MapReduce program.
    
     *   Computes the input splits for the job. If the splits cannot be computed (because the input
     * paths don't exist, for example), the job is not submitted and an error is thrown to the
     *   MapReduce program.
    
     * Copies the resources needed to run the job, including the job JAR file, the configuration file,
     * and the computed input splits, to the jobtracker's filesystem in a directory named after the job
     * ID. The job JAR is copied with a high replication factor (controlled by the
     * mapred.submit.replication property, which defaults to 10) so that there are lots of copies across
     * the cluster for the tasktrackers to access when they run tasks for the job (step 3).
    
     * Tells the jobtracker that the job is ready for execution by calling submitJob() on JobTracker
     * (step 4).
    
     * Job initialization
    
     * When the JobTracker receives a call to its submitJob() method, it puts it into an internal queue
     * from where the job scheduler will pick it up and initialize it. Initialization involves creating
     * an object to represent the job being run, which encapsulates its tasks, and bookkeeping
     * information to keep track of the status and progress of its tasks (step 5).
    
     * To create the list of tasks to run, the job scheduler first retrieves the input splits computed
     * by the client from the shared filesystem (step 6). It then creates one map task for each
     * split. The number of reduce tasks to create is determined by the mapred.reduce.tasks property in
     * the Job, which is set by the setNumReduceTasks() method, and the scheduler simply creates this
     * number of reduce tasks to be run. Tasks are given IDs at this point.
    
     * In addition to the map and reduce tasks, two further tasks are created: a job setup task and a
     * job cleanup task. These are run by tasktrackers and are used to run code to set up the job before
     * any map tasks run, and to clean up after all the reduce tasks are complete. The OutputCommitter
     * that is configured for the job determines the code to be run, and by default this is a
     * FileOutputCommitter. For the job setup task it will create the final output directory for the job
     * and the temporary working space for the task output, and for the job cleanup task it will delete
     * the temporary working space for the task output. The commit protocol is described in more detail
     * in Output Committers.
    
     * Task assignment
    
     * Tasktrackers run a simple loop that periodically sends heartbeat method calls to the
     * jobtracker. Heartbeats tell the jobtracker that a tasktracker is alive, but they also double as a
     * channel for messages. As a part of the heartbeat, a tasktracker will indicate whether it is ready
     * to run a new task, and if it is, the jobtracker will allocate it a task, which it communicates to
     * the tasktracker using the heartbeat return value (step 7).
    
     *  Tasktrackers have a fixed number of slots for map tasks and for reduce tasks, and these are set
     *  independently. For example, a tasktracker may be configured to run two map tasks and two reduce
     *  tasks simultaneously. (The precise number depends on the number of cores and the amount of
     *  memory on the tasktracker; see Memory.) In the context of a given job, the default scheduler
     *  fills empty map task slots before reduce task slots. So if the tasktracker has at least one
     *  empty map task slot, the jobtracker will select a map task; otherwise, it will select a reduce
     *  task.
    
     * To choose a reduce task, the jobtracker simply takes the next in its list of yet-to-be-run reduce
     * tasks, since there are no data locality considerations. For a map task, however, it takes into
     * account the tasktracker's network location and picks a task whose input split is as close as
     * possible to the tasktracker. In the optimal case, the task is data-local.
    
     * TaskRunner launches a new Java Virtual Machine (JVM, step 9) to run each task in (step 10), so
     * that any bugs in the user-defined map and reduce functions don't affect the tasktracker (by
     * causing it to crash or hang, for example). However, it is possible to reuse the JVM between
     * tasks; see Task JVM Reuse.
    
     * The child process communicates with its parent through the umbilical interface. It informs the
     * parent of the task's progress every few seconds until the task is complete.
    
     */
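
    /* XXX (a minimal sketch, not from the book) Client-side view of the knobs named above, using
     * the MapReduce 1 property names these notes use. setNumReduceTasks() sets mapred.reduce.tasks,
     * which is the number of reduce tasks the scheduler creates; mapred.submit.replication controls
     * the replication of the job JAR copied at submission; waitForCompletion() submits and then
     * polls progress once per second.
     *
     *   Job job = new Job();                        // same setup as MaxTemperature below
     *   job.setJarByClass(MaxTemperature.class);
     *   job.setNumReduceTasks(2);                   // sets mapred.reduce.tasks (step 6: one reduce task per unit)
     *   job.getConfiguration().setInt("mapred.submit.replication", 10); // replication of the job JAR (default 10)
     *   boolean ok = job.waitForCompletion(true);   // submit(), then poll progress once per second
     */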

    /* XXX partition, sort, combiner
    
     * Each map task has a circular memory buffer that it writes the output to. The buffer is 100 MB by
     * default, a size that can be tuned by changing the io.sort.mb property. When the contents of the
     * buffer reaches a certain threshold size (io.sort.spill.percent, which has the default 0.80, or
     * 80%), a background thread will start to spill the contents to disk. Map outputs will continue to
     * be written to the buffer while the spill takes place, but if the buffer fills up during this
     * time, the map will block until the spill is complete.
    
     * Spills are written in round-robin fashion to the directories specified by the mapred.local.dir
     * property, in a job-specific subdirectory.
    
     * Before it writes to disk, the thread first divides the data into partitions corresponding to the
     * reducers that they will ultimately be sent to. Within each partition, the background thread
     * performs an in-memory sort by key, and if there is a combiner function, it is run on the output
     * of the sort. Running the combiner function makes for a more compact map output, so there is less
     * data to write to local disk and to transfer to the reducer.
    
     * Each time the memory buffer reaches the spill threshold, a new spill file is created, so after
     * the map task has written its last output record, there could be several spill files. Before the
     * task is finished, the spill files are merged into a single partitioned and sorted output
     * file. The configuration property io.sort.factor controls the maximum number of streams to merge
     * at once; the default is 10.
    
     * If there are at least three spill files (set by the min.num.spills.for.combine property), the
     * combiner is run again before the output file is written. Recall that combiners may be run
     * repeatedly over the input without affecting the final result. If there are only one or two
     * spills, the potential reduction in map output size is not worth the overhead in invoking the
     * combiner, so it is not run again for this map output.
    
     * It is often a good idea to compress the map output as it is written to disk because doing so
     * makes it faster to write to disk, saves disk space, and reduces the amount of data to transfer to
     * the reducer. By default, the output is not compressed, but it is easy to enable this by setting
     * mapred.compress.map.output to true. The compression library to use is specified by
     * mapred.map.output.compression.codec; see Compression for more on compression formats.
    
     * The output file's partitions are made available to the reducers over HTTP. The maximum number of
     * worker threads used to serve the file partitions is controlled by the tasktracker.http.threads
     * property; this setting is per tasktracker, not per map task slot. The default of 40 may need to
     * be increased for large clusters running large jobs. In MapReduce 2, this property is not
     * applicable because the maximum number of threads used is set automatically based on the number of
     * processors on the machine. (MapReduce 2 uses Netty, which by default allows up to twice as many
     * threads as there are processors.)
     
     * (below: Application Master is a YARN concept)
     * {{
     * YARN remedies the scalability shortcomings of "classic" MapReduce by splitting the
     * responsibilities of the jobtracker into separate entities. The jobtracker takes care of both job
     * scheduling (matching tasks with tasktrackers) and task progress monitoring (keeping track of
     * tasks, restarting failed or slow tasks, and doing task bookkeeping, such as maintaining counter
     * totals).
    
     * YARN separates these two roles into two independent daemons: a resource manager to manage the use
     * of resources across the cluster and an application master to manage the lifecycle of applications
     * running on the cluster.
     * }}
    
     * How do reducers know which machines to fetch map output from?
    
     * As map tasks complete successfully, they notify their parent tasktracker of the status update,
     * which in turn notifies the jobtracker. (In MapReduce 2, the tasks notify their application master
     * directly.) These notifications are transmitted over the heartbeat communication mechanism
     * described earlier. Therefore, for a given job, the jobtracker (or application master) knows the
     * mapping between map outputs and hosts. A thread in the reducer periodically asks the master for
     * map output hosts until it has retrieved them all.
    
     * Hosts do not delete map outputs from disk as soon as the first reducer has retrieved them, as the
     * reducer may subsequently fail. Instead, they wait until they are told to delete them by the
     * jobtracker (or application master), which is after the job has completed.
    
     * The map outputs are copied to the reduce task JVM's memory if they are small enough (the buffer's
     * size is controlled by mapred.job.shuffle.input.buffer.percent, which specifies the proportion of
     * the heap to use for this purpose); otherwise, they are copied to disk. When the in-memory buffer
     * reaches a threshold size (controlled by mapred.job.shuffle.merge.percent) or reaches a threshold
     * number of map outputs (mapred.inmem.merge.threshold), it is merged and spilled to disk. If a
     * combiner is specified, it will be run during the merge to reduce the amount of data written to
     * disk.
    
     * As the copies accumulate on disk, a background thread merges them into larger, sorted files. This
     * saves some time merging later on. Note that any map outputs that were compressed (by the map
     * task) have to be decompressed in memory in order to perform a merge on them.
    
     * When all the map outputs have been copied, the reduce task moves into the sort phase (which
     * should properly be called the merge phase, as the sorting was carried out on the map side), which
     * merges the map outputs, maintaining their sort ordering. This is done in rounds. For example, if
     * there were 50 map outputs and the merge factor was 10 (the default, controlled by the
     * io.sort.factor property, just like in the map's merge), there would be five rounds. Each round
     * would merge 10 files into one, so at the end there would be five intermediate files.
    
     * Rather than have a final round that merges these five files into a single sorted file, the merge
     * saves a trip to disk by directly feeding the reduce function in what is the last phase: the
     * reduce phase. This final merge can come from a mixture of in-memory and on-disk segments.
    
     */
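
    /* XXX (a minimal sketch, not from the book) How the shuffle properties discussed above could be
     * set on a job's Configuration, again with the MapReduce 1 property names used in these notes:
     *
     *   Configuration conf = job.getConfiguration();
     *   conf.setInt("io.sort.mb", 200);                        // map-side sort buffer (default 100 MB)
     *   conf.setFloat("io.sort.spill.percent", 0.80f);         // spill threshold (default 0.80)
     *   conf.setInt("io.sort.factor", 10);                     // max streams merged at once (default 10)
     *   conf.setBoolean("mapred.compress.map.output", true);   // compress map output before it hits disk
     *   conf.setClass("mapred.map.output.compression.codec",
     *       org.apache.hadoop.io.compress.GzipCodec.class,
     *       org.apache.hadoop.io.compress.CompressionCodec.class);
     */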

    /* XXX A reduce side join is arguably one of the easiest implementations of a join in MapReduce, and
     * therefore is a very attractive choice. It can be used to execute any of the types of joins
     * described above with relative ease and there is no limitation on the size of your data
     * sets. Also, it can join as many data sets together at once as you need. All that said, a reduce
     * side join will likely require a large amount of network bandwidth because the bulk of the data is
     * sent to the reduce phase. This can take some time, but if you have resources available and aren't
     * concerned about execution time, by all means use it! Unfortunately, if all of the data sets are
     * large, this type of join may be your only choice.
    
     * ORA mapreduce patterns gives the following types of joins:
     * 1) reduce side join -- normal join, can handle any size dataset, downside is data transfer overhead
     * 2) replicated join -- small dataset replicated through the Hadoop distributed cache; the large
     *                       dataset needs to occur on the left.
     * 3) composite join -- misnomer? this seems very contrived; it's essentially a merge join
     * 4) cartesian -- 
     */
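
    /* XXX (a minimal sketch, not from the book) of pattern (1), a reduce-side join: each dataset's
     * mapper tags its records with the side they came from, and the reducer buffers one side and
     * crosses it with the other. Class names, the "joinKey \t rest" record layout, and the "A"/"B"
     * tags are hypothetical; the two inputs would typically be wired to their mappers with
     * MultipleInputs (one mapper per input path).
     *
     *   // imports: java.io.IOException, java.util.*, org.apache.hadoop.io.*, org.apache.hadoop.mapreduce.*
     *
     *   // Mapper for dataset A: emit (joinKey, "A\t" + rest); dataset B's mapper emits "B\t" + rest
     *   public class TaggingMapperA extends Mapper<LongWritable, Text, Text, Text> {
     *     @Override
     *     public void map(LongWritable key, Text value, Context context)
     *         throws IOException, InterruptedException {
     *       String[] fields = value.toString().split("\t", 2);   // assume: joinKey \t rest
     *       context.write(new Text(fields[0]), new Text("A\t" + fields[1]));
     *     }
     *   }
     *
     *   public class JoinReducer extends Reducer<Text, Text, Text, Text> {
     *     @Override
     *     public void reduce(Text key, Iterable<Text> values, Context context)
     *         throws IOException, InterruptedException {
     *       List<String> aSide = new ArrayList<String>();
     *       List<String> bSide = new ArrayList<String>();
     *       for (Text v : values) {                               // buffer both sides in memory
     *         String s = v.toString();
     *         if (s.startsWith("A\t")) aSide.add(s.substring(2)); else bSide.add(s.substring(2));
     *       }
     *       for (String a : aSide) {
     *         for (String b : bSide) {
     *           context.write(key, new Text(a + "\t" + b));       // inner join of the two sides
     *         }
     *       }
     *     }
     *   }
     */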

    //./ch02/src/main/java/MaxTemperature.java
    // cc MaxTemperature Application to find the maximum temperature in the weather dataset
    // vv MaxTemperature
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperature <input path> <output path>");
      System.exit(-1);
    }
    
    Job job = new Job();        // XXX
    job.setJarByClass(MaxTemperature.class); // XXX
    job.setJobName("Max temperature");

    FileInputFormat.addInputPath(job, new Path(args[0])); // XXX
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // XXX
    
    job.setMapperClass(MaxTemperatureMapper.class); 
    job.setReducerClass(MaxTemperatureReducer.class);

    job.setOutputKeyClass(Text.class); // XXX sets both the map and reduce output key class; to set only the map's, use setMapOutputKeyClass()
    job.setOutputValueClass(IntWritable.class); // XXX likewise setMapOutputValueClass() for the value class
    
    System.exit(job.waitForCompletion(true) ? 0 : 1); // XXX
  }
}
    // ^^ MaxTemperature

    //=*=*=*=*
    //./ch02/src/main/java/MaxTemperatureMapper.java
    // cc MaxTemperatureMapper Mapper for maximum temperature example
    // vv MaxTemperatureMapper
    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
  extends Mapper<LongWritable, Text, Text, IntWritable> { // XXX extends Mapper<I1, I2, O1, O2>

  private static final int MISSING = 9999;
  
  @Override
  public void map(LongWritable key, Text value, Context context) // XXX Context is defined inside Mapper
      throws IOException, InterruptedException {
    
    String line = value.toString(); // XXX
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}
    // ^^ MaxTemperatureMapper

    //=*=*=*=*
    //./ch02/src/main/java/MaxTemperatureReducer.java
    // cc MaxTemperatureReducer Reducer for maximum temperature example
    // vv MaxTemperatureReducer
    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
  extends Reducer<Text, IntWritable, Text, IntWritable> { // XXX extends Reducer<I1, I2, O1, O2>
  
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, // XXX Iterable
      Context context)
      throws IOException, InterruptedException {
    
    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}
    // ^^ MaxTemperatureReducer

    //=*=*=*=*
    //./ch02/src/main/java/MaxTemperatureWithCombiner.java
    // cc MaxTemperatureWithCombiner Application to find the maximum temperature, using a combiner function for efficiency
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // vv MaxTemperatureWithCombiner
public class MaxTemperatureWithCombiner {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperatureWithCombiner <input path> " +
          "<output path>");
      System.exit(-1);
    }
    
    Job job = new Job();
    job.setJarByClass(MaxTemperatureWithCombiner.class);
    job.setJobName("Max temperature");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
    job.setMapperClass(MaxTemperatureMapper.class);
    /*[*/job.setCombinerClass(MaxTemperatureReducer.class)/*]*/;
    job.setReducerClass(MaxTemperatureReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
    // ^^ MaxTemperatureWithCombiner

    //=*=*=*=*
    //./ch02/src/main/java/OldMaxTemperature.java
    // cc OldMaxTemperature Application to find the maximum temperature, using the old MapReduce API
    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    // vv OldMaxTemperature
    public class OldMaxTemperature {

        static class OldMaxTemperatureMapper /*[*/extends MapReduceBase/*]*/
                /*[*/ implements Mapper/*]*/<LongWritable, Text, Text, IntWritable> {

            private static final int MISSING = 9999;

            @Override
            public void map(LongWritable key, Text value, /*[*/OutputCollector<Text, IntWritable> output,
                    Reporter reporter/*]*/) throws IOException {

                String line = value.toString();
                String year = line.substring(15, 19);
                int airTemperature;
                if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
                    airTemperature = Integer.parseInt(line.substring(88, 92));
                } else {
                    airTemperature = Integer.parseInt(line.substring(87, 92));
                }
                String quality = line.substring(92, 93);
                if (airTemperature != MISSING && quality.matches("[01459]")) {
                    /*[*/output.collect/*]*/(new Text(year), new IntWritable(airTemperature));
                }
            }
        }

        static class OldMaxTemperatureReducer /*[*/extends MapReduceBase/*]*/
                /*[*/ implements Reducer/*]*/<Text, IntWritable, Text, IntWritable> {

            @Override
            public void reduce(Text key, /*[*/Iterator/*]*/<IntWritable> values,
                    /*[*/OutputCollector<Text, IntWritable> output, Reporter reporter/*]*/) throws IOException {

                int maxValue = Integer.MIN_VALUE;
                while (/*[*/values.hasNext()/*]*/) {
                    maxValue = Math.max(maxValue, /*[*/values.next().get()/*]*/);
                }
                /*[*/output.collect/*]*/(key, new IntWritable(maxValue));
            }
        }

        public static void main(String[] args) throws IOException {
            if (args.length != 2) {
                System.err.println("Usage: OldMaxTemperature <input path> <output path>");
                System.exit(-1);
            }

            /*[*/JobConf conf = new JobConf(OldMaxTemperature.class);
            /*]*/
            /*[*/conf/*]*/.setJobName("Max temperature");

            FileInputFormat.addInputPath(/*[*/conf/*]*/, new Path(args[0]));
            FileOutputFormat.setOutputPath(/*[*/conf/*]*/, new Path(args[1]));

            /*[*/conf/*]*/.setMapperClass(OldMaxTemperatureMapper.class);
            /*[*/conf/*]*/.setReducerClass(OldMaxTemperatureReducer.class);

            /*[*/conf/*]*/.setOutputKeyClass(Text.class);
            /*[*/conf/*]*/.setOutputValueClass(IntWritable.class);

            /*[*/JobClient.runJob(conf);/*]*/
        }
    }
    // ^^ OldMaxTemperature
    //=*=*=*=*
    //./ch02/src/main/java/oldapi/MaxTemperature.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MaxTemperature {

        public static void main(String[] args) throws IOException {
            if (args.length != 2) {
                System.err.println("Usage: MaxTemperature <input path> <output path>");
                System.exit(-1);
            }

            JobConf conf = new JobConf(MaxTemperature.class);
            conf.setJobName("Max temperature");

            FileInputFormat.addInputPath(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            conf.setMapperClass(MaxTemperatureMapper.class);
            conf.setReducerClass(MaxTemperatureReducer.class);

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            JobClient.runJob(conf);
        }
    }

    //=*=*=*=*
    //./ch02/src/main/java/oldapi/MaxTemperatureMapper.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class MaxTemperatureMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

        private static final int MISSING = 9999;

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {

            String line = value.toString();
            String year = line.substring(15, 19);
            int airTemperature;
            if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
                airTemperature = Integer.parseInt(line.substring(88, 92));
            } else {
                airTemperature = Integer.parseInt(line.substring(87, 92));
            }
            String quality = line.substring(92, 93);
            if (airTemperature != MISSING && quality.matches("[01459]")) {
                output.collect(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    //=*=*=*=*
    //./ch02/src/main/java/oldapi/MaxTemperatureReducer.java
    package oldapi;

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class MaxTemperatureReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {

            int maxValue = Integer.MIN_VALUE;
            while (values.hasNext()) {
                maxValue = Math.max(maxValue, values.next().get());
            }
            output.collect(key, new IntWritable(maxValue));
        }
    }

    //=*=*=*=*
    //./ch02/src/main/java/oldapi/MaxTemperatureWithCombiner.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

public class MaxTemperatureWithCombiner {

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperatureWithCombiner <input path> " +
            "<output path>");
      System.exit(-1);
    }
    
    JobConf conf = new JobConf(MaxTemperatureWithCombiner.class);
    conf.setJobName("Max temperature");

    FileInputFormat.addInputPath(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    
    conf.setMapperClass(MaxTemperatureMapper.class);
    /*[*/conf.setCombinerClass(MaxTemperatureReducer.class)/*]*/;
    conf.setReducerClass(MaxTemperatureReducer.class);

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    JobClient.runJob(conf);
  }
}

    //=*=*=*=*
    //./ch03/src/main/java/DateRangePathFilter.java
    import java.text.DateFormat;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

public class DateRangePathFilter implements PathFilter {
  
  private final Pattern PATTERN = Pattern.compile("^.*/(\\d\\d\\d\\d/\\d\\d/\\d\\d).*$"); // XXX compile
  
  private final Date start, end;

  public DateRangePathFilter(Date start, Date end) {
    this.start = new Date(start.getTime()); // XXX defensive copy in constructor
    this.end = new Date(end.getTime());     // XXX defensive copy in constructor
  }
  
  public boolean accept(Path path) {
      Matcher matcher = PATTERN.matcher(path.toString()); // XXX PATTERN is compiled pattern .matcher(/* string to match */)
    if (matcher.matches()) {                              // XXX matcher.matches() boolean
      DateFormat format = new SimpleDateFormat("yyyy/MM/dd"); // XXX SimpleDateFormat
      try {
        return inInterval(format.parse(matcher.group(1))); // XXX DateFormat.parse returns Date
      } catch (ParseException e) {
        return false;
      }
    }
    return false;
  }

  private boolean inInterval(Date date) {
    return !date.before(start) && !date.after(end); // XXX Date.before(Date) Date.after(Date)
  }

}

    //=*=*=*=*
    //./ch03/src/main/java/FileCopyWithProgress.java
    // cc FileCopyWithProgress Copies a local file to a Hadoop filesystem, and shows progress
    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration; // XXX
    import org.apache.hadoop.fs.FileSystem; // XXX
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.util.Progressable;

    // vv FileCopyWithProgress
public class FileCopyWithProgress {
  public static void main(String[] args) throws Exception {
    String localSrc = args[0];
    String dst = args[1];
    
    InputStream in = new BufferedInputStream(new FileInputStream(localSrc)); // XXX InputStream = BufferedInputStream(new FileInputStream())
    
    Configuration conf = new Configuration(); // XXX hadoop.conf.Configuration
    FileSystem fs = FileSystem.get(URI.create(dst), conf); // XXX hadoop.fs.FileSystem from URI.create + hadoop.conf.Configuration
    OutputStream out = fs.create(new Path(dst), new Progressable() { // XXX fs.create -> stream hadoop.util.Progressable
      public void progress() {
        System.out.print(".");
      }
    });
    
    IOUtils.copyBytes(in, out, 4096, true); // XXX hadoop.IOUtils.copyBytes - CLOSES in and out at the end
  }
}
    // ^^ FileCopyWithProgress

    //=*=*=*=*
    //./ch03/src/main/java/FileSystemCat.java
    // cc FileSystemCat Displays files from a Hadoop filesystem on standard output by using the FileSystem directly
    import java.io.InputStream;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    // vv FileSystemCat
public class FileSystemCat {

  public static void main(String[] args) throws Exception {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    InputStream in = null;
    try {
      in = fs.open(new Path(uri));
      IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
      IOUtils.closeStream(in);
    }
  }
}
    // ^^ FileSystemCat

    //=*=*=*=*
    //./ch03/src/main/java/FileSystemDoubleCat.java
    // cc FileSystemDoubleCat Displays files from a Hadoop filesystem on standard output twice, by using seek
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    // vv FileSystemDoubleCat
public class FileSystemDoubleCat {

  public static void main(String[] args) throws Exception {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    FSDataInputStream in = null; // XXX FSDataInputStream is Seekable; previous examples had InputStream (no seeking was done)
    try {
      in = fs.open(new Path(uri));
      IOUtils.copyBytes(in, System.out, 4096, false);
      in.seek(0); // go back to the start of the file XXX
      IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
      IOUtils.closeStream(in);
    }
  }
}
    // ^^ FileSystemDoubleCat

    //=*=*=*=*
    //./ch03/src/main/java/ListStatus.java
    // cc ListStatus Shows the file statuses for a collection of paths in a Hadoop filesystem 
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    // vv ListStatus
public class ListStatus {

  public static void main(String[] args) throws Exception {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    
    Path[] paths = new Path[args.length];
    for (int i = 0; i < paths.length; i++) {
      paths[i] = new Path(args[i]);
    }
    
    FileStatus[] status = fs.listStatus(paths); // XXX Filesystem.listStatus(Path[]) -> FileStatus (see below)
    Path[] listedPaths = FileUtil.stat2Paths(status); // XXX FileStatus to Path[]
    for (Path p : listedPaths) {
      System.out.println(p);
    }
  }
}
    // ^^ ListStatus

    //=*=*=*=*
    //./ch03/src/main/java/RegexExcludePathFilter.java
    // cc RegexExcludePathFilter A PathFilter for excluding paths that match a regular expression
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

    // vv RegexExcludePathFilter
public class RegexExcludePathFilter implements PathFilter { // XXX hadoop.fs.PathFilter
  
  private final String regex;

  public RegexExcludePathFilter(String regex) {
    this.regex = regex;
  }

  public boolean accept(Path path) { // XXX interface PathFilter
    return !path.toString().matches(regex);
  }
}
    // ^^ RegexExcludePathFilter

    //=*=*=*=*
    //./ch03/src/main/java/RegexPathFilter.java
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

public class RegexPathFilter implements PathFilter {
  
  private final String regex;
  private final boolean include;

  public RegexPathFilter(String regex) {
    this(regex, true);
  }
  
  public RegexPathFilter(String regex, boolean include) {
    this.regex = regex;
    this.include = include;
  }

  public boolean accept(Path path) {
    return (path.toString().matches(regex)) ? include : !include;
  }

}

    //=*=*=*=*
    //./ch03/src/main/java/URLCat.java
    // cc URLCat Displays files from a Hadoop filesystem on standard output using a URLStreamHandler
    import java.io.InputStream;
    import java.net.URL;

    import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
    import org.apache.hadoop.io.IOUtils;

    // vv URLCat
public class URLCat {

    /* XXX
     *
     * in = new URL("hdfs://host/path").openStream();  To make new URL understand hdfs://:

     * There's a little bit more work required to make Java recognize Hadoop's hdfs URL scheme. This
     * is achieved by calling the setURLStreamHandlerFactory method on URL with an instance of
     * FsUrlStreamHandlerFactory. This method can be called only once per JVM, so it is typically
     * executed in a static block. This limitation means that if some other part of your
     * program (perhaps a third-party component outside your control) sets a URLStreamHandlerFactory,
     * you won't be able to use this approach for reading data from Hadoop. The next section
     * discusses an alternative.

     */
  static {
    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
  }
  
  public static void main(String[] args) throws Exception {
    InputStream in = null;
    try {
      in = new URL(args[0]).openStream();
      IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
      IOUtils.closeStream(in);
    }
  }
}
    // ^^ URLCat

    //=*=*=*=*
    //./ch03/src/test/java/CoherencyModelTest.java
    // == CoherencyModelTest
    // == CoherencyModelTest-NotVisibleAfterFlush
    // == CoherencyModelTest-VisibleAfterFlushAndSync
    // == CoherencyModelTest-LocalFileVisibleAfterFlush
    // == CoherencyModelTest-VisibleAfterClose
    import static org.hamcrest.CoreMatchers.is; // XXX
    import static org.junit.Assert.assertThat; // XXX 
    // XXX junit gives assertEquals which is inferior for reasons given here: http://blogs.atlassian.com/2009/06/how_hamcrest_can_save_your_sou/

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;
    import org.junit.*;

public class CoherencyModelTest {
    /* XXX Hadoop has a set of testing classes, called MiniDFSCluster, MiniMRCluster, and
     * MiniYARNCluster, that provide a programmatic way of creating in-process clusters. Unlike the
     * local job runner, these allow testing against the full HDFS and MapReduce machinery. Bear in
     * mind, too, that tasktrackers in a mini-cluster launch separate JVMs to run tasks in, which
     * can make debugging more difficult.
     */
  private MiniDFSCluster cluster; // use an in-process HDFS cluster for testing
  private FileSystem fs;

  @Before
  public void setUp() throws IOException { // XXX set  up FileSystem in setUp
    Configuration conf = new Configuration();
    if (System.getProperty("test.build.data") == null) {
      System.setProperty("test.build.data", "/tmp");
    }
    cluster = new MiniDFSCluster(conf, 1, true, null); // XXX
    fs = cluster.getFileSystem(); // XXX
  }
  
  @After
  public void tearDown() throws IOException {
    fs.close();
    cluster.shutdown();
  }
  
  @Test
  public void fileExistsImmediatelyAfterCreation() throws IOException {
    // vv CoherencyModelTest
    Path p = new Path("p");
    fs.create(p);
    assertThat(fs.exists(p), is(true)); // XXX
    // ^^ CoherencyModelTest
    assertThat(fs.delete(p, true), is(true)); // XXX
  }
  
  @Test
  public void fileContentIsNotVisibleAfterFlush() throws IOException {
    // vv CoherencyModelTest-NotVisibleAfterFlush
    Path p = new Path("p");
    OutputStream out = fs.create(p);
    out.write("content".getBytes("UTF-8")); // XXX OutputStream.write(bytes[]) String.getBytes(encoding)
    /*[*/out.flush();/*]*/
    assertThat(fs.getFileStatus(p).getLen(), is(0L)); // XXX fs.getFileStatus(path).getLen()
    // ^^ CoherencyModelTest-NotVisibleAfterFlush
    out.close();
    assertThat(fs.delete(p, true), is(true));
  }
  
  @Test
  @Ignore("See https://issues.apache.org/jira/browse/HADOOP-4379")
  public void fileContentIsVisibleAfterFlushAndSync() throws IOException {
    // vv CoherencyModelTest-VisibleAfterFlushAndSync
    Path p = new Path("p");
    FSDataOutputStream out = fs.create(p);
    out.write("content".getBytes("UTF-8"));
    out.flush();
    // XXX org.apache.hadoop.fs.Syncable this is why you need FSDataOutputStream, OutputStream is not syncable
    /*[*/out.sync();/*]*/
    assertThat(fs.getFileStatus(p).getLen(), is(((long) "content".length())));
    // ^^ CoherencyModelTest-VisibleAfterFlushAndSync
    out.close();
    assertThat(fs.delete(p, true), is(true));
  }
  
  
  @Test
  public void localFileContentIsVisibleAfterFlushAndSync() throws IOException {
    File localFile = File.createTempFile("tmp", "");
    assertThat(localFile.exists(), is(true));
    // vv CoherencyModelTest-LocalFileVisibleAfterFlush
    FileOutputStream out = new FileOutputStream(localFile);
    out.write("content".getBytes("UTF-8"));
    out.flush(); // flush to operating system
    out.getFD().sync(); // sync to disk XXX getFD() returns object of class FileDescriptor (gives sync() and valid())
    assertThat(localFile.length(), is(((long) "content".length())));
    // ^^ CoherencyModelTest-LocalFileVisibleAfterFlush
    out.close();
    assertThat(localFile.delete(), is(true));
  }
  
  @Test
  public void fileContentIsVisibleAfterClose() throws IOException {
    // vv CoherencyModelTest-VisibleAfterClose
    Path p = new Path("p");     // hadoop.fs.Path
    OutputStream out = fs.create(p);
    out.write("content".getBytes("UTF-8"));
    /*[*/out.close();/*]*/
    assertThat(fs.getFileStatus(p).getLen(), is(((long) "content".length()))); // FileStatus also gives path, owner, replication, 
    // ^^ CoherencyModelTest-VisibleAfterClose
    assertThat(fs.delete(p, true), is(true));
  }

}

    //=*=*=*=*
    //./ch03/src/test/java/FileSystemDeleteTest.java
    import static org.hamcrest.CoreMatchers.*;
    import static org.junit.Assert.*;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.junit.Before; // XXX
    import org.junit.Test; // XXX

public class FileSystemDeleteTest {

  private FileSystem fs;
  
  @Before
  public void setUp() throws Exception {
    fs = FileSystem.get(new Configuration());
    writeFile(fs, new Path("dir/file"));
  }
  
  private void writeFile(FileSystem fileSys, Path name) throws IOException {
    FSDataOutputStream stm = fileSys.create(name); // XXX FileSystem.create(Path)
    stm.close();
  }
  
  @Test
  public void deleteFile() throws Exception {
      /* XXX assertThat(call, is(true/false)) */
    assertThat(fs.delete(new Path("dir/file"), false), is(true));
    assertThat(fs.exists(new Path("dir/file")), is(false));
    assertThat(fs.exists(new Path("dir")), is(true));
    assertThat(fs.delete(new Path("dir"), false), is(true));
    assertThat(fs.exists(new Path("dir")), is(false));
  }

  @Test
  public void deleteNonEmptyDirectoryNonRecursivelyFails() throws Exception {
    try {
      fs.delete(new Path("dir"), false); // XXX non-recursive
      fail("Shouldn't delete non-empty directory"); // XXX fail("if not needed")
    } catch (IOException e) {
      // expected
    }
  }
  
  @Test
  public void deleteDirectory() throws Exception {
    assertThat(fs.delete(new Path("dir"), true), is(true));
    assertThat(fs.exists(new Path("dir")), is(false));
  }
  
}

    //=*=*=*=*
    //./ch03/src/test/java/FileSystemGlobTest.java
    // == FileSystemGlobTest
    import static org.hamcrest.CoreMatchers.*;
    import static org.junit.Assert.*;

    import java.io.IOException;
    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import java.util.*;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;
    import org.junit.*;

public class FileSystemGlobTest {
  
  private static final String BASE_PATH = "/tmp/" +
    FileSystemGlobTest.class.getSimpleName(); // XXX class name without the package, e.g. java.lang.String -> String
  
  private FileSystem fs;
  
  @Before
  public void setUp() throws Exception {
    fs = FileSystem.get(new Configuration());
    fs.mkdirs(new Path(BASE_PATH, "2007/12/30")); // XXX mkdirs
    fs.mkdirs(new Path(BASE_PATH, "2007/12/31"));
    fs.mkdirs(new Path(BASE_PATH, "2008/01/01"));
    fs.mkdirs(new Path(BASE_PATH, "2008/01/02"));
  }
  
  @After
  public void tearDown() throws Exception {
    fs.delete(new Path(BASE_PATH), true); // XXX recursive
  }
  
  @Test
  public void glob() throws Exception {
    assertThat(glob("/*"), is(paths("/2007", "/2008"))); // XXX ??? /tmp???
    assertThat(glob("/*/*"), is(paths("/2007/12", "/2008/01")));
    
    // bug?
    //assertThat(glob("/*/12/*"), is(paths("/2007/12/30", "/2007/12/31")));

    assertThat(glob("/200?"), is(paths("/2007", "/2008")));
    assertThat(glob("/200[78]"), is(paths("/2007", "/2008")));
    assertThat(glob("/200[7-8]"), is(paths("/2007", "/2008")));
    assertThat(glob("/200[^01234569]"), is(paths("/2007", "/2008"))); // XXX are globs just regular expressions, no FileSystem.globStatus accepts a subset of regex

    assertThat(glob("/*/*/{31,01}"), is(paths("/2007/12/31", "/2008/01/01")));
    assertThat(glob("/*/*/3{0,1}"), is(paths("/2007/12/30", "/2007/12/31")));

    assertThat(glob("/*/{12/31,01/01}"), is(paths("/2007/12/31", "/2008/01/01"))); // XXX glob with alternatives

    // bug?
    //assertThat(glob("/2007/12/30/data\\[2007-12-30\\]"), is(paths("/2007/12/30/data[2007-12-30]")));

  }
  
  @Test
  public void regexIncludes() throws Exception {
    assertThat(glob("/*", new RegexPathFilter("^.*/2007$")), is(paths("/2007")));
    
    // bug?
    //assertThat(glob("/*/*/*", new RegexPathFilter("^.*/2007/12/31$")), is(paths("/2007/12/31")));
    // this works but shouldn't be necessary? see https://issues.apache.org/jira/browse/HADOOP-3497
    assertThat(glob("/*/*/*", new RegexPathFilter("^.*/2007(/12(/31)?)?$")), is(paths("/2007/12/31")));
  }
  
  @Test
  public void regexExcludes() throws Exception {
    assertThat(glob("/*", new RegexPathFilter("^.*/2007$", false)), is(paths("/2008")));
    assertThat(glob("/2007/*/*", new RegexPathFilter("^.*/2007/12/31$", false)), is(paths("/2007/12/30")));
  }
  
  @Test
  public void regexExcludesWithRegexExcludePathFilter() throws Exception {
    assertThat(glob("/*", new RegexExcludePathFilter("^.*/2007$")), is(paths("/2008"))); // XXX
    assertThat(glob("/2007/*/*", new RegexExcludePathFilter("^.*/2007/12/31$")), is(paths("/2007/12/30")));
  }
     
  public void testDateRange() throws Exception {
    DateRangePathFilter filter = new DateRangePathFilter(date("2007/12/31"),
        date("2008/01/01"));    // XXX DateRangePathFilter
    assertThat(glob("/*/*/*", filter), is(paths("/2007/12/31", "/2008/01/01")));  
  } 
  
  private Path[] glob(String pattern) throws IOException {
    return FileUtil.stat2Paths(fs.globStatus(new Path(BASE_PATH + pattern))); // XXX 
  }
  
  private Path[] glob(String pattern, PathFilter pathFilter) throws IOException {
    return FileUtil.stat2Paths(fs.globStatus(new Path(BASE_PATH + pattern), pathFilter)); // XXX
  }
  
  private Path[] paths(String... pathStrings) { // XXX String...
    Path[] paths = new Path[pathStrings.length];
    for (int i = 0; i < paths.length; i++) {
      paths[i] = new Path("file:" + BASE_PATH + pathStrings[i]);
    }
    return paths;
  }
  
  private Date date(String date) throws ParseException {
    return new SimpleDateFormat("yyyy/MM/dd").parse(date); // XXX
  }
}

    //=*=*=*=*
    //./ch03/src/test/java/ShowFileStatusTest.java
    // cc ShowFileStatusTest Demonstrates file status information
    import static org.junit.Assert.*;
    import static org.hamcrest.Matchers.*;

    import java.io.*;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.hdfs.MiniDFSCluster;
    import org.junit.*;

    // vv ShowFileStatusTest
    public class ShowFileStatusTest {

        private MiniDFSCluster cluster; // use an in-process HDFS cluster for testing
        private FileSystem fs;

        @Before // XXX
        public void setUp() throws IOException {
            Configuration conf = new Configuration(); // XXX hadoop.conf.Configuration
            if (System.getProperty("test.build.data") == null) {
                System.setProperty("test.build.data", "/tmp");
            }
            cluster = new MiniDFSCluster(conf, 1, true, null); // XXX
            fs = cluster.getFileSystem(); // XXX
            OutputStream out = fs.create(new Path("/dir/file")); // XXX hadoop.fs.Path
            out.write("content".getBytes("UTF-8"));
            out.close();
        }

        @After // XXX
        public void tearDown() throws IOException {
            if (fs != null) {
                fs.close();
            } // XXX
            if (cluster != null) {
                cluster.shutdown();
            } // XXX
        }

        @Test(expected = FileNotFoundException.class) // XXX @Test(expected = FNFE.class)
        public void throwsFileNotFoundForNonExistentFile() throws IOException {
            fs.getFileStatus(new Path("no-such-file"));
        }

        @Test
        public void fileStatusForFile() throws IOException {
            Path file = new Path("/dir/file"); // XXX new Path creates the file
            FileStatus stat = fs.getFileStatus(file);
            assertThat(stat.getPath().toUri().getPath(), is("/dir/file")); // XXX FileStatus.getPath().toUri() -> URI .getPath()
            assertThat(stat.isDir(), is(false)); // XXX assertThat(actual, Matcher) Matcher provides matches method
            assertThat(stat.getLen(), is(7L));
            assertThat(stat.getModificationTime(), is(lessThanOrEqualTo(System.currentTimeMillis())));
            assertThat(stat.getReplication(), is((short) 1)); // XXX Matcher<Short> is(Short value) -> o.h.core.Is.<Short>is(Short) static factory method from oh.core.Is
            // XXX which calls the constructor is(Matcher<Short> matcher) with the matcher equalTo(Short) hamcrest-all-1.3-source.java:Is.java:65
            assertThat(stat.getBlockSize(), is(64 * 1024 * 1024L));
            assertThat(stat.getOwner(), is(System.getProperty("user.name")));
            assertThat(stat.getGroup(), is("supergroup"));
            assertThat(stat.getPermission().toString(), is("rw-r--r--"));
        }

        @Test
        public void fileStatusForDirectory() throws IOException {
            Path dir = new Path("/dir"); // XXX new Path creates the directory
            FileStatus stat = fs.getFileStatus(dir);
            assertThat(stat.getPath().toUri().getPath(), is("/dir"));
            assertThat(stat.isDir(), is(true));
            assertThat(stat.getLen(), is(0L));
            assertThat(stat.getModificationTime(), is(lessThanOrEqualTo(System.currentTimeMillis())));
            assertThat(stat.getReplication(), is((short) 0));
            assertThat(stat.getBlockSize(), is(0L));
            assertThat(stat.getOwner(), is(System.getProperty("user.name")));
            assertThat(stat.getGroup(), is("supergroup"));
            assertThat(stat.getPermission().toString(), is("rwxr-xr-x"));
        }

    }
    // ^^ ShowFileStatusTest

    //=*=*=*=*
    //./ch04/src/main/examples/FileDecompressor.java.input.txt
    hadoop FileDecompressor file.gz
    //=*=*=*=*
    //./ch04/src/main/examples/MapFileWriteDemo.java.input.txt
    hadoop MapFileWriteDemo numbers.map
    //=*=*=*=*
    //./ch04/src/main/examples/SequenceFileMapReduceSort.java.input.txt
    hadoop jar $HADOOP_INSTALL/hadoop-*-examples.jar sort -r 1 \
      -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat \
      -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat \
      -outKey org.apache.hadoop.io.IntWritable \
      -outValue org.apache.hadoop.io.Text \
      numbers.seq sorted
    //=*=*=*=*
    //./ch04/src/main/examples/SequenceFileMapReduceSortResults.java.input.txt
    hadoop fs -text sorted/part-00000 | head
    //=*=*=*=*
    //./ch04/src/main/examples/SequenceFileMapReduceSortResults.java.output.txt
    1       Nine, ten, a big fat hen
    2       Seven, eight, lay them straight
    3       Five, six, pick up sticks
    4       Three, four, shut the door
    5       One, two, buckle my shoe
    6       Nine, ten, a big fat hen
    7       Seven, eight, lay them straight
    8       Five, six, pick up sticks
    9       Three, four, shut the door
    10      One, two, buckle my shoe
    //=*=*=*=*
    //./ch04/src/main/examples/SequenceFileMapReduceSortResults.java.pre.sh
    # Produce sorted seq file
    hadoop SequenceFileWriteDemo numbers.seq

    hadoop jar $HADOOP_INSTALL/hadoop-*-examples.jar sort -r 1 \
      -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat \
      -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat \
      -outKey org.apache.hadoop.io.IntWritable \
      -outValue org.apache.hadoop.io.Text \
      numbers.seq sorted
    //=*=*=*=*
    //./ch04/src/main/examples/SequenceFileReadDemo.java.input.txt
hadoop SequenceFileReadDemo numbers.seq
    //=*=*=*=*
    //./ch04/src/main/examples/SequenceFileReadDemo.java.output.txt
    [128]   100     One, two, buckle my shoe
    [173]   99      Three, four, shut the door
    [220]   98      Five, six, pick up sticks
    [264]   97      Seven, eight, lay them straight
    [314]   96      Nine, ten, a big fat hen
    [359]   95      One, two, buckle my shoe
    [404]   94      Three, four, shut the door
    [451]   93      Five, six, pick up sticks
    [495]   92      Seven, eight, lay them straight
    [545]   91      Nine, ten, a big fat hen
    [590]   90      One, two, buckle my shoe
    ...
    [1976]  60      One, two, buckle my shoe
    [2021*] 59      Three, four, shut the door
    [2088]  58      Five, six, pick up sticks
    [2132]  57      Seven, eight, lay them straight
    [2182]  56      Nine, ten, a big fat hen
    ...
    [4557]  5       One, two, buckle my shoe
    [4602]  4       Three, four, shut the door
    [4649]  3       Five, six, pick up sticks
    [4693]  2       Seven, eight, lay them straight
    [4743]  1       Nine, ten, a big fat hen
    //=*=*=*=*
    //./ch04/src/main/examples/SequenceFileReadDemo.java.pre.sh
    # Make sure file is there to be read
    hadoop SequenceFileWriteDemo numbers.seq
    //=*=*=*=*
    //./ch04/src/main/examples/SequenceFileToMapFileConverter-fix.java.input.txt
    hadoop MapFileFixer numbers.map
    //=*=*=*=*
    //./ch04/src/main/examples/SequenceFileToMapFileConverter-mv.java.input.txt
    hadoop fs -mv numbers.map/part-00000 numbers.map/data
    //=*=*=*=*
    //./ch04/src/main/examples/SequenceFileToMapFileConverter-sort.java.input.txt
    hadoop jar $HADOOP_INSTALL/hadoop-*-examples.jar sort -r 1 \
      -inFormat org.apache.hadoop.mapred.SequenceFileInputFormat \
      -outFormat org.apache.hadoop.mapred.SequenceFileOutputFormat \
      -outKey org.apache.hadoop.io.IntWritable \
      -outValue org.apache.hadoop.io.Text \
      numbers.seq numbers.map
    //=*=*=*=*
    //./ch04/src/main/examples/SequenceFileWriteDemo.java.input.txt
    hadoop SequenceFileWriteDemo numbers.seq
    //=*=*=*=*
    //./ch04/src/main/examples/SequenceFileWriteDemo.java.output.txt
    [128]   100     One, two, buckle my shoe
    [173]   99      Three, four, shut the door
    [220]   98      Five, six, pick up sticks
    [264]   97      Seven, eight, lay them straight
    [314]   96      Nine, ten, a big fat hen
    [359]   95      One, two, buckle my shoe
    [404]   94      Three, four, shut the door
    [451]   93      Five, six, pick up sticks
    [495]   92      Seven, eight, lay them straight
    [545]   91      Nine, ten, a big fat hen
    ...
    [1976]  60      One, two, buckle my shoe
    [2021]  59      Three, four, shut the door
    [2088]  58      Five, six, pick up sticks
    [2132]  57      Seven, eight, lay them straight
    [2182]  56      Nine, ten, a big fat hen
    ...
    [4557]  5       One, two, buckle my shoe
    [4602]  4       Three, four, shut the door
    [4649]  3       Five, six, pick up sticks
    [4693]  2       Seven, eight, lay them straight
    [4743]  1       Nine, ten, a big fat hen
    //=*=*=*=*
    //./ch04/src/main/examples/StreamCompressor.java.input.txt
    echo"Text"|
    hadoop StreamCompressor org.apache.hadoop.io.compress.GzipCodec\|gunzip-
    //=*=*=*=*
    //./ch04/src/main/examples/StreamCompressor.java.output.txt
    Text
    //=*=*=*=*
    //./ch04/src/main/examples/TextIterator.java.input.txt
    hadoop TextIterator
    //=*=*=*=*
    //./ch04/src/main/examples/TextIterator.java.output.txt
    41
    df
    6771
    10400
    //=*=*=*=*
    //./ch04/src/main/java/FileDecompressor.java
    // cc FileDecompressor A program to decompress a compressed file using a codec inferred from the file's extension
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    // vv FileDecompressor
    public class FileDecompressor {

        public static void main(String[] args) throws Exception {
            String uri = args[0];
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(uri), conf);

            Path inputPath = new Path(uri);
            CompressionCodecFactory factory = new CompressionCodecFactory(conf);
            CompressionCodec codec = factory.getCodec(inputPath);
            if (codec == null) {
                System.err.println("No codec found for " + uri);
                System.exit(1);
            }

            String outputUri = CompressionCodecFactory.removeSuffix(uri, codec.getDefaultExtension());

            InputStream in = null;
            OutputStream out = null;
            try {
                in = codec.createInputStream(fs.open(inputPath));
                out = fs.create(new Path(outputUri));
                IOUtils.copyBytes(in, out, conf);
            } finally {
                IOUtils.closeStream(in);
                IOUtils.closeStream(out);
            }
        }
    }
    // ^^ FileDecompressor

    //=*=*=*=*
    //./ch04/src/main/java/IntPair.java
    import java.io.*;

    import org.apache.hadoop.io.*;

    public class IntPair implements WritableComparable<IntPair> {

        private int first;
        private int second;

        public IntPair() {
        }

        public IntPair(int first, int second) {
            set(first, second);
        }

        public void set(int first, int second) {
            this.first = first;
            this.second = second;
        }

        public int getFirst() {
            return first;
        }

        public int getSecond() {
            return second;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(first);
            out.writeInt(second);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first = in.readInt();
            second = in.readInt();
        }

        @Override
        public int hashCode() {
            return first * 163 + second;
        }

        @Override
        public boolean equals(Object o) {
            if (o instanceof IntPair) {
                IntPair ip = (IntPair) o;
                return first == ip.first && second == ip.second;
            }
            return false;
        }

        @Override
        public String toString() {
            return first + "\t" + second;
        }

        @Override
        public int compareTo(IntPair ip) {
            int cmp = compare(first, ip.first);
            if (cmp != 0) {
                return cmp;
            }
            return compare(second, ip.second);
        }

        /**
         * Convenience method for comparing two ints.
         */
        public static int compare(int a, int b) {
            return (a < b ? -1 : (a == b ? 0 : 1));
        }

    }

    //=*=*=*=*
    //./ch04/src/main/java/MapFileFixer.java
    // cc MapFileFixer Re-creates the index for a MapFile
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.SequenceFile;

    // vv MapFileFixer
    public class MapFileFixer {

        public static void main(String[] args) throws Exception {
            String mapUri = args[0];

            Configuration conf = new Configuration();

            FileSystem fs = FileSystem.get(URI.create(mapUri), conf);
            Path map = new Path(mapUri);
            Path mapData = new Path(map, MapFile.DATA_FILE_NAME);

            // Get key and value types from data sequence file
            SequenceFile.Reader reader = new SequenceFile.Reader(fs, mapData, conf);
            Class keyClass = reader.getKeyClass();
            Class valueClass = reader.getValueClass();
            reader.close();

            // Create the map file index file
            long entries = MapFile.fix(fs, map, keyClass, valueClass, false, conf);
            System.out.printf("Created MapFile %s with %d entries\n", map, entries);
        }
    }
    // ^^ MapFileFixer

    //=*=*=*=*
    //./ch04/src/main/java/MapFileWriteDemo.java
    // cc MapFileWriteDemo Writing a MapFile
    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Text;

    // vv MapFileWriteDemo
    public class MapFileWriteDemo {

        private static final String[] DATA = { "One, two, buckle my shoe", "Three, four, shut the door",
                "Five, six, pick up sticks", "Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

        public static void main(String[] args) throws IOException {
            String uri = args[0];
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(uri), conf);

            IntWritable key = new IntWritable();
            Text value = new Text();
            MapFile.Writer writer = null;
            try {
                writer = new MapFile.Writer(conf, fs, uri, key.getClass(), value.getClass());

                for (int i = 0; i < 1024; i++) {
                    key.set(i + 1);
                    value.set(DATA[i % DATA.length]);
                    writer.append(key, value);
                }
            } finally {
                IOUtils.closeStream(writer);
            }
        }
    }
    // ^^ MapFileWriteDemo
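
    // The writer above has a natural read-side counterpart. The class below is a
    // minimal sketch (not one of the book's ch04 files; the class name
    // MapFileReadDemo and the looked-up key 496 are illustrative only) showing how
    // a MapFile written by MapFileWriteDemo might be queried with
    // MapFile.Reader.get(), mirroring the usage in MapFileSeekTest further below.
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.MapFile;
    import org.apache.hadoop.io.Text;

    public class MapFileReadDemo {

        public static void main(String[] args) throws Exception {
            String uri = args[0]; // e.g. numbers.map
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(uri), conf);

            MapFile.Reader reader = new MapFile.Reader(fs, uri, conf);
            try {
                // Look up the value stored under key 496; get() returns null if the key is absent.
                Text value = new Text();
                if (reader.get(new IntWritable(496), value) != null) {
                    System.out.println(value);
                }
            } finally {
                reader.close();
            }
        }
    }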

    //=*=*=*=*
    //./ch04/src/main/java/MaxTemperatureWithCompression.java
    // cc MaxTemperatureWithCompression Application to run the maximum temperature job producing compressed output
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    //vv MaxTemperatureWithCompression
    public class MaxTemperatureWithCompression {

        public static void main(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.println("Usage: MaxTemperatureWithCompression <input path> " + "<output path>");
                System.exit(-1);
            }

            Job job = new Job();
            job.setJarByClass(MaxTemperature.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            /*[*/FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);/*]*/

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
    //^^ MaxTemperatureWithCompression

    //=*=*=*=*
    //./ch04/src/main/java/MaxTemperatureWithMapOutputCompression.java
    // == MaxTemperatureWithMapOutputCompression
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MaxTemperatureWithMapOutputCompression {

        public static void main(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.println("Usage: MaxTemperatureWithMapOutputCompression " + "<input path> <output path>");
                System.exit(-1);
            }

            // vv MaxTemperatureWithMapOutputCompression
            Configuration conf = new Configuration();
            conf.setBoolean("mapred.compress.map.output", true);
            conf.setClass("mapred.map.output.compression.codec", GzipCodec.class, CompressionCodec.class);
            Job job = new Job(conf);
            // ^^ MaxTemperatureWithMapOutputCompression
            job.setJarByClass(MaxTemperature.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

    //=*=*=*=*
    //./ch04/src/main/java/PooledStreamCompressor.java
    // cc PooledStreamCompressor A program to compress data read from standard input and write it to standard output using a pooled compressor
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.*;
    import org.apache.hadoop.util.ReflectionUtils;

    // vv PooledStreamCompressor
    public class PooledStreamCompressor {

        public static void main(String[] args) throws Exception {
            String codecClassname = args[0];
            Class<?> codecClass = Class.forName(codecClassname);
            Configuration conf = new Configuration();
            CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
            /*[*/Compressor compressor = null;
            try {
                compressor = CodecPool.getCompressor(codec);/*]*/
                CompressionOutputStream out = codec.createOutputStream(System.out, /*[*/compressor/*]*/);
                IOUtils.copyBytes(System.in, out, 4096, false);
                out.finish();
                /*[*/} finally {
                CodecPool.returnCompressor(compressor);
            } /*]*/
        }
    }
    // ^^ PooledStreamCompressor
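
    // The pooled-compressor pattern above also works in the other direction. The
    // class below is a minimal sketch (not one of the book's ch04 files; the class
    // name PooledStreamDecompressor is illustrative only) that borrows a
    // Decompressor from CodecPool to decompress standard input and returns it when
    // finished, using the same codec-by-class-name convention as the compressor.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CodecPool;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionInputStream;
    import org.apache.hadoop.io.compress.Decompressor;
    import org.apache.hadoop.util.ReflectionUtils;

    public class PooledStreamDecompressor {

        public static void main(String[] args) throws Exception {
            String codecClassname = args[0];
            Class<?> codecClass = Class.forName(codecClassname);
            Configuration conf = new Configuration();
            CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
            Decompressor decompressor = null;
            try {
                // Borrow a pooled Decompressor for this codec (may be null if the codec has none).
                decompressor = CodecPool.getDecompressor(codec);
                CompressionInputStream in = codec.createInputStream(System.in, decompressor);
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                // Return the Decompressor so it can be reused; returnDecompressor ignores null.
                CodecPool.returnDecompressor(decompressor);
            }
        }
    }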

    //=*=*=*=*
    //./ch04/src/main/java/SequenceFileReadDemo.java
    // cc SequenceFileReadDemo Reading a SequenceFile
    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    // vv SequenceFileReadDemo
    public class SequenceFileReadDemo {

        public static void main(String[] args) throws IOException {
            String uri = args[0];
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(uri), conf);
            Path path = new Path(uri);

            SequenceFile.Reader reader = null;
            try {
                reader = new SequenceFile.Reader(fs, path, conf);
                Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                long position = reader.getPosition();
                while (reader.next(key, value)) {
                    String syncSeen = reader.syncSeen() ? "*" : "";
                    System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key, value);
                    position = reader.getPosition(); // beginning of next record
                }
            } finally {
                IOUtils.closeStream(reader);
            }
        }
    }
    // ^^ SequenceFileReadDemo
    //=*=*=*=*
    //./ch04/src/main/java/SequenceFileWriteDemo.java
    // cc SequenceFileWriteDemo Writing a SequenceFile
    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // vv SequenceFileWriteDemo
    public class SequenceFileWriteDemo {

        private static final String[] DATA = { "One, two, buckle my shoe", "Three, four, shut the door",
                "Five, six, pick up sticks", "Seven, eight, lay them straight", "Nine, ten, a big fat hen" };

        public static void main(String[] args) throws IOException {
            String uri = args[0];
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(uri), conf);
            Path path = new Path(uri);

            IntWritable key = new IntWritable();
            Text value = new Text();
            SequenceFile.Writer writer = null;
            try {
                writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass());

                for (int i = 0; i < 100; i++) {
                    key.set(100 - i);
                    value.set(DATA[i % DATA.length]);
                    System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
                    writer.append(key, value);
                }
            } finally {
                IOUtils.closeStream(writer);
            }
        }
    }
    // ^^ SequenceFileWriteDemo

    //=*=*=*=*
    //./ch04/src/main/java/StreamCompressor.java
    // cc StreamCompressor A program to compress data read from standard input and write it to standard output
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionOutputStream;
    import org.apache.hadoop.util.ReflectionUtils;

    // vv StreamCompressor
    public class StreamCompressor {

        public static void main(String[] args) throws Exception {
            String codecClassname = args[0];
            Class<?> codecClass = Class.forName(codecClassname);
            Configuration conf = new Configuration();
            CompressionCodec codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

            CompressionOutputStream out = codec.createOutputStream(System.out);
            IOUtils.copyBytes(System.in, out, 4096, false);
            out.finish();
        }
    }
    // ^^ StreamCompressor

    //=*=*=*=*
    //./ch04/src/main/java/TextArrayWritable.java
    // == TextArrayWritable
    import org.apache.hadoop.io.ArrayWritable;
    import org.apache.hadoop.io.Text;

    // vv TextArrayWritable
    public class TextArrayWritable extends ArrayWritable {
        public TextArrayWritable() {
            super(Text.class);
        }
    }
    // ^^ TextArrayWritable

    //=*=*=*=*
    //./ch04/src/main/java/TextIterator.java
    // cc TextIterator Iterating over the characters in a Text object
    import java.nio.ByteBuffer;

    import org.apache.hadoop.io.Text;

    // vv TextIterator
    public class TextIterator {

        public static void main(String[] args) {
            Text t = new Text("\u0041\u00DF\u6771\uD801\uDC00");

            ByteBuffer buf = ByteBuffer.wrap(t.getBytes(), 0, t.getLength());
            int cp;
            while (buf.hasRemaining() && (cp = Text.bytesToCodePoint(buf)) != -1) {
                System.out.println(Integer.toHexString(cp));
            }
        }
    }
    // ^^ TextIterator

    //=*=*=*=*
    //./ch04/src/main/java/TextPair.java
    // cc TextPair A Writable implementation that stores a pair of Text objects
    // cc TextPairComparator A RawComparator for comparing TextPair byte representations
    // cc TextPairFirstComparator A custom RawComparator for comparing the first field of TextPair byte representations
    // vv TextPair
    import java.io.*;

    import org.apache.hadoop.io.*;

    public class TextPair implements WritableComparable<TextPair> {

        private Text first;
        private Text second;

        public TextPair() {
            set(new Text(), new Text());
        }

        public TextPair(String first, String second) {
            set(new Text(first), new Text(second));
        }

        public TextPair(Text first, Text second) {
            set(first, second);
        }

        public void set(Text first, Text second) {
            this.first = first;
            this.second = second;
        }

        public Text getFirst() {
            return first;
        }

        public Text getSecond() {
            return second;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            first.write(out);
            second.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first.readFields(in);
            second.readFields(in);
        }

        @Override
        public int hashCode() {
            return first.hashCode() * 163 + second.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            if (o instanceof TextPair) {
                TextPair tp = (TextPair) o;
                return first.equals(tp.first) && second.equals(tp.second);
            }
            return false;
        }

        @Override
        public String toString() {
            return first + "\t" + second;
        }

        @Override
        public int compareTo(TextPair tp) {
            int cmp = first.compareTo(tp.first);
            if (cmp != 0) {
                return cmp;
            }
            return second.compareTo(tp.second);
        }
        // ^^ TextPair

        // vv TextPairComparator
        public static class Comparator extends WritableComparator {

            private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

            public Comparator() {
                super(TextPair.class);
            }

            @Override
            public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {

                try {
                    int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                    int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
                    int cmp = TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
                    if (cmp != 0) {
                        return cmp;
                    }
                    return TEXT_COMPARATOR.compare(b1, s1 + firstL1, l1 - firstL1, b2, s2 + firstL2, l2 - firstL2);
                } catch (IOException e) {
                    throw new IllegalArgumentException(e);
                }
            }
        }

        static {
            WritableComparator.define(TextPair.class, new Comparator());
        }
        // ^^ TextPairComparator

        // vv TextPairFirstComparator
        public static class FirstComparator extends WritableComparator {

            private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

            public FirstComparator() {
                super(TextPair.class);
            }

            @Override
            public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {

                try {
                    int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                    int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
                    return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
                } catch (IOException e) {
                    throw new IllegalArgumentException(e);
                }
            }

            @Override
            public int compare(WritableComparable a, WritableComparable b) {
                if (a instanceof TextPair && b instanceof TextPair) {
                    return ((TextPair) a).first.compareTo(((TextPair) b).first);
                }
                return super.compare(a, b);
            }
        }
        // ^^ TextPairFirstComparator

        // vv TextPair
    }
    // ^^ TextPair

    //=*=*=*=*
    //./ch04/src/main/java/oldapi/IntPair.java
    package oldapi;
    import java.io.*;

    import org.apache.hadoop.io.*;

    public class IntPair implements WritableComparable<IntPair> {

        private int first;
        private int second;

        public IntPair() {
        }

        public IntPair(int first, int second) {
            set(first, second);
        }

        public void set(int first, int second) {
            this.first = first;
            this.second = second;
        }

        public int getFirst() {
            return first;
        }

        public int getSecond() {
            return second;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(first);
            out.writeInt(second);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first = in.readInt();
            second = in.readInt();
        }

        @Override
        public int hashCode() {
            return first * 163 + second;
        }

        @Override
        public boolean equals(Object o) {
            if (o instanceof IntPair) {
                IntPair ip = (IntPair) o;
                return first == ip.first && second == ip.second;
            }
            return false;
        }

        @Override
        public String toString() {
            return first + "\t" + second;
        }

        @Override
        public int compareTo(IntPair ip) {
            int cmp = compare(first, ip.first);
            if (cmp != 0) {
                return cmp;
            }
            return compare(second, ip.second);
        }

        /**
         * Convenience method for comparing two ints.
         */
        public static int compare(int a, int b) {
            return (a < b ? -1 : (a == b ? 0 : 1));
        }

    }

    //=*=*=*=*
    //./ch04/src/main/java/oldapi/MaxTemperatureWithCompression.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MaxTemperatureWithCompression {

        public static void main(String[] args) throws IOException {
            if (args.length != 2) {
                System.err.println("Usage: MaxTemperatureWithCompression <input path> " + "<output path>");
                System.exit(-1);
            }

            JobConf conf = new JobConf(MaxTemperatureWithCompression.class);
            conf.setJobName("Max temperature with output compression");

            FileInputFormat.addInputPath(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            /*[*/FileOutputFormat.setCompressOutput(conf, true);
            FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);/*]*/

            conf.setMapperClass(MaxTemperatureMapper.class);
            conf.setCombinerClass(MaxTemperatureReducer.class);
            conf.setReducerClass(MaxTemperatureReducer.class);

            JobClient.runJob(conf);
        }
    }

    //=*=*=*=*
    //./ch04/src/main/java/oldapi/MaxTemperatureWithMapOutputCompression.java
    // == OldMaxTemperatureWithMapOutputCompression
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.*;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MaxTemperatureWithMapOutputCompression {

        public static void main(String[] args) throws IOException {
            if (args.length != 2) {
                System.err.println("Usage: MaxTemperatureWithMapOutputCompression " + "<input path> <output path>");
                System.exit(-1);
            }

            JobConf conf = new JobConf(MaxTemperatureWithCompression.class);
            conf.setJobName("Max temperature with map output compression");

            FileInputFormat.addInputPath(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            // vv OldMaxTemperatureWithMapOutputCompression
            conf.setCompressMapOutput(true);
            conf.setMapOutputCompressorClass(GzipCodec.class);
            // ^^ OldMaxTemperatureWithMapOutputCompression

            conf.setMapperClass(MaxTemperatureMapper.class);
            conf.setCombinerClass(MaxTemperatureReducer.class);
            conf.setReducerClass(MaxTemperatureReducer.class);

            JobClient.runJob(conf);
        }

    }

    //=*=*=*=*
    //./ch04/src/main/java/oldapi/TextPair.java
    package oldapi;
    // cc TextPair A Writable implementation that stores a pair of Text objects
    // cc TextPairComparator A RawComparator for comparing TextPair byte representations
    // cc TextPairFirstComparator A custom RawComparator for comparing the first field of TextPair byte representations
    // vv TextPair
    import java.io.*;

    import org.apache.hadoop.io.*;

    public class TextPair implements WritableComparable<TextPair> {

        private Text first;
        private Text second;

        public TextPair() {
            set(new Text(), new Text());
        }

        public TextPair(String first, String second) {
            set(new Text(first), new Text(second));
        }

        public TextPair(Text first, Text second) {
            set(first, second);
        }

        public void set(Text first, Text second) {
            this.first = first;
            this.second = second;
        }

        public Text getFirst() {
            return first;
        }

        public Text getSecond() {
            return second;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            first.write(out);
            second.write(out);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first.readFields(in);
            second.readFields(in);
        }

        @Override
        public int hashCode() {
            return first.hashCode() * 163 + second.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            if (o instanceof TextPair) {
                TextPair tp = (TextPair) o;
                return first.equals(tp.first) && second.equals(tp.second);
            }
            return false;
        }

        @Override
        public String toString() {
            return first + "\t" + second;
        }

        @Override
        public int compareTo(TextPair tp) {
            int cmp = first.compareTo(tp.first);
            if (cmp != 0) {
                return cmp;
            }
            return second.compareTo(tp.second);
        }
        // ^^ TextPair

        // vv TextPairComparator
        public static class Comparator extends WritableComparator {

            private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

            public Comparator() {
                super(TextPair.class);
            }

            @Override
            public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {

                try {
                    int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                    int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
                    int cmp = TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
                    if (cmp != 0) {
                        return cmp;
                    }
                    return TEXT_COMPARATOR.compare(b1, s1 + firstL1, l1 - firstL1, b2, s2 + firstL2, l2 - firstL2);
                } catch (IOException e) {
                    throw new IllegalArgumentException(e);
                }
            }
        }

        static {
            WritableComparator.define(TextPair.class, new Comparator());
        }
        // ^^ TextPairComparator

        // vv TextPairFirstComparator
        public static class FirstComparator extends WritableComparator {

            private static final Text.Comparator TEXT_COMPARATOR = new Text.Comparator();

            public FirstComparator() {
                super(TextPair.class);
            }

            @Override
            public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {

                try {
                    int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                    int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
                    return TEXT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
                } catch (IOException e) {
                    throw new IllegalArgumentException(e);
                }
            }

            @Override
            public int compare(WritableComparable a, WritableComparable b) {
                if (a instanceof TextPair && b instanceof TextPair) {
                    return ((TextPair) a).first.compareTo(((TextPair) b).first);
                }
                return super.compare(a, b);
            }
        }
        // ^^ TextPairFirstComparator

        // vv TextPair
    }
    // ^^ TextPair

    //=*=*=*=*
    //./ch04/src/test/java/ArrayWritableTest.java
    // == ArrayWritableTest
    import static org.hamcrest.Matchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.junit.Test;

    public class ArrayWritableTest extends WritableTestBase {

        @Test
        public void test() throws IOException {
            // vv ArrayWritableTest
            ArrayWritable writable = new ArrayWritable(Text.class);
            // ^^ ArrayWritableTest
            writable.set(new Text[] { new Text("cat"), new Text("dog") });

            TextArrayWritable dest = new TextArrayWritable();
            WritableUtils.cloneInto(dest, writable);
            assertThat(dest.get().length, is(2));
            // TODO: fix cast, also use single assert
            assertThat((Text) dest.get()[0], is(new Text("cat")));
            assertThat((Text) dest.get()[1], is(new Text("dog")));

            Text[] copy = (Text[]) dest.toArray();
            assertThat(copy[0], is(new Text("cat")));
            assertThat(copy[1], is(new Text("dog")));
        }
    }

    //=*=*=*=*
    //./ch04/src/test/java/BinaryOrTextWritable.java
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.GenericWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;

    public class BinaryOrTextWritable extends GenericWritable {
        private static Class[] TYPES = { BytesWritable.class, Text.class };

        @Override
        protected Class<? extends Writable>[] getTypes() {
            return TYPES;
        }

    }

    //=*=*=*=*
    //./ch04/src/test/java/BooleanWritableTest.java
    import static org.hamcrest.Matchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import org.apache.hadoop.io.BooleanWritable;
    import org.junit.Test;

    public class BooleanWritableTest extends WritableTestBase {

        @Test
        public void test() throws IOException {
            BooleanWritable src = new BooleanWritable(true);
            BooleanWritable dest = new BooleanWritable();
            assertThat(writeTo(src, dest), is("01"));
            assertThat(dest.get(), is(src.get()));
        }
    }

    //=*=*=*=*
    //./ch04/src/test/java/BytesWritableTest.java
    // == BytesWritableTest
    // == BytesWritableTest-Capacity
    import static org.hamcrest.Matchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.util.StringUtils;
    import org.junit.Test;

    public class BytesWritableTest extends WritableTestBase {

        @Test
        public void test() throws IOException {
            // vv BytesWritableTest
            BytesWritable b = new BytesWritable(new byte[] { 3, 5 });
            byte[] bytes = serialize(b);
            assertThat(StringUtils.byteToHexString(bytes), is("000000020305"));
            // ^^ BytesWritableTest

            // vv BytesWritableTest-Capacity
            b.setCapacity(11);
            assertThat(b.getLength(), is(2));
            assertThat(b.getBytes().length, is(11));
            // ^^ BytesWritableTest-Capacity
        }
    }

    //=*=*=*=*
    //./ch04/src/test/java/FileDecompressorTest.java
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.*;
    import java.util.Scanner;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.io.IOUtils;
    import org.junit.Test;

    public class FileDecompressorTest {

        @Test
        public void decompressesGzippedFile() throws Exception {
            File file = File.createTempFile("file", ".gz");
            file.deleteOnExit();
            InputStream in = this.getClass().getResourceAsStream("/file.gz");
            IOUtils.copyBytes(in, new FileOutputStream(file), 4096, true);

            String path = file.getAbsolutePath();
            FileDecompressor.main(new String[] { path });

            String decompressedPath = path.substring(0, path.length() - 3);
            assertThat(readFile(new File(decompressedPath)), is("Text\n"));
        }

        private String readFile(File file) throws IOException {
            return new Scanner(file).useDelimiter("\\A").next();
        }
    }

    //=*=*=*=*
    //./ch04/src/test/java/GenericWritableTest.java
    import static org.hamcrest.Matchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.junit.Test;

    public class GenericWritableTest extends WritableTestBase {

        @Test
        public void test() throws IOException {
            BinaryOrTextWritable src = new BinaryOrTextWritable();
            src.set(new Text("text"));
            BinaryOrTextWritable dest = new BinaryOrTextWritable();
            WritableUtils.cloneInto(dest, src);
            assertThat((Text) dest.get(), is(new Text("text")));

            src.set(new BytesWritable(new byte[] { 3, 5 }));
            WritableUtils.cloneInto(dest, src);
            assertThat(((BytesWritable) dest.get()).getLength(), is(2)); // TODO proper assert
        }
    }

    //=*=*=*=*
    //./ch04/src/test/java/IntPairTest.java
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;

    import org.apache.hadoop.io.*;
    import org.junit.Test;

    public class IntPairTest extends WritableTestBase {

        private IntPair ip1 = new IntPair(1, 2);
        private IntPair ip2 = new IntPair(2, 1);
        private IntPair ip3 = new IntPair(1, 12);
        private IntPair ip4 = new IntPair(11, 2);
        private IntPair ip5 = new IntPair(Integer.MAX_VALUE, 2);
        private IntPair ip6 = new IntPair(Integer.MAX_VALUE, Integer.MAX_VALUE);

        @Test
        public void testComparator() throws IOException {
            check(ip1, ip1, 0);
            check(ip1, ip2, -1);
            check(ip3, ip4, -1);
            check(ip2, ip4, -1);
            check(ip3, ip5, -1);
            check(ip5, ip6, -1);
        }

        private void check(IntPair ip1, IntPair ip2, int c) throws IOException {
            check(WritableComparator.get(IntPair.class), ip1, ip2, c);
        }

        private void check(RawComparator comp, IntPair ip1, IntPair ip2, int c) throws IOException {
            checkOnce(comp, ip1, ip2, c);
            checkOnce(comp, ip2, ip1, -c);
        }

        private void checkOnce(RawComparator comp, IntPair ip1, IntPair ip2, int c) throws IOException {
            assertThat("Object", signum(comp.compare(ip1, ip2)), is(c));
            byte[] out1 = serialize(ip1);
            byte[] out2 = serialize(ip2);
            assertThat("Raw", signum(comp.compare(out1, 0, out1.length, out2, 0, out2.length)), is(c));
        }

        private int signum(int i) {
            return i < 0 ? -1 : (i == 0 ? 0 : 1);
        }

    }

    //=*=*=*=*
    //./ch04/src/test/java/IntWritableTest.java
    // == IntWritableTest
    // == IntWritableTest-ValueConstructor
    // == IntWritableTest-SerializedLength
    // == IntWritableTest-SerializedBytes
    // == IntWritableTest-Deserialization
    // == IntWritableTest-Comparator
    // == IntWritableTest-ObjectComparison
    // == IntWritableTest-BytesComparison
    import static org.hamcrest.Matchers.*;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.util.StringUtils;
    import org.junit.Test;

    public class IntWritableTest extends WritableTestBase {

        @Test
        public void walkthroughWithNoArgsConstructor() throws IOException {
            // vv IntWritableTest
            IntWritable writable = new IntWritable();
            writable.set(163);
            // ^^ IntWritableTest
            checkWalkthrough(writable);
        }

        @Test
        public void walkthroughWithValueConstructor() throws IOException {
            // vv IntWritableTest-ValueConstructor
            IntWritable writable = new IntWritable(163);
            // ^^ IntWritableTest-ValueConstructor
            checkWalkthrough(writable);
        }

        private void checkWalkthrough(IntWritable writable) throws IOException {
            // vv IntWritableTest-SerializedLength
            byte[] bytes = serialize(writable);
            assertThat(bytes.length, is(4));
            // ^^ IntWritableTest-SerializedLength

            // vv IntWritableTest-SerializedBytes
            assertThat(StringUtils.byteToHexString(bytes), is("000000a3"));
            // ^^ IntWritableTest-SerializedBytes

            // vv IntWritableTest-Deserialization
            IntWritable newWritable = new IntWritable();
            deserialize(newWritable, bytes);
            assertThat(newWritable.get(), is(163));
            // ^^ IntWritableTest-Deserialization
        }

        @Test
        public void comparator() throws IOException {
            // vv IntWritableTest-Comparator
            RawComparator<IntWritable> comparator = WritableComparator.get(IntWritable.class);
            // ^^ IntWritableTest-Comparator

            // vv IntWritableTest-ObjectComparison
            IntWritable w1 = new IntWritable(163);
            IntWritable w2 = new IntWritable(67);
            assertThat(comparator.compare(w1, w2), greaterThan(0));
            // ^^ IntWritableTest-ObjectComparison

            // vv IntWritableTest-BytesComparison
            byte[] b1 = serialize(w1);
            byte[] b2 = serialize(w2);
            assertThat(comparator.compare(b1, 0, b1.length, b2, 0, b2.length), greaterThan(0));
            // ^^ IntWritableTest-BytesComparison
        }

        @Test
        public void test() throws IOException {
            IntWritable src = new IntWritable(163);
            IntWritable dest = new IntWritable();
            assertThat(writeTo(src, dest), is("000000a3"));
            assertThat(dest.get(), is(src.get()));
        }

    }

    //=*=*=*=*
    //./ch04/src/test/java/MapFileSeekTest.java
    // == MapFileSeekTest
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.util.ReflectionUtils;
    import org.junit.*;

    public class MapFileSeekTest {

        private static final String MAP_URI = "test.numbers.map";
        private FileSystem fs;
        private MapFile.Reader reader;
        private WritableComparable<?> key;
        private Writable value;

        @Before
        public void setUp() throws IOException {
            MapFileWriteDemo.main(new String[] { MAP_URI });

            Configuration conf = new Configuration();
            fs = FileSystem.get(URI.create(MAP_URI), conf);

            reader = new MapFile.Reader(fs, MAP_URI, conf);
            key = (WritableComparable<?>) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
        }

        @After
        public void tearDown() throws IOException {
            fs.delete(new Path(MAP_URI), true);
        }

        @Test
        public void get() throws Exception {
            // vv MapFileSeekTest
            Text value = new Text();
            reader.get(new IntWritable(496), value);
            assertThat(value.toString(), is("One, two, buckle my shoe"));
            // ^^ MapFileSeekTest
        }

        @Test
        public void seek() throws Exception {
            assertThat(reader.seek(new IntWritable(496)), is(true));
            assertThat(reader.next(key, value), is(true));
            assertThat(((IntWritable) key).get(), is(497));
            assertThat(((Text) value).toString(), is("Three, four, shut the door"));
        }

    }

    //=*=*=*=*
    //./ch04/src/test/java/MapWritableTest.java
    // == MapWritableTest
    import static org.hamcrest.Matchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.junit.Test;

    public class MapWritableTest extends WritableTestBase {

        @Test
        public void mapWritable() throws IOException {
            // vv MapWritableTest
            MapWritable src = new MapWritable();
            src.put(new IntWritable(1), new Text("cat"));
            src.put(new VIntWritable(2), new LongWritable(163));

            MapWritable dest = new MapWritable();
            WritableUtils.cloneInto(dest, src);
            assertThat((Text) dest.get(new IntWritable(1)), is(new Text("cat")));
            assertThat((LongWritable) dest.get(new VIntWritable(2)), is(new LongWritable(163)));
            // ^^ MapWritableTest
        }

        @Test
        public void setWritableEmulation() throws IOException {
            MapWritable src = new MapWritable();
            src.put(new IntWritable(1), NullWritable.get());
            src.put(new IntWritable(2), NullWritable.get());

            MapWritable dest = new MapWritable();
            WritableUtils.cloneInto(dest, src);
            assertThat(dest.containsKey(new IntWritable(1)), is(true));
        }
    }

    //=*=*=*=*
    //./ch04/src/test/java/NullWritableTest.java
    import static org.hamcrest.Matchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.junit.Test;

    public class NullWritableTest extends WritableTestBase {

        @Test
        public void test() throws IOException {
            NullWritable writable = NullWritable.get();
            assertThat(serialize(writable).length, is(0));
        }
    }

    //=*=*=*=*
    //./ch04/src/test/java/ObjectWritableTest.java
    import static org.hamcrest.Matchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.junit.Test;

    public class ObjectWritableTest extends WritableTestBase {

        @Test
        public void test() throws IOException {
            ObjectWritable src = new ObjectWritable(Integer.TYPE, 163);
            ObjectWritable dest = new ObjectWritable();
            WritableUtils.cloneInto(dest, src);
            assertThat((Integer) dest.get(), is(163));
        }
    }

    //=*=*=*=*
    //./ch04/src/test/java/SequenceFileSeekAndSyncTest.java
    // == SequenceFileSeekAndSyncTest
    // == SequenceFileSeekAndSyncTest-SeekNonRecordBoundary
    // == SequenceFileSeekAndSyncTest-SyncNonRecordBoundary
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.util.ReflectionUtils;
    import org.junit.*;

    public class SequenceFileSeekAndSyncTest {

        private static final String SF_URI = "test.numbers.seq";
        private FileSystem fs;
        private SequenceFile.Reader reader;
        private Writable key;
        private Writable value;

        @Before
        public void setUp() throws IOException {
            SequenceFileWriteDemo.main(new String[] { SF_URI });

            Configuration conf = new Configuration();
            fs = FileSystem.get(URI.create(SF_URI), conf);
            Path path = new Path(SF_URI);

            reader = new SequenceFile.Reader(fs, path, conf);
            key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
        }

        @After
        public void tearDown() throws IOException {
            fs.delete(new Path(SF_URI), true);
        }

        @Test
        public void seekToRecordBoundary() throws IOException {
            // vv SequenceFileSeekAndSyncTest
            reader.seek(359);
            assertThat(reader.next(key, value), is(true));
            assertThat(((IntWritable) key).get(), is(95));
            // ^^ SequenceFileSeekAndSyncTest
        }

        @Test(expected = IOException.class)
        public void seekToNonRecordBoundary() throws IOException {
            // vv SequenceFileSeekAndSyncTest-SeekNonRecordBoundary
            reader.seek(360);
            reader.next(key, value); // fails with IOException
            // ^^ SequenceFileSeekAndSyncTest-SeekNonRecordBoundary
        }

        @Test
        public void syncFromNonRecordBoundary() throws IOException {
            // vv SequenceFileSeekAndSyncTest-SyncNonRecordBoundary
            reader.sync(360);
            assertThat(reader.getPosition(), is(2021L));
            assertThat(reader.next(key, value), is(true));
            assertThat(((IntWritable) key).get(), is(59));
            // ^^ SequenceFileSeekAndSyncTest-SyncNonRecordBoundary
        }

        @Test
        public void syncAfterLastSyncPoint() throws IOException {
            reader.sync(4557);
            assertThat(reader.getPosition(), is(4788L));
            assertThat(reader.next(key, value), is(false));
        }

    }

    //=*=*=*=*
    //./ch04/src/test/java/StringTextComparisonTest.java
    // cc StringTextComparisonTest Tests showing the differences between the String and Text classes
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.*;

    import org.apache.hadoop.io.Text;
    import org.junit.Test;

    // vv StringTextComparisonTest
    public class StringTextComparisonTest {

        @Test
        public void string() throws UnsupportedEncodingException {

            String s = "\u0041\u00DF\u6771\uD801\uDC00";
            assertThat(s.length(), is(5));
            assertThat(s.getBytes("UTF-8").length, is(10));

            assertThat(s.indexOf("\u0041"), is(0));
            assertThat(s.indexOf("\u00DF"), is(1));
            assertThat(s.indexOf("\u6771"), is(2));
            assertThat(s.indexOf("\uD801\uDC00"), is(3));

            assertThat(s.charAt(0), is('\u0041'));
            assertThat(s.charAt(1), is('\u00DF'));
            assertThat(s.charAt(2), is('\u6771'));
            assertThat(s.charAt(3), is('\uD801'));
            assertThat(s.charAt(4), is('\uDC00'));

            assertThat(s.codePointAt(0), is(0x0041));
            assertThat(s.codePointAt(1), is(0x00DF));
            assertThat(s.codePointAt(2), is(0x6771));
            assertThat(s.codePointAt(3), is(0x10400));
        }

        @Test
        public void text() {

            Text t = new Text("\u0041\u00DF\u6771\uD801\uDC00");
            assertThat(t.getLength(), is(10));

            assertThat(t.find("\u0041"), is(0));
            assertThat(t.find("\u00DF"), is(1));
            assertThat(t.find("\u6771"), is(3));
            assertThat(t.find("\uD801\uDC00"), is(6));

            assertThat(t.charAt(0), is(0x0041));
            assertThat(t.charAt(1), is(0x00DF));
            assertThat(t.charAt(3), is(0x6771));
            assertThat(t.charAt(6), is(0x10400));
        }
    }
    // ^^ StringTextComparisonTest

    //=*=*=*=*
    //./ch04/src/test/java/TextPairTest.java
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;

    import org.apache.hadoop.io.*;
    import org.junit.Test;

    public class TextPairTest extends WritableTestBase {

        private TextPair tp1 = new TextPair("a", "b");
        private TextPair tp2 = new TextPair("b", "a");
        private TextPair tp3 = new TextPair("a", "ab");
        private TextPair tp4 = new TextPair("aa", "b");
        private TextPair tp5 = new TextPair(nTimes("a", 128), "b");
        private TextPair tp6 = new TextPair(nTimes("a", 128), nTimes("a", 128));
        private TextPair tp7 = new TextPair(nTimes("a", 128), nTimes("b", 128));

        private static String nTimes(String s, int n) {
            StringBuilder sb = new StringBuilder(n);
            for (int i = 0; i < n; i++) {
                sb.append(s);
            }
            return sb.toString();
        }

        @Test
        public void testComparator() throws IOException {
            check(tp1, tp1, 0);
            check(tp1, tp2, -1);
            check(tp3, tp4, -1);
            check(tp2, tp4, 1);
            check(tp3, tp5, -1);
            check(tp5, tp6, 1);
            check(tp5, tp7, -1);
        }

        @Test
        public void testFirstComparator() throws IOException {
            RawComparator comp = new TextPair.FirstComparator();
            check(comp, tp1, tp1, 0);
            check(comp, tp1, tp2, -1);
            check(comp, tp3, tp4, -1);
            check(comp, tp2, tp4, 1);
            check(comp, tp3, tp5, -1);
            check(comp, tp5, tp6, 0);
            check(comp, tp5, tp7, 0);
        }

        private void check(TextPair tp1, TextPair tp2, int c) throws IOException {
            check(WritableComparator.get(TextPair.class), tp1, tp2, c);
        }

        private void check(RawComparator comp, TextPair tp1, TextPair tp2, int c) throws IOException {
            checkOnce(comp, tp1, tp2, c);
            checkOnce(comp, tp2, tp1, -c);
        }

        private void checkOnce(RawComparator comp, TextPair tp1, TextPair tp2, int c) throws IOException {
            assertThat("Object", signum(comp.compare(tp1, tp2)), is(c));
            byte[] out1 = serialize(tp1);
            byte[] out2 = serialize(tp2);
            assertThat("Raw", signum(comp.compare(out1, 0, out1.length, out2, 0, out2.length)), is(c));
        }

        private int signum(int i) {
            return i < 0 ? -1 : (i == 0 ? 0 : 1);
        }

    }

    //=*=*=*=*
    //./ch04/src/test/java/TextTest.java
    // == TextTest
    // == TextTest-Find
    // == TextTest-Mutability
    // == TextTest-ByteArrayNotShortened
    // == TextTest-ToString
    // == TextTest-Comparison
    import static org.hamcrest.Matchers.*;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.junit.Test;

    public class TextTest extends WritableTestBase {

        @Test
        public void test() throws IOException {
            // vv TextTest
            Text t = new Text("hadoop");
            assertThat(t.getLength(), is(6));
            assertThat(t.getBytes().length, is(6));

            assertThat(t.charAt(2), is((int) 'd'));
            assertThat("Out of bounds", t.charAt(100), is(-1));
            // ^^ TextTest
        }

        @Test
        public void find() throws IOException {
            // vv TextTest-Find
            Text t = new Text("hadoop");
            assertThat("Find a substring", t.find("do"), is(2));
            assertThat("Finds first 'o'", t.find("o"), is(3));
            assertThat("Finds 'o' from position 4 or later", t.find("o", 4), is(4));
            assertThat("No match", t.find("pig"), is(-1));
            // ^^ TextTest-Find
        }

        @Test
        public void mutability() throws IOException {
            // vv TextTest-Mutability
            Text t = new Text("hadoop");
            t.set("pig");
            assertThat(t.getLength(), is(3));
            assertThat(t.getBytes().length, is(3));
            // ^^ TextTest-Mutability
        }

        @Test
        public void byteArrayNotShortened() throws IOException {
            // vv TextTest-ByteArrayNotShortened
            Text t = new Text("hadoop");
            t.set(/*[*/new Text("pig")/*]*/);
            assertThat(t.getLength(), is(3));
            assertThat("Byte length not shortened", t.getBytes().length, /*[*/is(6)/*]*/);
            // ^^ TextTest-ByteArrayNotShortened
        }

        @Test
        public void toStringMethod() throws IOException {
            // vv TextTest-ToString
            assertThat(new Text("hadoop").toString(), is("hadoop"));
            // ^^ TextTest-ToString
        }

        @Test
        public void comparison() throws IOException {
            // vv TextTest-Comparison
            assertThat("\ud800\udc00".compareTo("\ue000"), lessThan(0));
            assertThat(new Text("\ud800\udc00").compareTo(new Text("\ue000")), greaterThan(0));
            // ^^ TextTest-Comparison
        }

        @Test
        public void withSupplementaryCharacters() throws IOException {

            String s = "\u0041\u00DF\u6771\uD801\uDC00";
            assertThat(s.length(), is(5));
            assertThat(s.getBytes("UTF-8").length, is(10));

            assertThat(s.indexOf('\u0041'), is(0));
            assertThat(s.indexOf('\u00DF'), is(1));
            assertThat(s.indexOf('\u6771'), is(2));
            assertThat(s.indexOf('\uD801'), is(3));
            assertThat(s.indexOf('\uDC00'), is(4));

            assertThat(s.charAt(0), is('\u0041'));
            assertThat(s.charAt(1), is('\u00DF'));
            assertThat(s.charAt(2), is('\u6771'));
            assertThat(s.charAt(3), is('\uD801'));
            assertThat(s.charAt(4), is('\uDC00'));

            Text t = new Text("\u0041\u00DF\u6771\uD801\uDC00");

            assertThat(serializeToString(t), is("0a41c39fe69db1f0909080"));

            assertThat(t.charAt(t.find("\u0041")), is(0x0041));
            assertThat(t.charAt(t.find("\u00DF")), is(0x00DF));
            assertThat(t.charAt(t.find("\u6771")), is(0x6771));
            assertThat(t.charAt(t.find("\uD801\uDC00")), is(0x10400));

        }

    }

    //=*=*=*=*
    //./ch04/src/test/java/VIntWritableTest.java
    // == VIntWritableTest
    import static org.hamcrest.Matchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import org.apache.hadoop.io.VIntWritable;
    import org.apache.hadoop.util.StringUtils;
    import org.junit.Test;

    public class VIntWritableTest extends WritableTestBase {

        @Test
        public void test() throws IOException {
            // vv VIntWritableTest
            byte[] data = serialize(new VIntWritable(163));
            assertThat(StringUtils.byteToHexString(data), is("8fa3"));
            // ^^ VIntWritableTest
        }

        @Test
        public void testSizes() throws IOException {
            assertThat(serializeToString(new VIntWritable(1)), is("01")); // 1 byte
            assertThat(serializeToString(new VIntWritable(-112)), is("90")); // 1 byte
            assertThat(serializeToString(new VIntWritable(127)), is("7f")); // 1 byte
            assertThat(serializeToString(new VIntWritable(128)), is("8f80")); // 2 bytes
            assertThat(serializeToString(new VIntWritable(163)), is("8fa3")); // 2 bytes
            assertThat(serializeToString(new VIntWritable(Integer.MAX_VALUE)), is("8c7fffffff")); // 5 bytes
            assertThat(serializeToString(new VIntWritable(Integer.MIN_VALUE)), is("847fffffff")); // 5 bytes
        }
    }
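
    // Added sketch (not part of the original listing): decoding the "8fa3" bytes from the
    // test above with WritableUtils shows how the variable-length format works -- the first
    // byte (0x8f) signals "positive value, one more byte follows", and the remaining byte
    // carries 163. The class name is an illustrative choice.
    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;

    import org.apache.hadoop.io.WritableUtils;

    public class VIntDecodeSketch {

        public static void main(String[] args) throws IOException {
            byte[] data = new byte[] { (byte) 0x8f, (byte) 0xa3 }; // "8fa3" from the test above
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            System.out.println(WritableUtils.readVInt(in)); // prints 163
        }
    }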

    //=*=*=*=*
    //./ch04/src/test/java/VLongWritableTest.java
    import static org.hamcrest.Matchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import org.apache.hadoop.io.VLongWritable;
    import org.junit.Test;

    public class VLongWritableTest extends WritableTestBase {

        @Test
        public void test() throws IOException {
            assertThat(serializeToString(new VLongWritable(1)), is("01")); // 1 byte
            assertThat(serializeToString(new VLongWritable(127)), is("7f")); // 1 byte
            assertThat(serializeToString(new VLongWritable(128)), is("8f80")); // 2 bytes
            assertThat(serializeToString(new VLongWritable(163)), is("8fa3")); // 2 bytes
            assertThat(serializeToString(new VLongWritable(Long.MAX_VALUE)), is("887fffffffffffffff")); // 9 bytes
            assertThat(serializeToString(new VLongWritable(Long.MIN_VALUE)), is("807fffffffffffffff")); // 9 bytes
        }
    }

    //=*=*=*=*
    //./ch04/src/test/java/WritableTestBase.java
    // == WritableTestBase
    // == WritableTestBase-Deserialize
    import java.io.*;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.StringUtils;

    public class WritableTestBase {

        // vv WritableTestBase
        public static byte[] serialize(Writable writable) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream dataOut = new DataOutputStream(out);
            writable.write(dataOut);
            dataOut.close();
            return out.toByteArray();
        }
        // ^^ WritableTestBase

        // vv WritableTestBase-Deserialize
        public static byte[] deserialize(Writable writable, byte[] bytes) throws IOException {
            ByteArrayInputStream in = new ByteArrayInputStream(bytes);
            DataInputStream dataIn = new DataInputStream(in);
            writable.readFields(dataIn);
            dataIn.close();
            return bytes;
        }
        // ^^ WritableTestBase-Deserialize

        public static String serializeToString(Writable src) throws IOException {
            return StringUtils.byteToHexString(serialize(src));
        }

        public static String writeTo(Writable src, Writable dest) throws IOException {
            byte[] data = deserialize(dest, serialize(src));
            return StringUtils.byteToHexString(data);
        }

    }
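
    // Added sketch (not part of the original listing): a small standalone driver showing how
    // the helpers above are used -- an IntWritable serializes to a 4-byte big-endian int, and
    // writeTo() copies one Writable's serialized state into another. The class name is an
    // illustrative choice.
    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;

    public class WritableTestBaseUsage extends WritableTestBase {

        public static void main(String[] args) throws IOException {
            System.out.println(serializeToString(new IntWritable(163))); // prints 000000a3
            IntWritable dest = new IntWritable();
            writeTo(new IntWritable(42), dest); // dest now holds 42
            System.out.println(dest.get()); // prints 42
        }
    }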

    //=*=*=*=*
    //./ch04-avro/src/main/java/AvroGenericMaxTemperature.java
    // cc AvroGenericMaxTemperature MapReduce program to find the maximum temperature, creating Avro output

    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.mapred.AvroCollector;
    import org.apache.avro.mapred.AvroJob;
    import org.apache.avro.mapred.AvroMapper;
    import org.apache.avro.mapred.AvroReducer;
    import org.apache.avro.mapred.AvroUtf8InputFormat;
    import org.apache.avro.mapred.Pair;
    import org.apache.avro.util.Utf8;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    //vv AvroGenericMaxTemperature
    public class AvroGenericMaxTemperature extends Configured implements Tool {

        private static final Schema SCHEMA = new Schema.Parser().parse("{" + "  \"type\": \"record\","
                + "  \"name\": \"WeatherRecord\"," + "  \"doc\": \"A weather reading.\"," + "  \"fields\": ["
                + "    {\"name\": \"year\", \"type\": \"int\"}," + "    {\"name\": \"temperature\", \"type\": \"int\"},"
                + "    {\"name\": \"stationId\", \"type\": \"string\"}" + "  ]" + "}");

        public static class MaxTemperatureMapper extends AvroMapper<Utf8, Pair<Integer, GenericRecord>> {
            private NcdcRecordParser parser = new NcdcRecordParser();
            private GenericRecord record = new GenericData.Record(SCHEMA);

            @Override
            public void map(Utf8 line, AvroCollector<Pair<Integer, GenericRecord>> collector, Reporter reporter)
                    throws IOException {
                parser.parse(line.toString());
                if (parser.isValidTemperature()) {
                    record.put("year", parser.getYearInt());
                    record.put("temperature", parser.getAirTemperature());
                    record.put("stationId", parser.getStationId());
                    collector.collect(new Pair<Integer, GenericRecord>(parser.getYearInt(), record));
                }
            }
        }

        public static class MaxTemperatureReducer extends AvroReducer<Integer, GenericRecord, GenericRecord> {

            @Override
            public void reduce(Integer key, Iterable<GenericRecord> values, AvroCollector<GenericRecord> collector,
                    Reporter reporter) throws IOException {
                GenericRecord max = null;
                for (GenericRecord value : values) {
                    if (max == null || (Integer) value.get("temperature") > (Integer) max.get("temperature")) {
                        max = newWeatherRecord(value);
                    }
                }
                collector.collect(max);
            }

            private GenericRecord newWeatherRecord(GenericRecord value) {
                GenericRecord record = new GenericData.Record(SCHEMA);
                record.put("year", value.get("year"));
                record.put("temperature", value.get("temperature"));
                record.put("stationId", value.get("stationId"));
                return record;
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            JobConf conf = new JobConf(getConf(), getClass());
            conf.setJobName("Max temperature");

            FileInputFormat.addInputPath(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            AvroJob.setInputSchema(conf, Schema.create(Schema.Type.STRING));
            AvroJob.setMapOutputSchema(conf, Pair.getPairSchema(Schema.create(Schema.Type.INT), SCHEMA));
            AvroJob.setOutputSchema(conf, SCHEMA);

            conf.setInputFormat(AvroUtf8InputFormat.class);

            AvroJob.setMapperClass(conf, MaxTemperatureMapper.class);
            AvroJob.setReducerClass(conf, MaxTemperatureReducer.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new AvroGenericMaxTemperature(), args);
            System.exit(exitCode);
        }
    }
    // ^^ AvroGenericMaxTemperature

    //=*=*=*=*
    //./ch04-avro/src/main/java/AvroProjection.java
    import java.io.File;

    import org.apache.avro.Schema;
    import org.apache.avro.mapred.AvroJob;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class AvroProjection extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {

            if (args.length != 3) {
                System.err.printf("Usage: %s [generic options] <input> <output> <schema-file>\n",
                        getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            String input = args[0];
            String output = args[1];
            String schemaFile = args[2];

            JobConf conf = new JobConf(getConf(), getClass());
            conf.setJobName("Avro projection");

            FileInputFormat.addInputPath(conf, new Path(input));
            FileOutputFormat.setOutputPath(conf, new Path(output));

            Schema schema = new Schema.Parser().parse(new File(schemaFile));
            AvroJob.setInputSchema(conf, schema);
            AvroJob.setMapOutputSchema(conf, schema);
            AvroJob.setOutputSchema(conf, schema);
            conf.setNumReduceTasks(0);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new AvroProjection(), args);
            System.exit(exitCode);
        }

    }
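
    // Added sketch (not part of the original listing): AvroProjection uses the schema read
    // from <schema-file> as the reader's schema, so projection is plain schema resolution --
    // a reader's schema that keeps only the wanted fields. Below is a hedged reconstruction
    // of such a schema for the WeatherRecord data produced earlier in this listing, expressed
    // as a parsed Java string; the class name is an illustrative choice.
    import org.apache.avro.Schema;

    public class ProjectedWeatherRecordSchema {

        static final Schema PROJECTED = new Schema.Parser().parse("{" + "  \"type\": \"record\","
                + "  \"name\": \"WeatherRecord\"," + "  \"doc\": \"A weather reading, projected to two fields.\","
                + "  \"fields\": [" + "    {\"name\": \"temperature\", \"type\": \"int\"},"
                + "    {\"name\": \"stationId\", \"type\": \"string\"}" + "  ]" + "}");
    }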

    //=*=*=*=*
    //./ch04-avro/src/main/java/AvroSort.java
    // cc AvroSort A MapReduce program to sort an Avro data file

    import java.io.File;
    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.mapred.AvroCollector;
    import org.apache.avro.mapred.AvroJob;
    import org.apache.avro.mapred.AvroMapper;
    import org.apache.avro.mapred.AvroReducer;
    import org.apache.avro.mapred.Pair;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    //vv AvroSort
    public class AvroSort extends Configured implements Tool {

        static class SortMapper<K> extends AvroMapper<K, Pair<K, K>> {
            public void map(K datum, AvroCollector<Pair<K, K>> collector, Reporter reporter) throws IOException {
                collector.collect(new Pair<K, K>(datum, null, datum, null));
            }
        }

        static class SortReducer<K> extends AvroReducer<K, K, K> {
            public void reduce(K key, Iterable<K> values, AvroCollector<K> collector, Reporter reporter)
                    throws IOException {
                for (K value : values) {
                    collector.collect(value);
                }
            }
        }

        @Override
        public int run(String[] args) throws Exception {

            if (args.length != 3) {
                System.err.printf("Usage: %s [generic options] <input> <output> <schema-file>\n",
                        getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            String input = args[0];
            String output = args[1];
            String schemaFile = args[2];

            JobConf conf = new JobConf(getConf(), getClass());
            conf.setJobName("Avro sort");

            FileInputFormat.addInputPath(conf, new Path(input));
            FileOutputFormat.setOutputPath(conf, new Path(output));

            Schema schema = new Schema.Parser().parse(new File(schemaFile));
            AvroJob.setInputSchema(conf, schema);
            Schema intermediateSchema = Pair.getPairSchema(schema, schema);
            AvroJob.setMapOutputSchema(conf, intermediateSchema);
            AvroJob.setOutputSchema(conf, schema);

            AvroJob.setMapperClass(conf, SortMapper.class);
            AvroJob.setReducerClass(conf, SortReducer.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new AvroSort(), args);
            System.exit(exitCode);
        }
    }
    // ^^ AvroSort
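
    // Note (added, not part of the original listing): because each record is emitted as the
    // key half of the intermediate Pair, the shuffle sorts records by the ordering defined by
    // that key schema (here the data file's own record schema), and the reducer emits the
    // grouped values in that order, preserving duplicates.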

    //=*=*=*=*
    //./ch04-avro/src/main/java/AvroSpecificMaxTemperature.java
    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.mapred.AvroCollector;
    import org.apache.avro.mapred.AvroJob;
    import org.apache.avro.mapred.AvroMapper;
    import org.apache.avro.mapred.AvroReducer;
    import org.apache.avro.mapred.AvroUtf8InputFormat;
    import org.apache.avro.mapred.Pair;
    import org.apache.avro.util.Utf8;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    import specific.WeatherRecord;

    public class AvroSpecificMaxTemperature extends Configured implements Tool {

        public static class MaxTemperatureMapper extends AvroMapper<Utf8, Pair<Integer, WeatherRecord>> {
            private NcdcRecordParser parser = new NcdcRecordParser();
            private WeatherRecord record = new WeatherRecord();

            @Override
            public void map(Utf8 line, AvroCollector<Pair<Integer, WeatherRecord>> collector, Reporter reporter)
                    throws IOException {
                parser.parse(line.toString());
                if (parser.isValidTemperature()) {
                    record.year = parser.getYearInt();
                    record.temperature = parser.getAirTemperature();
                    record.stationId = new Utf8(parser.getStationId());
                    collector.collect(new Pair<Integer, WeatherRecord>(parser.getYearInt(), record));
                }
            }
        }

        public static class MaxTemperatureReducer extends AvroReducer<Integer, WeatherRecord, WeatherRecord> {

            @Override
            public void reduce(Integer key, Iterable<WeatherRecord> values, AvroCollector<WeatherRecord> collector,
                    Reporter reporter) throws IOException {
                WeatherRecord max = null;
                for (WeatherRecord value : values) {
                    if (max == null || value.temperature > max.temperature) {
                        max = newWeatherRecord(value);
                    }
                }
                collector.collect(max);
            }
        }

        public static class MaxTemperatureCombiner
                extends AvroReducer<Integer, WeatherRecord, Pair<Integer, WeatherRecord>> {

            @Override
            public void reduce(Integer key, Iterable<WeatherRecord> values,
                    AvroCollector<Pair<Integer, WeatherRecord>> collector, Reporter reporter) throws IOException {
                WeatherRecord max = null;
                for (WeatherRecord value : values) {
                    if (max == null || value.temperature > max.temperature) {
                        max = newWeatherRecord(value);
                    }
                }
                collector.collect(new Pair<Integer, WeatherRecord>(key, max));
            }
        }

        private static WeatherRecord newWeatherRecord(WeatherRecord value) {
            WeatherRecord record = new WeatherRecord();
            record.year = value.year;
            record.temperature = value.temperature;
            record.stationId = value.stationId;
            return record;
        }

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            JobConf conf = new JobConf(getConf(), getClass());
            conf.setJobName("Max temperature");

            FileInputFormat.addInputPath(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            AvroJob.setInputSchema(conf, Schema.create(Schema.Type.STRING));
            AvroJob.setMapOutputSchema(conf, Pair.getPairSchema(Schema.create(Schema.Type.INT), WeatherRecord.SCHEMA$));
            AvroJob.setOutputSchema(conf, WeatherRecord.SCHEMA$);

            conf.setInputFormat(AvroUtf8InputFormat.class);

            AvroJob.setMapperClass(conf, MaxTemperatureMapper.class);
            AvroJob.setCombinerClass(conf, MaxTemperatureCombiner.class);
            AvroJob.setReducerClass(conf, MaxTemperatureReducer.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new AvroSpecificMaxTemperature(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch04-avro/src/main/java/NcdcRecordParser.java
    import java.text.*;
    import java.util.Date;

    import org.apache.hadoop.io.Text;

    public class NcdcRecordParser {

        private static final int MISSING_TEMPERATURE = 9999;

        private static final DateFormat DATE_FORMAT = new SimpleDateFormat("yyyyMMddHHmm");

        private String stationId;
        private String observationDateString;
        private String year;
        private String airTemperatureString;
        private int airTemperature;
        private boolean airTemperatureMalformed;
        private String quality;

        public void parse(String record) {
            stationId = record.substring(4, 10) + "-" + record.substring(10, 15);
            observationDateString = record.substring(15, 27);
            year = record.substring(15, 19);
            airTemperatureMalformed = false;
            // Remove the leading plus sign, as parseInt doesn't accept it
            if (record.charAt(87) == '+') {
                airTemperatureString = record.substring(88, 92);
                airTemperature = Integer.parseInt(airTemperatureString);
            } else if (record.charAt(87) == '-') {
                airTemperatureString = record.substring(87, 92);
                airTemperature = Integer.parseInt(airTemperatureString);
            } else {
                airTemperatureMalformed = true;
            }
            quality = record.substring(92, 93);
        }

        public void parse(Text record) {
            parse(record.toString());
        }

        public boolean isValidTemperature() {
            return !airTemperatureMalformed && airTemperature != MISSING_TEMPERATURE && quality.matches("[01459]");
        }

        public boolean isMalformedTemperature() {
            return airTemperatureMalformed;
        }

        public boolean isMissingTemperature() {
            return airTemperature == MISSING_TEMPERATURE;
        }

        public String getStationId() {
            return stationId;
        }

        public Date getObservationDate() {
            try {
                return DATE_FORMAT.parse(observationDateString);
            } catch (ParseException e) {
                throw new IllegalArgumentException(e);
            }
        }

        public String getYear() {
            return year;
        }

        public int getYearInt() {
            return Integer.parseInt(year);
        }

        public int getAirTemperature() {
            return airTemperature;
        }

        public String getAirTemperatureString() {
            return airTemperatureString;
        }

        public String getQuality() {
            return quality;
        }

    }

    //=*=*=*=*
    //./ch04-avro/src/test/java/AvroTest.java
    // == AvroParseSchema
    // == AvroGenericRecordCreation
    // == AvroGenericRecordSerialization
    // == AvroGenericRecordDeserialization
    // == AvroSpecificStringPair
    // == AvroDataFileCreation
    // == AvroDataFileGetSchema
    // == AvroDataFileRead
    // == AvroDataFileIterator
    // == AvroDataFileShortIterator
    // == AvroSchemaResolution
    // == AvroSchemaResolutionWithDataFile
    import static org.hamcrest.CoreMatchers.*;
    import static org.junit.Assert.assertThat;
    import static org.junit.Assert.fail;

    import java.io.ByteArrayOutputStream;
    import java.io.EOFException;
    import java.io.File;
    import java.io.IOException;

    import org.apache.avro.AvroTypeException;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DatumReader;
    import org.apache.avro.io.DatumWriter;
    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.Encoder;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.avro.specific.SpecificDatumReader;
    import org.apache.avro.specific.SpecificDatumWriter;
    import org.apache.avro.util.Utf8;
    import org.junit.Ignore;
    import org.junit.Test;

    public class AvroTest {

        @Test
        public void testInt() throws IOException {
            Schema schema = new Schema.Parser().parse("\"int\"");

            int datum = 163;

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DatumWriter<Integer> writer = new GenericDatumWriter<Integer>(schema);
            Encoder encoder = EncoderFactory.get().binaryEncoder(out, null /* reuse */);
            writer.write(datum, encoder); // boxed
            encoder.flush();
            out.close();

            DatumReader<Integer> reader = new GenericDatumReader<Integer>(schema); // have to tell it the schema - it's not in the data stream!
            Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null /* reuse */);
            Integer result = reader.read(null /* reuse */, decoder);
            assertThat(result, is(163));

            try {
                reader.read(null, decoder);
                fail("Expected EOFException");
            } catch (EOFException e) {
                // expected
            }
        }

        @Test
        @Ignore("Requires Avro 1.6.0 or later")
        public void testGenericString() throws IOException {
            Schema schema = new Schema.Parser().parse("{\"type\": \"string\", \"avro.java.string\": \"String\"}");

            String datum = "foo";

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DatumWriter<String> writer = new GenericDatumWriter<String>(schema);
            Encoder encoder = EncoderFactory.get().binaryEncoder(out, null /* reuse */);
            writer.write(datum, encoder); // boxed
            encoder.flush();
            out.close();

            DatumReader<String> reader = new GenericDatumReader<String>(schema);
            Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null /* reuse */);
            String result = reader.read(null /* reuse */, decoder);
            assertThat(result, equalTo("foo"));

            try {
                reader.read(null, decoder);
                fail("Expected EOFException");
            } catch (EOFException e) {
                // expected
            }
        }

        @Test
        public void testPairGeneric() throws IOException {
            // vv AvroParseSchema
            Schema.Parser parser = new Schema.Parser();
            Schema schema = parser.parse(getClass().getResourceAsStream("StringPair.avsc"));
            // ^^ AvroParseSchema

            // vv AvroGenericRecordCreation
            GenericRecord datum = new GenericData.Record(schema);
            datum.put("left", "L");
            datum.put("right", "R");
            // ^^ AvroGenericRecordCreation

            // vv AvroGenericRecordSerialization
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
            Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            writer.write(datum, encoder);
            encoder.flush();
            out.close();
            // ^^ AvroGenericRecordSerialization

            // vv AvroGenericRecordDeserialization
            DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
            Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            GenericRecord result = reader.read(null, decoder);
            assertThat(result.get("left").toString(), is("L"));
            assertThat(result.get("right").toString(), is("R"));
            // ^^ AvroGenericRecordDeserialization
        }

        @Test
        public void testPairSpecific() throws IOException {

            // vv AvroSpecificStringPair
            /*[*/StringPair datum = new StringPair();
            datum.left = "L";
            datum.right = "R";/*]*/

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            /*[*/DatumWriter<StringPair> writer = new SpecificDatumWriter<StringPair>(StringPair.class);/*]*/
            Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            writer.write(datum, encoder);
            encoder.flush();
            out.close();

            /*[*/DatumReader<StringPair> reader = new SpecificDatumReader<StringPair>(StringPair.class);/*]*/
            Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            StringPair result = reader.read(null, decoder);
            assertThat(result./*[*/left/*]*/.toString(), is("L"));
            assertThat(result./*[*/right/*]*/.toString(), is("R"));
            // ^^ AvroSpecificStringPair
        }

        @Test
        public void testDataFile() throws IOException {
            Schema schema = new Schema.Parser().parse(getClass().getResourceAsStream("StringPair.avsc"));

            GenericRecord datum = new GenericData.Record(schema);
            datum.put("left", "L");
            datum.put("right", "R");

            // vv AvroDataFileCreation
            File file = new File("data.avro");
            DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
            DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(writer);
            dataFileWriter.create(schema, file);
            dataFileWriter.append(datum);
            dataFileWriter.close();
            // ^^ AvroDataFileCreation

            // vv AvroDataFileGetSchema
            DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
            DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, reader);
            assertThat("Schema is the same", schema, is(dataFileReader.getSchema()));
            // ^^ AvroDataFileGetSchema

            // vv AvroDataFileRead
            assertThat(dataFileReader.hasNext(), is(true));
            GenericRecord result = dataFileReader.next();
            assertThat(result.get("left").toString(), is("L"));
            assertThat(result.get("right").toString(), is("R"));
            assertThat(dataFileReader.hasNext(), is(false));
            // ^^ AvroDataFileRead

            file.delete();
        }

        @Test
        public void testDataFileIteration() throws IOException {
            Schema schema = new Schema.Parser().parse(getClass().getResourceAsStream("StringPair.avsc"));

            GenericRecord datum = new GenericData.Record(schema);
            datum.put("left", "L");
            datum.put("right", "R");

            File file = new File("data.avro");
            DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
            DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(writer);
            dataFileWriter.create(schema, file);
            dataFileWriter.append(datum);
            datum.put("right", new Utf8("r"));
            dataFileWriter.append(datum);
            dataFileWriter.close();

            DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
            DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, reader);
            assertThat("Schema is the same", schema, is(dataFileReader.getSchema()));

            int count = 0;
            // vv AvroDataFileIterator
            GenericRecord record = null;
            while (dataFileReader.hasNext()) {
                record = dataFileReader.next(record);
                // process record
                // ^^ AvroDataFileIterator
                count++;
                assertThat(record.get("left").toString(), is("L"));
                if (count == 1) {
                    assertThat(record.get("right").toString(), is("R"));
                } else {
                    assertThat(record.get("right").toString(), is("r"));
                }
                // vv AvroDataFileIterator      
            }
            // ^^ AvroDataFileIterator    

            assertThat(count, is(2));
            file.delete();
        }

        @Test
        public void testDataFileIterationShort() throws IOException {
            Schema schema = new Schema.Parser().parse(getClass().getResourceAsStream("StringPair.avsc"));

            GenericRecord datum = new GenericData.Record(schema);
            datum.put("left", "L");
            datum.put("right", "R");

            File file = new File("data.avro");
            DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
            DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(writer);
            dataFileWriter.create(schema, file);
            dataFileWriter.append(datum);
            datum.put("right", new Utf8("r"));
            dataFileWriter.append(datum);
            dataFileWriter.close();

            DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>();
            DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, reader);
            assertThat("Schema is the same", schema, is(dataFileReader.getSchema()));

            int count = 0;
            // vv AvroDataFileShortIterator
            for (GenericRecord record : dataFileReader) {
                // process record
                // ^^ AvroDataFileShortIterator
                count++;
                assertThat(record.get("left").toString(), is("L"));
                if (count == 1) {
                    assertThat(record.get("right").toString(), is("R"));
                } else {
                    assertThat(record.get("right").toString(), is("r"));
                }
                // vv AvroDataFileShortIterator      
            }
            // ^^ AvroDataFileShortIterator    

            assertThat(count, is(2));
            file.delete();
        }

        @Test
        public void testSchemaResolution() throws IOException {
            Schema schema = new Schema.Parser().parse(getClass().getResourceAsStream("StringPair.avsc"));
            Schema newSchema = new Schema.Parser().parse(getClass().getResourceAsStream("NewStringPair.avsc"));

            ByteArrayOutputStream out = new ByteArrayOutputStream();

            DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
            Encoder encoder = EncoderFactory.get().binaryEncoder(out, null /* reuse */);
            GenericRecord datum = new GenericData.Record(schema); // no description
            datum.put("left", "L");
            datum.put("right", "R");
            writer.write(datum, encoder);
            encoder.flush();

            // vv AvroSchemaResolution
            DatumReader<GenericRecord> reader = /*[*/new GenericDatumReader<GenericRecord>(schema, newSchema);/*]*/
            Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            GenericRecord result = reader.read(null, decoder);
            assertThat(result.get("left").toString(), is("L"));
            assertThat(result.get("right").toString(), is("R"));
            /*[*/assertThat(result.get("description").toString(), is(""));/*]*/
            // ^^ AvroSchemaResolution
        }
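
        // Added sketch (not part of the original listing): a reconstruction, inferred from the
        // assertions above, of what NewStringPair.avsc must contain -- the same record name as
        // StringPair.avsc, the original two fields, and an added "description" field whose
        // "default" of "" is what the resolving reader supplies when the writer's data lacks
        // the field.
        private static final Schema NEW_STRING_PAIR_SKETCH = new Schema.Parser().parse("{"
                + "  \"type\": \"record\"," + "  \"name\": \"StringPair\"," + "  \"fields\": ["
                + "    {\"name\": \"left\", \"type\": \"string\"}," + "    {\"name\": \"right\", \"type\": \"string\"},"
                + "    {\"name\": \"description\", \"type\": \"string\", \"default\": \"\"}" + "  ]" + "}");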

        @Test
        public void testSchemaResolutionWithAliases() throws IOException {
            Schema schema = new Schema.Parser().parse(getClass().getResourceAsStream("StringPair.avsc"));
            Schema newSchema = new Schema.Parser().parse(getClass().getResourceAsStream("AliasedStringPair.avsc"));

            ByteArrayOutputStream out = new ByteArrayOutputStream();

            DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
            Encoder encoder = EncoderFactory.get().binaryEncoder(out, null /* reuse */);
            GenericRecord datum = new GenericData.Record(schema);
            datum.put("left", "L");
            datum.put("right", "R");
            writer.write(datum, encoder);
            encoder.flush();

            DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema, newSchema);
            Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            GenericRecord result = reader.read(null, decoder);
            assertThat(result.get("first").toString(), is("L"));
            assertThat(result.get("second").toString(), is("R"));

            // old field names don't work
            assertThat(result.get("left"), is((Object) null));
            assertThat(result.get("right"), is((Object) null));
        }

        @Test
        public void testSchemaResolutionWithNull() throws IOException {
            Schema schema = new Schema.Parser().parse(getClass().getResourceAsStream("StringPair.avsc"));
            Schema newSchema = new Schema.Parser().parse(getClass().getResourceAsStream("NewStringPairWithNull.avsc"));

            ByteArrayOutputStream out = new ByteArrayOutputStream();

            DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
            Encoder encoder = EncoderFactory.get().binaryEncoder(out, null /* reuse */);
            GenericRecord datum = new GenericData.Record(schema); // no description
            datum.put("left", "L");
            datum.put("right", "R");
            writer.write(datum, encoder);
            encoder.flush();

            DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema, newSchema); // write schema, read schema
            Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            GenericRecord result = reader.read(null, decoder);
            assertThat(result.get("left").toString(), is("L"));
            assertThat(result.get("right").toString(), is("R"));
            assertThat(result.get("description"), is((Object) null));
        }

        @Test
        public void testIncompatibleSchemaResolution() throws IOException {
            Schema schema = new Schema.Parser().parse(getClass().getResourceAsStream("StringPair.avsc"));
            Schema newSchema = new Schema.Parser().parse("{\"type\": \"array\", \"items\": \"string\"}");

            ByteArrayOutputStream out = new ByteArrayOutputStream();

            DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
            Encoder encoder = EncoderFactory.get().binaryEncoder(out, null /* reuse */);
            GenericRecord datum = new GenericData.Record(schema); // no description
            datum.put("left", "L");
            datum.put("right", "R");
            writer.write(datum, encoder);
            encoder.flush();

            DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema, newSchema); // write schema, read schema
            Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            try {
                reader.read(null, decoder);
                fail("Expected AvroTypeException");
            } catch (AvroTypeException e) {
                // expected
            }

        }

        @Test
        public void testSchemaResolutionWithDataFile() throws IOException {
            Schema schema = new Schema.Parser().parse(getClass().getResourceAsStream("StringPair.avsc"));
            Schema newSchema = new Schema.Parser().parse(getClass().getResourceAsStream("NewStringPair.avsc"));

            File file = new File("data.avro");

            DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
            DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(writer);
            dataFileWriter.create(schema, file);
            GenericRecord datum = new GenericData.Record(schema);
            datum.put("left", "L");
            datum.put("right", "R");
            dataFileWriter.append(datum);
            dataFileWriter.close();

            // vv AvroSchemaResolutionWithDataFile
            DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(/*[*/null/*]*/, newSchema);
            // ^^ AvroSchemaResolutionWithDataFile
            DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, reader);
            assertThat(schema, is(dataFileReader.getSchema())); // schema is the actual (write) schema

            assertThat(dataFileReader.hasNext(), is(true));
            GenericRecord result = dataFileReader.next();
            assertThat(result.get("left").toString(), is("L"));
            assertThat(result.get("right").toString(), is("R"));
            assertThat(result.get("description").toString(), is(""));
            assertThat(dataFileReader.hasNext(), is(false));

            file.delete();
        }

    }

    //=*=*=*=*
    //./ch05/src/main/examples/ConfigurationPrinterSystem.java.input.txt
    hadoop -Dcolor=yellow ConfigurationPrinter | grep color
    //=*=*=*=*
    //./ch05/src/main/examples/ConfigurationPrinterWithConf.java.input.txt
    hadoop ConfigurationPrinter -conf conf/hadoop-localhost.xml \
      | grep mapred.job.tracker=
    //=*=*=*=*
    //./ch05/src/main/examples/ConfigurationPrinterWithConf.java.output.txt
    mapred.job.tracker=localhost:8021

    //=*=*=*=*
    //./ch05/src/main/examples/ConfigurationPrinterWithConfAndD.java.input.txt
    hadoop ConfigurationPrinter -conf conf/hadoop-localhost.xml \
      -D mapred.job.tracker=example.com:8021 \
      | grep mapred.job.tracker
    //=*=*=*=*
    //./ch05/src/main/examples/ConfigurationPrinterWithD.java.input.txt
    hadoop ConfigurationPrinter -D color=yellow | grep color
    //=*=*=*=*
    //./ch05/src/main/examples/ConfigurationPrinterWithD.java.output.txt
    color=yellow

    //=*=*=*=*
    //./ch05/src/main/examples/MaxTemperatureDriver.java.input.txt
    hadoop jar hadoop-examples.jar v3.MaxTemperatureDriver \
      -conf conf/hadoop-cluster.xml input/ncdc/all max-temp
    //=*=*=*=*
    //./ch05/src/main/java/ConfigurationPrinter.java
    // cc ConfigurationPrinter An example Tool implementation for printing the properties in a Configuration
    import java.util.Map.Entry;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.util.*;

    // vv ConfigurationPrinter
    public class ConfigurationPrinter extends Configured implements Tool {

        static {
            Configuration.addDefaultResource("hdfs-default.xml");
            Configuration.addDefaultResource("hdfs-site.xml");
            Configuration.addDefaultResource("mapred-default.xml");
            Configuration.addDefaultResource("mapred-site.xml");
        }

        @Override
        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            for (Entry<String, String> entry : conf) {
                System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
            }
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new ConfigurationPrinter(), args);
            System.exit(exitCode);
        }
    }
    // ^^ ConfigurationPrinter
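
    // Added sketch (not part of the original listing): ToolRunner hands the command line to
    // GenericOptionsParser, which is what turns a "-D property=value" argument into an entry
    // in the Configuration that ConfigurationPrinter then prints. The property name "color"
    // and the class name are illustrative choices.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class GenericOptionsSketch {

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] remaining = new GenericOptionsParser(conf, new String[] { "-D", "color=yellow" })
                    .getRemainingArgs();
            System.out.println(conf.get("color")); // prints yellow
            System.out.println(remaining.length); // prints 0 -- the -D pair was consumed
        }
    }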

    //=*=*=*=*
    //./ch05/src/main/java/LoggingDriver.java
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class LoggingDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            Job job = new Job(getConf(), "Logging job");
            job.setJarByClass(getClass());

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(LoggingIdentityMapper.class);
            job.setNumReduceTasks(0);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new LoggingDriver(), args);
            System.exit(exitCode);
        }
    }
    //=*=*=*=*
    //./ch05/src/main/java/LoggingIdentityMapper.java
    //cc LoggingIdentityMapper An identity mapper that writes to standard output and also uses the Apache Commons Logging API
    import java.io.IOException;

    //vv LoggingIdentityMapper
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LoggingIdentityMapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
            extends Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

        private static final Log LOG = LogFactory.getLog(LoggingIdentityMapper.class);

        @Override
        public void map(KEYIN key, VALUEIN value, Context context) throws IOException, InterruptedException {
            // Log to stdout file
            System.out.println("Map key: " + key);

            // Log to syslog file
            LOG.info("Map key: " + key);
            if (LOG.isDebugEnabled()) {
                LOG.debug("Map value: " + value);
            }
            context.write((KEYOUT) key, (VALUEOUT) value);
        }
    }
    //^^ LoggingIdentityMapper
    //=*=*=*=*
    //./ch05/src/main/java/v1/MaxTemperatureMapper.java
    package v1;
    // cc MaxTemperatureMapperV1 First version of a Mapper that passes MaxTemperatureMapperTest
    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.*;

    //vv MaxTemperatureMapperV1
    public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            String line = value.toString();
            String year = line.substring(15, 19);
            int airTemperature = Integer.parseInt(line.substring(87, 92));
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
    //^^ MaxTemperatureMapperV1

    //=*=*=*=*
    //./ch05/src/main/java/v1/MaxTemperatureReducer.java
    package v1;
    //cc MaxTemperatureReducerV1 Reducer for maximum temperature example
    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // vv MaxTemperatureReducerV1
    public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {

            int maxValue = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }
    // ^^ MaxTemperatureReducerV1

    //=*=*=*=*
    //./ch05/src/main/java/v2/MaxTemperatureDriver.java
    package v2;

    // cc MaxTemperatureDriverV2 Application to find the maximum temperature
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    import v1.MaxTemperatureReducer;

    // vv MaxTemperatureDriverV2
    public class MaxTemperatureDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            Job job = new Job(getConf(), "Max temperature");
            job.setJarByClass(getClass());

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
            System.exit(exitCode);
        }
    }
    // ^^ MaxTemperatureDriverV2

    //=*=*=*=*
    //./ch05/src/main/java/v2/MaxTemperatureMapper.java
    package v2;
    //== MaxTemperatureMapperV2

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        //vv MaxTemperatureMapperV2
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            String line = value.toString();
            String year = line.substring(15, 19);
            /*[*/String temp = line.substring(87, 92);
            if (!missing(temp)) {/*]*/
                int airTemperature = Integer.parseInt(temp);
                context.write(new Text(year), new IntWritable(airTemperature));
                /*[*/} /*]*/
        }

        /*[*/private boolean missing(String temp) {
            return temp.equals("+9999");
        }/*]*/
        //^^ MaxTemperatureMapperV2
    }

    //=*=*=*=*
    //./ch05/src/main/java/v3/MaxTemperatureDriver.java
    package v3;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    import v1.MaxTemperatureReducer;

    // Identical to v2 except for v3 mapper
    public class MaxTemperatureDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            Job job = new Job(getConf(), "Max temperature");
            job.setJarByClass(getClass());

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch05/src/main/java/v3/MaxTemperatureMapper.java
    package v3;
    // cc MaxTemperatureMapperV3 A Mapper that uses a utility class to parse records

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // vv MaxTemperatureMapperV3
    public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        /*[*/private NcdcRecordParser parser = new NcdcRecordParser();/*]*/

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            /*[*/parser.parse(value);/*]*/
            if (/*[*/parser.isValidTemperature()/*]*/) {
                context.write(new Text(/*[*/parser.getYear()/*]*/),
                        new IntWritable(/*[*/parser.getAirTemperature()/*]*/));
            }
        }
    }
    // ^^ MaxTemperatureMapperV3

    //=*=*=*=*
    //./ch05/src/main/java/v3/NcdcRecordParser.java
    // cc NcdcRecordParserV3 A class for parsing weather records in NCDC format
    package v3;
    import org.apache.hadoop.io.Text;

    // vv NcdcRecordParserV3
    public class NcdcRecordParser {

        private static final int MISSING_TEMPERATURE = 9999;

        private String year;
        private int airTemperature;
        private String quality;

        public void parse(String record) {
            year = record.substring(15, 19);
            String airTemperatureString;
            // Remove the leading plus sign, as parseInt doesn't accept it
            if (record.charAt(87) == '+') {
                airTemperatureString = record.substring(88, 92);
            } else {
                airTemperatureString = record.substring(87, 92);
            }
            airTemperature = Integer.parseInt(airTemperatureString);
            quality = record.substring(92, 93);
        }

        public void parse(Text record) {
            parse(record.toString());
        }

        public boolean isValidTemperature() {
            return airTemperature != MISSING_TEMPERATURE && quality.matches("[01459]");
        }

        public String getYear() {
            return year;
        }

        public int getAirTemperature() {
            return airTemperature;
        }
    }
    // ^^ NcdcRecordParserV3

    //=*=*=*=*
    //./ch05/src/main/java/v4/MaxTemperatureDriver.java
    package v4;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    import v1.MaxTemperatureReducer;

    //Identical to v3 except for v4 mapper
    public class MaxTemperatureDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            Job job = new Job(getConf(), "Max temperature");
            job.setJarByClass(getClass());

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch05/src/main/java/v4/MaxTemperatureMapper.java
    // == MaxTemperatureMapperV4
    package v4;

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    import v3.NcdcRecordParser;

    //vv MaxTemperatureMapperV4
    public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        /*[*/enum Temperature {
            OVER_100
        }/*]*/

        private NcdcRecordParser parser = new NcdcRecordParser();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            parser.parse(value);
            if (parser.isValidTemperature()) {
                int airTemperature = parser.getAirTemperature();
                /*[*/if (airTemperature > 1000) {
                    System.err.println("Temperature over 100 degrees for input: " + value);
                    context.setStatus("Detected possibly corrupt record: see logs.");
                    context.getCounter(Temperature.OVER_100).increment(1);
                } /*]*/
                context.write(new Text(parser.getYear()), new IntWritable(airTemperature));
            }
        }
    }
    //^^ MaxTemperatureMapperV4

    //=*=*=*=*
    //./ch05/src/main/java/v5/MaxTemperatureDriver.java
    package v5;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    import v1.MaxTemperatureReducer;

    //Identical to v4 except for v5 mapper
    public class MaxTemperatureDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            Job job = new Job(getConf(), "Max temperature");
            job.setJarByClass(getClass());

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch05/src/main/java/v5/MaxTemperatureMapper.java
    package v5;
    // cc MaxTemperatureMapperV5 Mapper for maximum temperature example
    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // vv MaxTemperatureMapperV5
    public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        enum Temperature {
            MALFORMED
        }

        private NcdcRecordParser parser = new NcdcRecordParser();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            parser.parse(value);
            if (parser.isValidTemperature()) {
                int airTemperature = parser.getAirTemperature();
                context.write(new Text(parser.getYear()), new IntWritable(airTemperature));
            } else if (parser.isMalformedTemperature()) {
                System.err.println("Ignoring possibly corrupt input: " + value);
                context.getCounter(Temperature.MALFORMED).increment(1);
            }
        }
    }
    // ^^ MaxTemperatureMapperV5

    //=*=*=*=*
    //./ch05/src/main/java/v5/NcdcRecordParser.java
    package v5;
    import org.apache.hadoop.io.Text;

    public class NcdcRecordParser {

        private static final int MISSING_TEMPERATURE = 9999;

        private String year;
        private int airTemperature;
        private boolean airTemperatureMalformed;
        private String quality;

        public void parse(String record) {
            year = record.substring(15, 19);
            airTemperatureMalformed = false;
            // Remove the leading plus sign, as parseInt doesn't accept it
            if (record.charAt(87) == '+') {
                airTemperature = Integer.parseInt(record.substring(88, 92));
            } else if (record.charAt(87) == '-') {
                airTemperature = Integer.parseInt(record.substring(87, 92));
            } else {
                airTemperatureMalformed = true;
            }
            quality = record.substring(92, 93);
        }

        public void parse(Text record) {
            parse(record.toString());
        }

        public boolean isValidTemperature() {
            return !airTemperatureMalformed && airTemperature != MISSING_TEMPERATURE && quality.matches("[01459]");
        }

        public boolean isMalformedTemperature() {
            return airTemperatureMalformed;
        }

        public String getYear() {
            return year;
        }

        public int getAirTemperature() {
            return airTemperature;
        }
    }
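
    // A quick usage sketch (not from the original file): parsing the sample NCDC
    // line used in the tests in this chapter gives year "1950" and an air
    // temperature of -11 (the raw NCDC value, in tenths of a degree Celsius).
    class NcdcRecordParserDemo {

        public static void main(String[] args) {
            NcdcRecordParser parser = new NcdcRecordParser();
            parser.parse("0043011990999991950051518004+68750+023550FM-12+0382"
                    + "99999V0203201N00261220001CN9999999N9-00111+99999999999");
            System.out.println(parser.getYear()); // 1950
            System.out.println(parser.getAirTemperature()); // -11
            System.out.println(parser.isValidTemperature()); // true
        }
    }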

    //=*=*=*=*
    //./ch05/src/main/java/v6/MaxTemperatureDriver.java
    package v6;
    //== MaxTemperatureDriverV6

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    import v1.MaxTemperatureReducer;
    import v5.MaxTemperatureMapper;

    //Identical to v5 except for profiling configuration
    public class MaxTemperatureDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            //vv MaxTemperatureDriverV6
            Configuration conf = getConf();
            conf.setBoolean("mapred.task.profile", true);
            conf.set("mapred.task.profile.params",
                    "-agentlib:hprof=cpu=samples," + "heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");
            conf.set("mapred.task.profile.maps", "0-2");
            conf.set("mapred.task.profile.reduces", ""); // no reduces
            Job job = new Job(conf, "Max temperature");
            //^^ MaxTemperatureDriverV6

            // The following alternative is only available from 0.21 onwards
            //    conf.setBoolean(JobContext.TASK_PROFILE, true);
            //    conf.set(JobContext.TASK_PROFILE_PARAMS, "-agentlib:hprof=cpu=samples," +
            //        "heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");
            //    conf.set(JobContext.NUM_MAP_PROFILES, "0-2");
            //    conf.set(JobContext.NUM_REDUCE_PROFILES, "");

            job.setJarByClass(getClass());

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch05/src/main/java/v7/MaxTemperatureDriver.java
    package v7;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    import v1.MaxTemperatureReducer;

    //Identical to v5 except for v7 mapper
    public class MaxTemperatureDriver extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            Job job = new Job(getConf(), "Max temperature");
            job.setJarByClass(getClass());

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            job.setMapperClass(MaxTemperatureMapper.class);
            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureDriver(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch05/src/main/java/v7/MaxTemperatureMapper.java
    package v7;
    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    import v5.NcdcRecordParser;

    public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        enum Temperature {
            MALFORMED
        }

        private NcdcRecordParser parser = new NcdcRecordParser();
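        // Reusing a single Text and IntWritable instance across map() calls (rather
        // than allocating new objects for every record) is the change highlighted
        // in this revision of the mapper.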
        /*[*/private Text year = new Text();
        private IntWritable temp = new IntWritable();/*]*/

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            parser.parse(value);
            if (parser.isValidTemperature()) {
                /*[*/year.set(parser.getYear());
                temp.set(parser.getAirTemperature());
                context.write(year, temp);/*]*/
            } else if (parser.isMalformedTemperature()) {
                System.err.println("Ignoring possibly corrupt input: " + value);
                context.getCounter(Temperature.MALFORMED).increment(1);
            }
        }
    }

    //=*=*=*=*
    //./ch05/src/test/java/MultipleResourceConfigurationTest.java
    // == MultipleResourceConfigurationTest
    // == MultipleResourceConfigurationTest-Override
    // == MultipleResourceConfigurationTest-Final
    // == MultipleResourceConfigurationTest-Expansion
    // == MultipleResourceConfigurationTest-SystemExpansion
    // == MultipleResourceConfigurationTest-NoSystemByDefault
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.junit.Test;

    public class MultipleResourceConfigurationTest {

        @Test
        public void get() throws IOException {
            // Single test as an expedient for inclusion in the book

            // vv MultipleResourceConfigurationTest
            Configuration conf = new Configuration();
            conf.addResource("configuration-1.xml");
            conf.addResource("configuration-2.xml");
            // ^^ MultipleResourceConfigurationTest

            assertThat(conf.get("color"), is("yellow"));

            // override
            // vv MultipleResourceConfigurationTest-Override
            assertThat(conf.getInt("size", 0), is(12));
            // ^^ MultipleResourceConfigurationTest-Override

            // final properties cannot be overridden
            // vv MultipleResourceConfigurationTest-Final
            assertThat(conf.get("weight"), is("heavy"));
            // ^^ MultipleResourceConfigurationTest-Final

            // variable expansion
            // vv MultipleResourceConfigurationTest-Expansion
            assertThat(conf.get("size-weight"), is("12,heavy"));
            // ^^ MultipleResourceConfigurationTest-Expansion

            // variable expansion with system properties
            // vv MultipleResourceConfigurationTest-SystemExpansion
            System.setProperty("size", "14");
            assertThat(conf.get("size-weight"), is("14,heavy"));
            // ^^ MultipleResourceConfigurationTest-SystemExpansion

            // system properties alone are not picked up: the property must also be
            // defined in the configuration
            // vv MultipleResourceConfigurationTest-NoSystemByDefault
            System.setProperty("length", "2");
            assertThat(conf.get("length"), is((String) null));
            // ^^ MultipleResourceConfigurationTest-NoSystemByDefault

        }

    }
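
    // A small side sketch (not part of the original test): the same expansion
    // behaviour can be seen without any XML resources, since Configuration.get()
    // expands ${...} references against other configuration properties and
    // system properties.
    class ConfigurationExpansionDemo {

        public static void main(String[] args) {
            Configuration conf = new Configuration(false); // skip default resources
            conf.set("size", "10");
            conf.set("weight", "heavy");
            conf.set("size-weight", "${size},${weight}");
            System.out.println(conf.get("size-weight")); // prints "10,heavy"
        }
    }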

    //=*=*=*=*
    //./ch05/src/test/java/SingleResourceConfigurationTest.java
    // == SingleResourceConfigurationTest
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.junit.Test;

    public class SingleResourceConfigurationTest {

        @Test
        public void get() throws IOException {
            // vv SingleResourceConfigurationTest
            Configuration conf = new Configuration();
            conf.addResource("configuration-1.xml");
            assertThat(conf.get("color"), is("yellow"));
            assertThat(conf.getInt("size", 0), is(10));
            assertThat(conf.get("breadth", "wide"), is("wide"));
            // ^^ SingleResourceConfigurationTest
        }

    }

    //=*=*=*=*
    //./ch05/src/test/java/v1/MaxTemperatureMapperTest.java
    package v1;
    // cc MaxTemperatureMapperTestV1 Unit test for MaxTemperatureMapper
    // == MaxTemperatureMapperTestV1Missing
    // vv MaxTemperatureMapperTestV1
    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.*;

    public class MaxTemperatureMapperTest {

        @Test
        public void processesValidRecord() throws IOException, InterruptedException {
            Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
            // Year ^^^^
                    "99999V0203201N00261220001CN9999999N9-00111+99999999999");
            // Temperature ^^^^^
            new MapDriver<LongWritable, Text, Text, IntWritable>().withMapper(new MaxTemperatureMapper())
                    .withInputValue(value).withOutput(new Text("1950"), new IntWritable(-11)).runTest();
        }

        // ^^ MaxTemperatureMapperTestV1
        @Ignore // since we are showing a failing test in the book
        // vv MaxTemperatureMapperTestV1Missing
        @Test
        public void ignoresMissingTemperatureRecord() throws IOException, InterruptedException {
            Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
            // Year ^^^^
                    "99999V0203201N00261220001CN9999999N9+99991+99999999999");
            // Temperature ^^^^^
            new MapDriver<LongWritable, Text, Text, IntWritable>().withMapper(new MaxTemperatureMapper())
                    .withInputValue(value).runTest();
        }

        // ^^ MaxTemperatureMapperTestV1Missing
        @Test
        public void processesMalformedTemperatureRecord() throws IOException, InterruptedException {
            Text value = new Text("0335999999433181957042302005+37950+139117SAO  +0004" +
            // Year ^^^^
                    "RJSN V02011359003150070356999999433201957010100005+353");
            // Temperature ^^^^^
            new MapDriver<LongWritable, Text, Text, IntWritable>().withMapper(new MaxTemperatureMapper())
                    .withInputValue(value).withOutput(new Text("1957"), new IntWritable(1957)).runTest();
        }
        // vv MaxTemperatureMapperTestV1
    }
    // ^^ MaxTemperatureMapperTestV1

    //=*=*=*=*
    //./ch05/src/test/java/v1/MaxTemperatureReducerTest.java
    package v1;
    // == MaxTemperatureReducerTestV1
    import java.io.IOException;
    import java.util.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
    import org.junit.*;

    public class MaxTemperatureReducerTest {

        //vv MaxTemperatureReducerTestV1
        @Test
        public void returnsMaximumIntegerInValues() throws IOException, InterruptedException {
            new ReduceDriver<Text, IntWritable, Text, IntWritable>().withReducer(new MaxTemperatureReducer())
                    .withInputKey(new Text("1950"))
                    .withInputValues(Arrays.asList(new IntWritable(10), new IntWritable(5)))
                    .withOutput(new Text("1950"), new IntWritable(10)).runTest();
        }
        //^^ MaxTemperatureReducerTestV1
    }

    //=*=*=*=*
    //./ch05/src/test/java/v2/MaxTemperatureMapperTest.java
    package v2;

    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Test;

    public class MaxTemperatureMapperTest {

        @Test
        public void processesValidRecord() throws IOException, InterruptedException {
            Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
            // Year ^^^^
                    "99999V0203201N00261220001CN9999999N9-00111+99999999999");
            // Temperature ^^^^^
            new MapDriver<LongWritable, Text, Text, IntWritable>().withMapper(new MaxTemperatureMapper())
                    .withInputValue(value).withOutput(new Text("1950"), new IntWritable(-11)).runTest();
        }

        @Test
        public void ignoresMissingTemperatureRecord() throws IOException, InterruptedException {
            Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
            // Year ^^^^
                    "99999V0203201N00261220001CN9999999N9+99991+99999999999");
            // Temperature ^^^^^
            new MapDriver<LongWritable, Text, Text, IntWritable>().withMapper(new MaxTemperatureMapper())
                    .withInputValue(value).runTest();
        }
    }

    //=*=*=*=*
    //./ch05/src/test/java/v3/MaxTemperatureDriverMiniTest.java
    package v3;

    import static org.hamcrest.Matchers.is;
    import static org.hamcrest.Matchers.nullValue;
    import static org.junit.Assert.assertThat;

    import java.io.BufferedReader;
    import java.io.InputStream;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;
    import org.apache.hadoop.mapred.ClusterMapReduceTestCase;

    // A test for MaxTemperatureDriver that runs in a "mini" HDFS and MapReduce cluster
    public class MaxTemperatureDriverMiniTest extends ClusterMapReduceTestCase {

        public static class OutputLogFilter implements PathFilter {
            public boolean accept(Path path) {
                return !path.getName().startsWith("_");
            }
        }

        @Override
        protected void setUp() throws Exception {
            if (System.getProperty("test.build.data") == null) {
                System.setProperty("test.build.data", "/tmp");
            }
            if (System.getProperty("hadoop.log.dir") == null) {
                System.setProperty("hadoop.log.dir", "/tmp");
            }
            super.setUp();
        }

        // Not marked with @Test since ClusterMapReduceTestCase is a JUnit 3 test case
        public void test() throws Exception {
            Configuration conf = createJobConf();

            Path localInput = new Path("input/ncdc/micro");
            Path input = getInputDir();
            Path output = getOutputDir();

            // Copy input data into test HDFS
            getFileSystem().copyFromLocalFile(localInput, input);

            MaxTemperatureDriver driver = new MaxTemperatureDriver();
            driver.setConf(conf);

            int exitCode = driver.run(new String[] { input.toString(), output.toString() });
            assertThat(exitCode, is(0));

            // Check the output is as expected
            Path[] outputFiles = FileUtil.stat2Paths(getFileSystem().listStatus(output, new OutputLogFilter()));
            assertThat(outputFiles.length, is(1));

            InputStream in = getFileSystem().open(outputFiles[0]);
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            assertThat(reader.readLine(), is("1949\t111"));
            assertThat(reader.readLine(), is("1950\t22"));
            assertThat(reader.readLine(), nullValue());
            reader.close();
        }
    }

    //=*=*=*=*
    //./ch05/src/test/java/v3/MaxTemperatureDriverTest.java
    package v3;
    // cc MaxTemperatureDriverTestV3 A test for MaxTemperatureDriver that uses a local, in-process job runner
    import static org.hamcrest.Matchers.is;
    import static org.hamcrest.Matchers.nullValue;
    import static org.junit.Assert.assertThat;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;
    import org.junit.Test;

    public class MaxTemperatureDriverTest {

        public static class OutputLogFilter implements PathFilter {
            public boolean accept(Path path) {
                return !path.getName().startsWith("_");
            }
        }

        //vv MaxTemperatureDriverTestV3
        @Test
        public void test() throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "file:///");
            conf.set("mapred.job.tracker", "local");

            Path input = new Path("input/ncdc/micro");
            Path output = new Path("output");

            FileSystem fs = FileSystem.getLocal(conf);
            fs.delete(output, true); // delete old output

            MaxTemperatureDriver driver = new MaxTemperatureDriver();
            driver.setConf(conf);

            int exitCode = driver.run(new String[] { input.toString(), output.toString() });
            assertThat(exitCode, is(0));

            checkOutput(conf, output);
        }
        //^^ MaxTemperatureDriverTestV3

        private void checkOutput(Configuration conf, Path output) throws IOException {
            FileSystem fs = FileSystem.getLocal(conf);
            Path[] outputFiles = FileUtil.stat2Paths(fs.listStatus(output, new OutputLogFilter()));
            assertThat(outputFiles.length, is(1));

            BufferedReader actual = asBufferedReader(fs.open(outputFiles[0]));
            BufferedReader expected = asBufferedReader(getClass().getResourceAsStream("/expected.txt"));
            String expectedLine;
            while ((expectedLine = expected.readLine()) != null) {
                assertThat(actual.readLine(), is(expectedLine));
            }
            assertThat(actual.readLine(), nullValue());
            actual.close();
            expected.close();
        }

        private BufferedReader asBufferedReader(InputStream in) throws IOException {
            return new BufferedReader(new InputStreamReader(in));
        }
    }

    //=*=*=*=*
    //./ch05/src/test/java/v3/MaxTemperatureMapperTest.java
    package v3;

    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Test;

    public class MaxTemperatureMapperTest {

        @Test
        public void processesValidRecord() throws IOException, InterruptedException {
            Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
            // Year ^^^^
                    "99999V0203201N00261220001CN9999999N9-00111+99999999999");
            // Temperature ^^^^^
            new MapDriver<LongWritable, Text, Text, IntWritable>().withMapper(new MaxTemperatureMapper())
                    .withInputValue(value).withOutput(new Text("1950"), new IntWritable(-11)).runTest();
        }

        @Test
        public void processesPositiveTemperatureRecord() throws IOException, InterruptedException {
            Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
            // Year ^^^^
                    "99999V0203201N00261220001CN9999999N9+00111+99999999999");
            // Temperature ^^^^^
            new MapDriver<LongWritable, Text, Text, IntWritable>().withMapper(new MaxTemperatureMapper())
                    .withInputValue(value).withOutput(new Text("1950"), new IntWritable(11)).runTest();
        }

        @Test
        public void ignoresMissingTemperatureRecord() throws IOException, InterruptedException {
            Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
            // Year ^^^^
                    "99999V0203201N00261220001CN9999999N9+99991+99999999999");
            // Temperature ^^^^^
            new MapDriver<LongWritable, Text, Text, IntWritable>().withMapper(new MaxTemperatureMapper())
                    .withInputValue(value).runTest();
        }

        @Test
        public void ignoresSuspectQualityRecord() throws IOException, InterruptedException {
            Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
            // Year ^^^^
                    "99999V0203201N00261220001CN9999999N9+00112+99999999999");
            // Temperature ^^^^^
            // Suspect quality ^
            new MapDriver<LongWritable, Text, Text, IntWritable>().withMapper(new MaxTemperatureMapper())
                    .withInputValue(value).runTest();
        }

    }

    //=*=*=*=*
    //./ch05/src/test/java/v5/MaxTemperatureMapperTest.java
    package v5;
    // == MaxTemperatureMapperTestV5Malformed
    import static org.hamcrest.Matchers.is;
    import static org.junit.Assert.assertThat;
    import java.io.IOException;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.junit.Test;

public class MaxTemperatureMapperTest {

  @Test
  public void parsesValidRecord() throws IOException, InterruptedException {
    Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
                                  // Year ^^^^
        "99999V0203201N00261220001CN9999999N9-00111+99999999999");
                              // Temperature ^^^^^
    new MapDriver<LongWritable, Text, Text, IntWritable>()
      .withMapper(new MaxTemperatureMapper())
      .withInputValue(value)
      .withOutput(new Text("1950"), new IntWritable(-11))
      .runTest();
  }
  
  @Test
  public void parsesMissingTemperature() throws IOException,
      InterruptedException {
    Text value = new Text("0043011990999991950051518004+68750+023550FM-12+0382" +
                                  // Year ^^^^
        "99999V0203201N00261220001CN9999999N9+99991+99999999999");
                              // Temperature ^^^^^
    new MapDriver<LongWritable, Text, Text, IntWritable>()
      .withMapper(new MaxTemperatureMapper())
      .withInputValue(value)
      .runTest();
  }
//vv MaxTemperatureMapperTestV5Malformed
  @Test
  public void parsesMalformedTemperature() throws IOException,
      InterruptedException {
    Text value = new Text("0335999999433181957042302005+37950+139117SAO  +0004" +
                                  // Year ^^^^
        "RJSN V02011359003150070356999999433201957010100005+353");
                              // Temperature ^^^^^
    Counters counters = new Counters();
    new MapDriver<LongWritable, Text, Text, IntWritable>()
      .withMapper(new MaxTemperatureMapper())
      .withInputValue(value)
      .withCounters(counters)
      .runTest();
    Counter c = counters.findCounter(MaxTemperatureMapper.Temperature.MALFORMED);
    assertThat(c.getValue(), is(1L));
  }
// ^^ MaxTemperatureMapperTestV5Malformed
}

//=*=*=*=*
//./ch07/src/main/examples/MinimalMapReduce.java.input.txt
hadoop MinimalMapReduce "input/ncdc/all/190{1,2}.gz" output
//=*=*=*=*
//./ch07/src/main/examples/PartitionByStationUsingMultipleOutputFormat.java.input.txt
hadoop jar hadoop-examples.jar PartitionByStationUsingMultipleOutputFormat 'input/ncdc/all/190?.gz' output-part-by-station
//=*=*=*=*
//./ch07/src/main/examples/SmallFilesToSequenceFileConverter.java.input.txt
hadoop jar hadoop-examples.jar SmallFilesToSequenceFileConverter \
  -conf conf/hadoop-localhost.xml -D mapred.reduce.tasks=2 input/smallfiles output
//=*=*=*=*
//./ch07/src/main/java/MaxTemperatureWithMultipleInputs.java
// == MaxTemperatureWithMultipleInputs
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.*;

    public class MaxTemperatureWithMultipleInputs extends Configured implements Tool {

        static class MetOfficeMaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

            private MetOfficeRecordParser parser = new MetOfficeRecordParser();

            @Override
            protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                parser.parse(value);
                if (parser.isValidTemperature()) {
                    context.write(new Text(parser.getYear()), new IntWritable(parser.getAirTemperature()));
                }
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 3) {
                JobBuilder.printUsage(this, "<ncdc input> <metoffice input> <output>");
                return -1;
            }

            Job job = new Job(getConf(), "Max temperature with multiple input formats");
            job.setJarByClass(getClass());

            Path ncdcInputPath = new Path(args[0]);
            Path metOfficeInputPath = new Path(args[1]);
            Path outputPath = new Path(args[2]);

            // vv MaxTemperatureWithMultipleInputs    
            MultipleInputs.addInputPath(job, ncdcInputPath, TextInputFormat.class, MaxTemperatureMapper.class);
            MultipleInputs.addInputPath(job, metOfficeInputPath, TextInputFormat.class,
                    MetOfficeMaxTemperatureMapper.class);
            // ^^ MaxTemperatureWithMultipleInputs
            FileOutputFormat.setOutputPath(job, outputPath);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureWithMultipleInputs(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch07/src/main/java/MinimalMapReduce.java
    // == MinimalMapReduce The simplest possible MapReduce driver, which uses the defaults
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // vv MinimalMapReduce
    public class MinimalMapReduce extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            Job job = new Job(getConf());
            job.setJarByClass(getClass());
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MinimalMapReduce(), args);
            System.exit(exitCode);
        }
    }
    // ^^ MinimalMapReduce

    //=*=*=*=*
    //./ch07/src/main/java/MinimalMapReduceWithDefaults.java
    // cc MinimalMapReduceWithDefaults A minimal MapReduce driver, with the defaults explicitly set
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    //vv MinimalMapReduceWithDefaults
    public class MinimalMapReduceWithDefaults extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            /*[*/job.setInputFormatClass(TextInputFormat.class);

            job.setMapperClass(Mapper.class);

            job.setMapOutputKeyClass(LongWritable.class);
            job.setMapOutputValueClass(Text.class);

            job.setPartitionerClass(HashPartitioner.class);

            job.setNumReduceTasks(1);
            job.setReducerClass(Reducer.class);

            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);

            job.setOutputFormatClass(TextOutputFormat.class);/*]*/

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MinimalMapReduceWithDefaults(), args);
            System.exit(exitCode);
        }
    }
    // ^^ MinimalMapReduceWithDefaults
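    // JobBuilder.parseInputAndOutput() is used by several drivers in this chapter
    // but is not listed in this section. A minimal sketch of what such a helper
    // might do (an assumption, not the book's actual class): check the argument
    // count, create the Job, and wire up the input and output paths.
    class JobBuilderSketch {

        public static Job parseInputAndOutput(Tool tool,
                org.apache.hadoop.conf.Configuration conf, String[] args)
                throws java.io.IOException {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n",
                        tool.getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return null;
            }
            Job job = new Job(conf);
            job.setJarByClass(tool.getClass());
            org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(job,
                    new org.apache.hadoop.fs.Path(args[0]));
            org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(job,
                    new org.apache.hadoop.fs.Path(args[1]));
            return job;
        }
    }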

    //=*=*=*=*
    //./ch07/src/main/java/NonSplittableTextInputFormat.java
    // == NonSplittableTextInputFormat
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class NonSplittableTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;
        }
    }
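
    // Usage sketch (an assumption, not part of the original file): wiring the
    // non-splittable format into a driver so that each input file is processed
    // in its entirety by a single map task.
    class NonSplittableTextInputFormatUsage {

        static void configure(org.apache.hadoop.mapreduce.Job job) {
            job.setInputFormatClass(NonSplittableTextInputFormat.class);
        }
    }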

    //=*=*=*=*
    //./ch07/src/main/java/PartitionByStationUsingMultipleOutputs.java
    // cc PartitionByStationUsingMultipleOutputs Partitions whole dataset into files named by the station ID using MultipleOutputs
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    //vv PartitionByStationUsingMultipleOutputs
    public class PartitionByStationUsingMultipleOutputs extends Configured implements Tool {

        static class StationMapper extends Mapper<LongWritable, Text, Text, Text> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            @Override
            protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                parser.parse(value);
                context.write(new Text(parser.getStationId()), value);
            }
        }

        /*[*/static class MultipleOutputsReducer extends Reducer<Text, Text, NullWritable, Text> {

            private MultipleOutputs<NullWritable, Text> multipleOutputs;

            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
            }

            @Override
            protected void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                for (Text value : values) {
                    multipleOutputs.write(NullWritable.get(), value, key.toString());
                }
            }

            @Override
            protected void cleanup(Context context) throws IOException, InterruptedException {
                multipleOutputs.close();
            }
        }/*]*/

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            job.setMapperClass(StationMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setReducerClass(MultipleOutputsReducer.class);
            job.setOutputKeyClass(NullWritable.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new PartitionByStationUsingMultipleOutputs(), args);
            System.exit(exitCode);
        }
    }
    //^^ PartitionByStationUsingMultipleOutputs
    //=*=*=*=*
    //./ch07/src/main/java/PartitionByStationYearUsingMultipleOutputs.java
    // == PartitionByStationYearUsingMultipleOutputs
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class PartitionByStationYearUsingMultipleOutputs extends Configured implements Tool {

        static class StationMapper extends Mapper<LongWritable, Text, Text, Text> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            @Override
            protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
                parser.parse(value);
                context.write(new Text(parser.getStationId()), value);
            }
        }

        static class MultipleOutputsReducer extends Reducer<Text, Text, NullWritable, Text> {

            private MultipleOutputs<NullWritable, Text> multipleOutputs;
            private NcdcRecordParser parser = new NcdcRecordParser();

            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
            }

            // vv PartitionByStationYearUsingMultipleOutputs
            @Override
            protected void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                for (Text value : values) {
                    parser.parse(value);
                    String basePath = String.format("%s/%s/part", parser.getStationId(), parser.getYear());
                    multipleOutputs.write(NullWritable.get(), value, basePath);
                }
            }
            // ^^ PartitionByStationYearUsingMultipleOutputs

            @Override
            protected void cleanup(Context context) throws IOException, InterruptedException {
                multipleOutputs.close();
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            job.setMapperClass(StationMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setReducerClass(MultipleOutputsReducer.class);
            job.setOutputKeyClass(NullWritable.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new PartitionByStationYearUsingMultipleOutputs(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch07/src/main/java/SmallFilesToSequenceFileConverter.java
    // cc SmallFilesToSequenceFileConverter A MapReduce program for packaging a collection of small files as a single SequenceFile
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    //vv SmallFilesToSequenceFileConverter
    public class SmallFilesToSequenceFileConverter extends Configured implements Tool {

        static class SequenceFileMapper extends Mapper<NullWritable, BytesWritable, Text, BytesWritable> {

            private Text filenameKey;

            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                InputSplit split = context.getInputSplit();
                Path path = ((FileSplit) split).getPath();
                filenameKey = new Text(path.toString());
            }

            @Override
            protected void map(NullWritable key, BytesWritable value, Context context)
                    throws IOException, InterruptedException {
                context.write(filenameKey, value);
            }

        }

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            job.setInputFormatClass(WholeFileInputFormat.class);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(BytesWritable.class);

            job.setMapperClass(SequenceFileMapper.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new SmallFilesToSequenceFileConverter(), args);
            System.exit(exitCode);
        }
    }
    // ^^ SmallFilesToSequenceFileConverter
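    // A follow-on sketch (not from the original sources), assuming one of the job's
    // output files sits at a path such as output/part-r-00000: it reads the
    // resulting SequenceFile back and prints each original filename key together
    // with the size of the stored file contents.
    class SequenceFileKeyLister {

        public static void main(String[] args) throws Exception {
            String uri = args[0]; // a part file produced by the job above
            org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
            org.apache.hadoop.fs.FileSystem fs =
                    org.apache.hadoop.fs.FileSystem.get(java.net.URI.create(uri), conf);
            org.apache.hadoop.fs.Path path = new org.apache.hadoop.fs.Path(uri);
            org.apache.hadoop.io.SequenceFile.Reader reader = null;
            try {
                reader = new org.apache.hadoop.io.SequenceFile.Reader(fs, path, conf);
                Text key = new Text();
                BytesWritable value = new BytesWritable();
                while (reader.next(key, value)) {
                    System.out.printf("%s\t%d bytes\n", key, value.getLength());
                }
            } finally {
                org.apache.hadoop.io.IOUtils.closeStream(reader);
            }
        }
    }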
    //=*=*=*=*
    //./ch07/src/main/java/StationPartitioner.java
    // == StationPartitioner
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    //vv StationPartitioner
    public class StationPartitioner extends Partitioner<LongWritable, Text> {

        private NcdcRecordParser parser = new NcdcRecordParser();

        @Override
        public int getPartition(LongWritable key, Text value, int numPartitions) {
            parser.parse(value);
            return getPartition(parser.getStationId());
        }

        private int getPartition(String stationId) {
            /*...*/
            // ^^ StationPartitioner
            return 0;
            // vv StationPartitioner
        }

    }
    //^^ StationPartitioner
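    // The body of getPartition(String) is elided above. A purely illustrative
    // sketch (an assumption, not the book's implementation) could hash the
    // station ID, with the bucket count passed down from the public method:
    //
    //     private int getPartition(String stationId, int numPartitions) {
    //         return (stationId.hashCode() & Integer.MAX_VALUE) % numPartitions;
    //     }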
    //=*=*=*=*
    //./ch07/src/main/java/WholeFileInputFormat.java
    // cc WholeFileInputFormat An InputFormat for reading a whole file as a record
    import java.io.IOException;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.*;

    //vv WholeFileInputFormat
    public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false;
        }

        @Override
        public RecordReader<NullWritable, BytesWritable> createRecordReader(InputSplit split,
                TaskAttemptContext context) throws IOException, InterruptedException {
            WholeFileRecordReader reader = new WholeFileRecordReader();
            reader.initialize(split, context);
            return reader;
        }
    }
    //^^ WholeFileInputFormat

    //=*=*=*=*
    //./ch07/src/main/java/WholeFileRecordReader.java
    // cc WholeFileRecordReader The RecordReader used by WholeFileInputFormat for reading a whole file as a record
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    //vv WholeFileRecordReader
    class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {

        private FileSplit fileSplit;
        private Configuration conf;
        private BytesWritable value = new BytesWritable();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) throws IOException, InterruptedException {
            this.fileSplit = (FileSplit) split;
            this.conf = context.getConfiguration();
        }

        @Override
        public boolean nextKeyValue() throws IOException, InterruptedException {
            if (!processed) {
                byte[] contents = new byte[(int) fileSplit.getLength()];
                Path file = fileSplit.getPath();
                FileSystem fs = file.getFileSystem(conf);
                FSDataInputStream in = null;
                try {
                    in = fs.open(file);
                    IOUtils.readFully(in, contents, 0, contents.length);
                    value.set(contents, 0, contents.length);
                } finally {
                    IOUtils.closeStream(in);
                }
                processed = true;
                return true;
            }
            return false;
        }

        @Override
        public NullWritable getCurrentKey() throws IOException, InterruptedException {
            return NullWritable.get();
        }

        @Override
        public BytesWritable getCurrentValue() throws IOException, InterruptedException {
            return value;
        }

        @Override
        public float getProgress() throws IOException {
            return processed ? 1.0f : 0.0f;
        }

        @Override
        public void close() throws IOException {
            // do nothing
        }
    }
    //^^ WholeFileRecordReader

    //=*=*=*=*
    //./ch07/src/main/java/oldapi/MaxTemperatureWithMultipleInputs.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.MultipleInputs;
    import org.apache.hadoop.util.*;

    public class MaxTemperatureWithMultipleInputs extends Configured implements Tool {

        static class MetOfficeMaxTemperatureMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {

            private MetOfficeRecordParser parser = new MetOfficeRecordParser();

            public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    output.collect(new Text(parser.getYear()), new IntWritable(parser.getAirTemperature()));
                }
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 3) {
                JobBuilder.printUsage(this, "<ncdc input> <metoffice input> <output>");
                return -1;
            }

            JobConf conf = new JobConf(getConf(), getClass());
            conf.setJobName("Max temperature with multiple input formats");

            Path ncdcInputPath = new Path(args[0]);
            Path metOfficeInputPath = new Path(args[1]);
            Path outputPath = new Path(args[2]);

            MultipleInputs.addInputPath(conf, ncdcInputPath, TextInputFormat.class, MaxTemperatureMapper.class);
            MultipleInputs.addInputPath(conf, metOfficeInputPath, TextInputFormat.class,
                    MetOfficeMaxTemperatureMapper.class);
            FileOutputFormat.setOutputPath(conf, outputPath);

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setCombinerClass(MaxTemperatureReducer.class);
            conf.setReducerClass(MaxTemperatureReducer.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureWithMultipleInputs(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch07/src/main/java/oldapi/MinimalMapReduce.java
    package oldapi;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class MinimalMapReduce extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.printf("Usage: %s [generic options] <input> <output>\n", getClass().getSimpleName());
                ToolRunner.printGenericCommandUsage(System.err);
                return -1;
            }

            JobConf conf = new JobConf(getConf(), getClass());
            FileInputFormat.addInputPath(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MinimalMapReduce(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch07/src/main/java/oldapi/MinimalMapReduceWithDefaults.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.*;
    import org.apache.hadoop.util.*;

    public class MinimalMapReduceWithDefaults extends Configured implements Tool {

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            /*[*/conf.setInputFormat(TextInputFormat.class);

            conf.setNumMapTasks(1);
            conf.setMapperClass(IdentityMapper.class);
            conf.setMapRunnerClass(MapRunner.class);

            conf.setMapOutputKeyClass(LongWritable.class);
            conf.setMapOutputValueClass(Text.class);

            conf.setPartitionerClass(HashPartitioner.class);

            conf.setNumReduceTasks(1);
            conf.setReducerClass(IdentityReducer.class);

            conf.setOutputKeyClass(LongWritable.class);
            conf.setOutputValueClass(Text.class);

            conf.setOutputFormat(TextOutputFormat.class);/*]*/

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MinimalMapReduceWithDefaults(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch07/src/main/java/oldapi/NonSplittableTextInputFormat.java
    package oldapi;

    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.mapred.TextInputFormat;

    public class NonSplittableTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(FileSystem fs, Path file) {
            return false;
        }
    }

    //=*=*=*=*
    //./ch07/src/main/java/oldapi/PartitionByStationUsingMultipleOutputFormat.java
    package oldapi;

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;
    import org.apache.hadoop.util.*;

    public class PartitionByStationUsingMultipleOutputFormat extends Configured implements Tool {

        static class StationMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                    throws IOException {

                parser.parse(value);
                output.collect(new Text(parser.getStationId()), value);
            }
        }

        static class StationReducer extends MapReduceBase implements Reducer<Text, Text, NullWritable, Text> {

            @Override
            public void reduce(Text key, Iterator<Text> values, OutputCollector<NullWritable, Text> output,
                    Reporter reporter) throws IOException {
                while (values.hasNext()) {
                    output.collect(NullWritable.get(), values.next());
                }
            }
        }

        /*[*/static class StationNameMultipleTextOutputFormat extends MultipleTextOutputFormat<NullWritable, Text> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            protected String generateFileNameForKeyValue(NullWritable key, Text value, String name) {
                parser.parse(value);
                return parser.getStationId();
            }
        }/*]*/

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setMapperClass(StationMapper.class);
            conf.setMapOutputKeyClass(Text.class);
            conf.setReducerClass(StationReducer.class);
            conf.setOutputKeyClass(NullWritable.class);
            conf.setOutputFormat(StationNameMultipleTextOutputFormat.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new PartitionByStationUsingMultipleOutputFormat(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch07/src/main/java/oldapi/PartitionByStationUsingMultipleOutputs.java
    package oldapi;

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.*;
    import org.apache.hadoop.util.*;

    public class PartitionByStationUsingMultipleOutputs extends Configured implements Tool {

        static class StationMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                    throws IOException {

                parser.parse(value);
                output.collect(new Text(parser.getStationId()), value);
            }
        }

        static class MultipleOutputsReducer extends MapReduceBase implements Reducer<Text, Text, NullWritable, Text> {

            private MultipleOutputs multipleOutputs;

            @Override
            public void configure(JobConf conf) {
                multipleOutputs = new MultipleOutputs(conf);
            }

            public void reduce(Text key, Iterator<Text> values, OutputCollector<NullWritable, Text> output,
                    Reporter reporter) throws IOException {

                OutputCollector collector = multipleOutputs.getCollector("station", key.toString().replace("-", ""),
                        reporter);
                while (values.hasNext()) {
                    collector.collect(NullWritable.get(), values.next());
                }
            }

            @Override
            public void close() throws IOException {
                multipleOutputs.close();
            }
        }

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setMapperClass(StationMapper.class);
            conf.setMapOutputKeyClass(Text.class);
            conf.setReducerClass(MultipleOutputsReducer.class);
            conf.setOutputKeyClass(NullWritable.class);
            conf.setOutputFormat(NullOutputFormat.class); // suppress empty part file

            MultipleOutputs.addMultiNamedOutput(conf, "station", TextOutputFormat.class, NullWritable.class,
                    Text.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new PartitionByStationUsingMultipleOutputs(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch07/src/main/java/oldapi/PartitionByStationYearUsingMultipleOutputFormat.java
    package oldapi;

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;
    import org.apache.hadoop.util.*;

    public class PartitionByStationYearUsingMultipleOutputFormat extends Configured implements Tool {

        static class StationMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                    throws IOException {

                parser.parse(value);
                output.collect(new Text(parser.getStationId()), value);
            }
        }

        static class StationReducer extends MapReduceBase implements Reducer<Text, Text, NullWritable, Text> {

            @Override
            public void reduce(Text key, Iterator<Text> values, OutputCollector<NullWritable, Text> output,
                    Reporter reporter) throws IOException {
                while (values.hasNext()) {
                    output.collect(NullWritable.get(), values.next());
                }
            }
        }

        static class StationNameMultipleTextOutputFormat extends MultipleTextOutputFormat<NullWritable, Text> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            protected String generateFileNameForKeyValue(NullWritable key, Text value, String name) {
                parser.parse(value);
                return parser.getStationId() + "/" + parser.getYear();
            }
        }

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setMapperClass(StationMapper.class);
            conf.setMapOutputKeyClass(Text.class);
            conf.setReducerClass(StationReducer.class);
            conf.setOutputKeyClass(NullWritable.class);
            conf.setOutputFormat(StationNameMultipleTextOutputFormat.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new PartitionByStationYearUsingMultipleOutputFormat(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch07/src/main/java/oldapi/SmallFilesToSequenceFileConverter.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.IdentityReducer;
    import org.apache.hadoop.util.*;

    public class SmallFilesToSequenceFileConverter extends Configured implements Tool {

        static class SequenceFileMapper extends MapReduceBase
                implements Mapper<NullWritable, BytesWritable, Text, BytesWritable> {

            private JobConf conf;

            @Override
            public void configure(JobConf conf) {
                this.conf = conf;
            }

            @Override
            public void map(NullWritable key, BytesWritable value, OutputCollector<Text, BytesWritable> output,
                    Reporter reporter) throws IOException {
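                // "map.input.file" (an old-API job property) holds the path of the file backing
                // the current split; it is used here as the key for the whole-file record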

                String filename = conf.get("map.input.file");
                output.collect(new Text(filename), value);
            }

        }

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setInputFormat(WholeFileInputFormat.class);
            conf.setOutputFormat(SequenceFileOutputFormat.class);

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(BytesWritable.class);

            conf.setMapperClass(SequenceFileMapper.class);
            conf.setReducerClass(IdentityReducer.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new SmallFilesToSequenceFileConverter(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch07/src/main/java/oldapi/StationPartitioner.java
    package oldapi;

    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    public class StationPartitioner implements Partitioner<LongWritable, Text> {

        private NcdcRecordParser parser = new NcdcRecordParser();

        @Override
        public int getPartition(LongWritable key, Text value, int numPartitions) {
            parser.parse(value);
            return getPartition(parser.getStationId());
        }

        private int getPartition(String stationId) {
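            // Placeholder: the actual mapping from station ID to partition number is omitted
            // in this example; a real implementation would compute a partition per station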
            return 0;
        }

        @Override
        public void configure(JobConf conf) {
        }
    }
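
    // Usage sketch (not part of the original listing): wiring this partitioner into an
    // old-API driver would look roughly like the following, with one reduce task per
    // possible partition so that each station ends up in its own output:
    //
    //     conf.setPartitionerClass(StationPartitioner.class);
    //     conf.setNumReduceTasks(numberOfStations); // hypothetical count of distinct station IDs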

    //=*=*=*=*
    //./ch07/src/main/java/oldapi/WholeFileInputFormat.java
    package oldapi;

    import java.io.IOException;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

        @Override
        protected boolean isSplitable(FileSystem fs, Path filename) {
            return false;
        }

        @Override
        public RecordReader<NullWritable, BytesWritable> getRecordReader(InputSplit split, JobConf job,
                Reporter reporter) throws IOException {

            return new WholeFileRecordReader((FileSplit) split, job);
        }
    }

    //=*=*=*=*
    //./ch07/src/main/java/oldapi/WholeFileRecordReader.java
    package oldapi;

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    class WholeFileRecordReader implements RecordReader<NullWritable, BytesWritable> {

        private FileSplit fileSplit;
        private Configuration conf;
        private boolean processed = false;

        public WholeFileRecordReader(FileSplit fileSplit, Configuration conf) throws IOException {
            this.fileSplit = fileSplit;
            this.conf = conf;
        }

        @Override
        public NullWritable createKey() {
            return NullWritable.get();
        }

        @Override
        public BytesWritable createValue() {
            return new BytesWritable();
        }

        @Override
        public long getPos() throws IOException {
            return processed ? fileSplit.getLength() : 0;
        }

        @Override
        public float getProgress() throws IOException {
            return processed ? 1.0f : 0.0f;
        }

        @Override
        public boolean next(NullWritable key, BytesWritable value) throws IOException {
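            // Each file is delivered as a single record: on the first call, read the whole
            // split (i.e. the whole file) into the value; afterwards signal end of input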
            if (!processed) {
                byte[] contents = new byte[(int) fileSplit.getLength()];
                Path file = fileSplit.getPath();
                FileSystem fs = file.getFileSystem(conf);
                FSDataInputStream in = null;
                try {
                    in = fs.open(file);
                    IOUtils.readFully(in, contents, 0, contents.length);
                    value.set(contents, 0, contents.length);
                } finally {
                    IOUtils.closeStream(in);
                }
                processed = true;
                return true;
            }
            return false;
        }

        @Override
        public void close() throws IOException {
            // do nothing
        }
    }

    //=*=*=*=*
    //./ch07/src/test/java/TextInputFormatsTest.java
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.*;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.NLineInputFormat;
    import org.junit.*;

    public class TextInputFormatsTest {

        private static final String BASE_PATH = "/tmp/" + TextInputFormatsTest.class.getSimpleName();

        private JobConf conf;
        private FileSystem fs;

        @Before
        public void setUp() throws Exception {
            conf = new JobConf();
            fs = FileSystem.get(conf);
        }

        @After
        public void tearDown() throws Exception {
            fs.delete(new Path(BASE_PATH), true);
        }

        @Test
        public void text() throws Exception {
            String input = "On the top of the Crumpetty Tree\n" + "The Quangle Wangle sat,\n"
                    + "But his face you could not see,\n" + "On account of his Beaver Hat.";

            writeInput(input);

            TextInputFormat format = new TextInputFormat();
            format.configure(conf);
            InputSplit[] splits = format.getSplits(conf, 1);
            RecordReader<LongWritable, Text> recordReader = format.getRecordReader(splits[0], conf, Reporter.NULL);
            checkNextLine(recordReader, 0, "On the top of the Crumpetty Tree");
            checkNextLine(recordReader, 33, "The Quangle Wangle sat,");
            checkNextLine(recordReader, 57, "But his face you could not see,");
            checkNextLine(recordReader, 89, "On account of his Beaver Hat.");
        }

        @Test
        public void keyValue() throws Exception {
            String input = "line1\tOn the top of the Crumpetty Tree\n" + "line2\tThe Quangle Wangle sat,\n"
                    + "line3\tBut his face you could not see,\n" + "line4\tOn account of his Beaver Hat.";

            writeInput(input);

            KeyValueTextInputFormat format = new KeyValueTextInputFormat();
            format.configure(conf);
            InputSplit[] splits = format.getSplits(conf, 1);
            RecordReader<Text, Text> recordReader = format.getRecordReader(splits[0], conf, Reporter.NULL);
            checkNextLine(recordReader, "line1", "On the top of the Crumpetty Tree");
            checkNextLine(recordReader, "line2", "The Quangle Wangle sat,");
            checkNextLine(recordReader, "line3", "But his face you could not see,");
            checkNextLine(recordReader, "line4", "On account of his Beaver Hat.");
        }

        @Test
        public void nLine() throws Exception {
            String input = "On the top of the Crumpetty Tree\n" + "The Quangle Wangle sat,\n"
                    + "But his face you could not see,\n" + "On account of his Beaver Hat.";

            writeInput(input);

            conf.setInt("mapred.line.input.format.linespermap", 2);
            NLineInputFormat format = new NLineInputFormat();
            format.configure(conf);
            InputSplit[] splits = format.getSplits(conf, 2);
            RecordReader<LongWritable, Text> recordReader = format.getRecordReader(splits[0], conf, Reporter.NULL);
            checkNextLine(recordReader, 0, "On the top of the Crumpetty Tree");
            checkNextLine(recordReader, 33, "The Quangle Wangle sat,");
            recordReader = format.getRecordReader(splits[1], conf, Reporter.NULL);
            checkNextLine(recordReader, 57, "But his face you could not see,");
            checkNextLine(recordReader, 89, "On account of his Beaver Hat.");
        }

        private void writeInput(String input) throws IOException {
            OutputStream out = fs.create(new Path(BASE_PATH, "input"));
            out.write(input.getBytes());
            out.close();

            FileInputFormat.setInputPaths(conf, BASE_PATH);
        }

        private void checkNextLine(RecordReader<LongWritable, Text> recordReader, long expectedKey,
                String expectedValue) throws IOException {
            LongWritable key = new LongWritable();
            Text value = new Text();
            assertThat(expectedValue, recordReader.next(key, value), is(true));
            assertThat(key.get(), is(expectedKey));
            assertThat(value.toString(), is(expectedValue));
        }

        private void checkNextLine(RecordReader<Text, Text> recordReader, String expectedKey, String expectedValue)
                throws IOException {
            Text key = new Text();
            Text value = new Text();
            assertThat(expectedValue, recordReader.next(key, value), is(true));
            assertThat(key.toString(), is(expectedKey));
            assertThat(value.toString(), is(expectedValue));
        }
    }

    //=*=*=*=*
    //./ch08/src/main/examples/LookupRecordByTemperature.java.input.txt
    hadoop jar hadoop-examples.jar LookupRecordByTemperature output-hashmapsort -100
    //=*=*=*=*
    //./ch08/src/main/examples/LookupRecordByTemperature.java.output.txt
    357460-99999 1956
    //=*=*=*=*
    //./ch08/src/main/examples/LookupRecordsByTemperature.java.input.txt
    hadoop jar hadoop-examples.jar LookupRecordsByTemperature output-hashmapsort -100 2> /dev/null | wc -l
    //=*=*=*=*
    //./ch08/src/main/examples/LookupRecordsByTemperature.java.output.txt
    1489272
    //=*=*=*=*
    //./ch08/src/main/examples/MaxTemperatureByStationNameUsingDistributedCacheFile.java.input.txt
    hadoop jar hadoop-examples.jar MaxTemperatureByStationNameUsingDistributedCacheFile -files input/ncdc/metadata/stations-fixed-width.txt input/ncdc/all output
    //=*=*=*=*
    //./ch08/src/main/examples/MaxTemperatureWithCounters.java.input.txt
    hadoop jar hadoop-examples.jar MaxTemperatureWithCounters input/ncdc/all output-counters

    //=*=*=*=*
    //./ch08/src/main/examples/MissingTemperatureFields.java.input.txt
    hadoop jar hadoop-examples.jar MissingTemperatureFields job_200904200610_0003
    //=*=*=*=*
    //./ch08/src/main/examples/SortByTemperatureUsingHashPartitioner.java.input.txt
    hadoop jar hadoop-examples.jar SortByTemperatureUsingHashPartitioner -D mapred.reduce.tasks=30 input/ncdc/all-seq output-hashsort

    //=*=*=*=*
    //./ch08/src/main/examples/SortByTemperatureUsingTotalOrderPartitioner.java.input.txt
    hadoop jar hadoop-examples.jar SortByTemperatureUsingTotalOrderPartitioner -D mapred.reduce.tasks=30 input/ncdc/all-seq output-totalsort
    //=*=*=*=*
    //./ch08/src/main/examples/SortDataPreprocessor.java.input.txt
    hadoop jar hadoop-examples.jar SortDataPreprocessor input/ncdc/all input/ncdc/all-seq
    //=*=*=*=*
    //./ch08/src/main/java/JoinRecordMapper.java
    // cc JoinRecordMapper Mapper for tagging weather records for a reduce-side join
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    //vv JoinRecordMapper
    public class JoinRecordMapper extends Mapper<LongWritable, Text, TextPair, Text> {
        private NcdcRecordParser parser = new NcdcRecordParser();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            parser.parse(value);
            context.write(new TextPair(parser.getStationId(), "1"), value);
        }

    }
    //^^ JoinRecordMapper
    //=*=*=*=*
    //./ch08/src/main/java/JoinRecordWithStationName.java
    // cc JoinRecordWithStationName Application to join weather records with station names
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.*;

    // vv JoinRecordWithStationName
    public class JoinRecordWithStationName extends Configured implements Tool {

        public static class KeyPartitioner extends Partitioner<TextPair, Text> {
            @Override
            public int getPartition(/*[*/TextPair key/*]*/, Text value, int numPartitions) {
                return (/*[*/key.getFirst().hashCode()/*]*/ & Integer.MAX_VALUE) % numPartitions;
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 3) {
                JobBuilder.printUsage(this, "<ncdc input> <station input> <output>");
                return -1;
            }

            Job job = new Job(getConf(), "Join weather records with station names");
            job.setJarByClass(getClass());

            Path ncdcInputPath = new Path(args[0]);
            Path stationInputPath = new Path(args[1]);
            Path outputPath = new Path(args[2]);

            MultipleInputs.addInputPath(job, ncdcInputPath, TextInputFormat.class, JoinRecordMapper.class);
            MultipleInputs.addInputPath(job, stationInputPath, TextInputFormat.class, JoinStationMapper.class);
            FileOutputFormat.setOutputPath(job, outputPath);

            /*[*/job.setPartitionerClass(KeyPartitioner.class);
            job.setGroupingComparatorClass(TextPair.FirstComparator.class);/*]*/

            job.setMapOutputKeyClass(TextPair.class);

            job.setReducerClass(JoinReducer.class);

            job.setOutputKeyClass(Text.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new JoinRecordWithStationName(), args);
            System.exit(exitCode);
        }
    }
    // ^^ JoinRecordWithStationName

    //=*=*=*=*
    //./ch08/src/main/java/JoinReducer.java
    // cc JoinReducer Reducer for joining tagged station records with tagged weather records
    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // vv JoinReducer
    public class JoinReducer extends Reducer<TextPair, Text, Text, Text> {

        @Override
        protected void reduce(TextPair key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            Iterator<Text> iter = values.iterator();
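            // The station record is tagged "0" and weather records "1" (see the two mappers),
            // so with grouping on the first field the station name is always the first value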
            Text stationName = new Text(iter.next());
            while (iter.hasNext()) {
                Text record = iter.next();
                Text outValue = new Text(stationName.toString() + "\t" + record.toString());
                context.write(key.getFirst(), outValue);
            }
        }
    }
    // ^^ JoinReducer
    //=*=*=*=*
    //./ch08/src/main/java/JoinStationMapper.java
    // cc JoinStationMapper Mapper for tagging station records for a reduce-side join
    import java.io.IOException;

    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.Mapper;

    // vv JoinStationMapper
    public class JoinStationMapper extends Mapper<LongWritable, Text, TextPair, Text> {
        private NcdcStationMetadataParser parser = new NcdcStationMetadataParser();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            if (parser.parse(value)) {
                context.write(new TextPair(parser.getStationId(), "0"), new Text(parser.getStationName()));
            }
        }
    }
    // ^^ JoinStationMapper
    //=*=*=*=*
    //./ch08/src/main/java/LookupRecordByTemperature.java
    // cc LookupRecordByTemperature Retrieve the first entry with a given key from a collection of MapFiles
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.io.MapFile.Reader;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
    import org.apache.hadoop.util.*;

    // vv LookupRecordByTemperature
    public class LookupRecordByTemperature extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                JobBuilder.printUsage(this, "<path> <key>");
                return -1;
            }
            Path path = new Path(args[0]);
            IntWritable key = new IntWritable(Integer.parseInt(args[1]));

            Reader[] readers = /*[*/MapFileOutputFormat.getReaders(path, getConf())/*]*/;
            Partitioner<IntWritable, Text> partitioner = new HashPartitioner<IntWritable, Text>();
            Text val = new Text();
            Writable entry = /*[*/MapFileOutputFormat.getEntry(readers, partitioner, key, val)/*]*/;
            if (entry == null) {
                System.err.println("Key not found: " + key);
                return -1;
            }
            NcdcRecordParser parser = new NcdcRecordParser();
            parser.parse(val.toString());
            System.out.printf("%s\t%s\n", parser.getStationId(), parser.getYear());
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new LookupRecordByTemperature(), args);
            System.exit(exitCode);
        }
    }
    // ^^ LookupRecordByTemperature

    //=*=*=*=*
    //./ch08/src/main/java/LookupRecordsByTemperature.java
    // cc LookupRecordsByTemperature Retrieve all entries with a given key from a collection of MapFiles
    // == LookupRecordsByTemperature-ReaderFragment
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.io.MapFile.Reader;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
    import org.apache.hadoop.util.*;

    // vv LookupRecordsByTemperature
    public class LookupRecordsByTemperature extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                JobBuilder.printUsage(this, "<path> <key>");
                return -1;
            }
            Path path = new Path(args[0]);
            IntWritable key = new IntWritable(Integer.parseInt(args[1]));

            Reader[] readers = MapFileOutputFormat.getReaders(path, getConf());
            Partitioner<IntWritable, Text> partitioner = new HashPartitioner<IntWritable, Text>();
            Text val = new Text();

            // vv LookupRecordsByTemperature-ReaderFragment
            Reader reader = readers[partitioner.getPartition(key, val, readers.length)];
            // ^^ LookupRecordsByTemperature-ReaderFragment
            Writable entry = reader.get(key, val);
            if (entry == null) {
                System.err.println("Key not found: " + key);
                return -1;
            }
            NcdcRecordParser parser = new NcdcRecordParser();
            IntWritable nextKey = new IntWritable();
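            // Print the matched entry, then keep reading while successive entries share the same key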
            do {
                parser.parse(val.toString());
                System.out.printf("%s\t%s\n", parser.getStationId(), parser.getYear());
            } while (reader.next(nextKey, val) && key.equals(nextKey));
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new LookupRecordsByTemperature(), args);
            System.exit(exitCode);
        }
    }
    // ^^ LookupRecordsByTemperature

    //=*=*=*=*
    //./ch08/src/main/java/MaxTemperatureByStationNameUsingDistributedCacheFile.java
    // cc MaxTemperatureByStationNameUsingDistributedCacheFile Application to find the maximum temperature by station, showing station names from a lookup table passed as a distributed cache file
    import java.io.File;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // vv MaxTemperatureByStationNameUsingDistributedCacheFile
    public class MaxTemperatureByStationNameUsingDistributedCacheFile extends Configured implements Tool {

        static class StationTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            @Override
            protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    context.write(new Text(parser.getStationId()), new IntWritable(parser.getAirTemperature()));
                }
            }
        }

        static class MaxTemperatureReducerWithStationLookup extends Reducer<Text, IntWritable, Text, IntWritable> {

            /*[*/private NcdcStationMetadata metadata;/*]*/

            /*[*/@Override
            protected void setup(Context context) throws IOException, InterruptedException {
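                // "stations-fixed-width.txt" is available in the task's working directory because
                // it was shipped with the -files option (see the example command above)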
                metadata = new NcdcStationMetadata();
                metadata.initialize(new File("stations-fixed-width.txt"));
            }/*]*/

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {

                /*[*/String stationName = metadata.getStationName(key.toString());/*]*/

                int maxValue = Integer.MIN_VALUE;
                for (IntWritable value : values) {
                    maxValue = Math.max(maxValue, value.get());
                }
                context.write(new Text(/*[*/stationName/*]*/), new IntWritable(maxValue));
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            job.setMapperClass(StationTemperatureMapper.class);
            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducerWithStationLookup.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureByStationNameUsingDistributedCacheFile(), args);
            System.exit(exitCode);
        }
    }
    // ^^ MaxTemperatureByStationNameUsingDistributedCacheFile

    //=*=*=*=*
    //./ch08/src/main/java/MaxTemperatureByStationNameUsingDistributedCacheFileApi.java
    // == MaxTemperatureByStationNameUsingDistributedCacheFileApi
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MaxTemperatureByStationNameUsingDistributedCacheFileApi extends Configured implements Tool {

        static class StationTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            @Override
            protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    context.write(new Text(parser.getStationId()), new IntWritable(parser.getAirTemperature()));
                }
            }
        }

        static class MaxTemperatureReducerWithStationLookup extends Reducer<Text, IntWritable, Text, IntWritable> {

            private NcdcStationMetadata metadata;

            // vv MaxTemperatureByStationNameUsingDistributedCacheFileApi
            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                metadata = new NcdcStationMetadata();
                Path[] localPaths = context.getLocalCacheFiles();
                if (localPaths.length == 0) {
                    throw new FileNotFoundException("Distributed cache file not found.");
                }
                File localFile = new File(localPaths[0].toString());
                metadata.initialize(localFile);
            }
            // ^^ MaxTemperatureByStationNameUsingDistributedCacheFileApi

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {

                String stationName = metadata.getStationName(key.toString());

                int maxValue = Integer.MIN_VALUE;
                for (IntWritable value : values) {
                    maxValue = Math.max(maxValue, value.get());
                }
                context.write(new Text(stationName), new IntWritable(maxValue));
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            job.setMapperClass(StationTemperatureMapper.class);
            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducerWithStationLookup.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureByStationNameUsingDistributedCacheFileApi(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/MaxTemperatureUsingSecondarySort.java
    // cc MaxTemperatureUsingSecondarySort Application to find the maximum temperature by sorting temperatures in the key
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // vv MaxTemperatureUsingSecondarySort
    public class MaxTemperatureUsingSecondarySort extends Configured implements Tool {

        static class MaxTemperatureMapper extends Mapper<LongWritable, Text, IntPair, NullWritable> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            @Override
            protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    /*[*/context.write(new IntPair(parser.getYearInt(), parser.getAirTemperature()),
                            NullWritable.get());/*]*/
                }
            }
        }

        static class MaxTemperatureReducer extends Reducer<IntPair, NullWritable, IntPair, NullWritable> {

            @Override
            protected void reduce(IntPair key, Iterable<NullWritable> values, Context context)
                    throws IOException, InterruptedException {
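                // Within each year group the keys arrive sorted by temperature in descending order
                // (see KeyComparator), so the key passed in carries the year's maximum temperature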

                /*[*/context.write(key, NullWritable.get());/*]*/
            }
        }

        public static class FirstPartitioner extends Partitioner<IntPair, NullWritable> {

            @Override
            public int getPartition(IntPair key, NullWritable value, int numPartitions) {
                // multiply by 127 to perform some mixing
                return Math.abs(key.getFirst() * 127) % numPartitions;
            }
        }

        public static class KeyComparator extends WritableComparator {
            protected KeyComparator() {
                super(IntPair.class, true);
            }

            @Override
            public int compare(WritableComparable w1, WritableComparable w2) {
                IntPair ip1 = (IntPair) w1;
                IntPair ip2 = (IntPair) w2;
                int cmp = IntPair.compare(ip1.getFirst(), ip2.getFirst());
                if (cmp != 0) {
                    return cmp;
                }
                return -IntPair.compare(ip1.getSecond(), ip2.getSecond()); //reverse
            }
        }

        public static class GroupComparator extends WritableComparator {
            protected GroupComparator() {
                super(IntPair.class, true);
            }

            @Override
            public int compare(WritableComparable w1, WritableComparable w2) {
                IntPair ip1 = (IntPair) w1;
                IntPair ip2 = (IntPair) w2;
                return IntPair.compare(ip1.getFirst(), ip2.getFirst());
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            job.setMapperClass(MaxTemperatureMapper.class);
            /*[*/job.setPartitionerClass(FirstPartitioner.class);
            /*]*/
            /*[*/job.setSortComparatorClass(KeyComparator.class);
            /*]*/
            /*[*/job.setGroupingComparatorClass(GroupComparator.class);/*]*/
            job.setReducerClass(MaxTemperatureReducer.class);
            job.setOutputKeyClass(IntPair.class);
            job.setOutputValueClass(NullWritable.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureUsingSecondarySort(), args);
            System.exit(exitCode);
        }
    }
    // ^^ MaxTemperatureUsingSecondarySort

    //=*=*=*=*
    //./ch08/src/main/java/MaxTemperatureWithCounters.java
    // cc MaxTemperatureWithCounters Application to run the maximum temperature job, including counting missing and malformed fields and quality codes
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // vv MaxTemperatureWithCounters
    public class MaxTemperatureWithCounters extends Configured implements Tool {

        enum Temperature {
            MISSING, MALFORMED
        }

        static class MaxTemperatureMapperWithCounters extends Mapper<LongWritable, Text, Text, IntWritable> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            @Override
            protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    int airTemperature = parser.getAirTemperature();
                    context.write(new Text(parser.getYear()), new IntWritable(airTemperature));
                } else if (parser.isMalformedTemperature()) {
                    System.err.println("Ignoring possibly corrupt input: " + value);
                    context.getCounter(Temperature.MALFORMED).increment(1);
                } else if (parser.isMissingTemperature()) {
                    context.getCounter(Temperature.MISSING).increment(1);
                }

                // dynamic counter
                context.getCounter("TemperatureQuality", parser.getQuality()).increment(1);
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            job.setMapperClass(MaxTemperatureMapperWithCounters.class);
            job.setCombinerClass(MaxTemperatureReducer.class);
            job.setReducerClass(MaxTemperatureReducer.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureWithCounters(), args);
            System.exit(exitCode);
        }
    }
    // ^^ MaxTemperatureWithCounters

    //=*=*=*=*
    //./ch08/src/main/java/MissingTemperatureFields.java
    // cc MissingTemperatureFields Application to calculate the proportion of records with missing temperature fields
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class MissingTemperatureFields extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 1) {
                JobBuilder.printUsage(this, "<job ID>");
                return -1;
            }
            String jobID = args[0];
            JobClient jobClient = new JobClient(new JobConf(getConf()));
            RunningJob job = jobClient.getJob(JobID.forName(jobID));
            if (job == null) {
                System.err.printf("No job with ID %s found.\n", jobID);
                return -1;
            }
            if (!job.isComplete()) {
                System.err.printf("Job %s is not complete.\n", jobID);
                return -1;
            }

            Counters counters = job.getCounters();
            long missing = counters.getCounter(MaxTemperatureWithCounters.Temperature.MISSING);

            long total = counters.getCounter(Task.Counter.MAP_INPUT_RECORDS);

            System.out.printf("Records with missing temperature fields: %.2f%%\n", 100.0 * missing / total);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MissingTemperatureFields(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/NewMissingTemperatureFields.java
    // == NewMissingTemperatureFields Application to calculate the proportion of records with missing temperature fields

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Cluster;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobID;
    import org.apache.hadoop.mapreduce.TaskCounter;
    import org.apache.hadoop.util.*;

    public class NewMissingTemperatureFields extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 1) {
                JobBuilder.printUsage(this, "<job ID>");
                return -1;
            }
            String jobID = args[0];
            // vv NewMissingTemperatureFields
            Cluster cluster = new Cluster(getConf());
            Job job = cluster.getJob(JobID.forName(jobID));
            // ^^ NewMissingTemperatureFields
            if (job == null) {
                System.err.printf("No job with ID %s found.\n", jobID);
                return -1;
            }
            if (!job.isComplete()) {
                System.err.printf("Job %s is not complete.\n", jobID);
                return -1;
            }

            // vv NewMissingTemperatureFields
            Counters counters = job.getCounters();
            long missing = counters.findCounter(MaxTemperatureWithCounters.Temperature.MISSING).getValue();
            long total = counters.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
            // ^^ NewMissingTemperatureFields

            System.out.printf("Records with missing temperature fields: %.2f%%\n", 100.0 * missing / total);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new NewMissingTemperatureFields(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/SortByTemperatureToMapFile.java
    // cc SortByTemperatureToMapFile A MapReduce program for sorting a SequenceFile and producing MapFiles as output
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // vv SortByTemperatureToMapFile
    public class SortByTemperatureToMapFile extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            job.setInputFormatClass(SequenceFileInputFormat.class);
            job.setOutputKeyClass(IntWritable.class);
            /*[*/job.setOutputFormatClass(MapFileOutputFormat.class);/*]*/
            SequenceFileOutputFormat.setCompressOutput(job, true);
            SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new SortByTemperatureToMapFile(), args);
            System.exit(exitCode);
        }
    }
    // ^^ SortByTemperatureToMapFile

    //=*=*=*=*
    //./ch08/src/main/java/SortByTemperatureUsingHashPartitioner.java
    // cc SortByTemperatureUsingHashPartitioner A MapReduce program for sorting a SequenceFile with IntWritable keys using the default HashPartitioner
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // vv SortByTemperatureUsingHashPartitioner
    public class SortByTemperatureUsingHashPartitioner extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            job.setInputFormatClass(SequenceFileInputFormat.class);
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            SequenceFileOutputFormat.setCompressOutput(job, true);
            SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new SortByTemperatureUsingHashPartitioner(), args);
            System.exit(exitCode);
        }
    }
    // ^^ SortByTemperatureUsingHashPartitioner

    //=*=*=*=*
    //./ch08/src/main/java/SortByTemperatureUsingTotalOrderPartitioner.java
    // cc SortByTemperatureUsingTotalOrderPartitioner A MapReduce program for sorting a SequenceFile with IntWritable keys using the TotalOrderPartitioner to globally sort the data
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
    import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;
    import org.apache.hadoop.util.*;

    // vv SortByTemperatureUsingTotalOrderPartitioner
    public class SortByTemperatureUsingTotalOrderPartitioner extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            job.setInputFormatClass(SequenceFileInputFormat.class);
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            SequenceFileOutputFormat.setCompressOutput(job, true);
            SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);

            job.setPartitionerClass(TotalOrderPartitioner.class);

            // Sample 10% of the keys, taking at most 10,000 samples from at most 10 splits,
            // to compute the partition boundaries for the total sort
            InputSampler.Sampler<IntWritable, Text> sampler = new InputSampler.RandomSampler<IntWritable, Text>(0.1,
                    10000, 10);

            InputSampler.writePartitionFile(job, sampler);

            // Add to DistributedCache
            Configuration conf = job.getConfiguration();
            String partitionFile = TotalOrderPartitioner.getPartitionFile(conf);
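            // The "#" fragment names the symlink created in the task's working directory so that
            // TotalOrderPartitioner finds the partition file under its default path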
            URI partitionUri = new URI(partitionFile + "#" + TotalOrderPartitioner.DEFAULT_PATH);
            DistributedCache.addCacheFile(partitionUri, conf);
            DistributedCache.createSymlink(conf);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new SortByTemperatureUsingTotalOrderPartitioner(), args);
            System.exit(exitCode);
        }
    }
    // ^^ SortByTemperatureUsingTotalOrderPartitioner

    //=*=*=*=*
    //./ch08/src/main/java/SortDataPreprocessor.java
    // cc SortDataPreprocessor A MapReduce program for transforming the weather data into SequenceFile format
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // vv SortDataPreprocessor
    public class SortDataPreprocessor extends Configured implements Tool {

        static class CleanerMapper extends Mapper<LongWritable, Text, IntWritable, Text> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            @Override
            protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    context.write(new IntWritable(parser.getAirTemperature()), value);
                }
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            job.setMapperClass(CleanerMapper.class);
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(Text.class);
            job.setNumReduceTasks(0);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            SequenceFileOutputFormat.setCompressOutput(job, true);
            SequenceFileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new SortDataPreprocessor(), args);
            System.exit(exitCode);
        }
    }
    // ^^ SortDataPreprocessor
    //=*=*=*=*
    //./ch08/src/main/java/TemperatureDistribution.java
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.reduce.LongSumReducer;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class TemperatureDistribution extends Configured implements Tool {

        static class TemperatureCountMapper extends Mapper<LongWritable, Text, IntWritable, LongWritable> {

            private static final LongWritable ONE = new LongWritable(1);
            private NcdcRecordParser parser = new NcdcRecordParser();

            @Override
            protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    context.write(new IntWritable(parser.getAirTemperature() / 10), ONE);
                }
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1;
            }

            job.setMapperClass(TemperatureCountMapper.class);
            job.setCombinerClass(LongSumReducer.class);
            job.setReducerClass(LongSumReducer.class);
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(LongWritable.class);

            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new TemperatureDistribution(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/JoinRecordMapper.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    public class JoinRecordMapper extends MapReduceBase implements Mapper<LongWritable, Text, TextPair, Text> {
        private NcdcRecordParser parser = new NcdcRecordParser();

        public void map(LongWritable key, Text value, OutputCollector<TextPair, Text> output, Reporter reporter)
                throws IOException {

            parser.parse(value);
            output.collect(new TextPair(parser.getStationId(), "1"), value);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/JoinRecordWithStationName.java
    package oldapi;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.MultipleInputs;
    import org.apache.hadoop.util.*;

    public class JoinRecordWithStationName extends Configured implements Tool {

        public static class KeyPartitioner implements Partitioner<TextPair, Text> {
            @Override
            public void configure(JobConf job) {
            }

            @Override
            public int getPartition(/*[*/TextPair key/*]*/, Text value, int numPartitions) {
                return (/*[*/key.getFirst().hashCode()/*]*/ & Integer.MAX_VALUE) % numPartitions;
            }
        }

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 3) {
                JobBuilder.printUsage(this, "<ncdc input> <station input> <output>");
                return -1;
            }

            JobConf conf = new JobConf(getConf(), getClass());
            conf.setJobName("Join record with station name");

            Path ncdcInputPath = new Path(args[0]);
            Path stationInputPath = new Path(args[1]);
            Path outputPath = new Path(args[2]);

            MultipleInputs.addInputPath(conf, ncdcInputPath, TextInputFormat.class, JoinRecordMapper.class);
            MultipleInputs.addInputPath(conf, stationInputPath, TextInputFormat.class, JoinStationMapper.class);
            FileOutputFormat.setOutputPath(conf, outputPath);

            /*[*/conf.setPartitionerClass(KeyPartitioner.class);
            conf.setOutputValueGroupingComparator(TextPair.FirstComparator.class);/*]*/
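            // Partition and group on the first field (station ID) only, so a station's record and all of its weather records reach the same reduce() call.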

            conf.setMapOutputKeyClass(TextPair.class);

            conf.setReducerClass(JoinReducer.class);

            conf.setOutputKeyClass(Text.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new JoinRecordWithStationName(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/JoinReducer.java
    package oldapi;

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class JoinReducer extends MapReduceBase implements Reducer<TextPair, Text, Text, Text> {

        public void reduce(TextPair key, Iterator<Text> values, OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
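            // The station record (tagged "0") sorts before the weather records (tagged "1"), so the first value is the station name.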

            Text stationName = new Text(values.next());
            while (values.hasNext()) {
                Text record = values.next();
                Text outValue = new Text(stationName.toString() + "\t" + record.toString());
                output.collect(key.getFirst(), outValue);
            }
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/JoinStationMapper.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    public class JoinStationMapper extends MapReduceBase implements Mapper<LongWritable, Text, TextPair, Text> {
        private NcdcStationMetadataParser parser = new NcdcStationMetadataParser();

        public void map(LongWritable key, Text value, OutputCollector<TextPair, Text> output, Reporter reporter)
                throws IOException {

            if (parser.parse(value)) {
                output.collect(new TextPair(parser.getStationId(), "0"), new Text(parser.getStationName()));
            }
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/LookupRecordByTemperature.java
    package oldapi;

    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.io.MapFile.Reader;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.HashPartitioner;
    import org.apache.hadoop.util.*;

    public class LookupRecordByTemperature extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                JobBuilder.printUsage(this, "<path> <key>");
                return -1;
            }
            Path path = new Path(args[0]);
            IntWritable key = new IntWritable(Integer.parseInt(args[1]));
            FileSystem fs = path.getFileSystem(getConf());

            Reader[] readers = /*[*/MapFileOutputFormat.getReaders(fs, path, getConf())/*]*/;
            Partitioner<IntWritable, Text> partitioner = new HashPartitioner<IntWritable, Text>();
            Text val = new Text();
            Writable entry = /*[*/MapFileOutputFormat.getEntry(readers, partitioner, key, val)/*]*/;
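            // getEntry() uses the partitioner to choose the map file part that holds the key, then looks the key up in that part, filling in val on success.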
            if (entry == null) {
                System.err.println("Key not found: " + key);
                return -1;
            }
            NcdcRecordParser parser = new NcdcRecordParser();
            parser.parse(val.toString());
            System.out.printf("%s\t%s\n", parser.getStationId(), parser.getYear());
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new LookupRecordByTemperature(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/LookupRecordsByTemperature.java
    package oldapi;

    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.fs.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.io.MapFile.Reader;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.HashPartitioner;
    import org.apache.hadoop.util.*;

    public class LookupRecordsByTemperature extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                JobBuilder.printUsage(this, "<path> <key>");
                return -1;
            }
            Path path = new Path(args[0]);
            IntWritable key = new IntWritable(Integer.parseInt(args[1]));
            FileSystem fs = path.getFileSystem(getConf());

            Reader[] readers = MapFileOutputFormat.getReaders(fs, path, getConf());
            Partitioner<IntWritable, Text> partitioner = new HashPartitioner<IntWritable, Text>();
            Text val = new Text();

            Reader reader = readers[partitioner.getPartition(key, val, readers.length)];
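            // The partitioner identifies which map file part would have received this key when the sort job ran.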
            Writable entry = reader.get(key, val);
            if (entry == null) {
                System.err.println("Key not found: " + key);
                return -1;
            }
            NcdcRecordParser parser = new NcdcRecordParser();
            IntWritable nextKey = new IntWritable();
            do {
                parser.parse(val.toString());
                System.out.printf("%s\t%s\n", parser.getStationId(), parser.getYear());
            } while (reader.next(nextKey, val) && key.equals(nextKey));
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new LookupRecordsByTemperature(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/MaxTemperatureByStationNameUsingDistributedCacheFile.java
    package oldapi;

    import java.io.*;
    import java.util.Iterator;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class MaxTemperatureByStationNameUsingDistributedCacheFile extends Configured implements Tool {

        static class StationTemperatureMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    output.collect(new Text(parser.getStationId()), new IntWritable(parser.getAirTemperature()));
                }
            }
        }

        static class MaxTemperatureReducerWithStationLookup extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {

            /*[*/private NcdcStationMetadata metadata;/*]*/

            /*[*/@Override
            public void configure(JobConf conf) {
                metadata = new NcdcStationMetadata();
                try {
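                    // The station file is read from the task's working directory, where the distributed cache makes it available (e.g. when the job is launched with -files stations-fixed-width.txt).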
                    metadata.initialize(new File("stations-fixed-width.txt"));
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }/*]*/

            public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {

                /*[*/String stationName = metadata.getStationName(key.toString());/*]*/

                int maxValue = Integer.MIN_VALUE;
                while (values.hasNext()) {
                    maxValue = Math.max(maxValue, values.next().get());
                }
                output.collect(new Text(/*[*/stationName/*]*/), new IntWritable(maxValue));
            }
        }

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(StationTemperatureMapper.class);
            conf.setCombinerClass(MaxTemperatureReducer.class);
            conf.setReducerClass(MaxTemperatureReducerWithStationLookup.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureByStationNameUsingDistributedCacheFile(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/MaxTemperatureByStationNameUsingDistributedCacheFileApi.java
    // == OldMaxTemperatureByStationNameUsingDistributedCacheFileApi
    package oldapi;

    import java.io.*;
    import java.util.Iterator;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class MaxTemperatureByStationNameUsingDistributedCacheFileApi extends Configured implements Tool {

        static class StationTemperatureMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    output.collect(new Text(parser.getStationId()), new IntWritable(parser.getAirTemperature()));
                }
            }
        }

        static class MaxTemperatureReducerWithStationLookup extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {

            private NcdcStationMetadata metadata;

            // vv OldMaxTemperatureByStationNameUsingDistributedCacheFileApi
            @Override
            public void configure(JobConf conf) {
                metadata = new NcdcStationMetadata();
                try {
                    Path[] localPaths = /*[*/DistributedCache.getLocalCacheFiles(conf);/*]*/
                    if (localPaths == null || localPaths.length == 0) {
                        throw new FileNotFoundException("Distributed cache file not found.");
                    }
                    File localFile = new File(localPaths[0].toString());
                    metadata.initialize(localFile);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
            // ^^ OldMaxTemperatureByStationNameUsingDistributedCacheFileApi

            public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {

                String stationName = metadata.getStationName(key.toString());

                int maxValue = Integer.MIN_VALUE;
                while (values.hasNext()) {
                    maxValue = Math.max(maxValue, values.next().get());
                }
                output.collect(new Text(stationName), new IntWritable(maxValue));
            }
        }

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(StationTemperatureMapper.class);
            conf.setCombinerClass(MaxTemperatureReducer.class);
            conf.setReducerClass(MaxTemperatureReducerWithStationLookup.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureByStationNameUsingDistributedCacheFileApi(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/MaxTemperatureUsingSecondarySort.java
    package oldapi;

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class MaxTemperatureUsingSecondarySort extends Configured implements Tool {

        static class MaxTemperatureMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, IntPair, NullWritable> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            public void map(LongWritable key, Text value, OutputCollector<IntPair, NullWritable> output,
                    Reporter reporter) throws IOException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    /*[*/output.collect(new IntPair(parser.getYearInt(), +parser.getAirTemperature()),
                            NullWritable.get());/*]*/
                }
            }
        }

        static class MaxTemperatureReducer extends MapReduceBase
                implements Reducer<IntPair, NullWritable, IntPair, NullWritable> {

            public void reduce(IntPair key, Iterator<NullWritable> values,
                    OutputCollector<IntPair, NullWritable> output, Reporter reporter) throws IOException {

                /*[*/output.collect(key, NullWritable.get());/*]*/
            }
        }

        public static class FirstPartitioner implements Partitioner<IntPair, NullWritable> {

            @Override
            public void configure(JobConf job) {
            }

            @Override
            public int getPartition(IntPair key, NullWritable value, int numPartitions) {
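                // Partition by the first field (the year) so that all records for a year go to the same reducer.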
                return Math.abs(key.getFirst() * 127) % numPartitions;
            }
        }

        public static class KeyComparator extends WritableComparator {
            protected KeyComparator() {
                super(IntPair.class, true);
            }

            @Override
            public int compare(WritableComparable w1, WritableComparable w2) {
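                // Order by year ascending, then by temperature descending, so the first key in each year group holds the maximum temperature.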
                IntPair ip1 = (IntPair) w1;
                IntPair ip2 = (IntPair) w2;
                int cmp = IntPair.compare(ip1.getFirst(), ip2.getFirst());
                if (cmp != 0) {
                    return cmp;
                }
                return -IntPair.compare(ip1.getSecond(), ip2.getSecond()); //reverse
            }
        }

        public static class GroupComparator extends WritableComparator {
            protected GroupComparator() {
                super(IntPair.class, true);
            }

            @Override
            public int compare(WritableComparable w1, WritableComparable w2) {
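                // Group by year alone, so reduce() is called once per year.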
                IntPair ip1 = (IntPair) w1;
                IntPair ip2 = (IntPair) w2;
                return IntPair.compare(ip1.getFirst(), ip2.getFirst());
            }
        }

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setMapperClass(MaxTemperatureMapper.class);
            /*[*/conf.setPartitionerClass(FirstPartitioner.class);/*]*/
            /*[*/conf.setOutputKeyComparatorClass(KeyComparator.class);/*]*/
            /*[*/conf.setOutputValueGroupingComparator(GroupComparator.class);/*]*/
            conf.setReducerClass(MaxTemperatureReducer.class);
            conf.setOutputKeyClass(IntPair.class);
            conf.setOutputValueClass(NullWritable.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureUsingSecondarySort(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/MaxTemperatureWithCounters.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class MaxTemperatureWithCounters extends Configured implements Tool {

        enum Temperature {
            MISSING, MALFORMED
        }

        static class MaxTemperatureMapperWithCounters extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
                    throws IOException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    int airTemperature = parser.getAirTemperature();
                    output.collect(new Text(parser.getYear()), new IntWritable(airTemperature));
                } else if (parser.isMalformedTemperature()) {
                    System.err.println("Ignoring possibly corrupt input: " + value);
                    reporter.incrCounter(Temperature.MALFORMED, 1);
                } else if (parser.isMissingTemperature()) {
                    reporter.incrCounter(Temperature.MISSING, 1);
                }

                // dynamic counter
                reporter.incrCounter("TemperatureQuality", parser.getQuality(), 1);

            }
        }

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(MaxTemperatureMapperWithCounters.class);
            conf.setCombinerClass(MaxTemperatureReducer.class);
            conf.setReducerClass(MaxTemperatureReducer.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MaxTemperatureWithCounters(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/MissingTemperatureFields.java
    package oldapi;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class MissingTemperatureFields extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 1) {
                JobBuilder.printUsage(this, "<job ID>");
                return -1;
            }
            String jobID = args[0];
            JobClient jobClient = new JobClient(new JobConf(getConf()));
            RunningJob job = jobClient.getJob(JobID.forName(jobID));
            if (job == null) {
                System.err.printf("No job with ID %s found.\n", jobID);
                return -1;
            }
            if (!job.isComplete()) {
                System.err.printf("Job %s is not complete.\n", jobID);
                return -1;
            }

            Counters counters = job.getCounters();
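            // Temperature.MISSING is the user-defined counter incremented by the mapper; MAP_INPUT_RECORDS is a built-in counter giving the total number of records processed by the map tasks.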
            long missing = counters.getCounter(MaxTemperatureWithCounters.Temperature.MISSING);

            long total = counters.findCounter("org.apache.hadoop.mapred.Task$Counter", "MAP_INPUT_RECORDS")
                    .getCounter();

            System.out.printf("Records with missing temperature fields: %.2f%%\n", 100.0 * missing / total);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new MissingTemperatureFields(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/SortByTemperatureToMapFile.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class SortByTemperatureToMapFile extends Configured implements Tool {

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setInputFormat(SequenceFileInputFormat.class);
            conf.setOutputKeyClass(IntWritable.class);
            /*[*/conf.setOutputFormat(MapFileOutputFormat.class);/*]*/
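            // MapFileOutputFormat writes sorted MapFiles, which support random lookups by key (see the Lookup* programs above).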
            SequenceFileOutputFormat.setCompressOutput(conf, true);
            SequenceFileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.BLOCK);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new SortByTemperatureToMapFile(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/SortByTemperatureUsingHashPartitioner.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.compress.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class SortByTemperatureUsingHashPartitioner extends Configured implements Tool {

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setInputFormat(SequenceFileInputFormat.class);
            conf.setOutputKeyClass(IntWritable.class);
            conf.setOutputFormat(SequenceFileOutputFormat.class);
            SequenceFileOutputFormat.setCompressOutput(conf, true);
            SequenceFileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.BLOCK);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new SortByTemperatureUsingHashPartitioner(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/SortByTemperatureUsingTotalOrderPartitioner.java
    package oldapi;

    import java.net.URI;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.*;
    import org.apache.hadoop.util.*;

    public class SortByTemperatureUsingTotalOrderPartitioner extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setInputFormat(SequenceFileInputFormat.class);
            conf.setOutputKeyClass(IntWritable.class);
            conf.setOutputFormat(SequenceFileOutputFormat.class);
            SequenceFileOutputFormat.setCompressOutput(conf, true);
            SequenceFileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.BLOCK);

            conf.setPartitionerClass(TotalOrderPartitioner.class);

            InputSampler.Sampler<IntWritable, Text> sampler = new InputSampler.RandomSampler<IntWritable, Text>(0.1,
                    10000, 10);
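            // RandomSampler(freq, numSamples, maxSplitsSampled): sample keys with probability 0.1, taking at most 10,000 samples from at most 10 splits.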

            Path input = FileInputFormat.getInputPaths(conf)[0];
            input = input.makeQualified(input.getFileSystem(conf));

            Path partitionFile = new Path(input, "_partitions");
            TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
            InputSampler.writePartitionFile(conf, sampler);

            // Add to DistributedCache
            URI partitionUri = new URI(partitionFile.toString() + "#_partitions");
            DistributedCache.addCacheFile(partitionUri, conf);
            DistributedCache.createSymlink(conf);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new SortByTemperatureUsingTotalOrderPartitioner(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/SortDataPreprocessor.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class SortDataPreprocessor extends Configured implements Tool {

        static class CleanerMapper extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, Text> {

            private NcdcRecordParser parser = new NcdcRecordParser();

            public void map(LongWritable key, Text value, OutputCollector<IntWritable, Text> output, Reporter reporter)
                    throws IOException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    output.collect(new IntWritable(parser.getAirTemperature()), value);
                }
            }
        }

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setMapperClass(CleanerMapper.class);
            conf.setOutputKeyClass(IntWritable.class);
            conf.setOutputValueClass(Text.class);
            conf.setNumReduceTasks(0);
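            // A map-only job: with zero reducers, the mappers' output is written directly using the output format set below.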
            conf.setOutputFormat(SequenceFileOutputFormat.class);
            SequenceFileOutputFormat.setCompressOutput(conf, true);
            SequenceFileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.BLOCK);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new SortDataPreprocessor(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/main/java/oldapi/TemperatureDistribution.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.LongSumReducer;
    import org.apache.hadoop.util.*;

    public class TemperatureDistribution extends Configured implements Tool {

        static class TemperatureCountMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, IntWritable, LongWritable> {

            private static final LongWritable ONE = new LongWritable(1);
            private NcdcRecordParser parser = new NcdcRecordParser();

            public void map(LongWritable key, Text value, OutputCollector<IntWritable, LongWritable> output,
                    Reporter reporter) throws IOException {

                parser.parse(value);
                if (parser.isValidTemperature()) {
                    output.collect(new IntWritable(parser.getAirTemperature() / 10), ONE);
                }
            }
        }

        @Override
        public int run(String[] args) throws IOException {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1;
            }

            conf.setMapperClass(TemperatureCountMapper.class);
            conf.setCombinerClass(LongSumReducer.class);
            conf.setReducerClass(LongSumReducer.class);
            conf.setOutputKeyClass(IntWritable.class);
            conf.setOutputValueClass(LongWritable.class);

            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new TemperatureDistribution(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch08/src/test/java/KeyFieldBasedComparatorTest.java
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.KeyFieldBasedComparator;
    import org.junit.Test;

    public class KeyFieldBasedComparatorTest {

        Text line1 = new Text("2\t30");
        Text line2 = new Text("10\t4");
        Text line3 = new Text("10\t30");
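        // The comparator options follow Unix sort(1) key syntax: -k selects fields, n compares numerically, r reverses the order.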

        @Test
        public void firstKey() throws Exception {
            check("-k1,1", line1, line2, 1);
            check("-k1", line1, line2, 1);
            check("-k1.1", line1, line2, 1);
            check("-k1n", line1, line2, -1);
            check("-k1nr", line1, line2, 1);
        }

        @Test
        public void secondKey() throws Exception {
            check("-k2,2", line1, line2, -1);
            check("-k2", line1, line2, -1);
            check("-k2.1", line1, line2, -1);
            check("-k2n", line1, line2, 1);
            check("-k2nr", line1, line2, -1);
        }

        @Test
        public void firstThenSecondKey() throws Exception {
            check("-k1 -k2", line1, line2, 1);
            check("-k1 -k2", line2, line3, 1);
            //check("-k1 -k2n",  line2, line3, -1);
            check("-k1 -k2nr", line2, line3, 1);
        }

        private void check(String options, Text l1, Text l2, int c) throws IOException {
            JobConf conf = new JobConf();
            conf.setKeyFieldComparatorOptions(options);
            KeyFieldBasedComparator comp = new KeyFieldBasedComparator();
            comp.configure(conf);

            DataOutputBuffer out1 = serialize(l1);
            DataOutputBuffer out2 = serialize(l2);
            assertThat(options, comp.compare(out1.getData(), 0, out1.getLength(), out2.getData(), 0, out2.getLength()),
                    is(c));
        }

        public static DataOutputBuffer serialize(Writable writable) throws IOException {
            DataOutputBuffer out = new DataOutputBuffer();
            DataOutputStream dataOut = new DataOutputStream(out);
            writable.write(dataOut);
            dataOut.close();
            return out;
        }

    }

    //=*=*=*=*
    //./ch11/src/main/java/com/hadoopbook/pig/CutLoadFunc.java
    //cc CutLoadFunc A LoadFunc UDF to load tuple fields as column ranges
    package com.hadoopbook.pig;

    import java.io.IOException;
    import java.util.List;

    import org.apache.commons.logging.LogFactory;
    import org.apache.commons.logging.Log;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.pig.LoadFunc;
    import org.apache.pig.backend.executionengine.ExecException;
    import org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSplit;
    import org.apache.pig.data.DataByteArray;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.data.TupleFactory;

    // vv CutLoadFunc
    public class CutLoadFunc extends LoadFunc {

        private static final Log LOG = LogFactory.getLog(CutLoadFunc.class);

        private final List<Range> ranges;
        private final TupleFactory tupleFactory = TupleFactory.getInstance();
        private RecordReader reader;

        public CutLoadFunc(String cutPattern) {
            ranges = Range.parse(cutPattern);
        }

        @Override
        public void setLocation(String location, Job job) throws IOException {
            FileInputFormat.setInputPaths(job, location);
        }

        @Override
        public InputFormat getInputFormat() {
            return new TextInputFormat();
        }

        @Override
        public void prepareToRead(RecordReader reader, PigSplit split) {
            this.reader = reader;
        }

        @Override
        public Tuple getNext() throws IOException {
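            // Pig calls getNext() repeatedly; returning null signals that there are no more tuples in this split.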
            try {
                if (!reader.nextKeyValue()) {
                    return null;
                }
                Text value = (Text) reader.getCurrentValue();
                String line = value.toString();
                Tuple tuple = tupleFactory.newTuple(ranges.size());
                for (int i = 0; i < ranges.size(); i++) {
                    Range range = ranges.get(i);
                    if (range.getEnd() > line.length()) {
                        LOG.warn(String.format("Range end (%s) is longer than line length (%s)", range.getEnd(),
                                line.length()));
                        continue;
                    }
                    tuple.set(i, new DataByteArray(range.getSubstring(line)));
                }
                return tuple;
            } catch (InterruptedException e) {
                throw new ExecException(e);
            }
        }
    }
    // ^^ CutLoadFunc

    //=*=*=*=*
    //./ch11/src/main/java/com/hadoopbook/pig/IsGoodQuality.java
    //cc IsGoodQuality A FilterFunc UDF to remove records with unsatisfactory temperature quality readings
    // == IsGoodQualityTyped
    //vv IsGoodQuality
    package com.hadoopbook.pig;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.pig.FilterFunc;
    //^^ IsGoodQuality
    import org.apache.pig.FuncSpec;
    //vv IsGoodQuality  
    import org.apache.pig.backend.executionengine.ExecException;
    import org.apache.pig.data.DataType;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.impl.logicalLayer.FrontendException;
    //^^ IsGoodQuality
    import org.apache.pig.impl.logicalLayer.schema.Schema;
    //vv IsGoodQuality

    public class IsGoodQuality extends FilterFunc {

        @Override
        public Boolean exec(Tuple tuple) throws IOException {
            if (tuple == null || tuple.size() == 0) {
                return false;
            }
            try {
                Object object = tuple.get(0);
                if (object == null) {
                    return false;
                }
                int i = (Integer) object;
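                // These quality codes indicate readings that passed the NCDC quality checks.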
                return i == 0 || i == 1 || i == 4 || i == 5 || i == 9;
            } catch (ExecException e) {
                throw new IOException(e);
            }
        }

        //^^ IsGoodQuality
        //vv IsGoodQualityTyped
        @Override
        public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
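            // Declare that this UDF expects a single integer argument, so Pig casts the field before calling exec().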
            List<FuncSpec> funcSpecs = new ArrayList<FuncSpec>();
            funcSpecs.add(new FuncSpec(this.getClass().getName(),
                    new Schema(new Schema.FieldSchema(null, DataType.INTEGER))));

            return funcSpecs;
        }
        //^^ IsGoodQualityTyped  
        //vv IsGoodQuality  
    }
    // ^^ IsGoodQuality
    //=*=*=*=*
    //./ch11/src/main/java/com/hadoopbook/pig/Range.java
    package com.hadoopbook.pig;

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class Range {
        private final int start;
        private final int end;

        public Range(int start, int end) {
            this.start = start;
            this.end = end;
        }

        public int getStart() {
            return start;
        }

        public int getEnd() {
            return end;
        }

        public String getSubstring(String line) {
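            // Ranges are 1-based and inclusive, in the style of Unix cut(1).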
            return line.substring(start - 1, end);
        }

        @Override
        public int hashCode() {
            return start * 37 + end;
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof Range)) {
                return false;
            }
            Range other = (Range) obj;
            return this.start == other.start && this.end == other.end;
        }

        public static List<Range> parse(String rangeSpec) throws IllegalArgumentException {
            if (rangeSpec.length() == 0) {
                return Collections.emptyList();
            }
            List<Range> ranges = new ArrayList<Range>();
            String[] specs = rangeSpec.split(",");
            for (String spec : specs) {
                String[] split = spec.split("-");
                try {
                    ranges.add(new Range(Integer.parseInt(split[0]), Integer.parseInt(split[1])));
                } catch (NumberFormatException e) {
                    throw new IllegalArgumentException(e.getMessage());
                }
            }
            return ranges;
        }

    }

    //=*=*=*=*
    //./ch11/src/main/java/com/hadoopbook/pig/Trim.java
    package com.hadoopbook.pig;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.pig.EvalFunc;
    import org.apache.pig.FuncSpec;
    import org.apache.pig.backend.executionengine.ExecException;
    import org.apache.pig.data.DataType;
    import org.apache.pig.data.Tuple;
    import org.apache.pig.impl.logicalLayer.FrontendException;
    import org.apache.pig.impl.logicalLayer.schema.Schema;

    //cc Trim An EvalFunc UDF to trim leading and trailing whitespace from chararray values
    //vv Trim
    public class Trim extends EvalFunc<String> {

        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0) {
                return null;
            }
            try {
                Object object = input.get(0);
                if (object == null) {
                    return null;
                }
                return ((String) object).trim();
            } catch (ExecException e) {
                throw new IOException(e);
            }
        }

        @Override
        public List<FuncSpec> getArgToFuncMapping() throws FrontendException {
            List<FuncSpec> funcList = new ArrayList<FuncSpec>();
            funcList.add(new FuncSpec(this.getClass().getName(),
                    new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));

            return funcList;
        }
    }
    // ^^ Trim
    //=*=*=*=*
    //./ch11/src/test/java/com/hadoopbook/pig/IsGoodQualityTest.java
    package com.hadoopbook.pig;

    import static org.hamcrest.Matchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;

    import org.apache.pig.data.*;
    import org.junit.*;

    public class IsGoodQualityTest {

        private IsGoodQuality func;

        @Before
        public void setUp() {
            func = new IsGoodQuality();
        }

        @Test
        public void nullTuple() throws IOException {
            assertThat(func.exec(null), is(false));
        }

        @Test
        public void emptyTuple() throws IOException {
            Tuple tuple = TupleFactory.getInstance().newTuple();
            assertThat(func.exec(tuple), is(false));
        }

        @Test
        public void tupleWithNullField() throws IOException {
            Tuple tuple = TupleFactory.getInstance().newTuple((Object) null);
            assertThat(func.exec(tuple), is(false));
        }

        @Test
        public void badQuality() throws IOException {
            Tuple tuple = TupleFactory.getInstance().newTuple(new Integer(2));
            assertThat(func.exec(tuple), is(false));
        }

        @Test
        public void goodQuality() throws IOException {
            Tuple tuple = TupleFactory.getInstance().newTuple(new Integer(1));
            assertThat(func.exec(tuple), is(true));
        }

    }

    //=*=*=*=*
    //./ch11/src/test/java/com/hadoopbook/pig/RangeTest.java
    package com.hadoopbook.pig;

    import static org.hamcrest.Matchers.*;
    import static org.junit.Assert.assertThat;

    import java.util.List;

    import org.junit.*;

    public class RangeTest {

        @Test
        public void parsesEmptyRangeSpec() {
            assertThat(Range.parse("").size(), is(0));
        }

        @Test
        public void parsesSingleRangeSpec() {
            List<Range> ranges = Range.parse("1-3");
            assertThat(ranges.size(), is(1));
            assertThat(ranges.get(0), is(new Range(1, 3)));
        }

        @Test
        public void parsesMultipleRangeSpec() {
            List<Range> ranges = Range.parse("1-3,5-10");
            assertThat(ranges.size(), is(2));
            assertThat(ranges.get(0), is(new Range(1, 3)));
            assertThat(ranges.get(1), is(new Range(5, 10)));
        }

        @Test(expected = IllegalArgumentException.class)
        public void failsOnInvalidSpec() {
            Range.parse("1-n");
        }
    }

    //=*=*=*=*
    //./ch12/src/main/java/com/hadoopbook/hive/Maximum.java
    package com.hadoopbook.hive;

    import org.apache.hadoop.hive.ql.exec.UDAF;
    import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
    import org.apache.hadoop.io.IntWritable;

    public class Maximum extends UDAF {

        public static class MaximumIntUDAFEvaluator implements UDAFEvaluator {
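            // Hive calls init() to reset state, iterate() once per input value, terminatePartial() to emit a partial aggregate, merge() to combine partials, and terminate() to produce the final result.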

            private IntWritable result;

            public void init() {
                System.err.printf("%s %s\n", hashCode(), "init");
                result = null;
            }

            public boolean iterate(IntWritable value) {
                System.err.printf("%s %s %s\n", hashCode(), "iterate", value);
                if (value == null) {
                    return true;
                }
                if (result == null) {
                    result = new IntWritable(value.get());
                } else {
                    result.set(Math.max(result.get(), value.get()));
                }
                return true;
            }

            public IntWritable terminatePartial() {
                System.err.printf("%s %s\n", hashCode(), "terminatePartial");
                return result;
            }

            public boolean merge(IntWritable other) {
                System.err.printf("%s %s %s\n", hashCode(), "merge", other);
                return iterate(other);
            }

            public IntWritable terminate() {
                System.err.printf("%s %s\n", hashCode(), "terminate");
                return result;
            }
        }
    }

    //=*=*=*=*
    //./ch12/src/main/java/com/hadoopbook/hive/Mean.java
    package com.hadoopbook.hive;

    import org.apache.hadoop.hive.ql.exec.UDAF;
    import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
    import org.apache.hadoop.hive.serde2.io.DoubleWritable;

    public class Mean extends UDAF {

        public static class MeanDoubleUDAFEvaluator implements UDAFEvaluator {
            public static class PartialResult {
                double sum;
                long count;
            }

            private PartialResult partial;

            public void init() {
                partial = null;
            }

            public boolean iterate(DoubleWritable value) {
                if (value == null) {
                    return true;
                }
                if (partial == null) {
                    partial = new PartialResult();
                }
                partial.sum += value.get();
                partial.count++;
                return true;
            }

            public PartialResult terminatePartial() {
                return partial;
            }

            public boolean merge(PartialResult other) {
                if (other == null) {
                    return true;
                }
                if (partial == null) {
                    partial = new PartialResult();
                }
                partial.sum += other.sum;
                partial.count += other.count;
                return true;
            }

            public DoubleWritable terminate() {
                if (partial == null) {
                    return null;
                }
                return new DoubleWritable(partial.sum / partial.count);
            }
        }
    }

    //=*=*=*=*
    //./ch12/src/main/java/com/hadoopbook/hive/Strip.java
    package com.hadoopbook.hive;

    import org.apache.commons.lang.StringUtils;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class Strip extends UDF {
        private Text result = new Text();

        public Text evaluate(Text str) {
            if (str == null) {
                return null;
            }
            result.set(StringUtils.strip(str.toString()));
            return result;
        }

        public Text evaluate(Text str, String stripChars) {
            if (str == null) {
                return null;
            }
            result.set(StringUtils.strip(str.toString(), stripChars));
            return result;
        }
    }
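    // Example use from the Hive shell (the JAR path and the single-row table "dummy" are illustrative):
    //   ADD JAR /path/to/hive-examples.jar;
    //   CREATE TEMPORARY FUNCTION strip AS 'com.hadoopbook.hive.Strip';
    //   SELECT strip('  bee  ') FROM dummy;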
    //=*=*=*=*
    //./ch13/src/main/java/HBaseStationCli.java
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class HBaseStationCli extends Configured implements Tool {
        static final byte[] INFO_COLUMNFAMILY = Bytes.toBytes("info");
        static final byte[] NAME_QUALIFIER = Bytes.toBytes("name");
        static final byte[] LOCATION_QUALIFIER = Bytes.toBytes("location");
        static final byte[] DESCRIPTION_QUALIFIER = Bytes.toBytes("description");

        public Map<String, String> getStationInfo(HTable table, String stationId) throws IOException {
            Get get = new Get(Bytes.toBytes(stationId));
            get.addFamily(INFO_COLUMNFAMILY);
            Result res = table.get(get);
            if (res == null || res.isEmpty()) {
                return null;
            }
            Map<String, String> resultMap = new HashMap<String, String>();
            resultMap.put("name", getValue(res, INFO_COLUMNFAMILY, NAME_QUALIFIER));
            resultMap.put("location", getValue(res, INFO_COLUMNFAMILY, LOCATION_QUALIFIER));
            resultMap.put("description", getValue(res, INFO_COLUMNFAMILY, DESCRIPTION_QUALIFIER));
            return resultMap;
        }

        private static String getValue(Result res, byte[] cf, byte[] qualifier) {
            byte[] value = res.getValue(cf, qualifier);
            return value == null ? "" : Bytes.toString(value);
        }

        public int run(String[] args) throws IOException {
            if (args.length != 1) {
                System.err.println("Usage: HBaseStationCli <station_id>");
                return -1;
            }

            HTable table = new HTable(new HBaseConfiguration(getConf()), "stations");
            Map<String, String> stationInfo = getStationInfo(table, args[0]);
            if (stationInfo == null) {
                System.err.printf("Station ID %s not found.\n", args[0]);
                return -1;
            }
            for (Map.Entry<String, String> station : stationInfo.entrySet()) {
                // Print each station metadata field and its value
                System.out.printf("%s\t%s\n", station.getKey(), station.getValue());
            }
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new HBaseConfiguration(), new HBaseStationCli(), args);
            System.exit(exitCode);
        }
    }
    //=*=*=*=*
    //./ch13/src/main/java/HBaseStationImporter.java
    import java.io.*;
    import java.util.Map;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.util.*;

    public class HBaseStationImporter extends Configured implements Tool {

        public int run(String[] args) throws IOException {
            if (args.length != 1) {
                System.err.println("Usage: HBaseStationImporter <input>");
                return -1;
            }

            HTable table = new HTable(new HBaseConfiguration(getConf()), "stations");

            NcdcStationMetadata metadata = new NcdcStationMetadata();
            metadata.initialize(new File(args[0]));
            Map<String, String> stationIdToNameMap = metadata.getStationIdToNameMap();

            for (Map.Entry<String, String> entry : stationIdToNameMap.entrySet()) {
                Put put = new Put(Bytes.toBytes(entry.getKey()));
                put.add(HBaseStationCli.INFO_COLUMNFAMILY, HBaseStationCli.NAME_QUALIFIER,
                        Bytes.toBytes(entry.getValue()));
                put.add(HBaseStationCli.INFO_COLUMNFAMILY, HBaseStationCli.DESCRIPTION_QUALIFIER,
                        Bytes.toBytes("Description..."));
                put.add(HBaseStationCli.INFO_COLUMNFAMILY, HBaseStationCli.LOCATION_QUALIFIER,
                        Bytes.toBytes("Location..."));
                table.put(put);
            }
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new HBaseConfiguration(), new HBaseStationImporter(), args);
            System.exit(exitCode);
        }
    }
    //=*=*=*=*
    //./ch13/src/main/java/HBaseTemperatureCli.java
    import java.io.IOException;
    import java.util.*;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.util.*;

    public class HBaseTemperatureCli extends Configured implements Tool {
        static final byte[] DATA_COLUMNFAMILY = Bytes.toBytes("data");
        static final byte[] AIRTEMP_QUALIFIER = Bytes.toBytes("airtemp");

        public NavigableMap<Long, Integer> getStationObservations(HTable table, String stationId, long maxStamp,
                int maxCount) throws IOException {
            byte[] startRow = RowKeyConverter.makeObservationRowKey(stationId, maxStamp);
            NavigableMap<Long, Integer> resultMap = new TreeMap<Long, Integer>();
            Scan scan = new Scan(startRow);
            scan.addColumn(DATA_COLUMNFAMILY, AIRTEMP_QUALIFIER);
            ResultScanner scanner = table.getScanner(scan);
            Result res = null;
            int count = 0;
            try {
                while ((res = scanner.next()) != null && count++ < maxCount) {
                    byte[] row = res.getRow();
                    byte[] value = res.getValue(DATA_COLUMNFAMILY, AIRTEMP_QUALIFIER);
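                    // Recover the original timestamp from the reverse-order epoch stored at the end of the row key.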
                    Long stamp = Long.MAX_VALUE - Bytes.toLong(row, row.length - Bytes.SIZEOF_LONG, Bytes.SIZEOF_LONG);
                    Integer temp = Bytes.toInt(value);
                    resultMap.put(stamp, temp);
                }
            } finally {
                scanner.close();
            }
            return resultMap;
        }

        /**
         * Return the last ten observations.
         */
        public NavigableMap<Long, Integer> getStationObservations(HTable table, String stationId) throws IOException {
            return getStationObservations(table, stationId, Long.MAX_VALUE, 10);
        }

        public int run(String[] args) throws IOException {
            if (args.length != 1) {
                System.err.println("Usage: HBaseTemperatureCli <station_id>");
                return -1;
            }

            HTable table = new HTable(new HBaseConfiguration(getConf()), "observations");
            NavigableMap<Long, Integer> observations = getStationObservations(table, args[0]).descendingMap();
            for (Map.Entry<Long, Integer> observation : observations.entrySet()) {
                // Print the date, time, and temperature
                System.out.printf("%1$tF %1$tR\t%2$s\n", observation.getKey(), observation.getValue());
            }
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new HBaseConfiguration(), new HBaseTemperatureCli(), args);
            System.exit(exitCode);
        }
    }

    //=*=*=*=*
    //./ch13/src/main/java/HBaseTemperatureImporter.java
    import java.io.IOException;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.mapred.lib.NullOutputFormat;
    import org.apache.hadoop.util.*;

    public class HBaseTemperatureImporter extends Configured implements Tool {

        // Mapper that writes each valid observation directly to the HBase "observations" table
        static class HBaseTemperatureMapper<K, V> extends MapReduceBase implements Mapper<LongWritable, Text, K, V> {
            private NcdcRecordParser parser = new NcdcRecordParser();
            private HTable table;

            public void map(LongWritable key, Text value, OutputCollector<K, V> output, Reporter reporter)
                    throws IOException {
                parser.parse(value.toString());
                if (parser.isValidTemperature()) {
                    byte[] rowKey = RowKeyConverter.makeObservationRowKey(parser.getStationId(),
                            parser.getObservationDate().getTime());
                    Put p = new Put(rowKey);
                    p.add(HBaseTemperatureCli.DATA_COLUMNFAMILY, HBaseTemperatureCli.AIRTEMP_QUALIFIER,
                            Bytes.toBytes(parser.getAirTemperature()));
                    table.put(p);
                }
            }

            public void configure(JobConf jc) {
                super.configure(jc);
                // Create the HBase table client once up-front and keep it around
                // rather than create on each map invocation.
                try {
                    this.table = new HTable(new HBaseConfiguration(jc), "observations");
                } catch (IOException e) {
                    throw new RuntimeException("Failed HTable construction", e);
                }
            }

            @Override
            public void close() throws IOException {
                super.close();
                table.close();
            }
        }

        public int run(String[] args) throws IOException {
            if (args.length != 1) {
                System.err.println("Usage: HBaseTemperatureImporter <input>");
                return -1;
            }
            JobConf jc = new JobConf(getConf(), getClass());
            FileInputFormat.addInputPath(jc, new Path(args[0]));
            jc.setMapperClass(HBaseTemperatureMapper.class);
            jc.setNumReduceTasks(0);
            jc.setOutputFormat(NullOutputFormat.class);
            JobClient.runJob(jc);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new HBaseConfiguration(), new HBaseTemperatureImporter(), args);
            System.exit(exitCode);
        }
    }
    //=*=*=*=*
    //./ch13/src/main/java/RowKeyConverter.java
    import org.apache.hadoop.hbase.util.Bytes;

    public class RowKeyConverter {

        private static final int STATION_ID_LENGTH = 12;

        /**
         * @return A row key whose format is: <station_id><reverse_order_epoch>, where the
         *         epoch is stored as Long.MAX_VALUE minus the observation time so that a
         *         station's most recent observation sorts first.
         */
        public static byte[] makeObservationRowKey(String stationId, long observationTime) {
            byte[] row = new byte[STATION_ID_LENGTH + Bytes.SIZEOF_LONG];
            Bytes.putBytes(row, 0, Bytes.toBytes(stationId), 0, STATION_ID_LENGTH);
            long reverseOrderEpoch = Long.MAX_VALUE - observationTime;
            Bytes.putLong(row, STATION_ID_LENGTH, reverseOrderEpoch);
            return row;
        }
    }
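
    // A minimal sketch, not part of the original listing, showing the effect of the
    // reverse-order epoch above: the newer of two observations for the same station
    // gets the lexicographically smaller row key, so an HBase scan returns it first.
    // The class name RowKeyConverterExample and the sample station id are made up
    // for illustration; only RowKeyConverter and HBase's Bytes utility are assumed.
    class RowKeyConverterExample {
        public static void main(String[] args) {
            String stationId = "011990-99999"; // a 12-character station id
            byte[] older = RowKeyConverter.makeObservationRowKey(stationId, 1000L);
            byte[] newer = RowKeyConverter.makeObservationRowKey(stationId, 2000L);
            // Bytes.compareTo performs an unsigned, lexicographic byte comparison,
            // which is exactly how HBase orders row keys.
            System.out.println(Bytes.compareTo(newer, older) < 0); // prints true
        }
    }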

    //=*=*=*=*
    //./ch14/src/main/java/ActiveKeyValueStore.java
    //== ActiveKeyValueStore
    //== ActiveKeyValueStore-Read
    //== ActiveKeyValueStore-Write

    import java.nio.charset.Charset;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.data.Stat;

    // vv ActiveKeyValueStore
    public class ActiveKeyValueStore extends ConnectionWatcher {

        private static final Charset CHARSET = Charset.forName("UTF-8");

        //vv ActiveKeyValueStore-Write
        public void write(String path, String value) throws InterruptedException, KeeperException {
            Stat stat = zk.exists(path, false);
            if (stat == null) {
                zk.create(path, value.getBytes(CHARSET), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } else {
                zk.setData(path, value.getBytes(CHARSET), -1);
            }
        }

        //^^ ActiveKeyValueStore-Write
        //^^ ActiveKeyValueStore
        //vv ActiveKeyValueStore-Read
        public String read(String path, Watcher watcher) throws InterruptedException, KeeperException {
            byte[] data = zk.getData(path, watcher, null/*stat*/);
            return new String(data, CHARSET);
        }
        //^^ ActiveKeyValueStore-Read
        //vv ActiveKeyValueStore
    }
    //^^ ActiveKeyValueStore
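
    // A hypothetical usage sketch, not from the book's source: connect to a ZooKeeper
    // ensemble, write a value under a znode, and read it back. The host string
    // "localhost" and the /config path are assumptions for illustration; only the
    // ActiveKeyValueStore class above (and its inherited connect()) is used.
    class ActiveKeyValueStoreExample {
        public static void main(String[] args) throws Exception {
            ActiveKeyValueStore store = new ActiveKeyValueStore();
            store.connect("localhost");            // blocks until the session is established
            store.write("/config", "42");          // creates the znode, or updates it if present
            System.out.println(store.read("/config", null)); // passing null sets no watch
            store.close();
        }
    }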

    //=*=*=*=*
    //./ch14/src/main/java/ConfigUpdater.java
    //cc ConfigUpdater An application that updates a property in ZooKeeper at random times
    import java.io.IOException;
    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    import org.apache.zookeeper.KeeperException;

    // vv ConfigUpdater
    public class ConfigUpdater {

        public static final String PATH = "/config";

        private ActiveKeyValueStore store;
        private Random random = new Random();

        public ConfigUpdater(String hosts) throws IOException, InterruptedException {
            store = new ActiveKeyValueStore();
            store.connect(hosts);
        }

        public void run() throws InterruptedException, KeeperException {
            while (true) {
                String value = random.nextInt(100) + "";
                store.write(PATH, value);
                System.out.printf("Set %s to %s\n", PATH, value);
                TimeUnit.SECONDS.sleep(random.nextInt(10));
            }
        }

        public static void main(String[] args) throws Exception {
            ConfigUpdater configUpdater = new ConfigUpdater(args[0]);
            configUpdater.run();
        }
    }
    // ^^ ConfigUpdater

    //=*=*=*=*
    //./ch14/src/main/java/ConfigWatcher.java
    //cc ConfigWatcher An application that watches for updates of a property in ZooKeeper and prints them to the console
    import java.io.IOException;

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.Watcher.Event.EventType;

    // vv ConfigWatcher
    public class ConfigWatcher implements Watcher {

        private ActiveKeyValueStore store;

        public ConfigWatcher(String hosts) throws IOException, InterruptedException {
            store = new ActiveKeyValueStore();
            store.connect(hosts);
        }

        public void displayConfig() throws InterruptedException, KeeperException {
            String value = store.read(ConfigUpdater.PATH, this);
            System.out.printf("Read %s as %s\n", ConfigUpdater.PATH, value);
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == EventType.NodeDataChanged) {
                try {
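                    // ZooKeeper watches fire only once, so re-reading the value here
                    // (passing this watcher again) also re-registers the watch for the
                    // next change.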
                    displayConfig();
                } catch (InterruptedException e) {
                    System.err.println("Interrupted. Exiting.");
                    Thread.currentThread().interrupt();
                } catch (KeeperException e) {
                    System.err.printf("KeeperException: %s. Exiting.\n", e);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            ConfigWatcher configWatcher = new ConfigWatcher(args[0]);
            configWatcher.displayConfig();

            // stay alive until process is killed or thread is interrupted
            Thread.sleep(Long.MAX_VALUE);
        }
    }
    //^^ ConfigWatcher

    //=*=*=*=*
    //./ch14/src/main/java/ConnectionWatcher.java
    //cc ConnectionWatcher A helper class that waits for the connection to ZooKeeper to be established
    import java.io.IOException;
    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.Watcher.Event.KeeperState;

    // vv ConnectionWatcher
    public class ConnectionWatcher implements Watcher {

        private static final int SESSION_TIMEOUT = 5000;

        protected ZooKeeper zk;
        private CountDownLatch connectedSignal = new CountDownLatch(1);

        public void connect(String hosts) throws IOException, InterruptedException {
            zk = new ZooKeeper(hosts, SESSION_TIMEOUT, this);
            connectedSignal.await();
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getState() == KeeperState.SyncConnected) {
                connectedSignal.countDown();
            }
        }

        public void close() throws InterruptedException {
            zk.close();
        }
    }
    // ^^ ConnectionWatcher
    //=*=*=*=*
    //./ch14/src/main/java/CreateGroup.java
    //cc CreateGroup A program to create a znode representing a group in ZooKeeper

    import java.io.IOException;
    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.Watcher.Event.KeeperState;
    import org.apache.zookeeper.ZooDefs.Ids;

    // vv CreateGroup
    public class CreateGroup implements Watcher {

        private static final int SESSION_TIMEOUT = 5000;

        private ZooKeeper zk;
        private CountDownLatch connectedSignal = new CountDownLatch(1);

        public void connect(String hosts) throws IOException, InterruptedException {
            zk = new ZooKeeper(hosts, SESSION_TIMEOUT, this);
            connectedSignal.await();
        }

        @Override
        public void process(WatchedEvent event) { // Watcher interface
            if (event.getState() == KeeperState.SyncConnected) {
                connectedSignal.countDown();
            }
        }

        public void create(String groupName) throws KeeperException, InterruptedException {
            String path = "/" + groupName;
            String createdPath = zk.create(path, null/*data*/, Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            System.out.println("Created " + createdPath);
        }

        public void close() throws InterruptedException {
            zk.close();
        }

        public static void main(String[] args) throws Exception {
            CreateGroup createGroup = new CreateGroup();
            createGroup.connect(args[0]);
            createGroup.create(args[1]);
            createGroup.close();
        }
    }
    // ^^ CreateGroup

    //=*=*=*=*
    //./ch14/src/main/java/DeleteGroup.java
    //cc DeleteGroup A program to delete a group and its members
    import java.util.List;

    import org.apache.zookeeper.KeeperException;

    // vv DeleteGroup
    public class DeleteGroup extends ConnectionWatcher {

        public void delete(String groupName) throws KeeperException, InterruptedException {
            String path = "/" + groupName;

            try {
                List<String> children = zk.getChildren(path, false);
                for (String child : children) {
                    zk.delete(path + "/" + child, -1);
                }
                zk.delete(path, -1);
            } catch (KeeperException.NoNodeException e) {
                System.out.printf("Group %s does not exist\n", groupName);
                System.exit(1);
            }
        }

        public static void main(String[] args) throws Exception {
            DeleteGroup deleteGroup = new DeleteGroup();
            deleteGroup.connect(args[0]);
            deleteGroup.delete(args[1]);
            deleteGroup.close();
        }
    }
    // ^^ DeleteGroup

    //=*=*=*=*
    //./ch14/src/main/java/JoinGroup.java
    //cc JoinGroup A program that joins a group

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs.Ids;

    // vv JoinGroup
    public class JoinGroup extends ConnectionWatcher {

        public void join(String groupName, String memberName) throws KeeperException, InterruptedException {
            String path = "/" + groupName + "/" + memberName;
            String createdPath = zk.create(path, null/*data*/, Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("Created " + createdPath);
        }

        public static void main(String[] args) throws Exception {
            JoinGroup joinGroup = new JoinGroup();
            joinGroup.connect(args[0]);
            joinGroup.join(args[1], args[2]);

            // stay alive until process is killed or thread is interrupted
            Thread.sleep(Long.MAX_VALUE);
        }
    }
    // ^^ JoinGroup

    //=*=*=*=*
    //./ch14/src/main/java/ListGroup.java
    //cc ListGroup A program to list the members in a group
    import java.util.List;

    import org.apache.zookeeper.KeeperException;

    // vv ListGroup
    public class ListGroup extends ConnectionWatcher {

        public void list(String groupName) throws KeeperException, InterruptedException {
            String path = "/" + groupName;

            try {
                List<String> children = zk.getChildren(path, false);
                if (children.isEmpty()) {
                    System.out.printf("No members in group %s\n", groupName);
                    System.exit(1);
                }
                for (String child : children) {
                    System.out.println(child);
                }
            } catch (KeeperException.NoNodeException e) {
                System.out.printf("Group %s does not exist\n", groupName);
                System.exit(1);
            }
        }

        public static void main(String[] args) throws Exception {
            ListGroup listGroup = new ListGroup();
            listGroup.connect(args[0]);
            listGroup.list(args[1]);
            listGroup.close();
        }
    }
    // ^^ ListGroup

    //=*=*=*=*
    //./ch14/src/main/java/ResilientActiveKeyValueStore.java
    //== ResilientActiveKeyValueStore-Write

    import java.nio.charset.Charset;
    import java.util.concurrent.TimeUnit;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.data.Stat;

    public class ResilientActiveKeyValueStore extends ConnectionWatcher {

        private static final Charset CHARSET = Charset.forName("UTF-8");
        private static final int MAX_RETRIES = 5;
        private static final int RETRY_PERIOD_SECONDS = 10;

        //vv ResilientActiveKeyValueStore-Write
        public void write(String path, String value) throws InterruptedException, KeeperException {
            int retries = 0;
            while (true) {
                try {
                    Stat stat = zk.exists(path, false);
                    if (stat == null) {
                        zk.create(path, value.getBytes(CHARSET), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
                    } else {
                        zk.setData(path, value.getBytes(CHARSET), stat.getVersion());
                    }
                    return;
                } catch (KeeperException.SessionExpiredException e) {
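                    // A session expiry cannot be retried within the same session; rethrow
                    // so the caller (e.g. ResilientConfigUpdater) can create a new session.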
                    throw e;
                } catch (KeeperException e) {
                    if (retries++ == MAX_RETRIES) {
                        throw e;
                    }
                    // sleep then retry
                    TimeUnit.SECONDS.sleep(RETRY_PERIOD_SECONDS);
                }
            }
        }

        //^^ ResilientActiveKeyValueStore-Write
        public String read(String path, Watcher watcher) throws InterruptedException, KeeperException {
            byte[] data = zk.getData(path, watcher, null/*stat*/);
            return new String(data, CHARSET);
        }
    }

    //=*=*=*=*
    //./ch14/src/main/java/ResilientConfigUpdater.java
    //== ResilientConfigUpdater
    import java.io.IOException;
    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    import org.apache.zookeeper.KeeperException;

    public class ResilientConfigUpdater {

        public static final String PATH = "/config";

        private ResilientActiveKeyValueStore store;
        private Random random = new Random();

        public ResilientConfigUpdater(String hosts) throws IOException, InterruptedException {
            store = new ResilientActiveKeyValueStore();
            store.connect(hosts);
        }

        public void run() throws InterruptedException, KeeperException {
            while (true) {
                String value = random.nextInt(100) + "";
                store.write(PATH, value);
                System.out.printf("Set %s to %s\n", PATH, value);
                TimeUnit.SECONDS.sleep(random.nextInt(10));
            }
        }

        //vv ResilientConfigUpdater
        public static void main(String[] args) throws Exception {
            /*[*/while (true) {
                try {/*]*/
                    ResilientConfigUpdater configUpdater = new ResilientConfigUpdater(args[0]);
                    configUpdater.run();
                    /*[*/} catch (KeeperException.SessionExpiredException e) {
                    // start a new session
                } catch (KeeperException e) {
                    // already retried, so exit
                    e.printStackTrace();
                    break;
                }
            } /*]*/
        }
        //^^ ResilientConfigUpdater
    }

    //=*=*=*=*
    //./ch15/src/main/java/MaxWidgetId.java

    import java.io.IOException;

    import com.cloudera.sqoop.lib.RecordParser.ParseError;

    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.*;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.*;

    public class MaxWidgetId extends Configured implements Tool {

        public static class MaxWidgetMapper extends Mapper<LongWritable, Text, LongWritable, Widget> {

            private Widget maxWidget = null;

            public void map(LongWritable k, Text v, Context context) {
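                // Track only the running maximum in memory; cleanup() emits a single
                // (0, maxWidget) pair per map task once all input records are seen.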
                Widget widget = new Widget();
                try {
                    widget.parse(v); // Auto-generated: parse all fields from text.
                } catch (ParseError pe) {
                    // Got a malformed record. Ignore it.
                    return;
                }

                Integer id = widget.get_id();
                if (null == id) {
                    return;
                } else {
                    if (maxWidget == null || id.intValue() > maxWidget.get_id().intValue()) {
                        maxWidget = widget;
                    }
                }
            }

            public void cleanup(Context context) throws IOException, InterruptedException {
                if (null != maxWidget) {
                    context.write(new LongWritable(0), maxWidget);
                }
            }
        }

        public static class MaxWidgetReducer extends Reducer<LongWritable, Widget, Widget, NullWritable> {

            // There will be a single reduce call with key '0' which gets
            // the max widget from each map task. Pick the max widget from
            // this list.
            public void reduce(LongWritable k, Iterable<Widget> vals, Context context)
                    throws IOException, InterruptedException {
                Widget maxWidget = null;

                for (Widget w : vals) {
                    if (maxWidget == null || w.get_id().intValue() > maxWidget.get_id().intValue()) {
                        try {
                            maxWidget = (Widget) w.clone();
                        } catch (CloneNotSupportedException cnse) {
                            // Shouldn't happen; Sqoop-generated classes support clone().
                            throw new IOException(cnse);
                        }
                    }
                }

                if (null != maxWidget) {
                    context.write(maxWidget, NullWritable.get());
                }
            }
        }

        public int run(String[] args) throws Exception {
            Job job = new Job(getConf());

            job.setJarByClass(MaxWidgetId.class);

            job.setMapperClass(MaxWidgetMapper.class);
            job.setReducerClass(MaxWidgetReducer.class);

            FileInputFormat.addInputPath(job, new Path("widgets"));
            FileOutputFormat.setOutputPath(job, new Path("maxwidget"));

            job.setMapOutputKeyClass(LongWritable.class);
            job.setMapOutputValueClass(Widget.class);

            job.setOutputKeyClass(Widget.class);
            job.setOutputValueClass(NullWritable.class);

            job.setNumReduceTasks(1);

            if (!job.waitForCompletion(true)) {
                return 1; // error.
            }

            return 0;
        }

        public static void main(String[] args) throws Exception {
            int ret = ToolRunner.run(new MaxWidgetId(), args);
            System.exit(ret);
        }
    }

    //=*=*=*=*
    //./ch15/src/main/java/MaxWidgetIdGenericAvro.java
    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.file.FileReader;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.mapred.AvroCollector;
    import org.apache.avro.mapred.AvroInputFormat;
    import org.apache.avro.mapred.AvroJob;
    import org.apache.avro.mapred.AvroMapper;
    import org.apache.avro.mapred.AvroOutputFormat;
    import org.apache.avro.mapred.AvroReducer;
    import org.apache.avro.mapred.FsInput;
    import org.apache.avro.mapred.Pair;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MaxWidgetIdGenericAvro extends Configured implements Tool {

        public static class MaxWidgetMapper extends AvroMapper<GenericRecord, Pair<Long, GenericRecord>> {

            private GenericRecord maxWidget;
            private AvroCollector<Pair<Long, GenericRecord>> collector;

            @Override
            public void map(GenericRecord widget, AvroCollector<Pair<Long, GenericRecord>> collector, Reporter reporter)
                    throws IOException {
                this.collector = collector;
                Integer id = (Integer) widget.get("id");
                if (id != null) {
                    if (maxWidget == null || id > (Integer) maxWidget.get("id")) {
                        maxWidget = widget;
                    }
                }
            }

            @Override
            public void close() throws IOException {
                if (maxWidget != null) {
                    collector.collect(new Pair<Long, GenericRecord>(0L, maxWidget));
                }
                super.close();
            }

        }

        static GenericRecord copy(GenericRecord record) {
            Schema schema = record.getSchema();
            GenericRecord copy = new GenericData.Record(schema);
            for (Schema.Field f : schema.getFields()) {
                copy.put(f.name(), record.get(f.name()));
            }
            return copy;
        }

        public static class MaxWidgetReducer extends AvroReducer<Long, GenericRecord, GenericRecord> {

            @Override
            public void reduce(Long key, Iterable<GenericRecord> values, AvroCollector<GenericRecord> collector,
                    Reporter reporter) throws IOException {
                GenericRecord maxWidget = null;

                for (GenericRecord w : values) {
                    if (maxWidget == null || (Integer) w.get("id") > (Integer) maxWidget.get("id")) {
                        maxWidget = copy(w);
                    }
                }

                if (maxWidget != null) {
                    collector.collect(maxWidget);
                }
            }
        }

        public int run(String[] args) throws Exception {
            JobConf conf = new JobConf(getConf(), getClass());
            conf.setJobName("Max widget ID");

            Path inputDir = new Path("widgets");
            FileInputFormat.addInputPath(conf, inputDir);
            FileOutputFormat.setOutputPath(conf, new Path("maxwidget"));

            Schema schema = readSchema(inputDir, conf);

            conf.setInputFormat(AvroInputFormat.class);
            conf.setOutputFormat(AvroOutputFormat.class);

            AvroJob.setInputSchema(conf, schema);
            AvroJob.setMapOutputSchema(conf, Pair.getPairSchema(Schema.create(Schema.Type.LONG), schema));
            AvroJob.setOutputSchema(conf, schema);

            AvroJob.setMapperClass(conf, MaxWidgetMapper.class);
            AvroJob.setReducerClass(conf, MaxWidgetReducer.class);

            conf.setNumReduceTasks(1);

            JobClient.runJob(conf);
            return 0;
        }

        /**
         * Read the Avro schema from the first map output file (part-m-00000.avro)
         * in the input directory.
         */
        private Schema readSchema(Path inputDir, Configuration conf) throws IOException {
            FsInput fsInput = null;
            FileReader<Object> reader = null;
            try {
                fsInput = new FsInput(new Path(inputDir, "part-m-00000.avro"), conf);
                reader = DataFileReader.openReader(fsInput, new GenericDatumReader<Object>());
                return reader.getSchema();
            } finally {
                IOUtils.closeStream(fsInput);
                IOUtils.closeStream(reader);
            }
        }

        public static void main(String[] args) throws Exception {
            int ret = ToolRunner.run(new MaxWidgetIdGenericAvro(), args);
            System.exit(ret);
        }
    }

    //=*=*=*=*
    //./ch15/src/main/java/Widget.java
    // ORM class for widgets
    // WARNING: This class is AUTO-GENERATED. Modify at your own risk.
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapred.lib.db.DBWritable;
    import com.cloudera.sqoop.lib.JdbcWritableBridge;
    import com.cloudera.sqoop.lib.DelimiterSet;
    import com.cloudera.sqoop.lib.FieldFormatter;
    import com.cloudera.sqoop.lib.RecordParser;
    import com.cloudera.sqoop.lib.BooleanParser;
    import com.cloudera.sqoop.lib.BlobRef;
    import com.cloudera.sqoop.lib.ClobRef;
    import com.cloudera.sqoop.lib.LargeObjectLoader;
    import com.cloudera.sqoop.lib.SqoopRecord;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.sql.Date;
    import java.sql.Time;
    import java.sql.Timestamp;
    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    public class Widget extends SqoopRecord implements DBWritable, Writable {
        private final int PROTOCOL_VERSION = 3;

        public int getClassFormatVersion() {
            return PROTOCOL_VERSION;
        }

        protected ResultSet __cur_result_set;
        private Integer id;

        public Integer get_id() {
            return id;
        }

        public void set_id(Integer id) {
            this.id = id;
        }

        public Widget with_id(Integer id) {
            this.id = id;
            return this;
        }

        private String widget_name;

        public String get_widget_name() {
            return widget_name;
        }

        public void set_widget_name(String widget_name) {
            this.widget_name = widget_name;
        }

        public Widget with_widget_name(String widget_name) {
            this.widget_name = widget_name;
            return this;
        }

        private java.math.BigDecimal price;

        public java.math.BigDecimal get_price() {
            return price;
        }

        public void set_price(java.math.BigDecimal price) {
            this.price = price;
        }

        public Widget with_price(java.math.BigDecimal price) {
            this.price = price;
            return this;
        }

        private java.sql.Date design_date;

        public java.sql.Date get_design_date() {
            return design_date;
        }

        public void set_design_date(java.sql.Date design_date) {
            this.design_date = design_date;
        }

        public Widget with_design_date(java.sql.Date design_date) {
            this.design_date = design_date;
            return this;
        }

        private Integer version;

        public Integer get_version() {
            return version;
        }

        public void set_version(Integer version) {
            this.version = version;
        }

        public Widget with_version(Integer version) {
            this.version = version;
            return this;
        }

        private String design_comment;

        public String get_design_comment() {
            return design_comment;
        }

        public void set_design_comment(String design_comment) {
            this.design_comment = design_comment;
        }

        public Widget with_design_comment(String design_comment) {
            this.design_comment = design_comment;
            return this;
        }

        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Widget)) {
                return false;
            }
            Widget that = (Widget) o;
            boolean equal = true;
            equal = equal && (this.id == null ? that.id == null : this.id.equals(that.id));
            equal = equal && (this.widget_name == null ? that.widget_name == null
                    : this.widget_name.equals(that.widget_name));
            equal = equal && (this.price == null ? that.price == null : this.price.equals(that.price));
            equal = equal && (this.design_date == null ? that.design_date == null
                    : this.design_date.equals(that.design_date));
            equal = equal && (this.version == null ? that.version == null : this.version.equals(that.version));
            equal = equal && (this.design_comment == null ? that.design_comment == null
                    : this.design_comment.equals(that.design_comment));
            return equal;
        }

        public void readFields(ResultSet __dbResults) throws SQLException {
            this.__cur_result_set = __dbResults;
            this.id = JdbcWritableBridge.readInteger(1, __dbResults);
            this.widget_name = JdbcWritableBridge.readString(2, __dbResults);
            this.price = JdbcWritableBridge.readBigDecimal(3, __dbResults);
            this.design_date = JdbcWritableBridge.readDate(4, __dbResults);
            this.version = JdbcWritableBridge.readInteger(5, __dbResults);
            this.design_comment = JdbcWritableBridge.readString(6, __dbResults);
        }

        public void loadLargeObjects(LargeObjectLoader __loader)
                throws SQLException, IOException, InterruptedException {
        }

        public void write(PreparedStatement __dbStmt) throws SQLException {
            write(__dbStmt, 0);
        }

        public int write(PreparedStatement __dbStmt, int __off) throws SQLException {
            JdbcWritableBridge.writeInteger(id, 1 + __off, 4, __dbStmt);
            JdbcWritableBridge.writeString(widget_name, 2 + __off, 12, __dbStmt);
            JdbcWritableBridge.writeBigDecimal(price, 3 + __off, 3, __dbStmt);
            JdbcWritableBridge.writeDate(design_date, 4 + __off, 91, __dbStmt);
            JdbcWritableBridge.writeInteger(version, 5 + __off, 4, __dbStmt);
            JdbcWritableBridge.writeString(design_comment, 6 + __off, 12, __dbStmt);
            return 6;
        }

        public void readFields(DataInput __dataIn) throws IOException {
            if (__dataIn.readBoolean()) {
                this.id = null;
            } else {
                this.id = Integer.valueOf(__dataIn.readInt());
            }
            if (__dataIn.readBoolean()) {
                this.widget_name = null;
            } else {
                this.widget_name = Text.readString(__dataIn);
            }
            if (__dataIn.readBoolean()) {
                this.price = null;
            } else {
                this.price = com.cloudera.sqoop.lib.BigDecimalSerializer.readFields(__dataIn);
            }
            if (__dataIn.readBoolean()) {
                this.design_date = null;
            } else {
                this.design_date = new Date(__dataIn.readLong());
            }
            if (__dataIn.readBoolean()) {
                this.version = null;
            } else {
                this.version = Integer.valueOf(__dataIn.readInt());
            }
            if (__dataIn.readBoolean()) {
                this.design_comment = null;
            } else {
                this.design_comment = Text.readString(__dataIn);
            }
        }

        public void write(DataOutput __dataOut) throws IOException {
            if (null == this.id) {
                __dataOut.writeBoolean(true);
            } else {
                __dataOut.writeBoolean(false);
                __dataOut.writeInt(this.id);
            }
            if (null == this.widget_name) {
                __dataOut.writeBoolean(true);
            } else {
                __dataOut.writeBoolean(false);
                Text.writeString(__dataOut, widget_name);
            }
            if (null == this.price) {
                __dataOut.writeBoolean(true);
            } else {
                __dataOut.writeBoolean(false);
                com.cloudera.sqoop.lib.BigDecimalSerializer.write(this.price, __dataOut);
            }
            if (null == this.design_date) {
                __dataOut.writeBoolean(true);
            } else {
                __dataOut.writeBoolean(false);
                __dataOut.writeLong(this.design_date.getTime());
            }
            if (null == this.version) {
                __dataOut.writeBoolean(true);
            } else {
                __dataOut.writeBoolean(false);
                __dataOut.writeInt(this.version);
            }
            if (null == this.design_comment) {
                __dataOut.writeBoolean(true);
            } else {
                __dataOut.writeBoolean(false);
                Text.writeString(__dataOut, design_comment);
            }
        }

        private final DelimiterSet __outputDelimiters = new DelimiterSet((char) 44, (char) 10, (char) 0, (char) 0,
                false);

        public String toString() {
            return toString(__outputDelimiters, true);
        }

        public String toString(DelimiterSet delimiters) {
            return toString(delimiters, true);
        }

        public String toString(boolean useRecordDelim) {
            return toString(__outputDelimiters, useRecordDelim);
        }

        public String toString(DelimiterSet delimiters, boolean useRecordDelim) {
            StringBuilder __sb = new StringBuilder();
            char fieldDelim = delimiters.getFieldsTerminatedBy();
            __sb.append(FieldFormatter.escapeAndEnclose(id == null ? "null" : "" + id, delimiters));
            __sb.append(fieldDelim);
            __sb.append(FieldFormatter.escapeAndEnclose(widget_name == null ? "null" : widget_name, delimiters));
            __sb.append(fieldDelim);
            __sb.append(FieldFormatter.escapeAndEnclose(price == null ? "null" : "" + price, delimiters));
            __sb.append(fieldDelim);
            __sb.append(FieldFormatter.escapeAndEnclose(design_date == null ? "null" : "" + design_date, delimiters));
            __sb.append(fieldDelim);
            __sb.append(FieldFormatter.escapeAndEnclose(version == null ? "null" : "" + version, delimiters));
            __sb.append(fieldDelim);
            __sb.append(FieldFormatter.escapeAndEnclose(design_comment == null ? "null" : design_comment, delimiters));
            if (useRecordDelim) {
                __sb.append(delimiters.getLinesTerminatedBy());
            }
            return __sb.toString();
        }

        private final DelimiterSet __inputDelimiters = new DelimiterSet((char) 44, (char) 10, (char) 0, (char) 0,
                false);
        private RecordParser __parser;

        public void parse(Text __record) throws RecordParser.ParseError {
            if (null == this.__parser) {
                this.__parser = new RecordParser(__inputDelimiters);
            }
            List<String> __fields = this.__parser.parseRecord(__record);
            __loadFromFields(__fields);
        }

        public void parse(CharSequence __record) throws RecordParser.ParseError {
            if (null == this.__parser) {
                this.__parser = new RecordParser(__inputDelimiters);
            }
            List<String> __fields = this.__parser.parseRecord(__record);
            __loadFromFields(__fields);
        }

        public void parse(byte[] __record) throws RecordParser.ParseError {
            if (null == this.__parser) {
                this.__parser = new RecordParser(__inputDelimiters);
            }
            List<String> __fields = this.__parser.parseRecord(__record);
            __loadFromFields(__fields);
        }

        public void parse(char[] __record) throws RecordParser.ParseError {
            if (null == this.__parser) {
                this.__parser = new RecordParser(__inputDelimiters);
            }
            List<String> __fields = this.__parser.parseRecord(__record);
            __loadFromFields(__fields);
        }

        public void parse(ByteBuffer __record) throws RecordParser.ParseError {
            if (null == this.__parser) {
                this.__parser = new RecordParser(__inputDelimiters);
            }
            List<String> __fields = this.__parser.parseRecord(__record);
            __loadFromFields(__fields);
        }

        public void parse(CharBuffer __record) throws RecordParser.ParseError {
            if (null == this.__parser) {
                this.__parser = new RecordParser(__inputDelimiters);
            }
            List<String> __fields = this.__parser.parseRecord(__record);
            __loadFromFields(__fields);
        }

        private void __loadFromFields(List<String> fields) {
            Iterator<String> __it = fields.listIterator();
            String __cur_str;
            __cur_str = __it.next();
            if (__cur_str.equals("null") || __cur_str.length() == 0) {
                this.id = null;
            } else {
                this.id = Integer.valueOf(__cur_str);
            }

            __cur_str = __it.next();
            if (__cur_str.equals("null")) {
                this.widget_name = null;
            } else {
                this.widget_name = __cur_str;
            }

            __cur_str = __it.next();
            if (__cur_str.equals("null") || __cur_str.length() == 0) {
                this.price = null;
            } else {
                this.price = new java.math.BigDecimal(__cur_str);
            }

            __cur_str = __it.next();
            if (__cur_str.equals("null") || __cur_str.length() == 0) {
                this.design_date = null;
            } else {
                this.design_date = java.sql.Date.valueOf(__cur_str);
            }

            __cur_str = __it.next();
            if (__cur_str.equals("null") || __cur_str.length() == 0) {
                this.version = null;
            } else {
                this.version = Integer.valueOf(__cur_str);
            }

            __cur_str = __it.next();
            if (__cur_str.equals("null")) {
                this.design_comment = null;
            } else {
                this.design_comment = __cur_str;
            }

        }

        public Object clone() throws CloneNotSupportedException {
            Widget o = (Widget) super.clone();
            o.design_date = (o.design_date != null) ? (java.sql.Date) o.design_date.clone() : null;
            return o;
        }

        public Map<String, Object> getFieldMap() {
            Map<String, Object> __sqoop$field_map = new TreeMap<String, Object>();
            __sqoop$field_map.put("id", this.id);
            __sqoop$field_map.put("widget_name", this.widget_name);
            __sqoop$field_map.put("price", this.price);
            __sqoop$field_map.put("design_date", this.design_date);
            __sqoop$field_map.put("version", this.version);
            __sqoop$field_map.put("design_comment", this.design_comment);
            return __sqoop$field_map;
        }

        public void setField(String __fieldName, Object __fieldVal) {
            if ("id".equals(__fieldName)) {
                this.id = (Integer) __fieldVal;
            } else if ("widget_name".equals(__fieldName)) {
                this.widget_name = (String) __fieldVal;
            } else if ("price".equals(__fieldName)) {
                this.price = (java.math.BigDecimal) __fieldVal;
            } else if ("design_date".equals(__fieldName)) {
                this.design_date = (java.sql.Date) __fieldVal;
            } else if ("version".equals(__fieldName)) {
                this.version = (Integer) __fieldVal;
            } else if ("design_comment".equals(__fieldName)) {
                this.design_comment = (String) __fieldVal;
            } else {
                throw new RuntimeException("No such field: " + __fieldName);
            }
        }
    }
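
    // A hypothetical sketch, not part of the generated class, showing how a single
    // comma-delimited record is parsed into a Widget (as MaxWidgetMapper does above).
    // The class name WidgetParseExample and the sample field values are made up for
    // illustration; fields appear in table order: id, widget_name, price, design_date,
    // version, design_comment.
    class WidgetParseExample {
        public static void main(String[] args) throws Exception {
            Widget w = new Widget();
            w.parse(new Text("1,sprocket,0.25,2010-02-10,1,Connects two gizmos"));
            System.out.println(w.get_id() + "\t" + w.get_widget_name() + "\t" + w.get_price());
        }
    }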

    //=*=*=*=*
    //./ch16/src/main/java/fm/last/hadoop/io/records/TrackStats.java
    // File generated by hadoop record compiler. Do not edit.
    package fm.last.hadoop.io.records;

    public class TrackStats extends org.apache.hadoop.record.Record {
        private static final org.apache.hadoop.record.meta.RecordTypeInfo _rio_recTypeInfo;
        private static org.apache.hadoop.record.meta.RecordTypeInfo _rio_rtiFilter;
        private static int[] _rio_rtiFilterFields;
        static {
            _rio_recTypeInfo = new org.apache.hadoop.record.meta.RecordTypeInfo("TrackStats");
            _rio_recTypeInfo.addField("listeners", org.apache.hadoop.record.meta.TypeID.IntTypeID);
            _rio_recTypeInfo.addField("plays", org.apache.hadoop.record.meta.TypeID.IntTypeID);
            _rio_recTypeInfo.addField("scrobbles", org.apache.hadoop.record.meta.TypeID.IntTypeID);
            _rio_recTypeInfo.addField("radioPlays", org.apache.hadoop.record.meta.TypeID.IntTypeID);
            _rio_recTypeInfo.addField("skips", org.apache.hadoop.record.meta.TypeID.IntTypeID);
        }

        private int listeners;
        private int plays;
        private int scrobbles;
        private int radioPlays;
        private int skips;

        public TrackStats() {
        }

        public TrackStats(final int listeners, final int plays, final int scrobbles, final int radioPlays,
                final int skips) {
            this.listeners = listeners;
            this.plays = plays;
            this.scrobbles = scrobbles;
            this.radioPlays = radioPlays;
            this.skips = skips;
        }

        public static org.apache.hadoop.record.meta.RecordTypeInfo getTypeInfo() {
            return _rio_recTypeInfo;
        }

        public static void setTypeFilter(org.apache.hadoop.record.meta.RecordTypeInfo rti) {
            if (null == rti)
                return;
            _rio_rtiFilter = rti;
            _rio_rtiFilterFields = null;
        }

        private static void setupRtiFields() {
            if (null == _rio_rtiFilter)
                return;
            // we may already have done this
            if (null != _rio_rtiFilterFields)
                return;
            int _rio_i, _rio_j;
            _rio_rtiFilterFields = new int[_rio_rtiFilter.getFieldTypeInfos().size()];
            for (_rio_i = 0; _rio_i < _rio_rtiFilterFields.length; _rio_i++) {
                _rio_rtiFilterFields[_rio_i] = 0;
            }
            java.util.Iterator<org.apache.hadoop.record.meta.FieldTypeInfo> _rio_itFilter = _rio_rtiFilter
                    .getFieldTypeInfos().iterator();
            _rio_i = 0;
            while (_rio_itFilter.hasNext()) {
                org.apache.hadoop.record.meta.FieldTypeInfo _rio_tInfoFilter = _rio_itFilter.next();
                java.util.Iterator<org.apache.hadoop.record.meta.FieldTypeInfo> _rio_it = _rio_recTypeInfo
                        .getFieldTypeInfos().iterator();
                _rio_j = 1;
                while (_rio_it.hasNext()) {
                    org.apache.hadoop.record.meta.FieldTypeInfo _rio_tInfo = _rio_it.next();
                    if (_rio_tInfo.equals(_rio_tInfoFilter)) {
                        _rio_rtiFilterFields[_rio_i] = _rio_j;
                        break;
                    }
                    _rio_j++;
                }
                _rio_i++;
            }
        }

        public int getListeners() {
            return listeners;
        }

        public void setListeners(final int listeners) {
            this.listeners = listeners;
        }

        public int getPlays() {
            return plays;
        }

        public void setPlays(final int plays) {
            this.plays = plays;
        }

        public int getScrobbles() {
            return scrobbles;
        }

        public void setScrobbles(final int scrobbles) {
            this.scrobbles = scrobbles;
        }

        public int getRadioPlays() {
            return radioPlays;
        }

        public void setRadioPlays(final int radioPlays) {
            this.radioPlays = radioPlays;
        }

        public int getSkips() {
            return skips;
        }

        public void setSkips(final int skips) {
            this.skips = skips;
        }

        public void serialize(final org.apache.hadoop.record.RecordOutput _rio_a, final String _rio_tag)
                throws java.io.IOException {
            _rio_a.startRecord(this, _rio_tag);
            _rio_a.writeInt(listeners, "listeners");
            _rio_a.writeInt(plays, "plays");
            _rio_a.writeInt(scrobbles, "scrobbles");
            _rio_a.writeInt(radioPlays, "radioPlays");
            _rio_a.writeInt(skips, "skips");
            _rio_a.endRecord(this, _rio_tag);
        }

        private void deserializeWithoutFilter(final org.apache.hadoop.record.RecordInput _rio_a, final String _rio_tag)
                throws java.io.IOException {
            _rio_a.startRecord(_rio_tag);
            listeners = _rio_a.readInt("listeners");
            plays = _rio_a.readInt("plays");
            scrobbles = _rio_a.readInt("scrobbles");
            radioPlays = _rio_a.readInt("radioPlays");
            skips = _rio_a.readInt("skips");
            _rio_a.endRecord(_rio_tag);
        }

        public void deserialize(final org.apache.hadoop.record.RecordInput _rio_a, final String _rio_tag)
                throws java.io.IOException {
            if (null == _rio_rtiFilter) {
                deserializeWithoutFilter(_rio_a, _rio_tag);
                return;
            }
            // if we're here, we need to read based on version info
            _rio_a.startRecord(_rio_tag);
            setupRtiFields();
            for (int _rio_i = 0; _rio_i < _rio_rtiFilter.getFieldTypeInfos().size(); _rio_i++) {
                if (1 == _rio_rtiFilterFields[_rio_i]) {
                    listeners = _rio_a.readInt("listeners");
                } else if (2 == _rio_rtiFilterFields[_rio_i]) {
                    plays = _rio_a.readInt("plays");
                } else if (3 == _rio_rtiFilterFields[_rio_i]) {
                    scrobbles = _rio_a.readInt("scrobbles");
                } else if (4 == _rio_rtiFilterFields[_rio_i]) {
                    radioPlays = _rio_a.readInt("radioPlays");
                } else if (5 == _rio_rtiFilterFields[_rio_i]) {
                    skips = _rio_a.readInt("skips");
                } else {
                    java.util.ArrayList<org.apache.hadoop.record.meta.FieldTypeInfo> typeInfos = (java.util.ArrayList<org.apache.hadoop.record.meta.FieldTypeInfo>) (_rio_rtiFilter
                            .getFieldTypeInfos());
                    org.apache.hadoop.record.meta.Utils.skip(_rio_a, typeInfos.get(_rio_i).getFieldID(),
                            typeInfos.get(_rio_i).getTypeID());
                }
            }
            _rio_a.endRecord(_rio_tag);
        }

        public int compareTo(final Object _rio_peer_) throws ClassCastException {
            if (!(_rio_peer_ instanceof TrackStats)) {
                throw new ClassCastException("Comparing different types of records.");
            }
            TrackStats _rio_peer = (TrackStats) _rio_peer_;
            int _rio_ret = 0;
            _rio_ret = (listeners == _rio_peer.listeners) ? 0 : ((listeners < _rio_peer.listeners) ? -1 : 1);
            if (_rio_ret != 0)
                return _rio_ret;
            _rio_ret = (plays == _rio_peer.plays) ? 0 : ((plays < _rio_peer.plays) ? -1 : 1);
            if (_rio_ret != 0)
                return _rio_ret;
            _rio_ret = (scrobbles == _rio_peer.scrobbles) ? 0 : ((scrobbles < _rio_peer.scrobbles) ? -1 : 1);
            if (_rio_ret != 0)
                return _rio_ret;
            _rio_ret = (radioPlays == _rio_peer.radioPlays) ? 0 : ((radioPlays < _rio_peer.radioPlays) ? -1 : 1);
            if (_rio_ret != 0)
                return _rio_ret;
            _rio_ret = (skips == _rio_peer.skips) ? 0 : ((skips < _rio_peer.skips) ? -1 : 1);
            if (_rio_ret != 0)
                return _rio_ret;
            return _rio_ret;
        }

        public boolean equals(final Object _rio_peer_) {
            if (!(_rio_peer_ instanceof TrackStats)) {
                return false;
            }
            if (_rio_peer_ == this) {
                return true;
            }
            TrackStats _rio_peer = (TrackStats) _rio_peer_;
            boolean _rio_ret = false;
            _rio_ret = (listeners == _rio_peer.listeners);
            if (!_rio_ret)
                return _rio_ret;
            _rio_ret = (plays == _rio_peer.plays);
            if (!_rio_ret)
                return _rio_ret;
            _rio_ret = (scrobbles == _rio_peer.scrobbles);
            if (!_rio_ret)
                return _rio_ret;
            _rio_ret = (radioPlays == _rio_peer.radioPlays);
            if (!_rio_ret)
                return _rio_ret;
            _rio_ret = (skips == _rio_peer.skips);
            if (!_rio_ret)
                return _rio_ret;
            return _rio_ret;
        }

        public Object clone() throws CloneNotSupportedException {
            TrackStats _rio_other = new TrackStats();
            _rio_other.listeners = this.listeners;
            _rio_other.plays = this.plays;
            _rio_other.scrobbles = this.scrobbles;
            _rio_other.radioPlays = this.radioPlays;
            _rio_other.skips = this.skips;
            return _rio_other;
        }

        public int hashCode() {
            int _rio_result = 17;
            int _rio_ret;
            _rio_ret = (int) listeners;
            _rio_result = 37 * _rio_result + _rio_ret;
            _rio_ret = (int) plays;
            _rio_result = 37 * _rio_result + _rio_ret;
            _rio_ret = (int) scrobbles;
            _rio_result = 37 * _rio_result + _rio_ret;
            _rio_ret = (int) radioPlays;
            _rio_result = 37 * _rio_result + _rio_ret;
            _rio_ret = (int) skips;
            _rio_result = 37 * _rio_result + _rio_ret;
            return _rio_result;
        }

        public static String signature() {
            return "LTrackStats(iiiii)";
        }

        public static class Comparator extends org.apache.hadoop.record.RecordComparator {
            public Comparator() {
                super(TrackStats.class);
            }

            static public int slurpRaw(byte[] b, int s, int l) {
                try {
                    int os = s;
                    {
                        int i = org.apache.hadoop.record.Utils.readVInt(b, s);
                        int z = org.apache.hadoop.record.Utils.getVIntSize(i);
                        s += z;
                        l -= z;
                    }
                    {
                        int i = org.apache.hadoop.record.Utils.readVInt(b, s);
                        int z = org.apache.hadoop.record.Utils.getVIntSize(i);
                        s += z;
                        l -= z;
                    }
                    {
                        int i = org.apache.hadoop.record.Utils.readVInt(b, s);
                        int z = org.apache.hadoop.record.Utils.getVIntSize(i);
                        s += z;
                        l -= z;
                    }
                    {
                        int i = org.apache.hadoop.record.Utils.readVInt(b, s);
                        int z = org.apache.hadoop.record.Utils.getVIntSize(i);
                        s += z;
                        l -= z;
                    }
                    {
                        int i = org.apache.hadoop.record.Utils.readVInt(b, s);
                        int z = org.apache.hadoop.record.Utils.getVIntSize(i);
                        s += z;
                        l -= z;
                    }
                    return (os - s);
                } catch (java.io.IOException e) {
                    throw new RuntimeException(e);
                }
            }

            static public int compareRaw(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
                try {
                    int os1 = s1;
                    {
                        int i1 = org.apache.hadoop.record.Utils.readVInt(b1, s1);
                        int i2 = org.apache.hadoop.record.Utils.readVInt(b2, s2);
                        if (i1 != i2) {
                            return ((i1 - i2) < 0) ? -1 : 0;
                        }
                        int z1 = org.apache.hadoop.record.Utils.getVIntSize(i1);
                        int z2 = org.apache.hadoop.record.Utils.getVIntSize(i2);
                        s1 += z1;
                        s2 += z2;
                        l1 -= z1;
                        l2 -= z2;
                    }
                    {
                        int i1 = org.apache.hadoop.record.Utils.readVInt(b1, s1);
                        int i2 = org.apache.hadoop.record.Utils.readVInt(b2, s2);
                        if (i1 != i2) {
                            return ((i1 - i2) < 0) ? -1 : 0;
                        }
                        int z1 = org.apache.hadoop.record.Utils.getVIntSize(i1);
                        int z2 = org.apache.hadoop.record.Utils.getVIntSize(i2);
                        s1 += z1;
                        s2 += z2;
                        l1 -= z1;
                        l2 -= z2;
                    }
                    {
                        int i1 = org.apache.hadoop.record.Utils.readVInt(b1, s1);
                        int i2 = org.apache.hadoop.record.Utils.readVInt(b2, s2);
                        if (i1 != i2) {
                            return ((i1 - i2) < 0) ? -1 : 0;
                        }
                        int z1 = org.apache.hadoop.record.Utils.getVIntSize(i1);
                        int z2 = org.apache.hadoop.record.Utils.getVIntSize(i2);
                        s1 += z1;
                        s2 += z2;
                        l1 -= z1;
                        l2 -= z2;
                    }
                    {
                        int i1 = org.apache.hadoop.record.Utils.readVInt(b1, s1);
                        int i2 = org.apache.hadoop.record.Utils.readVInt(b2, s2);
                        if (i1 != i2) {
                            return ((i1 - i2) < 0) ? -1 : 0;
                        }
                        int z1 = org.apache.hadoop.record.Utils.getVIntSize(i1);
                        int z2 = org.apache.hadoop.record.Utils.getVIntSize(i2);
                        s1 += z1;
                        s2 += z2;
                        l1 -= z1;
                        l2 -= z2;
                    }
                    {
                        int i1 = org.apache.hadoop.record.Utils.readVInt(b1, s1);
                        int i2 = org.apache.hadoop.record.Utils.readVInt(b2, s2);
                        if (i1 != i2) {
                            return ((i1 - i2) < 0) ? -1 : 0;
                        }
                        int z1 = org.apache.hadoop.record.Utils.getVIntSize(i1);
                        int z2 = org.apache.hadoop.record.Utils.getVIntSize(i2);
                        s1 += z1;
                        s2 += z2;
                        l1 -= z1;
                        l2 -= z2;
                    }
                    return (os1 - s1);
                } catch (java.io.IOException e) {
                    throw new RuntimeException(e);
                }
            }

            public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
                int ret = compareRaw(b1, s1, l1, b2, s2, l2);
                return (ret == -1) ? -1 : ((ret == 0) ? 1 : 0);
            }
        }

        static {
            org.apache.hadoop.record.RecordComparator.define(TrackStats.class, new Comparator());
        }
    }
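
    // The static block above registers the generated raw Comparator with the record framework.
    // Below is a minimal sketch (not part of the original source) of how that registration might be
    // exercised: serialize two TrackStats records with the record I/O classes and compare the
    // resulting byte ranges without deserializing them. The demo class and the sample field values
    // are hypothetical; TrackStats's 5-argument constructor order (listeners, plays, scrobbles,
    // radioPlays, skips) is taken from its use in SumMapper further down.
    import java.io.ByteArrayOutputStream;

    import org.apache.hadoop.io.WritableComparator;
    import org.apache.hadoop.record.BinaryRecordOutput;

    import fm.last.hadoop.io.records.TrackStats;

    public class TrackStatsRawCompareSketch {

        private static byte[] serialize(TrackStats stats) throws java.io.IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            // BinaryRecordOutput writes the record in the binary form the raw comparator walks through
            stats.serialize(new BinaryRecordOutput(bytes), "TrackStats");
            return bytes.toByteArray();
        }

        public static void main(String[] args) throws Exception {
            // constructing TrackStats loads the class, which runs the static block that registers the comparator
            byte[] a = serialize(new TrackStats(10, 100, 80, 20, 5));
            byte[] b = serialize(new TrackStats(12, 100, 80, 20, 5));
            WritableComparator comparator = WritableComparator.get(TrackStats.class);
            // raw, byte-level comparison of the two serialized records
            System.out.println(comparator.compare(a, 0, a.length, b, 0, b.length));
        }
    }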

    //=*=*=*=*
    //./ch16/src/main/java/fm/last/hadoop/programs/labs/trackstats/TrackStatisticsProgram.java
    /*
     * Copyright 2008 Last.fm.
     *
     *  Licensed under the Apache License, Version 2.0 (the "License");
     *  you may not use this file except in compliance with the License.
     *  You may obtain a copy of the License at
     *
     *      http://www.apache.org/licenses/LICENSE-2.0
     *
     *  Unless required by applicable law or agreed to in writing, software
     *  distributed under the License is distributed on an "AS IS" BASIS,
     *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     *  See the License for the specific language governing permissions and
     *  limitations under the License.
     */
    package fm.last.hadoop.programs.labs.trackstats;

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Set;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.SequenceFileOutputFormat;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.MultipleInputs;

    import fm.last.hadoop.io.records.TrackStats;

    /**
     * Program that calculates various track-related statistics from raw listening data.
     */
    public class TrackStatisticsProgram {

        public static final Log log = LogFactory.getLog(TrackStatisticsProgram.class);

        // the constants below give the column position of each field in the raw listening data

        private static final int COL_USERID = 0;
        private static final int COL_TRACKID = 1;
        private static final int COL_SCROBBLES = 2;
        private static final int COL_RADIO = 3;
        private static final int COL_SKIP = 4;

        private Configuration conf;

        /**
         * Constructs a new TrackStatisticsProgram, using a default Configuration.
         */
        public TrackStatisticsProgram() {
            this.conf = new Configuration();
        }

        /**
         * Enumeration for Hadoop error counters.
         */
        private enum COUNTER_KEYS {
            INVALID_LINES, NOT_LISTEN
        };

        /**
         * Mapper for the unique listeners job: takes in raw listening data and outputs each listener's user id against the track id.
         */
        public static class UniqueListenersMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, IntWritable, IntWritable> {

            public void map(LongWritable position, Text rawLine, OutputCollector<IntWritable, IntWritable> output,
                    Reporter reporter) throws IOException {

                String line = (rawLine).toString();
                if (line.trim().isEmpty()) { // if the line is empty, report error and ignore
                    reporter.incrCounter(COUNTER_KEYS.INVALID_LINES, 1);
                    return;
                }

                String[] parts = line.split(" "); // raw data is whitespace delimited
                try {
                    int scrobbles = Integer.parseInt(parts[TrackStatisticsProgram.COL_SCROBBLES]);
                    int radioListens = Integer.parseInt(parts[TrackStatisticsProgram.COL_RADIO]);
                    if (scrobbles <= 0 && radioListens <= 0) {
                        // if track somehow is marked with zero plays, report error and ignore
                        reporter.incrCounter(COUNTER_KEYS.NOT_LISTEN, 1);
                        return;
                    }
                    // if we get to here then user has listened to track, so output user id against track id
                    IntWritable trackId = new IntWritable(Integer.parseInt(parts[TrackStatisticsProgram.COL_TRACKID]));
                    IntWritable userId = new IntWritable(Integer.parseInt(parts[TrackStatisticsProgram.COL_USERID]));
                    output.collect(trackId, userId);
                } catch (NumberFormatException e) {
                    reporter.incrCounter(COUNTER_KEYS.INVALID_LINES, 1);
                    reporter.setStatus("Invalid line in listening data: " + rawLine);
                    return;
                }
            }
        }

        /**
         * Combiner that improves efficiency by removing duplicate user ids from mapper output.
         */
        public static class UniqueListenersCombiner extends MapReduceBase
                implements Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

            public void reduce(IntWritable trackId, Iterator<IntWritable> values,
                    OutputCollector<IntWritable, IntWritable> output, Reporter reporter) throws IOException {

                Set<IntWritable> userIds = new HashSet<IntWritable>();
                while (values.hasNext()) {
                    IntWritable userId = values.next();
                    if (!userIds.contains(userId)) {
                        // if this user hasn't already been marked as listening to the track, add them to set and output them
                        userIds.add(new IntWritable(userId.get()));
                        output.collect(trackId, userId);
                    }
                }
            }
        }

        /**
         * Reducer that counts the unique listener ids per track (i.e. it removes any duplicates). Final output is the
         * number of unique listeners per track.
         */
        public static class UniqueListenersReducer extends MapReduceBase
                implements Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

            public void reduce(IntWritable trackId, Iterator<IntWritable> values,
                    OutputCollector<IntWritable, IntWritable> output, Reporter reporter) throws IOException {

                Set<Integer> userIds = new HashSet<Integer>();
                // add all userIds to the set, duplicates automatically removed (set contract)
                while (values.hasNext()) {
                    IntWritable userId = values.next();
                    userIds.add(Integer.valueOf(userId.get()));
                }
                // output trackId -> number of unique listeners per track
                output.collect(trackId, new IntWritable(userIds.size()));
            }

        }

        /**
         * Mapper that summarizes various statistics per track. Input is raw listening data, output is a partially filled-in
         * TrackStats object per track id.
         */
        public static class SumMapper extends MapReduceBase
                implements Mapper<LongWritable, Text, IntWritable, TrackStats> {

            public void map(LongWritable position, Text rawLine, OutputCollector<IntWritable, TrackStats> output,
                    Reporter reporter) throws IOException {

                String line = (rawLine).toString();
                if (line.trim().isEmpty()) { // ignore empty lines
                    reporter.incrCounter(COUNTER_KEYS.INVALID_LINES, 1);
                    return;
                }

                String[] parts = line.split(" ");
                try {
                    int trackId = Integer.parseInt(parts[TrackStatisticsProgram.COL_TRACKID]);
                    int scrobbles = Integer.parseInt(parts[TrackStatisticsProgram.COL_SCROBBLES]);
                    int radio = Integer.parseInt(parts[TrackStatisticsProgram.COL_RADIO]);
                    int skip = Integer.parseInt(parts[TrackStatisticsProgram.COL_SKIP]);
                    // set number of listeners to 0 (this is calculated later) and other values as provided in text file
                    TrackStats trackstat = new TrackStats(0, scrobbles + radio, scrobbles, radio, skip);
                    output.collect(new IntWritable(trackId), trackstat);
                } catch (NumberFormatException e) {
                    reporter.incrCounter(COUNTER_KEYS.INVALID_LINES, 1);
                    log.warn("Invalid line in listening data: " + rawLine);
                }
            }
        }

        /**
         * Sums up the track statistics per track. Output is a TrackStats object per track id.
         */
        public static class SumReducer extends MapReduceBase
                implements Reducer<IntWritable, TrackStats, IntWritable, TrackStats> {

            @Override
            public void reduce(IntWritable trackId, Iterator<TrackStats> values,
                    OutputCollector<IntWritable, TrackStats> output, Reporter reporter) throws IOException {

                TrackStats sum = new TrackStats(); // holds the totals for this track
                while (values.hasNext()) {
                    TrackStats trackStats = (TrackStats) values.next();
                    sum.setListeners(sum.getListeners() + trackStats.getListeners());
                    sum.setPlays(sum.getPlays() + trackStats.getPlays());
                    sum.setSkips(sum.getSkips() + trackStats.getSkips());
                    sum.setScrobbles(sum.getScrobbles() + trackStats.getScrobbles());
                    sum.setRadioPlays(sum.getRadioPlays() + trackStats.getRadioPlays());
                }
                output.collect(trackId, sum);
            }
        }

        /**
         * Mapper that takes the number of listeners for a track and converts this to a TrackStats object which is output
         * against each track id.
         */
        public static class MergeListenersMapper extends MapReduceBase
                implements Mapper<IntWritable, IntWritable, IntWritable, TrackStats> {

            public void map(IntWritable trackId, IntWritable uniqueListenerCount,
                    OutputCollector<IntWritable, TrackStats> output, Reporter reporter) throws IOException {

                TrackStats trackStats = new TrackStats();
                trackStats.setListeners(uniqueListenerCount.get());
                output.collect(trackId, trackStats);
            }
        }

        /**
         * Create a JobConf for a Job that will calculate the number of unique listeners per track.
         * 
         * @param inputDir The path to the folder containing the raw listening data files.
         * @return The unique listeners JobConf.
         */
        private JobConf getUniqueListenersJobConf(Path inputDir) {
            log.info("Creating configuration for unique listeners Job");

            // output results to a temporary intermediate folder; this will be deleted by the start() method
            Path uniqueListenersOutput = new Path("uniqueListeners");

            JobConf conf = new JobConf(TrackStatisticsProgram.class);
            conf.setOutputKeyClass(IntWritable.class); // track id
            conf.setOutputValueClass(IntWritable.class); // number of unique listeners
            conf.setInputFormat(TextInputFormat.class); // raw listening data
            conf.setOutputFormat(SequenceFileOutputFormat.class);
            conf.setMapperClass(UniqueListenersMapper.class);
            conf.setCombinerClass(UniqueListenersCombiner.class);
            conf.setReducerClass(UniqueListenersReducer.class);

            FileInputFormat.addInputPath(conf, inputDir);
            FileOutputFormat.setOutputPath(conf, uniqueListenersOutput);
            conf.setJobName("uniqueListeners");
            return conf;
        }

        /**
         * Creates a JobConf for a Job that will sum up the TrackStatistics per track.
         * 
         * @param inputDir The path to the folder containing the raw input data files.
         * @return The sum JobConf.
         */
        private JobConf getSumJobConf(Path inputDir) {
            log.info("Creating configuration for sum job");
            // output results to a temporary intermediate folder; this will be deleted by the start() method
            Path playsOutput = new Path("sum");

            JobConf conf = new JobConf(TrackStatisticsProgram.class);
            conf.setOutputKeyClass(IntWritable.class); // track id
            conf.setOutputValueClass(TrackStats.class); // statistics for a track
            conf.setInputFormat(TextInputFormat.class); // raw listening data
            conf.setOutputFormat(SequenceFileOutputFormat.class);
            conf.setMapperClass(SumMapper.class);
            conf.setCombinerClass(SumReducer.class);
            conf.setReducerClass(SumReducer.class);

            FileInputFormat.addInputPath(conf, inputDir);
            FileOutputFormat.setOutputPath(conf, playsOutput);
            conf.setJobName("sum");
            return conf;
        }

        /**
         * Creates a JobConf for a Job that will merge the unique listeners and track statistics.
         * 
         * @param outputPath The path for the results to be output to.
         * @param sumInputDir The path containing the data from the sum Job.
         * @param listenersInputDir The path containing the data from the unique listeners job.
         * @return The merge JobConf.
         */
        private JobConf getMergeConf(Path outputPath, Path sumInputDir, Path listenersInputDir) {
            log.info("Creating configuration for merge job");
            JobConf conf = new JobConf(TrackStatisticsProgram.class);
            conf.setOutputKeyClass(IntWritable.class); // track id
            conf.setOutputValueClass(TrackStats.class); // overall track statistics
            conf.setCombinerClass(SumReducer.class); // safe to re-use reducer as a combiner here
            conf.setReducerClass(SumReducer.class);
            conf.setOutputFormat(TextOutputFormat.class);

            FileOutputFormat.setOutputPath(conf, outputPath);

            MultipleInputs.addInputPath(conf, sumInputDir, SequenceFileInputFormat.class, IdentityMapper.class);
            MultipleInputs.addInputPath(conf, listenersInputDir, SequenceFileInputFormat.class,
                    MergeListenersMapper.class);
            conf.setJobName("merge");
            return conf;
        }

        /**
         * Start the program.
         * 
         * @param inputDir The path to the folder containing the raw listening data files.
         * @param outputDir The path for the results to be output to.
         * @throws IOException If an error occurs retrieving data from the file system or an error occurs running the job.
         */
        public void start(Path inputDir, Path outputDir) throws IOException {
            FileSystem fs = FileSystem.get(this.conf);

            JobConf uniqueListenersConf = getUniqueListenersJobConf(inputDir);
            Path listenersOutputDir = FileOutputFormat.getOutputPath(uniqueListenersConf);
            Job listenersJob = new Job(uniqueListenersConf);
            // delete any output that might exist from a previous run of this job
            if (fs.exists(FileOutputFormat.getOutputPath(uniqueListenersConf))) {
                fs.delete(FileOutputFormat.getOutputPath(uniqueListenersConf), true);
            }

            JobConf sumConf = getSumJobConf(inputDir);
            Path sumOutputDir = FileOutputFormat.getOutputPath(sumConf);
            Job sumJob = new Job(sumConf);
            // delete any output that might exist from a previous run of this job
            if (fs.exists(FileOutputFormat.getOutputPath(sumConf))) {
                fs.delete(FileOutputFormat.getOutputPath(sumConf), true);
            }

            // the merge job depends on the other two jobs
            ArrayList<Job> mergeDependencies = new ArrayList<Job>();
            mergeDependencies.add(listenersJob);
            mergeDependencies.add(sumJob);
            JobConf mergeConf = getMergeConf(outputDir, sumOutputDir, listenersOutputDir);
            Job mergeJob = new Job(mergeConf, mergeDependencies);
            // delete any output that might exist from a previous run of this job
            if (fs.exists(FileOutputFormat.getOutputPath(mergeConf))) {
                fs.delete(FileOutputFormat.getOutputPath(mergeConf), true);
            }

            // store the output paths of the intermediate jobs so these can be cleaned up after a successful run
            List<Path> deletePaths = new ArrayList<Path>();
            deletePaths.add(FileOutputFormat.getOutputPath(uniqueListenersConf));
            deletePaths.add(FileOutputFormat.getOutputPath(sumConf));

            JobControl control = new JobControl("TrackStatisticsProgram");
            control.addJob(listenersJob);
            control.addJob(sumJob);
            control.addJob(mergeJob);

            // execute the jobs
            try {
                Thread jobControlThread = new Thread(control, "jobcontrol");
                jobControlThread.start();
                while (!control.allFinished()) {
                    Thread.sleep(1000);
                }
                if (control.getFailedJobs().size() > 0) {
                    throw new IOException("One or more jobs failed");
                }
            } catch (InterruptedException e) {
                throw new IOException("Interrupted while waiting for job control to finish", e);
            }

            // remove intermediate output paths
            for (Path deletePath : deletePaths) {
                fs.delete(deletePath, true);
            }
        }

        /**
         * Set the Configuration used by this Program.
         * 
         * @param conf The new Configuration to use by this program.
         */
        public void setConf(Configuration conf) {
            this.conf = conf; // this will usually only be set by unit tests
        }

        /**
         * Gets the Configuration used by this program.
         * 
         * @return This program's Configuration.
         */
        public Configuration getConf() {
            return conf;
        }

        /**
         * Main method used to run the TrackStatisticsProgram from the command line. It takes two parameters: first, the
         * path to the folder containing the raw input data; and second, the path for the results to be output to.
         * 
         * @param args Command line arguments.
         * @throws Exception If an error occurs running the program.
         */
        public static void main(String[] args) throws Exception {
            if (args.length < 2) {
                log.info("Args: <input directory> <output directory>");
                return;
            }

            Path inputPath = new Path(args[0]);
            Path outputDir = new Path(args[1]);
            log.info("Running on input directories: " + inputPath);
            TrackStatisticsProgram listeners = new TrackStatisticsProgram();
            listeners.start(inputPath, outputDir);
        }

    }
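
    // A minimal sketch (not in the original listing) of driving TrackStatisticsProgram programmatically
    // rather than via its main() method. The path strings are hypothetical; setConf() is used here only to
    // illustrate overriding the default Configuration, which the class notes is normally done by unit tests.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;

    import fm.last.hadoop.programs.labs.trackstats.TrackStatisticsProgram;

    public class TrackStatisticsDriverSketch {

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // picks up *-site.xml resources from the classpath
            TrackStatisticsProgram program = new TrackStatisticsProgram();
            program.setConf(conf); // replace the default Configuration created in the constructor
            // hypothetical input and output locations
            program.start(new Path("listening-data"), new Path("track-stats-output"));
        }
    }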

    //=*=*=*=*
    //./common/src/main/java/JobBuilder.java
    // == JobBuilder
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;
    import org.apache.hadoop.util.Tool;

    public class JobBuilder {

        private final Class<?> driverClass;
        private final Job job;
        private final int extraArgCount;
        private final String extrArgsUsage;

        private String[] extraArgs;

        public JobBuilder(Class<?> driverClass) throws IOException {
            this(driverClass, 0, "");
        }

        public JobBuilder(Class<?> driverClass, int extraArgCount, String extrArgsUsage) throws IOException {
            this.driverClass = driverClass;
            this.extraArgCount = extraArgCount;
            this.job = new Job();
            this.job.setJarByClass(driverClass);
            this.extrArgsUsage = extrArgsUsage;
        }

        // vv JobBuilder
        public static Job parseInputAndOutput(Tool tool, Configuration conf, String[] args) throws IOException {

            if (args.length != 2) {
                printUsage(tool, "<input> <output>");
                return null;
            }
            Job job = new Job(conf);
            job.setJarByClass(tool.getClass());
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job;
        }

        public static void printUsage(Tool tool, String extraArgsUsage) {
            System.err.printf("Usage: %s [genericOptions] %s\n\n", tool.getClass().getSimpleName(), extraArgsUsage);
            GenericOptionsParser.printGenericCommandUsage(System.err);
        }
        // ^^ JobBuilder

        public JobBuilder withCommandLineArgs(String... args) throws IOException {
            Configuration conf = job.getConfiguration();
            GenericOptionsParser parser = new GenericOptionsParser(conf, args);
            String[] otherArgs = parser.getRemainingArgs();
            if (otherArgs.length < 2 || otherArgs.length > 3 + extraArgCount) {
                System.err.printf("Usage: %s [genericOptions] [-overwrite] <input path> <output path> %s\n\n",
                        driverClass.getSimpleName(), extrArgsUsage);
                GenericOptionsParser.printGenericCommandUsage(System.err);
                System.exit(-1);
            }
            int index = 0;
            boolean overwrite = false;
            if (otherArgs[index].equals("-overwrite")) {
                overwrite = true;
                index++;
            }
            Path input = new Path(otherArgs[index++]);
            Path output = new Path(otherArgs[index++]);

            if (index < otherArgs.length) {
                extraArgs = new String[otherArgs.length - index];
                System.arraycopy(otherArgs, index, extraArgs, 0, otherArgs.length - index);
            }

            if (overwrite) {
                output.getFileSystem(conf).delete(output, true);
            }

            FileInputFormat.addInputPath(job, input);
            FileOutputFormat.setOutputPath(job, output);
            return this;
        }

        public Job build() {
            return job;
        }

        public String[] getExtraArgs() {
            return extraArgs;
        }
    }
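
    // A sketch (not part of the original listing) of how JobBuilder.parseInputAndOutput is typically used
    // from a Tool-based driver with the new MapReduce API. The driver class name is hypothetical and the
    // job-specific settings (mapper, reducer, output types) are deliberately elided.
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class ExampleDriverSketch extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (job == null) {
                return -1; // usage message has already been printed by parseInputAndOutput
            }
            // set the mapper, reducer, and output key/value classes for the job here
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new ExampleDriverSketch(), args));
        }
    }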

    //=*=*=*=*
    //./common/src/main/java/MetOfficeRecordParser.java
    import java.math.*;
    import org.apache.hadoop.io.Text;

    public class MetOfficeRecordParser {

        private String year;
        private String airTemperatureString;
        private int airTemperature;
        private boolean airTemperatureValid;

        public void parse(String record) {
            if (record.length() < 18) {
                return;
            }
            year = record.substring(3, 7);
            if (isValidRecord(year)) {
                airTemperatureString = record.substring(13, 18);
                if (!airTemperatureString.trim().equals("---")) {
                    BigDecimal temp = new BigDecimal(airTemperatureString.trim());
                    temp = temp.multiply(new BigDecimal(BigInteger.TEN));
                    airTemperature = temp.intValueExact();
                    airTemperatureValid = true;
                }
            }
        }

        private boolean isValidRecord(String year) {
            try {
                Integer.parseInt(year);
                return true;
            } catch (NumberFormatException e) {
                return false;
            }
        }

        public void parse(Text record) {
            parse(record.toString());
        }

        public String getYear() {
            return year;
        }

        public int getAirTemperature() {
            return airTemperature;
        }

        public String getAirTemperatureString() {
            return airTemperatureString;
        }

        public boolean isValidTemperature() {
            return airTemperatureValid;
        }

    }
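
    // A short usage sketch (not in the original source) showing MetOfficeRecordParser on a single record.
    // The sample line is the one exercised by MetOfficeRecordParserTest further down; the parser stores the
    // temperature in tenths of a degree, as the multiplication by ten in parse() shows.
    public class MetOfficeRecordParserSketch {

        public static void main(String[] args) {
            MetOfficeRecordParser parser = new MetOfficeRecordParser();
            parser.parse("   1978   1    7.5     2.0       6   134.1    64.7");
            if (parser.isValidTemperature()) {
                // prints "1978 75" (7.5 degrees stored as 75 tenths)
                System.out.println(parser.getYear() + " " + parser.getAirTemperature());
            }
        }
    }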

    //=*=*=*=*
    //./common/src/main/java/NcdcRecordParser.java
    import java.text.*;
    import java.util.Date;

    import org.apache.hadoop.io.Text;

    public class NcdcRecordParser {

        private static final int MISSING_TEMPERATURE = 9999;

        private static final DateFormat DATE_FORMAT = new SimpleDateFormat("yyyyMMddHHmm");

        private String stationId;
        private String observationDateString;
        private String year;
        private String airTemperatureString;
        private int airTemperature;
        private boolean airTemperatureMalformed;
        private String quality;

        public void parse(String record) {
            stationId = record.substring(4, 10) + "-" + record.substring(10, 15);
            observationDateString = record.substring(15, 27);
            year = record.substring(15, 19);
            airTemperatureMalformed = false;
            // Remove the leading plus sign as parseInt doesn't like it
            if (record.charAt(87) == '+') {
                airTemperatureString = record.substring(88, 92);
                airTemperature = Integer.parseInt(airTemperatureString);
            } else if (record.charAt(87) == '-') {
                airTemperatureString = record.substring(87, 92);
                airTemperature = Integer.parseInt(airTemperatureString);
            } else {
                airTemperatureMalformed = true;
            }
            quality = record.substring(92, 93);
        }

        public void parse(Text record) {
            parse(record.toString());
        }

        public boolean isValidTemperature() {
            return !airTemperatureMalformed && airTemperature != MISSING_TEMPERATURE && quality.matches("[01459]");
        }

        public boolean isMalformedTemperature() {
            return airTemperatureMalformed;
        }

        public boolean isMissingTemperature() {
            return airTemperature == MISSING_TEMPERATURE;
        }

        public String getStationId() {
            return stationId;
        }

        public Date getObservationDate() {
            try {
                System.out.println(observationDateString);
                return DATE_FORMAT.parse(observationDateString);
            } catch (ParseException e) {
                throw new IllegalArgumentException(e);
            }
        }

        public String getYear() {
            return year;
        }

        public int getYearInt() {
            return Integer.parseInt(year);
        }

        public int getAirTemperature() {
            return airTemperature;
        }

        public String getAirTemperatureString() {
            return airTemperatureString;
        }

        public String getQuality() {
            return quality;
        }

    }
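
    // A short usage sketch (not in the original source) showing NcdcRecordParser on a valid fixed-width
    // NCDC record. The sample line is the one exercised by NcdcRecordParserTest further down; the air
    // temperature is held in tenths of a degree Celsius.
    public class NcdcRecordParserSketch {

        public static void main(String[] args) {
            NcdcRecordParser parser = new NcdcRecordParser();
            parser.parse(
                    "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999");
            if (parser.isValidTemperature()) {
                // prints "011990-99999 1950 22"
                System.out.println(parser.getStationId() + " " + parser.getYear() + " " + parser.getAirTemperature());
            }
        }
    }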

    //=*=*=*=*
    //./common/src/main/java/NcdcStationMetadata.java
    import java.io.*;
    import java.util.*;
    import org.apache.hadoop.io.IOUtils;

    public class NcdcStationMetadata {

        private Map<String, String> stationIdToName = new HashMap<String, String>();

        public void initialize(File file) throws IOException {
            BufferedReader in = null;
            try {
                in = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
                NcdcStationMetadataParser parser = new NcdcStationMetadataParser();
                String line;
                while ((line = in.readLine()) != null) {
                    if (parser.parse(line)) {
                        stationIdToName.put(parser.getStationId(), parser.getStationName());
                    }
                }
            } finally {
                IOUtils.closeStream(in);
            }
        }

        public String getStationName(String stationId) {
            String stationName = stationIdToName.get(stationId);
            if (stationName == null || stationName.trim().length() == 0) {
                return stationId; // no match: fall back to ID
            }
            return stationName;
        }

        public Map<String, String> getStationIdToNameMap() {
            return Collections.unmodifiableMap(stationIdToName);
        }

    }
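
    // A usage sketch (not in the original source): load a local copy of the NCDC station metadata file and
    // resolve a station id to its name. The file name is hypothetical; the id used here is the one exercised
    // by NcdcStationMetadataParserTest further down, and getStationName falls back to the id if there is no match.
    import java.io.File;

    public class NcdcStationMetadataSketch {

        public static void main(String[] args) throws Exception {
            NcdcStationMetadata metadata = new NcdcStationMetadata();
            metadata.initialize(new File("stations-fixed-width.txt")); // hypothetical local metadata file
            System.out.println(metadata.getStationName("715390-99999"));
        }
    }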

    //=*=*=*=*
    //./common/src/main/java/NcdcStationMetadataParser.java
    import org.apache.hadoop.io.Text;

    public class NcdcStationMetadataParser {

        private String stationId;
        private String stationName;

        public boolean parse(String record) {
            if (record.length() < 42) { // header
                return false;
            }
            String usaf = record.substring(0, 6);
            String wban = record.substring(7, 12);
            stationId = usaf + "-" + wban;
            stationName = record.substring(13, 42);
            try {
                Integer.parseInt(usaf); // USAF identifiers are numeric
                return true;
            } catch (NumberFormatException e) {
                return false;
            }
        }

        public boolean parse(Text record) {
            return parse(record.toString());
        }

        public String getStationId() {
            return stationId;
        }

        public String getStationName() {
            return stationName;
        }

    }

    //=*=*=*=*
    //./common/src/main/java/oldapi/JobBuilder.java
    package oldapi;

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;

    public class JobBuilder {

        private final Class<?> driverClass;
        private final JobConf conf;
        private final int extraArgCount;
        private final String extrArgsUsage;

        private String[] extraArgs;

        public JobBuilder(Class<?> driverClass) {
            this(driverClass, 0, "");
        }

        public JobBuilder(Class<?> driverClass, int extraArgCount, String extrArgsUsage) {
            this.driverClass = driverClass;
            this.extraArgCount = extraArgCount;
            this.conf = new JobConf(driverClass);
            this.extrArgsUsage = extrArgsUsage;
        }

        public static JobConf parseInputAndOutput(Tool tool, Configuration conf, String[] args) {

            if (args.length != 2) {
                printUsage(tool, "<input> <output>");
                return null;
            }
            JobConf jobConf = new JobConf(conf, tool.getClass());
            FileInputFormat.addInputPath(jobConf, new Path(args[0]));
            FileOutputFormat.setOutputPath(jobConf, new Path(args[1]));
            return jobConf;
        }

        public static void printUsage(Tool tool, String extraArgsUsage) {
            System.err.printf("Usage: %s [genericOptions] %s\n\n", tool.getClass().getSimpleName(), extraArgsUsage);
            GenericOptionsParser.printGenericCommandUsage(System.err);
        }

        public JobBuilder withCommandLineArgs(String... args) throws IOException {
            GenericOptionsParser parser = new GenericOptionsParser(conf, args);
            String[] otherArgs = parser.getRemainingArgs();
            if (otherArgs.length < 2 || otherArgs.length > 3 + extraArgCount) {
                System.err.printf("Usage: %s [genericOptions] [-overwrite] <input path> <output path> %s\n\n",
                        driverClass.getSimpleName(), extrArgsUsage);
                GenericOptionsParser.printGenericCommandUsage(System.err);
                System.exit(-1);
            }
            int index = 0;
            boolean overwrite = false;
            if (otherArgs[index].equals("-overwrite")) {
                overwrite = true;
                index++;
            }
            Path input = new Path(otherArgs[index++]);
            Path output = new Path(otherArgs[index++]);

            if (index < otherArgs.length) {
                extraArgs = new String[otherArgs.length - index];
                System.arraycopy(otherArgs, index, extraArgs, 0, otherArgs.length - index);
            }

            if (overwrite) {
                output.getFileSystem(conf).delete(output, true);
            }

            FileInputFormat.addInputPath(conf, input);
            FileOutputFormat.setOutputPath(conf, output);
            return this;
        }

        public JobConf build() {
            return conf;
        }

        public String[] getExtraArgs() {
            return extraArgs;
        }
    }
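
    // A sketch (not part of the original listing) of the old-API counterpart of the driver pattern:
    // parseInputAndOutput returns a JobConf here, which is then submitted with JobClient.runJob. The driver
    // class name is hypothetical and job-specific settings are deliberately elided.
    package oldapi;

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class OldApiExampleDriverSketch extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
            if (conf == null) {
                return -1; // usage message has already been printed by parseInputAndOutput
            }
            // set the mapper, reducer, and output key/value classes for the job here
            JobClient.runJob(conf); // blocks until the job completes
            return 0;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new OldApiExampleDriverSketch(), args));
        }
    }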

    //=*=*=*=*
    //./common/src/main/java/oldapi/MetOfficeRecordParser.java
    package oldapi;

    import java.math.*;
    import org.apache.hadoop.io.Text;

    public class MetOfficeRecordParser {

        private String year;
        private String airTemperatureString;
        private int airTemperature;
        private boolean airTemperatureValid;

        public void parse(String record) {
            if (record.length() < 18) {
                return;
            }
            year = record.substring(3, 7);
            if (isValidRecord(year)) {
                airTemperatureString = record.substring(13, 18);
                if (!airTemperatureString.trim().equals("---")) {
                    BigDecimal temp = new BigDecimal(airTemperatureString.trim());
                    temp = temp.multiply(new BigDecimal(BigInteger.TEN));
                    airTemperature = temp.intValueExact();
                    airTemperatureValid = true;
                }
            }
        }

        private boolean isValidRecord(String year) {
            try {
                Integer.parseInt(year);
                return true;
            } catch (NumberFormatException e) {
                return false;
            }
        }

        public void parse(Text record) {
            parse(record.toString());
        }

        public String getYear() {
            return year;
        }

        public int getAirTemperature() {
            return airTemperature;
        }

        public String getAirTemperatureString() {
            return airTemperatureString;
        }

        public boolean isValidTemperature() {
            return airTemperatureValid;
        }

    }

    //=*=*=*=*
    //./common/src/main/java/oldapi/NcdcRecordParser.java
    package oldapi;

    import java.text.*;
    import java.util.Date;

    import org.apache.hadoop.io.Text;

    public class NcdcRecordParser {

        private static final int MISSING_TEMPERATURE = 9999;

        private static final DateFormat DATE_FORMAT = new SimpleDateFormat("yyyyMMddHHmm");

        private String stationId;
        private String observationDateString;
        private String year;
        private String airTemperatureString;
        private int airTemperature;
        private boolean airTemperatureMalformed;
        private String quality;

        public void parse(String record) {
            stationId = record.substring(4, 10) + "-" + record.substring(10, 15);
            observationDateString = record.substring(15, 27);
            year = record.substring(15, 19);
            airTemperatureMalformed = false;
            // Remove the leading plus sign as parseInt doesn't like it
            if (record.charAt(87) == '+') {
                airTemperatureString = record.substring(88, 92);
                airTemperature = Integer.parseInt(airTemperatureString);
            } else if (record.charAt(87) == '-') {
                airTemperatureString = record.substring(87, 92);
                airTemperature = Integer.parseInt(airTemperatureString);
            } else {
                airTemperatureMalformed = true;
            }
            quality = record.substring(92, 93);
        }

        public void parse(Text record) {
            parse(record.toString());
        }

        public boolean isValidTemperature() {
            return !airTemperatureMalformed && airTemperature != MISSING_TEMPERATURE && quality.matches("[01459]");
        }

        public boolean isMalformedTemperature() {
            return airTemperatureMalformed;
        }

        public boolean isMissingTemperature() {
            return airTemperature == MISSING_TEMPERATURE;
        }

        public String getStationId() {
            return stationId;
        }

        public Date getObservationDate() {
            try {
                System.out.println(observationDateString);
                return DATE_FORMAT.parse(observationDateString);
            } catch (ParseException e) {
                throw new IllegalArgumentException(e);
            }
        }

        public String getYear() {
            return year;
        }

        public int getYearInt() {
            return Integer.parseInt(year);
        }

        public int getAirTemperature() {
            return airTemperature;
        }

        public String getAirTemperatureString() {
            return airTemperatureString;
        }

        public String getQuality() {
            return quality;
        }

    }

    //=*=*=*=*
    //./common/src/main/java/oldapi/NcdcStationMetadata.java
    package oldapi;

    import java.io.*;
    import java.util.*;
    import org.apache.hadoop.io.IOUtils;

    public class NcdcStationMetadata {

        private Map<String, String> stationIdToName = new HashMap<String, String>();

        public void initialize(File file) throws IOException {
            BufferedReader in = null;
            try {
                in = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
                NcdcStationMetadataParser parser = new NcdcStationMetadataParser();
                String line;
                while ((line = in.readLine()) != null) {
                    if (parser.parse(line)) {
                        stationIdToName.put(parser.getStationId(), parser.getStationName());
                    }
                }
            } finally {
                IOUtils.closeStream(in);
            }
        }

        public String getStationName(String stationId) {
            String stationName = stationIdToName.get(stationId);
            if (stationName == null || stationName.trim().length() == 0) {
                return stationId; // no match: fall back to ID
            }
            return stationName;
        }

        public Map<String, String> getStationIdToNameMap() {
            return Collections.unmodifiableMap(stationIdToName);
        }

    }

    //=*=*=*=*
    //./common/src/main/java/oldapi/NcdcStationMetadataParser.java
    package oldapi;

    import org.apache.hadoop.io.Text;

    public class NcdcStationMetadataParser {

        private String stationId;
        private String stationName;

        public boolean parse(String record) {
            if (record.length() < 42) { // header
                return false;
            }
            String usaf = record.substring(0, 6);
            String wban = record.substring(7, 12);
            stationId = usaf + "-" + wban;
            stationName = record.substring(13, 42);
            try {
                Integer.parseInt(usaf); // USAF identifiers are numeric
                return true;
            } catch (NumberFormatException e) {
                return false;
            }
        }

        public boolean parse(Text record) {
            return parse(record.toString());
        }

        public String getStationId() {
            return stationId;
        }

        public String getStationName() {
            return stationName;
        }

    }

    //=*=*=*=*
    //./common/src/test/java/MetOfficeRecordParserTest.java
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;
    import org.junit.*;

    public class MetOfficeRecordParserTest {

        private MetOfficeRecordParser parser;

        @Before
        public void setUp() {
            parser = new MetOfficeRecordParser();
        }

        @Test
        public void parsesValidRecord() {
            parser.parse("   1978   1    7.5     2.0       6   134.1    64.7");
            assertThat(parser.getYear(), is("1978"));
            assertThat(parser.getAirTemperature(), is(75));
            assertThat(parser.getAirTemperatureString(), is("  7.5"));
            assertThat(parser.isValidTemperature(), is(true));
        }

        @Test
        public void parsesNegativeTemperature() {
            parser.parse("   1978   1  -17.5     2.0       6   134.1    64.7");
            assertThat(parser.getYear(), is("1978"));
            assertThat(parser.getAirTemperature(), is(-175));
            assertThat(parser.getAirTemperatureString(), is("-17.5"));
            assertThat(parser.isValidTemperature(), is(true));
        }

        @Test
        public void parsesMissingTemperature() {
            parser.parse("   1853   1    ---     ---     ---    57.3     ---");
            assertThat(parser.getAirTemperatureString(), is("  ---"));
            assertThat(parser.isValidTemperature(), is(false));
        }

        @Test
        public void parsesHeaderLine() {
            parser.parse("Cardiff Bute Park");
            assertThat(parser.isValidTemperature(), is(false));
        }

        @Test(expected = NumberFormatException.class)
        public void cannotParseMalformedTemperature() {
            parser.parse("   1978   1    X.5     2.0       6   134.1    64.7");
        }
    }

    //=*=*=*=*
    //./common/src/test/java/NcdcRecordParserTest.java
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;
    import org.junit.*;

    public class NcdcRecordParserTest {

        private NcdcRecordParser parser;

        @Before
        public void setUp() {
            parser = new NcdcRecordParser();
        }

        @Test
        public void parsesValidRecord() {
            parser.parse(
                    "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999");
            assertThat(parser.getStationId(), is("011990-99999"));
            assertThat(parser.getYear(), is("1950"));
            assertThat(parser.getAirTemperature(), is(22));
            assertThat(parser.getAirTemperatureString(), is("0022"));
            assertThat(parser.isValidTemperature(), is(true));
            assertThat(parser.getQuality(), is("1"));
        }

        @Test
        public void parsesMissingTemperature() {
            parser.parse(
                    "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+99991+99999999999");
            assertThat(parser.getAirTemperature(), is(9999));
            assertThat(parser.getAirTemperatureString(), is("9999"));
            assertThat(parser.isValidTemperature(), is(false));
        }

        @Test(expected = NumberFormatException.class)
        public void cannotParseMalformedTemperature() {
            parser.parse(
                    "0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+XXXX1+99999999999");
        }
    }

    //=*=*=*=*
    //./common/src/test/java/NcdcStationMetadataParserTest.java
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;
    import org.junit.*;

    public class NcdcStationMetadataParserTest {

        private NcdcStationMetadataParser parser;

        @Before
        public void setUp() {
            parser = new NcdcStationMetadataParser();
        }

        @Test
        public void parsesValidRecord() {
            assertThat(parser.parse("715390 99999 MOOSE JAW CS                  CN CA SA CZMJ  +50317 -105550 +05770"),
                    is(true));
            assertThat(parser.getStationId(), is("715390-99999"));
            assertThat(parser.getStationName().trim(), is("MOOSE JAW CS"));
        }

        @Test
        public void parsesHeader() {
            assertThat(parser.parse("Integrated Surface Database Station History, November 2007"), is(false));
        }

        @Test
        public void parsesBlankLine() {
            assertThat(parser.parse(""), is(false));
        }

    }

    //=*=*=*=*
    //./experimental/src/test/java/FileInputFormatTest.java
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.junit.*;

    public class FileInputFormatTest {
        private static final String BASE_PATH = "/Users/tom/workspace/htdg/input/fileinput";

        @Test(expected = IOException.class)
        @Ignore("See HADOOP-5588")
        public void directoryWithSubdirectory() throws Exception {
            JobConf conf = new JobConf();

            Path path = new Path(BASE_PATH, "dir");
            FileInputFormat.addInputPath(conf, path);

            conf.getInputFormat().getSplits(conf, 1);
        }

        @Test
        @Ignore("See HADOOP-5588")
        public void directoryWithSubdirectoryUsingGlob() throws Exception {
            JobConf conf = new JobConf();

            Path path = new Path(BASE_PATH, "dir/a*");
            FileInputFormat.addInputPath(conf, path);

            InputSplit[] splits = conf.getInputFormat().getSplits(conf, 1);
            assertThat(splits.length, is(1));
        }

        @Test
        public void inputPathProperty() throws Exception {
            JobConf conf = new JobConf();
            FileInputFormat.setInputPaths(conf, new Path("/{a,b}"), new Path("/{c,d}"));
            assertThat(conf.get("mapred.input.dir"), is("file:/{a\\,b},file:/{c\\,d}"));
        }

    }

    //=*=*=*=*
    //./experimental/src/test/java/SplitTest.java
    import static org.hamcrest.CoreMatchers.instanceOf;
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.IOException;
    import java.io.OutputStream;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.util.Arrays;
    import java.util.Random;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputFormat;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;
    import org.junit.*;

    /**
     * Create a file with a 4k block size and 3 blocks, where each line is 1023 bytes plus a 1-byte newline
     * and begins with its line number (01 to 12).
     * Then expect to get 3 splits (one per block) and 4 records per split:
     *   * each split corresponds exactly to one block
     * If lines are 1024 * 1.5 bytes long (including the newline), what do we get for each record?
     *   * Do we lose records?
     *   * If not, does the reader in general need to read the end of a record from another block?
     *
     * How does compression fit in?
     */
    /*
     * Line start offsets for 1024-byte records:
     *   line 0 -> byte 0
     *   line 1 -> byte 1024
     *   line 2 -> byte 2048
     *   ...
     */
    public class SplitTest {

        private static final Random r = new Random();

        private static final String[] lines1 = new String[120];
        static {
            for (int i = 0; i < lines1.length; i++) {
                char[] c = new char[1023];
                c[0] = Integer.toHexString(i % 16).charAt(0);
                for (int j = 1; j < c.length; j++) {
                    c[j] = (char) (r.nextInt(26) + (int) 'a');
                }
                lines1[i] = new String(c);
            }
        }

        private static final String[] lines2 = new String[12];
        static {
            for (int i = 0; i < lines2.length; i++) {
                char[] c = new char[1023 + 512];
                c[0] = Integer.toHexString(i % 16).charAt(0);
                for (int j = 1; j < c.length; j++) {
                    c[j] = (char) (r.nextInt(26) + (int) 'a');
                }
                lines2[i] = new String(c);
            }
        }

        private static MiniDFSCluster cluster; // use an in-process HDFS cluster for testing
        private static FileSystem fs;

        @BeforeClass
        public static void setUp() throws IOException {
            Configuration conf = new Configuration();
            if (System.getProperty("test.build.data") == null) {
                System.setProperty("test.build.data", "/tmp");
            }
            cluster = new MiniDFSCluster(conf, 1, true, null);
            fs = cluster.getFileSystem();
        }

        @AfterClass
        public static void tearDown() throws IOException {
            fs.close();
            cluster.shutdown();
        }

        @Test
        @Ignore("Needs more investigation")
        public void recordsCoincideWithBlocks() throws IOException {
            int recordLength = 1024;
            Path input = new Path("input");
            createFile(input, 12, recordLength);

            JobConf job = new JobConf();
            job.set("fs.default.name", fs.getUri().toString());
            FileInputFormat.addInputPath(job, input);
            InputFormat<LongWritable, Text> inputFormat = job.getInputFormat();
            InputSplit[] splits = inputFormat.getSplits(job, job.getNumMapTasks());

            assertThat(splits.length, is(3));
            checkSplit(splits[0], 0, 4096);
            checkSplit(splits[1], 4096, 4096);
            checkSplit(splits[2], 8192, 4096);

            checkRecordReader(inputFormat, splits[0], job, recordLength, 0, 4);
            checkRecordReader(inputFormat, splits[1], job, recordLength, 4, 8);
            checkRecordReader(inputFormat, splits[2], job, recordLength, 8, 12);
        }

        @Test
        public void recordsDontCoincideWithBlocks() throws IOException {
            int recordLength = 1024 + 512;
            Path input = new Path("input");
            createFile(input, 8, recordLength);

            JobConf job = new JobConf();
            job.set("fs.default.name", fs.getUri().toString());
            FileInputFormat.addInputPath(job, input);
            InputFormat<LongWritable, Text> inputFormat = job.getInputFormat();
            InputSplit[] splits = inputFormat.getSplits(job, job.getNumMapTasks());

            System.out.println(Arrays.asList(splits));
            checkSplit(splits[0], 0, 4096);
            checkSplit(splits[1], 4096, 4096);
            checkSplit(splits[2], 8192, 4096);

            checkRecordReader(inputFormat, splits[0], job, recordLength, 0, 3);
            checkRecordReader(inputFormat, splits[1], job, recordLength, 3, 6);
            checkRecordReader(inputFormat, splits[2], job, recordLength, 6, 8);

        }

        @Test
        @Ignore("Needs more investigation")
        public void compression() throws IOException {
            int recordLength = 1024;
            Path input = new Path("input.bz2");
            createFile(input, 24, recordLength);
            System.out.println(">>>>>>" + fs.getLength(input));

            JobConf job = new JobConf();
            job.set("fs.default.name", fs.getUri().toString());
            FileInputFormat.addInputPath(job, input);
            InputFormat<LongWritable, Text> inputFormat = job.getInputFormat();
            InputSplit[] splits = inputFormat.getSplits(job, job.getNumMapTasks());

            System.out.println(Arrays.asList(splits));
            assertThat(splits.length, is(2));
            checkSplit(splits[0], 0, 4096);
            checkSplit(splits[1], 4096, 4096);

            checkRecordReader(inputFormat, splits[0], job, recordLength, 0, 4);
            checkRecordReader(inputFormat, splits[1], job, recordLength, 5, 12);

        }

        private void checkSplit(InputSplit split, long start, long length) {
            assertThat(split, instanceOf(FileSplit.class));
            FileSplit fileSplit = (FileSplit) split;
            assertThat(fileSplit.getStart(), is(start));
            assertThat(fileSplit.getLength(), is(length));
        }

        private void checkRecord(int record, RecordReader<LongWritable, Text> recordReader, long expectedKey,
                String expectedValue) throws IOException {
            LongWritable key = new LongWritable();
            Text value = new Text();
            assertThat(recordReader.next(key, value), is(true));
            assertThat("Record " + record, value.toString(), is(expectedValue));
            assertThat("Record " + record, key.get(), is(expectedKey));
        }

        private void checkRecordReader(InputFormat<LongWritable, Text> inputFormat, InputSplit split, JobConf job,
                long recordLength, int startLine, int endLine) throws IOException {
            RecordReader<LongWritable, Text> recordReader = inputFormat.getRecordReader(split, job, Reporter.NULL);
            for (int i = startLine; i < endLine; i++) {
                checkRecord(i, recordReader, i * recordLength, line(i, recordLength));
            }
            assertThat(recordReader.next(new LongWritable(), new Text()), is(false));
        }

        private void createFile(Path input, int records, int recordLength) throws IOException {
            long fileSize = 4096;
            OutputStream out = fs.create(input, true, 4096, (short) 1, fileSize);
            CompressionCodecFactory codecFactory = new CompressionCodecFactory(new Configuration());
            CompressionCodec codec = codecFactory.getCodec(input);
            if (codec != null) {
                out = codec.createOutputStream(out);
            }
            Writer writer = new OutputStreamWriter(out);
            try {
                for (int n = 0; n < records; n++) {
                    writer.write(line(n, recordLength));
                    writer.write("\n");
                }
            } finally {
                writer.close();
            }
        }

        private String line(int i, long recordLength) {
            return recordLength == 1024 ? lines1[i] : lines2[i];
        }

    }

    //=*=*=*=*
    //./experimental/src/test/java/crunch/CogroupCrunchTest.java
    package crunch;
    import static com.cloudera.crunch.type.writable.Writables.strings;
    import static com.cloudera.crunch.type.writable.Writables.tableOf;

    import java.io.IOException;
    import java.io.Serializable;
    import java.util.Collection;
    import java.util.Iterator;

    import org.junit.Test;

    import com.cloudera.crunch.DoFn;
    import com.cloudera.crunch.Emitter;
    import com.cloudera.crunch.PCollection;
    import com.cloudera.crunch.PTable;
    import com.cloudera.crunch.Pair;
    import com.cloudera.crunch.Pipeline;
    import com.cloudera.crunch.impl.mr.MRPipeline;
    import com.cloudera.crunch.lib.Cogroup;
    import com.cloudera.crunch.lib.Join;
    import com.google.common.base.Splitter;

    public class CogroupCrunchTest implements Serializable {

        @Test
        public void test() throws IOException {
            Pipeline pipeline = new MRPipeline(CogroupCrunchTest.class);
            PCollection<String> a = pipeline.readTextFile("join/A");
            PCollection<String> b = pipeline.readTextFile("join/B");

            PTable<String, String> aTable = a.parallelDo(new DoFn<String, Pair<String, String>>() {
                @Override
                public void process(String input, Emitter<Pair<String, String>> emitter) {
                    Iterator<String> split = Splitter.on('\t').split(input).iterator();
                    emitter.emit(Pair.of(split.next(), split.next()));
                }
            }, tableOf(strings(), strings()));

            PTable<String, String> bTable = b.parallelDo(new DoFn<String, Pair<String, String>>() {
                @Override
                public void process(String input, Emitter<Pair<String, String>> emitter) {
                    Iterator<String> split = Splitter.on('\t').split(input).iterator();
                    String l = split.next();
                    String r = split.next();
                    emitter.emit(Pair.of(r, l));
                }
            }, tableOf(strings(), strings()));

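            // Cogroup.cogroup groups both tables by key: for each key it emits a pair of
            // collections, the first holding that key's values from aTable and the second
            // its values from bTable (matching the PTable type declared below).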
            PTable<String, Pair<Collection<String>, Collection<String>>> cogroup = Cogroup.cogroup(aTable, bTable);

            pipeline.writeTextFile(cogroup, "output-cogrouped");
            pipeline.run();
        }

    }

    //=*=*=*=*
    //./experimental/src/test/java/crunch/JoinCrunchTest.java
    package crunch;
    import static com.cloudera.crunch.type.writable.Writables.strings;
    import static com.cloudera.crunch.type.writable.Writables.tableOf;

    import java.io.IOException;
    import java.io.Serializable;
    import java.util.Iterator;

    import org.junit.Test;

    import com.cloudera.crunch.DoFn;
    import com.cloudera.crunch.Emitter;
    import com.cloudera.crunch.PCollection;
    import com.cloudera.crunch.PTable;
    import com.cloudera.crunch.Pair;
    import com.cloudera.crunch.Pipeline;
    import com.cloudera.crunch.impl.mr.MRPipeline;
    import com.cloudera.crunch.lib.Join;
    import com.google.common.base.Splitter;

    public class JoinCrunchTest implements Serializable {

        @Test
        public void test() throws IOException {
            Pipeline pipeline = new MRPipeline(JoinCrunchTest.class);
            PCollection<String> a = pipeline.readTextFile("join/A");
            PCollection<String> b = pipeline.readTextFile("join/B");

            PTable<String, String> aTable = a.parallelDo(new DoFn<String, Pair<String, String>>() {
                @Override
                public void process(String input, Emitter<Pair<String, String>> emitter) {
                    Iterator<String> split = Splitter.on('\t').split(input).iterator();
                    emitter.emit(Pair.of(split.next(), split.next()));
                }
            }, tableOf(strings(), strings()));

            PTable<String, String> bTable = b.parallelDo(new DoFn<String, Pair<String, String>>() {
                @Override
                public void process(String input, Emitter<Pair<String, String>> emitter) {
                    Iterator<String> split = Splitter.on('\t').split(input).iterator();
                    String l = split.next();
                    String r = split.next();
                    emitter.emit(Pair.of(r, l));
                }
            }, tableOf(strings(), strings()));

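            // Join.join performs an inner join on the key: it emits one
            // (key, (valueFromA, valueFromB)) pair for every matching combination of
            // values from the two tables.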
            PTable<String, Pair<String, String>> join = Join.join(aTable, bTable);

            pipeline.writeTextFile(join, "output-joined");
            pipeline.run();
        }

    }

    //=*=*=*=*
    //./experimental/src/test/java/crunch/MaxTemperatureCrunchTest.java
    package crunch;
    import static com.cloudera.crunch.type.writable.Writables.ints;
    import static com.cloudera.crunch.type.writable.Writables.strings;
    import static com.cloudera.crunch.type.writable.Writables.tableOf;

    import java.io.IOException;

    import org.junit.Test;

    import com.cloudera.crunch.CombineFn;
    import com.cloudera.crunch.DoFn;
    import com.cloudera.crunch.Emitter;
    import com.cloudera.crunch.PCollection;
    import com.cloudera.crunch.PTable;
    import com.cloudera.crunch.Pair;
    import com.cloudera.crunch.Pipeline;
    import com.cloudera.crunch.impl.mr.MRPipeline;

    public class MaxTemperatureCrunchTest {

        private static final int MISSING = 9999;

        @Test
        public void test() throws IOException {
            Pipeline pipeline = new MRPipeline(MaxTemperatureCrunchTest.class);
            PCollection<String> records = pipeline.readTextFile("input");

            PTable<String, Integer> maxTemps = records.parallelDo(toYearTempPairsFn(), tableOf(strings(), ints()))
                    .groupByKey().combineValues(CombineFn.<String>MAX_INTS());

            pipeline.writeTextFile(maxTemps, "output");
            pipeline.run();
        }

        private static DoFn<String, Pair<String, Integer>> toYearTempPairsFn() {
            return new DoFn<String, Pair<String, Integer>>() {
                @Override
                public void process(String input, Emitter<Pair<String, Integer>> emitter) {
                    String line = input;
                    String year = line.substring(15, 19);
                    int airTemperature;
                    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
                        airTemperature = Integer.parseInt(line.substring(88, 92));
                    } else {
                        airTemperature = Integer.parseInt(line.substring(87, 92));
                    }
                    String quality = line.substring(92, 93);
                    if (airTemperature != MISSING && quality.matches("[01459]")) {
                        emitter.emit(Pair.of(year, airTemperature));
                    }
                }
            };
        }

    }
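
    /*
     * Illustrative sketch: the same max-temperature pipeline as the test above, packaged
     * as a package-private driver with a main() method so it could be submitted from the
     * command line. The class name and the args[0] input / args[1] output paths are
     * hypothetical; only API calls already used in this listing (MRPipeline, readTextFile,
     * parallelDo, groupByKey, combineValues, writeTextFile, run) and the ToYearTempPairsFn
     * class defined later in this listing are reused.
     */
    class MaxTemperatureCrunchDriverSketch {

        public static void main(String[] args) throws IOException {
            Pipeline pipeline = new MRPipeline(MaxTemperatureCrunchDriverSketch.class);
            PCollection<String> records = pipeline.readTextFile(args[0]);

            // Parse (year, temperature) pairs and keep the maximum temperature per year.
            PTable<String, Integer> maxTemps = records
                    .parallelDo(new ToYearTempPairsFn(), tableOf(strings(), ints()))
                    .groupByKey()
                    .combineValues(CombineFn.<String>MAX_INTS());

            pipeline.writeTextFile(maxTemps, args[1]);
            pipeline.run();
        }
    }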

    //=*=*=*=*
    //./experimental/src/test/java/crunch/SortCrunchTest.java
    package crunch;
    import static com.cloudera.crunch.lib.Sort.ColumnOrder.by;
    import static com.cloudera.crunch.lib.Sort.Order.ASCENDING;
    import static com.cloudera.crunch.lib.Sort.Order.DESCENDING;
    import static com.cloudera.crunch.type.writable.Writables.ints;
    import static com.cloudera.crunch.type.writable.Writables.pairs;

    import java.io.IOException;
    import java.io.Serializable;
    import java.util.Iterator;

    import org.junit.Test;

    import com.cloudera.crunch.DoFn;
    import com.cloudera.crunch.Emitter;
    import com.cloudera.crunch.PCollection;
    import com.cloudera.crunch.Pair;
    import com.cloudera.crunch.Pipeline;
    import com.cloudera.crunch.impl.mr.MRPipeline;
    import com.cloudera.crunch.lib.Sort;
    import com.google.common.base.Splitter;

    public class SortCrunchTest implements Serializable {

        @Test
        public void test() throws IOException {
            Pipeline pipeline = new MRPipeline(SortCrunchTest.class);
            PCollection<String> records = pipeline.readTextFile("sort/A");

            PCollection<Pair<Integer, Integer>> pairs = records.parallelDo(new DoFn<String, Pair<Integer, Integer>>() {
                @Override
                public void process(String input, Emitter<Pair<Integer, Integer>> emitter) {
                    Iterator<String> split = Splitter.on('\t').split(input).iterator();
                    String l = split.next();
                    String r = split.next();
                    emitter.emit(Pair.of(Integer.parseInt(l), Integer.parseInt(r)));
                }
            }, pairs(ints(), ints()));

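            // Sort.sortPairs sorts the pairs by the given column orders: the indices are
            // 1-based, so this orders by the first field ascending and, for equal first
            // fields, by the second field descending.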
            PCollection<Pair<Integer, Integer>> sorted = Sort.sortPairs(pairs, by(1, ASCENDING), by(2, DESCENDING));

            pipeline.writeTextFile(sorted, "output-sorted");
            pipeline.run();
        }

    }

    //=*=*=*=*
    //./experimental/src/test/java/crunch/ToYearTempPairsFn.java
    package crunch;

    import com.cloudera.crunch.DoFn;
    import com.cloudera.crunch.Emitter;
    import com.cloudera.crunch.Pair;

    public class ToYearTempPairsFn extends DoFn<String, Pair<String, Integer>> {

        private static final int MISSING = 9999;

        @Override
        public void process(String input, Emitter<Pair<String, Integer>> emitter) {
            String line = input;
            String year = line.substring(15, 19);
            int airTemperature;
            if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
                airTemperature = Integer.parseInt(line.substring(88, 92));
            } else {
                airTemperature = Integer.parseInt(line.substring(87, 92));
            }
            String quality = line.substring(92, 93);
            if (airTemperature != MISSING && quality.matches("[01459]")) {
                emitter.emit(Pair.of(year, airTemperature));
            }
        }

    }

    //=*=*=*=*
    //./snippet/src/test/java/ExamplesIT.java
    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNotNull;
    import static org.junit.Assert.fail;
    import static org.junit.Assume.assumeTrue;

    import com.google.common.base.Splitter;
    import com.google.common.collect.Lists;
    import com.google.common.io.Files;
    import com.google.common.io.InputSupplier;

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import java.util.zip.GZIPInputStream;

    import junitx.framework.FileAssert;

    import org.apache.commons.exec.CommandLine;
    import org.apache.commons.exec.DefaultExecutor;
    import org.apache.commons.exec.ExecuteException;
    import org.apache.commons.exec.PumpStreamHandler;
    import org.apache.commons.exec.environment.EnvironmentUtils;
    import org.apache.commons.io.FileUtils;
    import org.apache.commons.io.filefilter.HiddenFileFilter;
    import org.apache.commons.io.filefilter.IOFileFilter;
    import org.apache.commons.io.filefilter.NotFileFilter;
    import org.apache.commons.io.filefilter.OrFileFilter;
    import org.apache.commons.io.filefilter.PrefixFileFilter;
    import org.junit.Before;
    import org.junit.BeforeClass;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.junit.runners.Parameterized;
    import org.junit.runners.Parameterized.Parameters;

    /**
     * This test runs the book's examples and checks that they produce the expected output.
     * For each example directory it runs the input.txt file as a script, then checks that
     * the output produced matches the files in that example's output directory.
     */
    @RunWith(Parameterized.class)
    public class ExamplesIT {

        private static final File PROJECT_BASE_DIR = new File(
                System.getProperty("hadoop.book.basedir", "/Users/tom/book-workspace/hadoop-book"));

        private static final String MODE_PROPERTY = "example.mode";
        private static final String MODE_DEFAULT = "local";

        private static final String EXAMPLE_CHAPTERS_PROPERTY = "example.chapters";
        private static final String EXAMPLE_CHAPTERS_DEFAULT = "ch02,ch04,ch04-avro,ch05,ch07,ch08";

        private static final IOFileFilter HIDDEN_FILE_FILTER = new OrFileFilter(HiddenFileFilter.HIDDEN,
                new PrefixFileFilter("_"));
        private static final IOFileFilter NOT_HIDDEN_FILE_FILTER = new NotFileFilter(HIDDEN_FILE_FILTER);

        @Parameters
        public static Collection<Object[]> data() {
            Collection<Object[]> data = new ArrayList<Object[]>();
            String exampleDirs = System.getProperty(EXAMPLE_CHAPTERS_PROPERTY, EXAMPLE_CHAPTERS_DEFAULT);
            int i = 0;
            for (String dirName : Splitter.on(',').split(exampleDirs)) {
                File dir = new File(new File(PROJECT_BASE_DIR, dirName), "src/main/examples");
                if (!dir.exists()) {
                    fail(dir + " does not exist");
                }
                for (File file : dir.listFiles()) {
                    if (file.isDirectory()) {
                        data.add(new Object[] { file });
                        // so we can see which test corresponds to which file
                        System.out.printf("%s: %s\n", i++, file);
                    }
                }
            }
            return data;
        }

        private File example; // parameter
        private File actualOutputDir = new File(PROJECT_BASE_DIR, "output");
        private static Map<String, String> env;
        private static String version;
        private static String mode;

        public ExamplesIT(File example) {
            this.example = example;
        }

        @SuppressWarnings("unchecked")
        @BeforeClass
        public static void setUpClass() throws IOException {
            mode = System.getProperty(MODE_PROPERTY, MODE_DEFAULT);
            System.out.printf("mode=%s\n", mode);

            String hadoopHome = System.getenv("HADOOP_HOME");
            assertNotNull("Export the HADOOP_HOME environment variable " + "to run the snippet tests", hadoopHome);
            env = new HashMap<String, String>(EnvironmentUtils.getProcEnvironment());
            env.put("HADOOP_HOME", hadoopHome);
            env.put("PATH", env.get("HADOOP_HOME") + "/bin" + ":" + env.get("PATH"));
            env.put("HADOOP_CONF_DIR", "snippet/conf/" + mode);
            env.put("HADOOP_USER_CLASSPATH_FIRST", "true");
            env.put("HADOOP_CLASSPATH", "hadoop-examples.jar:avro-examples.jar");

            System.out.printf("HADOOP_HOME=%s\n", hadoopHome);

            String versionOut = execute(hadoopHome + "/bin/hadoop version");
            for (String line : Splitter.on("\n").split(versionOut)) {
                Matcher matcher = Pattern.compile("^Hadoop (.+)$").matcher(line);
                if (matcher.matches()) {
                    version = matcher.group(1);
                }
            }
            assertNotNull("Version not found", version);
            System.out.printf("version=%s\n", version);

        }

        @Before
        public void setUp() throws IOException {
            assumeTrue(!example.getPath().endsWith(".ignore"));
            execute(new File("src/test/resources/setup.sh").getAbsolutePath());
        }

        @Test
        public void test() throws Exception {
            System.out.println("Running " + example);

            File exampleDir = findBaseExampleDirectory(example);
            File inputFile = new File(exampleDir, "input.txt");
            System.out.println("Running input " + inputFile);

            String systemOut = execute(inputFile.getAbsolutePath());
            System.out.println(systemOut);

            execute(new File("src/test/resources/copyoutput.sh").getAbsolutePath());

            File expectedOutputDir = new File(exampleDir, "output");
            if (!expectedOutputDir.exists()) {
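                // First run for this example: seed the expected output directory from the
                // actual output, then fail so the newly created files can be reviewed.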
                FileUtils.copyDirectory(actualOutputDir, expectedOutputDir);
                fail(expectedOutputDir + " does not exist - creating.");
            }

            List<File> expectedParts = Lists.newArrayList(
                    FileUtils.listFiles(expectedOutputDir, NOT_HIDDEN_FILE_FILTER, NOT_HIDDEN_FILE_FILTER));
            List<File> actualParts = Lists
                    .newArrayList(FileUtils.listFiles(actualOutputDir, NOT_HIDDEN_FILE_FILTER, NOT_HIDDEN_FILE_FILTER));
            assertEquals("Number of parts (got " + actualParts + ")", expectedParts.size(), actualParts.size());

            for (int i = 0; i < expectedParts.size(); i++) {
                File expectedFile = expectedParts.get(i);
                File actualFile = actualParts.get(i);
                if (expectedFile.getPath().endsWith(".gz")) {
                    File expectedDecompressed = decompress(expectedFile);
                    File actualDecompressed = decompress(actualFile);
                    FileAssert.assertEquals(expectedFile.toString(), expectedDecompressed, actualDecompressed);
                } else if (expectedFile.getPath().endsWith(".avro")) {
                    // Avro files have a random sync marker
                    // so just check lengths for the moment
                    assertEquals("Avro file length", expectedFile.length(), actualFile.length());
                } else {
                    FileAssert.assertEquals(expectedFile.toString(), expectedFile, actualFile);
                }
            }
            System.out.println("Completed " + example);
        }

        private File findBaseExampleDirectory(File example) {
            // Look in base/<version>/<mode> then base/<version> then base/<mode>
            File[] candidates = { new File(new File(example, version), mode), new File(example, version),
                    new File(example, mode), };
            for (File candidate : candidates) {
                if (candidate.exists()) {
                    File inputFile = new File(candidate, "input.txt");
                    // if no input file then skip test
                    assumeTrue(inputFile.exists());
                    return candidate;
                }
            }
            return example;
        }

        private static String execute(String commandLine) throws ExecuteException, IOException {
            ByteArrayOutputStream stdout = new ByteArrayOutputStream();
            PumpStreamHandler psh = new PumpStreamHandler(stdout);
            CommandLine cl = CommandLine.parse("/bin/bash " + commandLine);
            DefaultExecutor exec = new DefaultExecutor();
            exec.setWorkingDirectory(PROJECT_BASE_DIR);
            exec.setStreamHandler(psh);
            try {
                exec.execute(cl, env);
            } catch (ExecuteException e) {
                System.out.println(stdout.toString());
                throw e;
            } catch (IOException e) {
                System.out.println(stdout.toString());
                throw e;
            }
            return stdout.toString();
        }

        private File decompress(File file) throws IOException {
            File decompressed = File.createTempFile(getClass().getSimpleName(), ".txt");
            decompressed.deleteOnExit();
            final GZIPInputStream in = new GZIPInputStream(new FileInputStream(file));
            try {
                Files.copy(new InputSupplier<InputStream>() {
                    public InputStream getInput() throws IOException {
                        return in;
                    }
                }, decompressed);
            } finally {
                in.close();
            }
            return decompressed;
        }

    }
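
    /*
     * Usage note (summarizing the configuration read above): the suite expects the
     * HADOOP_HOME environment variable to be exported, and is parameterized by the
     * hadoop.book.basedir, example.mode (default "local"), and example.chapters
     * (default "ch02,ch04,ch04-avro,ch05,ch07,ch08") system properties.
     */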

    //=*=*=*=*