
File Formats:
  Input Formats
    TextInputFormat- Default input format; each line of the file is a value, and Hadoop generates the key
                     (the byte offset of the line), which we are usually not interested in
    KeyValueTextInputFormat- File of keys & values, one pair per line; the default separator is "\t" (TAB)
                           - We can change the separator by adding the conf below:
                              conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");
    SequenceFileInputFormat- Reads sequence files, a binary key/value format that supports compression, so the
                             same information can be stored using less disk space
                           - It is useful when the output of one job is the input to another job, since it requires
                             less disk writing and reading, which speeds up the job
    NLineInputFormat- We can specify the size of a split in lines using NLineInputFormat, i.e. each split contains this number of lines
                    - We set the size of the split using the conf below:
                        job.getConfiguration().setInt("mapreduce.input.lineinputformat.linespermap", 1000);
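
The record-shaping behaviour of KeyValueTextInputFormat and NLineInputFormat described above can be sketched in plain Java, without any Hadoop dependency. This is only an illustration of the idea, not Hadoop's own code: the class name InputFormatSketch and both method names are hypothetical, and the sketch assumes a single-character separator and that a line without a separator becomes a key with an empty value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; it does not use or extend any Hadoop API.
final class InputFormatSketch {

    // KeyValueTextInputFormat idea: split a line at the FIRST separator only.
    // Assumption: if the separator is missing, the whole line is the key and the value is empty.
    static String[] splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos == -1) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    // NLineInputFormat idea: every split holds linesPerSplit lines; the last one may hold fewer.
    static List<List<String>> groupLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        String[] kv = splitKeyValue("user42,click,home", ',');
        System.out.println(kv[0] + " -> " + kv[1]);   // prints: user42 -> click,home

        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(groupLines(lines, 2));      // prints: [[a, b], [c, d], [e]]
    }
}
```

Note that only the first separator counts, so any later separators stay inside the value.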

  Output Formats
    TextOutputFormat- Default output format of Hadoop, which writes one key/value pair per line, separated by TAB
    SequenceFileOutputFormat- Output counterpart of SequenceFileInputFormat; writes sequence files

By default, the size of each split equals the HDFS block size, which is 128 MB by default in Hadoop 2.x and later
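
To make the split-size rule concrete, here is a minimal plain-Java sketch of how the number of splits for one splittable file could be worked out. It follows my understanding of the logic in FileInputFormat (split size = max(minSize, min(maxSize, blockSize)), with the last split allowed to be up to 10% larger), but the class name SplitSizeSketch and its methods are hypothetical, and the real minimum and maximum come from mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize.

```java
// Hypothetical illustration class; it does not use or extend any Hadoop API.
final class SplitSizeSketch {

    // Slack factor: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // With default min/max settings this simply returns the block size.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts the splits produced for a single splittable file of fileSize bytes.
    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail, possibly slightly larger than splitSize
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1L, Long.MAX_VALUE);
        System.out.println(countSplits(300 * mb, splitSize)); // prints: 3
        System.out.println(countSplits(130 * mb, splitSize)); // prints: 1
    }
}
```

Under these assumptions a 130 MB file stays in one split, because the 2 MB tail falls within the 10% slack.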

To override a parameter at runtime (the driver must run through ToolRunner so that -D options are parsed):
  hadoop jar FileName.jar ClassName \
    -D mapreduce.input.keyvaluelinerecordreader.key.value.separator=% \
    input output
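
As a rough picture of what happens to the command line above, the sketch below separates -D key=value pairs from the remaining arguments in plain Java. It is a simplified stand-in and not Hadoop's actual option parser: the class name GenericOptionSketch is hypothetical, and only the "-D key=value" form (with a space after -D) is handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it does not use or extend any Hadoop API.
final class GenericOptionSketch {

    final Map<String, String> properties = new LinkedHashMap<>();
    final List<String> remainingArgs = new ArrayList<>();

    GenericOptionSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                // The argument after -D is expected to look like key=value.
                String pair = args[++i];
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    properties.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
                // Simplification: a malformed pair (no '=' or empty key) is silently dropped.
            } else {
                // Everything else is passed on to the job, e.g. input and output paths.
                remainingArgs.add(args[i]);
            }
        }
    }

    public static void main(String[] args) {
        String[] cli = {
            "-D", "mapreduce.input.keyvaluelinerecordreader.key.value.separator=%",
            "input", "output"
        };
        GenericOptionSketch parsed = new GenericOptionSketch(cli);
        System.out.println(parsed.properties);
        System.out.println(parsed.remainingArgs); // prints: [input, output]
    }
}
```

The point of the split is that the overridden properties end up in the job configuration, while the driver only sees the leftover arguments.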

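  Returning from the driver's run() method:
  // in production: submit the job and return without waiting for it to finish
  job.submit();
  return 0;

  // in standalone mode: block until the job finishes, printing progress
  return job.waitForCompletion(true) ? 0 : 1;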