Untitled

a guest
Mar 22nd, 2018
File Formats:

Input Formats
TextInputFormat - Default input format. The file holds values only, one per line; Hadoop generates the keys (the byte offset of each line), which we are usually not interested in.
KeyValueTextInputFormat - File of keys and values, one pair per line; the default separator is "\t" (TAB).
- We can change the separator by adding the conf below:
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");
SequenceFileInputFormat - Binary key/value format with optional compression, which can store information using less disk space.
- It is useful when the output of one job is the input to another job, since it requires less
disk writing and reading, which speeds up the job.
NLineInputFormat - We can specify the size of a split in lines: each split (and so each mapper) receives this number of lines.
- We set the number of lines per split with the conf below:
job.getConfiguration().setInt("mapreduce.input.lineinputformat.linespermap", 1000);
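
The two configurable input formats above can be illustrated with a small plain-Java sketch. It does not use Hadoop at all: the class InputFormatSketch and both method names are hypothetical, and the code only mimics the behaviour described in these notes (split each line at the first separator; group lines into splits of N lines).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java illustration only; none of these names are Hadoop APIs.
class InputFormatSketch {

    // Mimics KeyValueTextInputFormat: split the line at the FIRST separator.
    // Assumption: a line without a separator becomes the key, with an empty value.
    static Map.Entry<String, String> splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    // Mimics NLineInputFormat: number of splits when each split holds
    // at most linesPerSplit lines (the last split may be shorter).
    static long countLineSplits(long totalLines, int linesPerSplit) {
        return (totalLines + linesPerSplit - 1) / linesPerSplit; // ceiling division
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = splitKeyValue("user42,click,home", ',');
        System.out.println(kv.getKey() + " -> " + kv.getValue()); // user42 -> click,home
        System.out.println(countLineSplits(2500, 1000));          // 3
    }
}
```

Note that only the first separator counts: everything after it, including further commas, stays in the value.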

Output Formats
TextOutputFormat - Default output format of Hadoop; writes one key/value pair per line, separated by TAB.
SequenceFileOutputFormat - Output counterpart of SequenceFileInputFormat; writes binary sequence files that a following job can read back directly.

By default the size of each split is 128 MB (the default HDFS block size in Hadoop 2.x and later).
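
To get a feel for what a 128 MB split size means, the sketch below estimates how many splits (and therefore map tasks) a single file produces. It is plain Java, not Hadoop code; SplitCountSketch and countSplits are hypothetical names. It assumes a splittable, uncompressed, non-empty file and a FileInputFormat-style 10% slack, under which a final remainder of up to 10% of the split size is folded into the last split instead of becoming its own split.

```java
// Plain-Java illustration only; none of these names are Hadoop APIs.
class SplitCountSketch {
    static final long MB = 1024L * 1024L;
    // Assumed slack factor: the tail is merged into the last split
    // when it is at most 10% larger than one split.
    static final double SPLIT_SLOP = 1.1;

    // Estimates the number of splits for one file (empty files are not modelled).
    static long countSplits(long fileBytes, long splitBytes) {
        long splits = 0;
        long remaining = fileBytes;
        while ((double) remaining / splitBytes > SPLIT_SLOP) {
            splits++;
            remaining -= splitBytes;
        }
        if (remaining > 0) {
            splits++; // last (possibly short or slightly oversized) split
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(countSplits(1024 * MB, 128 * MB)); // 8
        System.out.println(countSplits(300 * MB, 128 * MB));  // 3
        System.out.println(countSplits(130 * MB, 128 * MB));  // 1
    }
}
```

The last case shows the slack at work: a 130 MB file is only slightly larger than one split, so under this assumption it is processed as a single split.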

To override a parameter at runtime (the driver must parse generic options, e.g. by running through ToolRunner):
hadoop jar FileName.jar ClassName -D mapreduce.input.keyvaluelinerecordreader.key.value.separator=% input output
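
What happens to the -D option can be pictured with a simplified stand-in for Hadoop's generic option parsing. The sketch is plain Java and deliberately minimal: GenericOptionsSketch and its fields are hypothetical names, and it handles only the "-D key=value" form used in the command above. The idea is that the -D pairs end up in the job configuration, while the remaining arguments (input and output) are passed on to the driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for generic option parsing; not the Hadoop implementation.
class GenericOptionsSketch {
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<String> remainingArgs = new ArrayList<>();

    static GenericOptionsSketch parse(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                String pair = args[++i];      // consume the key=value token
                int eq = pair.indexOf('=');
                if (eq > 0) {                 // malformed pairs are silently skipped here
                    parsed.properties.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            } else {
                parsed.remainingArgs.add(args[i]); // left for the driver itself
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        String[] cli = {
            "-D", "mapreduce.input.keyvaluelinerecordreader.key.value.separator=%",
            "input", "output"
        };
        GenericOptionsSketch parsed = parse(cli);
        System.out.println(parsed.properties);    // {mapreduce.input.keyvaluelinerecordreader.key.value.separator=%}
        System.out.println(parsed.remainingArgs); // [input, output]
    }
}
```

A driver that reads args[0] and args[1] directly, without any such parsing step, would see "-D" as its input path, which is why the override only works when generic options are parsed first.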
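
// In production, add the lines below as well (submit() returns without waiting for the job to finish)
job.submit();
return 0;

// In standalone mode (waitForCompletion(true) blocks, prints progress, and returns true on success)
return job.waitForCompletion(true) ? 0 : 1;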
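
The practical difference between the two endings can be shown without Hadoop. In the sketch below a CompletableFuture stands in for a running job; JobEndingSketch and its methods are hypothetical names, not Hadoop APIs. It only illustrates that a submit-and-return driver reports exit code 0 regardless of the job's outcome, while a blocking driver can map success or failure to the exit code.

```java
import java.util.concurrent.CompletableFuture;

// Analogy only: a CompletableFuture plays the role of a running job.
class JobEndingSketch {

    // Stand-in for a job that completes with true on success, false on failure.
    static CompletableFuture<Boolean> startJob(boolean succeeds) {
        return CompletableFuture.supplyAsync(() -> succeeds);
    }

    // "Production" style: start the job and return at once.
    // The exit code says nothing about whether the job later succeeds.
    static int submitAndReturn(boolean succeeds) {
        startJob(succeeds); // result is intentionally not awaited
        return 0;
    }

    // "Standalone" style: block until the job finishes, then map the result.
    static int waitAndReturn(boolean succeeds) {
        return startJob(succeeds).join() ? 0 : 1;
    }

    public static void main(String[] args) {
        System.out.println(submitAndReturn(false)); // 0
        System.out.println(waitAndReturn(true));    // 0
        System.out.println(waitAndReturn(false));   // 1
    }
}
```

With the first style, the job's final status has to be checked some other way, for example by whatever scheduler launched the driver.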