Advertisement
Guest User

Untitled

a guest
Feb 23rd, 2019
186
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 4.36 KB | None | 0 0
  1. beeline -u jdbc:hive2://node1:10000 --showDbInPrompt
  2.  
  3. Create a database with your name:
  4.  
  5. jdbc:hive2://node1:10000 (default)> CREATE DATABASE your_name;
  6. jdbc:hive2://node1:10000 (default)> USER your_name;
  7. jdbc:hive2://node1:10000 (your_name)>
  8.  
  9. SELECT current_database()
  10.  
  11. CSV file encoding
  12.  
  13. The file we put on HDFS is in CSV format. Create an external table pointing to the directory on HDFS to run SQL queries.
  14.  
  15. CREATE EXTERNAL TABLE your_name.ext_stop_times (
  16. trip_id STRING,
  17. arrival_time STRING,
  18. departure_time STRING,
  19. stop_id BIGINT,
  20. stop_sequence SMALLINT
  21. )
  22. ROW FORMAT DELIMITED
  23. FIELDS TERMINATED BY ','
  24. STORED AS TEXTFILE
  25. LOCATION '/user/your_name/stm_gtfs/stop_times';
  26.  
  27. 0: jdbc:hive2://node1:10000 (folakemi)> SHOW CREATE TABLE folakemi.ext_stop_times;
  28. +----------------------------------------------------+
  29. | createtab_stmt |
  30. +----------------------------------------------------+
  31. | CREATE EXTERNAL TABLE `folakemi.ext_stop_times`( |
  32. | `trip_id` string, |
  33. | `arrival_time` string, |
  34. | `departure_time` string, |
  35. | `stop_id` bigint, |
  36. | `stop_sequence` smallint) |
  37. | ROW FORMAT SERDE |
  38. | 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
  39. | WITH SERDEPROPERTIES ( |
  40. | 'field.delim'=',', |
  41. | 'serialization.format'=',') |
  42. | STORED AS INPUTFORMAT |
  43. | 'org.apache.hadoop.mapred.TextInputFormat' |
  44. | OUTPUTFORMAT |
  45. | 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
  46. | LOCATION |
  47. | 'hdfs://node1:9000/user/folakemi/stm_gtfs/stop_times' |
  48. | TBLPROPERTIES ( |
  49. | 'transient_lastDdlTime'='1550800563') |
  50. +----------------------------------------------------+
  51. 19 rows selected (0.178 seconds)
  52.  
  53. SELECT * FROM ext_stop_times LIMIT 10
  54.  
  55. Create another table (managed) by importing data from external file
  56.  
  57. CREATE TABLE your_name.stop_times_txt
  58. ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE
  59. AS SELECT * FROM your_name.ext_stop_times;
  60.  
  61. Check the files:
  62.  
  63. hadoop fs -ls -h /user/hive/warehouse/your_name.db/stop_times_txt
  64. -rwxrwxr-x 1 root supergroup 248.1 M 2018-09-01 23:53 /user/hive/warehouse/your_name.db/stop_times_txt/000000_0
  65.  
  66. Question: Why only one file?
  67.  
  68. hadoop fs -tail /user/hive/warehouse/stm_gtfs.db/stop_times_txt/000000_0;
  69. 5:43:57,05:43:57,46,10
  70. 18S_18S_F2_2_0,05:45:13,05:45:13,36,11
  71. 18S_18S_F2_2_0,05:46:44,05:46:44,17,12
  72. ...
  73.  
  74. Output compression
  75.  
  76. Enable GZip encoding in Hive:
  77.  
  78. jdbc:hive2://node1:10000 (default)> set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.GZipCodec;
  79. jdbc:hive2://node1:10000 (default)> set hive.exec.compress.output=true;
  80.  
  81. CREATE TABLE stop_times_txt_gz
  82. ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  83. AS SELECT * FROM stop_times_txt;
  84.  
  85. hadoop dfs -ls -h /user/hive/warehouse/your_name.db/stop_times_txt_gz;
  86. -rwxrwxr-x 1 root supergroup 41.3 M 2018-09-02 00:43 /user/hive/warehouse/your_name.db/stop_times_txt_gz/000000_0.deflate
  87.  
  88. Sequence file
  89.  
  90. CREATE TABLE stop_times_seq
  91. ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE
  92. AS SELECT * FROM stop_times_txt;
  93.  
  94. hadoop fs -ls -h /user/hive/warehouse/stm_gtfs.db/stop_times_seq;
  95. -rwxrwxr-x 1 root supergroup 311.2 M 2018-09-02 00:02 /user/hive/warehouse/stm_gtfs.db/stop_times_seq/000000_0
  96. hive (stm_gtfs)> dfs -tail /user/hive/warehouse/stm_gtfs.db/stop_times_seq/000000_0; :10,13,16+�&18S_18S_F2_2_0,05:52:13,05:52:13,12,17+�&18S_18S_F2_2_0,05:53:44,05:53:44,11,18+�&18S_18S_F2_2_0,...
  97. Sequence file compressed
  98.  
  99. CREATE TABLE stop_times_seq_gz ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' AS SELECT * FROM stop_times_txt;
  100.  
  101. hive (stm_gtfs)> dfs -ls -h /user/hive/warehouse/stm_gtfs.db/stop_times_seq_gz; -rwxrwxr-x 1 root supergroup 41.3 M 2018-09-02 01:36 /user/hive/warehouse/stm_gtfs.db/stop_times_seq_gz/000000_0.deflate
  102.  
  103.  
  104. CREATE EXTERNAL TABLE IF NOT EXSISTS weatherext ( wban INT, date STRING)
  105. PARTITIONED BY (year INT, month STRING)
  106. ROW FORMAT DELIMITED
  107. FIELDS TERMINATED BY ‘,’
  108. LOCATION ‘ /hive/data/weatherext’;
  109.  
  110.  
  111.  
  112. Load the data in table
  113.  
  114. LOAD DATA INPATH ‘hdfs:/data/2012.txt’ INTO TABLE weatherext;
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement