Untitled

a guest
Dec 11th, 2019
[cloudera@quickstart Desktop]$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc (master = yarn-client, app id = application_1576096912396_0008).
19/12/11 17:02:41 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0-cdh5.13.0
19/12/11 17:02:41 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.

scala> sqlContext.sql("CREATE TABLE sample_07 (code string,description string,total_emp int,salary int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TextFile")
res0: org.apache.spark.sql.DataFrame = [result: string]

scala> sqlContext.sql("CREATE TABLE sample_08 (code string,description string,total_emp int,salary int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TextFile")
res1: org.apache.spark.sql.DataFrame = [result: string]
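Both tables are declared as Hive text tables with tab-separated fields (`FIELDS TERMINATED BY '\t'`). Each line of a loaded file therefore becomes one row, split on tabs, and the last two fields are read as `int`. The query results further down show all four columns populated, which suggests the `.csv` files are in fact tab-separated.

The sketch below is a plain-Scala model of that line-to-row mapping, not Spark or Hive code. `Occupation` and `TabRows.parseLine` are hypothetical names. Treating a missing field or an unparseable number as `None` is an assumption, meant to mirror how Hive's text format yields NULL in those cases.

```scala
import scala.util.Try

// Hypothetical row type for sample_07 / sample_08; None stands in for a Hive NULL.
case class Occupation(code: Option[String], description: Option[String],
                      totalEmp: Option[Int], salary: Option[Int])

object TabRows {
  // Model of one line of a tab-delimited text table becoming one row.
  def parseLine(line: String): Occupation = {
    // limit -1 keeps empty trailing fields instead of dropping them
    val fields = line.split("\t", -1)
    // A field missing from the end of the line becomes None.
    def field(i: Int): Option[String] = if (i < fields.length) Some(fields(i)) else None
    // A field that does not parse as an int also becomes None.
    def intField(i: Int): Option[Int] = field(i).flatMap(s => Try(s.toInt).toOption)
    Occupation(field(0), field(1), intField(2), intField(3))
  }
}
```

For example, `TabRows.parseLine("11-1011\tChief executives\t299160\t151370")` yields all four fields as `Some` values.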

scala> sqlContext.sql("LOAD DATA INPATH '/user/cloudera/mydata/sample_07.csv' OVERWRITE INTO TABLE sample_07")
chgrp: changing ownership of 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/sample_07/sample_07.csv': User does not belong to supergroup
res2: org.apache.spark.sql.DataFrame = [result: string]

scala> sqlContext.sql("LOAD DATA INPATH '/user/cloudera/mydata/sample_08.csv' OVERWRITE INTO TABLE sample_08")
chgrp: changing ownership of 'hdfs://quickstart.cloudera:8020/user/hive/warehouse/sample_08/sample_08.csv': User does not belong to supergroup
res3: org.apache.spark.sql.DataFrame = [result: string]
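Without `LOCAL`, `LOAD DATA INPATH` moves the HDFS file into the table's warehouse directory rather than copying it. `OVERWRITE` first removes whatever files the table already held.

The `chgrp` line appears to be Hive trying to give the moved file the warehouse directory's group (`supergroup`), of which the `cloudera` user is not a member. The statement still returns (`res2`, `res3`), and the later queries do read the data.

The sketch below models only the move and overwrite semantics. It treats the file system as a hypothetical set of paths and is not the Hive implementation.

```scala
// Hypothetical model: the file system is just a set of absolute file paths.
object LoadDataModel {
  // LOAD DATA INPATH '<src>' [OVERWRITE] INTO TABLE t, where t's directory is tableDir.
  def loadData(fs: Set[String], src: String, tableDir: String, overwrite: Boolean): Set[String] = {
    require(fs.contains(src), s"no such file: $src")
    val fileName = src.substring(src.lastIndexOf('/') + 1)
    // OVERWRITE: delete every file currently under the table directory first.
    val cleared = if (overwrite) fs.filterNot(_.startsWith(tableDir + "/")) else fs
    // INPATH without LOCAL moves the file: it leaves src and lands in tableDir.
    // (Name clashes when appending without OVERWRITE are not modelled.)
    (cleared - src) + s"$tableDir/$fileName"
  }
}
```

The target path this produces for `sample_07.csv` matches the one named in the `chgrp` message.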

scala> val df_07 = sqlContext.sql("SELECT * from sample_07")
df_07: org.apache.spark.sql.DataFrame = [code: string, description: string, total_emp: int, salary: int]

scala> val df_08 = sqlContext.sql("SELECT * from sample_08")
df_08: org.apache.spark.sql.DataFrame = [code: string, description: string, total_emp: int, salary: int]

scala> df_07.filter(df_07("salary") > 150000).show()
+-------+--------------------+---------+------+
|   code|         description|total_emp|salary|
+-------+--------------------+---------+------+
|11-1011|    Chief executives|   299160|151370|
|29-1022|Oral and maxillof...|     5040|178440|
|29-1023|       Orthodontists|     5350|185340|
|29-1024|     Prosthodontists|      380|169360|
|29-1061|   Anesthesiologists|    31030|192780|
|29-1062|Family and genera...|   113250|153640|
|29-1063| Internists, general|    46260|167270|
|29-1064|Obstetricians and...|    21340|183600|
|29-1067|            Surgeons|    50260|191410|
|29-1069|Physicians and su...|   237400|155150|
+-------+--------------------+---------+------+

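The filter keeps rows whose salary is strictly greater than 150000. The same result could be written in SQL as `sqlContext.sql("SELECT * FROM sample_07 WHERE salary > 150000")`. A row with a NULL salary is dropped, because a comparison with NULL is never true.

The plain-Scala model below mirrors that predicate over a hypothetical `Job` row type, with `Option[Int]` standing in for a nullable `int` column.

```scala
// Hypothetical row type; Option[Int] stands in for a nullable int column.
case class Job(code: String, description: String, totalEmp: Option[Int], salary: Option[Int])

object SalaryFilter {
  // Model of df_07.filter(df_07("salary") > 150000): the comparison is strict,
  // and a NULL salary never satisfies it, so such rows are dropped.
  def highEarners(rows: Seq[Job], threshold: Int = 150000): Seq[Job] =
    rows.filter(_.salary.exists(_ > threshold))
}
```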

scala> df_07.printShema()
<console>:28: error: value printShema is not a member of org.apache.spark.sql.DataFrame
              df_07.printShema()
                    ^

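The error is a typo: the `DataFrame` method is `printSchema()`, so `df_07.printSchema()` should print the four columns as a tree under `root`.

The sketch below is a plain-Scala model of that layout, not Spark's own code. It assumes every column is nullable (the usual case for Hive text tables). It also assumes integer columns are shown as `integer`, which is how Spark names `IntegerType` in this output.

```scala
object SchemaTree {
  // Each column: (name, type name as printSchema shows it, nullable).
  def render(columns: Seq[(String, String, Boolean)]): String = {
    val lines = columns.map { case (name, tpe, nullable) =>
      s" |-- $name: $tpe (nullable = $nullable)"
    }
    ("root" +: lines).mkString("\n")
  }

  // The df_07 columns, assuming all are nullable.
  val df07Columns: Seq[(String, String, Boolean)] = Seq(
    ("code", "string", true),
    ("description", "string", true),
    ("total_emp", "integer", true),
    ("salary", "integer", true))
}
```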
scala> val df_09 = df_07.join(df_08, df_07("code") === df_08("code")).select(df_07.col("code"), df_07.col("description"))
df_09: org.apache.spark.sql.DataFrame = [code: string, description: string]
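`df_07("code") === df_08("code")` makes this an inner equi-join. Each pair of rows with equal codes produces one output row, and `select` then keeps only the left-hand columns. Codes present in only one table disappear, and a code that appeared twice in `sample_08` would duplicate the matching `sample_07` row.

The plain-Scala model below illustrates those semantics with a hypothetical `Occ` row type, in hash-join style (it groups the right side by key). Unlike Spark, it preserves the left input's order; a distributed join makes no ordering promise.

```scala
// Hypothetical row type holding just the joined and selected columns.
case class Occ(code: String, description: String)

object InnerJoin {
  // Model of df_07.join(df_08, df_07("code") === df_08("code"))
  //   .select(df_07.col("code"), df_07.col("description")):
  // every (left, right) pair with equal codes yields one row built from the left side.
  def joinOnCode(left: Seq[Occ], right: Seq[Occ]): Seq[(String, String)] = {
    val rightByCode: Map[String, Seq[Occ]] = right.groupBy(_.code)
    for {
      l <- left
      _ <- rightByCode.getOrElse(l.code, Seq.empty) // one output row per match
    } yield (l.code, l.description)
  }
}
```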

scala> df_09.show()
+-------+--------------------+
|   code|         description|
+-------+--------------------+
|00-0000|     All Occupations|
|11-0000|Management occupa...|
|11-1011|    Chief executives|
|11-1021|General and opera...|
|11-1031|         Legislators|
|11-2011|Advertising and p...|
|11-2021|  Marketing managers|
|11-2022|      Sales managers|
|11-2031|Public relations ...|
|11-3011|Administrative se...|
|11-3021|Computer and info...|
|11-3031|  Financial managers|
|11-3041|Compensation and ...|
|11-3042|Training and deve...|
|11-3049|Human resources m...|
|11-3051|Industrial produc...|
|11-3061| Purchasing managers|
|11-3071|Transportation, s...|
|11-9011|Farm, ranch, and ...|
|11-9012|Farmers and ranchers|
+-------+--------------------+
only showing top 20 rows

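`show()` prints at most 20 rows. It cuts any cell longer than 20 characters down to its first 17 characters plus `...`, which is why `Management occupations` appears as `Management occupa...`. Cells are right-aligned to the widest value in each column. In Spark itself, `df_09.show(50, false)` would print 50 rows without truncation.

The model below reproduces that layout in plain Scala. `ShowModel` is a hypothetical name, and the minimum column width of 3 is an assumption.

```scala
object ShowModel {
  // Cells longer than 20 characters keep their first 17 characters plus "...".
  def truncate(cell: String): String =
    if (cell.length > 20) cell.substring(0, 17) + "..." else cell

  // Model of df.show(numRows): right-aligned cells, each column as wide as its
  // widest (truncated) cell or header, with an assumed minimum width of 3.
  def render(header: Seq[String], rows: Seq[Seq[String]], numRows: Int = 20): String = {
    val shown = rows.take(numRows).map(_.map(truncate))
    val all = header +: shown
    val widths = header.indices.map(i => math.max(3, all.map(_(i).length).max))
    val sep = widths.map(w => "-" * w).mkString("+", "+", "+")
    def line(cells: Seq[String]): String =
      cells.zip(widths).map { case (c, w) => " " * (w - c.length) + c }.mkString("|", "|", "|")
    val table = (Seq(sep, line(header), sep) ++ shown.map(line)) :+ sep
    val footer =
      if (rows.length > numRows) Seq(s"only showing top $numRows rows") else Seq.empty[String]
    (table ++ footer).mkString("\n")
  }
}
```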

scala> val df_10 = sqlContext.sql("select s7.code, s7.description from sample_07 s7 join sample_08 s8 on s7.code == s8.code")
df_10: org.apache.spark.sql.DataFrame = [code: string, description: string]

scala> df_10.show()
+-------+--------------------+
|   code|         description|
+-------+--------------------+
|00-0000|     All Occupations|
|11-0000|Management occupa...|
|11-1011|    Chief executives|
|11-1021|General and opera...|
|11-1031|         Legislators|
|11-2011|Advertising and p...|
|11-2021|  Marketing managers|
|11-2022|      Sales managers|
|11-2031|Public relations ...|
|11-3011|Administrative se...|
|11-3021|Computer and info...|
|11-3031|  Financial managers|
|11-3041|Compensation and ...|
|11-3042|Training and deve...|
|11-3049|Human resources m...|
|11-3051|Industrial produc...|
|11-3061| Purchasing managers|
|11-3071|Transportation, s...|
|11-9011|Farm, ranch, and ...|
|11-9012|Farmers and ranchers|
+-------+--------------------+
only showing top 20 rows


scala> df_10.registerTempTable("sparkle_my_ass")

scala> sqlContext.sql("select * from sparkle_my_ass").show()
+-------+--------------------+
|   code|         description|
+-------+--------------------+
|00-0000|     All Occupations|
|11-0000|Management occupa...|
|11-1011|    Chief executives|
|11-1021|General and opera...|
|11-1031|         Legislators|
|11-2011|Advertising and p...|
|11-2021|  Marketing managers|
|11-2022|      Sales managers|
|11-2031|Public relations ...|
|11-3011|Administrative se...|
|11-3021|Computer and info...|
|11-3031|  Financial managers|
|11-3041|Compensation and ...|
|11-3042|Training and deve...|
|11-3049|Human resources m...|
|11-3051|Industrial produc...|
|11-3061| Purchasing managers|
|11-3071|Transportation, s...|
|11-9011|Farm, ranch, and ...|
|11-9012|Farmers and ranchers|
+-------+--------------------+
only showing top 20 rows

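`registerTempTable` only binds a name to `df_10`'s query plan so it can be used in SQL. It stores no data, so each `select * from sparkle_my_ass` reruns the join. In Spark 2.x this method is deprecated in favor of `createOrReplaceTempView`.

The plain-Scala model below represents a temp table as a name bound to a lazy recipe. `TempCatalog` is hypothetical and only illustrates the recomputation.

```scala
import scala.collection.mutable

// Hypothetical model of the SQL catalog: a temp table is a name bound to a
// recipe (thunk) that produces its rows; registering computes nothing.
class TempCatalog {
  private val tables = mutable.Map.empty[String, () => Seq[String]]

  def registerTempTable(name: String, plan: () => Seq[String]): Unit =
    tables(name) = plan

  // Every query re-runs the recipe, as with an uncached DataFrame.
  def selectAll(name: String): Seq[String] = {
    val plan = tables.getOrElse(name, throw new NoSuchElementException(s"no temp table named $name"))
    plan()
  }
}
```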

scala> df_10.persist()
res10: df_10.type = [code: string, description: string]

scala> sqlContext.sql("select * from sparkle_my_ass").show()
+-------+--------------------+
|   code|         description|
+-------+--------------------+
|00-0000|     All Occupations|
|11-0000|Management occupa...|
|11-1011|    Chief executives|
|11-1021|General and opera...|
|11-1031|         Legislators|
|11-2011|Advertising and p...|
|11-2021|  Marketing managers|
|11-2022|      Sales managers|
|11-2031|Public relations ...|
|11-3011|Administrative se...|
|11-3021|Computer and info...|
|11-3031|  Financial managers|
|11-3041|Compensation and ...|
|11-3042|Training and deve...|
|11-3049|Human resources m...|
|11-3051|Industrial produc...|
|11-3061| Purchasing managers|
|11-3071|Transportation, s...|
|11-9011|Farm, ranch, and ...|
|11-9012|Farmers and ranchers|
+-------+--------------------+
only showing top 20 rows

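`persist()` is lazy. It marks `df_10` for caching (for a DataFrame the default level is `MEMORY_AND_DISK`) and returns immediately; the data is cached only when a later action computes it. The query result is identical to the one before `persist()`, since caching changes cost, not content. `df_10.unpersist()` releases the cached data.

The model below shows that lifecycle in plain Scala, using a hypothetical `Persistable` wrapper that counts how often the underlying computation runs.

```scala
// Hypothetical model of persist(): marking is free; the first evaluation after
// marking stores the result, and later evaluations reuse it.
final class Persistable[A](compute: () => A) {
  private var marked = false
  private var cached: Option[A] = None

  def persist(): this.type = { marked = true; this }

  def unpersist(): this.type = { marked = false; cached = None; this }

  // Stands in for an action such as show() or count().
  def collect(): A = cached match {
    case Some(a) => a
    case None =>
      val a = compute()
      if (marked) cached = Some(a)
      a
  }
}
```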

scala>