```
id	first_name	last_name	email	department	ip_salary
1	Randy	Griffin	rgriffin0@discuz.net	Product Management	€3244,84
2	Teresa	Perry	tperry1@microsoft.com	Services	€2138,48
3	Matthew	Ellis	mellis2@livejournal.com	Human Resources	€2431,08
4	Joyce	Rogers	jrogers3@miibeian.gov.cn	Marketing	€3100,05
5	Joyce	Parker	jparker4@delicious.com	Engineering	€1718,04
6	Kenneth	Willis	kwillis5@canalblog.com	Support	€2579,97
7	Nicole	Armstrong	narmstrong6@nbcnews.com	Support	€3679,80
8	Harry	Chavez	hchavez7@symantec.com	Product Management	€4101,69
9	Frances	Wright	fwright8@bigcartel.com	Training	€2055,84
10	James	Freeman	jfreeman9@prlog.org	Research and Development	€4039,92
11	Emily	Mason	emasona@cnn.com	Engineering	€1309,85
12	Willie	Alexander	walexanderb@state.gov	Legal	€1800,52
13	Gerald	Weaver	gweaverc@imageshack.us	Accounting	€2776,68
14	Teresa	Burns	tburnsd@fastcompany.com	Research and Development	€2390,65
15	Mary	Lee	mleee@tumblr.com	Product Management	€2313,01
16	Carolyn	Cooper	ccooperf@addtoany.com	Business Development	€2521,03
17	Cheryl	Fox	cfoxg@cargocollective.com	Product Management	€1420,52
18	Diane	Lawrence	dlawrenceh@diigo.com	Sales	€1883,02
19	Susan	Porter	sporteri@shareasale.com	Research and Development	€3540,25
20	Irene	Turner	iturnerj@sbwire.com	Product Management	€3162,10
```
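import org.apache.spark.sql.functions.{col, translate}
import org.apache.spark.sql.types.DoubleType
val df = spark.read.option("header", "true").option("delimiter", "\t").csv("file:///Users/mprescha/salaries.txt")
// Drop the leading € sign, turn the decimal comma into a point, then cast to Double
val df2 = df.withColumn("salary", translate(col("ip_salary").substr(2, 10), ",", ".").cast(DoubleType))
df2.groupBy("department").avg("salary").show()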
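
The salary conversion in the `withColumn` step can be checked without a Spark session. The following is a minimal sketch in plain Scala, assuming every value has the shape shown in the sample data (one currency symbol followed by digits with a decimal comma). `parseSalary` is a hypothetical helper introduced here for illustration only; it is not part of Spark or of the original snippet.

```scala
// Hypothetical helper (not from the original paste): mirrors the column
// expression translate(col("ip_salary").substr(2, 10), ",", ".") on a single string.
def parseSalary(raw: String): Double =
  raw.substring(1)        // drop the first character (the currency symbol)
     .replace(',', '.')   // decimal comma becomes a decimal point
     .toDouble

// A few values copied from the sample data above
val samples = Seq("€3244,84", "€2138,48", "€1718,04")
samples.foreach(s => println(s"$s -> ${parseSalary(s)}"))
```

One difference to keep in mind: `substr(2, 10)` in the Spark version keeps at most 10 characters after the symbol, while `substring(1)` here keeps the whole remainder. For the values in this data set the two give the same result.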