// Zadanie 3 (Task 3): percentage share of products per department
Dataset<Row> departments = spark.read().option("header", "true").csv("departments.csv");
Dataset<Row> products = spark.read().option("header", "true").csv("products.csv");

// Join each product with its department via the shared department_id column
Dataset<Row> productsWithDepartment = products.join(departments, "department_id");

// Count products per department
Dataset<Row> countsPerDepartment = productsWithDepartment.groupBy("department").count();

// Add each department's share of all products as a percentage
long totalProducts = products.count();
Dataset<Row> result = countsPerDepartment.select(
        countsPerDepartment.col("department"),
        countsPerDepartment.col("count"),
        countsPerDepartment.col("count").divide(totalProducts).multiply(100).as("percent"));

result.show();