I have to believe that this has been addressed, but here's the workaround I did to get EMR Spark working with BQ.

Since a Spark job can pick any one of the available cluster nodes to be the Spark driver for an app, each node needs to have the BigQuery JSON cred file in its /home/hadoop/.gcp dir as bq.json (see the sketch after the steps for pointing the job at that file). Here are the steps to get that done:

Start ssh-agent on your laptop so it can forward your PEM credentials through the remote EMR master node (the slave nodes are not directly reachable).
```
$ ssh-agent
[…]
```
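One note here: running ssh-agent on its own only prints the environment variables it wants set; the usual pattern is to eval its output so SSH_AUTH_SOCK and SSH_AGENT_PID land in your current shell:
```
$ eval "$(ssh-agent)"
```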
Then add the required .pem file to the agent:
```
$ ssh-add ~/.ssh/me.pem
```
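You can confirm the key was actually loaded by listing the agent's identities:
```
$ ssh-add -l
```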
Next, ssh using "-A" so the agent is forwarded:
```
$ ssh -A -i ~/.ssh/me.pem <master-node-ip>
```
Then find the Id of the cluster (you'll use it to look up the core node IPs):
```
$ aws emr list-clusters --region us-west-2 --active | grep Id
"Id": "j-3UJBBJ07DDEEF"
```
And finally copy the bq.json file from the master to each core node:
```
$ for n in `aws emr list-instances --cluster j-3UJBBJ07DDEEF --region us-west-2 \
--instance-group-types CORE | grep IpAddress | awk -F"\"" '{print $4}'`; \
do echo $n; ssh $n "mkdir -p .gcp"; scp ./.gcp/bq.json $n:.gcp/; done
```
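With bq.json in place on every node, the job still has to be told where to find it. How you do that depends on which BigQuery connector you're using; as one sketch, many GCP client libraries honor the standard GOOGLE_APPLICATION_CREDENTIALS environment variable, which you could push to the driver and executors through Spark's YARN env confs (my_bq_job.py is just a placeholder for your own application, and some connectors want an explicit keyfile/credentials option instead, so check your connector's docs):
```
# point both the YARN application master (driver) and the executors at the copied cred file
$ spark-submit \
    --conf spark.yarn.appMasterEnv.GOOGLE_APPLICATION_CREDENTIALS=/home/hadoop/.gcp/bq.json \
    --conf spark.executorEnv.GOOGLE_APPLICATION_CREDENTIALS=/home/hadoop/.gcp/bq.json \
    my_bq_job.py
```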