lamiastella

.bash inside .slurm is not being run

Sep 13th, 2021
So, running $ sbatch torch_gpu_sanity_venv385-11.slurm executes everything inside the .slurm file except the line that calls /research/jalal/slurm/torch_gpu_sanity_venv385-11.bash:
[jalal@goku fashion_compatibility]$ cat slurm-28334.out
Mon Sep 13 20:06:44 EDT 2021 ivcgpu10.bu.edu env
CUDA_HOME=/usr/local/cuda-10.0
BASEPATH=/home/grad3/jalal/.local/bin:/opt/rh/rh-php70/root/bin:/scratch2/system/opt/pycharm-community-2018.1.2/bin:/scratch2/google-cloud-sdk/bin:/scratch2/google-cloud-sdk/bin:/scratch3/.cargo/bin:/home/grad3/jalal/.gdrive-downloader:/usr/local/cuda-10.0/bin:/scratch3/MATLAB/matlab/bin:/home/grad3/jalal/.local/bin:/opt/rh/rh-php70/root/bin:/scratch2/system/opt/pycharm-community-2018.1.2/bin:/scratch2/google-cloud-sdk/bin:/scratch2/google-cloud-sdk/bin:/usr/lib64/qt-3.3/bin:/scratch/sjn-p3/anaconda/anaconda3/condabin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:/usr/local/IT/bin:/usr/bin/python3:/home/grad3/jalal/bin:/scratch/sjn-p2/anaconda/anaconda2/bin
CUDA_DEVICE_ORDER=PCI_BUS_ID
CUDA_VISIBLE_DEVICES=7
PATH=/home/grad3/jalal/.gdrive-downloader:/usr/local/cuda-10.0/bin:/scratch3/MATLAB/matlab/bin:/home/grad3/jalal/.local/bin:/opt/rh/rh-php70/root/bin:/scratch2/system/opt/pycharm-community-2018.1.2/bin:/scratch2/google-cloud-sdk/bin:/scratch2/google-cloud-sdk/bin:/scratch3/.cargo/bin:/home/grad3/jalal/.gdrive-downloader:/usr/local/cuda-10.0/bin:/scratch3/MATLAB/matlab/bin:/home/grad3/jalal/.local/bin:/opt/rh/rh-php70/root/bin:/scratch2/system/opt/pycharm-community-2018.1.2/bin:/scratch2/google-cloud-sdk/bin:/scratch2/google-cloud-sdk/bin:/usr/lib64/qt-3.3/bin:/scratch/sjn-p3/anaconda/anaconda3/condabin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/bin:/sbin:/usr/local/IT/bin:/usr/bin/python3:/home/grad3/jalal/bin:/scratch/sjn-p2/anaconda/anaconda2/bin:/usr/bin/python3
torch vers = 1.8.1+cu111
tensor([[0.6323, 0.2388, 0.8772],
        [0.6798, 0.7087, 0.4152],
        [0.5309, 0.6991, 0.4273],
        [0.6753, 0.0531, 0.5897],
        [0.8470, 0.7392, 0.3246]])
cuda avail = True
cuda current dev= 0
dev 0 = NVIDIA RTX A6000

and
[jalal@goku fashion_compatibility]$ cat torch_gpu_sanity_venv385-11.slurm
#!/bin/bash
#SBATCH --partition=gpu-L --gres=gpu:1
# -------------------------> ask for 1 GPU
d=$(date)
h=$(hostname)
echo $d $h env # show CUDA related Env vars
env|grep -i cuda
# nvidia-smi
# actual work
/research/jalal/slurm/torch_gpu_sanity_venv385-11.bash

and
[jalal@goku fashion_compatibility]$ cat torch_gpu_sanity_venv385-11.bash
#!/bin/bash
source /research/jalal/slurm/venv/fash/bin/activate
python main.py --name both_attn_1e_SA --learned --l2_embed --datadir ../../data/ --all_attend_fc --use_attend --epochs 1
deactivate
Does anyone know why the 10th line of the .slurm script (the call to torch_gpu_sanity_venv385-11.bash) is not being run?
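One possible culprit (an assumption, not something confirmed by the output above): a script invoked by a bare path needs its execute bit set, and the resulting "Permission denied" from a batch job is easy to miss. The sketch below reproduces that symptom with a hypothetical stand-in script, /tmp/sanity_demo.bash, and shows two ways around it: invoking through bash, or chmod +x.

```shell
# Create a stand-in for the .bash script (hypothetical path).
cat > /tmp/sanity_demo.bash <<'EOF'
#!/bin/bash
echo "bash script ran"
EOF

# A freshly written file is typically not executable (mode 644 under
# the usual umask), so calling it by bare path fails.
/tmp/sanity_demo.bash 2>/dev/null || echo "direct call failed"

# Fix 1: invoke through the interpreter; no execute bit needed.
bash /tmp/sanity_demo.bash

# Fix 2: set the execute bit, after which the bare-path call works.
chmod +x /tmp/sanity_demo.bash
/tmp/sanity_demo.bash
```

If ls -l /research/jalal/slurm/torch_gpu_sanity_venv385-11.bash shows the execute bit is already set, a next step would be checking whether an error from that line is going to stderr; adding an #SBATCH --error=slurm-%j.err directive to the .slurm file separates stderr from the .out file.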