- -- audio track download -- issue: the download is stereo by default instead of mono, so BOTH channels get transcribed
- ads$ yt-dlp --no-playlist -f 'ba' https://www.youtube.com/watch?v=bEuIQosyQBo -o 'chd_audio_1.%(ext)s'
- [youtube] Extracting URL: https://www.youtube.com/watch?v=bEuIQosyQBo
- [youtube] bEuIQosyQBo: Downloading webpage
- [youtube] bEuIQosyQBo: Downloading android player API JSON
- [info] bEuIQosyQBo: Downloading 1 format(s): 251
- [dashsegments] Total fragments: 5
- [download] Destination: chd_audio_1.webm
- [download] 100% of 46.95MiB in 00:00:14 at 3.32MiB/s
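One way to sidestep the stereo issue might be to downmix the download to mono before uploading. A minimal sketch, assuming ffmpeg built with libopus is installed; the function name and the output filename chd_audio_1_mono.webm are made up for illustration and do not appear in the log. With a mono file, the recognition config would presumably use 'audioChannelCount':1 and no per-channel recognition.

```shell
# Downmix an audio file to a single channel, re-encoding as Opus.
# Assumes ffmpeg (with libopus) is on PATH; this step is not part of the original log.
downmix_to_mono() {
  # $1 = input file, $2 = output file
  ffmpeg -i "$1" -ac 1 -c:a libopus "$2"
}

# Example call (hypothetical output name):
# downmix_to_mono chd_audio_1.webm chd_audio_1_mono.webm
```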
- -- cloud upload input for transcription --
- Downloads$ gsutil cp chd_audio_1.webm gs://workbox-demos-1b95f-us-notebooks
- Updates are available for some Google Cloud CLI components. To install them,
- please run:
- $ gcloud components update
- -- transcribe the input to json out (dupped) --
- curl -X POST \
- -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
- -H "Content-Type: application/json; charset=utf-8" \
- --data "{
- 'config': {
- 'language_code': 'en-US','encoding': 'WEBM_OPUS', 'audioChannelCount':2, 'enableSeparateRecognitionPerChannel': true
- },
- 'audio':{
- 'uri':''
- }
- }" "https://speech.googleapis.com/v1/speech:longrunningrecognize"
- Copying file://chd_audio_1.webm [Content-Type=video/webm]...
- / [1 files][ 47.0 MiB/ 47.0 MiB]
- Operation completed over 1 objects/47.0 MiB.
- $ curl -X POST -H "Authorization: Bearer "$(gcloud auth application-default print-access-token) -H "Content-Type: application/json; charset=utf-8" --data "{
- 'config': {
- 'language_code': 'en-US','encoding': 'WEBM_OPUS', 'audioChannelCount':2, 'enableSeparateRecognitionPerChannel': true
- },
- 'audio':{
- 'uri':'gs://workbox-demos-1b95f-us-notebooks/chd_audio_1.webm'
- }
- }" "https://speech.googleapis.com/v1/speech:longrunningrecognize"
- {
- "name": "2765399186392764876"
- }
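The response above only carries an operation name; the transcript itself has to be fetched once the operation finishes. A rough sketch of that fetch, assuming the Speech-to-Text v1 operations endpoint and the same application-default credentials used for the request. The helper names operation_url and fetch_operation are hypothetical.

```shell
# Build the status URL for a long-running operation (Speech-to-Text v1).
operation_url() {
  printf 'https://speech.googleapis.com/v1/operations/%s\n' "$1"
}

# Fetch the operation status/result as JSON on stdout.
# Assumes gcloud application-default credentials are configured.
fetch_operation() {
  curl -s -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    "$(operation_url "$1")"
}

# Example: repeat until the response contains "done": true, then keep the output:
# fetch_operation 2765399186392764876 > chd_transcribed_raw_1.txt
```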
- -- copy the output from stdout -- dest: chd_transcribed_raw_1.txt
- -- sed keeps just the "transcript" JSON lines, awk dedupes them to solve the stereo issue --
- sed -nr '/"transcript": "/p' chd_transcribed_raw_1.txt | awk 'NR %2 == 0' > chd_tran_raw_dedupd.txt
- -- remove the tags with a manual edit: 80 "transcript": JSON tags --
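The manual edit on the 80 tags could probably be folded into the same pipeline. A sketch, assuming the raw output is pretty-printed with one "transcript" entry per line, as the sed filter already relies on. The function name and the output file chd_tran_clean.txt are hypothetical, and which channel survives the awk step depends on the order of the results.

```shell
# Extract transcript text from pretty-printed longrunningrecognize output.
# Reads raw JSON text on stdin, writes one transcript per line on stdout.
# Step 1: keep only the "transcript" lines.
# Step 2: keep every second line (same stereo dedupe as the awk step).
# Step 3: strip the leading tag and the trailing quote/comma.
extract_transcripts() {
  sed -n '/"transcript": "/p' |
    awk 'NR % 2 == 0' |
    sed -E 's/^[[:space:]]*"transcript": "//; s/",?[[:space:]]*$//'
}

# Example:
# extract_transcripts < chd_transcribed_raw_1.txt > chd_tran_clean.txt
```

Escaped quotes inside a transcript (\") are left as-is by this sketch.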