rowntreerob

Convert 3gp audio to Text (google/speech-api)

Oct 19th, 2012
  1. Convert a 3gp audio file (AAC, 44100 Hz) to FLAC, split it into segments, then send each segment to Google speech-to-text
  2.  
  3.  
  4. First convert it to FLAC at a lower sample rate (step down from 44100 Hz to 16000 Hz)
  5. ffmpeg -y -loglevel verbose -i voicet.3gp -ar 16000 -ab 16000 voice.flac
  6.  
  7. Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'Downloads/test.3gp':
  8. Metadata:
  9. major_brand : 3gp4
  10. minor_version : 0
  11. compatible_brands: isom3gp4
  12. creation_time : 2012-10-09 16:57:24
  13. Duration: 00:00:04.57, start: 0.000000, bitrate: 69 kb/s
  14. Stream #0:0(eng): Audio: aac (mp4a / 0x6134706D), 44100 Hz, mono, s16, 64 kb/s
  15. Metadata:
  16. creation_time : 2012-10-09 16:57:24
  17. handler_name : SoundHandle
  18. The bitrate parameter is set too low. It takes bits/s as argument, not kbits/s
  19. Output #0, flac, to 'voice.flac':
  20. Metadata:
  21. major_brand : 3gp4
  22. minor_version : 0
  23. --- END of conversion step
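The "bitrate parameter is set too low" warning in the log above comes from `-ab 16000`, which ffmpeg reads as bits per second. FLAC is lossless, so a target bitrate should not be needed here. A minimal sketch of the same conversion without it follows; the function name `to_flac16k` and the added `-ac 1` (force mono) are assumptions for illustration, not part of the original command.

```shell
# Hypothetical helper: convert a recording to 16 kHz mono FLAC.
# Usage: to_flac16k input.3gp output.flac
to_flac16k() {
    src=$1
    dst=$2
    # -ar 16000 resamples to 16 kHz; -ac 1 keeps a single channel (assumption).
    # No -ab: FLAC is lossless, so no target bitrate is passed.
    ffmpeg -y -loglevel verbose -i "$src" -ar 16000 -ac 1 "$dst"
}
```

Example call, assuming the file names from the command above: `to_flac16k voicet.3gp voice.flac`.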
  24. --- split the file into 10-second segments
  25.  
  26. ffmpeg -i voice.flac -y -c copy -flags global_header -map 0 -f segment -segment_time 10 -segment_list fileList.txt ../data/beacon/segment_%02d.flac
  27.  
  28. [flac @ 0x2bfe260] max_analyze_duration 5000000 reached at 5040000
  29. Input #0, flac, from 'voicet.flac':
  30. Metadata:
  31. MAJOR_BRAND : 3gp4
  32. MINOR_VERSION : 0
  33. COMPATIBLE_BRANDS: isom3gp4
  34. ENCODER : Lavf54.29.105
  35. Duration: 00:00:54.03, bitrate: 163 kb/s
  36. Stream #0:0: Audio: flac, 16000 Hz, mono, s16
  37. Output #0, segment, to '../data/beacon/segment_%02d.flac':
  38. Metadata:
  39. MAJOR_BRAND : 3gp4
  40. MINOR_VERSION : 0
  41. COMPATIBLE_BRANDS: isom3gp4
  42. encoder : Lavf54.29.105
  43. Stream #0:0: Audio: flac, 16000 Hz, mono
  44. Stream mapping:
  45. Stream #0:0 -> #0:0 (copy)
  46. Press [q] to stop, [?] for help
  47. size= 0kB time=00:00:54.00 bitrate= 0.0kbits/s
  48. video:0kB audio:1071kB subtitle:0 global headers:0kB muxing overhead -100.000000%
  49.  
  50. --> list the output files, one per segment of up to 10 seconds
  51. rob@ beacon$ ls -l *flac
  52. -rw-rw-r-- 1 rob rob 220508 Oct 19 15:37 segment_00.flac
  53. -rw-rw-r-- 1 rob rob 213396 Oct 19 15:37 segment_01.flac
  54. -rw-rw-r-- 1 rob rob 203085 Oct 19 15:37 segment_02.flac
  55. -rw-rw-r-- 1 rob rob 204558 Oct 19 15:37 segment_03.flac
  56. -rw-rw-r-- 1 rob rob 213091 Oct 19 15:37 segment_04.flac
  57. -rw-rw-r-- 1 rob rob 92195 Oct 19 15:37 segment_05.flac
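The six files in the listing match what a 54-second input cut every 10 seconds should give: five full segments plus a shorter last one. A small sketch of that arithmetic, using whole seconds; `segment_count` is a hypothetical name, not an ffmpeg feature.

```shell
# Number of segments for a recording of $1 seconds cut every $2 seconds
# (integer ceiling division).
segment_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

segment_count 54 10   # prints 6
```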
  58.  
  59. --> submit each file listed above to google/speech-api for text output
  60.  
  61. $ for f in *flac; \
  62. do curl -X POST -H "Content-Type: audio/x-flac; rate=16000" \
  63. -T "$f" "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=speech2text&maxresults=1&lang=en-US"; \
  64. done
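The loop above prints all responses to the terminal, so it is easy to lose track of which response belongs to which segment. A possible variant saves each response next to its segment. This is only a sketch: `recognize_all` and `API_URL` are hypothetical names, the URL is copied from the loop above, and it assumes the endpoint answers as shown in the responses in this paste.

```shell
# Hypothetical helper: post every segment in a directory and save each
# response as <segment>.json beside the audio file.
API_URL='https://www.google.com/speech-api/v1/recognize?xjerr=1&client=speech2text&maxresults=1&lang=en-US'

recognize_all() {
    dir=$1
    for f in "$dir"/segment_*.flac; do
        [ -e "$f" ] || continue   # nothing matched: skip the literal pattern
        # --data-binary sends the file bytes unmodified and implies POST.
        curl -s -H 'Content-Type: audio/x-flac; rate=16000' \
            --data-binary "@$f" "$API_URL" > "${f%.flac}.json"
    done
}
```

Example call, assuming the segment directory used earlier: `recognize_all ../data/beacon`.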
  65.  
  66. --- Responses to the 6 requests: the text is in the 'utterance' field
  67.  
  68. {"status":0,"id":"c90030465ed0deb7c0723687a0657c02-1","hypotheses":[{"utterance":"streaming service did you did on Netflix in 2007 offering movies and television programs in a","confidence":0.7910386}]}
  69. {"status":0,"id":"8b8ac5cd6b8a1effc350ab55fb3c7123-1","hypotheses":[{"utterance":"sweet corn on with subscribers for the price of a subscription to Netflix which is about 10 dollars a month you could walk","confidence":0.8551374}]}
  70. {"status":0,"id":"91c1ab4c90c557332436ffa7b06d7e55-1","hypotheses":[{"utterance":"is many movies as you want and if you didn't like 1 you could start another it brought channel surfing to movies cancel","confidence":0.76847774}]}
  71. {"status":0,"id":"0550f4606da7bb58458fc7ae379f3ba3-1","hypotheses":[{"utterance":"says for many people Netflix was the first glimpse of a kind of content Holy Grail limitless choice","confidence":0.80531996}]}
  72. {"status":0,"id":"9125c7c98f5a83170421398918bcd3b6-1","hypotheses":[{"utterance":"all on demand available on any internet connected device by 2010 streaming have become a major part of","confidence":0.8495959}]}
  73. {"status":0,"id":"81dc3362d64ad2c6af3ec2319fce4f9b-1","hypotheses":[{"utterance":"business its stock price sword","confidence":0.66059893}]}
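To pull just the recognized text out of responses like the ones above, a sed one-liner is enough for this simple shape. A minimal sketch, assuming one JSON response per line and no escaped quotes inside the utterance; `utterances` is a hypothetical name, and a real JSON parser would be the safer choice for anything less regular.

```shell
# Print the "utterance" value of each JSON response read from stdin,
# one per line. Lines without an "utterance" field print nothing.
utterances() {
    sed -n 's/.*"utterance":"\([^"]*\)".*/\1/p'
}

printf '%s\n' '{"status":0,"id":"81dc3362d64ad2c6af3ec2319fce4f9b-1","hypotheses":[{"utterance":"business its stock price sword","confidence":0.66059893}]}' | utterances
# prints: business its stock price sword
```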