Untitled

a guest
Sep 19th, 2024
## Whisper + Pyannote, audio-to-text transcription with speaker diarization ##
"""
https://github.com/pyannote/pyannote-audio/issues/1474#issuecomment-1746998271
https://huggingface.co/pyannote/speaker-diarization-3.0
https://huggingface.co/pyannote/speaker-diarization-3.1
https://huggingface.co/pyannote/speaker-diarization
https://huggingface.co/pyannote/segmentation
"""
import whisper
from pyannote.audio import Pipeline
import json

# Load the audio file
audio_file = "r:\\output\\chunk_4.wav"

# Transcribe with Whisper
model = whisper.load_model("small")
result = model.transcribe(audio_file, language="slovenian")

# Perform diarization with Pyannote
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token="YOUR_TOKEN_HERE")
diarization = pipeline(audio_file, num_speakers=2)

# Prepare output
output = []

# Match diarization with transcription
for segment in result["segments"]:
    start = segment["start"]
    end = segment["end"]
    text = segment["text"]

    # Find the speaker for this segment
    speaker = None
    for turn, _, spk in diarization.itertracks(yield_label=True):
        if turn.start <= start < turn.end:
            speaker = spk
            break

    output.append({
        "start": start,
        "end": end,
        "speaker": speaker,
        "text": text
    })

# Write results to file
with open("r:\\transcription_with_speakers.txt", "w", encoding="utf-8") as f:
    for item in output:
        f.write(f"Speaker {item['speaker']}:\n")
        f.write(f"{item['text']}\n")
        f.write(f"[{item['start']:.2f} - {item['end']:.2f}]\n\n")

# # Optionally, also save as JSON for easier parsing later
# with open("r:\\output\\transcription_with_speakers.json", "w", encoding="utf-8") as f:
#     json.dump(output, f, ensure_ascii=False, indent=2)

print("Transcription with speaker diarization completed and saved to file.")
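
The matching loop in the script labels each Whisper segment with the speaker whose turn contains the segment's start time. A segment that starts in a gap between turns therefore ends up with speaker None, and a segment that spans a speaker change is credited to whoever was talking at its first instant. One possible alternative, sketched below, picks the speaker with the largest total time overlap instead. The helper name `pick_speaker` and the `(start, end, label)` tuple shape are assumptions made for illustration, not part of Whisper or pyannote; the sample turns are made up.

```python
from collections import defaultdict


def pick_speaker(seg_start, seg_end, turns):
    """Return the label with the largest total overlap with the segment, or None.

    `turns` is an iterable of (start, end, label) tuples (hypothetical shape).
    """
    overlap_by_speaker = defaultdict(float)
    for turn_start, turn_end, speaker in turns:
        # Length of the intersection of [seg_start, seg_end] and the turn
        overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
        if overlap > 0:
            overlap_by_speaker[speaker] += overlap
    if not overlap_by_speaker:
        return None  # the segment does not intersect any turn
    return max(overlap_by_speaker, key=overlap_by_speaker.get)


# Made-up diarization turns for demonstration only
turns = [(0.0, 4.0, "SPEAKER_00"), (4.5, 9.0, "SPEAKER_01")]

print(pick_speaker(4.2, 8.0, turns))    # starts in the gap, overlaps only SPEAKER_01
print(pick_speaker(3.0, 8.0, turns))    # 1.0 s with SPEAKER_00, 3.5 s with SPEAKER_01
print(pick_speaker(10.0, 11.0, turns))  # no overlap at all, so None
```

To try this in the script, the turn list could be built once before the segment loop with `[(turn.start, turn.end, spk) for turn, _, spk in diarization.itertracks(yield_label=True)]`, and the inner loop replaced by `speaker = pick_speaker(start, end, turns)`.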