~~~~ Entertrainment Tasklist ~~~~
- Text -> Spectrogram
    - Tacotron2 Public Colab models (done)
    - Tacotron2 Old Multispeaker with GST (paused)
    - Tacotron2 Multispeaker with TorchMoji integration (in training)
    - Tacotron2 Multispeaker with CORRECT window length (done)
    - Tacotron2 Multispeaker with TPGST
    - Tacotron2 Multispeaker with "Semi-Supervised Generative Modeling for Controllable Speech Synthesis"
      https://arxiv.org/pdf/1910.01709.pdf
- Spectrogram -> Waveform
    - WaveGlow 22kHz pretrained from Nvidia (done)
    - MemEfficient source code (to be refactored and uploaded)
    - WaveGlow 48kHz MemEfficient Large (to be uploaded)
    - WaveGlow 48kHz MemEfficient Large 3.5 SpeakerEmbedded (~to be uploaded~ being trained more)
    - WaveGlow 48kHz MemEfficient Large 5.1 (paused)
    - WaveGlow 48kHz MemEfficient Small GlobalSpeakerEmbeddings (done)
- Upload fimfiction files in Colab
    - txt
    - html
    - epub
- Parse into sections for inference (later, using synthbot.ai)
    - split by line (done)
    - split larger lines into pieces (done)
    - split by quote (done)
    - split intelligently
- Figure out who's speaking
    - All text using the same chosen speaker ID (done)
    - All text using speaker names, e.g., Twilight Sparkle instead of speaker_id 32
    - Infer from explicit information, e.g., 'said Twilight'
    - Infer from public NLP models
    - Infer from synthbot.ai or a custom solution
- Generate audio
    - View in browser (done)
    - Save in Google Drive
    - Download each .wav to browser (done)
    - Package into Zip for download
    - time-synced .LRC files for text
    - time-synced .SRT files for text
    - thumbnail for .epub input
- Misc
    - Batch infer Tacotron2 (done!)
    - Preserve decoder state between lines
    - Update requirements for stop-tokens (done)
        - Stop after delay (done)
        - Stop on alignment collapse (done)
    - Monitor average max attention weight during inference (triple done!)
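The "split by line", "split larger lines into pieces" and "split by quote" items could look roughly like the Python sketch below. It is not the project's code: the `split_text` name, the 200-character limit and the straight-double-quote regex are all assumptions.

```python
import re


def split_text(text: str, max_chars: int = 200) -> list[str]:
    """Split by line, then by quotes, then break long pieces at word boundaries.

    Hypothetical helper; max_chars=200 is an assumed limit, not a project value.
    """
    pieces: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # The capturing group makes re.split keep the quoted dialogue as its own piece.
        for part in re.split(r'("[^"]*")', line):
            part = part.strip()
            if not part:
                continue
            # Break over-long pieces at the last space before max_chars
            # (or hard-cut at max_chars if there is no space).
            while len(part) > max_chars:
                cut = part.rfind(" ", 0, max_chars)
                if cut <= 0:
                    cut = max_chars
                pieces.append(part[:cut].strip())
                part = part[cut:].strip()
            if part:
                pieces.append(part)
    return pieces
```

For example, `split_text('She smiled. "Hello there," said Twilight.')` yields the narration, the quoted line and the attribution as three separate pieces, which keeps dialogue and narration apart for later speaker assignment.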
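For "Infer from explicit information, e.g., 'said Twilight'", a minimal regex sketch could look for a speech verb directly next to a known name. The `SPEAKER_IDS` and `ALIASES` tables, their ID values and the verb list are hypothetical placeholders; real speaker IDs depend on the trained multispeaker model.

```python
import re
from typing import Optional

# Hypothetical tables: real speaker IDs depend on the trained multispeaker model.
SPEAKER_IDS = {"Twilight Sparkle": 0, "Rarity": 1}
ALIASES = {"Twilight": "Twilight Sparkle", "Twilight Sparkle": "Twilight Sparkle", "Rarity": "Rarity"}

# Assumed list of speech verbs; extend as needed.
_VERBS = r"(?:said|asked|replied|shouted|whispered)"
# Longest alias first so "Twilight Sparkle" is preferred over "Twilight".
_NAMES = "|".join(sorted((re.escape(a) for a in ALIASES), key=len, reverse=True))
_PATTERNS = [
    re.compile(rf"\b{_VERBS}\s+({_NAMES})\b"),  # e.g. 'said Twilight'
    re.compile(rf"\b({_NAMES})\s+{_VERBS}\b"),  # e.g. 'Twilight said'
]


def infer_speaker(narration: str) -> Optional[str]:
    """Return the canonical speaker name found next to a speech verb, or None."""
    for pattern in _PATTERNS:
        match = pattern.search(narration)
        if match:
            return ALIASES[match.group(1)]
    return None


def speaker_id_for(narration: str, default_id: int) -> int:
    """Use the inferred speaker's ID, falling back to a chosen default ID."""
    name = infer_speaker(narration)
    return SPEAKER_IDS[name] if name else default_id
```

The fallback mirrors the existing "same chosen speaker ID" behaviour, so lines without an explicit attribution still get a voice.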
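One plausible way to compute the "average max attention weight" and flag "alignment collapse" from a Tacotron2 alignment (one row of encoder attention weights per decoder step, each row a softmax) is sketched below. The window size, the 0.3 threshold and the function names are illustrative guesses, not values or code from this project.

```python
from typing import Sequence


def average_max_attention(alignment: Sequence[Sequence[float]]) -> float:
    """Mean over decoder steps of the largest attention weight at that step.

    Sharp alignments score near 1.0; diffuse ones score near 1 / encoder_length.
    """
    if not alignment:
        return 0.0
    return sum(max(step) for step in alignment) / len(alignment)


def alignment_collapsed(
    alignment: Sequence[Sequence[float]], window: int = 20, threshold: float = 0.3
) -> bool:
    """Flag collapse when the last `window` steps' average max weight drops below
    `threshold` (both values are assumptions to be tuned per model)."""
    if len(alignment) < window:
        return False  # not enough decoder steps yet to judge
    return average_max_attention(alignment[-window:]) < threshold
```

During batch inference, a check like `alignment_collapsed` could run every few decoder steps so that a line is stopped early instead of decoding until the maximum length.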
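For "Package into Zip for download", Python's standard `zipfile` module can bundle the generated .wav bytes in memory. `package_wavs` and the file names are hypothetical.

```python
import io
import zipfile


def package_wavs(wavs: dict[str, bytes]) -> bytes:
    """Return the bytes of a .zip archive containing each named .wav payload."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in wavs.items():
            zf.writestr(name, data)  # one archive entry per generated clip
    return buffer.getvalue()
```

In Colab the returned bytes could be written to a file and offered to the browser with `google.colab.files.download`, the same route as the existing per-.wav download.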