Entertrainment Tasklist

- Text -> Spectrogram
	- Tacotron2 Public Colab models (done)
	- Tacotron2 Old Multispeaker with GST (paused)
	- Tacotron2 Multispeaker with TorchMoji integration (in training)
	- Tacotron2 Multispeaker with CORRECT Window Length (done)
	- Tacotron2 Multispeaker with TPGST
	- Tacotron2 Multispeaker with 'Semi-Supervised Generative Modeling for Controllable Speech Synthesis'
		https://arxiv.org/pdf/1910.01709.pdf

- Spectrogram -> Waveform
	- WaveGlow 22kHz pretrained from Nvidia (done)
	- MemEfficient source code (to be refactored and uploaded)
	- WaveGlow 48kHz MemEfficient Large (to be uploaded)
	- WaveGlow 48kHz MemEfficient Large 3.5 SpeakerEmbedded (~to be uploaded~ being trained more)
	- WaveGlow 48kHz MemEfficient Large 5.1 (paused)
	- WaveGlow 48kHz MemEfficient Small GlobalSpeakerEmbeddings (done)

- Upload fimfiction files in Colab
	- txt
	- html
	- epub
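
A rough sketch of turning the three upload formats above into plain text, using only the standard library. The names `html_to_text` and `story_to_text` are hypothetical, and the epub handling is an assumption: it reads the zipped (X)HTML pages in sorted filename order rather than following the book's actual reading order.

```python
import io
import zipfile
from html.parser import HTMLParser

BLOCK_TAGS = ("p", "div", "h1", "h2", "h3", "li")


class _TextExtractor(HTMLParser):
    """Collects visible text and adds a line break after block-level tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "br":
            self.chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag in BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


def story_to_text(filename, data):
    """Convert uploaded bytes to plain text based on the file extension."""
    name = filename.lower()
    if name.endswith(".txt"):
        return data.decode("utf-8", errors="replace")
    if name.endswith((".html", ".htm")):
        return html_to_text(data.decode("utf-8", errors="replace"))
    if name.endswith(".epub"):
        with zipfile.ZipFile(io.BytesIO(data)) as book:
            pages = sorted(
                n for n in book.namelist()
                if n.lower().endswith((".xhtml", ".html", ".htm"))
            )
            texts = [
                html_to_text(book.read(n).decode("utf-8", errors="replace"))
                for n in pages
            ]
        return "\n".join(texts)
    raise ValueError(f"unsupported file type: {filename}")
```

In Colab, `google.colab.files.upload()` returns a dict of filename to bytes, so each item could be passed straight to `story_to_text(name, data)`.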

- Parse into sections for inference (later, using synthbot.ai)
	- split by line (done)
	- split larger lines into pieces (done)
	- split by quote (done)
	- split intelligently
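
A minimal sketch of the first three splitting steps above (by line, by quote, then long pieces into chunks). The 120-character limit and all function names are assumptions for illustration, and only straight double quotes are handled; curly quotes would need an extra pattern.

```python
import re

MAX_CHARS = 120  # hypothetical per-segment limit, not a tuned value


def split_by_quote(line):
    """Split a line into narration and quoted speech, keeping the quotes."""
    parts = re.split(r'("[^"]*")', line)
    return [p.strip() for p in parts if p.strip()]


def split_long(segment, max_chars=MAX_CHARS):
    """Greedily pack words into pieces no longer than max_chars."""
    pieces, current = [], ""
    for word in segment.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > max_chars:
            pieces.append(current)
            current = word
        else:
            current = candidate
    if current:
        pieces.append(current)
    return pieces


def split_text(text, max_chars=MAX_CHARS):
    """Split by line, then by quote, then break up anything still too long."""
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        for part in split_by_quote(line):
            segments.extend(split_long(part, max_chars))
    return segments
```

Splitting on word boundaries is the simplest option; "split intelligently" would presumably prefer sentence or clause boundaries instead.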

- Figure out who's speaking
	- All text using same chosen speaker ID (done)
	- All text using speaker names, e.g. Twilight Sparkle instead of speaker_id 32
	- Infer from explicit information, e.g. 'said Twilight'
	- Infer from public NLP models
	- Infer from synthbot.ai or custom solution
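
A small sketch of the name lookup and the explicit-information step ('said Twilight') using a regex. The verb list, the alias table, and the function names are assumptions; the only ID taken from the list above is 32 for Twilight Sparkle.

```python
import re

# Hypothetical tables; only the Twilight Sparkle -> 32 pair comes from the notes.
SPEAKER_IDS = {"Twilight Sparkle": 32}
ALIASES = {"Twilight": "Twilight Sparkle"}
SPEECH_VERBS = ("said", "asked", "replied", "shouted", "whispered")


def infer_speaker(line, known_names, default=None):
    """Return the first known name found next to a speech verb, else default."""
    verbs = "|".join(SPEECH_VERBS)
    # Try longer names first so "Twilight Sparkle" wins over "Twilight".
    for name in sorted(known_names, key=len, reverse=True):
        n = re.escape(name)
        pattern = rf"\b(?:(?:{verbs})\s+{n}|{n}\s+(?:{verbs}))\b"
        if re.search(pattern, line):
            return name
    return default


def speaker_id_for(line, default_id):
    """Map a line to a speaker_id, falling back to the chosen default."""
    name = infer_speaker(line, list(ALIASES) + list(SPEAKER_IDS))
    full_name = ALIASES.get(name, name)
    return SPEAKER_IDS.get(full_name, default_id)
```

This only catches attributions written as "verb name" or "name verb"; pronouns and unattributed dialogue fall through to the default speaker.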

- Generate audio
	- View in browser (done)
	- Save in Google Drive
	- Download each .wav to browser (done)
	- Package into Zip for Download
		- time-synced .LRC files for text
		- time-synced .SRT files for text
		- thumbnail for .epub input
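
A sketch of building the time-synced .SRT and .LRC text from per-line clip durations. It assumes each generated clip's duration in seconds is known (e.g. sample count divided by sample rate) and that clips are played back to back with no gap; the function names are hypothetical.

```python
def srt_time(t):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(t * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def lrc_time(t):
    """Format seconds as an LRC timestamp, [mm:ss.xx]."""
    cs = int(round(t * 100))
    m, cs = divmod(cs, 6000)
    s, cs = divmod(cs, 100)
    return f"[{m:02d}:{s:02d}.{cs:02d}]"


def build_subtitles(clips):
    """clips: list of (text, duration_seconds). Returns (srt_text, lrc_text)."""
    srt, lrc, start = [], [], 0.0
    for index, (text, duration) in enumerate(clips, 1):
        end = start + duration
        srt.append(f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
        lrc.append(f"{lrc_time(start)}{text}")
        start = end
    return "\n".join(srt), "\n".join(lrc)
```

The two strings could then be written into the zip next to the .wav files, e.g. with `zipfile.ZipFile.writestr`.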

- Misc
	- Batch infer Tacotron2 (done!)
	- Preserve Decoder state between lines
	- Update requirements for stop-tokens (done)
		- Stop after delay (done)
		- Stop on alignment collapse (done)
			- monitor average max attention weight during inference (triple done!)
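
A sketch of the two stop rules above in plain Python, not the actual Tacotron2 code. It assumes "stop after delay" means decoding a few extra frames after the stop token first fires, and "alignment collapse" means the average of the per-step max attention weight over a recent window dropping below a threshold. All thresholds and names are hypothetical.

```python
def find_stop_step(gate_probs, attention, gate_threshold=0.5, delay=3,
                   window=4, min_avg_max=0.2):
    """Return (step, reason) where decoding would stop, or (None, None).

    gate_probs: stop-token probability per decoder step.
    attention:  one row of attention weights over input tokens per step.
    """
    first_gate = None
    max_weights = []
    for step, (gate, row) in enumerate(zip(gate_probs, attention)):
        # Track how sharply the decoder attends at each step.
        max_weights.append(max(row))
        recent = max_weights[-window:]
        if len(recent) == window and sum(recent) / window < min_avg_max:
            return step, "alignment collapse"
        # Remember when the stop token first fired, then wait `delay` steps.
        if first_gate is None and gate > gate_threshold:
            first_gate = step
        if first_gate is not None and step - first_gate >= delay:
            return step, "stop token"
    return None, None
```

In a real decoder loop the same checks would run once per step on the latest gate output and attention row, rather than over full lists after the fact.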