whisper paper example

from https://huggingface.co/spaces/pszemraj/document-summarization demo

---


Section 0: In this paper, we discuss how speech recognition can be improved by using "large-scale weak supervision". While some of these methods are good at finding patterns in training, they're not very good at generalizing to other sets of information. For example, Radford and his colleagues found that object recognition was better when trained on the same set of images but could still make basic errors when compared to human beings. This is because there's no way humans can guess what's going on behind the scenes. We're starting to work on more advanced speech recognition techniques like speechstew, which uses unsupervised pretraining to learn from large amounts of raw speech. These approaches have been around for years, but they've only recently been extended to include large-scale datasets.
Section 1: In this paper, we present a speech recognition system that uses large-scale machine learning to recognize speech in a language as small as Wook Kim's voice. We demonstrate how well the system works and show how it can be used to train speech recognition systems on large, unsupervised datasets. This work also introduces an approach called "Whissper" which uses natural language detection to improve speech recognition.
Section 2: In this paper, we demonstrate how we can use a large-scale speech recognition dataset as a training example for speech recognition. We split the training example into 30 second segments to train on all of the major speech recognition tasks and then de-dupe the entire sequence in order to reduce the number of training steps required. For example, we need to train a speech model that recognizes only one language at a time. Instead of using a separate machine learning system, we instead use an offtheshelf approach.
Section 3: In this paper, we describe how we use a speech recognition system to predict time and then display it in a way that can be easily interpreted. We show how the speech recognition process is used to train large-scale speech recognition tasks such as English transcription and translation.