- 00:00
- Good morning. Hi, my name is Amelie, and I'm going to be the session chair for the morning, so it's my great pleasure to introduce Koray Kavukcuoglu, who is going to give an invited talk. Koray is a Director of Research at DeepMind, and he's one of the star researchers in our community. He has contributed to many highly influential projects at DeepMind, such as spatial transformer networks, autoregressive generative models such as Pixel Recurrent Networks and WaveNet, and deep reinforcement learning for playing Atari games and AlphaGo. Today he will talk about going from generative models to generative agents, so let's welcome Koray.
- 00:47
- [Applause] [Music]
- 00:53
- Thank you very much for the very nice introduction, and thanks everyone for being here; it's an absolute pleasure. As was mentioned, I'm going to talk about unsupervised learning in general, starting from generative models, maybe in the classical way, and then try to give another view that I think is quite interesting, one that we have been working on recently. When I think about what the important things are for us to do as a community,
- 01:24
- I think everyone here sort of agrees that in the end what is important is to be doing unsupervised learning. We have seen that supervised learning has had all sorts of successes, but in the end unsupervised learning is the next frontier. When I think about unsupervised learning, there are different explanations that come to my mind, and talking to people, I think we all have somewhat different opinions on this. One common explanation is: we have an unsupervised learning algorithm, we run it on our data, and what we expect is for the algorithm to understand our data and to explain our data, or our environment. What we expect from this is that the algorithm is going to learn the intrinsic properties of our data, of our environment, and then be able to explain it through those properties.
- 02:20
- But most of the time, because of the kinds of models that we use, we end up looking at samples. When we look at the samples, we try to see whether our model really understood the environment; if it did, then the samples should be meaningful. Of course, we also look at all sorts of objective measures that we use during training, like Inception scores, log-likelihoods and such, but in the end we always resort to samples to understand whether our model can really explain what's going on in the environment.
- 02:51
- The other general explanation that we all use is that the goal of unsupervised learning is to learn rich representations. It's already embedded in the name of this conference: the main goal of deep learning, and of unsupervised learning, is learning those representations. But this explanation again doesn't give us an objective measure. What we have to think about is how we are going to judge those representations as being good and useful, and to me the most important bit is this: if we have good, rich representations, then they are useful for generalization, for transfer. If you have a good unsupervised learning model and it can give us good representations, then we can get generalization.
- 03:39
- So today I'm also going to tie this together with something else that is very important to me. As was mentioned, a big chunk of the work that we have been doing at DeepMind, and that I have been doing, is about agents and reinforcement learning. In this talk I'm going to take a look at unsupervised learning both in the classical sense of learning a generative model, and in the sense of learning an agent that can do unsupervised learning.
- 04:03
- I'm going to start from the WaveNet model, which hopefully many of you know. It is a generative model of audio, a pure deep learning model, and it turns out you can model any audio signal, like speech and music, and get really realistic samples out of it.
- 04:25
- The next thing I'm going to do is explain another new approach to unsupervised learning that I find really interesting, based on deep reinforcement learning: learning an agent that actually does unsupervised learning. This model, called SPIRAL, is based on a new agent architecture that we have been working on and published recently, called IMPALA. It's a very large, highly scalable, efficient off-policy learning agent architecture, which we use in SPIRAL to do unsupervised learning. The interesting bit about the SPIRAL work is that it generalizes through a kind of tool space: tools that we as people have created so that we can solve not one specific problem but many different problems. By using the interface of a tool, and having an agent, you can now learn a generative model of your environment.
- 05:19
- All right, so without more delay, the first thing I'm going to quickly introduce is the WaveNet model. WaveNet is a generative model of audio; as I said, it models the raw audio signal directly, without any intermediate interface to the audio. Audio in general is very high dimensional: the standard audio signal we started with at the beginning was 16,000 samples per second. If you compare that to our usual language modeling and machine translation kinds of tasks, it is several orders of magnitude more data, so the dependencies that one needs to capture to model audio signals well are very long. What this model does is model one sample at a time, using a softmax distribution over each sample, conditioned on all the previous samples of the signal.
- 06:26
- When you look at it more closely, it is an architecture that bears quite a bit of resemblance to the PixelCNN model, which some of you may also be familiar with. In the end it is a stack of multiple convolutional layers. To be a little more specific, it is built from residual blocks; you use multiples of those residual blocks, and in each residual block there are dilated convolutional layers stacked on top of each other. Through those dilated convolutional layers, which are causal convolutions, we can model very long dependencies in time.
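The stacked dilated causal convolutions can be sketched in a few lines. This is a toy illustration, not the actual WaveNet code: the two-tap filter and the dilation schedule below are stand-ins, but they show why the receptive field grows to thousands of samples with only a few dozen layers.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution with a 2-tap filter:
    output[t] = w[0] * x[t - dilation] + w[1] * x[t], so no future samples leak in."""
    pad = np.concatenate([np.zeros(dilation), x])  # left-pad with zeros for causality
    return w[0] * pad[:-dilation] + w[1] * pad[dilation:]

def receptive_field(dilations, kernel=2):
    """Receptive field of a stack of dilated convs with the given dilations."""
    return sum((kernel - 1) * d for d in dilations) + 1

# Doubling dilations 1, 2, 4, ..., 512, repeated 3 times (a WaveNet-style schedule):
dilations = [2 ** i for i in range(10)] * 3
print(receptive_field(dilations))  # 3070 samples of context
```

With a kernel of 2 and doubling dilations, each repeat of the 10-layer block adds 1023 samples of context, which is how the model covers the very long audio dependencies mentioned above.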
- 07:06
- Now, one of the biggest design considerations of WaveNet is that it is designed to be very efficient during training, because during training all the targets are known: you process the whole signal at once, just running it like a convolutional net; then, because you have the targets, you get your error signal and propagate it back. So training is very efficient. But of course, when it comes to sampling time, this is in the end an autoregressive model, and through those causal convolutions you need to generate one sample at a time. If you are sampling at, say, 24 kilohertz, that is 24,000 samples per second, generated one sample at a time, just like you see in this animation. This is painful, but in the end it works quite well, and we can generate very high quality audio with it.
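The train-in-parallel versus sample-one-at-a-time asymmetry can be sketched with a toy autoregressive model. Everything here is a stand-in for illustration: the 2-sample context, the 8-level signal, and the single weight matrix replace the real dilated-conv stack, but the structural point is the same.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "WaveNet-like" model: a categorical distribution over LEVELS quantization
# levels, predicted from the previous CONTEXT samples (stand-in for the real stack).
LEVELS, CONTEXT = 8, 2
W = rng.normal(size=(CONTEXT, LEVELS))

def logits(history):
    return history[-CONTEXT:] @ W  # last CONTEXT samples -> unnormalized scores

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Training: targets are known, so every timestep is scored in one parallel pass.
signal = rng.integers(0, LEVELS, size=100).astype(float)
windows = np.stack([signal[i:i + CONTEXT] for i in range(len(signal) - CONTEXT)])
all_logits = windows @ W  # one matrix multiply covers all timesteps at once

# Sampling: inherently sequential -- one sample at a time, each fed back in.
generated = [0.0, 0.0]
for _ in range(100):
    p = softmax(logits(np.array(generated)))
    generated.append(float(rng.choice(LEVELS, p=p)))
```

At 24,000 samples per second, that sequential loop is exactly the "painful" part of the talk: the forward pass cannot be batched over time because each step depends on the previous output.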
- 07:58
- What I want to do now is let you listen to unconditional samples from this model. We model the speech signal without any conditioning on text or anything; we just take the audio signal and model it with WaveNet, and then we sample. [Plays sample] As you can hear, hopefully, the quality is very high, and this is modeling really the raw audio signal, completely unconditionally. Sometimes you even hear short words, like "okay" or "from", and if you listen, the intonation and everything sounds quite natural; sometimes it feels like you are listening to someone speaking a language that you don't know. The main characteristics of the signal are all captured there.
- 09:02
- In terms of dependencies, we are looking at something like several thousand samples of dependency being properly and correctly modelled there. Then, of course, what you can do is augment this model by conditioning on a text signal that is associated with the speech you want to generate. By conditioning on the text signal, you now have a conditional generative model that actually solves a real-world problem, just by itself, with pure deep learning: from the text you create linguistic embeddings, using those linguistic embeddings you generate the signal, and then it starts talking. It's a solution to the whole text-to-speech synthesis problem, which, as you know, is very commonly used in the real world.
- 10:03
- When we did the WaveNet model, around almost two years ago now, we looked at the quality when using it as a TTS model. In green you see the quality of human speech, obtained through mean opinion scores; in blue you see WaveNet; and the other colors are the best other models around at the time. You can see that WaveNet closed the gap between human speech and the other models by a big margin. At the time this really got us excited, because now we actually had a deep learning model that comes with all the flexibility and advantages of doing deep learning, and at the same time it is modeling raw audio at very high quality.
- 10:50
- I could play text-to-speech samples generated by this model, but actually, if you are using Google Assistant right now, you are already hearing WaveNet, because this is already in production. For anyone using Google Assistant, querying Wikipedia and things like that, the speech that is generated there is actually coming from the WaveNet model. What I want to do is explain how we did that, which brings me to the next project we did in the WaveNet domain: the Parallel WaveNet project.
- 11:27
- Of course, when you have a research project and at some point you realize that it actually lends itself to the solution of a real-world problem, and you want to put it into production in a very challenging environment, it requires much more than our little research group. This was a big collaboration between the DeepMind research and applied teams and the Google speech team. In this slide, what I show are the basic ingredients of how we turned the WaveNet architecture into a feed-forward, parallel architecture.
- 12:03
- What we realized pretty soon, when we started attempting to put a system like this into production, was that speed is, of course, very important, and quality is very important, but it is not enough to run in real time: the constraints we faced were orders of magnitude faster than real time, even being able to run in constant time. And when the constraint becomes being able to run in constant time, the only thing you can do is create a feed-forward network and parallelize the signal generation. So that is what we did.
- 12:43
- In this slide, at the top, you see the usual WaveNet model; we call it the teacher. In this setting, the WaveNet model is pre-trained and fixed, and it is used as a scoring function. At the bottom, you see the generator, which we call the student. The student is again an architecture very close to WaveNet, but it is run as a feed-forward convolutional network, and it has two components: one component comes from WaveNet, which, as I said, is very efficient in training but slow in sampling; the other is based on the inverse autoregressive flow work done by Kingma and colleagues at OpenAI last year. This structure gives us the capability to take an input noise signal and slowly transform that noise into a proper distribution, which is going to be the speech signal. The way we train this is: random noise goes in, together with the linguistic features, through layers and layers of these flows; the random noise gets transformed into a speech signal; and that speech signal goes into WaveNet. WaveNet is just about the best scoring function we can use, because it's a density model; WaveNet scores the signal, and from that score we get the gradients back into the generator and update the generator. We call this process probability density distillation.
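A minimal sketch of the probability density distillation idea, in one dimension. Both models here are toy stand-ins: a unit Gaussian plays the pretrained teacher, and a single affine transform of noise plays the student flow. The point is only the shape of the loss, an expectation of (student log-density minus teacher log-density) over the student's own samples, which shrinks as the student's distribution matches the teacher's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Teacher: a fixed, pretrained density model (stand-in: unit Gaussian log-density).
def teacher_logp(x):
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

# Student: a feed-forward flow that turns noise z into a sample x = mu + sigma * z.
# Its log-density is known in closed form, so KL(student || teacher) can be
# estimated from samples: E_z[ log q(x) - log p(x) ].
def distillation_loss(mu, sigma, z):
    x = mu + sigma * z
    student_logq = (-0.5 * ((x - mu) / sigma) ** 2
                    - np.log(sigma) - 0.5 * np.log(2 * np.pi))
    return np.mean(student_logq - teacher_logp(x))

z = rng.normal(size=10_000)
far = distillation_loss(mu=3.0, sigma=0.2, z=z)   # student far from the teacher
near = distillation_loss(mu=0.0, sigma=1.0, z=z)  # student matches the teacher
```

In the real system the student is an inverse-autoregressive-flow WaveNet and the teacher is the pretrained autoregressive WaveNet, but the training signal has this same form: the teacher only ever scores the student's samples.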
- 14:15
- But of course, when you are trying to do real-world things, and the things are as challenging as speech signals, that by itself is not enough. I have highlighted two components here: one of them, as I said, is the WaveNet scoring function; the other is a power loss. What happens is that when we train the model in this manner alone, the signal tends to have very low energy, sort of like whispering: someone speaks, but it sounds like whispering. So during training we add this extra loss, which tries to conserve the energy of the generated speech. With these two, the WaveNet scoring and the power loss, we were already getting very high quality speech.
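The exact power loss from the paper isn't reproduced here; this is one plausible form of the idea, assuming a loss that matches the average framed FFT power of the generated speech to that of the reference, which directly penalizes the "whispering" failure mode described above.

```python
import numpy as np

def power_spectrum(x, frame=256):
    """Average squared FFT magnitude over short windowed frames (a crude STFT power)."""
    frames = x[: len(x) // frame * frame].reshape(-1, frame)
    return np.mean(np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1)) ** 2, axis=0)

def power_loss(generated, reference):
    """Penalize energy mismatch between generated and real speech spectra."""
    return np.mean((power_spectrum(generated) - power_spectrum(reference)) ** 2)

rng = np.random.default_rng(0)
t = np.arange(4096)
reference = np.sin(0.1 * t) + 0.1 * rng.normal(size=t.size)
whisper = 0.2 * reference   # low-energy "whispering" output gets a large penalty
loud = reference.copy()     # energy-matched output gets none
```

Because the comparison happens in the frequency domain, the loss constrains overall energy per band without requiring the generated waveform to align sample-by-sample with the reference.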
- 14:55
- But the constraints were very tough, so what we did next was train another WaveNet model; we sort of used WaveNet everywhere, right? We are generating through a WaveNet-like network run as a convolution, we are using WaveNet as a scoring function, and now we trained yet another WaveNet, this time used as a speech recognition system. That is the perceptual loss that you see there. During training we of course have the text and the corresponding speech signal; we generate the corresponding speech through our generator and give it to the speech recognition system, which now needs to decode the generated signal back into that text. We take the error from there and propagate it back into our generator. That is another quality improvement we get, by using speech recognition as a perceptual loss in our generation system.
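A toy sketch of the perceptual-loss mechanism, under heavy simplifying assumptions: a frozen linear classifier stands in for the WaveNet speech recognizer, fixed-size audio windows stand in for the waveform, and plain cross-entropy stands in for the real recognition loss. The structure it shows is the real one, though: the recognizer's weights are frozen, and the loss depends on whether it can decode the generator's output into the conditioning text.

```python
import numpy as np

rng = np.random.default_rng(0)

# A frozen "recognizer" standing in for the WaveNet-based speech recognizer:
# it maps an audio window to logits over a tiny symbol vocabulary.
VOCAB, WINDOW = 5, 32
W_asr = rng.normal(size=(WINDOW, VOCAB))  # frozen -- never updated during training

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def perceptual_loss(audio_windows, target_symbols):
    """Cross-entropy of the frozen recognizer decoding the generated audio into
    the text it was conditioned on; in training, gradients flow to the generator."""
    logp = log_softmax(audio_windows @ W_asr)
    return -np.mean(logp[np.arange(len(target_symbols)), target_symbols])

targets = rng.integers(0, VOCAB, size=10)
intelligible = 5.0 * W_asr[:, targets].T  # audio the recognizer decodes correctly
garbled = rng.normal(size=(10, WINDOW))   # audio it cannot decode
```

Intelligible audio scores a much lower loss than garbled audio, so minimizing this term pushes the generator toward speech the recognizer, and hopefully a listener, can actually understand.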
- 15:47
- The last thing we did was add a contrastive term. We generate a signal conditioned on some text, and you can create a contrastive loss saying that the signal generated with the corresponding text should be different from the signal you would get if it were conditioned on a different text. That's the contrastive loss.
- 16:12
- More specifically, in the end we have these four terms. At the top is the original use of WaveNet as a scoring function, the probability density distillation idea; then the power loss, which uses Fourier transforms internally to conserve the energy; then the contrastive term; and finally the perceptual loss, which does the speech recognition.
- 16:42
- With all of these, what we did, of course, was look at the quality again. What I'm showing here is the quality with respect to the best non-WaveNet model. This is about a year after the original research, pretty much exactly a year, and during that time the best speech synthesis models also improved, but WaveNet was still better than anything else, and the new model, Parallel WaveNet, exactly matches the quality of the original WaveNet.
- 17:18
- What I'm showing here is three different US English voices, and also Japanese. This is the kind of thing we always want from deep learning, right? The ability to generalize to new datasets, to new domains. We developed this whole model on practically one single US English voice, and then it was just a matter of collecting another dataset from another speaker or another language, like a speaker speaking Japanese: you just run it on that, and there you go, you have a production-quality speech synthesis system, just by doing that. This is the kind of thing we really like from deep learning, and if you are thinking about deep learning and about unsupervised learning, I think this is a very good demonstration of it.
- 17:59
- Before switching to the next part, I also want to mention that we have done further work on this, called WaveRNN, which was recently published. I encourage you to look into that one too; it's a very interesting piece of work, also for generating speech at very high speed.
- 18:18
- The next thing I want to talk about is the IMPALA architecture, the new agent architecture I mentioned. So now WaveNet is unsupervised learning in the classical sense, a model that can actually solve a real-world problem. The next thing I want to start talking about is this new, different way of doing unsupervised learning, but for that, the first exciting bit is being able to do deep reinforcement learning at scale.
- 18:54
- I want to motivate why we want to push our deep reinforcement learning models further and further. Most of the time, because this is a new area, we take very simple tasks in some simple environment, and we try to train an agent that solves that single task in that environment well. What we want to do is go further than that, going back to the point of generalization and being able to solve multiple tasks. We have created a new task set; it is open source, part of our open-source environment called DeepMind Lab. As part of that, we created this new task set, DMLab-30: 30 environments covering tasks around language, memory, navigation, and those kinds of things. The goal is not to solve each one of them individually; the goal is to have one single agent, one single network, that solves all those tasks at the same time.
- 19:50
- There is nothing custom in that agent that is specific to any single one of these environments. When you look at those environments, and I'm showing some of them here, the agent has a first-person view: it is in a maze-like environment with a first-person camera input, and it can navigate around, go forwards and backwards, rotate, look up and down, jump, and so on. It is solving all different kinds of tasks, catered to test different kinds of abilities, but the goal, as I said, is to solve all of them at the same time.
- 20:26
- One thing that becomes really important in this case is the stability of our algorithms, because now we are not solving one single task, we are solving thirty of them, and we want really stable models, because we no longer have the chance to tune hyperparameters for a single task. And of course task interference becomes really important: what we hope to see, again by using deep learning in this multi-task setting, is positive transfer rather than task interference, and we hope to demonstrate that in this challenging reinforcement learning domain too.
- 21:03
- OK, I realized that I needed to put in a slide about why deep reinforcement learning, because, a little bit to my surprise, there was actually not much reinforcement learning at this conference this year, and I wanted to touch on why I think it is important for the deep learning community to do deep reinforcement learning. To me, if one of the goals we work towards here is AI, then it is at the core of it all. Reinforcement learning is a very general framework for learning sequential decision-making tasks, and deep learning, on the other hand, is the best set of algorithms we have to learn representations. The combination of these two is the best answer we have so far for learning very good state representations for very challenging tasks, not just for solving toy domains but for solving challenging real-world problems.
- 22:11
- Of course, there are many open problems there. Some that are interesting, at least for me, are the idea of separating the computational power of a model from the number of weights or the number of layers it has, and, again going back to unsupervised learning, learning to transfer: building these deep reinforcement learning models with the idea of actually generalizing, of transferring.
- 22:39
- OK, so the IMPALA agent is based on other work that we did a couple of years ago, called asynchronous advantage actor-critic, the A3C model. In the end it is a policy gradient method. As I've tried to explain, cartoonishly, in the figure: at every time step the agent sees the environment, and at that time step it outputs a policy distribution and also a value function. The value function is the agent's expectation of the total amount of reward it is going to get until the end of the episode, being in that state; the policy is the agent's distribution over actions. At every time step the agent looks at the environment, updates its policy so that it can act in the environment, and updates its value function. You train this with the policy gradient, which intuitively is actually very simple: the gradient of the policy is scaled by the difference between the total reward the agent actually gets in the environment and a baseline, and the baseline is the value function. What that means is: if the agent ends up doing better than what the value function assumed, that's a good thing; you have a positive gradient and you reinforce your understanding of the environment. If the agent does worse, so the value was higher than the total reward it got, then you have a negative gradient and you need to shuffle things around. And the way you learn the value function is with the usual n-step TD error.
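The scaling described above can be written down in a few lines. This is a sketch of the two quantities only, the n-step return and the advantage that scales the policy gradient, not a full actor-critic implementation; the discount and reward values are illustrative.

```python
import numpy as np

def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """n-step return: discounted rewards plus the bootstrapped value of the
    final state. Used both as the policy-gradient target and as the
    regression target for the value function (the n-step TD error)."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def policy_gradient_scale(actual_return, value_baseline):
    """Advantage = actual return minus the value-function baseline.
    Positive -> reinforce the actions taken; negative -> discourage them."""
    return actual_return - value_baseline

# The agent expected a value of 1.0 but actually collected more reward,
# so the advantage is positive and the taken actions get reinforced.
adv = policy_gradient_scale(n_step_return([0.0, 0.0, 2.0], 0.0), value_baseline=1.0)
```

The sign of `adv` is exactly the "better or worse than expected" test in the talk: it multiplies the log-probability gradient of each action in the trajectory.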
- 24:17
- Now, the A3C algorithm: that was the actor-critic part; the asynchronous part is that the algorithm is composed of multiple actors, and each actor independently operates in the environment, collecting observations, acting, and computing the policy gradients with respect to the parameters of its network. It then sends those gradients back to the parameter server; the parameter server collects the gradients from all the different actors, combines them, and then shares the updated parameters with all the actors. What happens in this case, as you increase the number of actors (this is the usual asynchronous stochastic gradient descent setup), is that the staleness of the gradients becomes a problem. Distributing the experience collection is actually very advantageous, but communicating gradients may become a bottleneck as you try to really scale things up. So for that, we tried a different architecture.
- 25:31
- actually quite useful but rather than
- 25:33
- using it to just to just do the
- 25:36
- accumulate the parameter updates the
- 25:39
- idea of that learner is to make the
- 25:42
- centralized component into a learner so
- 25:45
- the all the whole learning algorithm is
- 25:46
- is contained in that what the actors
- 25:48
- does is only act in the environment not
- 25:50
- compute the gradients or anything
- 25:52
- send the observations back into learners
- 25:54
- to the learner and the learner sends the
- 25:56
- parameters back and in this in this way
- 25:58
- what you are doing is you are completely
- 26:00
- decoupling what happens about your
- 26:02
- experience collection in your
- 26:04
- environments from your learning
- 26:06
- algorithm and in this way you are
- 26:07
- actually gaining a lot of robustness
- 26:09
- into noise in your environments
- 26:11
- sometimes rendering times vary some some
- 26:14
- environments are slow some environments
- 26:16
- are fast
- 26:17
- All of that is completely decoupled from your learning algorithm, but of course what you then need is a good learning algorithm that can deal with that kind of variation.
- 26:27
- So in the end, in IMPALA, what we have is a very efficient decoupled backward pass, if you will. Actors generate trajectories, as I said, but that decoupling creates off-policyness: the policy in the actors, the behavior policy if you will, is separate from the policy in the learner, the target policy. So what we need is off-policy learning. Of course there are many off-policy learning algorithms, but we really wanted a policy gradient method, and for that we developed a new method called V-trace; it is an off-policy advantage actor-critic algorithm.
- 27:00
- The advantage of V-trace is that it uses truncated importance sampling ratios to come up with an estimate of the value. Because there is this mismatch between the learner and the actors, you need to correct for that difference. The good thing is that the algorithm transitions smoothly between the on-policy and off-policy cases: when the actors and the learner are completely in sync, so you are in the on-policy case, the algorithm boils down to the usual A3C update with the n-step Bellman equation; as they become more separate, the correction kicks in and you get the corrected estimate.
- The algorithm has two main components, two truncation factors, that control two different aspects of off-policy learning. One of them, rho-bar, controls which value function the algorithm converges towards: the value function that corresponds to the behavior policy, or the one that corresponds to the target policy in the learner. The other one, the c factor, controls the speed of convergence: by controlling the truncation, it can increase or decrease the variance in learning, and so it has an effect on the speed of convergence.
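The V-trace targets are easy to state in code. A minimal NumPy sketch of the value targets from the IMPALA paper, using the backward recursion v_s = V(x_s) + delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1})), with the two truncation levels rho-bar and c-bar described above:

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value, log_rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-trace value targets; log_rhos[t] = log(pi(a_t|x_t) / mu(a_t|x_t))."""
    rhos = np.minimum(rho_bar, np.exp(log_rhos))  # rho-bar: which value fn
    cs = np.minimum(c_bar, np.exp(log_rhos))      # c-bar: contraction speed
    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * values_tp1 - values)
    vs = np.zeros_like(values, dtype=float)
    acc = 0.0
    for t in reversed(range(len(rewards))):       # backward accumulation
        acc = deltas[t] + gamma * cs[t] * acc
        vs[t] = values[t] + acc
    return vs

# On-policy sanity check (pi == mu, gamma == 1): the targets reduce to plain
# n-step returns, i.e. [3., 2., 1.] for three unit rewards and zero values.
vs = vtrace_targets(np.array([1., 1., 1.]), np.zeros(3), 0.0,
                    np.zeros(3), gamma=1.0)
```

When the behavior probability exceeds the target probability, the truncated ratios shrink the deltas, which is exactly the correction that kicks in as actors and learner drift apart.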
- 28:24
- Now, when we tested this, of course the goal is to test on all environments at once, but first we wanted to look at the single-task case. So we looked at five different environments, and we see that in these environments the IMPALA algorithm is always very stable and performs at the top. The comparisons here are the IMPALA algorithm, the batched A3C method, the batched A2C method, and then different versions of A3C algorithms. You can see that IMPALA and batched A2C are always performing at the top, and IMPALA, the dark blue curve, seems to be doing fine. This gives us the sense that we have a nice algorithm.
- 29:08
- Now, of course, the other thing that is very important, and that is discussed a lot, is the stability of these algorithms. I actually really like these plots; since the A3C work I keep looking at them, and we always put them in the papers. In the plot here, on the x-axis we have the hyperparameter combinations: when you train any model, what we all do is some sort of hyperparameter sweep, and here we are looking at the final score achieved with every single hyperparameter setting, sorted. In this kind of plot, the curves that are at the top and that are most flat are the better-performing and more stable algorithms. What we see here is that IMPALA is of course achieving better results, but it is not achieving those results because of one lucky hyperparameter setting: it is consistently at the top. You can see it is not completely flat, because in the end we are searching over three orders of magnitude in parameter settings, but we can see that the algorithm is actually quite stable.
- 30:22
- Now, for our main goal: on the x-axis we have the wall-clock time, and on the y-axis we have the normalized score. The red line you see there is A3C, and you can see that IMPALA not only achieves much better scores, it achieves them much, much faster. The other thing is the comparison between the green and the orange curves: that is training IMPALA in an expert setting versus a multi-task setting, and we see that the multi-task agent achieves better scores, and faster, which again suggests that we are actually seeing positive transfer. It is a like-for-like setting: all the details of the network and the agent are the same; in one case you train one network per task, and in the other case you train the same network on all the tasks, and you get a better result because of the positive transfer between those tasks. And if you give IMPALA more resources, you end up with this almost vertical takeoff, and you can actually solve this challenging thirty-task domain in under 24 hours, given the resources. That is the kind of algorithmic power we want, to be able to train these very highly scalable agents.
- 31:38
- Now, why do we want to do that? That is the point I want to come to next, in this final part: the new SPIRAL algorithm. Just quickly going back to the original ideas I talked about: unsupervised learning is also about explaining environments and generating samples, and maybe about generating samples by explaining environments. We talked about the fact that with deep learning models like WaveNet we can generate amazing samples, but maybe there is a different, less implicit way to do these things, in the sense that when we generate samples, they come with some explanation, and that explanation can go through using some tools. In this particular case, we are going to use a painting tool and learn to control it. It is a real drawing program, and we are going to generate a program that the painting tool will use to generate the image. The main idea I want to convey is that by learning how to use tools that are already available, we can start thinking about different kinds of generalization, as I will try to demonstrate.
- 32:50
- In the real world, there are many examples of programs, their executions, and the results of those programs: they can be arithmetic programs, plotting programs, or even architectural blueprints. Because we have information about that generation process, when we see the results we can go and try to infer what was the program, what was the blueprint, that generated that particular input. We can do this, and the goal is to be able to do this with our agents too.
- 33:22
- Specifically, we are going to use an environment called libmypaint. It is actually a professional-grade open-source drawing library, used worldwide by many artists. We are using a limited interface, basically learning to draw brushstrokes, and we are going to have an agent that does that.
- The agent, in the end called SPIRAL, has three main components. First is the agent that generates the brushstrokes; I like to see that as writing the program. The second is the environment, libmypaint: the brushstroke commands come in, and the environment turns them into brushstrokes on the canvas. That canvas then goes into a discriminator, which is trained like a GAN discriminator: it looks at the generated image, asks whether it looks like a real drawing, and gives a score. As opposed to the usual GAN training, rather than propagating the gradients back, we take that score and train our agent with it as the reward.
- 34:20
- So when you think about these three components coming together, you have an unsupervised learning model similar to GANs, but rather than generating in pixel space, we generate in this program space, and the training is done through a reward that the agent itself also learns. We are trusting another neural network, just like in the GAN setup, to guide learning, but not through its gradients, only through its score function. In my opinion, in certain cases this makes the setup very capable of using different kinds of tools. As I said, the reinforcement learning part of this agent is exactly the same as IMPALA.
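In toy form, that loop of agent, renderer, and discriminator-as-reward might look like the following sketch. All three components are hypothetical stand-ins: a Gaussian "stroke" policy, a tanh in place of the libmypaint renderer, and a squared-error score in place of a learned discriminator.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_strokes(theta):
    return theta + rng.normal(scale=0.3, size=theta.shape)  # "stroke" policy

def render(strokes):
    # non-differentiable from the agent's point of view: we only execute it
    return np.tanh(strokes)

def discriminator_score(canvas, target):
    return -np.mean((canvas - target) ** 2)   # higher = looks more "real"

theta = np.zeros(4)                           # policy parameters
target = np.tanh(np.array([0.5, -0.5, 1.0, 0.0]))
baseline = discriminator_score(render(theta), target)
for _ in range(2000):
    strokes = sample_strokes(theta)
    reward = discriminator_score(render(strokes), target)
    # score-function (REINFORCE-style) update: the reward is a plain number,
    # no gradient ever flows back through render() or the discriminator
    theta += 0.05 * (reward - baseline) * (strokes - theta)
    baseline += 0.01 * (reward - baseline)    # running reward baseline
```

The same shape holds in SPIRAL, with the Gaussian policy replaced by the IMPALA agent, the tanh by libmypaint, and the squared error by a learned GAN discriminator.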
- 34:57
- So, now that we have an agent that can solve really challenging reinforcement learning setups, we take it and put it into this environment, augmented with the ability to learn a discriminator function to provide the reward. To emphasize again, the important thing here is that yes, we have an agent, but there is no environment that says "this is the reward the agent should get." The reward generation is also inside the agent, thanks again to the unsupervised learning models being studied here; specifically, we use a GAN setup there.
- 35:31
- So, can we generate? The first thing we try, of course, when doing unsupervised learning from scratch, is to go back to MNIST. Starting from MNIST, initially of course it generates various scribble-like things, but through training it becomes better and better. In the middle you see that the agent has learned (these are completely unconditional samples) to create the strokes that generate these digits. To emphasize: this agent has never seen strokes coming from real people, how we draw digits. It learned by experimenting with these strokes, and it built its own policy to create the strokes that would generate these images.
- 36:16
- Of course, you can also train the whole setup as a conditional generation process, to recreate a given image. I think the main thing about this is that it is learning, in an unsupervised way, to draw the strokes; I see the libmypaint environment as giving us a grounded bottleneck for creating a meaningful representation space.
- The next thing we tried was Omniglot, and again you see the same things: it can generate unconditional, meaningful, Omniglot-looking samples, or it can recreate Omniglot samples. But then, generalization: here what we tried was to train the model on Omniglot and then ask it to generate MNIST digits. That is what you see in the middle row: can it draw MNIST digits? It has never seen MNIST digits before, but we all know that Omniglot is more general than MNIST, and it can do it: given an MNIST digit, it can actually draw it, even though the network itself has never seen any MNIST digits during its training.
- 37:17
- Then we tried smileys, simple line drawings: given a smiley, it can draw smileys too. That is great, so can we do more? We took this cartoon drawing and chopped it up into 64-by-64 pieces; it is a general line drawing, and again this is the agent trained using Omniglot. You can see that it can actually recreate that drawing. Certain areas, like around the eyes, the insides, are really complicated, but in general you can see that it is capable of generating those drawings. This gives you the idea that you can train on one domain and generalize to new ones.
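The chopping step just mentioned can be sketched as follows (a hypothetical illustration, not the actual pipeline): split a large line drawing into 64x64 patches, let a model trained on small canvases recreate each patch, and stitch the patches back together.

```python
import numpy as np

# Hypothetical sketch: tile a large drawing into 64x64 patches and
# reassemble them; the round trip is lossless.
def tile(image, patch=64):
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0
    return (image.reshape(h // patch, patch, w // patch, patch)
                 .transpose(0, 2, 1, 3)      # group the two patch axes
                 .reshape(-1, patch, patch))

def untile(patches, h, w, patch=64):
    return (patches.reshape(h // patch, w // patch, patch, patch)
                   .transpose(0, 2, 1, 3)
                   .reshape(h, w))

drawing = np.arange(128 * 192, dtype=np.float32).reshape(128, 192)
patches = tile(drawing)                      # 2 x 3 = 6 patches of 64x64
restored = untile(patches, 128, 192)         # exact round trip
```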
- 38:01
- So can I push it further? The next thing we tried: the advantage of using a tool is that you have a meaningful representation space, and we can hopefully transfer that representation space into a new environment. Here, what we do is take the same agent, trained using Omniglot, and transfer it from that simulated environment into the real world. The way we did that is we took that same program, and our friends in the robotics group at DeepMind wrote a controller for a robotic arm to take that program and draw it. This whole experiment happened in under a week, really, and what we ended up with was the same agent; it is not fine-tuned for the new setup or anything. The same agent generates its brushstroke programs, and that program goes into a controller that can be executed by a real robotic arm.
- The reason we can do this is that the environment we used is a real tool; we did not invent that environment. The latent space, if you will, is not some arbitrary latent space that we created: it is a latent space defined by us, a meaningful tool space, and the reason we create such tools is to solve many different problems anyway. This is an example of that: using that tool space gives us the ability to transfer the agent's capability.
- 39:32
- So with that, I want to conclude. I tried to give a way of thinking about generative models and unsupervised learning, and of course I am a hundred percent sure everyone agrees that our aim is not just to look at images; our aim is to do much more than that. I tried to give two different aspects. One is that the kind of generative models we can build right now can solve real-world problems, as we have seen with WaveNet. The other is that we can think about a different kind of setup, where we have agents training and generating interpretable programs. That is an important aspect; we have seen that conversation come up through several of the talks here. Being able to generate interpretable programs is one of the bottlenecks we face right now, because there are many critical applications that we want to solve and many tools that we can utilize, and this is one step towards that, the way I see it. Being able to do this requires us to create these very capable reinforcement learning agents that rely on new algorithms we still need to work on. With that, thank you very much; I want to thank all my collaborators for their help on this. Thank you very much.
- 40:50
- [Applause]
- 40:50
- [Music]
- 40:57
- [Applause]
- 41:06
- we have time for maybe one or two
- 41:09
- questions
- 41:24
- Okay, so I have one question: how do you think about scaling to more general domains, beyond simple strokes? How do you generate, say, realistic scenes?
- Right, so one thing that I haven't shown here: yes, creating realistic scenes is one case. One thing I haven't talked about, which is actually part of this work and is in the paper, is something the team did. By the way, I have to mention that this was worked on mostly by Yaroslav Ganin; he is actually a PhD student at MILA, and he spent his summer with us doing his internship, so it is an amazing job to have done during an internship, and big congratulations to him. One thing we did was to try to generate images: we took the CelebA dataset and used the same drawing program to draw those, and in that case our setup does scale. The same setup scales because it is a general drawing tool, and you can control the color, but it requires a little bit more; it was one of the last experiments that we did, so it is sort of in the works.
- 42:42
- Thanks for a great talk. I had a question about the IMPALA results: you had a slide with a curve comparing all workers learning versus having one centralized learner, and the all-workers version actually does better than the centralized learner. I found that surprising, but it is great to see the positive transfer between tasks. Have you tried this on other suites of tasks? Do you think it is just because the tasks in this suite are very similar to each other?
- It definitely depends on that, but the reason we created those tasks is exactly that. In the real world, the visual structure of our world is unified, and the kind of setup we have in the DeepMind Lab task suite is a unified visual environment: you have one kind of agent with a unified action space, and you can focus on solving different kinds of tasks. That is what we were testing: given all of this, is it actually possible to get the multi-task positive transfer that we see in supervised learning cases? And we were able to see that in reinforcement learning.
- 43:57
- that in reinforcement learning yeah
- 44:01
- hello this is exciting I have a question
- 44:06
- about extending this to maybe more open
- 44:09
- domains so what is the challenge it's a
- 44:13
- challenge to be a number of actions to
- 44:16
- pick because the number of strokes maybe
- 44:19
- the strokes face smaller so what other
- 44:22
- challenge to extend to open domains with
- 44:27
- what do you like what do you have in
- 44:29
- mind is open domains like number of
- 44:31
- actions is definitely a challenge right
- 44:32
- it is definitely one of the big
- 44:34
- challenges that a lot of research in as
- 44:36
- far as I know in RL goes into that but
- 44:39
- that is that is I think only one of the
- 44:41
- main challenges the other challenge of
- 44:42
- course is the straight representation
- 44:45
- that is mainly why we sort of used deep
- 44:48
- learning right because we expect that
- 44:51
- with deep learning we are going to be
- 44:52
- able to learn better representations and
- 44:54
- that still remains as a challenge
- 44:56
- because being able to learn
- 44:57
- representations is not an architectural
- 44:59
- problem only it is also about finding
- 45:03
- the right sort of training set up and
- 45:05
- spyro was an example of that where we
- 45:07
- can get that reward function that that
- 45:08
- reward signal in an unsupervised way
- 45:11
- right and in many different domains
- 45:13
- like there are many different ways we
- 45:15
- can do this but actually finding those
- 45:16
- solutions also part of that
- 45:20
- Okay, so let's thank Koray again.
- 45:24
- [Music]
- 45:27
- [Applause]