- Announcement:
- - Intro to diffusion models for text-to-image and image editing
- - DreamFusion — the power of pretrained diffusion models for 3D synthesis, and some following works
- - InstructNeRF2NeRF — prompt-based editing of 3D models
- - APAP — dragging manipulations with diffusion models as priors
- Plan:
- - About
- - I'll go over a few things without dwelling too long on any single one, just hopping between cool ideas.
- - This is tough, but we have a whiteboard for explanations, so we can detour at any point
- - Context
- - reminder: ASK PEOPLE WHETHER THEY KNOW THIS STUFF
- - Everyone knows about DALL-E / Midjourney / Stable Diffusion
- - There's stuff like ComfyUI and other community efforts for making Stable Diffusion smarter
- - Like taking in human pose, or normals, or depth
- - There have also been attempts at generating multi-view images of the same object with Stable Diffusion, for gaming assets
- - Well, why not use the power of diffusion models for actually creating 3D objects?
- - Intro to diffusion
- - REMINDER TO ASK WHETHER ANYONE KNOWS ABOUT DIFFUSION OR NEURAL NETWORKS
- - The base diagram – forward process and reverse process
- - Funny formula slides
- - Btw, all this math is bullshit, researchers intentionally complicate this to make their papers look smarter
- - Here's the formula (with all the notation, still looks tough)
- - Let's simplify step-by-step
- - Take an image
- - Add noise
- - Ask the model to predict the noise
- - Loss is MSE(true_noise, predicted_noise)
- - Now let's put the notation back in and expand add_noise/reduce_noise, and we get exactly that formula (written out below)
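For reference, the simplified DDPM objective those steps build up to (standard notation from the Ho et al. paper: x_0 is the image, epsilon the sampled noise, epsilon_theta the network, alpha-bar_t the cumulative noise schedule):

```latex
L_{\text{simple}} = \mathbb{E}_{x_0,\; \epsilon \sim \mathcal{N}(0, I),\; t}
  \left[ \left\| \epsilon - \epsilon_\theta\!\left( \sqrt{\bar\alpha_t}\, x_0
  + \sqrt{1 - \bar\alpha_t}\, \epsilon,\; t \right) \right\|^2 \right]
```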
- - During sampling, to spread the work over multiple steps, we start from pure noise, remove only part of the predicted noise, add a bit of fresh noise back, and repeat (sketched below)
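A toy PyTorch sketch of both pieces, assuming a network `model(noisy, t)` that predicts the noise and precomputed `alpha`/`alpha_bar` schedule tensors; all names here are illustrative, not any particular library's API:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, image, alpha_bar):
    # Take an image, add noise at a random timestep t,
    # ask the model to predict the noise;
    # loss is MSE(true_noise, predicted_noise)
    t = torch.randint(0, len(alpha_bar), (image.shape[0],))
    noise = torch.randn_like(image)
    ab = alpha_bar[t].view(-1, 1, 1, 1)
    noisy = ab.sqrt() * image + (1 - ab).sqrt() * noise
    loss = F.mse_loss(model(noisy, t), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def sample(model, alpha, alpha_bar, shape):
    # Start from pure noise and walk backwards through the steps,
    # removing the predicted noise only partially and re-injecting
    # a bit of fresh noise at every step except the last
    x = torch.randn(shape)
    for t in reversed(range(len(alpha_bar))):
        tt = torch.full((shape[0],), t)
        eps = model(x, tt)
        x = (x - (1 - alpha[t]) / (1 - alpha_bar[t]).sqrt() * eps) / alpha[t].sqrt()
        if t > 0:
            x = x + (1 - alpha[t]).sqrt() * torch.randn_like(x)
    return x
```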
- - For editing
- - Basic idea from SDEdit — just add noise to the image and denoise with another prompt (sketched after this block)
- - InstructPix2Pix — fine-tune the model to also take the source image as input, so the edit preserves its geometry
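A minimal sketch of the SDEdit idea, reusing the toy sampler's conventions from above; prompt conditioning is assumed to be baked into `model` to keep it short:

```python
import torch

@torch.no_grad()
def sdedit(model, alpha, alpha_bar, image, t0):
    # SDEdit: noise the source image only up to an intermediate step t0
    # (so its coarse structure survives), then denoise as usual while
    # `model` is conditioned on the *new* prompt
    x = alpha_bar[t0].sqrt() * image + (1 - alpha_bar[t0]).sqrt() * torch.randn_like(image)
    for t in reversed(range(t0 + 1)):
        tt = torch.full((image.shape[0],), t)
        eps = model(x, tt)
        x = (x - (1 - alpha[t]) / (1 - alpha_bar[t]).sqrt() * eps) / alpha[t].sqrt()
        if t > 0:
            x = x + (1 - alpha[t]).sqrt() * torch.randn_like(x)
    return x
```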
- - For 3D generation
- - REMINDER TO ASK ABOUT WHETHER ANYONE KNOWS ABOUT 3D
- - Once again NeRFs — WE TRAIN THEM ONCE PER SCENE
- - Usually we have photos, train the NeRF on them, and get the 3D object that way
- - Now what if we take the rendered image from some viewpoint, ask the diffusion model to "make it more like prompt T", and use that for training?
- - That is the idea behind DreamFusion (show images; loss sketched after this block)
- - Follow-up works do the same but fine-tune the diffusion model on multi-view renders of Objaverse (MVDream)
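A rough sketch of DreamFusion's Score Distillation Sampling step, assuming a differentiable `nerf.render(camera)` and a text-conditioned noise predictor `diffusion(noisy, t, text_emb)`; both are hypothetical names standing in for the real components:

```python
import torch

def sds_step(diffusion, nerf, optimizer, camera, text_emb, alpha_bar):
    # Render the current NeRF from a random viewpoint, noise the render,
    # and use the diffusion model's noise-prediction error as a gradient
    # pushed back through the differentiable renderer
    image = nerf.render(camera)
    t = torch.randint(20, 980, (1,))
    noise = torch.randn_like(image)
    ab = alpha_bar[t]
    noisy = ab.sqrt() * image + (1 - ab).sqrt() * noise
    with torch.no_grad():  # SDS skips the diffusion model's own Jacobian
        eps_pred = diffusion(noisy, t, text_emb)
    grad = eps_pred - noise  # optionally scaled by a weight w(t)
    # Surrogate loss: its gradient with respect to `image` is exactly `grad`
    loss = (grad * image).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```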
- - For 3D editing
- - Apply the same idea, but starting from a NeRF we already have, and use InstructPix2Pix so the edits preserve the geometry of the object
- - This gets us InstructNeRF2NeRF (update loop sketched below)
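A simplified sketch of InstructNeRF2NeRF's iterative dataset update; `nerf.render`, `ip2p_edit`, and the dataset layout are hypothetical placeholders:

```python
import random
import torch
import torch.nn.functional as F

def in2n_round(nerf, optimizer, cameras, images, ip2p_edit, instruction):
    # Periodically replace one training view with an InstructPix2Pix edit
    # of the current render (conditioned on the original photo, so the
    # geometry is preserved), then keep fitting the NeRF to the dataset
    i = random.randrange(len(cameras))
    with torch.no_grad():
        render = nerf.render(cameras[i])
    images[i] = ip2p_edit(render, original=images[i], prompt=instruction)
    for cam, img in zip(cameras, images):
        loss = F.mse_loss(nerf.render(cam), img)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```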
- - For 3D editing with dragging
- - For simplicity assume we're working with a mesh
- - On it we can define a differential-geometry method that minimizes distortion of local neighborhoods after vertices are moved around, resulting in a nice deformation (ARAP, As-Rigid-As-Possible; energy below)
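For the whiteboard, the ARAP energy being minimized (standard form from Sorkine and Alexa; v_i are rest positions, v'_i deformed ones, R_i a per-vertex rotation, w_ij cotangent weights):

```latex
E(V') = \sum_i \sum_{j \in \mathcal{N}(i)} w_{ij}
  \left\| (v'_i - v'_j) - R_i (v_i - v_j) \right\|^2
```

It is minimized by alternating between fitting the best rotations R_i (local step) and solving a linear system for the positions (global step).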
- - Now let's add a diffusion model on top of that as a prior; this becomes APAP (sketched below)
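And a very loose sketch of the APAP-style combination, just to show where the diffusion prior plugs in; `arap_energy`, `render`, and `sds_grad` are hypothetical helpers, with `sds_grad` producing the SDS-style gradient as in the DreamFusion sketch above:

```python
import torch

def apap_step(optimizer, verts, rest_verts, faces, handle_idx, handle_pos,
              render, arap_energy, sds_grad):
    # `verts` is the optimized tensor (requires_grad=True, held by the optimizer).
    # Satisfy the dragged handles with an ARAP-style deformation energy,
    # while an SDS-style gradient from a 2D diffusion prior on the render
    # keeps the deformed shape looking plausible
    e_arap = arap_energy(verts, rest_verts, faces)
    e_handle = ((verts[handle_idx] - handle_pos) ** 2).sum()
    image = render(verts, faces)
    e_prior = (sds_grad(image) * image).sum()  # same surrogate trick as in SDS
    loss = e_arap + e_handle + e_prior
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```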