  1. Here is the transcript of "Andrej Karpathy: Software Is Changing (Again)"
  2.  
  3. 0:01
  4. please welcome former Director of AI at Tesla Andrej Karpathy
  5. 0:07
  6. [Music] hello
  7. 0:14
  8. [Music] wow a lot of people here hello
  9. 0:22
  10. um okay yeah so I'm excited to be here today to talk to you about software in the era of AI and I'm told that many of
  11. 0:30
  12. you are students like bachelors masters PhD and so on and you're about to enter the industry and I think it's actually
  13. 0:36
  14. like an extremely unique and very interesting time to enter the industry right now and I think fundamentally the
  15. 0:42
  16. reason for that is that um software is changing uh again and I say again
  17. 0:48
  18. because I actually gave this talk already um but the problem is that software keeps changing so I actually
  19. 0:54
  20. have a lot of material to create new talks and I think it's changing quite fundamentally I think roughly speaking
  21. 0:59
  22. software has not changed much on such a fundamental level for 70 years and then it's changed I think about twice quite
  23. 1:06
  24. rapidly in the last few years and so there's just a huge amount of work to do a huge amount of software to write and rewrite so let's take a look at maybe
  25. 1:13
  26. the realm of software so if we kind of think of this as like the map of software this is a really cool tool called map of GitHub um this is kind of
  27. 1:21
  28. like all the software that's written uh these are instructions to the computer for carrying out tasks in the digital
  29. Software evolution: From 1.0 to 3.0
  30. 1:26
  31. space so if you zoom in here these are all different kinds of repositories and this is all the code that has been written and a few years ago I kind of
  32. 1:33
  33. observed that um software was kind of changing and there was kind of like a new type of software around and I called
  34. 1:39
  35. this software 2.0 at the time and the idea here was that software 1.0 is the
  36. 1:44
  37. code you write for the computer software 2.0 you know are basically neural networks and in particular the weights of a
  38. 1:50
  39. neural network and you're not writing this code directly you are
  40. 1:55
  41. more kind of like tuning the data sets and then you're running an optimizer to create the parameters of this neural net and I think like at the time
  42. 2:02
  43. neural nets were kind of seen as like just a different kind of classifier like a decision tree or something like that and so I think it was kind of like um I
  44. 2:09
  45. think this framing was a lot more appropriate and now actually what we have is kind of like an equivalent of GitHub in the realm of software 2.0 and
  46. 2:15
  47. I think Hugging Face is basically the equivalent of GitHub in software 2.0 and
  48. 2:20
  49. there's also Model Atlas and you can visualize all the code written there in case you're curious by the way the giant circle the point in the middle uh these
  50. 2:28
  51. are the parameters of Flux the image generator and so anytime someone fine-tunes on top of a Flux model you basically
  52. 2:35
  53. create a git commit uh in this space and uh you create a different kind of image generator so basically what we
  54. 2:41
  55. have is software 1.0 is the computer code that programs a computer software 2.0 are the weights which program neural
  56. 2:48
  57. networks uh and here's an example of AlexNet an image recognizer neural network now so far all of the neural networks
  58. 2:55
  59. that we've been familiar with until recently were kind of like fixed function computers image to categories
  60. 3:01
  61. or something like that and I think what's changed and I think is a quite fundamental change is that neural
  62. 3:06
  63. networks became programmable with large language models and so I I see this as
  64. 3:12
  65. quite new and unique it's a new kind of a computer and uh so in my mind it's uh
  66. 3:18
  67. worth giving it a new designation of software 3.0 and basically your prompts are now programs that program the LLM
  68. 3:25
  69. and uh remarkably uh these uh prompts are written in English so it's kind of a very interesting programming language
  70. 3:33
  71. um so maybe uh to summarize the difference if you're doing sentiment classification for example you can
  72. 3:39
  73. imagine writing some uh amount of Python to to basically do sentiment classification or you can train a neural
  74. 3:46
  75. net or you can prompt a large language model uh so here this is a few-shot prompt and you can imagine changing it
  76. 3:51
  77. and programming the computer in a slightly different way
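A minimal sketch of the three paradigms on the sentiment example (the tiny dataset and the scikit-learn choice for 2.0 are illustrative assumptions, not from the talk):

```python
# Software 1.0: explicit rules, written by hand in Python.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"terrible", "hate", "awful"}

def sentiment_1_0(text: str) -> str:
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score >= 0 else "negative"

# Software 2.0: tune a dataset and run an optimizer to fit weights.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["I love this movie", "this was terrible", "what a great film", "I hate it"]
labels = ["positive", "negative", "positive", "negative"]
vectorizer = CountVectorizer()
classifier = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

def sentiment_2_0(text: str) -> str:
    return classifier.predict(vectorizer.transform([text]))[0]

# Software 3.0: program the LLM in English with a few-shot prompt,
# then send this string to whichever model you are using.
FEW_SHOT_PROMPT = """Classify the sentiment of the review as positive or negative.
Review: "I love this movie" -> positive
Review: "this was a waste of time" -> negative
Review: "{review}" ->"""
```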
  78. 3:57
  79. so basically we have software 1.0 and software 2.0 and I think we're seeing maybe you've seen a lot of GitHub code is not just like code anymore there's a bunch of like English
  80. 4:03
  81. interspersed with code and so I think kind of there's a growing category of new kind of code so not only is it a new
  82. 4:09
  83. programming paradigm it's also remarkable to me that it's in our native language of English and so when this
  84. 4:14
  85. blew my mind a few uh I guess years ago now I tweeted this and um I think it
  86. 4:20
  87. captured the attention of a lot of people and this is my currently pinned tweet uh is that remarkably we're now programming computers in English now
  88. 4:28
  89. when I was at uh Tesla um we were working on the uh autopilot and uh we
  90. 4:34
  91. were trying to get the car to drive and I sort of showed this slide at the time where you can imagine that the inputs to
  92. Programming in English: Rise of Software 3.0
  93. 4:41
  94. the car are on the bottom and they're going through a software stack to produce the steering and acceleration
  95. 4:47
  96. and I made the observation at the time that there was a ton of C++ code around in the autopilot which was the software
  97. 4:52
  98. 1.0 code and then there was some neural nets in there doing image recognition and uh I kind of observed that over time
  99. 4:58
  100. as we made the autopilot better basically the neural network grew in capability and size and in addition to
  101. 5:05
  102. that all the C++ code was being deleted and a lot of the
  103. 5:12
  104. kind of capabilities and functionality that was originally written in 1.0 was migrated to 2.0 so as an example a lot
  105. 5:19
  106. of the stitching up of information across images from the different cameras and across time was done by a neural
  107. 5:24
  108. network and we were able to delete a lot of code and so the software 2.0 stack quite literally ate through the software
  109. 5:32
  110. stack of the autopilot so I thought this was really remarkable at the time and I think we're seeing the same thing again
  111. 5:37
  112. where uh basically we have a new kind of software and it's eating through the stack we have three completely different
  113. 5:42
  114. programming paradigms and I think if you're entering the industry it's a very good idea to be fluent in all of them
  115. 5:48
  116. because they all have slight pros and cons and you may want to program some functionality in 1.0 or 2.0 or 3.0 are
  117. 5:53
  118. you going to train a neural net are you going to just prompt an LLM should this be a piece of code that's explicit etc so we all have to make these decisions
  119. 6:00
  120. and actually potentially uh fluidly transition between these paradigms
  121. 6:05
  122. so what I wanted to get into now is first I want to in the first part talk
  123. LLMs as utilities, fabs, and operating systems
  124. 6:10
  125. about LLMs and how to kind of like think of this new paradigm and the ecosystem and what that looks like uh
  126. 6:16
  127. what is this new computer what does it look like and what does the ecosystem look like um I was struck by this quote
  128. 6:23
  129. from Andrew Ng actually uh many years ago now I think and I think Andrew is going to be speaking right after me uh but he
  130. 6:29
  131. said at the time AI is the new electricity and I do think that it um kind of captures something very
  132. 6:34
  133. interesting in that LLMs certainly feel like they have properties of utilities right now so
  134. 6:41
  135. um LLM labs like OpenAI Gemini Anthropic etc they spend capex to train the LLMs
  136. 6:47
  137. and this is kind of equivalent to building out a grid and then there's opex to serve that intelligence over
  138. 6:53
  139. APIs to all of us and this is done through metered access where we pay per
  140. 6:58
  141. million tokens or something like that and we have a lot of demands that are very utility-like demands out of this
  142. 7:03
  143. API we demand low latency high uptime consistent quality etc in electricity
  144. 7:08
  145. you would have a transfer switch so you can transfer your electricity source from like grid and solar or battery or
  146. 7:14
  147. generator in LLMs we have maybe OpenRouter to easily switch between the different types of LLMs that exist
  148. 7:20
  149. because LLMs are software they don't compete for physical space so it's okay to have basically like six electricity
  150. 7:26
  151. providers and you can switch between them right because they don't compete in such a direct way
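A sketch of that transfer switch, assuming OpenRouter's OpenAI-compatible endpoint; the model slugs below are illustrative, not from the talk:

```python
from openai import OpenAI

# One client, many "electricity providers": OpenRouter exposes an
# OpenAI-compatible API, so switching LLMs is a one-string change.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # assumption: your own key goes here
)

for model in ["openai/gpt-4o", "anthropic/claude-3.5-sonnet", "google/gemini-flash-1.5"]:
    reply = client.chat.completions.create(
        model=model,  # the "transfer switch"
        messages=[{"role": "user", "content": "How do I boil an egg?"}],
    )
    print(model, "->", reply.choices[0].message.content[:80])
```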
  152. 7:31
  153. and I think what's also a little fascinating and we saw this in the last few days actually is a lot of the LLMs went down and people were
  154. 7:38
  155. kind of like stuck and unable to work and uh I think it's kind of fascinating to me that when the state-of-the-art LLMs go down it's actually kind of like
  156. 7:45
  157. an intelligence brownout in the world it's kind of like when the voltage is unreliable in the grid and uh the planet
  158. 7:52
  159. just gets dumber the more reliance we have on these models which already is like really dramatic and I think will
  160. 7:58
  161. continue to grow but LLMs don't only have properties of utilities I think it's also fair to say that they have
  162. 8:03
  163. some properties of fabs and the reason for this is that the capex required for
  164. 8:09
  165. building LLMs is actually quite large uh it's not just like building some uh power station or something like that
  166. 8:15
  167. right you're investing a huge amount of money and I think the tech tree uh for the technology is growing quite
  168. 8:22
  169. rapidly so we're in a world where we have sort of deep tech trees research and development secrets that are
  170. 8:28
  171. centralizing inside the LLM labs um and but I think the analogy muddies a little
  172. 8:34
  173. bit also because as I mentioned this is software and software is a bit less defensible because it is so malleable
  174. 8:40
  175. and so um I think it's just an interesting kind of thing to think about potentially there's many
  176. 8:46
  177. analogies you can make like a 4 nanometer process node maybe is something like a cluster with certain max flops you can think about when
  178. 8:53
  179. you're using Nvidia GPUs and you're only doing the software and you're not doing the hardware that's kind of like the fabless model but if
  180. 8:59
  181. you're actually also building your own hardware and you're training on TPUs if you're Google that's kind of like the Intel model where you own your fab so I
  182. 9:05
  183. think there's some analogies here that make sense but actually I think the analogy that makes the most sense perhaps is that in my mind LLMs have very
  184. 9:12
  185. strong kind of analogies to operating systems uh in that this is not just
  186. 9:17
  187. electricity or water it's not something that comes out of the tap as a commodity uh these are now increasingly
  188. 9:23
  189. complex software ecosystems right so uh they're not just like simple commodities
  190. 9:29
  191. like electricity and it's kind of interesting to me that the ecosystem is shaping in a very similar kind of way
  192. 9:34
  193. where you have a few closed source providers like Windows or Mac OS and then you have an open source alternative
  194. 9:40
  195. like Linux and I think for LLMs as well we have a kind of a few
  196. 9:46
  197. competing closed source providers and then maybe the Llama ecosystem is currently like maybe a close
  198. 9:51
  199. approximation to something that may grow into something like Linux again I think it's still very early because these are
  200. 9:57
  201. just simple LLMs but we're starting to see that these are going to get a lot more complicated it's not just about the LLM itself it's about all the tool use
  202. 10:03
  203. and the multimodalities and how all of that works and so when I sort of had this realization a while back I tried to
  204. 10:09
  205. sketch it out and it kind of seemed to me like LLMs are kind of like a new operating system right so the LLM is a
  206. 10:15
  207. new kind of a computer it's kind of like the CPU equivalent uh the context windows are kind of like the
  208. 10:21
  209. memory and then the LLM is orchestrating memory and compute uh for problem
  210. 10:26
  211. solving um using all of these uh capabilities here and so definitely if
  212. 10:32
  213. you look at it it looks very much like an operating system from that perspective um a few more analogies for example if
  214. 10:39
  215. you want to download an app say I go to VS Code and I go to download you can download VS Code and you can run it on
  216. 10:46
  217. Windows Linux or Mac in the same way as you can take an LLM app like Cursor
  218. 10:53
  219. and you can run it on GPT or Claude or the Gemini series right it's just a dropdown so it's kind of like similar in
  220. 10:59
  221. that way as well uh more analogies that I think strike me is that we're kind of like in this
  222. The new LLM OS and historical computing analogies
  223. 11:04
  224. 1960sish era where LLM compute is still very expensive for this new kind of a
  225. 11:10
  226. computer and that forces the LLMs to be centralized in the cloud and we're all
  227. 11:15
  228. just uh sort of thin clients that interact with it over the network and none of us have full utilization of
  229. 11:22
  230. these computers and therefore it makes sense to use time sharing where we're all just you know a dimension of the
  231. 11:28
  232. batch when they're running the computer in the cloud and this is very much what computers used to look like during
  233. 11:33
  234. this time the operating systems were in the cloud everything was streamed around and there was batching and so the
  235. 11:40
  236. personal computing revolution hasn't happened yet because it's just not economical it doesn't make sense but I think some people are trying and it
  237. 11:46
  238. turns out that Mac minis for example are a very good fit for some of the LLMs because if you're doing batch
  239. 11:52
  240. one inference this is all super memory bound so this actually works and uh I think these are some early
  241. 11:58
  242. indications maybe of personal computing uh but this hasn't really happened yet it's not clear what this looks like maybe some of you get to invent what
  243. 12:05
  244. what this is or how it works or uh what this should be maybe
  245. 12:10
  246. one more analogy that I'll mention is whenever I talk to ChatGPT or some LLM directly in text I feel like I'm talking
  247. 12:16
  248. to an operating system through the terminal like it's just text it's direct access to the operating
  249. 12:23
  250. system and I think a GUI hasn't yet really been invented in like a general way like should ChatGPT have a GUI
  251. 12:30
  252. different than just text bubbles uh certainly some of the apps that we're going to go into in a bit have GUIs but
  253. 12:36
  254. there's no like GUI across all the tasks if that makes sense um there are
  255. 12:42
  256. some ways in which LLMs are different from kind of operating systems in some fairly unique way and from early
  257. 12:48
  258. computing and I wrote about uh this one particular property that strikes me as
  259. 12:54
  260. very different uh this time around it's that LLMs like flip they flip the
  261. 12:59
  262. direction of technology diffusion uh that is usually uh present in technology
  263. 13:05
  264. so for example with electricity cryptography computing flight internet GPS lots of new transformative technologies that have not been around
  265. 13:11
  266. typically it is the government and corporations that are the first users because it's new and expensive etc and
  267. 13:18
  268. it only later diffuses to consumer uh but I feel like LLMs are kind of like flipped around so maybe with early
  269. 13:23
  270. computers it was all about ballistics and military use but with LLMs it's all
  271. 13:28
  272. about how do you boil an egg or something like that this is certainly like a lot of my use and so it's really fascinating to me that we have a new
  273. 13:34
  274. magical computer and it's like helping me boil an egg it's not helping the government do something really crazy
  275. 13:39
  276. like some military ballistics or some special technology indeed corporations and governments are lagging behind
  277. 13:45
  278. all of us in the adoption of these technologies so it's just backwards and I think it informs maybe some of the
  279. 13:50
  280. uses of how we want to use this technology or like where are some of the first apps and so on
  281. 13:56
  282. so in summary so far LLM labs fab LLMs I think it's accurate language to use but
  283. 14:03
  284. LLMs are complicated operating systems they're circa 1960s in computing and we're redoing computing all over again
  285. 14:10
  286. and they're currently available via time sharing and distributed like a utility what is new and unprecedented is that
  287. 14:16
  288. they're not in the hands of a few governments and corporations they're in the hands of all of us because we all have a computer and it's all just
  289. 14:21
  290. software and ChatGPT was beamed down to our computers to like billions of people
  291. 14:26
  292. like instantly and overnight and this is insane uh and it's kind of insane to me that this is the case and now it is our
  293. 14:33
  294. time to enter the industry and program these computers this is crazy so I think this is quite remarkable before we
  295. Psychology of LLMs: People spirits and cognitive quirks
  296. 14:39
  297. program LLMs we have to kind of like spend some time to think about what these things are and I especially like
  298. 14:45
  299. to kind of talk about their psychology so the way I like to think about LLMs is that they're kind of like people spirits
  300. 14:52
  301. um they are stochastic simulations of people um and the simulator in this case happens to be an autoregressive
  302. 14:58
  303. transformer so a transformer is a neural net uh and it just kind of like
  304. 15:04
  305. goes on the level of tokens it goes chunk chunk chunk chunk chunk and there's an almost equal amount of compute for every single chunk um and
  306. 15:13
  307. this simulator of course basically just has some weights involved and we fit it to all of the text that we
  308. 15:19
  309. have on the internet and so on and you end up with this kind of a simulator and because it is trained on humans it's got
  310. 15:25
  311. this emergent psychology that is humanlike so the first thing you'll notice is of course uh LLMs have
  312. 15:31
  313. encyclopedic knowledge and memory uh and they can remember lots of things a lot more than any single individual human
  314. 15:36
  315. can because they read so many things it actually kind of reminds me of this movie Rain Man which I actually
  316. 15:42
  317. really recommend people watch it's an amazing movie I love this movie um and Dustin Hoffman here is an autistic
  318. 15:47
  319. savant who has almost perfect memory so he can read like a phone book and remember all of the names and
  320. 15:54
  321. phone numbers and I kind of feel like LLMs are kind of like very similar they can remember SHA hashes and lots of
  322. 15:59
  323. different kinds of things very very easily so they certainly have superpowers in some respects
  324. 16:05
  325. but they also have a bunch of I would say cognitive deficits so they hallucinate quite a bit um and they kind
  326. 16:12
  327. of make up stuff and don't have a very good uh sort of internal model of self-knowledge not sufficient at least
  328. 16:18
  329. and this has gotten better but not perfect they display jagged intelligence so they're going to be superhuman in
  330. 16:23
  331. some problem-solving domains and then they're going to make mistakes that basically no human will make like you
  332. 16:29
  333. know they will insist that 9.11 is greater than 9.9 or that there are two Rs in strawberry these are some famous
  334. 16:34
  335. examples but basically there are rough edges that you can trip on so that's kind of I think also kind of unique um
  336. 16:41
  337. they also kind of suffer from anterograde amnesia um and I think I'm
  338. 16:47
  339. alluding to the fact that if you have a coworker who joins your organization this coworker will over time learn your
  340. 16:52
  341. organization and uh they will understand and gain like a huge amount of context on the organization and they go home and
  342. 16:58
  343. they sleep and they consolidate knowledge and they develop expertise over time LLMs don't natively do this and this is not something that has
  344. 17:04
  345. really been solved in the R&D of LLMs I think um and so context windows are really kind of like working memory and
  346. 17:10
  347. you have to sort of program the working memory quite directly because they don't just kind of like get smarter by uh by
  348. 17:15
  349. default and I think a lot of people get tripped up by the analogies uh in this way uh in popular culture I recommend
  350. 17:22
  351. people watch these two movies uh Memento and 50 First Dates in both of these movies the protagonists' weights are fixed
  352. 17:28
  353. and their context windows get wiped every single morning and it's really problematic to go to work or have
  354. 17:34
  355. relationships when this happens and this happens to LLMs all the time I guess one more thing I would point to is security-
  356. 17:41
  357. related limitations of the use of LLMs so for example LLMs are quite gullible uh they are susceptible to prompt
  358. 17:47
  359. injection risks they might leak your data etc and so um and there's many other considerations uh security related
  360. 17:54
  361. so basically long story short
  362. 17:59
  363. you have to simultaneously think through this superhuman thing that has a bunch of cognitive deficits and issues how do
  364. 18:05
  365. we and yet they are extremely like useful and so how do we program them and
  366. 18:10
  367. how do we work around their deficits and enjoy their superhuman powers
  368. 18:15
  369. so what I want to switch to now is talk about the opportunities of how do we use these models and what are some of the biggest opportunities this is not a
  370. Designing LLM apps with partial autonomy
  371. 18:22
  372. comprehensive list just some of the things that I thought were interesting for this talk the first thing I'm kind of excited about is what I would call
  373. 18:29
  374. partial autonomy apps so for example let's work with the example of coding you can certainly go to ChatGPT directly
  375. 18:36
  376. and you can start copy pasting code around and copy pasting bug reports and stuff around and getting code and copy
  377. 18:42
  378. pasting everything around why would you do that why would you go directly to the operating system it makes a lot more sense to have an app
  379. 18:48
  380. dedicated for this and so I think many of you uh use Cursor I do as well and
  381. 18:54
  382. uh Cursor is kind of like the thing you want instead you don't want to just directly go to ChatGPT and I think
  383. 18:59
  384. Cursor is a very good example of an early LLM app that has a bunch of properties that I think are um useful
  385. 19:06
  386. across all the LLM apps so in particular you will notice that we have a traditional interface that allows a
  387. 19:12
  388. human to go in and do all the work manually just as before but in addition to that we now have this LLM integration
  389. 19:18
  390. that allows us to go in bigger chunks and so some of the properties of LLM apps that I think are shared and useful
  391. 19:24
  392. to point out number one the LLMs basically do a ton of the context management um number two they
  393. 19:31
  394. orchestrate multiple calls to LLMs right so in the case of Cursor there's under the hood embedding models for all your
  395. 19:37
  396. files the actual chat models, models that apply diffs to the code and this is all
  397. 19:42
  398. orchestrated for you a really big one that uh I think also maybe not fully appreciated always is application
  399. 19:49
  400. specific uh GUI and the importance of it um because you don't just want to talk to the operating system directly in text
  401. 19:55
  402. text is very hard to read interpret understand and also like you don't want to take some of these actions natively
  403. 20:01
  404. in text so it's much better to just see a diff as like red and green change and you can see what's being added or
  405. 20:07
  406. subtracted it's much easier to just do command Y to accept or command N to reject I shouldn't have to type it in
  407. 20:12
  408. text right so a GUI allows a human to audit the work of these fallible systems
  409. 20:17
  410. and to go faster I'm going to come back to this point a little bit uh later as well and the last kind of feature I want
  411. 20:24
  412. to point out is that there's what I call the autonomy slider so for example in Cursor you can just do tab completion
  413. 20:30
  414. you're mostly in charge you can select a chunk of code and command K to change just that chunk of code you can do
  415. 20:36
  416. command L to change the entire file or you can do command I which just you know let it rip do whatever you want in the
  417. 20:42
  418. entire repo and that's the sort of full autonomy agentic version and so you are in charge of the autonomy slider
  419. 20:48
  420. and depending on the complexity of the task at hand you can uh tune the amount of autonomy that you're willing to give
  421. 20:54
  422. up uh for that task maybe to show one more example of a fairly successful LLM app uh Perplexity um it also has very
  423. 21:03
  424. similar features to what I've just pointed out in Cursor uh it packages up a lot of the information it
  425. 21:08
  426. orchestrates multiple LLMs it's got a GUI that allows you to audit some of its work so for example it will cite sources
  427. 21:16
  428. and you can imagine inspecting them and it's got an autonomy slider you can either just do a quick search or you can do research or you can do deep research
  429. 21:22
  430. and come back 10 minutes later so this is all just varying levels of autonomy that you give up to the tool so I guess
  431. 21:28
  432. my question is I feel like a lot of software will become partially autonomous I'm trying to think through
  433. 21:33
  434. like what does that look like and for many of you who maintain products and services how are you going to make your
  435. 21:38
  436. products and services partially autonomous can an LLM see everything that a human can see can an LLM act in
  437. 21:45
  438. all the ways that a human could act and can humans supervise and stay in the loop of this activity because again
  439. 21:50
  440. these are fallible systems that aren't yet perfect and what does a diff look like in Photoshop or something like that
  441. 21:56
  442. you know and also a lot of the traditional software right now it has all these switches and all this kind of stuff that's all designed for human all
  443. 22:03
  444. of this has to change and become accessible to LLMs so one thing I want to stress with a lot
  445. 22:09
  446. of these LLM apps that I'm not sure gets as much attention as it should is that
  447. 22:15
  448. we're now kind of like cooperating with AIs and usually they are doing the generation and we as humans are doing the verification it is in our interest
  449. 22:22
  450. to make this loop go as fast as possible so we're getting a lot of work done there are two major ways that I think uh
  451. 22:29
  452. this can be done number one you can speed up verification a lot um and I think GUIs for example are extremely
  453. 22:34
  454. important to this because a GUI utilizes the computer vision GPU in all
  455. 22:39
  456. of our heads reading text is effortful and it's not fun but looking at stuff is fun and it's just kind of like a
  457. 22:46
  458. highway to your brain so I think GUIs are very useful for auditing systems and visual representations in general and
  459. 22:53
  460. number two I would say is we have to keep the AI on the leash I think a
  461. 22:58
  462. lot of people are getting way overexcited with AI agents and uh it's not useful to me to get a diff of 10,000
  463. 23:04
  464. lines of code to my repo like I'm still the bottleneck right even though those 10,000 lines come out
  465. 23:10
  466. instantly I have to make sure that this thing is not introducing bugs and that it's doing the correct
  467. 23:16
  468. thing right and that there's no security issues and so on so um I think that um
  469. 23:22
  470. yeah basically it's in our interest to make
  471. 23:28
  472. the flow of these two go very very fast and we have to somehow keep the AI on the leash because it gets way too overreactive it's kind of like
  473. 23:35
  474. this is how I feel when I do AI assisted coding if I'm just vibe coding everything is nice and great but if I'm
  475. The importance of human-AI collaboration loops
  476. 23:40
  477. actually trying to get work done it's not so great to have an overreactive uh agent doing all this kind of stuff so
  478. 23:47
  479. this slide is not very good I'm sorry but I guess I'm trying to develop like many of you some ways of utilizing these
  480. 23:53
  481. agents in my coding workflow and to do AI assisted coding and in my own work I'm always scared to get way too big
  482. 23:59
  483. diffs I always go in small incremental chunks I want to make sure that everything is good I want to spin this
  484. 24:06
  485. loop very very fast and um I sort of work on small chunks of a single concrete thing uh and so I think many of you
  486. 24:13
  487. probably are developing similar ways of working with LLMs um I also saw a number of blog posts
  488. 24:19
  489. that try to develop these best practices for working with LLMs and here's one that I read recently and I thought was
  490. 24:25
  491. quite good and it kind of discussed some techniques and some of them have to do with how you keep the AI on the leash and so as an example if you are
  492. 24:32
  493. prompting if your prompt is vague then uh the AI might not do exactly what you wanted and in that case verification
  494. 24:38
  495. will fail you're going to ask for something else if a verification fails then you're going to start spinning so it makes a lot more sense to spend a bit
  496. 24:45
  497. more time to be more concrete in your prompts which increases the probability of successful verification and you can
  498. 24:50
  499. move forward and so I think a lot of us are going to end up finding um kind of techniques like this i think in my own
  500. 24:56
  501. work as well I'm currently interested in what education looks like now that we
  502. 25:01
  503. have AI and LLMs what does education look like and I think a large amount
  504. 25:07
  505. of thought for me goes into how we keep AI on the leash I don't think it just works to go to ChatGPT and be like "Hey
  506. 25:13
  507. teach me physics." I don't think this works because the AI like gets lost in the woods and so for me this is
  508. 25:18
  509. actually two separate apps for example there's an app for a teacher that creates courses and then there's an app
  510. 25:24
  511. that takes courses and serves them to students and in both cases we now have this intermediate artifact of a course
  512. 25:31
  513. that is auditable and we can make sure it's good we can make sure it's consistent and the AI is kept on the leash with respect to a certain syllabus
  514. 25:37
  515. a certain like um progression of projects and so on and so this is one way of keeping the AI on leash and I
  516. 25:44
  517. think has a much higher likelihood of working and the AI is not getting lost in the woods
  518. 25:49
  519. one more kind of analogy I wanted to sort of allude to is I'm no stranger to partial autonomy and I kind
  520. 25:56
  521. of worked on this I think for five years at Tesla and this is also a partial autonomy product and shares a lot of the
  522. Lessons from Tesla Autopilot & autonomy sliders
  523. 26:01
  524. features like for example right there in the instrument panel is the GUI of the autopilot so it's showing me
  525. 26:07
  526. what the neural network sees and so on and we have the autonomy slider where over the course of my tenure there we
  527. 26:13
  528. did more and more autonomous tasks for the user and maybe the story that I wanted to tell very briefly is uh
  529. 26:21
  530. actually the first time I drove a self-driving vehicle was in 2013 and I had a friend who worked at Waymo and uh
  531. 26:27
  532. he offered to give me a drive around Palo Alto I took this picture using Google Glass at the time and many of you
  533. 26:33
  534. are so young that you might not even know what that is uh but uh yeah this was like all the rage at the time and we
  535. 26:39
  536. got into this car and we went for about a 30-minute drive around Palo Alto highways uh streets and so on and this
  537. 26:45
  538. drive was perfect there was zero interventions and this was 2013 which is now 12 years ago and it kind of struck
  539. 26:52
  540. me because at the time when I had this perfect drive this perfect demo I felt like wow self-driving is imminent
  541. 26:59
  542. because this just worked this is incredible um but here we are 12 years later and we are still working on
  543. 27:04
  544. autonomy um we are still working on driving agents and even now we haven't actually like really solved the problem
  545. 27:10
  546. like you may see Waymos going around and they look driverless but you know there's still a lot of teleoperation and
  547. 27:16
  548. a lot of human in the loop of a lot of this driving so we still haven't even like declared success but I think it's
  549. 27:22
  550. definitely like going to succeed at this point but it just took a long time and so I think like this software is
  551. 27:29
  552. really tricky I think in the same way that driving is tricky and so when I see
  553. 27:34
  554. things like oh 2025 is the year of agents I get very concerned and I kind of feel like you know this is the decade
  555. 27:41
  556. of agents and this is going to be quite some time we need humans in the loop we need to do this carefully this is
  557. 27:47
  558. software let's be serious here one more kind of analogy that I always think
  559. The Iron Man analogy: Augmentation vs. agents
  560. 27:52
  561. through is the Iron Man suit uh I think this is I always love Iron Man i think
  562. 27:58
  563. it's like so um correct in a bunch of ways with respect to technology and how it will play out and what I love about
  564. 28:04
  565. the Iron Man suit is that it's both an augmentation and Tony Stark can drive it and it's also an agent and in some of
  566. 28:10
  567. the movies the Iron Man suit is quite autonomous and can fly around and find Tony and all this kind of stuff and so this is the autonomy slider is we can be
  568. 28:17
  569. we can build augmentations or we can build agents and we kind of want to do a bit of both but at this stage I would
  570. 28:23
  571. say working with fallible LLMs and so on I would say you know it's less Iron Man
  572. 28:29
  573. robots and more Iron Man suits that you want to build it's less like building flashy demos of autonomous agents and
  574. 28:35
  575. more building partial autonomy products and these products have custom GUIs and UI/UX and this
  576. 28:43
  577. is done so that the generation verification loop of the human is very very fast but we are not losing the
  578. 28:48
  579. sight of the fact that it is in principle possible to automate this work and there should be an autonomy slider in your product and you should be
  580. 28:54
  581. thinking about how you can slide that autonomy slider and make your product uh sort of um more autonomous over time but
  582. 29:01
  583. this is kind of how I think there's lots of opportunities in these kinds of products i want to now switch gears a
  584. Vibe Coding: Everyone is now a programmer
  585. 29:06
  586. little bit and talk about one other dimension that I think is very unique not only is there a new type of programming language that allows for
  587. 29:12
  588. autonomy in software but also as I mentioned it's programmed in English which is this natural interface and
  589. 29:19
  590. suddenly everyone is a programmer because everyone speaks natural language like English so this is extremely
  591. 29:24
  592. bullish and very interesting to me and also completely unprecedented I would say it used to be the case that you needed to spend five to 10 years studying
  593. 29:31
  594. something to be able to do something in software this is not the case anymore so I don't know if by any chance anyone has
  595. 29:37
  596. heard of vibe coding uh this is the tweet that kind of
  597. 29:42
  598. like introduced this but I'm told that this is now like a major meme um fun story about this is that I've been on
  599. 29:49
  600. Twitter for like 15 years or something like that at this point and I still have no clue which tweet will become viral
  601. 29:56
  602. and which tweet like fizzles and no one cares and I thought that this tweet was going to be the latter I don't know it
  603. 30:01
  604. was just like a shower thought but this became like a total meme and I really just can't tell but I guess like it struck a chord and it gave a name to
  605. 30:08
  606. something that everyone was feeling but couldn't quite say in words so now there's a Wikipedia page and everything
  607. 30:17
  608. this is like [Applause]
  609. 30:25
  610. yeah this is like a major contribution now or something like that so um Tom Wolf from Hugging Face shared
  611. 30:32
  612. this beautiful video that I really love um these are kids vibe coding
  613. 30:42
  614. and I find that this is such a wholesome video like I love this video like how can you look at this video and feel bad
  615. 30:48
  616. about the future the future is great i think this will end up being like a
  617. 30:53
  618. gateway drug to software development um I'm not a doomer about the future of the
  619. 30:59
  620. generation and I think yeah I love this video so I tried vibe coding a little bit
  621. 31:05
  622. uh as well because it's so fun uh so vibe coding is so great when you want to build something super duper custom that
  623. 31:11
  624. doesn't appear to exist and you just want to wing it because it's a Saturday or something like that so I built this uh iOS app and I can't actually
  625. 31:19
  626. program in Swift but I was really shocked that I was able to build like a super basic app and I'm not going to explain it it's really uh dumb but uh I
  627. 31:26
  628. kind of like this was just like a day of work and this was running on my phone like later that day and I was like "Wow
  629. 31:31
  630. this is amazing." I didn't have to like read through Swift for like five days or something like that to like get started
  631. 31:37
  632. I also vibe coded this app called MenuGen and this is live you can try it at menugen.app and I basically had this
  633. 31:44
  634. problem where I show up at a restaurant I read through the menu and I have no idea what any of the things are and I need pictures so this doesn't exist so I
  635. 31:51
  636. was like "Hey I'm going to bite code it." So um this is what it looks like you go to menu.app
  637. 31:58
  638. um and uh you take a picture of a menu and then MenuGen generates the images
  639. 32:04
  640. and everyone gets $5 in credits for free when you sign up and therefore this is a
  641. 32:09
  642. major cost center in my life so this is a negative uh revenue app for
  643. 32:15
  644. me right now I've lost a huge amount of money on MenuGen
  645. 32:21
  646. okay but the fascinating thing about MenuGen for me is that the code
  647. 32:28
  648. the vibe coding part was actually the easy part of vibe coding MenuGen and most of it actually was when I
  649. 32:35
  650. tried to make it real so that you can actually have authentication and payments and the domain name and Vercel deployment this was really hard and all
  651. 32:41
  652. of this was not code all of this DevOps stuff was me in the browser clicking
  653. 32:47
  654. stuff and this was extremely slow and took another week so it was really fascinating that I had the MenuGen um
  655. 32:54
  656. basically demo working on my laptop in a few hours and then it took me a week because I was trying to make it real and
  657. 33:01
  658. the reason for this is this was just really annoying um so for example if you try to add Google login to your web page
  659. 33:07
  660. I know this is very small but just a huge amount of instructions of this Clerk library telling me how to
  661. 33:13
  662. integrate this and this is crazy like it's telling me go to this URL click on this dropdown choose this go to this and
  663. 33:19
  664. click on that and it's like telling me what to do like a computer is telling me the actions I should be taking like you
  665. 33:25
  666. do it why am I doing this what the hell
  667. 33:31
  668. I had to follow all these instructions this was crazy so I think the last part of my talk therefore focuses on can we
  669. Building for agents: Future-ready digital infrastructure
  670. 33:39
  671. just build for agents I don't want to do this work can agents do this thank you
  672. 33:46
  673. okay so roughly speaking I think there's a new category of consumer and manipulator of digital information it
  674. 33:53
  675. used to be just humans through GUIs or computers through APIs and now we have a completely new thing and agents are
  676. 34:00
  677. they're computers but they are humanlike kind of right they're people spirits there's people spirits on the internet
  678. 34:05
  679. and they need to interact with our software infrastructure like can we build for them it's a new thing so as an
  680. 34:10
  681. example you can have robots.txt on your domain and you can instruct uh or like advise I suppose um uh web crawlers on
  682. 34:18
  683. how to behave on your website in the same way you can have maybe an llms.txt file which is just a simple markdown
  684. 34:23
  685. that's telling LLMs what this domain is about and this is very readable to an LLM if it had to instead get the HTML
  686. 34:31
  687. of your web page and try to parse it this is very error-prone and difficult and it will screw it up and it's not going to work so we can just directly speak to
  688. 34:37
  689. the LLM it's worth it
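A sketch of what such a file might look like, following the llms.txt proposal's markdown layout; the domain and links are invented for illustration:

```markdown
# Example Store

> Example Store sells hiking gear online. This file summarizes the
> site for LLMs that cannot usefully parse the HTML pages.

## Docs

- [API reference](https://example.com/docs/api.md): REST endpoints for orders
- [Returns policy](https://example.com/docs/returns.md): how refunds work
```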
  690. 34:43
  691. um a huge amount of documentation is currently written for people so you will see things like lists and bold and pictures and this is not directly accessible by an LLM so I
  692. 34:50
  693. see some of the services now are transitioning a lot of their docs to be specifically for LLMs so Vercel and
  694. 34:56
  695. Stripe as an example are early movers here but there are a few more that I've seen already and they offer their
  696. 35:03
  697. documentation in Markdown Markdown is super easy for LLMs to understand this is
  698. 35:08
  699. great um maybe one simple example from uh my experience as well maybe some
  700. 35:13
  701. of you know 3Blue1Brown he makes beautiful animation videos on YouTube
  702. 35:19
  703. [Applause] yeah I love this library that he
  704. 35:25
  705. wrote uh Manim and I wanted to make my own and uh there's extensive
  706. 35:30
  707. documentation on how to use Manim and so I didn't want to actually read through it so I copy pasted the whole
  708. 35:35
  709. thing to an LLM and I described what I wanted and it just worked out of the box like the LLM just vibe coded me an animation
  710. 35:41
  711. exactly what I wanted and I was like wow this is amazing
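A minimal Manim Community scene of the kind an LLM can write once it has the docs in context; this one is reconstructed for illustration and is not the actual animation from the talk:

```python
from manim import Scene, Square, Circle, Create, Transform

class MorphDemo(Scene):
    def construct(self):
        square = Square()
        circle = Circle()
        self.play(Create(square))             # draw the square
        self.play(Transform(square, circle))  # morph it into a circle
        self.wait()

# render with: manim -pql morph_demo.py MorphDemo
```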
  712. 35:48
  713. so if we can make docs legible to LLMs it's going to unlock a huge amount of use and um I think this is wonderful and should happen more the other thing I
  714. 35:55
  715. wanted to point out is that you do unfortunately have to it's not just about taking your docs and making them appear in markdown that's the easy part
  716. 36:00
  717. we actually have to change the docs because anytime your docs say click this is bad an LLM will not be able to
  718. 36:06
  719. natively take this action right now so Vercel for example is replacing every occurrence of click with an equivalent
  720. 36:13
  721. curl command that your LLM agent could take on your behalf
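A sketch of the idea, with the same docs step written once for a human and once for an agent; the endpoint and token below are invented for illustration:

```
Human docs:  "Click 'Create Project', then pick a name in the dropdown."
Agent docs:  curl -X POST https://api.example.com/v1/projects \
               -H "Authorization: Bearer $TOKEN" \
               -d '{"name": "my-project"}'
```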
  722. 36:20
  723. um and so I think this is very interesting and then of course there's the Model Context Protocol from Anthropic and this is also another way it's a protocol for speaking directly
  724. 36:25
  725. to agents as this new consumer and manipulator of digital information so I'm very bullish on these ideas
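A minimal sketch of an MCP server using the FastMCP helper from the official Python SDK; the tool itself is a made-up example:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("menu-tools")

@mcp.tool()
def describe_dish(name: str) -> str:
    """Return a short description of a menu item for the agent."""
    return f"{name}: a dish from the menu (a real lookup would go here)."

if __name__ == "__main__":
    mcp.run()  # exposes the tool to any MCP-capable client
```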
  726. 36:30
  727. the other thing I really like is a number of little tools here and there that are helping ingest data in like very
  728. 36:37
  729. LLM-friendly formats so for example when I go to a GitHub repo like my nanoGPT repo I can't feed this to an LLM and ask
  730. 36:43
  731. questions about it uh because it's you know this is a human interface on GitHub so when you just change the URL from
  732. 36:49
  733. GitHub to Git Ingest then uh this will actually concatenate all the files into a single giant text and it will create a
  734. 36:55
  735. directory structure etc and this is ready to be copy pasted into your favorite LLM and you can do stuff
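The URL trick he's describing, sketched (assuming the gitingest.com service):

```
https://github.com/karpathy/nanoGPT      # the human interface
https://gitingest.com/karpathy/nanoGPT   # same repo, flattened into one LLM-ready text
```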
  736. 37:01
  737. maybe an even more dramatic example of this is DeepWiki where it's not just the raw content of these files uh this is from
  738. 37:08
  739. Devin but also like they have Devin basically do analysis of the GitHub repo and Devin basically builds up a whole
  740. 37:14
  741. docs uh pages just for your repo and you can imagine that this is even more
  742. 37:19
  743. helpful to copy paste into your LLM so I love all the little tools that basically where you just change the URL and it
  744. 37:25
  745. makes something accessible to an LLM so this is all well and great and I think there should be a lot more of it one
  746. 37:31
  747. more note I wanted to make is that it is absolutely possible that in the future LLMs will be able to this is not even
  748. 37:38
  749. future this is today they'll be able to go around and they'll be able to click stuff and so on but I still think it's very worth basically meeting
  750. 37:46
  751. LLMs halfway and making it easier for them to access all this information uh because this is still
  752. 37:51
  753. fairly expensive I would say to use and uh a lot more difficult and so I do think that lots of software there will
  754. 37:58
  755. be a long tail of software that won't like adapt because these are not like live player sort of repositories or digital
  756. 38:04
  757. infrastructure and we will need these tools uh but I think for everyone else I think it's very worth kind of like
  758. 38:09
  759. meeting in some middle point so I'm bullish on both if that makes sense so in summary what an amazing time to
  760. Summary: We’re in the 1960s of LLMs β€” time to build
  761. 38:17
  762. get into the industry we need to rewrite a ton of code a ton of code will be written by professionals and by coders
  763. 38:23
  764. these LLMs are kind of like utilities kind of like fabs but they're kind of especially like operating systems but
  765. 38:30
  766. it's so early it's like 1960s of operating systems and uh and I think a
  767. 38:35
  768. lot of the analogies cross over um and these LLMs are kind of like these fallible uh you know people spirits that
  769. 38:42
  770. we have to learn to work with and in order to do that properly we need to adjust our infrastructure towards it so
  771. 38:48
  772. when you're building these LLM apps I describe some of the ways of working effectively with these LLMs and some of
  773. 38:54
  774. the tools that make that uh kind of possible and how you can spin this loop very very quickly and basically create
  775. 39:00
  776. partial autonomy products and then um yeah a lot of code has to also be written for the agents more directly but
  777. 39:06
  778. in any case going back to the Iron Man suit analogy I think what we'll see over the next decade roughly is we're going
  779. 39:12
  780. to take the slider from left to right and it's going to be very interesting to see what that
  781. 39:18
  782. looks like and I can't wait to build it with all of you thank you