  1. Hey everyone, we are on the August 15th, 2023 stand up.
  2. Wow, it's been August.
  3. That is crazy how fast time flies.
  4. Back to school everybody.
  5. All right, what do I have on my list here?
  6. Okay, we'll do a little bit of the v1.11 planning.
  7. There are some things we have to reprioritize.
  8. Um?
  9. Yamux was one of them that was on my
  10. list here because yeah, there was.
  11. Mplex was removed, but I think Pawan
  12. is reintegrating it.
  13. If I'm not mistaken, it came in.
  14. Yeah, I don't think they're going
  15. to actually cut the release.
  16. Without Mplex support.
  17. But right?
  18. Yeah, I mean, I guess it's still
  19. kind of a kick in the pants to get Yamux over the finish line.
  20. Is there a reason why they're implementing that? Is it more efficient,
  21. or is it just a change? Yeah, so Mplex is deprecated by the libp2p specs
  22. and it got replaced by, you know, Yamux, which is supposedly better
  23. and faster and all these things.
  24. - But is it whiz bang?
  25. - Our implementation isn't.
  26. So that's been why we haven't integrated it.
  27. I think there's a memory leak in it as well.
  28. - Right.
  29. - So discv5 memory leak.
  30. Is that related to the discv5 thing
  31. or are those different?
  32. - Different.
  33. Yeah, um, yeah, I know we wanted to work on that, and then other things just kept
  34. getting in the way. There was a time where we wanted to, like, continue it. Well, we
  35. got libp2p upgraded, so now I don't know if that makes a difference, though. Does it,
  36. for Yamux?
  37. Well, it just unblocks the, I guess, integration of Yamux.
  38. Okay.
  39. Um, because I think it was only Lighthouse that was, I guess, wanting
  40. to sort of deprecate, uh, Mplex. But do you still see this as, like, well,
  41. I mean, we should still do it as soon as we can,
  42. but is this something that we wanna try to maybe push
  43. in v1.11?
  44. I don't know if we'll get it fixed by then,
  45. but what was your opinion on it?
  46. - I think it's important, but not very urgent.
  47. - Okay.
  48. - If the time comes for v1.11
  49. and it's available, that'd be great.
  50. And it is something that I started looking into yesterday.
  51. So I'll try to keep pushing on it.
  52. - Cool.
  53. All right, thanks.
  54. All right.
  55. Just one of those things that's just like a good reminder
  56. of when things like fall to the side or wherever,
  57. we just gotta keep pushing to the finish line
  58. on these things.
  59. But yeah, let's see what else do I have for v1.11 planning.
  60. I've been mostly focused on the Holesky stuff
  61. just to ensure that we're caught up
  62. with whatever we need to do to ensure
  63. that we will be successful on that testnet.
  64. - I think I did.
  65. - Oh, sorry.
  66. - No, go ahead.
  67. I didn't, I thought you were done.
  68. No, I was just going to say that the keys have been generated.
  69. We've uploaded them for Genesis.
  70. We plan to do 5,000 keys per server,
  71. but that has never been done before.
  72. So the infrastructure team is aware that things may break,
  73. and we may have to lower the amount of keys on each server.
  74. But anyways, that's all uploaded now.
  75. taken care of. Genesis should hopefully be within the next like two weeks I think,
  76. and we'll see how that goes. One of the things that was mentioned as well, thanks Tuyen for
  77. reminding me, is that we should figure out a way to deal with finalized archived states.
  78. So I put up an issue literally just before this call to see if anybody can implement
  79. some pruning features for that, because if we can limit the amount of beacon data that grows with
  80. that, it might help with some of the Holesky servers. That way, we don't keep running out
  81. of disk space and stuff. And I don't believe we need any of those finalized or archived states
  82. for a testnet like that. So if someone can take a look at that after the call and see if they
  83. they can implement it.
  84. I tagged it for v1.11 as well
  85. to see if we can get it in there.
  86. That would hopefully help with that.
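For context, here is a minimal sketch of the kind of pruning being asked for, assuming a hypothetical `stateArchive` bucket with `keys()` and `batchDelete()` methods (not the actual Lodestar database API):

```ts
// Hypothetical sketch: drop archived finalized states older than a retention window.
// The `StateArchive` interface below is an assumption for illustration, not Lodestar's real DB API.
interface StateArchive {
  keys(): Promise<number[]>; // slots of archived states
  batchDelete(slots: number[]): Promise<void>;
}

async function pruneArchivedStates(
  stateArchive: StateArchive,
  finalizedSlot: number,
  retainSlots = 0 // on an ephemeral testnet we may not need to keep any archived states
): Promise<void> {
  const slots = await stateArchive.keys();
  const toDelete = slots.filter((slot) => slot < finalizedSlot - retainSlots);
  if (toDelete.length > 0) {
    await stateArchive.batchDelete(toDelete);
  }
}
```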
  87. - Well, I'm actively working on the blocks again.
  88. Lion gave me some pointers in order to be able to do that.
  89. It's not state, but it's blocks,
  90. which kind of gets stored in the state as well.
  91. So I got some headway on it this week
  92. and I'm gonna keep working on it.
  93. - Okay.
  94. Great.
  95. >> What else do I have on here?
  96. >> There's something that I would like to add,
  97. make sure it gets in v1.11, is the,
  98. I've got a draft PR open for adding a boot node CLI command.
  99. So you can run a Lodestar boot node, and it basically will just spin up a discv5
  100. server and not spin up a beacon node.
  101. And we want to eventually, or after that,
  102. we want to spin up at least three different Lodestar boot
  103. nodes in different geographic areas
  104. and contribute to the boot node ecosystem.
  105. Because all the other CL clients are running boot nodes,
  106. so we really should as well.
  107. And I do think that this is an area where we can actually meaningfully impact the security of
  108. the network, because I think everyone else is pretty much using, like, very similar providers,
  109. very similar jurisdictions.
  110. I think it's all like AWS.
  111. And then like, I think there's a bunch of people running things in Frankfurt.
  112. So it's like, it's not very well distributed among jurisdictions and providers.
  113. so we could do a lot better.
  114. Oh, and we'll have a sub-cluster in India.
  115. Thanks, Gajinder, for your contributions.
  116. OK.
  117. Anything else that people want to tag as for v1.11?
  118. I'm hoping to get this thing potentially ready
  119. within the next, I would say,
  120. if we can even get a beta out by like a week from now.
  121. But the thing is I'm not,
  122. like I'm taking the last week of August off,
  123. so I wouldn't be able to help sort of push this through.
  124. So if--
  125. - I'd like to see the stuff that Tuyen's working on,
  126. the fork choice. - The fork choice, yeah.
  127. Fork choice performance improvements, that'd be nice.
  128. - So I do have a-
  129. - So I would like to have the protons upgrade
  130. in gossipsub too, Cayman,
  131. 'cause we left that for a long time.
  132. The benchmark is about the same as protobufjs.
  133. So there's no issue there,
  134. and there's the memory leak.
  135. I will test it on the feature 3 group
  136. and maybe let it run a whole day to test it.
  137. >>Okay, cool.
  138. Yeah, I haven't looked too much into that PR,
  139. but I had one question about it,
  140. which was, I think we had some non-standard,
  141. or we added some limits to the protobuf decoding,
  142. where it would limit the number of messages
  143. that could be inside of a single protobuf,
  144. or a single RPC, I think limited the number of control
  145. messages, the number of IWants and IHaves, things like that.
  146. But at the decoding level, as an additional security check,
  147. basically.
  148. And I was wondering, with protons generating the protobuf
  149. decoding stuff, if that feature is still
  150. going to be there, or how we would tackle that.
  151. I don't see it being there, but I can see the codec source code that was generated, so I
  152. just do the same thing Lion did: I bring the generated source code into a separate file
  153. and then do our own codec with the limit option there.
  154. So it's not difficult.
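To make the limit Tuyen describes concrete, here is a hedged sketch of wrapping a generated decoder with caps on message and control counts. The real approach applies the limits inside the copied codec's decode loop; this after-the-fact check and all the names below are just illustrative:

```ts
// Illustrative only: enforce message/control limits on a decoded gossipsub RPC.
// `decodeRpc` stands in for the protons-generated codec; the limits are example values.
interface RpcShape {
  messages: unknown[];
  control?: {ihave: unknown[]; iwant: unknown[]};
}

function decodeRpcWithLimits(
  data: Uint8Array,
  decodeRpc: (data: Uint8Array) => RpcShape,
  limits = {maxMessages: 64, maxControlMessages: 64}
): RpcShape {
  const rpc = decodeRpc(data);
  if (rpc.messages.length > limits.maxMessages) {
    throw new Error(`too many messages in RPC: ${rpc.messages.length}`);
  }
  const controlCount = (rpc.control?.ihave.length ?? 0) + (rpc.control?.iwant.length ?? 0);
  if (controlCount > limits.maxControlMessages) {
    throw new Error(`too many control messages in RPC: ${controlCount}`);
  }
  return rpc;
}
```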
  155. Okay.
  156. I want to make sure I've tagged all of your fork choice PRs for this, Tuyen.
  157. If you can just send me the PRs, I want to make sure I got them all.
  158. Let's see. Oh, um, the memory leak issue. I feel like I should make a tweet for this,
  159. because I think the reason why Nico reopened it was because somebody didn't see that issue,
  160. because we closed it, right? Yeah, a few people brought it up. Yeah. Okay, maybe I feel like
  161. I should make some sort of announcement on Discord and Twitter about this, so people know.
  162. What else?
  163. Anything else that's on your radars that could possibly get in within the next like week
  164. or two weeks at most?
  165. Like, no, nevermind.
  166. BLS?
  167. I would love to.
  168. It's running, it's looking nice.
  169. Like, it's looking nice, but, like, to be fair,
  170. it's such a big change that, like,
  171. I would like everybody's blessing, and to let it run a few days
  172. with lots of perf reports. But it looks nice.
  173. It's running.
  174. Feature two, if you guys see it, I mean like a lot of like,
  175. mesh peers are looking good,
  176. length of connections are good,
  177. message data bytes per sec,
  178. like all of the, like the metrics are looking nice.
  179. - What about state transition?
  180. - It's aggregating state faster.
  181. They're faster, like there's less lag in the state transitions.
  182. Like it's like across the board, like there are certain things that just, I mean, it looks nice.
  183. I'd love other eyes on it to make sure that I'm like, do you know what I mean?
  184. But it's on feature two.
  185. So you'd say it's ready for like a final review.
  186. Is that, uh, it's ready for like looking at the metrics and stuff.
  187. And then like the PR for actually like integrating it is a lot.
  188. There was a load of changes in there.
  189. So like it might be worth breaking into a couple pieces because I added a couple flags
  190. and renamed a couple flags. Also, it touches a lot of files because I didn't want any crossover of the classes,
  191. so basically I had to change the imports in lots of files where the BLS library or the types were being
  192. imported, so they're all uniform. You know what I'm saying? So there's just a lot of little changes
  193. that are spread around. So it's a big PR,
  194. and we have to publish the package and get that done, and the CI, and the BLST library.
  195. So there's a lot of little things left.
  196. But I mean, it's all doable as long as, like, the metrics are okay with everybody. The only thing is the gossip score.
  197. There are a couple... like, it's more average across the gossip scores,
  198. but, like, there's a couple deeper spikes and I'm not sure why those couple
  199. spikes are there. So, like, there's a couple things that give me pause, but otherwise most stuff looks pretty good.
  200. Like, the BLS metrics are all working. It's, it's, yeah.
  201. You said this is feature two or feature three?
  202. Two.
  203. One of the other things I want to get an update on as well is the...
  204. I'm trying to think, did we run subscribe all subnets
  205. on any of the production nodes or anything on mainnet at all
  206. to see our performance?
  207. I don't know if we have, but we should look into that a little bit more
  208. because Tuyen also has his issue to increase target peers at some point.
  209. I'm not sure when we would want to do that.
  210. I guess we need to ensure that we're able to handle this, correct?
  211. Actually, I should probably run BLST with subscribes to all subnets as well
  212. and see what happens when you boost it up and it's doing more gossip transitions.
  213. That's when it would probably do even better.
  214. Hopefully, knock on wood.
  215. So, regarding the target peers, I can work on that next.
  216. It's just that feature 3 is busy testing the protons.
  217. Once I'm done with that, I can test with the max peers of 100.
  218. Okay.
  219. Yeah, sounds good.
  220. And actually another thing that might be able to get in is adjusting new space, because I've been playing with that all week, and it significantly reduces GC
  221. time. So we've got, like, for instance on the mainnet nodes, it goes from 10 to 12% to like two, two
  222. and a half percent of the time doing GC. So there's a couple really good improvements, which is
  223. affecting block times and a few of the other bits and pieces. I have noticed some regression in, like,
  224. scrape duration and a couple other random things, which is why I asked that question in private, of like,
  225. what are the other things that I really should be wary of, maybe that I'm not paying attention to,
  226. just to make sure that it doesn't degrade node performance inadvertently in some places.
  227. But that's a good one that we can include as well.
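For reference, the knob being discussed is V8's young-generation semi-space size, which Node exposes as the `--max-semi-space-size` flag (in MB). A small sketch of checking what was actually allocated at runtime; the 128 MB value is just an example, not the number being settled on:

```ts
// Example: start the node with a larger young generation, e.g.
//   node --max-semi-space-size=128 ./beacon-node.js
// then verify the new-space size V8 actually allocated.
import {getHeapSpaceStatistics} from "node:v8";

const newSpace = getHeapSpaceStatistics().find((s) => s.space_name === "new_space");
if (newSpace) {
  console.log(`new space size: ${(newSpace.space_size / 1024 / 1024).toFixed(1)} MB`);
}
```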
  228. Um. All right.
  229. Uh, we have quite a few things already in the v1.11 milestone, so I think unless there's something
  230. pressing, we might be good for v1.11 planning at this point.
  231. And there was one thing
  232. that I
  233. wanted to mention while I'm
  234. here. Just trying to find it
  235. again. Yeah, I don't know if
  236. there was any advancement in
  237. Lodestar beacon node, the aggregated attestation errors,
  238. do you have any update on what's going on there, Nico?
  239. - So I posted the update on the issue.
  240. Sorry, Nazar, go ahead.
  241. - I was working on a PR
  242. where we can run different combinations of validator
  243. and beacon nodes in the simulation test.
  244. Would that detect if the aggregate is not produced?
  245. - Yes, it will.
  246. - Yeah, so based on what I checked
  247. and I debugged it a bit is that,
  248. so we have this cache where we,
  249. when the VC wants to get an aggregate,
  250. we check in a cache.
  251. And I think this is filled from the previous attestation.
  252. And it looks like the data root is different,
  253. but at least we don't have anything in the cache
  254. that matches the data root that Lighthouse is sending.
  255. - Okay.
  256. Yeah, I just have a tag that's like v1.11,
  257. but I suspect that this is gonna get pushed,
  258. but I just wanted to check up on it.
  259. So thanks.
  260. - Yeah, so I posted detailed logs and stuff on the issue.
  261. If anyone has ideas why this could be,
  262. and maybe we can discuss there.
  263. - Yeah.
  264. You, you're also in our Telegram chats
  265. with Lighthouse, correct?
  266. Just in case we need to correct.
  267. - I don't think so.
  268. - No? Okay.
  269. I'll make sure. - Not with Lighthouse, no.
  270. - Okay. I'll make sure you are,
  271. just in case there are some questions
  272. that need to be routed to them as well.
  273. Okay, why don't we just go with a round of updates
  274. if there's nothing else for planning.
  275. So anybody have anything else
  276. they want to bring up for planning today?
  277. Cool.
  278.  
  279.  
  280. All right, let's start with Tuyen today.
  281. Hi, so I investigated the issue with the bigboy testnet.
  282. The main issue we have is a long updateHead call, which may take up to eight seconds.
  283. That's why we have very low peers.
  284. And there were two PRs to improve that.
  285. The main issue we have is that in the devnet there are a lot of unfinalized proto nodes, and the updateHead
  286. call grows exponentially.
  287. The main issue is there's a lot of checking whether
  288. the node has the same finalized checkpoint.
  289. So the main fix was in the nodeIsViableForHead check.
  290. With that, it reduced a lot of time.
  291. And I think we will not have this same issue
  292. if we use the next version of Lodestar on that testnet.
  293. The other one we checked is to track votes by index, which will help improve the compute
  294. deltas function.
  295. In the performance test it is like 2x or 3x better, but somehow when I test it on a mainnet node it is like eight times faster. Normally an updateHead call is like 240 milliseconds,
  296. but it reduced to like 30 to 40 milliseconds. So, please review that.
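For anyone unfamiliar with the fork choice change being described, here is a rough illustration of the idea of tracking votes by proto-array node index so the delta computation is plain array arithmetic. This is a sketch only, not Tuyen's actual PR; names and shapes are illustrative:

```ts
// Sketch of a proto-array style computeDeltas where each validator's vote is tracked
// by node index (the array position of a block in the proto array) rather than by block root.
interface VoteTracker {
  currentIndex: number | null; // node index the validator's balance currently counts toward
  nextIndex: number | null; // node index from the validator's latest attestation
}

function computeDeltas(
  numNodes: number,
  votes: VoteTracker[],
  oldBalances: number[],
  newBalances: number[]
): number[] {
  const deltas = new Array<number>(numNodes).fill(0);
  for (let i = 0; i < votes.length; i++) {
    const vote = votes[i];
    if (vote.currentIndex === null && vote.nextIndex === null) continue;
    const oldBalance = oldBalances[i] ?? 0;
    const newBalance = newBalances[i] ?? 0;
    if (vote.currentIndex !== vote.nextIndex || oldBalance !== newBalance) {
      if (vote.currentIndex !== null) deltas[vote.currentIndex] -= oldBalance;
      if (vote.nextIndex !== null) deltas[vote.nextIndex] += newBalance;
      vote.currentIndex = vote.nextIndex;
    }
  }
  return deltas;
}
```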
  297. The other thing I'm working on is the protons migration. So, the protobuf in gossipsub
  298. is implemented by protobufjs, and we have a plan to migrate to protons, and we left that for a while. Now I think it's a good time to implement that in the system, and I'm testing that in a feature group.
  299. The other work is in gossipsub: there was a small PR to de-duplicate metrics in gossipsub.
  300. The issue is that at that time I had a PR to unbundle metrics, and we have two validation phases, one in gossipsub and the other one at the application level, and the names are very similar, like counting the number of invalid or valid messages. And then the name is duplicated. So, in the PR, I suggest we de-duplicate the name, like in gossipsub we can call it the pre-validation result, mainly to differentiate it from
  301. the validation result from the application. The main point is just to de-duplicate it, but if someone has any better ideas, please comment on the PR.
  302. But that's important, because when I run Lodestar with the latest gossipsub, I cannot operate without that PR.
  303. And the last thing is the indexed gossip queue.
  304. The implementation of the queue was merged; now I'm working on a PR to consume that as the
  305. last PR for this work.
  306. That's it for me.
  307. Awesome.
  308. Thanks, Tuyen.
  309.  
  310. Let's move forward with Nico.
  311. Okay, so one thing I looked at was that Lighthouse issue and documented a bit about that.
  312. Then I also looked at updating the bootnodes.
  313. So that's already in.
  314. I basically just pulled the latest changes into the hard-coded values, or basically updated
  315. the ENRs that we have in our code.
  316. Besides that, yesterday I looked at this one issue,
  317. we can also get that in v1.11, where we don't support the Authorization header anymore, or basically
  318. basic auth is not supported. That was rather a bug in the previous implementation we had,
  319. because it didn't follow the spec. And actually, if we had updated to node-fetch version 3, we would
  320. have had the same thing. But so this PR just builds the header beforehand, basically, because the
  321. fetch library itself doesn't do that. Yeah, besides that, I was mostly looking at moving
  322. the network to a process instead of a thread. So I got that to kind of work. But the problem
  323. right now is that all of a sudden, it just stops. So where I'm at right now
  324. is that the main thread is just not receiving
  325. any message events at some point.
  326. So I'm not sure what the reason is for that yet,
  327. but we are definitely sending a ton of events
  328. from the worker to the main thread.
  329. So could be related to that, maybe.
  330. Not sure if it's a bug in the child process implementation,
  331. still need to figure that out.
  332. Yeah, so basically stuck on that right now,
  333. trying to debug that.
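Going back to the basic auth point above, a minimal sketch of what "building the header beforehand" can look like when the beacon URL carries credentials. This is a hypothetical helper for illustration, not the code from the PR:

```ts
// Hypothetical helper: if the URL contains user:password credentials, strip them out
// and build the Basic Authorization header ourselves, since the fetch library won't.
function extractBasicAuth(urlStr: string): {url: string; headers: Record<string, string>} {
  const url = new URL(urlStr);
  const headers: Record<string, string> = {};
  if (url.username || url.password) {
    const credentials = `${decodeURIComponent(url.username)}:${decodeURIComponent(url.password)}`;
    headers["Authorization"] = `Basic ${Buffer.from(credentials).toString("base64")}`;
    url.username = "";
    url.password = "";
  }
  return {url: url.toString(), headers};
}
```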
  334. Thanks for the update, Nico.
  335.  
  336.  
  337. All right, Gajinder.
  338. Hey guys.
  339. So mostly, I worked
  340. on the block proposal improvement tracker.
  341. So I'm trying to integrate produce block V3,
  342. which is basically a combined API
  343. for both the execution block and the builder block,
  344. and which basically will move our builder
  345. versus execution race to the beacon node.
  346. And apart from that, I also tried,
  347. I also just tried a bit on broadcasting the proposals,
  348. the proposal data that validators send.
  349. But then there are many questions that came to mind
  350. over there. For example, in the validator,
  351. should we treat each beacon node URL
  352. as a separate beacon node,
  353. because, for example, in produce block,
  354. you know, when we send produce block API calls to these,
  355. we should send in parallel,
  356. even though they are configured as fallback,
  357. because of any delay, if for example
  358. the base, the zeroth URL doesn't respond.
  359. So then we, as a fallback,
  360. send the request at a later point,
  361. which basically causes a delay.
  362. So should we send parallel requests
  363. or should we race them like how we race
  364. builder versus execution because it makes sense.
  365. For example, one beacon node might not be connected
  366. to a builder, one could be connected to,
  367. others could be connected to builder.
  368. So one of the execution nodes could resolve
  369. way before that, one of the beacon nodes
  370. could resolve way before the other beacon nodes.
  371. and that basically will not give us the optimal block value
  372. that we might be trying to produce.
  373. So maybe we should consider all these beacon URLs
  374. as independent block producers and race
  375. and apply our cutoff and race module over there.
  376. So these are the questions that came to mind.
  377. So I basically put that on the back burner
  378. and trying to push V3 out.
  379. Also, I looked into the Whisk spec,
  380. dig through it and sort of got into
  381. and understood, had some discussions with Lion with this.
  382. And yeah, that's mostly it.
  383. And did some couple of reviews.
  384. - Thanks, Gajinder.
  385. I'm curious, did anybody give any thought
  386. to this sort of fallback URL race at all
  387. and how it should be structured?
  388. I'm interested to know like the sort of trade-offs
  389. in the way that we're designing this.
  390. - I mean, I think Vouch is the only one
  391. which races multiple beacon nodes.
  392. Most of the other validators are sort of hooked to one node
  393. that's the basic assumption. Not really sure what the other clients' observations
  394. have been on the fallback node. Maybe I'll ask someone. But Vouch does race multiple beacon nodes
  395. to get the most optimal proposal. And they must be doing some sort of a wait for sure, because
  396. it is very possible that they are hooked up onto the same system where there is the same builder
  397. attached to the beacon node. So it might be a totally homogeneous kind of setup for the
  398. beacon nodes. They are attached to the same kind of builders. So it could be that. But if,
  399. for example, you know, your registration to one
  400. particular beacon node failed, it will alter the timelines and the value of the block that it is
  401. producing, which will anyway impact the final blocks that the validator should be weighing to see
  402. which one it should pick for block proposal. And obviously they will come at different times, so
  403. you can't really simply say that, okay, whichever one resolves first will win. I think our strategy, where
  404. we say that there is a two-second cutoff, and after that,
  405. if not all resolved or a few errored, then we'll basically race to see whoever resolved first.
  406. So basically we wait till the cutoff and then pick the winner. I think that is an optimal strategy
  407. to go for and I'm not sure how other clients are sort of implementing that.
  408. But even if we move our race to the beacon node,
  409. where again, we'll use the same race
  410. against builder and execution.
  411. But I think it would still make sense
  412. with regard to having fallback nodes,
  413. that we should race them
  414. and we should basically start proposal flow parallelly on,
  415. we should at least start the produce block flow parallel
  416. on all of them and basically again, cut off and race.
  417. And for publishing the blocks again,
  418. we should broadcast rather than saying that, okay,
  419. we'll send to one and then we will,
  420. if it fails, then we'll fall back to the other nodes.
  421. I think these are the things that we can do
  422. and we'll need to sort of have a different kind of treatment
  423. for how fallback beacons are treated in validator.
  424. So I'm thinking maybe I'll introduce a method
  425. where basically the HTTP client will race all the URLs
  426. with some cutoff and some threshold,
  427. with some cutoff and timeout.
  428. So we can get this very generic interface
  429. that we can easily plug into any calls that we want to run through this mode.
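To make the cutoff-and-race idea concrete, here is a hedged sketch of what such a generic helper could look like; the function name, options, and scoring are hypothetical, not an existing Lodestar API:

```ts
// Sketch: start requests to all configured beacon node URLs in parallel, wait until a cutoff,
// then pick the best result seen so far (e.g. by block value); if nothing resolved by the
// cutoff, fall back to whichever request resolves first.
async function raceWithCutoff<T>(
  tasks: (() => Promise<T>)[],
  cutoffMs: number,
  score: (result: T) => number
): Promise<T> {
  const results: T[] = [];
  const pending = tasks.map((task) =>
    task().then((result) => {
      results.push(result);
      return result;
    })
  );
  // allSettled attaches handlers to every promise so late rejections are not left unhandled
  const settled = Promise.allSettled(pending);
  await Promise.race([new Promise((resolve) => setTimeout(resolve, cutoffMs)), settled]);
  if (results.length > 0) {
    return results.reduce((best, r) => (score(r) > score(best) ? r : best));
  }
  // Nothing resolved within the cutoff: take whichever resolves first
  return Promise.race(pending);
}
```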
  430. Okay, sounds good. Yeah, I was just curious because with Vouch, I'm not sure,
  431. with Vouch, how much the timing matters compared to the value of the blocks.
  432. There is one Lido node operator who is running Lodestar with Vouch, or at least I asked him
  433. to, but I don't have any data back yet to see how we perform against the other clients
  434. on Vouch yet.
  435. But that would be a very interesting metric point to see how often Lodestar is actually
  436. viable for them on Vouch.
  437. I mean, if he knows the specifics of how Vouch is running the proposal flow, I think
  438. it would be nice to sort of you know discuss it with them.
  439. Otherwise, we can definitely dig into the code.
  440. But if somehow someone can give us the knowledge of how Vouch is doing it, why not?
  441. Okay.
  442. Okay, thanks, Gajinder.
  443.  
  444.  
  446. All right, let's move forward with NC. How's it going?
  447. Hey, guys.
  448. All right. So last week on the ePBS side, I think we focused on the inclusion list design in terms of the discussion.
  449. And there are a lot of inclusion list designs out there.
  450. You have a forward inclusion list, you have same slot or same block,
  451. top of block, bottom of block design.
  452. But nonetheless, in terms of engine API,
  453. I think they pretty much share the same specs.
  454. So I wrote that up.
  455. Here, I'll send out the link.
  456. So for anyone who is interested,
  457. any feedback is welcome on this engine API part.
  458. Right, and then for the rest,
  459. regarding the validator spec, builder spec,
  460. I think, you know, the Prysm folks are still,
  461. you know, picking their brains
  462. and there's just a lot of like, you know,
  463. small details that need to be considered.
  464. That's all on the ePBS.
  465. And then for 6110, I did an analysis on the pubkey cache
  466. in the current code base of Lodestar.
  467. Right.
  468. Personally, I think that like, I don't,
  469. there is really no need for the unfinalized
  470. index to pubkey cache.
  471. Not only because the beacon API,
  472. it doesn't really use the index to pubkey,
  473. like beacon API only uses the other way around,
  474. pubkey to index.
  475. But also like for any single use case out there,
  476. they're using index to pubkey,
  477. like, it always requires an active validator.
  478. So there's no need for the unfinalized cache.
  479. I saw like comments from Gajinder.
  480. There's one point saying like non-finality.
  481. This is something that we still need to think about.
  482. But nonetheless, like I still, you know,
  483. wanna get some more input from Lion
  484. and hopefully we can settle down with the refactoring
  485. with the pubkey cache design,
  486. and I can go ahead and start the coding.
  487. Right, that's pretty much for me.
  488. - Thanks, NC.
  489. Awesome.
  490. We'll take a look at that in a bit.
  491.  
  492.  
  493. All right, let's move forward with Cayman.
  494. Hey, y'all, so last week I got libp2p updated
  495. to the latest version.
  496. Fairly straightforward update this time.
  497. Basically, there was a lot of package renaming.
  498. And now js-libp2p is a monorepo.
  499. So if you decide to contribute, just keep that in mind.
  500. It's all in the libp2p/js-libp2p repo.
  501. Except for the ChainSafe maintained packages,
  502. gossipsub, noise and yamux.
  503. So now that that's in, that unblocks Tuyen
  504. and some of the things he's doing with gossipsub
  505. and yamux and doing basically any other upgrades.
  506. We could not upgrade any of those dependencies
  507. without upgrading libp2p first because of all the breaking
  508. changes.
  509. And the other thing I worked on was adding a boot node CLI
  510. command.
  511. It's functional, but I just wanted
  512. to see if I could do a little more cleanup on it
  513. and see if we can reuse more code between the beacon node
  514. CLI initialization process and the boot node initialization
  515. process.
  516. So hoping to get that polished this week.
  517. And the other thing I was working on
  518. was digging into Yamux.
  519. So currently Yamux is still slightly slower than Mplex.
  520. And I'm really puzzled why.
  521. But I'm basically trying to add all the little tweaks that
  522. are added to Mplex, little performance tweaks,
  523. to see if that brings things back to a comparable place.
  524. But so far, I have not gotten it,
  525. at least in these very naive comparison tests,
  526. to be one to one.
  527. It's still a little slower.
  528. Like, I don't know, like 5% slower, 10% slower,
  529. something like that.
  530. And I think I was able to see a memory leak just
  531. in the comparison tests.
  532. So sending a bunch of messages, I
  533. was able to see memory rise somewhat
  534. and took a heap snapshot.
  535. But I'm having problems viewing the snapshot.
  536. So as far as I know, there's only a single tool
  537. that can visualize a Heap snapshot,
  538. and that's the Chrome DevTools.
  539. Anyone who knows any other tooling that
  540. can help show a Heap snapshot, please tell me.
  541. But my Chrome DevTools is basically just spinning.
  542. It's basically waiting.
  543. It says building the dominator tree.
  544. >>Can you try the Brave DevTool?
  545. In the past, I used to have the same issue.
  546. It cannot be viewable by Chrome DevTool,
  547. but Brave DevTool somehow works.
  548. >>Brave DevTool.
  549. OK.
  550. Thanks.
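As an aside on the heap snapshot workflow, snapshots can also be captured programmatically with Node's standard v8 API and then loaded into whichever DevTools works (nothing Lodestar-specific here):

```ts
// Capture a heap snapshot from inside the process; the resulting .heapsnapshot file can be
// loaded via the Memory tab of Chrome/Brave DevTools.
import {writeHeapSnapshot} from "node:v8";

const file = writeHeapSnapshot(); // writes e.g. Heap-<date>-<pid>.heapsnapshot in the cwd
console.log(`heap snapshot written to ${file}`);
```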
  551. Excellent.
  552. So yeah, this week, I'm going to finish polishing up
  553. the boot nodes PR and keep digging into the Yamux,
  554. see if I can squash this memory leak,
  555. and hopefully add some little performance tweaks
  556. to try to get it up to the level of Mplex.
  557. And that's it for me.
  558. Thanks, Cayman.
  559.  
  560.  
  561. All right, let's go with Nazar.
  562. Thank you.
  563. The last-- can you guys hear me?
  564. So last week, I was working on making
  565. the Prover work with Web3.js version 4.
  566. There were some issues, so I opened one PR to fix those.
  567. Now it is working fine.
  568. And while I was doing that, I found out
  569. that we have a bug in the browser logger.
  570. So it was not logging anything in the browser.
  571. I fixed that, and along the way,
  572. I saw that there are no tests,
  573. unit tests, for, like, the environment logger and browser logger
  574. and some of the other logging stuff.
  575. Because of that, we were unable to identify it earlier.
  576. So I wrote the unit test for the logger package.
  577. And yes, and currently I finished up the Lightclient demo PR.
  578. It's ready for the review.
  579. If someone is available, can you review it?
  580. So that is like using the Prover now
  581. instead of the Light client directly.
  582. And yeah, currently I'm working on a PR
  583. to decouple the beacon and validator in the simulation test,
  584. so that we can mix them up the way we can right now mix
  585. the execution and the consensus clients,
  586. so we can mix up execution, beacon,
  587. and our validator in the simulation test.
  588. So this PR will be done today.
  589. It's almost finished, and next week,
  590. like during this week,
  591. I have planned to start working on
  592. integration with the MetaMask.
  593. I started that draft earlier.
  594. So I already have something to work on.
  595. I will complete that draft and make it
  596. like some first presentable demo
  597. of our Prover working inside MetaMask.
  598. Yep, that's all from me.
  599. - Thanks, Nazar.
  600. And we have Matt.
  601.  
  602.  
  603. All right, so a couple things that I worked on this week:
  604. I got the BLST PR done and integrated into Lodestar.
  605. Both of the, or actually all of the rest
  606. of the remaining PRs that were up in the BLST repo
  607. all got approved.
  608. So just CI and a couple other things there.
  609. Looking at the metrics in Lodestar.
  610. So it was running slower with four libuv threads,
  611. because we had 10 worker threads
  612. or eight worker threads originally, depending.
  613. So once I boosted the worker pool
  614. up to match the number of threads that we had,
  615. it definitely seems like it runs a lot smoother.
  616. A lot of the metrics are within five, 10%
  617. of what they were originally by that.
  618. Most of them are a little bit better.
  619. Nothing is really like knocking the socks off.
  620. Like there's a couple of things that look good,
  621. like the number of aggregated keys,
  622. number of signature sets,
  623. and a few others of the BLST metrics look much better.
  624. Things like head drift
  625. or the block epoch transition times
  626. look pretty good as well.
  627. Block production times are doing okay.
  628. The long live peers are doing better.
  629. there are a few things that look like they're doing a little worse.
  630. So I don't know if that's just how the implementation happened,
  631. or if it's the library itself.
  632. But overall, it seems like it's either net neutral or a little bit
  633. better by a fraction.
  634. But the big thing is the CPU time has dropped by 30%.
  635. So whereas if we have 64% CPU on the unstable large,
  636. it's running 48% on feature two.
  637. So there's a lot of opportunity in there
  638. just by opening up resources.
  639. It is a little bit heavier on memory,
  640. but there's opportunities there as well, I think.
  641. And so it could use some more tuning,
  642. but it's getting a lot closer
  643. and it's starting to feel like it's working okay.
  644. But obviously I leave that to everybody else
  645. to just take a look and make sure I'm not missing something
  646. because a lot of these metrics are still relatively new.
  647. And then I also have been playing
  648. with the semi-space and new space.
  649. I deployed several different versions over the week.
  650. And it definitely, the more new space we have,
  651. the better it performs.
  652. We're garbage collecting 200 gigs on average, roughly,
  653. on mainnet, and spending 10 to 12%
  654. of the time doing garbage collection on network processing.
  655. By increasing the new space,
  656. the more we increase it, up to that threshold
  657. of what our average garbage collection is currently
  658. on the scavenge side,
  659. it basically just continues to reduce
  660. the garbage collection time, pause time.
  661. I'm gonna deploy one more where I'm gonna go way over
  662. to 512, which should be double of what we're actually doing
  663. for garbage collection to see how that affects it.
  664. I've been paying attention though,
  665. mostly to the smaller nodes to find out like,
  666. because they are actually only doing 30, 40, 50 megabytes
  667. of garbage collection on the scavenge side.
  668. And even though we're setting it to 64, 128 or 256,
  669. it's really not affecting the performance.
  670. So like setting it much higher
  671. than what the scavenge quantity is,
  672. doesn't seem like it affects it on the small one.
  673. So I'm also expecting that same behavior
  674. on the large instances
  675. by setting the garbage collection too high.
  676. But I just wanna just prove out that that theory is correct
  677. by deploying it and making sure
  678. that the assumption is correct.
  679. And I'll do another write up.
  680. I put up some information in there just for posterity.
  681. And then this information is all gonna directly tie over
  682. to the worker thread when we deploy that.
  683. So it's applicable to both whether network is on main thread
  684. or on the worker thread.
  685. And the rates are gonna be the same
  686. because I was actually playing with this
  687. a couple of weeks ago as well with the new space.
  688. And I was seeing the same thing with the same numbers
  689. is why I chose those numbers this week
  690. without the worker thread,
  691. just to make sure that like the space sizing
  692. was commensurate with, you know,
  693. when it's on main thread as it is on a worker.
  694. So it'll apply to both situations.
  695. And then I spent a little bit more time trying to research.
  696. Basically what I did was, is I went through GitHub
  697. and I just looked for the error zero.
  698. And I found it in a couple of interesting places
  699. that I think is really what's driving what our issue is.
  700. And then I was gonna try to spend a little bit of time
  701. this week looking at that, see if I can actually hone down.
  702. But I think it's coming from the serialization,
  703. deserialization in order to cache
  704. and whether we build or don't build in NX in the Monorepo.
  705. Because basically it's part of the serializer, and, like, during startup it's basically deserializing and serializing
  706. the cache state to see if we've changed anything, is I think what's happening.
  707. I might be totally off base, but it's the next thread that I found when I was looking through a bunch of the different errors.
  708. I posted a few of them into our Discord channel just to see for posterity there as well.
  709. And then hopefully pulling on that thread, we'll find something.
  710. And then I also spent a little bit of time, not a ton, but I did spend a little bit of time.
  711. Lion had put together a, not a POC, but like a couple ideas on a branch for how to update
  712. the blinded blocks, and it's a little bit different than the implementation I was using,
  713. and he's got some really good ideas in there. So I studied what he did and then brought that into
  714. my own branch and I'm going to go through and continue processing our chunking off bits and
  715. pieces of what he had done and bringing them in and then merging my code with his code and see
  716. if I can get that to run. And then I'm going to start with unit tests on just the transition
  717. functions first instead of trying to build the whole thing and see if I can piecemeal it
  718. and narrow down some of the complexity, because it is a fairly difficult PR,
  719. which is what I realized last time. So that's going to be probably the big heart of it, anything that's
  720. necessary in order to try to get the little bits and pieces for BLST. But again, I would love,
  721. Cayman, if you want to just tear into the metrics on feature two and, like, be brutally honest, like,
  722. "this looks good, this looks shit," let me know what it is, so that I can also, like, kind of, you
  723. know, borrow your eyes in order to see what you see, so that I can kind of train myself
  724. to understand and make sure that I'm looking at the right things as well. That'd be super helpful.
  725. And to you as well, anything that you guys see would be amazing, because I want to make sure
  726. that I'm looking at that correctly. And then I'll focus on the blinded blocks,
  727. because I know that's going to be something that's going to go into v1.10 or v1.11.
  728.  
  729.  
  730. All right, thanks, Matt.
  731. Makes me wonder, if you guys have an opinion on whether or not three feature groups is enough
  732. for you to do, like, all your testing on, that's cool. But if you do feel like we are
  733. consistently running out of server groups to test things on, let me know, I can inquire
  734. with infrastructure to set up, like, a feature 4 or something. When we're not testing things,
  735. you can also use beta as well, so that's just, like, another option. I have been. Okay, perfect. Cool.
  736. Like, because, as a matter of fact, I was testing 64, 128, and 256, and then I put BLST
  737. up also, and I took one down, so I was using three this week. I'm going to be taking some of
  738. those down, so three will free up. But yeah, I definitely felt like I was monopolizing,
  739. so I apologize. I'm going to take those down now that I've got a couple days of data on them.
  740. I can basically pull them down.
  741.  
  742.  