  1. Everyone, welcome to the June 27th stand up.
  2. Good to have you guys here. Let's get on to some points.
  3. Okay, let me pull mine up here.
  4. I had made a small proposal for a patch release that may include, let's see here, 5714 and
  5. 5708.
  6. They're more so just the annoying type of logs that we're printing, I think
  7. it was like the duplicate...
  8. seen duplicate blocks, I believe,
  9. and also the node-is-syncing,
  10. or the syncing logs.
  11. So I was wondering if you guys are on the same page with this,
  12. like, should we push out a...
  13. a patch release with some of these PRs here,
  14. and should it include anything else?
  15. Does 1.9 include gnosis, Shapella?
  16. - No, it doesn't.
  17. Not yet.
  18. - We could sneak that in.
  19. - Sure.
  20. That sounds like a good idea.
  21. - Also, there is a bug in our metrics.
  22. There is a one-line PR that came in and was merged.
  23. We should add that in too.
  24. - Got it.
  25. Do you have the PR number by any chance?
  26. - No.
  27. - No, okay.
  28. No problem, I'll look for it just in case
  29. it isn't captured here.
  30. Any other ideas for things that should go in there?
  31. I'm looking for, I guess, more like things
  32. that are not large changes or anything,
  33. but are annoying to users.
  34. - I mean, we could include a fix for the beacon node
  35. not shutting down in a few cases,
  36. and then we can just revisit it for 1.10 maybe
  37. and do a proper fix.
  38. Yep.
  39. That sounds good.
  40. That was the issue that SeaMonkey was reporting, right?
  41. Yeah, I think it's the same thing I already noted down in the issue.
  42. It happens in a few cases.
  43. And I think it's just safe to exit the process explicitly right now.
  44. I think this has the potential of having collateral damage.
  45. So if it's not super urgent, best to do later?
  46. Or what do you think?
  47. Like, imagine by some chance there
  48. is something that causes the process to exit too early,
  49. and then we don't persist something,
  50. and then the node goes into a bad state?
  51. I don't know.
  52. I think there is more risk on that PR than the others.
  53. Yeah, but the thing is, so we would just
  54. do it after all the close sequence is
  55. done in the beacon node.
  56. But yeah, I guess it could have some side effects.
  57. But on the other hand, I mean,
  58. the beacon node was exiting way earlier
  59. for several months, I guess.
  60. I'm not sure how long that other issue was there, but yeah.
  61. - Yeah, my feeling is it's pretty safe.
  62. 'Cause like you said, it's after all the things
  63. are supposed to be closed.
  64. - Yeah.
  65. I guess it will mostly be annoying for people that don't use a process manager because they
  66. would have to press Ctrl-C twice.
  67. Because usually, I mean, Docker and other things like systemd would anyway have a timeout
  68. after 30 seconds and then just force close the process.
  69. Okay.
  70. In that case, we could definitely consider it and make a decision as we're getting closer
  71. to that PR getting merged.
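For context, here is a minimal sketch (in TypeScript) of the explicit-exit-after-shutdown approach discussed above. The types and function names are hypothetical rather than Lodestar's actual API; the only point is that the forced exit happens after the close sequence completes.

```ts
// Hypothetical sketch: run the full close sequence, then exit explicitly in
// case a stray handle (socket, timer, worker) keeps the event loop alive.
interface Closable {
  close(): Promise<void>;
}

function registerShutdown(node: Closable): void {
  const onSignal = async (): Promise<void> => {
    try {
      await node.close(); // persist state, close the DB, stop the network first
    } finally {
      // Only after the close sequence finishes do we exit explicitly; this is
      // the "collateral damage" trade-off discussed above.
      process.exit(0);
    }
  };
  process.once("SIGINT", () => void onSignal());
  process.once("SIGTERM", () => void onSignal());
}
```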
  72. Any other ideas for inclusion for 1.9.1?
  73. Otherwise, we'll just go with that for now and then discuss any additional stuff async.
  74. Okay.
  75. So we had a plan, basically, after we released 1.9, to look into including Node 20, then eventually
  76. upgrading libp2p while dealing with some of the network stuff, which Lion has
  77. been dealing with along with Ben.
  78. Do we have any sort of observations or concerns, I guess about node 20?
  79. Cayman had deployed that on feat one and Tuyen had some notes here that I can read out as
  80. well.
  81. Okay, so just for the AI bot I guess I'll just read out what Tuyen wrote.
  82. On the feature one mainnet node, the first garbage collection pause time rate is greater
  83. than or equal to 50% higher than stable mainnet, while the second garbage collection pause time
  84. rate does not look correct: scavenges are greater than 10 million percent.
  85. Whoa.
  86. Okay.
  87. But it's a bug that we fixed.
  88. Okay.
  89. In the metrics.
  90. Got it.
  91. Okay.
  92. Seems feature one mainnet is able to process 50% more attestations, 12K versus 8K
  93. per slot.
  94. The "till become head" metric in feature one mainnet is consistently better.
  95. It's 2.4 seconds versus 2.6 to 2.9.
  96. Looks interesting; only the garbage collection pause time rate number looks strange.
  97. Is there anything else people want to add from what they observed on feature one?
  98. So that metric, that 10 million, should have six zeros removed from the end.
  99. So it should be 10%.
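As a purely hypothetical illustration of the kind of scaling bug that produces a number like that (this is not the actual Lodestar metric code): a GC pause-time rate is a percentage of wall-clock time, so a missing unit conversion when accumulating pause durations inflates the reported percentage by orders of magnitude.

```ts
import {PerformanceObserver} from "node:perf_hooks";

// Accumulate GC pause time; perf_hooks reports entry.duration in milliseconds.
let gcPauseMs = 0;
const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    gcPauseMs += entry.duration;
  }
});
obs.observe({entryTypes: ["gc"]});

// Pause-time rate as a percentage of elapsed wall-clock time.
// Correct version: convert milliseconds to seconds before dividing.
function gcPauseRatePercent(elapsedSec: number): number {
  return (gcPauseMs / 1000 / elapsedSec) * 100;
}
// Dropping a conversion like this, or scaling in the wrong direction, yields
// the "remove six zeros" symptom mentioned above.
```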
  100. I did do a very quick comparison between feature one and our
  101. stable on mainnet and did see some improvements, I believe, on some of the tracking that he
  102. did for...
  103. Let me just pull it up here so I can read it out.
  104. The event loop lag, I believe, looked better on feature one.
  105. There was less head drift.
  106. I think some of the Prometheus scrape durations
  107. and API response times were better.
  108. Did anybody notice anything like that?
  109. I feel like there are definitely some performance improvements.
  110. - Yeah, I was looking more for like,
  111. I was trying to find, I don't know,
  112. errors that were happening that weren't happening on node 18.
  113. And I didn't see anything.
  114. So my feeling is that if there's no--
  115. like if nothing's breaking and we're
  116. wanting to move to node 20, then we should probably do it.
  117. I guess it also is good that the metrics look happy
  118. and there are no performance regressions.
  119. Sounds good.
  120. Okay, great work on that. What else do I have on here? I guess, yeah, you spent most of the week
  121. on the Node 20 stuff. If you have any updates on the P2P or anything on the network thread, guys,
  122. feel free to just give us a quick update on those.
  123. - My branch is ready to test.
  124. I've fixed conflicts and merged in the, I guess,
  125. the latest unstable.
  126. Maybe I need to merge it again today.
  127. But yeah, I guess I can snag feature one
  128. and deploy the new libp2p there.
  129. Right. Okay. What else? Tuyen also had an update to include,
  130. which was the implemented deterministic long-lived subnets.
  131. But just for the recording here, he wrote,
  132. The main change is to always connect
  133. to exactly two subnets per node instead
  134. of based on number of validators,
  135. which reduced the subnet mesh peers a lot,
  136. hence the I/O lag issue.
  137. So of course, this is important for home stakers,
  138. and then he would be able to enable
  139. a new feature flag, deterministic long-lived
  140. attnets, for the next release.
  141. And then he's also debugging something with Nethermind
  142. as well.
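For reference, a rough sketch of the deterministic long-lived subnet idea described above: derive a fixed number of subnets per node from the node ID and the current subscription period, instead of from the validator count. The constants and hashing here are simplified for illustration; the consensus spec's version uses a node-ID prefix plus compute_shuffled_index, and Lodestar's flag and implementation may differ.

```ts
import {createHash} from "node:crypto";

// Illustrative constants; real values come from the consensus spec / config.
const ATTESTATION_SUBNET_COUNT = 64;
const SUBNETS_PER_NODE = 2; // "exactly two subnets per node", as described above
const EPOCHS_PER_SUBNET_SUBSCRIPTION = 256; // how often subscriptions rotate

// Simplified sketch: derive the long-lived subnets purely from the node ID and
// the current subscription period, so any peer can recompute any other peer's
// subnets. Here we just hash (nodeId, period, index) to make the idea concrete.
export function computeLongLivedSubnets(nodeId: Uint8Array, epoch: number): number[] {
  const period = Math.floor(epoch / EPOCHS_PER_SUBNET_SUBSCRIPTION);
  const subnets = new Set<number>();
  for (let i = 0; subnets.size < SUBNETS_PER_NODE; i++) {
    const digest = createHash("sha256")
      .update(nodeId)
      .update(Buffer.from(`${period}:${i}`))
      .digest();
    subnets.add(digest.readUInt32BE(0) % ATTESTATION_SUBNET_COUNT);
  }
  return [...subnets];
}
```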
  143. Any other points to add to that or to planning?
  144. Looking forward to that fix, or I guess that change
  145. that Tuyen’s doing to the long-lived subnets.
  146. That should really help intermediate level stakers,
  147. I guess, do less work.
  148. - Yeah.
  149. Cool.
  150. So if there's no additional points for planning,
  151. We can go into updates then.
  152. Okay, we'll start with Lion today.
  153. Hello.
  154. So I keep focusing on a bunch of spec things.
  155. Basically, more of what we spoke about last time.
  156. And thank you all for the feedback.
  157. I shared with the researchers and I think we're drafting an EIP already.
  158. So yeah, that's all for me.
  159. Awesome. Thanks, Lion.
  160. Next, we have Gajinder.
  161. Thanks, Phil.
  162. So basically, I...
  163. So there was an issue on DevNet 6 last time that I also talked about where Lodestar was not able to
  164. sync blobs by range. And I thought it was an issue with Lighthouse, because they were returning
  165. 32 blobs instead of the 24 blobs that were required in a particular epoch. But a similar issue was
  166. observed with other clients as well. So I went debugging and figured out that, you know,
  167. in our max request size, we were not applying the factor of
  168. multiplying the count by the max blobs per block, because count is generally the number of slots
  169. that you are requesting. So there was a mismatch, and that was causing what we were noticing,
  170. but this PR resolved the issue.
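A small sketch of the kind of fix being described; the names and constant values are illustrative rather than Lodestar's exact code. The point is that a by-range request counts slots, so the response cap has to be scaled by the maximum blobs per block.

```ts
// Illustrative constants; the real values come from the network config.
const MAX_BLOBS_PER_BLOCK = 4; // assumed value, for illustration only
const MAX_REQUEST_BLOCKS = 1024;

// A blobSidecarsByRange request asks for `count` slots, but each slot's block
// can carry up to MAX_BLOBS_PER_BLOCK blob sidecars. If the response limit is
// derived from the slot count alone, a fully packed range looks "too big" and
// otherwise-valid responses get rejected.
export function maxBlobSidecarsForRequest(slotCount: number): number {
  const count = Math.min(slotCount, MAX_REQUEST_BLOCKS);
  return count * MAX_BLOBS_PER_BLOCK; // the missing multiplication described above
}
```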
  171. It resolved the issue in the sense that we started moving forward, but then we never actually synced to the head.
  172. Then I debugged again and figured out that the reason for that is that, in this entire range,
  173. the chain wasn't finalizing because only a few nodes were up, mostly Lighthouse. And so
  174. the chain was not finalizing,
  175. let's say, for more than 30,000 slots.
  176. So we would basically stall after syncing about 15,000,
  177. or 11 to 15,000 slots; basically, you know, we would stall.
  178. And then I started looking into the logs,
  179. all the request responses were going in and out.
  180. That was not a problem.
  181. But what I figured out was that
  182. we would add a peer
  183. and start syncing from it,
  184. and over some period of time, it would disconnect.
  185. And then we'd reconnect with that peer or some other peer,
  186. and it would start syncing again from the last finalized,
  187. which was like epoch 16.
  188. And because whatever jobs we create,
  189. we create with respect to the finalized epoch.
  190. So whenever we connect with a new peer and start syncing,
  191. I think we start syncing from finalized itself.
  192. So we need to sort of figure out a solution for this
  193. where the chain has not finalized for, let's say,
  194. some 30,000 slots.
  195. And we have to figure out, okay,
  196. for the new peer which is joining in,
  197. can we put it on the chain that we have already synced
  198. instead of starting it from the last finalized
  199. that we have in our local chain.
  200. So I think we might need to tweak our sync process
  201. for the scenarios where,
  202. I mean, it's a testnet scenario, but other clients can handle it.
  203. And in testnets, this scenario can occur frequently,
  204. and it has been occurring.
  205. So we basically should tune up our sync process
  206. to enable us to get through this scenario as well.
  207. There was no other error I could figure out,
  208. just that, you know, eventually
  209. the peer disconnects
  210. and we'll start syncing with new peers
  211. from the same finalized epoch that we had locally.
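To make the idea concrete, here is a rough, hypothetical sketch of attaching a newly connected peer to an already-progressing sync chain instead of starting a fresh chain from the locally finalized epoch. None of these types or names are Lodestar's actual sync implementation.

```ts
// Hypothetical types for illustration only.
interface SyncChain {
  startEpoch: number;   // where this chain began syncing from
  currentEpoch: number; // how far it has progressed
  targetRoot: string;   // the head/target this chain is syncing toward
  peers: Set<string>;
}

// Idea discussed above: when a new peer advertises the same target we are
// already syncing toward, add it to the existing chain so batches continue
// from `currentEpoch`, rather than creating a new chain anchored at our
// local finalized epoch, which restarts thousands of slots of work when
// the network itself has not finalized for a long time.
function addPeerToSync(
  chains: SyncChain[],
  peerId: string,
  peerTargetRoot: string,
  localFinalizedEpoch: number
): SyncChain {
  const existing = chains.find((c) => c.targetRoot === peerTargetRoot);
  if (existing) {
    existing.peers.add(peerId);
    return existing; // resume from existing.currentEpoch, not from finalized
  }
  const fresh: SyncChain = {
    startEpoch: localFinalizedEpoch,
    currentEpoch: localFinalizedEpoch,
    targetRoot: peerTargetRoot,
    peers: new Set([peerId]),
  };
  chains.push(fresh);
  return fresh;
}
```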
  212. So that is
  213. something that I have in my head to work on.
  214. But all of the things were sort of resolved for DevNet 6.
  215. Now DevNet 6 is going to be relaunched as DevNet 7.
  216. So no real work needs to be done over there,
  217. unless something else comes up.
  218. And for DevNet-- so the earlier DevNet 7 is now DevNet 8.
  219. And basically, it will be launched in two weeks' time.
  220. And two PRs for DevNet 8 are already in,
  221. and one PR I'm working on.
  222. And I also did a PEEPanEIP presentation
  223. for direct changes, basically, build and edit.
  224. I think the recording will be out soon.
  225. Thanks, Gajinder.
  226. Okay.
  227. We will move forward now with Nico.
  228. Hey, so I was investigating the issue I mentioned last time about the doppelganger protection.
  229. So I found that we definitely have a chance there that we have false positives.
  230. I basically noted down the issue on GitHub.
  231. Yeah, I also checked what other clients do, how they implemented it.
  232. And I think our implementation is pretty similar to Lighthouse.
  233. So basically the issue is that we check for attestations in blocks, and those can be included
  234. in the next epoch, which we then check,
  235. and that causes the false positives.
  236. So I'm not sure how to properly resolve this
  237. other than just increasing the wait epoch time
  238. by one more epoch, which would be not that great
  239. in terms of UX.
  240. The other thing I wanted, or I also checked
  241. is if we can maybe implement zero-downtime
  242. doppelganger protection, which would be possible, I think,
  243. if we just check the local attestations
  244. that the client produced.
  245. And if there's an attestation in the previous epoch,
  246. we could skip doppelganger protection for that validator.
  247. I think that should be secure,
  248. but yeah, I still have to look into that
  249. and how to implement that.
  250. So there was one issue that the registration of signers
  251. is sync at the moment and we would have to make it async,
  252. which just causes a few downstream issues in the code.
  253. So I haven't really further looked into that,
  254. but I think in theory it should be secure to do.
  255. I think it's even more secure than always waiting two epochs
  256. because if you would start two validator clients
  257. at the same time, one with a DB and one without one,
  258. doppelganger protection would not work
  259. because both would idle and then both would start
  260. attesting at the same time.
  261. And with that approach,
  262. since you can only connect to one database
  263. or the database has a lock mechanism,
  264. this would not be possible because two validator clients
  265. cannot connect to one DB.
  266. So I guess it would even be an improvement
  267. unless I missed something
  268. in terms of the security around this.
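A rough sketch of the zero-downtime idea described above: skip doppelganger protection for a validator if our own database shows this client produced an attestation in the previous epoch. Types and names are hypothetical, not Lodestar's actual validator-client code.

```ts
// Hypothetical illustration of the "zero downtime" doppelganger idea above.
interface SlashingProtectionDb {
  // Highest epoch for which this validator client itself signed an attestation,
  // or null if it has never signed one (e.g. a fresh database).
  getLastSignedAttestationEpoch(pubkey: string): number | null;
}

function needsDoppelgangerWait(
  db: SlashingProtectionDb,
  pubkey: string,
  currentEpoch: number
): boolean {
  const lastSigned = db.getLastSignedAttestationEpoch(pubkey);
  // If we (this client, this DB) attested in the previous epoch, any duplicate
  // signer would have to share this locked DB, so we can skip the wait.
  if (lastSigned !== null && lastSigned >= currentEpoch - 1) {
    return false;
  }
  // Otherwise fall back to the normal doppelganger wait of N epochs.
  return true;
}
```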
  269. Yeah, so besides that,
  270. I looked at the latest beacon API spec,
  271. created some issues for that.
  272. Yeah, and just checked why the spec tests are failing.
  273. And yeah, so this week I really want to focus
  274. on the whole region topic.
  275. And yeah, that will be mostly it, I think.
  276. Thanks Nico for posting up those implementation things
  277. that were missing there.
  278. I have previously had people ask me
  279. how to get started on Lodestar.
  280. And we've always had trouble finding people who--
  281. or issues that were easy enough for people
  282. to get started or tackle.
  283. So definitely if any of you guys spot anything where it's like,
  284. hey, this would be a good intro for somebody
  285. to pick up Lodestar and to start understanding it.
  286. We definitely need more of those help wanted
  287. or first time issues tagged.
  288. So thanks for that.
  289. All right, moving on, we have Cayman.
  290. - Hey, yeah.
  291. So last week, we put in a few PRs to get Node 20 ready
  292. and put it up over the weekend.
  293. And now that we've got consensus on it,
  294. I will put in another PR to bump all of our CI to node 20
  295. and also to bump our Docker builds to node 20.
  296. Other than that, I was doing some research,
  297. just getting caught up on the discv5 spec
  298. And some improvements that are happening there.
  299. There's a discv5.2 that is in progress.
  300. Right now we're on 5.1.
  301. So looking at--
  302. just getting prepared.
  303. I don't think we need to implement anything right now
  304. because the spec is still not fully formed.
  305. Don't want to do it too soon and then have things change when
  306. there's no real benefit to us.
  307. And other than that, I was looking into some--
  308. I was looking at other libp2p issues,
  309. working on--
  310. still trying to convince these guys
  311. to use TypeScript for some of their lower level libraries.
  312. I know some of you guys are subscribed to that issue.
  313. And there's a lot of back and forth.
  314. But the maintainer just reviewed my PR
  315. to convert to TypeScript.
  316. So there's movement.
  317. And he said he doesn't hate it.
  318. But that his review does not mean that he checks off on it.
  319. So it's still in an intermediate area.
  320. Yeah.
  321. So yeah, this week we'll deploy the libp2p branch today
  322. so that we can start getting some metrics
  323. and see how that goes.
  324. But I'll be working on that.
  325. And if I get to it, also trying to add
  326. IPv6 support, which is almost all there.
  327. Just haven't actually hooked it all together.
  328. So that's it for me.
  329. Thanks, Cayman.
  330. Speaking of IPv6, I think Lighthouse
  331. has some boot nodes in IPv6 out there right now.
  332. - Wonderful, okay.
  333. - Yes.
  334. I will try to find the reference for you.
  335. I remember reading that somewhere, but yeah.
  336. - Great.
  337. - And also speaking of boot nodes, John,
  338. we've been wanting to actually deploy some boot nodes
  339. for consensus and execution to help with the diversity.
  340. I can talk to you more about that later as well on a 1:1.
  341. - Sounds great.
  342. - Cool.
  343. - Are we compatible?
  344. Because Barnabas and Pari were asking me about this.
  345. - IPv6 compatible?
  346. No.
  347. Everything is not deployed or published yet.
  348. All right, do you have anything to add, Nazar,
  349. since you've been back?
  350. No?
  351. If not, we'll move on.
  352. - Yeah, I have been back since yesterday, working on a PR.
  353. Hopefully we'll finalize that PR today for review.
  354. It will add some test cases, end-to-end test cases
  355. for the prover package, and it will introduce a new package
  356. called testutils, which will clean up a lot of code
  357. that was scattered around testing
  358. in different packages into one package.
  359. So if you guys came across any useful thing
  360. that you think can be helpful
  361. or that you reuse during testing for other packages,
  362. then you can include that in this new testutils package.
  363. And apart from that, I had a one-on-one with Lion
  364. in the morning and we decided on the further direction for the prover.
  365. And the direction is that after finalizing this end-to-end testing for the prover, we
  366. will mark it public and make sure to release, like, the first release of the
  367. prover package.
  368. And then I will use this first release in our light client demo that we have, showcasing
  369. the light client functionality. So that will help to minimize or reduce a lot of code inside the
  370. light client demo and will help to troubleshoot and actually use this prover package in real
  371. production case in the demo. And once that is done, then I will start working on MetaMask snaps
  372. to create a snap as a proof of concept for the prover. That will be the discussion
  373. initiation for MetaMask team. For production we are not sure if that is the right way,
  374. like MetaMask snaps is the right way or not or it should be part of the MetaMask. But
  375. once we have a snap then we can initiate this discussion with their team on it and then move
  376. forward. That's something I will be doing this week. Tomorrow I will take off. If you
  377. guys know, there's a Christmas-like event for Muslims tomorrow, Eid-ul-Adha, the slaughtering
  378. Eid. So I will take off tomorrow. Thank you.
  379. Thank you, Nazar. All right, and we have Matt.
  380. I got the blst package finalized and fixed the bug.
  381. So it's in the newest format that Gajinder was reviewing.
  382. And it seems like it's working nice, got it deployed to Feature 2,
  383. and metrics looked good for everything that it had touched.
  384. And also turned on the worker, and the metrics improved even further, which is great.
  385. So it's looking nice. We actually did get it deployed to mainnet as well today.
  386. Mainnet feature 2, which Tuyen was using.
  387. And so we did get it deployed there so we can see how it's doing on mainnet.
  388. And then also picked up the database duplication ticket
  389. and dug into state and SSZ and whatnot.
  390. And I was hoping to get that done for today,
  391. but it's actually kind of a tricky little problem there.
  392. But going smoothly.
  393. And then I'll probably do another round of review
  394. with Gajinder this week on the next part of blst,
  395. and then finish up that PR and probably pull the next one,
  396. possibly the one that you put in the channel just after,
  397. because similar state stuff came in.
  398. So I think maybe I'll...
  399. I think it was mimicking Lighthouse, what they're doing.
  400. I'll check that out, though.
  401. Cool, thanks, Matt.
  402. John, do you have anything you would like to bring up at all
  403. for the team while we have you here?
  404. Not at this time.
  405. Okay.
  406. Cool, guys.
  407. I have nothing else other than ETH Waterloo was awesome.
  408. I think 50% of the hackers there were actually new to the Web3 ecosystem, which was pretty cool.
  409. So getting people excited and hoping to get more people to look at other parts of the ecosystem outside of just the dapp stuff.
  410. But yeah, I think it'll be pretty cool, you know, to see these hackathons attract people
  411. away from other parts of tech and into blockchain.
  412. I did have a conversation with, I don't know if any of you guys know, Sebastian from HOPR.
  413. He was kind of interested in some of the validator privacy stuff, and I introduced him a little
  414. bit to what Nym and ChainSafe were doing, and how they implemented that into
  415. Lighthouse, and sort of got him caught up on that. There might be some potential discussions
  416. about having us get involved in terms of trying to implement this spec, perhaps,
  417. or testing it out and gathering some data. I believe I talked to Cayman a little bit about
  418. this already. But yeah, validator privacy is something people are looking to tackle.
  419. I guess not just in protocol, but also in the dapp layer, with like stealth addresses
  420. and stuff like that. But I guess the concern for us would be more so the privacy at
  421. the protocol layer. Seb was the guy who did this experiment to sort of correlate validators
  422. and IP addresses and stuff, and was basically able to list out, you know, a bunch of validators'
  423. IPs, correlate them, and even potentially DDoS some of them. So he's actively looking for
  424. a way to improve privacy at the protocol layer, if anybody is interested. And he's also working
  425. on, I think, private RPC stuff. I think that's what HOPR does.
  426. Cool. Anything else to bring up for stand up this week?
  427. Next Tuesday is July 4.
  428. America Day.
  429. Merca.
  430. All right. Well, you guys have fun on your Independence Day long weekend. I believe
  431. ChainSafe, technically, especially for the Canadians, will also have the third off. I believe
  432. it's the Monday. Yeah, July 3rd. The Canadians will celebrate Canada Day. So.
  433. And then, is ChainSafe Days on the 30th?
  434. Yeah, so technically, yeah,
  435. ChainSafe Days is on then.
  436. Yeah, so there's quite a few
  437. days to enjoy yourself as well.
  438. But knowing us, we'll probably
  439. be around here and there, so.
  440. Yeah, I'm gonna be working, yeah.
  441. I'm planning on...
  442. I'm planning on taking tomorrow off.
  443. That way I...
  444. along with Nazar, but not for...
  445. not for the same reason.
  446. Cool. All right.
  447. Sounds good, guys. See you guys in the metaverse then.
  448. Cheers.
  449. One last thing. Protocol discussion on Thursday, I forgot. So we're going to talk about light
  450. clients and see the prover. Yay, Nazar! See the prover. And yeah, very cool. Come on down,
  451. everybody. Be cool. Yeah, let's do it. Cool. Thank you. Take care, guys. Have a good week.
  452. Bye-bye. Bye-bye.