Lodestar Standup Transcript, July 11, 2023
Okay, so welcome to the July 11 stand up. We're just missing Nazar, but Nazar left some notes here, which I'm going to quickly read out so that they get on the recording. He finished up and created the Lodestar test utils package, so if anyone finds a utility that could be reused, please put it in that package. He also finished the end-to-end test for the prover, made the prover package publishable, and is working on the feature to track execution engine status without having a dependency on the eth_ namespace. So those were his updates.

Going into some other announcements from me: last week I sent you guys a link to the Protocol Guild survey, for those of you who are in Protocol Guild. Trent is requesting that we fill it out by the end of this week, so please get to that if you can. Also, there is a proposal to change some pretty big things about the eligibility requirements. There should be a PR in the Protocol Guild repo for that, for you to take a look at and chime in on.

So that's that for Protocol Guild.
I was in the meeting earlier with the DevOps guys. Thanks, Jon, for coming to that meeting as well. We do have a couple of issues they just need some clarification on in order to continue some of their work. Under the development repo we have issue 154, which is the deployment of Lighthouse and Prysm. There was some discussion in there about what the problem is with having a Lighthouse node permanently connected to a Lodestar node. The problem is that because we don't have backfill enabled, Lighthouse downscores us when it requests blocks from before the checkpoint. We just need to figure out a way to get around the downscoring events, because they will penalize us for basically not being able to serve those blocks.

If we sync from genesis, then they will not attempt to sync from us and they will not downscore us.

Okay, that's a good solution. I'll put that into the issue after the meeting, and then they can continue on that. Unless you have an additional question, Jon, about that issue.

I think that pretty much solves it.

And then we sync from genesis and that solves it.

Yeah, because then we'll have all the blocks, basically. Does that affect our speed, our deployment speed?

Yeah, it does take significantly longer.

Right.

So is that a good solution then, or is it just solving it because it's the only other option that we have right now?

Okay, there's, I guess, an intermediate amount of development work needed to enable backfill sync for Lodestar. That's going to take, what, two months? And syncing Lighthouse from genesis will take one week, so it's definitely faster.

Okay, cool. I just wanted to verify.
Okay. One other issue that was brought up in that meeting was the memory leak monitoring. We're constantly getting these alerts in the Lodestar alert channel, and there were some questions about the maximum heap, since it's much different now when we enable the worker thread. So there was a question posed, I believe to Tuyen, about the best way we should detect memory leak growth.

I'll post the link here just so that there's context.

Oh, yeah. Thanks, Jon.

So regarding the current expression for the memory growth alert: do you think we should also watch max RSS and heap use simultaneously? There's a big difference between when we enable and when we disable the network worker.

Right. We could also get back to them when we've had a bit more time to think about it, but I just wanted to bring this issue up.

I mean, the definition of a leak is a sustained increase in memory, and that's what the metric should track.

Yeah. In the case where useWorker is true, it's actually not a leak; we just use more memory, basically consistently. It does not increase forever, it just spikes and stays high.
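For context on the alert discussion above, here is a minimal sketch (not the team's actual Prometheus expression or Lodestar code) of what "a sustained increase in memory" could mean in practice: sample the process heap periodically and look at the slope over a window. A one-time jump to a higher plateau, like the useWorker case described, yields a near-zero slope once it settles, while a real leak keeps a positive slope.

```ts
// Minimal sketch: detect sustained heap growth in a Node.js process.
// Illustrative only; the thresholds and window size are made up.
const samples: {t: number; heapUsed: number}[] = [];

function sampleMemory(): void {
  const {heapUsed} = process.memoryUsage();
  samples.push({t: Date.now(), heapUsed});
  // Keep a rolling 1-hour window
  const cutoff = Date.now() - 60 * 60 * 1000;
  while (samples.length > 0 && samples[0].t < cutoff) samples.shift();
}

/** Least-squares slope of heapUsed in bytes/second over the retained window. */
function heapGrowthRate(): number {
  const n = samples.length;
  if (n < 2) return 0;
  const t0 = samples[0].t;
  const xs = samples.map((s) => (s.t - t0) / 1000);
  const ys = samples.map((s) => s.heapUsed);
  const xMean = xs.reduce((a, b) => a + b, 0) / n;
  const yMean = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - xMean) * (ys[i] - yMean);
    den += (xs[i] - xMean) ** 2;
  }
  return den === 0 ? 0 : num / den;
}

setInterval(sampleMemory, 30_000);
// A sustained positive slope (say > 1 MB/s over the whole window) suggests a leak;
// a spike that settles at a higher plateau does not.
```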
Got it. Okay.

That's, I believe, all I had to bring up from the DevOps call. So we'll move on to some of the planning stuff now. There were some updates in the research on the network thread; there was a nice post from Matt about some of the topics he discussed with Ben. Matt, if you want to summarize for us what you observed when talking to Ben about the network thread, that would be super helpful.

Sure.
So my initial hunch was that cache misses were causing the latency that we're seeing, but it's actually one layer up, at the kernel level: it's page faults. It may be because the RSS is increasing just because we're using the worker thread, so it's basically adding a whole other set of heap pages that are being accessed simultaneously, plus the context switches. This is where we're starting to get into the weeds, and a lot of what I was doing this week was just researching how some of this stuff works in order to be able to talk intelligently with Ben, because what's actually happening is a pretty complex topic. But he gave some pretty good breadcrumbs on how to run the perf commands to try to suss out what is actually causing it, like what's the root cause.

There are three times as many page faults, which is at the virtual memory layer. The application is using virtual memory, and the page tables are getting overloaded just because we're actively loading more stuff into memory; as there are context switches between the threads, it's having to update the page tables, and that's what's causing all the page faults. The root cause is that we're using more memory, but what the actual resolution is going to be is still out of grasp at the moment. The goal for this week is to try to figure out what the next questions to ask are, and how to phrase them, so that we can start nipping at the edges and figure out what we can do to make it better. I implemented one of the suggestions just to try to see what was going on: I passed a higher new space size to the worker thread, and it seemed to reduce latency a little bit, but the result was determined more by how much bandwidth was going through that particular worker at the time.

So it was kind of a fake out, sadly. I'm not really sure why we're seeing so much more bandwidth going through some workers versus others. But that's it. Does anyone have anything to add to this update on the network thread investigations?
When you say bandwidth, do you mean memory bandwidth, bandwidth off the wire, or communication between workers?

No, network bandwidth, like I/O coming in and out of the network card. There was something like four gigs a second of input/output versus two gigs a second on the one that seemed like it was performing better. So that's why the metrics looked better; it was just doing less work.

2 GB/second of bandwidth? Okay. Actually, 2 MB/second, I mean.

It's on our dashboard.

Sorry.

Yeah, good stuff. Just keep it up. It takes a while, but I think in the end you'll figure it out.
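As an aside on the new space experiment described above, that knob can be passed when spawning a worker via Node's worker_threads resource limits. A minimal sketch follows; the worker script path and the specific sizes are hypothetical, not Lodestar's actual configuration.

```ts
import {Worker} from "node:worker_threads";

// Sketch: spawn a network worker with a larger young generation ("new space").
// "./networkWorker.js" is a hypothetical path used for illustration.
const worker = new Worker("./networkWorker.js", {
  resourceLimits: {
    // V8 young generation size in MB; a larger new space means fewer scavenge
    // GCs at the cost of a higher resident set.
    maxYoungGenerationSizeMb: 128,
    // Old generation cap, roughly equivalent to --max-old-space-size.
    maxOldGenerationSizeMb: 4096,
  },
});

worker.on("error", (err) => {
  console.error("network worker crashed", err);
});
```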
I've also been researching cgroups, because Ben mentioned those and they were something I really wasn't familiar with, including how they interact here. So I've been reading a little bit. Figuring out what they actually are was the first step, then where they're coming from, whether there's any way to tune them, and whether we're running unconfined. systemd and Docker both introduce cgroups (control groups), which are a way to prioritize resources. I don't need to ramble on in stand up, but that's the next thing. There are a couple more things to start nibbling at, and I'm going to put them in that thread as I get more information and try to figure it out.
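On the cgroup question, a small sketch of how a Node process could check whether it is running under a cgroup v2 memory limit (i.e. confined) is below. It assumes the unified cgroup v2 hierarchy mounted at /sys/fs/cgroup, which is the common case under recent systemd and Docker setups; a value of "max" means unconfined.

```ts
import {readFileSync} from "node:fs";

// Sketch: read the cgroup v2 memory limit for the current process.
function cgroupMemoryLimit(): number | null {
  try {
    // /proc/self/cgroup under cgroup v2 has a single line like "0::/some/path"
    const cgroupPath = readFileSync("/proc/self/cgroup", "utf8").trim().split("::")[1] ?? "";
    const raw = readFileSync(`/sys/fs/cgroup${cgroupPath}/memory.max`, "utf8").trim();
    return raw === "max" ? null : Number(raw); // null means no limit
  } catch {
    return null; // cgroup v1 only, or not on Linux
  }
}

console.log("cgroup memory.max:", cgroupMemoryLimit() ?? "unconfined/unknown");
```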
Awesome. Thanks for your work on that this week. I'm going to finish up with Lion on this, but we're into our quarter three goals here, and one of the main things that sticks out from what we decided in Thailand is that a lot of resources would be dedicated to performance-type upgrades this quarter. So we're definitely on the road towards that with these large impending issues that we're looking to fix.

Going into that, I do want to propose planning v1.10 a bit more, seeing as we've merged in quite a few things already in just the last couple of weeks: libp2p 0.45, some of the Node 20 stuff. I'd like to see if there is any opposition to aiming for a beta either by the end of this week or the beginning of next week, whether or not that includes something specific that you're working on. I already have a thread open on Discord for release planning and threw a couple of things in there already. If there's anything you guys want to add or talk about, feel free to just interrupt me.
When is 1.10?

I would like to aim to have a beta by the end of this week so we can throw it into testing over the weekend. That's generally a good rhythm that I feel we have, and that way we can take a look at the RC.0 results, probably by next stand up, and just see what's going on with it.

Cool.

There's no opposition to it. The milestone is in there. I'm probably going to end up moving a couple of things; I'll discuss some of the feature request items with you offline, Nico. I feel like there's always stuff that we want to put in there but never get the time to do, and then obviously the larger stuff gets in, and that's definitely the catalyst for the releases. Similar to the hard forks, where there are just one or two large things that we'd love to get in, and then all these optional things that sometimes never make it, so we'll just have to push those along. It's more important to get the performance stuff in there. Does anybody have anything else they want to bring up or discuss for planning?
No, I'm good. But I'm down with having a release now.

Now? As in, like, today?

No. Other times I was of the position that we should wait for certain things; this time, let's put these small things out while we work in the background on the other big ones.

Yeah, things seem pretty stable, and I don't think anything that we're even anticipating adding by the end of the week is going to be a risk. So it's a good time.

Okay, sounds good. So let's look at 5704, which is the subnet stuff Tuyen was working on, and try to merge in the last couple of things there. I think Nico also wanted to add 5746, the closed peer manager, in there. And then once those two are in, or if you guys identify anything else that can be squeezed in, let's just push out something and let it run.

Can we target the end of the week for cutting the release candidate?

Yeah, we can definitely do that too. I saw that your IPv6 work was also merged; we could absolutely fit that in there too.

The upgrade to discv5.

Sorry, I think I cut out; my connection was bad. Can you repeat that?

Oh, yeah. I think I also saw that you merged IPv6 as well; we just have to upgrade discv5 too. I think we could potentially squeeze that in at the end of the week as well.

Yeah.

Anything else for planning?

Okay. Otherwise we'll continue that on the release planning thread on Discord, and we'll target the end of the week for an RC. All right, let's go straight into updates.
Thanks, Afri, for joining us. Did you have anything you wanted to bring up for the team while you're here?

No, but thanks for asking. I'm checking in with Danny later and just wanted to make sure everyone is fine. I was on vacation for three weeks, so I wanted to make sure everyone is still standing before I check in with Danny. But it's looking good so far.

Cool. Yeah, we haven't exploded yet, so everything's still under control.

Are there any updates on Protocol Berg? Anything you wanted to share? Really excited for it, by the way. I'm just curious if there's anything.

What do you want to know? With regard to the lineup, it's really tough. We have so many high-quality applications that we don't even have enough spots for all the straight accepts. So we actually need to reject people who are perfect, on point, best speakers, best topics, and it will be really tough for us as organizers. Maybe next time we should do two days. Other than that, I don't really have updates. You should plan to come if you can, and if you want to come to Berlin as a team, I also have this offer up for the other teams: if you want to do some onsite stuff, be our guest. We might rent a workshop room in a co-working space, so if you want to sit down on site as a team or a subset of a team, just let me know and I'll make sure to block a room for a day or two.

That's the start.

Nico, are you going to come the week before or the week after?

Yeah, I planned for the whole week, so I will just be at the office, I guess, or we can go to the side events.

Okay, cool. There's going to be a few of us there, and it's going to be pretty cool.

Yeah.

Also, make sure you fill out that spreadsheet so that we know who's coming and what their plans are, so we can coordinate, especially because it's generally hard to get accommodation in Berlin.

Okay, let's continue on to some updates.
Anything interesting from the other half of the verkle trie stuff there?

I guess we'll have to rebase the verkle branch on Shanghai. I looked a little bit into Lion's work on it, and maybe it's not that difficult; I'll see if we can try to get it into the next relaunch of the testnet. That is my hope. Apart from that, in the other half of the verkle call, what was being discussed was the gas costs regarding the transition and all that.

Right.

So nothing that really impacts the CL side.

Awesome.

Yeah.

Feel free to go on with any of your updates. Anything about devnet 7?
I had a relatively light last week because I was traveling, but I was tracking what was happening on devnet 7, and so far I'm not seeing any issues. Basically, I did a PR regarding the Deneb attestation validation update. There's some discussion going on on that PR, and thanks, Lion, for the suggestion; I'll implement the gossip validations per spec. I have a question on that: the transition to the Deneb validations will be based upon the current slot and not really the attestation slot, right?

Yes, that's what it is.

Ideally it's the head state's slot that will actually go into processing attestations, but we can assume the head state is the current state, because I have a reference to the chain in the attestation validation.

You don't get the fork from the head state, you get the fork from the topic.

Yes. That way it will be pretty easy.

Yes.

All right.
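To illustrate the point about taking the fork from the topic rather than the head state, here is a hypothetical sketch, not Lodestar's actual code: gossip topics embed a fork digest (/eth2/{forkDigest}/{topic}/{encoding}), so the validator can resolve the fork from the topic string and branch the timeliness check on it. The Deneb (EIP-7045-style) rule shown, accepting attestations from the current or previous epoch instead of a fixed slot range, is a simplified paraphrase of the spec; names like forkDigestToFork are made up.

```ts
// Hypothetical sketch of fork-aware attestation timeliness validation.
type ForkName = "phase0" | "altair" | "bellatrix" | "capella" | "deneb";

const SLOTS_PER_EPOCH = 32;
const ATTESTATION_PROPAGATION_SLOT_RANGE = 32;

/** Gossip topics look like /eth2/{forkDigestHex}/{topicName}/{encoding}. */
function forkFromTopic(topic: string, forkDigestToFork: Map<string, ForkName>): ForkName {
  const forkDigestHex = topic.split("/")[2];
  const fork = forkDigestToFork.get(forkDigestHex);
  if (fork === undefined) throw new Error(`unknown fork digest ${forkDigestHex}`);
  return fork;
}

function isTimelyAttestation(fork: ForkName, attestationSlot: number, currentSlot: number): boolean {
  if (currentSlot < attestationSlot) return false; // ignoring clock disparity for brevity
  if (fork === "deneb") {
    // Deneb-style rule: attestation epoch must be the current or previous epoch.
    const currentEpoch = Math.floor(currentSlot / SLOTS_PER_EPOCH);
    const attEpoch = Math.floor(attestationSlot / SLOTS_PER_EPOCH);
    return attEpoch === currentEpoch || attEpoch === currentEpoch - 1;
  }
  // Pre-Deneb rule: attestation must be within ATTESTATION_PROPAGATION_SLOT_RANGE slots.
  return currentSlot <= attestationSlot + ATTESTATION_PROPAGATION_SLOT_RANGE;
}
```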
Apart from that, I've again been reading up on EIPs and actively reviewing the BLST node API PR. I am now onto the second PR by Matthewkeil and am maybe halfway through it, and I'll continue. This week I'll also try to run Lion's current verkle branch against the current verkle testnet, Constantinople, and we'll figure out what to do next after that.

All right.

Thanks, Gajinder. All right, let's move on with Nico.
So I was looking at a bunch of different issues. One was reported by a user, and I think it's not an issue in Lodestar itself. Basically this was about using the Lodestar API package to get the state from the beacon node and then get its tree root. It looks like it's an issue I found in Prysm, where their state API returns an invalid response, and while debugging this I also found what looks like a confirmed bug in Lighthouse, in their new archive implementation. Basically, the hash tree root you calculate from the returned state is different from the state root value that's in the block. So, yeah, that issue seems to be resolved; there's nothing to do from our side.

Besides that, I also found that the other issue, where the beacon node does not shut down, is not yet resolved. But I think I found the cause for it; it would make sense if someone else could quickly review the PR that I put up. I'm still testing it, because last time I also thought it was fixed but it was not, so I definitely want to give it another one or two days of testing. Other than that, I also tried to figure out that issue I opened a while ago where some users reported that their beacon node takes really long to find peers. I think the problem seems to be range sync: we just get a lot of disconnects when doing these blocks-by-range network requests. But I'm not yet sure what the root cause is. And other than that, I'm mostly focused on understanding how regen works, mapping out the components around regen, what triggers it and what consumes it, to wrap my head around how it all works together and where we can improve it.
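For reference, the range sync requests mentioned here are the ReqResp BeaconBlocksByRange calls from the consensus p2p spec. A small sketch of the request shape (spec field names; this is not Lodestar's internal type):

```ts
// BeaconBlocksByRange request, per the consensus networking spec.
interface BeaconBlocksByRangeRequest {
  startSlot: number; // first slot requested
  count: number; // number of blocks requested
  step: number; // slot increment; deprecated, clients should send 1
}

// e.g. ask a peer for 64 consecutive blocks starting at a given slot
const req: BeaconBlocksByRangeRequest = {startSlot: 6_000_000, count: 64, step: 1};
```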
So, yeah, this week I will look at some minor issues that I put up for v1.9 that we should address in v1.10, and then just continue focusing on the regen topic.
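As a side note on the state root mismatch Nico described, a rough sketch of the consistency check involved is below: re-hash the state returned by the standard debug API and compare it with the state_root committed in the block header for the same slot. The endpoints are standard Beacon API routes; the ssz import assumes @lodestar/types, Capella is used as an example fork, and the URL and slot are placeholders.

```ts
import {ssz} from "@lodestar/types";

const BEACON_API = "http://localhost:9596"; // placeholder beacon node URL

async function checkStateRoot(slot: number): Promise<void> {
  // Block header at the slot, which commits to a post-state root
  const headerRes = await fetch(`${BEACON_API}/eth/v1/beacon/headers/${slot}`);
  const expected: string = (await headerRes.json()).data.header.message.state_root;

  // Full state as SSZ bytes from the debug endpoint
  const stateRes = await fetch(`${BEACON_API}/eth/v2/debug/beacon/states/${slot}`, {
    headers: {accept: "application/octet-stream"},
  });
  const stateBytes = new Uint8Array(await stateRes.arrayBuffer());
  const state = ssz.capella.BeaconState.deserialize(stateBytes);
  const actual = "0x" + Buffer.from(ssz.capella.BeaconState.hashTreeRoot(state)).toString("hex");

  console.log(actual === expected ? "state root matches" : `mismatch: ${actual} != ${expected}`);
}

checkStateRoot(6_000_000).catch(console.error);
```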
Great. Yeah, sounds good. Okay, next is Tuyen.

Hi. So I rebased the PR for subscribing to two subnets per node. There's a new flag named deterministicLongLiveSubnet. By default it's false, but when it's true we will subscribe to exactly two subnets based on the node ID, and they will change every 256 epochs. When we enable this flag it will reduce a lot of traffic, because right now we subscribe to random subnets based on the connected validators, but with this flag we just subscribe to two subnets. So I would like to include it in v1.10 and test it on one of our mainnet nodes. Other than that, I worked on verifying signature sets with the same signing root; I had some discussion with Matthewkeil, and I think Lion will review the PR soon too, so please have a look.
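As a simplified illustration of the subnet behavior Tuyen describes (not the exact algorithm in the p2p spec or in Lodestar), the subscribed subnets can be derived purely from the node ID and the current 256-epoch period, so they stay stable for 256 epochs and then rotate; the constants and hashing below are illustrative only.

```ts
import {createHash} from "node:crypto";

// Simplified sketch of deterministic long-lived attestation subnets.
const ATTESTATION_SUBNET_COUNT = 64;
const EPOCHS_PER_SUBNET_SUBSCRIPTION = 256;
const SUBNETS_PER_NODE = 2;

function computeSubnets(nodeIdHex: string, epoch: number): number[] {
  const period = Math.floor(epoch / EPOCHS_PER_SUBNET_SUBSCRIPTION);
  const subnets: number[] = [];
  for (let i = 0; i < SUBNETS_PER_NODE; i++) {
    // Hash node ID, period and index; a real implementation follows the spec's
    // prefix/shuffling scheme and also deduplicates collisions.
    const digest = createHash("sha256").update(`${nodeIdHex}:${period}:${i}`).digest();
    subnets.push(digest.readUInt32BE(0) % ATTESTATION_SUBNET_COUNT);
  }
  return subnets;
}

// Same node and same 256-epoch period always yield the same two subnets,
// so peers can predict each other's subscriptions from the node ID alone.
console.log(computeSubnets("a1b2c3d4e5f6", 216_000));
```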
Other than that, I worked on the multiaddr PR, trying to catch the path in the constructor when we receive a string. Cayman, please have a look. I think I received some comments from Nazar already; I'll try to address those tomorrow. Next, I will work on prioritizing signature sets from the API. That's it for me.

Awesome.
Looking forward to that one; it looks like there are a lot of great improvements in it. All right, moving on. Let's go with Matthew.

Kudos to Tuyen, that was an awesome multi-signature PR that's going to make a huge improvement, and to Lion for updating the dashboards, that was nice as well. I think that would actually be a great focus for the performance work; we should put it into next quarter: making our dashboards easier to understand and a little better organized. We have a lot of telemetry there, but it's sometimes a little hard to pull the data off of it. This week I didn't get as far as I wanted with the blinded and non-blinded blocks. I wasn't quite as far along as I thought I was at the beginning of the week, and it's a lot bigger than I expected; a lot of the stuff reaches into other sections of the code, like regen and backfill and the API. The two repositories are all built out, but there's just a little bit of testing that needs to be done there.

And I was focused on the work with Ben, so a lot of the time was just trying to make sure that process was happening. I also spent some time with Gajinder, thank you very much again, and got the first PR approved.

Yay.

Done.

Love that.

And the second piece is actively being gone through, so hopefully that will be able to get brought in soon, because that's going to make a huge improvement. It's going to be great. So that's going to be a focus: I want to try to get the blinding and unblinding over the hump; I just didn't spend enough time on it this week with the other stuff. And then kudos to Nico as well, that was cool to work on that SC stuff. So very cool.

Cool things all around.

Thanks, Matt. All right, and we've got Cayman.
Hey, y'all. So there's an update on the multiformats TypeScript saga. Is my connection unstable?

You're good.

Okay. Yeah, the update is basically that we're going to have an in-person meeting or a call with the current maintainers. We're trying to get this library converted to TypeScript, but the maintainers want to use JavaScript, so basically we're going to get on a call and figure it out, and whatever happens will happen. I was in some discussions about that last week. Other things: I have a PR open, which I think I should have just merged, for a bug that was introduced with the new libp2p. They were emitting some objects that don't conform to their types, so there was an uncaught error that was reported by a user. We got that fixed. And then I'm working on fixing the end-to-end tests for the Node 20 update; it seems like some of the error types, or the errors being thrown, are different between Node 18 and Node 20.

So I'm figuring that one out right now. Other than that, I merged IPv6 support into discv5, and I've got a local branch where I'm integrating that. I should get that pushed this week and hopefully get it added to Lodestar for v1.10. So I'm going to be pushing those two things across the finish line this week.

Good stuff, Cayman. All right, and we've got Lion.
Hey all, how are you doing? So last week I finally opened the PR for testing Whisk; now the spec is executable, so that part is done. I'm working now on testing the POC that was rewritten in Rust. Hopefully with that we can do a testnet, and that will be done; from then on we'll just do politics and see what we can do. The library that Whisk is using was originally written in Rust, which is why the original proof of concept is in Rust. There is another one written in Python for the specs, which was incredibly slow, so I changed it to use another crypto backend that's faster. Fun fact: there is another guy who jumped out of nowhere, I don't know where this guy came from, but he wrote the whole library in Go. So now a Go implementation is possible, and at some point that will have to be taken care of.

Probably take the Rust one, change the backend to BLST and get some bindings going, which is grueling work. So we said we will not do that until there is tentative inclusion of the feature somewhere. Then, yeah, we continue with the MaxEB proposal. I think we figured out last week how to do execution layer partial withdrawals. I'm not sure if you guys remember from the last conversation, but the conclusion was that it was necessary. The original designs were ugly; I think we found one that's not ugly. So good stuff. Yeah, that's it.