philknows
Aug 10th, 2023
  1. Recording in progress.
  2. Okay, so I'm doing this from my phone today, so we'll see how this goes.
  3. Wasn't able to charge my laptop in the car, but we'll figure this out. Anyway, I think Nazar is stuck in something and also Gajinder is in his virtual call, so
  4. that's why we're recording this.
  5. Okay, to get started today here.
  6. Let's talk a little bit, I guess, about some of the issues that we're getting on the Big Boy testnet. I think Lion has really outlined it pretty well in some of the issues.
  7. But I don't know if there's additional discussion that you might want to have while we're all on the call here about this.
  8. But yeah, if there are any sort of comments
  9. in regards to what Lion has written up in issues
  10. 5855 and 5857,
  11. please feel free to go ahead.
  12. By the way, today... I was just going to add...
  13. Go for it.
  14. Oh, yeah, I was just going to add, I think,
  15. Lion already messaged, but they're doing another DevNet,
  16. I guess, replicating the mainnet environment
  17. with the same amount of validators and allocation, I think.
  18. And that should be happening today at some point.
  19. But yeah, go ahead, Cayman.
  20. So I thought that the reducing cached beacon state size issue is very interesting.
  21. Yeah, it's the kind of change that can give a huge result, but it's also,
  22. like, a really big change.
  23. So I'm thinking about other options
  24. that are way less invasive and are kind of patches.
  25. And we may not have to do this whole crazy thing.
  26. Because it's really-- if we can represent pubkeys
  27. and withdrawal keys efficiently, that's basically--
  28. I mean--
  29. OK.
  30. - Yeah, we could have more gains, but it's not that bad.
  31. - So, pubkeys and withdrawal keys.
  32. Right now, we don't even support forking
  33. of the pubkey cache, do we?
  34. - That's a separate thing.
  35. And, like, in reality, in the happy case,
  36. all of this data is structurally shared.
  37. So it's not that bad, to be honest.
  38. Like we know in terms of heavy forking,
  39. it's gonna die, we're gonna die anyway.
  40. So yeah, so a single state is very big,
  41. but you usually only pay it once.
  42. [silence]
  43. What I'm worried about and do not understand yet is
  44. why memory went up to 12 gigs.
  45. That doesn't have anything to do with the state being big.
  46. This is a problem of the cache.
  47. Yeah, and I think, Tuyen, did you also see on one of the servers as well that we're
  48. nearing 12 gigs of memory somewhere?
  49. I remember reading that as well.
  50. And I'm not sure if that has anything to do with it.
  51. No, I see it on the Big Boy net only.
  52. I think there are more than 20 checkpoints there.
  53. Maybe that's the reason.
  54. Okay. Well, we should be getting some more results to sort of see how it does with the network worker enabled as well.
  55. Lion already passed all the flags for them to run on the new v1.10. So we should hopefully get some more data shortly
  56. with this new test. But yeah, in terms of the next steps to resolving this problem,
  57. I don't know if there's anything else that you guys want to add to this discussion about
  58. how to resolve it, but if not, we can continue forward and take that async.
  59. Okay.
  60. [silence]
  61. Okay.
  62. All right, so we'll wait and see the results of the next test or DevNet
  63. that they're putting up later today.
  64. The next thing on my list here was just to quickly discuss some of the memory leak stuff.
  65. I guess most of it was potentially resolved through upgrading to Node.js 18.17.
  66. I guess we haven't confirmed that yet.
  67. If that's correct, right, Tuyen?
  68. Yeah, I think there's no leak there.
  69. However, the RSS is 12 gigs.
  70. It's way bigger than the current version.
  71. So I'm not sure if it can run on a normal instance,
  72. 'cause I just tested on an AX-41 only.
  73. Maybe it's worth deploying to the whole group.
  74. - So I think, 'cause I deployed a bunch of different SHAs,
  75. and I think I found between which two commits,
  76. something changed, and I think it might be
  77. the native fetch.
  78. So it's possible that we could back that out
  79. until whatever that is gets fixed
  80. 'cause that is a known issue.
  81. So it's possible we might be able to revert that
  82. and not see the memory issue; like, get the gains back.
  83. - But a fix for that was already in, April, I think.
  84. So if you look at the release notes
  85. I linked in the private chat,
  86. they did a patch that was just not yet
  87. in Node.js, basically.
  88. - So we can't say conclusively that it was that, right?
  89. - Yeah, I think so.
  90. Right, well, I mean, we tested it.
  91. It's not happening on node 20.
  92. And it was also not leaking on the version of node 18--
  93. was it 18.17, where they patched undici.
  94. So I think we can--
  95. >>I did see two other leaks, though,
  96. that were also relating to 18.16 and 25
  97. that were posted on the node board and the issues board.
  98. And looking at feature one large, it's leaking,
  99. but feature one medium seems like it's not.
  100. So it's like, basically I put a bunch of commits
  101. on different servers on that group,
  102. and some are leaking and some are not,
  103. like the older ones are not.
  104. So it's somewhere in between there.
  105. So we have kind of like a breadcrumb
  106. of which commit is initiating it.
  107. So I think we'll be able to resolve it.
  108. - And is it also leaking on node 20?
  109. - Let me check, I've got both versions.
  110. - I checked in my git Tree and it's not like that.
  111. For all instances, I think earlier,
  112. my source is still in git Tree as M1V,
  113. but that still runs Node 18.
  114. - Is that 18.17 or?
  115. - 18.16.
  116. - By the way, if that's related to native fetch,
  117. we would also see it on the validator client.
  118. So it does; it leaks in validator old space and main thread large objects.
  119. So both were climbing and I'm looking at beta.
  120. I'm going to have to do a little more research, but like it's kind of hit or miss.
  121. Some of them are leaking, some of them are not.
  122. So I'm going to keep at it, but I don't want to analyze it now while we're on the call.
  123. [silence]
  124. So I have beta group and feature one have the same sets of SHAs deployed,
  125. but one of them is on node 18, one of them is on node 20.
  126. [silence]
  127. Okay, is there anything else being tested on feat two and feat three right now?
  128. [silence]
  129. Yeah, feature two, I think, is where I put out BLST.
  130. So I got that brought in.
  131. And then I think Tuyen had something on feature three.
  132. [INAUDIBLE]
  133. OK, cool.
  134. Yeah, let's see if we can get to, I guess, the bottom of this
  135. and figure out if it is that commit
  136. and see if 18.17 will fix this.
  137. But this kind of goes into the conversation of just how much
  138. longer we want to support Node 18 as well,
  139. which I think Nico made a comment about: we
  140. should keep supporting it until Node 20 is LTS.
  141. If anybody has any objections to that, please speak up.
  142. I tend to agree also. As long as it's LTS, I think we should.
  143. I mean, it's the official node.
  144. Yeah. I mean, it's still on the main page,
  145. and people are still using it, so it makes sense.
  146. But yeah, I think the swap over to Node 20 LTS
  147. is, I think, sometime in October, I believe.
  148. But--
  149. >>And I apologize.
  150. It's feature one that's where I have BLST,
  151. and feature two is where we're looking for the leaks.
  152. I had that backwards.
  153. >>OK.
  154. Great, yeah.
  155. Just as long as--
  156. >>I've got kind of an opposing idea
  157. that I think we should move to node 20
  158. and ask people to use node 20 if possible, as soon as possible.
  159. But I mean, supporting node 18 is fine,
  160. but there's no reason why we should be using node 18,
  161. especially if it's causing us more headache.
  162. - Yeah, I think that's pretty in line
  163. with my view as well.
  164. Just not, like, hard dropping it or updating it
  165. in the package.json; we wouldn't force it, right?
  166. - Right.
  167. But I think we did that pretty well with the last release.
  168. I think, yeah, it's in the announcement release notes
  169. anyways.
  170. - Right, we asked people to use it.
  171. And if they use Docker, which is the
  172. recommended installation,
  173. then they're gonna get Node 20 anyway.
  174. - In our docs, what does it say?
  175. Because I guess that would probably be the last thing
  176. we would need to update to really be like,
  177. hey, we don't really want to support 18 anymore.
  178. Or at least, we don't recommend you to run it.
  179. - Yeah, the only reference right now is really
  180. in the package.json.
  181. Everything else is on 20.
  182. Cool.
  183. Well, I mean, I don't see any harm in putting 20 into that,
  184. unless there's some objection to it.
  185. But--
  186. >>I think we should keep it as 18 until--
  187. because that reference will hard-enforce
  188. not using Node 18.
  189. So if you try to install Lodestar with Node 18,
  190. it will throw an error,
  191. which I think is not what we're wanting.
  192. I think we want it...
  193. Yeah.
  194. ...to still be allowed.
  195. Yeah, that is how I feel as well,
  196. where it doesn't error on 18,
  197. but we run everything on 20, as far as our fleet and our images go,
  198. but it just doesn't error trying to install on 18.
  199. Would it be smart for us to run every--
  200. I guess we wouldn't be testing any further things on 18 anyway,
  201. but we also wouldn't know if anything broke, right?
  202. I guess when we're doing releases and stuff,
  203. nothing runs on 18, really.
  204. But either way, I think we have it pretty correct for now,
  205. and we'll just keep an eye on when we should actually
  206. change what the package.json requires.
  207. But it sounds like to me, everything is fine the way it is.
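
For reference, the package.json field being discussed is "engines"; Yarn enforces it at install time, which is why bumping it would hard-block Node 18. A minimal sketch (the exact version floor here is illustrative):

    {
      "engines": {
        "node": ">=18.15.0"
      }
    }

Left at an 18.x floor, installs on Node 18 keep working; raising it to ">=20" is what would make them throw the error mentioned above.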
  208. Okay, next up on my list, I have just,
  209. I guess discussion points on the interoperability
  210. with other clients. I think we just posted something today
  211. for Nazar to look into, but we have had issues,
  212. I guess, with interoperability with a bunch of other clients,
  213. and Nico made a point that right now,
  214. a lot of our fallback stuff is on the VC side,
  215. which is not, I guess, fully compatible with some;
  216. like, you wouldn't be able to use a Lodestar VC,
  217. sorry, a Lighthouse VC on a Lodestar beacon node, as an example. Is this the strategy that we want to
  218. stick to? Like how do we want to go about, I guess, increasing interoperability? Because I
  219. think that that is an important thing for us to be able to do.
  220. Does anybody have any points for this? Otherwise, we should try to maximize the priority of interoperability and make that happen, especially with Lighthouse and Prysm being the two most popular clients right now.
  221. I mean, I think there are three most common setups.
  222. So one is if people run a solo staking rig on Rocket Pool, for example, then they
  223. might use a different VC.
  224. The other is DVT, where we have only the Lodestar VC right now that runs with a Lighthouse
  225. beacon node, where there have been no issues so far.
  226. And then we have the, I guess, big operator issue where they want to use Lodestar as fallback,
  227. I guess.
  228. And I mean, the issues, at least that I've noticed so far, are really these missing attestations,
  229. or rather, not attestations, but aggregates.
  230. So Lighthouse and Nimbus, and maybe Teku as well, I'm not sure, can't
  231. produce aggregates with Lodestar for some reason.
  232. And then, yeah, I think the bigger issue is really with this fallback logic that we have
  233. on the validator side, because other VCs assume this is done by the beacon node.
  234. So yeah, you basically miss the block if the MEV boost relay fails, because they only request
  235. the blinded block and expect the beacon node to do the fallback behavior.
  236. So yeah.
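
A minimal sketch of the beacon-node-side fallback Nico is describing, where the beacon node (not the VC) falls back to a locally built block when the builder/relay path fails. All names here are illustrative assumptions, not Lodestar's actual API:

    // Hypothetical beacon-node-side block production fallback.
    interface BlockSource {
      produceBlindedBlock(slot: number): Promise<unknown>; // builder / MEV-boost relay path
      produceLocalBlock(slot: number): Promise<unknown>;   // local execution engine path
    }

    async function produceBlockWithFallback(src: BlockSource, slot: number): Promise<unknown> {
      try {
        // Preferred path: blinded block via the relay
        return await src.produceBlindedBlock(slot);
      } catch {
        // Relay failed or timed out: build locally so the proposal is not missed
        return await src.produceLocalBlock(slot);
      }
    }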
  237. I guess based on what you're seeing, Nico,
  238. are there a lot of people actually doing
  239. this sort of setup, which requires us to be compatible
  240. with all these other clients?
  241. 'Cause I'm trying to gauge how important
  242. something like this is in regards to fixing or figuring it out.
  243. (silence)
  244. Like I think it's just the one guy on Discord right now
  245. who's running this type of setup specifically
  246. unless I'm missing a group of people here.
  247. - I am also running it, but yeah.
  248. These are, at least from my side, the observations.
  249. So I also have a setup on Goerli now running Lodestar
  250. with all four validator clients, or with all four others.
  251. - Right.
  252. - But yeah, all but Prysm.
  253. I think with Prysm,
  254. you cannot make it work.
  255. So they have their own stuff going on Prysm.
  256. So I'm not sure if they are even compatible
  257. with any other clients
  258. 'cause they use a completely different API to communicate.
  259. Yeah.
  260. So it sounds to me like this is quite a big difference
  261. that would require some investigation
  262. and quite a bit of work, it sounds like to me,
  263. to make it work, especially with Prysm,
  264. 'cause Prysm is like 40% of the network right now,
  265. or something like that.
  266. (mouse clicking)
  267. - I'm not sure if they have a flag or something
  268. so that you can use REST APIs,
  269. but I think by default they use gRPC or something.
  270. - Yeah, I think so.
  271. Yeah, I mean, we should definitely try, in my opinion,
  272. try to be as compatible
  273. with the other clients as much as possible.
  274. Some of it, I guess, will require more work than others,
  275. but that's what I would like to see is that those issues
  276. in regards to interoperability
  277. are resolved in the near future.
  278. But that's just my opinion on it.
  279. Does anybody else want to add anything to that?
  280. But I think, in terms of the people
  281. that we're really trying to lure into using Lodestar,
  282. it would be larger node operators.
  283. And a lot of them have expressed
  284. that their setups are also multi-client.
  285. And if we're gonna try to target those,
  286. we should try to be as compatible as we can.
  287. That's sort of my rationale for it.
  288. - Yeah, I agree.
  289. I mean, it helps the lodestar adoption case
  290. for those people.
  291. So we should definitely be supporting it.
  292. And it seems like if we get tests,
  293. then we can kind of ensure that we're not regressing.
  294. So I think it makes sense.
  295. Okay. Yeah, I think that
  296. issue specifically with the...
  297. I think, well, we'll get
  298. Nazar to look into this. I
  299. think, if he's on the call
  300. there. Okay. Oh, yeah, he is.
  301. I'd like to see it looked into just a little bit more so that we can try to resolve
  302. the issues with other clients if possible.
  303. Okay, I have one question from reading the comment in the Discord.
  304. This fallback mechanism which Nico is referring to, that in other implementations is on the beacon node
  305. side, and in our case it's on the validator side.
  306. Is it not part of the spec or what?
  307. >> No, I don't think so.
  308. I think it's just a feature that all the clients have kind of copied from each other or independently
  309. found to be useful.
  310. It just seems that everyone else is doing it on the beacon node side.
  311. And if we are the only one different from the others, is there any particular rationale
  312. behind when we implemented it?
  313. I think the rationale would be that the more work we can hoist to the validator, the less
  314. we're having to do on the beacon node side would be my interpretation of it.
  315. But I also didn't implement it.
  316. I think that was Gajinder, so you'd have to ask him.
  317. Okay, I will check with him.
  318. But also, I think there will be spec enforcement. So the v3 produce block endpoint, I think, enforces that,
  319. but I think it's not yet merged. But I also saw that Gajinder mentioned some points there.
  320. I think maybe there are some rationales also for why we implemented that fallback in
  321. the validator client. Okay. Yeah, I guess we'll need to get Gajinder to continue
  322. that. I don't have anything
  323. further to add on that. But are
  324. there any other additional
  325. points or questions in
  326. regards to this?
  327.  
  328.  
  329. If not,
  330. I just want to get an
  331. update from anybody who's been
  332. working with the network worker
  333. on what's being done in relation to that.
  334. But I don't have the latest update
  335. on what the status of the network worker thread
  336. is, if anybody has anything to throw in here.
  337. I was adjusting the memory, both with and without the worker,
  338. by bumping the new space.
  339. And it actually does drop event loop time,
  340. whether the network is on the main thread or on the worker thread.
  341. So that is definitely a good solution.
  342. But it came up when we started to see the leak.
  343. So now that we've kind of narrowed the leak down,
  344. I'm going to deploy with the new space update on 20
  345. and see how it runs with that now.
  346. - All right, great.
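
For context, "bumping the new space" refers to resizing V8's young generation, which Node exposes as a command-line flag (value in MiB). A hedged example; the size and script path here are illustrative:

    # Enlarge V8's semi-space (the "new space" young generation)
    node --max-semi-space-size=64 ./packages/cli/bin/lodestar.js beacon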
  347. - And maybe also one thing,
  348. so I also posted something in our issue
  349. that we have with the network worker,
  350. what we might consider because I saw some benchmarks
  351. that look pretty bad for worker threads.
  352. And it might be just the case
  353. that we are just using them wrong here.
  354. And it might be better to use a child process instead,
  355. because from what I've read and also what I see
  356. from node maintainers, when they mentioned the drawbacks
  357. and advantages of worker threads,
  358. it's mostly that they should run short-lived tasks
  359. basically that are blocking the main thread
  360. and are CPU intensive.
  361. And this is not really the case for the network.
  362. We have a lot of IO and it's a long-lived process.
  363. And I think what it comes down to, as far as I understood it,
  364. is really just the OS allocating resources.
  365. So there's the main difference
  366. between a thread and a worker.
  367. But yeah, not sure.
  368. Also, I noticed that Ben mentioned Node's cluster module a lot of times
  369. in the emails that he wrote.
  370. And yeah, that uses a child process as well.
  371. So maybe we could give that a try.
  372. I at least wanted to give that a try and get some metrics.
  373. Yeah, but this doesn't satisfy me.
  374. Like, we're just playing crazy guessing games here.
  375. I want to understand why a worker thread would work better or worse than a fork at a fundamental level.
  376. Do we have the answers?
  377. Yeah, that's a difficult question, actually.
  378. I researched a lot there.
  379. And I think it really comes down to the OS allocating resources.
  380. Because it's another process, basically,
  381. and the thread has shared memory with the main process and so on,
  382. which I think is also not great in our case.
  383. And why does it have an impact?
  384. Yeah, you really need to, I'm probably the wrong person to ask here.
  385. So they should be on separate threads though,
  386. like, they should be able to be scheduled on separate cores.
  387. If you're using child processes and workers, it should schedule
  388. them separately, as opposed to trying to interleave them, because a child process is a separate PID,
  389. while a worker thread is still under the same PID. So it will tend to put child processes on different
  390. cores. Worker threads do not have an independent PID.
  391. Exactly, correct. But when you do a child process, it's a separate PID with all separate
  392. address space. And it would also load all the dynamic libraries and shared libraries again, so it
  393. may have additional memory overhead, but I don't know that for sure.
  394. And I think there's slightly more cost in IPC, because it would use a Unix socket instead of,
  395. I don't know, doing it over memory directly.
  396. But I think I also saw that in one of Ben's emails
  397. that, basically on Linux,
  398. it's not noticeable at least.
  399. And I think we are not using any shared memory, right?
  400. So at least from reviewing the code,
  401. I didn't see that we are doing that.
  402. No, we're not.
  403. Because the worker threads library serializes everything
  404. to a string and then just passes the string.
  405. And no, but I think for the key stores, we use transferable objects, which are shared array
  406. buffers. So.
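
A small sketch of the two models being compared, using only Node built-ins (module paths illustrative): a worker thread shares the parent's PID and can share memory, while a forked child is a separate process with its own PID that talks over an IPC channel:

    import { Worker } from "node:worker_threads";
    import { fork } from "node:child_process";

    // Worker thread: same PID, messages pass via structured clone (or transferables)
    const worker = new Worker("./network.js");
    worker.postMessage({ cmd: "start" });

    // Forked child: separate PID and address space, messages serialized over IPC
    const child = fork("./network.js");
    child.send({ cmd: "start" });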
  407. Although it seems like, since Tuyen brought the extra setTimeout into place in the network
  408. thread, it's actually brought loop time way down to where it would be expected to be
  409. anyway. So I wonder if it's a moot point, because it seems like the node, well, again, this is where
  410. I'd cede to you guys. But it seems like it's doing what it needs to do at the moment, that
  411. we don't have the crazy loop times like we used to have.
  412. Okay, I just wanted to know where the status of that was, and I guess sort of our next steps
  413. are
  414. in trying to
  415. improve it.
  416. Like, how close are we, perhaps,
  417. to understanding whether or not this is something that will be ready within, I'd say, the next two minor releases?
  418. Or is it pretty hard to say still at this point?
  419. I mean, it seems like it performs about the same now, whether the network is on a worker thread or on the main thread, as far as just loop times and memory usage and whatnot.
  420. But, I mean, it seems like it to me.
  421. I mean.
  422. But I would leave that.
  423. Yeah, that was kind of what I saw too:
  424. I didn't see any improvement and I didn't really
  425. see any degradation, but...
  426. I feel like we still need to do more testing.
  427. Like I was saying.
  428. I didn't realize that we had it,
  429. that we were testing on Node 18.
  430. So it'd be good to bump our fleet back up to 20
  431. and test again.
  432. And we should actually, now that you say that,
  433. set the variable in Ansible to 20
  434. so that whenever we deploy, it automatically goes to 20 now.
  435. OK, great.
  436. Yeah, we'll keep going down the path of more investigation
  437. testing.
  438. I think we have a bit of a roadmap here to see the benefits.
  439. And if we do want to try something else,
  440. we can further discuss some other options.
  441. Does anybody have anything to add to the network thread
  442. discussion?
  443. Is there, like, a documented acceptance criterion or something
  444. that we want to target?
  445. Like, I think I'm hearing there's no
  446. performance increase, but there is no degradation. If that's
  447. what we expected from this refactoring from the start, then I
  448. think it's okay. But if that is not what we were expecting, then
  449. maybe we take our time to dig further.
  450. - Well, it does bring it down;
  451. like, 'cause when you put everything,
  452. network and main thread, on one thread,
  453. the loop times are substantially higher.
  454. When you break it up, it looks like, I think,
  455. I mean, it's like 30 milliseconds versus 15 on the
  456. two loops.
  457. So it does seem like it brings the overall loop time down,
  458. but it doesn't really change API response times,
  459. because the latency between the two,
  460. I think, is what's causing it:
  461. each loop is performing better,
  462. but the communication between the two
  463. is making the API response about net zero.
  464. - And when you refer to a loop time,
  465. does it mean, like, event loop time?
  466. - Yeah, exactly.
  467. - But we wanted to-- we introduced this worker thread
  468. milestone into Lodestar
  469. because we wanted to improve the performance.
  470. So if it's not reaching that level,
  471. then maybe we should step back and think again about
  472. how we implemented it,
  473. and maybe there's some better way that we overlooked,
  474. because the worker thread concept,
  475. it was like, it looked promising at the time.
  476. - It does.
  477. - Right, and I think,
  478. I feel like the landscape changed
  479. when we moved to deterministic long-lived subnets
  480. where we're only subscribing to two subnets
  481. instead of possibly up to all 64.
  482. And now it's just like, because we're doing so much less work,
  483. whatever gain we thought we had or we thought
  484. we were going to get is less important or not
  485. as noticeable or less measurable because there's just
  486. so much less work being done in general.
  487. Yeah, we must test every single worker
  488. with Subscribe all subnets.
  489. Otherwise, it's not useful data.
  490. Because as you say, otherwise, it's so idle
  491. that it probably doesn't matter.
  492. Yeah.
  493. Well, that was what we were seeing.
  494. We were testing it without Subscribe all subnets.
  495. And it was like, well, it's kind of the same.
  496. Yeah.
  497. Like, the feeling I have is, if we don't test with subscribe all subnets, and then we say, oh, the worker looks good, and then we ship it and people use it with all subnets and then lots of nodes die, that's not okay. That's not okay.
  498. Yeah, we need to compare with the stable mainnet node.
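
For reference, Lodestar exposes a flag for exactly this, so a test deployment along these lines would exercise the worker under realistic load (other flags and setup omitted):

    # Stress the node the way heavy users will: subscribe to all attestation subnets
    lodestar beacon --network mainnet --subscribeAllSubnets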
  499. Okay, sounds like we just need to gather some more data here then,
  500. before we can make any sort of decision on what sort of benefits we get from it.
  501. But while I would love to have a more fundamental understanding of the differences between worker threads and
  502. a forked process, I think it's worth it to do a test. So if someone wants to do it, please go for it.
  503. Yeah, I mean, I definitely want to take a look at that. Also, there's this other
  504. worker pool implementation that's maintained by mostly Node core devs, and there I also asked;
  505. let's see if we get a good response there.
  506. Okay, anything else to add to the network thread point?
  507. Okay, sorry, was someone going to say something?
  508. I was going to say that I implemented a similar structure in another project,
  509. and I opted for detached child processes
  510. to maximize the hardware performance.
  511. Because whether it is a child process or a worker thread,
  512. either way, there is a lot of dependency
  513. between the main thread and a worker thread or child process
  514. created by the Node.js environment.
  515. So if we clearly want to maximize and
  516. utilize the performance of all available cores, we need to have a detached child process that
  517. can utilize a full core. And when I implemented it, I used a third-party serialization library,
  518. which was performing around 300 MB per second for real-time serialization. So there was not
  519. any impact transmitting data between two child processes that do not have an IPC
  520. connection. Because there is one: if you spin up a normal child process, there is an IPC connection
  521. created with the Node environment, which has an overhead on top of the child process.
  522. So if we really, really want to achieve full-scale performance of the available cores, then
  523. maybe we start looking into this pattern.
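
A rough sketch of the detached pattern being described, using Node built-ins. Omitting "ipc" from stdio is what avoids the implicit Node IPC channel; the paths are illustrative, and the replacement transport (e.g. a Unix socket with a custom serializer) would have to be managed separately:

    import { spawn } from "node:child_process";

    const child = spawn(process.execPath, ["./network-proc.js"], {
      detached: true,                          // own process group, scheduled independently
      stdio: ["ignore", "inherit", "inherit"], // no "ipc" entry: no Node IPC channel created
    });
    child.unref(); // don't keep the parent's event loop alive for this child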
  524. Okay, I will share some implementation details later on in the Discord if someone wants to look
  525. at it. Yeah, yeah, that would be interesting to look at. Thanks for sharing, sir.
  526. Okay, any other points for that?
  527.  
  528.  
  529. Okay, let's do a quick round of updates. Let's start today with NC. How are you doing?
  530. All right, hey guys. Okay, so for the ePBS side, we had a first
  531. meeting with the Prysm folks last week. And it seems like, you know, things are still going pretty
  532. slow. But at least we started splitting up work and we have a weekly meeting set up.
  533. So that's pretty cool. And now, on the other side, you know, to keep myself productive,
  534. I started looking into the 6110 implementation on Lodestar.
  535. Seems like there is a prerequisite on the pubkey cache.
  536. So I need to do the refactoring and also, you know,
  537. have like two new sets of pubkey cache
  538. attached to the beacon state.
  539. Still looking into it, not much else to update.
  540. So, right, that's all from me.
  541. Are you and Lion also working on some other ePBS-type
  542. implementation as well, or is that the main one?
  543. - Right, so right now we're only looking
  544. into the PTC design.
  545. There are obviously other designs out there,
  546. but I think we're just focusing on the PTC.
  547. - Got it, okay, thank you.
  548.  
  549.  
  550. All right, let's move forward with Lion.
  551. Hey, so last week we spent a bunch of time on Whisk, debating different optimizations
  552. and doing more security analysis.
  553. There is one that's very promising that could reduce the state size increase from doubling to 33% more.
  554. So I took it as far as I could, and now it's in the cryptography team's hands to
  555. see what comes out, but it's exciting. And this week, yeah, I spent a bunch of time thinking
  556. about the devnet, the Big Boy issues that we already discussed. So all good.
  557. Thank you.
  558.  
  559. All right. Let's go ahead with Nazar.
  560. Thank you. Based on my last week's update, I created an EL provider proxy which shows 100
  561. ETH for every account you try to connect to, no matter how much balance it has, to test
  562. whether our prover is working fine or not. And during my testing, I found out that the prover
  563. was not working. It was not verifying the balance, and it was a surprise because all of our tests
  564. were working fine. Yeah, I spent a lot of time, like a day, on it figuring out what could be the reason,
  565. but it turns out that it was Web3.js version 4.x, which implemented RPC in a different way than
  566. I was expecting. So I opened a PR to make our provider compatible with the Web3.js
  567. 4.x version. And yeah, when I was doing it, there were some very weird TypeScript issues,
  568. which are causing this PR to be delayed. But hopefully, it will be completed today, and then
  569. I will finish this PR.
  570. And I will add a documentation section inside the readme on how to test this unverified
  571. provider with our Lodestar prover.
  572. And yes, then I will be working on the issue of the simulation tests for different beacon
  573. node and validator client configurations.
  574. Yeah, that's all from me.
  575. Thank you, Nazar.
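
A minimal sketch of the kind of EL proxy Nazar describes: intercept eth_getBalance and always report 100 ETH, so a correctly verifying prover must reject the response. The port, upstream URL, and overall shape are illustrative assumptions:

    import http from "node:http";

    const UPSTREAM = "http://localhost:8545"; // the real execution node (assumed)

    http.createServer(async (req, res) => {
      let body = "";
      for await (const chunk of req) body += chunk;
      const rpc = JSON.parse(body);
      if (rpc.method === "eth_getBalance") {
        // Always answer 100 ETH in wei (hex), regardless of the real balance
        res.end(JSON.stringify({ jsonrpc: "2.0", id: rpc.id, result: "0x56bc75e2d63100000" }));
        return;
      }
      // Pass every other method through to the real node
      const upstream = await fetch(UPSTREAM, { method: "POST", body });
      res.end(await upstream.text());
    }).listen(8546);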
  576.  
  577.  
  578. Okay, next up we got Gajinder.
  579. How are you doing?
  580. Hey, Phil.
  581. Hey, everyone.
  582. So I've worked a little bit on Verkle.
  583. basically I was successfully able to read local genesis after basically doing the changes
  584. which basically differed a little bit in the types from what were implemented.
  585. And I also attempted a sync but I was not able to decode the blocks that were served by
  586. lighthouse. So then I extracted some beacon block JSONs from the lighthouse and tried to load
  587. on our types. So there were a few issues that were discovered and I have sort of raised them.
  588. And hopefully I will try to make local changes so that I can currently sync on Constatine network.
  589. and I have raised the issues so that they can be addressed before the network is relaunched
  590. so that when the network is relaunched we can load and sync it in a proper way. I will still try to
  591. sync the current network in the current format so that you know on the next relaunch we are sure
  592. that we can participate in the network. Apart from that, did finalized PR regarding fee recipient
  593. and I did some mock test to see whether the fee recipient that is being passed
  594. actually reaches the notifyFCU calls to the execution engine. So that should basically
  595. give us quite a good amount of confidence in terms of our expectations with regard to fee recipient
  596. and finalized, incorporated some of the changes that we reviewed on the Free the Blobs PR
  597. and also tested it with Ethereum.js for DevNet 8.
  598. So, currently seems like it works well with Ethereum.js.
  599. No other EL DevNet 8 branches or images are out yet.
  600. So when they will come, I'll test against them as well.
  601. And did some reviews and some small PRs on, for example, execution,
  602. engine straight tracking, and also helped EF dev guys run Lodestar as a boot node.
  603. So they were having some issues with ENR and which basically we have seen these kind of issues
  604. before and had added NAT flag for them. So I basically helped them use loadstar as boot nodes
  605. and to basically then other nodes could sync from Lodestar.
  606. Yeah, so I am currently planning to work on making sure that the race that we run between
  607. builder and execution, we want to move it to beacon so that we can be compatible with
  608. other beacon nodes and validators in terms of how they run the block production that should resolve
  609. some of our interop issues that Nico has also seen. And I will continue working on syncing the
  610. the verkle testnet.
  611. Thank you, Gajinder.
  612.  
  613.  
  614. All right, let's move forward with Nico.
  615. Hey, so yeah, as I mentioned before, I looked a bit into the worker threads versus child process
  616. topic.
  617. I'm still not satisfied there with my understanding, and it's really hard to find good information
  618. on the performance,
  619. but hopefully, yeah, getting some benchmarks there
  620. and maybe some responses on GitHub will help.
  621. Besides that, after the update I did last week
  622. to the boot nodes, I was reviewing that a bit.
  623. So there was this one issue where a user,
  624. I think it was Mika, I'm not sure who it was actually,
  625. but yeah, there was an issue if you set
  626. connect to boot nodes, and there was this parsing issue,
  627. and then, yeah, I basically just reviewed the code
  628. to understand how that works.
  629. Besides that, yeah, I opened an issue
  630. where I discussed a strategy for
  631. how we should maintain boot nodes
  632. better in the future, maybe.
  633. Besides that, there was something with DappNode
  634. where some user asked if there's a possibility
  635. to maybe disable doppelganger protection,
  636. because that's enabled by default,
  637. so I'm looking into that.
  638. Maybe we can provide a better option for
  639. how they can more easily disable it.
  640. But yeah, I think there's still improvement we can do on that end,
  641. in how we improve the implementation in Lodestar.
  642. I think right now, for all clients, they just wait two or three epochs before they start attesting.
  643. But I think if we know that the instance did the attestation in the previous epoch, we can just start right away.
  644. So I documented that in the issue.
  645. And there are even security improvements, in my opinion, if we do that,
  646. which I also wrote down there.
  647. Which I also wrote down there.
  648. And yeah, so the plan for this week is I guess,
  649. testing this child process stuff
  650. and also updating the boot nodes.
  651. And then I also want to further look into
  652. the whole state cache and regen topic
  653. and review some more code there.
  654. - Cool, thanks Nico.
  655.  
  656. All right, let's go with Matt.
  657. - Hi. I had a little bug in Ansible doing some deploys,
  658. so I put up a PR to update one of the dependencies
  659. when switching between Node versions,
  660. put up a couple of small PRs for just dashboard updates,
  661. and got those cleaned up and they got merged.
  662. Investigated the memory leak issue a little bit
  663. and then also was trying to dig more on the line zero,
  664. but I ended up just adding that to the email
  665. that I sent to Ben.
  666. I did finally get that sent over to him
  667. and asked the four questions that were pressing,
  668. And then I got BLST updated, as far as in the BLST repo,
  669. and then got the updated code into Lodestar,
  670. and then updated the critical pieces in Lodestar.
  671. So I basically took out the old BLST bindings
  672. in the state transition and CLI and all the other server-side stuff. I only left the Herumi version
  673. in the light client and in the prover, because those are going to run client-side. And
  674. that was branching off of an older version from about two weeks ago, which is where that work started.
  675. So I got that standing and working and collecting metrics on it, just to see how it is relative to
  676. when all the subnets were subscribed, and it looks like it's brought down CPU usage by 40%,
  677. which is nice, by keeping it using the libuv worker pool versus the separate workers.
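
(Aside: the libuv worker pool mentioned here is the thread pool that Node's native async bindings run on; its size defaults to 4 and is set via an environment variable. The value and script path below are illustrative:)

    UV_THREADPOOL_SIZE=8 node ./packages/cli/bin/lodestar.js beacon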
  678. So I also got all of the work that Tuyen did in the BLS area, because there was a ton of work there:
  679. updating the gossip for re-verifying multiples of the same message
  680. really changed a lot of that code, enough that it wasn't really possible to do a merge commit with
  681. it. So I basically just had to separate it into two files and I'm just kind of
  682. manually putting the pieces together. So that's almost done, and I'll be able to deploy that as
  683. well, in order to see how it looks with the rest of the metrics. So we can see it, before we did
  684. that, kind of with the metrics that we're used to seeing, and then with the new
  685. metrics that we're seeing now, so we kind of get a more holistic look at how it's actually
  686. doing. But it seems like it's working okay, because the event loop is only 15 milliseconds
  687. on the old version, before we did any updates at all, from two weeks ago. So that's great.
  688. And then I also did some investigations with the new space and the semi-space;
  689. both were promising. And actually, when I was messing with that, it's what highlighted the memory leak.
  690. So I'm going to go back to putting that on; now that we can kind of get the memory leak out of there,
  691. we'll be able to see exactly what the heck that's actually doing.
  692. And then my goal for this week is I want to...
  693. I'm going to finish that merge, and hopefully I'll get that done today,
  694. and we can see what that looks like.
  695. And then Lion gave me some updates on the deduplicate payloads,
  696. and I'm going to go back to working on that one.
  697. I think the issue that I was having, when I was having a bunch of errors,
  698. was it was random. I was basically regenerating
  699. a whole bunch of blocks and using a whole bunch of randomness,
  700. and I think that was just pooping my CPU,
  701. just trying to regenerate a whole bunch of random data.
  702. So, in order to avoid that, I'm going to save those to just a fixture,
  703. so it's not actually generating the randomness every time,
  704. and then see how that does.
  705. So I've got a couple of strategies there.
  706. And then I'm also going to try to get the--
  707. so, the memory thing and the BLST thing.
  708. And I'll start on the deduplicate payloads,
  709. and then I'll respond to Ben, because there are
  710. going to be a couple of things that are probably
  711. going to come out of there.
  712. Thanks, Matt.
  713.  
  714.  
  715. All right, let's move on to Tuyen,
  716. and then we'll finish off with Cayman.
  717. Tuyen, you still there?
  718. Okay.
  719. So, last week, I worked on an issue where Lodestar has more than max peers. I found that we have a configuration to close the server when we reach max connections, but we count the inbound connections only.
  720. So I ended up opening an issue in js-libp2p.
  721. So Cayman, please have a look at that to see
  722. if we should work on that with the libp2p team or someone else.
  723. Next, Nethermind has an issue of not being able to sync
  724. from the EF checkpoint sync URL
  725. when it is actually seven days out of date.
  726. And so the change is just to print out an error
  727. with more detail.
  728. Not sure why the other guys don't have the issue,
  729. but Nethermind does; we'll ask them.
  730. Next, I worked on some gossipsub things.
  731. One is to update protobufs to protons.
  732. The benchmark is good, but when I tried to sort out the memory leak,
  733. I thought it was an issue in protons,
  734. but then I chatted with Matt and he found it's our issue.
  735. So we'll get back to that, and may upgrade the protobuf
  736. version from v2 to v3 there
  737. so that we can use the latest version of protons.
  738. Also, when we updated Lodestar,
  739. there were some broken Prometheus
  740. metrics from gossipsub:
  741. we see some reject or ignore messages without the topic.
  742. So I created a PR in gossipsub in order to fix that.
  743. Other than that, I investigated memory still a little bit
  744. and found that Node 20 is good.
  745. Next, I will try to wrap up my work on the indexed gossip queue.
  746. I just discussed with Lion and he said,
  747. as long as the performance is better,
  748. maybe we can go with that queue for now,
  749. and we can improve that later if needed.
  750. So we will ask Lion to review this PR.
  751. Also, I will look into Lighthouse,
  752. and how they can maintain having zero historical states.
  753. Maybe we can apply that or not.
  754. We'll look at and study the code a little bit.
  755. That's it for me.
  756. Oh, also, there was an issue with libp2p where some script may attack
  757. our node.
  758. I have a branch for that,
  759. and we've tested it.
  760. It just updates some flags on the work that was already done on the TCP side.
  761. Thank you, Tuyen.
  762.  
  763. Okay, and Cayman.
  764. Hey, so last week I was working with Alex a little bit,
  765. Alex at the libp2p team, on this varint library,
  766. that we were--
  767. like, what the strategy should be for using varints
  768. across the different libraries
  769. and how to unify on a single implementation.
  770. I put out a PR that I'm hoping he'll review soon.
  771. But we got roughly a 10x improvement in decoding speed
  772. and roughly a 5x improvement on encoding.
  773. So, I like that.
  774. But it is only a small part of our total CPU time.
  775. I think it's like somewhere around 3% to 5%.
  776. It's still-- it'll be nice to get that down way lower.
  777. Other than that, did some investigation
  778. with Nazar on the type of JavaScript
  779. that we're outputting through TypeScript.
  780. And we kind of determined that we can bump up to ES2021.
  781. We were outputting ES2019.
  782. And that made things like nullish coalescing really ugly
  783. in the output JavaScript, which I
  784. don't know that it has any performance implications.
  785. But it would be nice to just use the latest JavaScript
  786. if we can.
  787. So put out a PR for that.
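
The compiler change being described is just the target in tsconfig.json; with ES2021 (nullish coalescing landed in ES2020), the ?? operator is emitted natively instead of being transpiled into a verbose equivalent:

    {
      "compilerOptions": {
        "target": "ES2021"
      }
    }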
  788. And then one other thing I noticed
  789. was that we still have our max mesh peer count at 9
  790. when the spec suggested it should be up at 12.
  791. And we previously lowered it from 12 to 9
  792. because we were having issues with performance.
  793. And so I figure this is a decent time
  794. to reinvestigate whether or not we can bump that back up.
  795. But I think there needs to be testing done on that
  796. before we merge it.
  797. But I opened the PR just to get the conversation started.
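
For context, the mesh bound in question maps to gossipsub's degree parameters. The option names below follow js-libp2p-gossipsub's options as I understand them and should be treated as assumptions:

    // Mesh degree parameters: D is the target, Dlo/Dhi the bounds.
    const gossipsubOpts = {
      D: 8,    // target number of mesh peers per topic
      Dlo: 6,  // graft more peers below this
      Dhi: 12, // prune above this (previously capped at 9)
    };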
  798. And other than that, this week I'm
  799. going to be bumping up to the latest version of LibP2P.
  800. Again, fingers crossed, this should
  801. be a relatively straightforward update.
  802. But we need to be the latest version
  803. to be able to get all the fixes and changes we're getting
  804. in the Gossipsub library.
  805. And also, if we are wanting to test Yamux, which
  806. I would like to start testing again,
  807. we also need the latest version.
  808. So we'll be doing that.
  809. That's it for me.
  810. Thanks Cayman.
  811. OK, any last minute points?
  812. All right. Thanks, everyone.
  813. We'll see you on Discord.
  814. Take care, y'all.
  815. Thanks.
  816. Thank you.
  817. Bye.
  818.  