David Irvine - SAFE Network, Technical Overview of Sentinel

-========================================================================================================-
| David Irvine - SAFE Network, Technical Overview of Sentinel - Full Transcription (Clean) [beta 0.9.51] |
-========================================================================================================-

Alright... Sentinel!

So the purpose of the Sentinel is to take groups of messages, and confirm them. And we'll see what that means in a minute. We'll pass a single message out. So really it's a sort of-- it's doing a couple of jobs here. This confirmation is a couple of jobs.

So one task it's doing is: ensure minimum number of copies of a message are received. Now, we'll go into what that means in a minute. So ensure the minimum number of copies are received.

The second thing it needs to do is: ensure that each message is valid. So for us that means a couple of things. So, a valid message means it's signed by the person who claims to have sent it. And secondly, we need to check: is the sender valid with respect to his location in the network?

So, there's three probable bits to this. And we'll just see what each of them are individually. So the first bit for the Sentinel is the simple part.

So, we have a message from Node A, and that message claims to be [??] type, or Node type. Basically, there's one copy of that message. So this is an individual sending a message. So the Sentinel will take that message, and it has to confirm that this guy here is the sender of that message. It's a single message from a single guy. So he's not trying to get any group consensus here. There's no notion of group consensus for that message. So that message, if you like, has already been accumulated.

Now, what Sentinel must do is then check the key---check (if) the actual cryptographic key that was used to sign this is actually from that guy. So I'll send a message to the network, saying, "the group responsible for A, the Group near A, nearest A, send back the key for A." And we get the key coming back here from that group, and check (if) the message is validly signed. And then a message pops out. So we pop that message out. And that's held in an accumulator. So that's almost version 1, really: single message, from a single source. That we just go and check the key, and make sure that message is fine.
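A minimal sketch of this first pattern (single message, single source), assuming hypothetical names throughout: `SignedMessage`, `KeyLookup`, and `check_single_source` are not from the talk, node IDs are shortened to `u64`, and signature checking is passed in as a function so that the sketch does not pretend to be the real sodium call.

```rust
/// Hypothetical message shape; the field names are illustrative only.
pub struct SignedMessage {
    pub sender: u64, // the claimed sender ("Node A")
    pub body: Vec<u8>,
    pub signature: Vec<u8>,
}

/// Stand-in for asking the group nearest the sender for that sender's key.
/// In the talk this request goes out through Routing, not Sentinel itself.
pub trait KeyLookup {
    fn public_key_of(&self, sender: u64) -> Option<Vec<u8>>;
}

/// Pattern 1: returns the message body only if a key came back for the
/// claimed sender and the signature verifies under that key.
pub fn check_single_source<L, V>(msg: SignedMessage, lookup: &L, verify: V) -> Option<Vec<u8>>
where
    L: KeyLookup,
    V: Fn(&[u8], &[u8], &[u8]) -> bool, // (public_key, body, signature)
{
    let key = lookup.public_key_of(msg.sender)?;
    if verify(key.as_slice(), msg.body.as_slice(), msg.signature.as_slice()) {
        Some(msg.body)
    } else {
        None
    }
}
```

Passing the verifier in keeps the sketch independent of any particular cryptography crate; a real user would supply the sodium signature check there.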

A lot of messages that we get on the network, though, come from groups. So when you've got a group sending a message-- this is where we are using sort of group consensus mechanism. So you've got some nodes sending a message, saying 1 2 3 4. And then, the Sentinel here... works. Internally, we have two accumulators. Now, the accumulator is a separate library, so we need to leave that out. But basically what the accumulator does-- you keep adding messages. So you would say that you would want to accumulate this for 4. So when you add the first message, it won't give you anything back. The second message... --When you add the fourth message, this will return, saying, "You have accumulated four messages." "What do you want to do now?" "Here's the data."
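The accumulator behaviour described here can be sketched roughly as follows. This is an illustration only, not the separate accumulator library mentioned in the talk: the type name, method names, and the `Option` return are assumptions based on the description ("it won't give you anything back" until the requested count is reached).

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical accumulator: collects values under a key and reports
/// back once `quorum` values have been added for that key.
pub struct Accumulator<K, V> {
    quorum: usize,
    entries: HashMap<K, Vec<V>>,
}

impl<K: Hash + Eq, V: Clone> Accumulator<K, V> {
    pub fn new(quorum: usize) -> Self {
        Accumulator { quorum, entries: HashMap::new() }
    }

    /// True if at least one value has already been added under `key`.
    pub fn have_key(&self, key: &K) -> bool {
        self.entries.contains_key(key)
    }

    /// Adds a value. Returns `None` while fewer than `quorum` values are
    /// held for `key`, and `Some(all values so far)` from then on.
    pub fn add(&mut self, key: K, value: V) -> Option<Vec<V>> {
        let values = self.entries.entry(key).or_insert_with(Vec::new);
        values.push(value);
        if values.len() >= self.quorum {
            Some(values.clone())
        } else {
            None
        }
    }
}
```

With a quorum of 4, the first three calls to `add` return `None` and the fourth returns the four accumulated values.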

So one accumulator is to accumulate the messages: to make sure that we've got the right amount of messages that we require. So at this point we know that four nodes sent us this message. But we don't know-- there's two things that we don't know. 1) is it really them? (Is it really node 1 2 3 4 that sent that message?) And 2) should it be them? So if we can confirm that it was 1 2 3 and 4, can we also confirm that they are the nodes on the network responsible for this message that we got? So the message, we have to figure out if it was those guys that should have sent it. So, they will claim to be representing some group. And here you would hope the group would be something like 2.5, if that was the Claim.

So, what we do is, in the Sentinel, as soon as you get a group message, and you add it, you're going to add it to an accumulator for the messages. The first message that we add to the accumulator: so we check the accumulator, "Have you ever seen this key before? Like do we have any messages already from this source?" And if so that's fine. Then we keep adding until we've got the correct amount of messages. But if we haven't added to the accumulator before, it's not seen this key, that kicks off a Get Public Keys. So we want to get the Public Keys. And we call that "Get Group Key" I think is the name of the message. And this here answers both of the questions.
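As a rough sketch of that step: adding a group message either finds the key already known, or it is the first message for that key and a key request should be kicked off. `MessageStage`, `AddOutcome`, and the `u64` identifiers are hypothetical names for illustration, and the actual sending of the request is left to the caller.

```rust
use std::collections::HashMap;

/// What the caller learns from adding one group message.
#[derive(Debug, PartialEq)]
pub struct AddOutcome {
    /// True only for the first message seen under this request key:
    /// the caller should now ask Routing for the group's public keys.
    pub request_group_keys: bool,
    /// True once enough copies of the message have arrived.
    pub quorum_reached: bool,
}

/// Hypothetical message-side accumulator, keyed by a request identifier.
pub struct MessageStage {
    quorum: usize,
    copies: HashMap<u64, Vec<(u64, Vec<u8>)>>, // request -> (sender, body)
}

impl MessageStage {
    pub fn new(quorum: usize) -> Self {
        MessageStage { quorum, copies: HashMap::new() }
    }

    pub fn add(&mut self, request: u64, sender: u64, body: Vec<u8>) -> AddOutcome {
        // "Have you ever seen this key before?"
        let first_time = !self.copies.contains_key(&request);
        let copies = self.copies.entry(request).or_default();
        copies.push((sender, body));
        AddOutcome {
            request_group_keys: first_time,
            quorum_reached: copies.len() >= self.quorum,
        }
    }
}
```

Only the first message for a request triggers the key fetch, so later copies do not cause repeated network requests.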

So what the network does at this point (I'll just scrub this to show you what the network does)---so over here, we are the Sentinel. And what we're seeing is: a message has come in with a target, or a group Claim of 2.5. So we send to the network. Now this is Routing's job that's outside Sentinel. But Sentinel is saying, "You need to do this for me." So Routing does, and this is implemented through a Trait.

So it's saying, "Get the group closest to 2.5." And this goes through the XOR network, and we'll get the nodes that the network believes is closest to 2.5---which may or may not be these guys, if they're hackers, or something. Or one [??] went offline in the meantime. And this is why we've got the difference between group size and quorum. As far as the Sentinel's concerned, there's just quorum. But Routing does the rest of it. So we're confirming that the network believes these are the right guys.
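"Closest" in an XOR network means smallest XOR distance to the target. A minimal sketch of picking a close group from the nodes one node happens to know about, assuming `u64` identifiers for brevity (real network addresses are much wider) and a hypothetical function name:

```rust
/// Returns up to `group_size` of the known node IDs, ordered by XOR
/// distance to `target`, nearest first. Each node would compute this
/// over its own view of the network, so two nodes may disagree slightly.
pub fn closest_group(target: u64, known: &[u64], group_size: usize) -> Vec<u64> {
    let mut nodes = known.to_vec();
    nodes.sort_by_key(|id| id ^ target);
    nodes.truncate(group_size);
    nodes
}
```

For example, with target 5 and known nodes 1, 4, 7, 12, 5 and 6, the XOR distances are 4, 1, 2, 9, 0 and 3, so a group of four is 5, 4, 7, 6.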

And we're also confirming that these guys give back the keys. So each of them, 1 2 3 and 4 will give back the keys for 1 2 3 and 4. And because of the way that the network works: that could happen. You could get 5 here. Or you could get a Z, Q, 1, 2 here. There will be slight differences in the keys you get back, just because this node, 4 here, might be at the [edge?] of that group, and every node sees a slightly different group anyway. So, when we get this back here, this matrix, we have to try and figure out... We know that we've got node 4, we know we need his key. So there's a copy of his key here, here, and here. So we're going to get maybe three copies of that key here. Node 3's key---well that's three as well; we'll just put 3 in here---we might find node 3's key slightly better, it may have filled four copies, because node 3 is quite close to the target.

And the closer to the target you are in these kind of networks---and again this is Routing's job---the more accurate that this thing's going to be. So the more that you show up, probably (as far as Routing is concerned but not Sentinel), the chances are you are closer to the actual target, than the ones who show up least. Trying to get a quorum in here is going to be problematic, I think. But if we are getting at least a key back, at least one copy of it, I'm pretty sure we're SAFE.
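The key "matrix" can be sketched as a tally: each group member replies with the (node, key) pairs it knows, and we count how many copies of each sender's key came back. The function names, the `min_copies` parameter, and the choice to take the most frequently reported key are assumptions for illustration; the talk only suggests that at least one copy is probably enough.

```rust
use std::collections::{BTreeMap, HashMap};

pub type NodeId = u64;
pub type PublicKey = Vec<u8>;

/// Counts how many group members reported each (node, key) pair.
pub fn tally_keys(
    responses: &[Vec<(NodeId, PublicKey)>],
) -> HashMap<NodeId, BTreeMap<PublicKey, usize>> {
    let mut tally: HashMap<NodeId, BTreeMap<PublicKey, usize>> = HashMap::new();
    for response in responses {
        for (node, key) in response {
            *tally.entry(*node).or_default().entry(key.clone()).or_insert(0) += 1;
        }
    }
    tally
}

/// For every sender, picks the most frequently reported key. Returns `None`
/// if any sender has no key at all or fewer than `min_copies` copies.
pub fn keys_for_senders(
    senders: &[NodeId],
    responses: &[Vec<(NodeId, PublicKey)>],
    min_copies: usize,
) -> Option<HashMap<NodeId, PublicKey>> {
    let tally = tally_keys(responses);
    let mut chosen = HashMap::new();
    for sender in senders {
        let (key, copies) = tally.get(sender)?.iter().max_by_key(|(_, count)| **count)?;
        if *copies < min_copies {
            return None;
        }
        chosen.insert(*sender, key.clone());
    }
    Some(chosen)
}
```

Nodes outside the sender set (the 5, or the Z and Q, in the example) simply show up in the tally and are ignored when picking keys for the senders.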

So this here does two jobs. It tells us that those are the right guys, and we've got their key. We have been able to get a key from this group as well. And I think that's what Ben has called "[flatting?]" keys, or something. So, once we get those keys back, we're back to the two accumulators.

So we've got an accumulator for messages, and we've got an accumulator for keys. So the message one will always be first. And that will kick off the Get Keys. But we might get all the keys back first, before we get just the number of messages. That could happen. It's just because of the instance of the network. So, the message accumulator will kick off the Get Keys, but one of these: as we are adding a message we will find out if the accumulator is true. So it's got the number that we've asked for. So if we've asked for four messages, when we're adding the fourth, this will come back true. Now that could be replaced with a [channel?], which I think is probably a better way than having an optional [return, over here?]. And also the same here, though, when we're adding keys, at some stage it'll return true.

So what we're saying is, we're basically doing a while thing here. While( Message + Keys). So while we don't have both of these true, we'll keep on adding stuff. As soon as one goes true, we remember, "Well, we've got enough messages." But even if we're adding more messages, it's fine. It doesn't matter. It just makes it a bit more accurate. But when both of them go true, we know that we've done our job.
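The "while we don't have both" logic amounts to two sticky flags. A minimal sketch, with hypothetical names, in which either side may finish first and extra additions after that are harmless:

```rust
/// Tracks whether the message accumulator and the key accumulator have
/// each reported that they reached their requested count.
#[derive(Debug, Default)]
pub struct Progress {
    messages_done: bool,
    keys_done: bool,
}

impl Progress {
    /// Call after adding a message; pass what the message accumulator
    /// returned. Returns true once both sides are done.
    pub fn message_added(&mut self, accumulator_returned_true: bool) -> bool {
        self.messages_done |= accumulator_returned_true; // remembered once true
        self.is_complete()
    }

    /// Call after adding a key; same contract as `message_added`.
    pub fn key_added(&mut self, accumulator_returned_true: bool) -> bool {
        self.keys_done |= accumulator_returned_true;
        self.is_complete()
    }

    pub fn is_complete(&self) -> bool {
        self.messages_done && self.keys_done
    }
}
```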

Now, these messages, in this type of accumulator, coming from different places, are all equal. So basically there's no merging that happens when we're doing this normal group message. These messages are always equal. So, we don't have any machinery or logic in this thing to, say---this particular message type, you can't just straight accumulate it, and imagine that that's the message. You've got to merge it or something. There's no merge capability in this type of message. So, accumulator 1, you've only got one guy coming. We check the keys.

Sentinel, the second real part of Sentinel-- and I'm not saying there are three Sentinels here, or anything. I'm just saying there are three distinct patterns.

So the first one, single message: check the keys. Second one, a group of messages: get the group; get the keys; and confirm the group was correct. So this is Sentinel number two pattern.

The third pattern... and this one I think is a little bit debatable, and it may tie the Sentinel to Maidsafe maybe more than it should. I'm not sure. In terms of tying it to Maidsafe, we did have some issues with keys---cryptographic keys---and IDs. And I think that's OK. It's OK for us to say we are using this thing. This thing uses the sodium, or NaCl, the sodium library, for its cryptography.

The same way that sodium is saying to users, or salt is saying to users, "Cryptography is really hard. Don't try and do it yourself. We'll do it for you." I think it's valid for us to say, "This Sentinel thing is going to require cryptography. Cryptography is really hard. Let salt do it for you." If we give the option here to use any cryptography library you want, people could use a less secure one. And it's not a hard thing to use salt at all. So it's probably valid.

So, we'll just say: Sentinel 1) single source. Sentinel 2) group source. And Sentinel 3) let's call it "Refresh"---which if we remember back to DHT networks, they've all got this refresh thing. We have it as well, even though we don't call it refresh. We call it account transfer. But in the DHT world, it's just a refresh. And the refresh is responsible in a DHT type [the video feed cuts out here].

Network State. So the state of these kind of networks is a particular thing. And it's decentralized. So we know from Routing, no two nodes hold the same state. But the closer the nodes are together, the more state they have in common. The farther apart they are, the less state they have in common---to the point where there's 0 state in common. But just like every human sees its own rainbow, every node sees its own state. And the closer neighbors will share more state than the farther away neighbors.

So the state of the network-- the state of a DHT is a very particular thing. And it's very important that state is as minimal as possible, obviously. You're going to have issues if you have a very complex state.

But the third Sentinel design, the third part of it is really Refresh (in DHT terms). And you can consider it as the thing that is responsible for network state. And because it's network state here, the variables will not be all equal.

So, node A holds X = 12, node B could hold X = 13, node C: X = 10. And that's valid that that could happen. And that happens in all DHTs. What it means is, this guy here has probably received more messages, that put the state of this up to 13. But it could be the converse. It could be that this guy has received more delete messages, to have brought it down to 10. But they are in a group, and those messages are in flight. And some of these guys will pick it up, and process it faster. So we can't assume that if this is an increasing number, that C hasn't got those other 3 messages that this guy's got. He's probably-- he could have received them. He's just not processed them yet. So it could be happening inside the computer. It could be on the wire.

Those three pieces of state are likely to be somewhere. They are also likely... to vanish, or just two pieces of state. That even (up) when the network settles, that you've got something like this: 12, 12, 13. And that's valid as well. That's a valid state of the network to be in, in terms of that particular account information.

So the third Sentinel thing-- and it's good to say this Sentinel is like most of the stuff we do. It's for P2P network. And it's a DHT type thing. And we can keep that in our mind that, at the moment we don't need to make Sentinel so generic that it works for everyone all the time. It can be generic enough just to work for us, just now, in this iteration. And the API might change and develop as we go along.

But the refresh stuff is interesting, because that is going to have to be merged. Because, A B C, they're all going to have potentially different values. They should be roughly close, but not exactly the same. And the refresh in the DHT network, we know what that means is: the network changed. One node went off. Another one had joined a group. Some things happened that the network has changed---which means we need to do a state update. And that's all we're doing here is a state update. So from the Sentinel's perspective, that's got to see state update information. And that's going to be quite an interesting thing. From Routing's perspective if you like, that state update information is all from your group. And it's worth noting, that we won't get state update information that isn't from our group. If we do, something's broke.

So the way that Routing is going to use this refresh, it's going to say it's going to have to come from our group. So what we haven't [???] very well in Routing, just now. So, as one of these refresh types comes in, and refresh isn't put, get, or post. It's a specific call that would happen, in Routing. And that there should be in the Routing interface. So from the Sentinel's perspective, it's going to get the data. And it's going to do some very, very similar work to the group Sentinel here. So it's going to say, 1 2 3 4, save me this piece of information. And it's going to do the accumulator stuff, the two accumulators, exactly the same as this group thing. And during that, it's going to say, "Get Group Keys." Now, if we just jump out of Sentinel for a second, the Routing node should be able to spot this particular message here: means, "From my Routing table, don't send a network thing for this. Who do we think that this group is? This is a specific type of message. We don't necessarily need to send a network thing here. If these guys aren't in my Routing table, something's going badly wrong." So slight differences from version 2 there, with that group key thing. Then when it comes to the accumulated data, these accumulators here can't really handle that. That's going to give us back (not one copy, which the accumulators always do in this anyway, but) the four copies. And we'll notice that they're probably not all equal. They're likely not to be equal. But we have a bit of a stipulation that those types need, say ordered or something, or [partially?]-- whatever trait you want to put on those types, that allows us to say, "We will merge this type of data, this refresh data. I.e. we'll always merge it."
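The Routing-side behaviour for refresh (answer from the routing table, no network request) might look roughly like this. The function name, the `UnknownSender` error, and modelling the routing table as a map from node ID to public key are all assumptions for illustration:

```rust
use std::collections::HashMap;

/// A refresh sender that is not in our routing table: per the talk,
/// this should not happen and means something is going badly wrong.
#[derive(Debug, PartialEq)]
pub struct UnknownSender(pub u64);

/// Looks up the keys for refresh senders locally, without any network
/// call. Fails on the first sender that is missing from the table.
pub fn refresh_keys(
    senders: &[u64],
    routing_table: &HashMap<u64, Vec<u8>>,
) -> Result<Vec<(u64, Vec<u8>)>, UnknownSender> {
    senders
        .iter()
        .map(|id| {
            routing_table
                .get(id)
                .map(|key| (*id, key.clone()))
                .ok_or(UnknownSender(*id))
        })
        .collect()
}
```

Returning an error rather than falling back to a network lookup reflects the point that refresh data should only ever come from our own group.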

Now, how we do the merging is, again, it's not really implementation, because it would affect the API, what we demand of this type of data. But for now for our immediate purposes, merge = get the median value for that particular key. So that can be handled easily, by requiring a trait on that type of data, that allows us to calculate the median value. And we'll just [see what we got?], because there's lots of ways you could do that. So this particular type of data, for refresh, what happens in the refresh call, we are going to have to say that the traits for data in these calls may be more extensive than traits for just the normal group call.
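"Merge = median" can be sketched with a trait requirement on refresh data. The `Mergeable` trait and the `median` helper are hypothetical names, and taking the lower middle value for an even number of copies is an assumption, since the talk leaves the exact method open:

```rust
/// Hypothetical trait that refresh data would have to implement so that
/// Sentinel can collapse several differing copies into one value.
pub trait Mergeable: Sized {
    fn merge(copies: &[Self]) -> Option<Self>;
}

/// Median of the copies; for an even count this picks the lower of the
/// two middle values. Returns `None` when there is nothing to merge.
pub fn median<T: Ord + Clone>(copies: &[T]) -> Option<T> {
    if copies.is_empty() {
        return None;
    }
    let mut sorted = copies.to_vec();
    sorted.sort();
    Some(sorted[(sorted.len() - 1) / 2].clone())
}

/// Example implementation for a plain counter such as the X = 12, 13, 10
/// values held by nodes A, B and C.
impl Mergeable for u64 {
    fn merge(copies: &[Self]) -> Option<Self> {
        median(copies)
    }
}
```

For the three values 12, 13 and 10 the merged result is 12, so one node that is ahead or behind does not drag the merged state with it.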

So, the refresh is a third design pattern, a third design consideration in Sentinel. And the difference from 2 to 3 is "get keys"---that's a net call. And here, that would be no network, from Routing's perspective (Sentinel doesn't care). Why I'm putting this here in terms of the Sentinel is: when Sentinel calls get keys, it looks like it needs to call two different types of get keys. Otherwise having the user of Sentinel know what to do here. The data has got here, for 2---it's got to be equal or partial equal. I.e. we're just using equality. And here, we've got to have ordered or some mergeable capability. Those are the kind of prime differences between that type, and that type. And those differences are enough that the API probably has to reflect different call types. So it's: add a single source, add a group source, and add refresh data---are probably different calls into the Sentinel. And the Sentinel "Get Keys" call, there's likely to be three "Get Keys" calls here as well. These are all just one key we're chasing. We could probably use the "Get Group Keys" from the single guy anyway, and just pick his key out. That's likely to be OK. But here we're specifically saying we specifically want a group key here. And here, we're saying, "We want a key. But we don't care how you get this key. But we are going to give you a hint, because we are going to call something here "Get Keys" that the user of Sentinel can take a hint." For us in Routing, we would take the hint that we would go to our Routing table, for this refresh thing, to get the keys here for the refresh.
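One way the three entry points and the three key-request "hints" could be laid out is sketched below. None of these names come from the talk or from an existing API: `KeyRequest`, `KeySource`, the `add_*` methods, and `needs_network_lookup` are assumptions, and the accumulation itself is left out so that only the call shapes are shown.

```rust
/// The hint Sentinel gives its user about which keys it wants.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum KeyRequest {
    Single(u64),  // key of one claimed sender
    Group(u64),   // keys of the group closest to a claimed address
    Refresh(u64), // keys of our own group, for a state update
}

/// Implemented by the user of Sentinel (Routing, in the talk).
pub trait KeySource {
    fn get_keys(&mut self, request: KeyRequest);
}

pub struct Sentinel<S: KeySource> {
    key_source: S,
}

impl<S: KeySource> Sentinel<S> {
    pub fn new(key_source: S) -> Self {
        Sentinel { key_source }
    }

    pub fn add_single_source(&mut self, sender: u64) {
        self.key_source.get_keys(KeyRequest::Single(sender));
    }

    pub fn add_group_source(&mut self, claimed_group: u64) {
        self.key_source.get_keys(KeyRequest::Group(claimed_group));
    }

    pub fn add_refresh(&mut self, our_group: u64) {
        self.key_source.get_keys(KeyRequest::Refresh(our_group));
    }

    pub fn key_source(&self) -> &S {
        &self.key_source
    }
}

/// How a Routing-like user might take the hint: refresh keys come from
/// the local routing table, the other two need a network request.
pub fn needs_network_lookup(request: KeyRequest) -> bool {
    !matches!(request, KeyRequest::Refresh(_))
}
```

Keeping the three requests as separate variants leaves the decision about how to fetch keys with the user, which matches the idea that Sentinel only gives a hint.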

So, that's Sentinel. Sentinel's probably about the most important bit of code, as far as security is concerned, for us and the network. So we really need to be able to think deeply about this. But also, Sentinel is quite an important one to think iteratively. Like, what could we do just now, to get our thing going. And then how can we ensure that we've got the correct API for Sentinel, as time moves on. I strongly suspect we'll put an API in place, and change it quite dramatically as we try to find other uses for Sentinel. And other networks use Sentinel as well. I think that that will change. So from our perspective it's good, because we are seeing as a user of Sentinel where we need to be told different things. And I'm not sure how far we can map these into single calls, and map the get keys for instance into a single get key call. I'm not sure it's wise to even try. But I'm not trying to... stamp a way of implementing this. But these, for Sentinel what we've gone over here is the requirements of Sentinel. There's three particular design considerations. And each one of these design considerations has got a slightly different impact on the user, to the user of Sentinel, to the point where they probably need to know---that: add these things differently. And the calls that Sentinel's going to make---like "Get Key" calls or whatever---are also going to be able to be handled differently by the user. It could be a single "Get Key" call, and it's you know, whatever. I think being as explicit as possible just now is going to help us, and become more generic in an iterative way. Because there's some really nice patterns in here, and it's got some very nice security elements that other people don't seem to have ever used. The group consensus thing is pretty enormous.

So, that's Sentinel, basically!