the perfect transcription for CRUST

-=====================================================================================================-
| David Irvine - SAFE Network, Technical Overview of CRUST - Full Transcription (Clean) [beta 0.9.81] |
-=====================================================================================================-

So CRUST is... a really good library, actually. And it does a lot for us.

In the last thing we were talking about (Routing, and Routing's relationship with CRUST), we were kind of making it clear to people:

CRUST is an IP layer thing. It knows nothing about anything above it. And it knows nothing about Routing IDs. It knows nothing about what we are trying to do with it, or anything like that. So CRUST is an IP-based solution.

And the idea of CRUST is: it's for P2P networks. If there's a P2P network somewhere, with its own addressing and Routing mechanism (all P2P networks have one), CRUST sits underneath it and gives it the ability to communicate. So the nice thing about CRUST, and where we want to see CRUST, is basically that we've got some kind of network here, and CRUST is sitting here and just connects to that network. So it says, "I'm connected to the network."

And importantly, it's just connected to anybody. It doesn't care who it's connected to. It just says, "I'm connected to the network." "I'm probably connected to people that you don't really want to be connected to." "And you're going to ask me to drop these connections, and create new ones." "But my job has been to connect you back to the network."

So at the very bottom layer, at the IP layer, CRUST has said, "OK, you're connected." "Now what do you want to do?"

And for us to be able to use CRUST, we need some way to start it, and for it to tell us, "Here are your connections." So we just need to know what connections we've got. You've got, like, some TCP thing, and maybe some UTP thing, and maybe some other stuff, maybe some [??], and whatnot. So we just need to know what we're connected to in the network.

And we need a connect, where we want to connect to some person using a Proto and an Address. Then we want to know if we've lost a connection, and we want to know if we've got a new connection. We're really not wanting much more from CRUST than that.
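As a rough illustration of how small that surface is, here is a minimal Rust sketch. Every name in it (`Protocol`, `Endpoint`, `Event`, `CrustService`, `connect`, `drop_node`) is hypothetical and not CRUST's real API; the connect logic is stubbed, and events reach the upper layer over a standard `mpsc` channel.

```rust
use std::net::SocketAddr;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Transport protocols CRUST might speak (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Protocol {
    Tcp,
    Utp,
}

/// A "Proto and an Address" pair.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Endpoint {
    pub protocol: Protocol,
    pub addr: SocketAddr,
}

/// The only things the upper layer hears about.
#[derive(Debug, PartialEq, Eq)]
pub enum Event {
    NewConnection(Endpoint),
    LostConnection(Endpoint),
}

/// Hypothetical service handle. Bootstrap, Beacon and NAT traversal would all
/// live inside it and never show up in this interface.
pub struct CrustService {
    events: Sender<Event>,
    connections: Vec<Endpoint>,
}

impl CrustService {
    /// Start the service and return the receiving end of the event channel.
    pub fn start() -> (CrustService, Receiver<Event>) {
        let (tx, rx) = channel();
        (CrustService { events: tx, connections: Vec::new() }, rx)
    }

    /// Connect to a peer. Stubbed: a real implementation would dial `endpoint`.
    pub fn connect(&mut self, endpoint: Endpoint) {
        if !self.connections.contains(&endpoint) {
            self.connections.push(endpoint);
            let _ = self.events.send(Event::NewConnection(endpoint));
        }
    }

    /// Drop a connection the upper layer no longer wants.
    pub fn drop_node(&mut self, endpoint: Endpoint) {
        if let Some(i) = self.connections.iter().position(|e| *e == endpoint) {
            self.connections.remove(i);
            let _ = self.events.send(Event::LostConnection(endpoint));
        }
    }

    /// What are we connected to right now?
    pub fn connections(&self) -> &[Endpoint] {
        &self.connections
    }
}
```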

And what we don't want above the CRUST layer, the things we don't want, is to know anything about Bootstrap information. We don't want to know anything about NAT-Traversal or otherwise. We don't want to know any of that sort of stuff. So CRUST is basically saying, "Don't you worry about all these things in red. I've got them covered. I'm going to do all that for you." So all that you need to know is, "I can get back into the network."

(It can tell you that you're connected to a couple of guys. It's up to you to chat to them and do some other stuff. And you can connect to other people, obviously send messages, drop connections, and things like that as well.)

But CRUST is basically saying to us, "We'll take the headache of all of that stuff (all the Bootstrapping stuff, all the connecting to where you are) off your hands." And that's really what we want CRUST to do. And how it does that is... we'll just [??] over to [??] [??], actually, hahaha.

How it does that is, it's obviously going to have to keep a Bootstrap file. We've got that so that we can connect remotely, so we can connect to some guys who are remote. We've got a Beacon, so we can connect on the LAN or on the local PC. So these are the ways that we want to be able to connect back to the network.

And we have to consider that there might be no Beacon. There might be nobody on the PC, and nobody on the LAN, so the Beacon might fail. Then we must Bootstrap; we must try to Bootstrap. Or else we are node 1.
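The fallback order just described (Beacon, then Bootstrap, else node 1) can be sketched as a small decision function. This is an illustration under assumptions: `BootstrapOutcome` and `choose_bootstrap` are hypothetical names, and the two inputs stand in for whatever the Beacon and the Bootstrap file actually return.

```rust
use std::net::SocketAddr;

/// Where a starting node gets its first contacts from (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum BootstrapOutcome {
    /// Found peers on the LAN or local PC via the Beacon.
    Beacon(Vec<SocketAddr>),
    /// Fell back to the Bootstrap file (hardcoded nodes included).
    BootstrapFile(Vec<SocketAddr>),
    /// Nobody to talk to: we are node 1.
    FirstNode,
}

/// Try the Beacon first, then the Bootstrap file; otherwise we are node 1.
pub fn choose_bootstrap(beacon: Vec<SocketAddr>, bootstrap_file: Vec<SocketAddr>) -> BootstrapOutcome {
    if !beacon.is_empty() {
        BootstrapOutcome::Beacon(beacon)
    } else if !bootstrap_file.is_empty() {
        BootstrapOutcome::BootstrapFile(bootstrap_file)
    } else {
        BootstrapOutcome::FirstNode
    }
}
```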

So, just looking at CRUST from its very foundation, we are saying to CRUST, "If there is a network, you must connect to it. You must connect to that network." So when CRUST first starts, this is going to be empty. There will be no Bootstrap file, and nothing for it to connect to.

So the very first node on a P2P network, from CRUST's perspective (let's call it Crust 1), is going to have no Bootstrap file. It's one of the nodes that we've started, that everyone else has to connect to. And we might have several of those. These may be what we have as hardcoded nodes.

So these are hardcoded nodes. What that means, for users of the CRUST library, is that you should pass at least these nodes through to CRUST's Bootstrap file. If you've got some hardcoded nodes, they're still nodes like any other. This means we have to be able to pass Fixed Listening Ports to CRUST, so that if those nodes restart, they restart on the same port we originally had them on. And that's OK. That's pretty straightforward.

So we need the ability to have some hardcoded nodes, and to pass these hardcoded nodes either through the CRUST API, or by installing CRUST with a Bootstrap file that's got those hardcoded nodes in it. If the hardcoded nodes are in the Bootstrap file, we should identify them as hardcoded, because of the way the Bootstrap file should work.

So the Bootstrap file itself should work like this: if a CRUST connection is 'direct', put it in the Bootstrap file, then limit the Bootstrap file to the 1,500 newest nodes. So we always want to have the newest nodes. (I'll say 'direct'; we should come up with a better word for it. What I mean is that we can get to that node via more than one CRUST node, so it can be reached directly.)

And that is why we need to identify hardcoded ones as different. We either pass the hardcoded ones through the CRUST API every time, or we identify them in the Bootstrap file as something different, i.e. they don't need to be among the newest nodes there. (I'm just going to try and block this sun a wee bit.)
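A minimal Rust sketch of that Bootstrap file policy: only 'direct' nodes are recorded, ordinary entries are kept newest-first and capped at 1,500, and hardcoded entries carry a flag so the cap never evicts them. The types and method names are hypothetical, and persistence to disk is left out.

```rust
use std::net::SocketAddr;

/// Cap on ordinary (non-hardcoded) entries: the "1,500 newest nodes".
pub const MAX_BOOTSTRAP_NODES: usize = 1_500;

/// One line of a hypothetical Bootstrap file.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BootstrapEntry {
    pub addr: SocketAddr,
    /// Hardcoded nodes are flagged so the "newest nodes" rule never evicts them.
    pub hardcoded: bool,
}

#[derive(Debug, Default)]
pub struct BootstrapFile {
    entries: Vec<BootstrapEntry>,
}

impl BootstrapFile {
    pub fn add_hardcoded(&mut self, addr: SocketAddr) {
        // Drop any existing copy, then record the node once, as hardcoded.
        self.entries.retain(|e| e.addr != addr);
        self.entries.push(BootstrapEntry { addr, hardcoded: true });
    }

    /// Record a node, but only if it is "direct" (reachable via more than one CRUST node).
    pub fn observe(&mut self, addr: SocketAddr, is_direct: bool) {
        if !is_direct || self.entries.iter().any(|e| e.hardcoded && e.addr == addr) {
            return;
        }
        // Move (or insert) the node to the front so ordinary entries stay newest-first.
        self.entries.retain(|e| e.addr != addr);
        self.entries.insert(0, BootstrapEntry { addr, hardcoded: false });
        // Keep only the newest MAX_BOOTSTRAP_NODES ordinary entries; hardcoded ones always stay.
        let mut ordinary = 0;
        self.entries.retain(|e| {
            if e.hardcoded {
                return true;
            }
            ordinary += 1;
            ordinary <= MAX_BOOTSTRAP_NODES
        });
    }

    pub fn contacts(&self) -> Vec<SocketAddr> {
        self.entries.iter().map(|e| e.addr).collect()
    }
}
```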

So the Bootstrap file is really important. And the ability to hardcode nodes, or pass Fixed Listening Ports through to them, is really important for these guys here.

Not only these guys---there's another situation where you want to be able to pass the listening port through. And that's where people have port-forwarded on the router.

So whether you've got TCP or UDP (it doesn't matter), you can always port-forward on the router. You can say, "Port X should go to IP:Port," the IP being a local IP on the network. So you might say Port 8080 always goes to 192.168.0.1:80. And that could be TCP or UDP; it doesn't matter. So in that case as well, you want to be able to pass a Fixed Listening Port. Because after you've port-forwarded something, you don't want that address to change, since you would have to get into the router and change the table again.

So hardcoded endpoints are not the only reason to be able to pass the listening port through to CRUST. So that's pretty important.
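A sketch of what passing a Fixed Listening Port could look like, using only the standard library. `ListenConfig` and `start_listener` are hypothetical names: `None` lets the OS pick a random port (port 0), while `Some(port)` pins it, as a hardcoded node or a port-forwarded machine would need.

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddr, TcpListener};

/// Hypothetical listener configuration.
pub struct ListenConfig {
    /// `Some(port)` for hardcoded nodes or router port-forwards; `None` for a random port.
    pub fixed_port: Option<u16>,
}

pub fn start_listener(config: &ListenConfig) -> io::Result<TcpListener> {
    // Binding to port 0 asks the OS for any free port.
    let port = config.fixed_port.unwrap_or(0);
    TcpListener::bind(SocketAddr::from((Ipv4Addr::UNSPECIFIED, port)))
}
```

Restarting with the same `fixed_port` brings the node back on the same port, so a router's forwarding table keeps working without anyone touching it.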

And just getting that right means that CRUST should always be able to connect back to the network it's supposed to connect to. (Unless there's a dramatic thing, like you physically aren't connected to the Internet. The router is off or something like that.) So that gives us an ability to connect back to the network we were on, which is pretty important.

The Beacon gives us the ability to quickly create test networks. And if we're using a local machine or a local network, it gives us the ability to test all this stuff really, really easily, just by switching the thing on, not doing anything else.
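A minimal Beacon sketch over UDP, assuming a hypothetical wire format: a node answers the magic query `CRUST_BEACON?` with its TCP listening port as two big-endian bytes. On a real LAN the query would go to a broadcast address; the example below runs on loopback so it can be tested on one machine.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::time::Duration;

/// Hypothetical magic bytes identifying a beacon query.
const BEACON_QUERY: &[u8] = b"CRUST_BEACON?";

/// Answer a single beacon query with this node's TCP listening port.
pub fn serve_beacon_once(socket: &UdpSocket, tcp_port: u16) -> io::Result<()> {
    let mut buf = [0u8; 64];
    loop {
        let (len, from) = socket.recv_from(&mut buf)?;
        if &buf[..len] == BEACON_QUERY {
            socket.send_to(&tcp_port.to_be_bytes(), from)?;
            return Ok(());
        }
        // Anything else is not a beacon query; ignore it.
    }
}

/// Ask a beacon for its TCP endpoint. On a LAN, `target` would be a broadcast
/// address such as 255.255.255.255 plus the beacon port.
pub fn query_beacon(target: SocketAddr, timeout: Duration) -> io::Result<SocketAddr> {
    let socket = UdpSocket::bind("0.0.0.0:0")?;
    socket.set_broadcast(true)?;
    socket.set_read_timeout(Some(timeout))?;
    socket.send_to(BEACON_QUERY, target)?;
    let mut buf = [0u8; 2];
    let (len, from) = socket.recv_from(&mut buf)?;
    if len != 2 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad beacon reply"));
    }
    // The responder's IP plus the TCP port it advertised.
    Ok(SocketAddr::new(from.ip(), u16::from_be_bytes(buf)))
}
```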

So, in terms of protocol handling, we've got TCP, and we've got UDP. TCP is already connection-oriented, so TCP's OK. UDP we need to make connection-oriented, and we switch to UTP for that.

Then we need NAT-traversal. For NAT-traversal over UDP, what we do is hole-punching, and that's where we use a rendezvous connection, or just a connection with a longer time-out than usual. For NAT-traversal with TCP, we've got UPnP. (And for UPnP we can actually take the Rust BitTorrent library; it's got a UPnP file, and we can just put it straight into our thing.) You've also got NAT-PMP, which I think has died. That was an Apple thing. And we've also got hole-punching, and for that: see the paper. There's a paper in the library's ReadMe that explains hole-punching.
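A minimal sketch of the UDP hole-punching step, assuming each side has already learned the other's public address from a rendezvous node (not shown). Both peers keep sending probes to each other; the outgoing probe opens a mapping in the sender's own NAT, so the peer's probes can then get in. The function name and probe payload are hypothetical, and the test runs on loopback, where there is no NAT at all, so it only exercises the probing loop.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::time::{Duration, Instant};

/// Keep probing `peer` until a probe from `peer` arrives or `deadline` passes.
pub fn punch_hole(socket: &UdpSocket, peer: SocketAddr, deadline: Duration) -> io::Result<()> {
    socket.set_read_timeout(Some(Duration::from_millis(50)))?;
    let start = Instant::now();
    let mut buf = [0u8; 16];
    while start.elapsed() < deadline {
        // Each outgoing probe (re)opens our NAT mapping towards `peer`.
        socket.send_to(b"punch", peer)?;
        match socket.recv_from(&mut buf) {
            Ok((_, from)) if from == peer => {
                // Send once more so the other side also sees traffic from us.
                socket.send_to(b"punch", peer)?;
                return Ok(());
            }
            Ok(_) => {} // Packets from anyone else are ignored.
            Err(e) if e.kind() == io::ErrorKind::WouldBlock || e.kind() == io::ErrorKind::TimedOut => {}
            Err(e) => return Err(e),
        }
    }
    Err(io::Error::new(io::ErrorKind::TimedOut, "hole punch timed out"))
}
```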

And then we've got NAT Detection. We should probably call NAT Detection something like 'Bootstrappable'. And both of these use the same mechanism. (On here, both of these also have port-forwarding.) Basically, if A connects to B (using whatever it did), then A should ask C to also connect to B. If C can, then B is OK, and we can Bootstrap off it.

And the same process applies here. Now, there's one slight addition (I'll just take that line away, so that that is the same): "If B's address is Local, then Bootstrap == OK." So if it's a local address, you can connect to him anyway, because you're not going through a router. (By definition, with a local address you're not going through a router.)
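The Bootstrappable rule can be sketched as a predicate: B goes in the Bootstrap list if a third node C also managed to reach it, or if B's address is local. This is a hedged sketch: `is_bootstrappable` is a hypothetical name, "local" here means private, loopback or link-local IPv4, and only loopback for IPv6, because the standard library's unique-local check for IPv6 is not stable.

```rust
use std::net::IpAddr;

/// Should B be written to the Bootstrap list?
/// `third_party_reached_b` is C's answer after A asked C to also connect to B.
pub fn is_bootstrappable(b: IpAddr, third_party_reached_b: bool) -> bool {
    is_local(b) || third_party_reached_b
}

/// A local address means no router sits between us and B.
fn is_local(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.is_private() || v4.is_loopback() || v4.is_link_local(),
        IpAddr::V6(v6) => v6.is_loopback(),
    }
}
```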

So once NAT Detection (or Bootstrappable) succeeds, you can write these nodes to the Bootstrap List. So we've got this detection thing, and we've got the traversal thing. And we do something with UDP to make it connection-oriented, so we've got options here as well. (That should be UTP. And whatever that other one was, I can't remember.) But anyway, we've got options here.

So all of this gets presented as a CRUST endpoint, which is a Proto and then an Address: {Proto : Address}. A CRUST endpoint refers to a machine, a physical one. (It could be hosted on a physical machine, but anyway, that's why it's a physical thing: it's a TCP stack somewhere in the world.) So the endpoint that refers to this machine could be a vector of {Proto : Address}, because you might have several TCP, several UTP, and all the rest of it. And when the upper layer says "Connect," it might just pass this vector of endpoints.

(And we should probably not call these endpoints, because an endpoint is generally a socket address. We should call these something else, because this is really a protocol plus an endpoint. And that endpoint could be IPv4 or IPv6; IPv6 comes later. But the point of these talks is also to discuss the future.)
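One way to model that vector of {Proto : Address} in Rust. Since the talk suggests not calling these "endpoints", the sketch uses the hypothetical names `ProtocolAddr` and `PeerContact`; `SocketAddr` already covers both IPv4 and IPv6, so nothing changes when IPv6 arrives.

```rust
use std::net::SocketAddr;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Protocol {
    Tcp,
    Utp,
}

/// One {Proto : Address} pair.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ProtocolAddr {
    pub protocol: Protocol,
    pub addr: SocketAddr,
}

/// Everything known about reaching one machine: possibly several TCP and several UTP addresses.
#[derive(Clone, Debug, Default)]
pub struct PeerContact {
    pub addrs: Vec<ProtocolAddr>,
}

impl PeerContact {
    /// The addresses for one protocol, in the order they were learned.
    pub fn for_protocol(&self, protocol: Protocol) -> impl Iterator<Item = &ProtocolAddr> + '_ {
        self.addrs.iter().filter(move |a| a.protocol == protocol)
    }
}
```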

But this is what CRUST should do, I think. And then, if we started to add another protocol, for instance a Named Pipe, that's already connection-oriented, so we don't need that extra layer. (And it's not really applicable here, either.)

But CRUST should be able to add protocols as we go along. And [??] [??] [??] UTP here: we might use UTP, because it's pretty good for messages across the network. But UDT might actually be better for very high-speed local networks and high-speed data transfers. So there's no reason to imagine that this is an either-or situation. We can add more and more protocols in here as we move along.

So when a CRUST node starts, it's going to have a list of protocols it knows it can deal with. For example, it might know that it's got TCP and UTP. So on start, it's going to have to connect using at least one of each protocol: at least one TCP and at least one UTP thing would have to be started there.
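A sketch of pluggable protocols, assuming a hypothetical `Transport` trait: each protocol CRUST knows about is one implementation, and on start the node brings up one socket per protocol (outgoing bootstrap connections would follow). `UdpBackedTransport` only binds the UDP socket that UTP would run on; the connection-oriented layer itself is out of scope. Sockets bind to loopback to keep the example local.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener, UdpSocket};

/// One protocol CRUST can speak. Adding UDT later would just mean another implementation.
pub trait Transport {
    fn name(&self) -> &'static str;
    /// Bring the protocol up and return its local address.
    fn start(&mut self) -> io::Result<SocketAddr>;
}

#[derive(Default)]
pub struct TcpTransport {
    listener: Option<TcpListener>, // Kept so the listener stays open.
}

impl Transport for TcpTransport {
    fn name(&self) -> &'static str {
        "tcp"
    }
    fn start(&mut self) -> io::Result<SocketAddr> {
        let listener = TcpListener::bind("127.0.0.1:0")?;
        let addr = listener.local_addr()?;
        self.listener = Some(listener);
        Ok(addr)
    }
}

/// Placeholder for UTP: binds only the underlying UDP socket.
#[derive(Default)]
pub struct UdpBackedTransport {
    socket: Option<UdpSocket>,
}

impl Transport for UdpBackedTransport {
    fn name(&self) -> &'static str {
        "utp"
    }
    fn start(&mut self) -> io::Result<SocketAddr> {
        let socket = UdpSocket::bind("127.0.0.1:0")?;
        let addr = socket.local_addr()?;
        self.socket = Some(socket);
        Ok(addr)
    }
}

/// On start, bring up at least one of every protocol the node knows.
pub fn start_all(transports: &mut [Box<dyn Transport>]) -> io::Result<Vec<(&'static str, SocketAddr)>> {
    let mut started = Vec::new();
    for t in transports.iter_mut() {
        let addr = t.start()?;
        started.push((t.name(), addr));
    }
    Ok(started)
}
```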

But it's a P2P networking solution, and a lot of P2P stuff will be happy with just one connection each. But trust? Trust is the issue here.

And this is the Starbucks attack I was talking about. If we know there's a Beacon thing, for instance, and I start a Beacon node, and you come into the Starbucks I happen to be sitting in, I know that you'll connect to me. I don't know what I could do to you after that, but I know that you'll connect to me. So that's probably not good.

It's probably not important if you've also connected to X other nodes to begin with, and maybe you're doing this in parallel to get onto the network. I'll probably see what you're trying to do, because I would presume that you're asking everybody the same question. And even if you've encrypted it to me, I'll at least understand the initial question. That might give me something, which maybe isn't great. But it's not a showstopper, as long as we don't trust only that one connection.

So CRUST seems to require one more setting, and that would be the minimum connections per protocol. I can't see why you would want to connect to five of one and ten of another; I don't think there's any reason for that. Because what we're saying is that with CRUST you shouldn't care about the protocols. CRUST as a library is going to connect for you. And in this case you would select TCP as number one. (Because it's more efficient than UTP would be, in terms of the general network. Not all the time, but for us it certainly is at the moment.) So the one more setting would be the minimum connections per protocol.

So we might say we want that setting to be ten. We're going to Bootstrap off at least ten people in parallel, and get the information back. (And higher up, in our Routing layer, we would have to make a decision on "Are we getting different information back from all of these?" and "How do we cope with that?" That's not CRUST's problem; that's Routing's problem.) So the number we Bootstrap off is the minimum of that configured number and the Bootstrap list length. (Obviously, if there are only five folk in your Bootstrap list, you can only call on five of them. That's OK.)
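The fan-out rule, min(configured minimum, contacts available), applied per protocol in order of preference, fits in one function. `bootstrap_plan` and its parameters are hypothetical; `known` stands in for "how many Bootstrap contacts do we have for this protocol".

```rust
/// Protocols, listed here in order of preference (TCP first, as in the talk).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Protocol {
    Tcp,
    Utp,
}

/// How many contacts to dial in parallel for each protocol:
/// the configured minimum, capped by how many contacts we actually have.
pub fn bootstrap_plan(
    preference: &[Protocol],
    min_per_protocol: usize,
    known: impl Fn(Protocol) -> usize,
) -> Vec<(Protocol, usize)> {
    preference.iter().map(|&p| (p, min_per_protocol.min(known(p)))).collect()
}
```

With the setting at ten, 40 known TCP contacts and only five UTP ones, the plan is ten TCP dials and five UTP dials.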

So I think that's pretty much CRUST as a layer. I can't see any more to it. In the background in CRUST, we've got the ability to have TCP where it's a single connection, and we should sort that out so that it's a single connection. UTP will already be a single connection. And we probably want that for every protocol that we have in place. But I think that's CRUST's job: it's got to be as simple and straightforward as this.

The pig in the poke, the one part that I think requires some investigation per se, is encryption. (And remember, when we're looking at CRUST, we should try not to bother with what Routing is doing. If an upper layer is doing encryption, that's fine; that is up to it.) But CRUST: should CRUST require encryption or not?

So we've got TCP, and basically that's got some header info that doesn't really expose us. We don't really need to encrypt that; it's just TCP. And we've got a payload (which the upper layer may or may not have encrypted; that's up to it). So for TCP, it seems quite straightforward: "No, let's not encrypt it. Let's pass all this responsibility to the upper layer." For UTP or similar protocols, we've got the UDP header (so again, it doesn't expose us; not a problem). But then we've got our own header, and we've got the payload. Now, whether or not the payload is encrypted doesn't matter: our header, here, exposes the protocol.

Now, if everybody and their auntie were using UTP, it wouldn't matter. It wouldn't make any difference, because it's just exposing UTP here.

But if we were the only project using UTP, then we would be exposed. So there's this part here, which is the encryption question. And when you start documenting it like this and looking at it, it kind of says, "That protocol needs to do something about this. This needs to hide that." Now, it can't just encrypt that, because there's a size issue. It would be tough to encrypt the whole thing.

And my thought process here is: CRUST should do encryption internally, invisible to the upper layer. So CRUST would just fully encrypt that. (It could leave that alone; it doesn't matter. It's up for debate.) But in terms of hiding what we're doing: if the upper layer wants to hide stuff, it can encrypt that, and it would then also have encrypted this. But CRUST, I think, at least has to internally encrypt that whole thing, except for the [UDP?] header. And that's one that definitely needs investigation.

Now, in terms of doing that, it's pretty straightforward, pretty easy, because every CRUST node is going to have a Bootstrap List. It could easily write its private and public keys there, and exchange the public key as its identifier. So Crust 1 gives Crust 2 its public key, and they can start encrypting stuff. But that identity, which CRUST would be using to pass keys about and clarify keys, would never be exposed through the API, because it just causes confusion.

So encryption is one area of research, per se, that I think you would have. Because basically we are saying CRUST will use random ports and random protocols, but if we've got that UTP or UDT header exposed, that randomness doesn't matter. If we've got a [??] of information that is going to give us away, then we need to look at ways of getting rid of it. And we can't use some pattern like XOR, or some weird thing. It has to be actual proper encryption. Otherwise the pattern [??], and it could be used in routers to do packet inspection and stuff. So really, encryption is an area to consider.

And if we want to see how it's handled, we can look at that Peter [??] guy's P2P thing, in his documentation. They actually use libsodium to do that, and it seems to be fairly quick.

So, I don't think it's a huge issue to solve. I think it's relatively straightforward to solve. But it's just good to spend a little bit of time on it, being focussed on that particular bit of investigation.

But that's CRUST---nice, neat library, impressive library.
And that's pretty much me.

"SWEET."