mixster

Distributed chat protocol

multiple inter-cluster simple text interchanger (micsti)
Or
CIRC (Cluster IRC).

Multiple clients act as servers (built in rather than separate functionality).
Each client connects to two servers to ensure successful message submission.
Because each message can therefore arrive by two routes, clients send messages with an ID relative to themselves; receiving clients then use the ID to discard duplicates.
Servers are handled on a per-room basis: channel owners and operators act as servers to ensure the validity of sent messages.
The protocol is intended for small-scale discussions where running an IRC server would be over the top, or where the load would be too much for a single server to handle. It is also intended for cases where group IM protocols are impractical or too complex to allow a simple implementation.
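
The duplicate removal described above could be sketched roughly as follows. This is only an illustration: the class name and the choice to key on the sender's name plus the message ID are my assumptions (the notes only say the ID is relative to the sending client), and a real client would want to expire old entries.

```python
class DuplicateFilter:
    """Remembers (sender, message ID) pairs and rejects repeats.

    Hypothetical helper: IDs are relative to the sending client, so the
    sender is assumed to be part of the key.
    """

    def __init__(self):
        self._seen = set()

    def accept(self, sender_name, message_id):
        """Return True the first time a pair is seen, False afterwards."""
        key = (sender_name, message_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

A client would call accept() for every incoming message and display only those for which it returns True.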

Standard client submitted message format:
ActionType:whitespace values

The whitespace values are split into x strings (a standard explode using a space, " ", as the delimiter), where x is determined by the ActionType.
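
As a rough sketch of that parsing step: the argument counts in the table are taken from a few of the action types listed later, and the function name is hypothetical. Limiting the split so that the final argument keeps its spaces is my assumption about how "x strings" is meant to work.

```python
# Argument counts for a few action types (illustrative subset only).
ARG_COUNTS = {"Message": 1, "Action": 1, "Mode": 3, "Join": 2}


def parse_client_message(line):
    """Split 'ActionType:whitespace values' into (action, list of arguments)."""
    action, _, rest = line.partition(":")
    if action not in ARG_COUNTS:
        raise ValueError("unknown action type: " + action)
    count = ARG_COUNTS[action]
    # Split into at most `count` pieces so the last argument keeps its spaces.
    args = rest.split(" ", count - 1)
    return action, args
```

For example, "Join:bob secret key" would give the nick "bob" and the join message "secret key".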

Standard client received/server to server message format:
MessageID Senders_nick Senders_name Senders_host Action type:csv values

Senders_host would be a uniquely identifying characteristic, but not the actual IP of the client (optional, but should be used for security reasons). It is the responsibility of the owner server to set up a system for scrambling the IP (e.g. using a simple hashing algorithm), and from then on it is each individual server's responsibility to track the user via the host. In the case of multiple clients from a single IP, it is up to the owner server to produce the same host for each. The connection servers should preferably make no mention of the IP and instead associate the client's connection with the host once it is made.
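
One possible way for the owner server to scramble the IP is sketched below. The notes only ask for "a simple hashing algorithm"; using a keyed hash (HMAC-SHA256) with a secret known to the owner server, and truncating to 12 characters, are my assumptions. The key is there so that hosts cannot be reversed by simply hashing every possible IP.

```python
import hashlib
import hmac


def mask_host(ip, secret, length=12):
    """Return a stable masked host for an IP (hypothetical helper).

    The same IP and secret always give the same host, so multiple clients
    from one IP share a host as described above.
    """
    digest = hmac.new(secret, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]
```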

To allow multiple connections from the same IP, the Senders_name value would be one that remains static for the duration of the client's connection (whereas the nick may change). The owner server should ensure each Senders_name is unique to prevent ambiguity. It may be worth having the owner server itself assign Senders_name and use it as an ID (e.g. a hexadecimal connection number), but this would require the owner server to be implemented with extra functionality, so by default clients should be able to set the value themselves, with the server refusing only on a duplicate.

Also, the format could (and probably should, for complex implementations) be expanded to allow a single server to host multiple channels by adding another value to both sent and received messages. The problem is that this leads to complex structures and results that are much harder to implement, as individual channel hierarchies have to be maintained. An easier way to support multiple channels would be to bind different servers to different ports and run multiple instances. One huge problem would be distinguishing channels: possibly the central routing server would have to handle it, but that opens up new problems with passing higher loads through that server.

Servers talk to other servers using the server to server message format given above, but the receiving server holds a list of servers and only executes server actions when they are sent by a server on that list, in case any users try to create false server messages. The only operation not permitted is for a server hosted by an operator to claim ownership (or any other higher rank) of a channel.
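
The server-list check might look something like this. Which actions count as server actions is not pinned down in these notes, so the set below is purely illustrative, as are the function and parameter names.

```python
# Illustrative set of actions only a known server may issue (an assumption).
SERVER_ONLY_ACTIONS = {"Connect", "Ping"}


def should_execute(action, sender_host, known_servers):
    """Return True if the action may be executed for this sender."""
    if action in SERVER_ONLY_ACTIONS:
        # Server actions are honoured only from hosts on the server list.
        return sender_host in known_servers
    return True
```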

When a standard client submits a message, it is sent to the client's parent servers, which distribute it to all other servers, and each server then sends it to its own clients. I am unsure whether server to server distribution should be one to all or whether servers should be paired up. One to all would mean higher individual loads, but pairing up requires a complex routing system to be built in, as the servers are mainly dynamic. If it were a single connection as in the normal IRC protocol, netsplits would be a problem, especially if multiple servers drop. A routing system would still be required with one to one server connections, but it wouldn't be as hard to produce, and if done then netsplit effects would be minimal.

Outside of standard servers, a separate routing server would be used to connect to various channels and so direct new clients of a server to a specific channel. This would be optional, as clients could connect directly to an existing channel's owner server, but due to the nature of the protocol it is unlikely that any given owner server would exist for an extended period of time.

To handle channels orphaned by the owner dropping out, the owner can set an heir, a status slightly higher than operator. Upon the owner's death, the heir is promoted to owner and may then set an heir of their own. If no heir is set, the operator who has held operator status longest is promoted to owner. If no operators remain, the clients are informed that the channel has died. This could be overcome by promoting clients to servers, but a temporary server would have to be set up, which requires a central server to be available, and the clients would then be at risk as a potentially unknown client would gain their IPs. Also, no record of existing clients would exist if all the servers died, so clients would have to be given the information needed to host a server, and so the security aspect would be lost.
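
The succession rule can be written down in a few lines. This is a sketch only: the function name is made up, and it assumes the heir is passed as None when none has been set and that each operator's promotion time is available as a number (smaller meaning earlier).

```python
def choose_new_owner(heir, operators):
    """Pick the next owner after the owner dies.

    heir: name of the heir, or None if no heir was set.
    operators: dict mapping operator name -> time they became operator.
    Returns the new owner's name, or None if the channel has died.
    """
    if heir is not None:
        return heir
    if operators:
        # Longest-serving operator is the one promoted earliest.
        return min(operators, key=operators.get)
    return None
```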

This protocol does reduce various security features usually built into IRC servers, e.g. (partial) IP/hostname masking. However, the protocol is based upon mutual trust amongst channel staff and the clients' acknowledgement of this before connecting. Likewise, the channel staff acknowledge that there is a security risk to them in having to announce their IP/hostname to clients. Although security is impaired, the protocol gives channel owners more control over the channel without the hassle of running a full IRC server, as they can forcefully kill users etc., and it removes some of the risk to themselves as far as attacks on the servers go; the same applies to normal channel operators.

One other flaw is the lack of direct messaging to people outside of channels, due to the secluded nature of the channels. It could be overcome by sending messages through one of the central routing servers, but that would greatly increase their load if external messaging turned out to be popular. Another possibility is that a private conversation can be started inside a channel via a request, with the two clients then becoming a server pair for their own restricted channel, but this would still require them to be in the same channel as the wanted contact.

Of course, the restriction on messaging would make it harder for spam attacks to take place, especially given that information must be requested from the other client before a query can start.

Possibly one to one querying could be allowed by sending a request through the central server to another channel, as this would only mean a slight load and it would then be up to the clients to handle it. Along with (optional) timers to prevent flooding, it would mean that mass private message spam attacks simply wouldn't happen.

Regarding the actual sending and receiving of messages, the effects on download and upload would be minimal. Downloading twice as much is a high rate for clients, but the cost of downloading a message is minimal and worth it in terms of reliability. If it is deemed unneeded, it could be removed fairly easily from any implementation, as implementations should be built so that the client connects to a number of servers dictated by the owner server (strongly suggested to be two by default). Likewise, the reliability aspect could be taken further: when a client receives non-matching messages with matching IDs, it could request confirmation from servers. It could also ensure that it receives the same message from the required number of servers to reduce the chance of tampering. However, by default the protocol should display all messages with non-duplicate IDs (the first received taking priority over subsequent ones).
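
A sketch of the stricter confirmation idea, under my own assumptions about names and structure: a message is released for display only once the required number of distinct servers have delivered an identical copy, and an ID with differing copies can be flagged for follow-up.

```python
class MessageConfirmer:
    """Hypothetical client-side check that servers agree on a message."""

    def __init__(self, required=2):
        self.required = required
        self._copies = {}      # message ID -> {payload: set of servers}
        self._displayed = set()

    def receive(self, server, message_id, payload):
        """Return the payload once enough servers agree on it, else None."""
        if message_id in self._displayed:
            return None
        by_payload = self._copies.setdefault(message_id, {})
        senders = by_payload.setdefault(payload, set())
        senders.add(server)
        if len(senders) >= self.required:
            self._displayed.add(message_id)
            del self._copies[message_id]
            return payload
        return None

    def conflicted(self, message_id):
        """True if different payloads have arrived under the same ID."""
        return len(self._copies.get(message_id, {})) > 1
```

With required=1 this reduces to the default behaviour, where the first copy received is displayed and later copies with the same ID are ignored.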

The servers will also have to keep an updated list of clients in a non-unique fashion (that is, each server holds its own copy) to ensure that the dropping of multiple servers in a short time doesn't kill clients. It could be deemed unneeded and so removed, with only the owner's server keeping track of all clients so that it can effectively distribute new users; however, multiple servers going down, including the owner's, would then result in killed clients.

This also means there are no long-term netsplits, as clients are told to reconnect to a server that is part of the network if they are cut off from it by a server going rogue. Likewise, the connection to two servers means that two servers would have to separate from the main network simultaneously in order to properly remove clients from the network, and even then the clients could regain connection by connecting to the channel again, and the owner server itself should try to re-attach the lost clients.

Malicious activity could arise when a single user has multiple servers set up for a channel, thereby being able to alter any messages sent to or received by clients who are assigned to those servers. The same situation could arise if multiple servers agreed to collude. This would be entirely undetectable unless the client had reason to suspect tampered transmission. In the case of only a single malicious server, the conflicting messages would quickly raise suspicion. As well as altering messages, the server could drop all messages to/from a client, with similar consequences to tampering.

Another possible malicious act would be for an operator to flood joins to kill the owner, thereby promoting themselves or a fellow malicious operator to owner status. However, this would be prevented if the owner set an appropriate heir (who would hopefully be able to react to the attack in time to avoid a similar fate). It is also unlikely, as it would be difficult to delay an owner server enough to kill it, assuming that it is run on a standard internet connection where the download speed greatly exceeds the upload speed.

In regards to action types, they will remain fairly simple and similar to general IRC actions:
Message: standard message to be sent to all clients; 1 argument being the text to transmit.
Action: message to be displayed as third person to all clients - same as CTCP ACTION in IRC; 1 argument being the text to transmit.
Request: ask for data from client/s - same as CTCP in IRC; should be able to handle 1 or 2 arguments, the first being the data to be returned and the second being the client/s to request the data from (a CSV list of nicknames). If no second argument is set, it is assumed the request is for the whole channel. Clients should handle certain requests transparently, such as standard PING and TIME requests, but others should require confirmation, such as IP/hostname to start a private query.
Mode: assign the sent mode to the target; 3 arguments, the first being the target, the second the mode and the third the value. Values are always flags (i.e. 1, 0 or true, false), so it should be able to handle multiple mode changes at once (e.g. set m to 1 and n to 0). In the interests of ease of use, + and - should be valid values as well as 1 and 0, as in IRC.
Join: sent to owner server; 2 arguments, the first being the nick to join with and the second being an optional message to go along with the request to join. The message can be used to carry channel keys.
Connect: sent from owner to new client and servers to initiate connection; 0 or 1 argument/s - if none then it is a server request sent to the owner, else the first argument is the IP/hostname to connect to. Clients should receive two connect messages and, if they do not, should prompt the owner for another one. Likewise, if a server dies and a client isn't given a new server, the client should request a new one, although the owner itself should distribute new servers.
Part: sent from client to server; 1 optional argument being the reason for parting. In an implementation without multiple channels, this would be the same as Quit.
Quit: Same as Part except sent to all channels (only for multiple channel implementation, but may as well be alias for Part in single channel systems).
Ping: Sent from a server to any client (including other servers) to see if it is alive; 1 argument being the server's current system time. If the Pong reply takes longer than a preset time (30 seconds for a normal server, 2 minutes for a client and 5 minutes for the owner server is what I'm thinking) then the target is marked as dead. A dead client is simply removed; for a dead server, its clients are reassigned a replacement server and/or a new owner is set if applicable. The ping timer should be reset upon receiving any message from the target.
Pong: Sent from a client to a server to show that it is alive; 1 argument being the received message that came with the Ping.
Channel Query: Message sent to a single client. 2 arguments, first being the target nick and second being the message itself. Insecure as it still goes through the servers, but should not be displayed to the client side of the server unless it is directed to them.
Private Query: Message sent to a single client. 2 arguments, first being the target IP/hostname and second being the message itself. Secure in terms of no eavesdropper, as the client should take on a server to client relationship with the target (and vice versa). Possibly not needed in an implementation, as a channel query asking the target to join a specified channel would achieve the same result.
Notice query: Like IRC notice, it is a message that should not be automatically replied to in any circumstance. If possible, implementations should be extended to kill whoever dares via electrocution or the like. Unlike IRC, it can only be sent to a target nick, not to a channel or IP/hostname.
Names: Like IRC, it returns a list of nicks of people in a channel. It takes only one argument, that being the target channel.
Names list: The result of Names. No arguments, but rather a space separated list of nicks. If the nicks break over multiple messages, then each message begins with the "page number" followed by a comma.
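
To make the Ping/Pong timing in the list above concrete, here is a rough sketch of how a server might track unanswered pings. The timeouts are the ones suggested under Ping; the class, its method names and the idea of passing the current time in explicitly (to keep the example easy to test) are my own assumptions.

```python
# Timeouts in seconds, as suggested in the Ping entry.
PONG_TIMEOUTS = {"server": 30, "client": 120, "owner": 300}


class LivenessTracker:
    """Hypothetical tracker for outstanding pings."""

    def __init__(self):
        self._roles = {}
        self._pending = {}  # name -> time the unanswered Ping was sent

    def ping_sent(self, name, role, now):
        self._roles[name] = role
        # Keep the time of the oldest unanswered ping.
        self._pending.setdefault(name, now)

    def heard_from(self, name):
        """Any message from the target resets its ping timer."""
        self._pending.pop(name, None)

    def dead_peers(self, now):
        """Names whose ping has gone unanswered for longer than allowed."""
        return sorted(
            name
            for name, sent in self._pending.items()
            if now - sent > PONG_TIMEOUTS[self._roles[name]]
        )
```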

Unsure whether action types should be numeric or not: numeric would mean that implementations in multiple languages could be written without any reference to English, whereas English-based action types would make debugging bad output etc. a lot easier. It would be easy to allow both to exist, but in the end numeric would be the better choice (if easy debugging is needed, the numeric codes could always be translated before printing).

Assuming numeric, 5 digits would be used. The first digit dictates the level allowed to use the action (0 client, 1 operator, 2 heir, 3 owner, 4 central routing server, or something along those lines; possibly add a lesser operator who takes a role similar to IRC channel operators rather than server operators). The next two digits dictate the group, e.g. 00 for modes, 01 for messages. The last two dictate the subtype, e.g. 00 for channel message, 01 for channel sent private message, 02 for query notice. As it should aim to be a light protocol, it seems unlikely that running out of possible numbers would be an issue.
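
Decoding such a code could look like the sketch below. The level names follow the tentative list above; the function names are made up, and treating a higher level as allowed to use lower-level actions is an assumption the notes don't settle.

```python
LEVELS = {0: "client", 1: "operator", 2: "heir", 3: "owner",
          4: "central routing server"}


def decode_action_code(code):
    """Split a 5-digit code into (level name, group, subtype)."""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError("action code must be exactly 5 digits")
    level = LEVELS.get(int(code[0]), "unknown")
    return level, code[1:3], code[3:5]


def may_use(sender_level, code):
    """Assumed rule: a sender may use actions at or below its own level."""
    return sender_level >= int(code[0])
```

Under the example groupings above, "00100" would be a client-level channel message (group 01, subtype 00).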

As far as arguments go, there will be no escaping, as spaces should be barred from channel names, nicks etc. It is up to the server side to check the validity of nicks before admitting them, but the client should also filter them, raising an error locally rather than sending messages that would be rejected.
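
A minimal validity check, usable on both the client and the server side, might look like this. Only whitespace is barred by the notes; also barring the comma is my assumption, made because Request takes a CSV list of nicks.

```python
# Whitespace is barred by the protocol notes; the comma is an assumption
# (nicks appear in CSV lists in Request).
FORBIDDEN = set(",")


def is_valid_nick(nick):
    """Return True if the nick is non-empty and has no barred characters."""
    if not nick:
        return False
    return not any(ch.isspace() or ch in FORBIDDEN for ch in nick)
```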