- GitHub issues of matrix.org pieced together as one "story":
- I noticed in your blog post that you were talking about doing a postmortem and steps you need to take. As someone who is intimately familiar with your entire infrastructure, I thought I could help you out.
- Complete compromise could have been avoided if developers were prohibited from using ForwardAgent yes or not using -A in their SSH commands. The flaws with agent forwarding are well documented.
- Escalation could have been avoided if developers only had the access they absolutely required and did not have root access to all of the servers. I would like to take a moment to thank whichever developer forwarded their agent to Flywheel. Without you, none of this would have been possible.
- Once I was in the network, a copy of your wiki really helped me out and I found that someone was forwarding 22226 to Flywheel. With jenkins access, this allowed me to add my own key to the host and make myself at home. There appeared to be no legitimate reason for this port forward, especially since jenkinstunnel was being used to establish the communication between Themis and Flywheel.
- I was able to login to all servers via an internet address. There should be no good reason to have your management ports exposed to the entire internet. Consider restricting access to production to either a vpn or a bastion host.
- On each host, I tried to avoid writing directly to authorized_keys, because after a thorough peak at matrix-ansible-private I realized that access could have been removed any time an employee added a new key or did something else to redeploy users. But sshd_config allowed me to keep keys in authorized_keys2 and not have to worry about ansible locking me out.
- The internal-config repository contained sensitive data, and the whole repository was often cloned onto hosts and left there for long periods of time, even if most of the configs were not used on that host. Hosts should only have the configs necessary for them to function, and nothing else.
- Kudos on using Passbolt. Things could have gotten real messy, otherwise.
- Let's face it, I'm not a very sophisticated attacker. There was no crazy malware or rootkits. It was ssh agent forwarding and authorized_keys2, through and through. Well okay, and that jenkins 0ld-day. This could have been detected by better monitoring of log files and alerting on anomalous behavior. Compromise began well over a month ago, consider deploying an elastic stack and collecting logs centrally for your production environment.
- There I was, just going about my business, looking for ways I could get higher levels of access and explore your network more, when I stumbled across GPG keys that were used for signing your debian packages. It gave me many nefarious ideas. I would recommend that you don't keep any signing keys on production hosts, and instead do all of your signing in a secure environment.
- You thought there were 8, but now there are 9 (that's right, I see you watching me, I'm watching you, too). This is the last one, and I think it's the best advice I've got for you.
- 2FA is often touted as one of the best steps you can take for securing your servers, and for good reason! If you'd deployed google's free authenticator module (sudo apt install libpam-google-authenticator), I would have never been able to ssh into any of those servers.
- Alternatively, for extra security, you could require yubikeys to access production infrastructure. Yubikeys are cool. Just make sure you don't leave it plugged in all the time, your hardware token doesn't do as much for you when it's always plugged in and ready for me to use.
- Alternate-Alternatively, if you had used a 2FA solution like Duo, you could have gotten a push notification the first time I tried to ssh to any of your hosts, and you would have caught me on day one. I'm sure you can setup push notifications for watching google-authenticator attempts as well, which could have at least given you a heads up that something fishy was going on.
- Anyways, that's all for now. I hope this series of issues has given you some good ideas for how to prevent this level of compromise in the future. Security doesn't work retroactively, but I believe in you and I think you'll come back from this even stronger than before.
- Or at least, I hope so -- My own information is in this user table... jk, I use EFNet.
RAW Paste Data