The science of developing robust software is great and pretty consistent going back decades varying mostly in specific tools and tactics. Mainstream programming just doesn't apply it although more adoption in past decade of key techniques. Here's some computer science from the 1960's-1980's used in robust and secure system development (esp Orange Book B3 or CC EAL6) people might want to copy. I'm taking an empirical route where I reference techniques that were applied to many real-world projects with lessons learned in papers or studies that were consistent. All one can do with limited data & these aren't in order of importance.
1. Formal, non-English (eg math/logical) specifications of requirements or abstract design. English is ambiguous and misreadings of it caused countless errors, even back then. CompSci researchers tried formal specs with both English as a start and precise notations (eg Z, VDM, ASM's, statecharts) for clarity in specifics. Result was many inconsistencies caught in highly assured systems and protocol specs before coding even began.
2. High assurance stuff often used mathematical (formal) verification. Whether that worked or made sense was hit and miss. More on it later. Yet, virtually all of them said there was benefit in restrictions on the specs, design, and coding style to fit the provers' limitations. Essentially, they used boring constructs that were easy to analyse and this prevented/caught problems. Don't be too clever with design or code. Wirth and Hansen applied this to language design to bake safety & comprehension in with minimal to low loss in performance.
Note: Led to Nick P's Law of Trustworthy Systems: "Tried and true beats novel or new." Always the default.
3. Dijkstra's THE project showed that modular, layered design with careful attention to interfaces (and interface checks) makes for most robust and maintainable software. Later results confirmed this where each module must fit in your head and control graph that's pretty predictable with minimal cycles prevented all kinds of local-becomes-global issues. Many systems flawless (or nearly so) in production were built this way. Dijkstra correctly noted that it was very hard to do this even for smart people and average developer might screw structuring up a lot. Solid prediction... but still worth striving for improvement here.
4. Fagan ran empirical studies at IBM that showed a regular, systematic, code review process caught many problems, even what tests missed. Turned that into formal inspections with the periodicity and prioritizing tuned per organization for right cost-benefit. Was generalized to whole SDLC by others in high robustness areas. Improved every project that used it from then on. Exactly what parameters to use is still open-ended but periodically looking for well-known flaws with reference sheet always works.
5. Testing for every feature, code-path, prior issues outside of code base, and common use-case. All of these have shown repeated benefits. There's a cut-off point for each that's still an open, research problem. However, at a minimum, usage-based testing and regression testing helped many projects achieve either zero or near-zero, user-facing defects in production. That's a very important differentiator as 100 bugs user never experiences is better than 5 that they do regularly. Mills' Cleanroom process combined simple implementation, code review, and usage-testing for insanely-high, statistically-certifiable quality even for amateur teams.
6. By around 60's-70's, it became clear that the language you choose has a significant effect on productivity, defects, maintenance, and integration. Numerous studies were run in industry and military comparing various ones. Certain languages (eg Ada) showed vastly lower defects, equal/better productivity, and great maintenance/integration in every study. Haven't seen many such studies since the 90's and most aren't constructed well to eliminate bias. However, it's grounded in science to claim that certain language choices prevent common negatives and encourage positives. So, it follows to adopt languages that make robust development easier.
7. By the 80's or 90's, it was clear that computers were better at finding certain problems in specs and code than humans. This gave rise to methodologies that put models of system or code into model-checkers and provers to show certain properties always hold (the good) or never show up (the bad). Used successfully with high-assurance safety and security critical systems with results ranging from "somewhat beneficial" to "caught stuff we'd never see or test for." Back then it was unclear how applicable it was. Recent work by Chlipala, Leroy, et al show near perfect results in practice when specs/proofs are right and much wider application than before. Lots of tooling and prior examples means this is a proven way of getting extra quality where high-stakes are worth the cost and where core functionality doesn't change often.. The CompCert C compiler, Eiffel's SCOOP concurrency scheme, and Navy team's EAL7 IPsec VPN are good examples.
8. Static analysis, aka "lightweight formal methods," were devised to deal with specialized skills and labor of above. Getting to the point, tools like Astree Analyzer or SPARK Ada can prove absence of common flaws with little to no false positives without need for mathematicians in the company. Just a half dozen of these tools by themselves found tons of vulnerabilities in real-world software that passed human review and testing. Enough said, eh?
9. Software that succeeded with testing often failed when random stuff came at it, especially malware. This led to various fault-injection methods like fuzz testing to simulate that and find breaking points. The huge number of defects, esp in file formats & protocol engines, found via this method argues for its effectiveness in improving quality. It ties in with stuff above in that well-written code that validates input at interface and preserves invariants throughout execution should simply disregard (or report) such erroneous input.
10. Interface errors themselves posed something like 80+% of problems. This was noted as far back as the 60's in Apollo project when Margaret Hamilton invented software engineering, fault-tolerance, and specification techniques to fight it. Dijkstra and Hoare pushed for pre- and post-conditions plus specific invariants to document the assumptions of code during procedure calls. Modern version is called Design by Contract in Eiffel, Ada, and numerous other languages (even asserts in C). Many deployments and tests showed such interface checks caught many issues, esp assumption violations when new code extended or modified legacy.
11. Concurrency issues caused all kinds of problems. Techniques were devised by Hansen (Concurrent Pascal) and later Meyer et al (SCOOP) to mostly immunize against them at language level with acceptable performance. Languages without that, especially Java, later got brilliant tooling that could reliably find race conditions, deadlocks, or livelocks. Use of any method inevitably found problems in production code that had escaped detection. So, using prior, proven methods to immunize against or detect common errors in concurrency is A Good Thing. Note that shared-nothing, event-driven architectures also emerged but I have less data on them outside that some (NonStop, Erlang) worked extremely well.
The above are just a few things that computer science established with supporting evidence from real-world projects so long ago that Windows didn't exist. Anyone applying these lessons got benefits in terms of code quality, security, and maintainability. The rare few applying most or all of them, mainly high assurance community, got results along lines of space shuttle control code with extremely, low defects or zero in production. So, anyone wanting to improve quality or security of their software should apply what you can on this list. Experiments in cost-benefit of combinations of methods on various types of software would also give us useful data.
(High assurance focus)