- # Rust in Large Organizations
- **Initially taken by Niko Matsakis and lightly edited by Ryan Levick**
- ## Agenda
- - Introductions
- - Cargo inside large build systems
- - FFI
- - Foundations and financial support
- ## Attending
- - Joe, Microsoft, Seattle Rust Meetup
- - Tom at Mozilla, using Rust for sync
- - Lena at Mozilla, sync storage etc
- - Jack Moffitt at FB, Libra team
- - Brian Anderson at Pingcap
- - acrichto
- - erickt
- - dtolnay, David Tolnay
- - Raj Vengalil, Azure IoT
- - cuviper, Red Hat
- - Rain, FB
- - Jeremy F, FB
- - Manish
- - Ben, Google
- - Philip, Qumulo, Rust dev tools + infra
- - Remi, Qumulo
- - Sebastian, MS, pushing for Rust adoption from sec pov
- - Thomas Ekerd, MS, site reliability engineer
- - James, MS
- - Brandon Williams, FB
- - JR, Mozilla backend services
- - Phil
- - Will, crash ingestion mozilla
- - Stjepan, Ferrous Systems
- ## cargo
- - FB dev env -- backend services repo -- is mostly C++
- and Java. Very polyglot environment. Glued together with Buck,
- FB's Bazel.
- - Buck: Language agnostic. Supports Rust.
- - rustc drops in quite nicely, basically equivalent to C++ compiler.
- - wanted to use cargo but it just does too much to fit in
- - need to delineate the parts of cargo that are desired from those that conflict with Buck
- - ecosys is big advantage for Rust but hard to separate from cargo
- - current scheme:
- - big cargo.toml including all the things used in internal repo
- - cargo builds artifacts that are presented to buck
- - buck can link against those
- - reasonably successful
- - but approaching 700 crates in transitive dep graph, getting very cumbersome to rebuild etc
- - plus pinned to a specific version of compiler (prebuilt artifacts)
- - works ok but build.rs build scripts are a big complication
- - specific cargo pain points:
- - build scripts
- - "features" feature
- - a lot of crates don't use features the way they're intended -- they're used for exclusive A or B choices
- - this creates the possibility to break the build
- - need some sort of "cfg" feature that represents forks of a crate
- - Google does a similar thing for Fuchsia
- - cargo builds 3rd party artifacts, normal build consumes those
- - problems:
- - handful of 3rd party artifacts depend on things built in tree
- - want to be able to do partial builds, e.g. w/o a feature, or just for some targets
- - developing for a new OS, so we compile some code for host, some for target
- - presently do 2 full builds, but it's a pain
- - don't have as much control over the flags getting passed to rustc as we'd like
- - dep flags + linker flags aren't as specific as we need to distribute deps that are needed for indiv targets
- - prototype using "cargo raze" (use "gn" (from Chrome) to generate ninja files)
- - based on a modification of cargo raze that generates bazel build files
- - has its own handling of build.rs stuff
- - rather than outputting build files, it outputs a json format that could be the basis for the proposed "cargo build plans" feature
- - would be good to know what inputs etc are needed, how this would fit for Buck
- - can Buck consume internal files?
- - gn is aware of the concept of a Rust target
- - Qumulo build system:
- - doesn't use cargo, invokes rustc directly
- - cargo just builds json
- - build all deps as shared libraries, whether or not they want that
- - `.so` libraries, `.rmeta` files
- - hits a lot of problems
- - ran into problems, notably lack of support for build.rs -- have to reimpl cargo
- - building for 2 different targets
- - have own platform
- - linux target for procedural macros
- - need sometimes to pass flags that are target specific, build a target config map
- - would prefer to use cargo
- - does cargo raze support build.rs?
- - has some builtin support for build.rs?
- - not automatic: you declare purpose of build.rs
- - things that do rustc version detection?
- - sometimes you want to (e.g.) disable build.rs that supply native deps which come from bazel
- - why can't you run build.rs as part of the build tool?
- - fundamental problems:
- - no declared inputs, no declared outputs
- - buck/bazel etc has to know what files the build script is consuming, producing etc
- - also, they are arbitrary execution, which can be a security concern
- - proc macros have some similar concerns.
- - e.g., pest which looks at cargo source dir env variable and finds your grammar def'n file
- - doesn't fit well
- - one thing that was discussed years ago:
- - capability system for build.rs that restrict what scripts can do
- - e.g., read from this directory, write to that one
- - cargo can then audit/sandbox to enforce said rules
- - run build script in a sandbox
- - e.g. crosvm has an impl of this inside of Chrome OS; all crosvm devices run in their own jail
- - nontrivial engineering effort
- - could do at a higher level, sandbox
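The "no declared inputs" complaint can be made concrete with the directives Cargo already accepts from a build script. The following is a minimal sketch, not a proposal from the meeting: the grammar path and the environment variable name are hypothetical, and the directives only drive Cargo's own re-run detection. They do not sandbox the script or declare its outputs.

```rust
use std::path::Path;

/// Build the directive lines a build script can print to tell Cargo
/// which files and environment variables it reads.
fn input_directives(files: &[&Path], env_vars: &[&str]) -> Vec<String> {
    let mut lines = Vec::new();
    for file in files {
        // Re-run the script only when this file changes.
        lines.push(format!("cargo:rerun-if-changed={}", file.display()));
    }
    for var in env_vars {
        // Re-run the script only when this environment variable changes.
        lines.push(format!("cargo:rerun-if-env-changed={}", var));
    }
    lines
}

/// What the `main` of a build.rs would call: print each directive on stdout.
/// The file and variable names here are hypothetical.
fn emit_declared_inputs() {
    let files = [Path::new("src/grammar.pest"), Path::new("build.rs")];
    for line in input_directives(&files, &["TARGET_SYSROOT"]) {
        println!("{}", line);
    }
}
```

An external build system would still have to trust that the script reads nothing beyond what it lists, which is the gap the capability and sandbox ideas above are aimed at.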
- - jeremy: build scripts classified into 3 or 4 distinct types, is this complete?
- - doing codegen. read a file, bindgen, etc
- - gateway to some other library, using pkgconfig or something to find the library, or they build it from source
- - feature detection on rustc
- - "scary ones" -- database reads, timestamps
- - plausibly could address those use cases in other ways
- - feature detection is an obvious one, e.g. we had an rfc for compiler versions
- - version compat is a common thing
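For the "feature detection on rustc" category, a build script typically runs `rustc --version` and turns the result into a cfg flag. The sketch below covers only the parsing and the decision; the version strings and the flag name `has_new_api` are illustrative assumptions, and a real script would obtain the string by running the compiler named in the `RUSTC` environment variable.

```rust
/// Parse output shaped like "rustc 1.40.0 (abcdef123 2019-12-16)" into
/// (major, minor, patch). Returns None if the shape is unexpected.
fn parse_rustc_version(output: &str) -> Option<(u32, u32, u32)> {
    let mut words = output.split_whitespace();
    if words.next()? != "rustc" {
        return None;
    }
    // Drop any channel suffix such as "-nightly" or "-beta.3".
    let version = words.next()?.split('-').next()?;
    let mut parts = version.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    let patch: u32 = parts.next()?.parse().ok()?;
    Some((major, minor, patch))
}

/// Return the `cargo:rustc-cfg` directive for `flag` if the compiler is
/// at least `minimum`; tuples compare field by field, left to right.
fn cfg_for(version: (u32, u32, u32), minimum: (u32, u32, u32), flag: &str) -> Option<String> {
    if version >= minimum {
        Some(format!("cargo:rustc-cfg={}", flag))
    } else {
        None
    }
}
```

Because the decision is a pure function of the compiler version, this is the kind of build script that could plausibly be replaced by a declarative mechanism.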
- - what version of rust are people using?
- - stable
- - "stableish" -- bootstrap
- - nightly
- - who here is using toolchains distributed by rust?
- - ms (partially), mozilla, libra
- - why a custom toolchain?
- - config.toml tweaks
- - use clang's version of some unwinding code
- - custom linker
- - panic=abort
- - custom targets
- - compliance reasons (wanting to build from source for security reasons)
- - bootstrapping + compliance
- - where to get initial rust version?
- - several attempts:
- - most successful is using mrustc at version 1.22 and building from there
- - ms, google did that
- - is there a possibility of long term drift?
- - builds are not *quite* reproducible at present, but almost
- - was a point where build w/ mrustc + build with toolchain had non-matching hashes
- - might have to tweak the paths
- - in principle it can be done, should maybe prioritize it
- - maybe have an approved "how to bootstrap from C" documentation
- - specific reason fb builds from source:
- - want to always have the option to apply a local patch
- - don't want to get stuck with a "we must have this patch yesterday" scenario and have to figure out how to apply patch then
- - in most cases, also building llvm, want to share llvm for cross-lang LTO
- - must have a newer LLVM than what rust ships with
- - some folks have cross-lang LTO working
- - but rustc doesn't want to produce bitcode files
- - pass the linker `/bin/echo`
- - pgo -- coming soon
- - fb uses after the fact binary rewriting
- - splitting out linker was a potential change to rustc or cargo that google wants
- - would be interesting to know "here is what must be passed to gcc to successfully link"
- - another option: give a python script as the linker
- - turns out servo does it, too
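The "script as the linker" trick can also be written in Rust. This is a rough sketch under assumptions: the wrapper binary would be selected with rustc's `-C linker=<path>` option, the real linker name (`cc` in the usage note) is a guess for a typical Unix setup, and the log format is made up.

```rust
use std::ffi::OsString;
use std::io::{self, Write};
use std::process::{Command, ExitStatus};

/// Render the link command line, one argument per line, so it can be
/// inspected or replayed by another build system.
fn render_link_line(real_linker: &str, args: &[OsString]) -> String {
    let mut out = String::from(real_linker);
    for arg in args {
        out.push_str(" \\\n    ");
        out.push_str(&arg.to_string_lossy());
    }
    out.push('\n');
    out
}

/// Log the arguments rustc passed, then forward them unchanged to the
/// real linker and return its exit status.
fn log_and_link(
    real_linker: &str,
    args: &[OsString],
    log: &mut dyn Write,
) -> io::Result<ExitStatus> {
    log.write_all(render_link_line(real_linker, args).as_bytes())?;
    Command::new(real_linker).args(args).status()
}
```

A wrapper's `main` would collect `std::env::args_os().skip(1)`, call `log_and_link("cc", &args, &mut log_file)`, and exit with the returned status code.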
- - show of hands survey:
- - "who is interested in a common backend for 'those things'"
- - nobody knows what that means
- - buck needs a "fully specified dep dag", seems like a common thing for other build systems
- - seems like we have to do a few cases to work out the general rules first
- - rudimentary cargo build plan support:
- - gives a dag of rustc executions
- - but it's too low level for buck, also bazel
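To illustrate what "a dag of rustc executions" gives a consumer, here is a small sketch that orders compiler invocations so each runs after its dependencies. The `Unit` type and the crate names are hypothetical and do not mirror Cargo's actual build-plan output format.

```rust
use std::collections::{BTreeMap, VecDeque};

/// One compiler invocation in a hypothetical build plan: a unit name and
/// the names of the units whose outputs it consumes.
struct Unit {
    name: &'static str,
    deps: Vec<&'static str>,
}

/// Order units so every dependency comes before its dependents (Kahn's
/// algorithm). Returns None on a cycle, a duplicate unit name, or a
/// dependency that is not in the plan.
fn build_order(units: &[Unit]) -> Option<Vec<&'static str>> {
    let mut pending: BTreeMap<&'static str, usize> = BTreeMap::new();
    let mut dependents: BTreeMap<&'static str, Vec<&'static str>> = BTreeMap::new();
    for unit in units {
        if pending.insert(unit.name, unit.deps.len()).is_some() {
            return None; // duplicate unit name
        }
    }
    for unit in units {
        for dep in &unit.deps {
            if !pending.contains_key(dep) {
                return None; // dependency missing from the plan
            }
            dependents.entry(*dep).or_default().push(unit.name);
        }
    }
    // Units with no unbuilt dependencies can run immediately.
    let mut ready: VecDeque<&'static str> = pending
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(name, _)| *name)
        .collect();
    let mut order = Vec::new();
    while let Some(name) = ready.pop_front() {
        order.push(name);
        if let Some(users) = dependents.get(name) {
            for user in users {
                let count = pending.get_mut(user).expect("every dependent is a known unit");
                *count -= 1;
                if *count == 0 {
                    ready.push_back(*user);
                }
            }
        }
    }
    if order.len() == units.len() { Some(order) } else { None }
}
```

A flat ordering like this is the "too low level" view: it says which commands to run, but not which source-level targets or features they came from.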
- - pressure: every once in a while people propose "rewriting cargo.toml" into the tree
- - so far resisted that
- - a possible outcome buck has thought of:
- - buck support for cargo.toml
- - ton of code that's open source, and people (natch) don't want to build it w/ buck out of tree
- - want ability to simultaneously maintain buck/cargo support
- - currently done by hand and horrible
- - internally even people want this for mac/win builds which buck doesn't support
- - google w/ gn does something similar, keeps cargo.toml in order to upstream it
- - in some cases can generate a cargo.toml file programmatically
- - also imp't for IDE support
- - IDE support
- - RLS kind of working with buck
- - knowing laughter :)
- - problematic assumptions: e.g., searching the filesystem for cargo.toml, but it's millions of files
- - symptom of a larger thing
- - cargo is designed for managing rust code
- - assumes source tree is mostly rust code
- - but often rust is embedded in a large source tree with tons of non-rust
- - so having some "root for all rust code" where you search below is problematic
- - top-level directory not gonna work
- - always having to create artificial "root" directories
- - rust-analyzer avoids this by not baking cargo in as deeply
- - but still has this "top level directory" model that contains all the rust code which means a small amount of rust amongst everything else
- - generating a cargo.toml for 1 project works well, but when you have multiple targets that interact
- - Qumulo has a ton of C and Rust code that must be all combined into one big final artifact
- - IDE support that avoids cargo is a must
- - current state of the art: ctags
- - cramertj: cargo.toml is basically the intermediate repr for specifying deps
- - are there other things one might want?
- - build system has its own custom language to do that description
- - can use that to generate cargo.toml files though for IDE etc
- - what changes might one want in a "non-cargo IDE language"?
- - maybe cargo would work fine
- - manish: does this also cause problems for clippy and rustfmt?
- - cargo.toml is also useful for this
- - who uses clippy? most folks
- - rustfmt? most folks
- - fb invokes it on individual files for that
- - libra uses cargo to build
- - "cacheability" (sccache) has gotten worse over time
- - procedural macros aren't getting cached (dylibs)
- - are other people doing anything with this?
- - ff has a distributed cache in the office
- - (buck does caching of everything)
- - native deps? also integrated into buck
- - assume that if a C dep changes, rust must be rebuilt?
- - `-L native=<dir>` is not very well-scoped (just to a directory, not specific libs)
- - problem: can't cache link steps as a result
- - maybe also part of the problem with sccache
- - in buck, each lib gets its own directory, sidestepping this problem
- - linker want:
- - ability to specify a specific mapping from link name to the native library
- - option to ignore link directories or transform
- - in buck case, if you have a dep on a native library, you get two options (`-lfoo` and full path to foo)
- - crate features, misuse thereof:
- - people seem to want option to have mutually exclusive features
- - want to have impls clone etc for testing but not in a release build
- - hacked up something using cargo features but doesn't work all the time
- - problems:
- - dev dependency `foo` with feature "testing"
- - sometimes testing gets turned on semi-randomly (???)
- - but you can also accidentally use "testing" in a normal tree
- - deps for build scripts leak through to the real graph, perhaps part of the "semi-random" behavior
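Since Cargo unifies features across the dependency graph, an either/or choice expressed as two features can end up with both enabled. One common workaround, sketched here with the hypothetical feature names `backend-a` and `backend-b`, is to turn that state into an explicit compile error instead of a confusing build break.

```rust
// Reject the combination up front. The feature names are hypothetical.
#[cfg(all(feature = "backend-a", feature = "backend-b"))]
compile_error!("features `backend-a` and `backend-b` are mutually exclusive");

/// Name of the backend selected at compile time. With neither feature
/// enabled, a default is used.
fn selected_backend() -> &'static str {
    if cfg!(feature = "backend-a") {
        "a"
    } else if cfg!(feature = "backend-b") {
        "b"
    } else {
        "default"
    }
}
```

This only reports the conflict; it does not stop an unrelated crate in the graph from enabling the second feature, which is why a first-class notion of exclusive choices keeps coming up.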
- - designing from the wrong direction, perhaps?
- - a lot of requirements coming up that are "above and beyond" existing cargo spec and design
- - contra: goal is to have cargo co-exist with buck/bazel/etc, these are the features needed for that?
- - do we want to build another tool that is not cargo?
- - but everybody already has a tool and wants to use it
- - but how can we do minimal work so that integration of cargo + these other tools is smoother
- - working with rest of rust ecosys
- - de facto standard that crates.io + cargo have created
- - defined entirely by impl of cargo
- - only access at present is through cargo's impl
- - refactoring cargo into indep chunks with better interfaces might be the sol'n (and has been discussed)
- - cargo build plans, but they're not there yet
- - key thing: version resolution, very much in cargo's domain, would be good to specify
- - external dependencies + FFI?
- - can we use FFI to talk to rust?
- - want module boundary between rust things, using ffi
- - today: build scripts in cargo exist, common thing is to build+link to native libraries
- - one of the things that cargo raze does, you can describe the purpose of a build.rs (e.g., primarily to produce that 3rd party lib)
- - but you can translate that to a dep for that native library in your build system
- - summarize + action items?
- - cramertj wants to know what
- - dtolnay is working on potential design ideas for a successor to build.rs
- - cargo metadata description to specify what it is doing, maybe replace build.rs?
- - just listing inputs would be a huge improvement
- - yes but we want something that's *easier* than build.rs today, to incentivize it
- - caching, can we improve it
- - some of it may be low-hanging fruit, e.g. on mac `.a` file has timestamps
- - but part of it is the growing popularity of procedural macros (`.so` are uncachable by sccache)
- - if linker were more predictable, sccache could handle it, but it's not
- - might be able to handle by separating out linking
- - how to translate cargo.toml etc?
- - buck today runs cargo, takes output with dep info + rlib files
- - but new tool goal is to determine from cargo metadata
- - no way of "definitively connecting" resolved deps with unresolved deps
- - cargo vendor tends to be a bit overaggressive
- - lots of things people want, seems to vary between groups
- - when developing procedural macros, could do better job of noticing token stream output hasn't changed..
- - incremental
- - sccache sometimes handles that well (e.g. w/ build.rs)
- - related topic: distributed builds
- - sccache has support for that
- - but maybe sends whole dep folder, not always ok
- - would need more precise dep information to handle that (passing precise info for *transitive* dependencies)
- - `--extern` is precise, but transitive deps are still figured out by rustc
- - related: would be nice if, for rustc, could pass all the sources explicitly
- - in buck do you list all sources?
- - yes but a lot of globs :)
- - would be nice to have a tool that handled all the easy cases, with room for "extra" cases here and there
- - alex: interested in solving a lot of these issues and have thoughts
- - open to talking later about this stuff
- - a lot of small details, bug fixes, etc -- long road, no silver bullet
- - some kind of "enterprise cargo" place to hold this discussion(s)
- - a lot of needs boil down to:
- - quick fix combined with longer re-architecture
- ## FFI
- - two distinct languages invoking one another
- - sometimes linked into one process, sometimes cross process (RPC)
- - COM requires symbols to be ABI compatible
- - inline assembly, direct syscalls
- - "C parity"
- - FFI with C and C++
- - FB is doing C++ interop, as is Google
- - FFI beyond C or C++?
- - Java
- - syscalls
- - C# perhaps
- - (Ruby, Python)
- - Bindings to other languages are often mediated through a C layer
- - Increasing number of users -- C and C++ wanting to consume Rust APIs
- - Concerns:
- - unwinding
- - Qumulo: basically spent most of the last year preparing to do bidir FFI between Rust and C
- - fairly large codebase in a dialect of C
- - rules you can impose on C side which helps sometimes
- - in one direction (Rust calling C) we have been able to use bindgen
- - but in the other direction (C calling Rust) we wrote a compiler plugin (uh oh) to generate C headers
- - Specification questions
- - concerned about cross-lang lto revealing a lot of interactions
- - Cross-lang thin lto
- - Dynamic testing and static testing
- - Have aliasing rules proven to be a problem?
- - FB: not so much. Mostly mediating rules through bindgen and trying to set things up to get compilation failures
- - Google: currently checking for changes
- - Google: pursuing ways to annotate C and C++ headers so that safe Rust signatures can be generated from them
- - might be an interesting thing to standardize on
- - bindgen has a cumbersome mechanism for that (do)
- - would be nice to include small shim layers e.g. to translate to `Result`
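A shim layer of the kind mentioned above usually wraps a status-code-plus-out-parameter C function in a `Result`. The sketch below is self-contained only because the "C" function is defined in Rust as a stand-in; in practice it would be a bindgen-generated declaration, and the error codes here are made up.

```rust
use std::os::raw::c_int;

/// Stand-in for a C API: writes the quotient to `out` and returns 0 on
/// success or a nonzero (hypothetical) error code.
extern "C" fn native_div(num: c_int, den: c_int, out: *mut c_int) -> c_int {
    if out.is_null() {
        return 2;
    }
    match num.checked_div(den) {
        Some(value) => {
            // SAFETY: `out` is non-null and the caller passes a writable pointer.
            unsafe { *out = value };
            0
        }
        // Division by zero or overflow.
        None => 1,
    }
}

/// Error carrying the raw status code from the native side.
#[derive(Debug, PartialEq)]
struct NativeError {
    code: c_int,
}

/// The shim: callers see an ordinary `Result` and never touch the out-pointer.
fn div(num: c_int, den: c_int) -> Result<c_int, NativeError> {
    let mut out: c_int = 0;
    let code = native_div(num, den, &mut out);
    if code == 0 {
        Ok(out)
    } else {
        Err(NativeError { code })
    }
}
```

Generating shims like `div` from header annotations is the part that would benefit from a shared convention.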
- - FB:
- - C++ codebase in FB uses exceptions, have wrappers that captures and converts exceptions, this becomes a `Result` on the Rust side
- - manually annotating noexcept functions? basically all of them can
- - C headers are manually created with a `try { } catch` block in C++
- - the code being interop'd is mostly C++ but have to manually write C APIs for it
- - build with panic=abort? no, unwind
- - also catching Rust exceptions at boundary?
- - C code doesn't call into Rust code that often
- - happy to make it abort though
- - but mozilla wants to handle panics, though it does it by translating it into a swift/java exception
- - usually the purpose is wanting to capture the call stack and report it
- - in theory could panic=abort if could capture java stack
- - FB sets a custom panic handler to report errors, then exits (could use panic=abort)
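For the question of catching Rust panics at the boundary, the standard-library tool is `std::panic::catch_unwind`. The following is a minimal sketch that converts a panic into an error code; the function names and the `-1` convention are assumptions, and it only works when the crate is built with unwinding panics rather than `panic=abort`.

```rust
use std::panic;

/// Pure-Rust logic that can panic: integer division by zero panics.
fn percent(part: u32, whole: u32) -> i64 {
    (i64::from(part) * 100) / i64::from(whole)
}

/// Hypothetical entry point for C callers. A real export would also be
/// marked `no_mangle`. A panic is caught here and reported as -1 rather
/// than unwinding into the foreign caller.
pub extern "C" fn percent_or_error(part: u32, whole: u32) -> i64 {
    match panic::catch_unwind(|| percent(part, whole)) {
        Ok(value) => value,
        Err(_) => -1,
    }
}
```

The same catch point is where a stack trace could be captured and handed to a Swift or Java exception, as in the Mozilla case.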
- - For COM FFI case? how handling virtual dispatch
- - manual adaptation with vtables and things
- - on Rust side, does that "look like" a trait?
- - active area of investigation
- - believe that (with proc macro support) can expose a trait that is actually a struct + vtable
- - similar to what GNOME projects are doing for glib bindings
- - mozilla does it for XPCOM, which is basically same thing
- - various bits of existing crates, but it's mostly nasty
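The "trait that is actually a struct + vtable" idea can be sketched by hand, which is roughly what a proc macro would generate. This is an illustration only: the `Counter` interface is invented, and real COM additionally requires the `IUnknown` methods and reference counting, which are omitted here.

```rust
use std::os::raw::c_void;

/// Function table laid out the way a C-style caller expects.
#[repr(C)]
struct CounterVtbl {
    add: unsafe extern "C" fn(this: *mut c_void, amount: u32) -> u32,
    release: unsafe extern "C" fn(this: *mut c_void),
}

/// Object layout: the first field points at the vtable.
#[repr(C)]
struct CounterObject {
    vtbl: *const CounterVtbl,
    total: u32,
}

unsafe extern "C" fn counter_add(this: *mut c_void, amount: u32) -> u32 {
    // SAFETY: callers pass a pointer obtained from `CounterHandle::new`.
    let obj = unsafe { &mut *(this as *mut CounterObject) };
    obj.total = obj.total.wrapping_add(amount);
    obj.total
}

unsafe extern "C" fn counter_release(this: *mut c_void) {
    // SAFETY: the pointer came from `Box::into_raw` and is released once.
    drop(unsafe { Box::from_raw(this as *mut CounterObject) });
}

static COUNTER_VTBL: CounterVtbl = CounterVtbl {
    add: counter_add,
    release: counter_release,
};

/// The Rust-facing view of the interface.
trait Counter {
    fn add(&mut self, amount: u32) -> u32;
}

/// Owning handle that dispatches every call through the vtable.
struct CounterHandle(*mut CounterObject);

impl CounterHandle {
    fn new() -> Self {
        let object = CounterObject { vtbl: &COUNTER_VTBL, total: 0 };
        CounterHandle(Box::into_raw(Box::new(object)))
    }
}

impl Counter for CounterHandle {
    fn add(&mut self, amount: u32) -> u32 {
        // SAFETY: `self.0` is valid until `drop` runs.
        unsafe { ((*(*self.0).vtbl).add)(self.0 as *mut c_void, amount) }
    }
}

impl Drop for CounterHandle {
    fn drop(&mut self) {
        // SAFETY: called exactly once; the object is not used afterwards.
        unsafe { ((*(*self.0).vtbl).release)(self.0 as *mut c_void) }
    }
}
```

Rust callers use `Counter::add` like any trait method, while the raw pointer inside the handle could be passed to non-Rust code that understands the same layout.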
- - Jeremy: one thing I've been thinking about:
- - standard set of library functions corresponding to C++ types
- - e.g. some way to use std-string from within rust code
- - good to have for templated types (unique-ptr, shared-ptr, and so on)
- - all types that can be directly used from Rust in some way
- - quite clunky today to have a C++ function that returns something Rust can use
- - on C side, it'd use the plain C++ types
- - but on Rust side, it'd invoke and do the right things
- - one of the pieces needed for C++ interop
- - instantiate the vec/string/other impls
- - should this be part of bindgen?
- - missing part: manually instantiating separate things for each specialization
- - major topics of FFI
- - being able to "use header files" and get a "reasonably safe" FFI in Rust
- - what are building blocks we'd need to move things to user space?
- - template instantiation list is one building block -- somebody has to write the tool, nothing needed from rustc
- - expectation is that there is always some work to manually bind
- - but what is minimal work we can do to make it easy to translate
- - annotations might be company specific -- fb vs google?
- - maybe? but can we collaborate?
- - different C++ dialects and patterns in use
- - what about from other languages, esp. around C++?
- - closest inspiration might come from Swift
- - rich bindings from Rust to C++ for hashmaps etc
- - because FB uses thrift for RPC mechanism (and sometimes FFI)
- - would be useful to be able to do tricks like that for hashmap and sets perhaps
- - some kind of tool for consuming a C++ header file to automatically produce an interface in Rust
- - complication in some environments: multiple allocators
- ## use of unsafe
- - ms: would like to know how to control use of unsafe in codebase
- - google: grep
- - servo used the compiler directives to disallow unsafe where possible
- - in some cases, allow unsafe within a specific file
- - integrate with review tool to draw attention
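The compiler directives in question are the `unsafe_code` lint levels. Below is a small sketch of the deny-by-default, allow-in-one-place pattern; the module and function names are made up, and in a real crate the outer attribute would normally be `#![deny(unsafe_code)]` at the top of `lib.rs` or `main.rs` rather than on a module.

```rust
/// Everything in this module fails to compile if it contains `unsafe`,
/// unless a nested item explicitly opts back in.
#[deny(unsafe_code)]
mod checked {
    /// Safe code: no opt-in needed.
    pub fn first_byte(data: &[u8]) -> Option<u8> {
        data.first().copied()
    }

    /// The one audited place where `unsafe` is permitted.
    #[allow(unsafe_code)]
    pub mod raw {
        pub fn first_byte_unchecked(data: &[u8]) -> u8 {
            assert!(!data.is_empty());
            // SAFETY: the assertion above guarantees index 0 is in bounds.
            unsafe { *data.get_unchecked(0) }
        }
    }
}
```

Using `forbid(unsafe_code)` instead of `deny` would reject the inner `allow` as well, which suits crates that should contain no unsafe at all. The explicit `allow` attributes also give a review tool or a grep something stable to match on.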
- - unsafe is really many things: sometimes simple, sometimes not
- - C++ code: all unsafe? not reviewed under the same standard?
- - more interesting question is unsafe in dependencies
- - auditing in crate graph in general is a problem
- - geometric growth of deps
- - how do you audit safe code?
- - would be great if there were some central place doing auditing (and getting paid to do it)
- - but we'd also need some mechanism to declare what's been audited etc
- - blessed crates and versions
- - let crates.io metadata include auditing
- - presumably want to know also things like 2fa, review policy, etc
- - attacks these days are very targeted in other ecosystems -- e.g., replacing specific versions of crates to attack specific targets
- - number of deps are in the hundreds, ranging from a few hundred to ~800 depending on project
- - in some cases, can pull in a frozen diff and not update
- - but not all
- - auditing of the compiler itself?
- - would prefer to have two implementations maybe
- ## "governance"
- - MS: do we know what's going into the compiler?
- - do we know what changes are going in?
- - FB: not been a big concern of ours
- - in some cases, had issues where things got stabilized or bug fixes that broke code
- - would like to be canarying the nightly compiler regularly
- - but having more impl's would increase confidence
- - ways to support?
- - contracting
- - full time hires
- - how can we give $$ to rust org?
- - need a foundation
- - money/resources for Rust CI
- - participating in crater?
- - working on a way to run crater and send back pass/fail
- - ecosystem support
- - filling gaps in ecosystem
- - supporting key crates
- - helping to file GSoC proposals?
- ## will we do this again? how to continue these conversations?
- - don't need super frequent updates
- - most helpful thing is to identify topics and spin off topics
- - try to provide feedback for roadmap
- - organize a regular meeting on zulip to talk about issues
- - quarterly maybe
- - we might want to consider f2f meetings in other conferences or at least in europe
- - maybe rustfest
- - key point:
- - don't want to alienate and separate enterprise from the Rust community at large
- - focusing on working groups and zulip for communication is a win