If you want a useful answer you'll need to post the code in question.  `__builtin_expect` can alter lots of different optimizations with different trade-offs…

Your assumption about compiler writers optimizing for their own architecture is probably invalid.  You can control exactly which architecture the code is tuned for (see the `-mtune` option).  There may still be a bit of bias in instruction selection, but for the most part the instructions are chosen automatically.

It also doesn't help that until recently (GCC 9, IIRC) there was no defined probability for what `__builtin_expect` meant.  Sometimes you would see a slowdown if the prediction failed more than around 1% of the time; other times the threshold was more like 10%.  GCC 9 added `__builtin_expect_with_probability` and defined the probability for `__builtin_expect` to be 90%, so I'd suggest taking a look at using that.  Older clang releases don't have it (support arrived later, in clang 11 IIRC), so for portability you can use a macro like [`HEDLEY_PREDICT`](https://nemequ.github.io/hedley/api-reference.html#HEDLEY_PREDICT), which has a few possible definitions depending on the availability of `__builtin_expect_with_probability` and `__builtin_expect`:

```c
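/* When __builtin_expect_with_probability is available: */
#  define HEDLEY_PREDICT(expr, value, probability) __builtin_expect_with_probability(expr, value, probability)
/* Otherwise, when only __builtin_expect is available (the hint is used only if probability >= 0.9): */
#  define HEDLEY_PREDICT(expr, expected, probability) \
  (((probability) >= 0.9) ? __builtin_expect(!!(expr), (expected)) : (((void) (expected)), !!(expr)))
/* Otherwise, no hint at all; the expression is just evaluated: */
#  define HEDLEY_PREDICT(expr, expected, probability) (((void) (expected)), !!(expr))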
```
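
If you'd rather not pull in Hedley, here is a minimal sketch of the same idea.  `PREDICT_TRUE_P` and `count_nonnegative` are hypothetical names I made up for illustration, and the detection logic is deliberately simplified (it assumes `__has_builtin` where the compiler provides it, or GCC 9+ otherwise), so treat it as a starting point rather than a drop-in replacement:

```c
#include <assert.h>
#include <stddef.h>

/* PREDICT_TRUE_P is a hypothetical helper, not part of any library.
 * It hints that `expr` is true with probability `p` (a compile-time
 * constant in [0.0, 1.0]) when the builtin is available. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_expect_with_probability)
#    define PREDICT_TRUE_P(expr, p) \
       __builtin_expect_with_probability(!!(expr), 1, (p))
#  endif
#endif

/* GCC 9 has the builtin but predates __has_builtin (assumed to arrive in GCC 10). */
#if !defined(PREDICT_TRUE_P) && defined(__GNUC__) && !defined(__clang__)
#  if __GNUC__ >= 9
#    define PREDICT_TRUE_P(expr, p) \
       __builtin_expect_with_probability(!!(expr), 1, (p))
#  endif
#endif

/* Fallback: no hint, just evaluate the expression. */
#if !defined(PREDICT_TRUE_P)
#  define PREDICT_TRUE_P(expr, p) (!!(expr))
#endif

/* Counts the non-negative entries, telling the compiler that negative
 * values are assumed to be rare (about 1% of the input). */
static size_t count_nonnegative(const int *values, size_t n)
{
    size_t count = 0;
    assert(values != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (PREDICT_TRUE_P(values[i] >= 0, 0.99)) {
            count++;
        }
    }
    return count;
}
```

The hint only changes how the code is optimized, never the result, so whether it helps depends entirely on how well the stated probability matches your real data.  Measure with and without it.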

I'd strongly suggest posting the specific code if you want a more concrete answer about what's happening.