As I have always understood it, AMD built its CPUs by reverse engineering Intel's instruction set and now pays Intel to use it, and Intel does the same for AMD's 64-bit instructions.
This is how Windows can be installed on both types of CPU without needing to purchase a specific build (such as a version compiled for ARM), and why all apps, games, etc. work the same way and run interchangeably on either CPU.
However, lately some things have been making me question some of this.
First, I've noticed that some games are a bit laggy on my (AMD) system, and after some reading it turns out the game is optimised for Intel CPUs.
Also, OS X is sold on Intel CPUs, but after discovering the hackintosh community, it turns out it is possible, though very hard, to get OS X to run on AMD. Again, this is because OS X is designed for Intel.
So, after all this: what does it mean to be optimised for Intel or AMD? How can software be optimised for one but not the other if they are meant to be drop-in replacements for each other, i.e. both support the same instructions?
They implement the same ISA, but with different performance characteristics because the microarchitecture is different.
For details, see Agner Fog's microarchitecture PDF and the other links from the x86 tag wiki, e.g. David Kanter's Haswell microarchitecture writeup vs. his writeup of AMD Bulldozer.
Agner Fog's instruction tables also show you exactly how fast each instruction is on each CPU. For example, imul r64, r/m64, imm32 has 6 cycle latency and a throughput of one per 4 cycles on AMD Bulldozer-family. On Intel SnB-family, it has 3 cycle latency and a throughput of one per cycle.
So when tuning for AMD, it would be worth replacing a 64-bit multiply by a constant with a couple of shifts / adds if possible. On Intel, it's maybe only worth it if you can get the job done in one or two shift / lea instructions.
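To make the multiply-by-constant trade-off concrete, here is a minimal sketch in C. The constant 10 and the function names are only illustrative assumptions, not anything from Agner Fog's tables. In practice a compiler usually makes this choice for you based on the CPU you tell it to tune for, so this mostly shows what the two forms look like.

```c
#include <stdint.h>

/* Plain multiply: a compiler may emit a single imul for this. */
uint64_t mul10_imul(uint64_t x)
{
    return x * 10;
}

/* Same result from shifts and an add: 10*x = 8*x + 2*x.
 * Unsigned arithmetic wraps modulo 2^64 in both versions,
 * so the two functions agree for every input. */
uint64_t mul10_shift_add(uint64_t x)
{
    return (x << 3) + (x << 1);
}
```

Which version is faster depends on the microarchitecture: the shift/add form is a dependency chain of cheap instructions, so it tends to pay off where imul is slow and to be a wash (or a loss) where imul is already 3 cycles and fully pipelined.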
AMD's designs also have a notably weaker cache hierarchy, and lower single-threaded throughput because they use pairs of cores that are permanently split, instead of dynamically sharing resources between two hardware threads on the same core the way Intel's Hyper-Threading does. IIRC, AMD is planning to change that for their next microarchitecture. Some of this is stuff you can't really "optimize for"; it's just AMD being slower. :(
So they run the same code, because that's what it means to be the same architecture.
Some CPUs support ISA extensions (new instructions) that others don't. For example, XOP is AMD-only, while AVX2 and BMI2 are (so far) Intel-only, so code that wants to use more than a common baseline has to check for support at runtime.
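A rough sketch of what such a runtime check can look like in C, assuming GCC or Clang on x86 (`__builtin_cpu_init` and `__builtin_cpu_supports` are compiler builtins, not standard C). The function names are hypothetical, and `sum_avx2_placeholder` is only a stand-in for a real AVX2 implementation.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Baseline version: uses only instructions every x86-64 CPU has. */
static uint64_t sum_baseline(const uint32_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Hypothetical stand-in for a version written with AVX2 intrinsics.
 * It just reuses the baseline loop so the sketch stays short. */
static uint64_t sum_avx2_placeholder(const uint32_t *v, size_t n)
{
    return sum_baseline(v, n);
}

/* Pick an implementation once, based on what the running CPU reports. */
sum_fn pick_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                /* populate the CPU feature data */
    if (__builtin_cpu_supports("avx2"))  /* non-zero if AVX2 is available */
        return sum_avx2_placeholder;
#endif
    return sum_baseline;                 /* safe on any CPU */
}
```

A program would typically call `pick_sum()` once at startup and keep the returned function pointer, so the check costs nothing in the hot loop. Note that this checks the feature itself rather than the vendor, so an AMD CPU that does support AVX2 gets the fast path too.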
Wikipedia's AMD Excavator article is not very up to date. Hardware has been out for a while now, but the article still says it's "expected to have" AVX2 and BMI2. Agner Fog hasn't tested it and updated his instruction tables yet, either.