x86 and the Weight of Compatibility
Imagine a city with a law that forbids tearing down any old building. The first downtown went up in 1978. The streets were narrow and the apartments only had two rooms each. New neighborhoods came every few years — a 1985 expansion that doubled every room, a 1996 multimedia district, a 1999 graphics quarter, a 2017 supercomputer suburb — and every single one had to be built on top of or around the old downtown. The 1978 buildings still stand. People still live in them. A pizza delivery driver who started in 1985 can walk into any of those apartments today and the door key still works. This city is called x86, and Intel and AMD have been adding floors to it for forty-eight years without ever knocking anything down.

The original downtown was the Intel 8086, finished in June 1978 by a team that included Stephen Morse and was led by Jean-Claude Cornet at Intel's Santa Clara headquarters. The 8086 had 81 instructions, short words the chip understood, like MOV for "move bytes from here to there" and JMP for "jump to this address." None of this would have mattered if the chip had not won a contract that came along three years later. In 1981 IBM picked the 8088, Intel's lower-cost variant of the 8086 with an 8-bit external data bus and the same instruction set, for its first PC because Intel could ship volume parts and was willing to license the design. Once Lotus 1-2-3 and WordPerfect were written for the 8086 instruction set, that set became a contract Intel could never break. Millions of dollars of business software depended on it. Tear down the old downtown and every tenant in the city is homeless.

Intel learned the contract the hard way. In 1981 it tried to replace the 8086 with a totally new chip called the iAPX 432, designed without any 8086 instructions at all. The 432 was supposed to be the future. It flopped: it was slow, it could not run existing 8086 software, and customers refused to move. Intel pulled it. The lesson stuck. Every x86 chip since has been expected to run old 8086 binaries unmodified, in hardware rather than through emulation. The 80286 in 1982 added 16 new instructions for memory protection but kept the original 81. The 80386 in 1985 added 39 more for 32-bit addressing but kept all the rest. Each new release was a new neighborhood bolted onto the old city, never a replacement for it.

The neighborhoods piled up faster as the workloads got hungrier. MMX in 1996 added 57 instructions for operating on packed short integers in parallel. Intel built it because Pentiums were getting beaten by dedicated audio and video chips, and the marketing department needed an answer for "why is my PC slow at video?" SSE in 1999 added 70 more for floating-point math, mostly because games like Quake and Unreal wanted to do 3D graphics math on the CPU before consumer GPUs took over that work. SSE2 followed in 2001 and added double-precision floats, which let the SIMD registers replace the old x87 math unit for ordinary floating-point code. The most surprising entry on the timeline came in 2003: Advanced Micro Devices, the smaller competitor that had spent two decades as Intel's understudy, designed the 64-bit extension to x86 and called it AMD64. Intel had been pushing a totally separate 64-bit chip called Itanium that broke compatibility on purpose, betting the old city would not survive the move to 64-bit. AMD bet the opposite, kept the old downtown, and added new 64-bit registers on the side. Itanium never caught on outside a niche of high-end servers. Today every 64-bit Intel chip implements AMD's instructions, a rare case of Intel having to copy AMD's homework.

Here is the receipt for forty-eight years of additions. The Rust program below tabulates every major x86 extension: the year it shipped, how many new instructions it carried, and the running total. Run it and watch the total at the bottom grow.

    // One row of the table: a named extension and how many instructions it added.
    struct Extension {
        year: u16,
        name: &'static str,
        added: u32,
        reason: &'static str,
    }

    const EXTENSIONS: &[Extension] = &[
        Extension { year: 1978, name: "8086 base", added: 81, reason: "the original 16-bit set" },
        Extension { year: 1982, name: "80286 protected", added: 16, reason: "memory protection, rings" },
        Extension { year: 1985, name: "80386 (i386)", added: 39, reason: "32-bit registers and addressing" },
        Extension { year: 1989, name: "80486", added: 6, reason: "atomic CMPXCHG, BSWAP" },
        Extension { year: 1993, name: "Pentium", added: 8, reason: "CPUID, RDTSC, CMPXCHG8B" },
        Extension { year: 1996, name: "MMX", added: 57, reason: "first SIMD on integers" },
        Extension { year: 1999, name: "SSE", added: 70, reason: "SIMD on floats for 3D games" },
        Extension { year: 2001, name: "SSE2", added: 144, reason: "wider SIMD, replaces x87 math" },
        Extension { year: 2003, name: "x86-64 (AMD64)", added: 35, reason: "64-bit registers, AMD's gift" },
        Extension { year: 2004, name: "SSE3", added: 13, reason: "horizontal adds for audio" },
        Extension { year: 2006, name: "SSSE3", added: 16, reason: "shuffle and pack for codecs" },
        Extension { year: 2007, name: "SSE4", added: 54, reason: "string and dot-product ops" },
        Extension { year: 2011, name: "AVX", added: 149, reason: "256-bit vectors" },
        Extension { year: 2013, name: "AVX2", added: 135, reason: "integer math on 256-bit lanes" },
        Extension { year: 2017, name: "AVX-512", added: 716, reason: "512-bit vectors for HPC and AI" },
        Extension { year: 2021, name: "AMX", added: 12, reason: "matrix tiles for AI inference" },
    ];

    fn main() {
        println!("x86 instruction additions, 1978 to 2026");
        println!(
            "{:<5} {:<20} {:>6} {:>9} bars (one # per 10 added)",
            "year", "extension", "added", "cum total",
        );
        println!("{}", "-".repeat(78));

        // Running total of instructions across every extension printed so far.
        let mut total: u32 = 0;
        for ext in EXTENSIONS {
            total += ext.added;
            // Integer division: an extension with fewer than 10 instructions gets no bar.
            let bars: String = "#".repeat((ext.added / 10) as usize);
            println!(
                "{:<5} {:<20} {:>6} {:>9} {}",
                ext.year, ext.name, ext.added, total, bars,
            );
        }
        println!("{}", "-".repeat(78));
        println!();

        // Second pass: print the reason behind each extension.
        println!("notes:");
        for ext in EXTENSIONS {
            println!(" {} ({}) - {}", ext.year, ext.name, ext.reason);
        }
        println!();
        println!("a 2026 Intel chip still decodes every one of these {} instructions.", total);
    }

Build it and read the bars sideways.

    x86 instruction additions, 1978 to 2026
    year  extension             added cum total bars (one # per 10 added)
    ------------------------------------------------------------------------------
    1978  8086 base                81        81 ########
    1982  80286 protected          16        97 #
    1985  80386 (i386)             39       136 ###
    1989  80486                     6       142
    1993  Pentium                   8       150
    1996  MMX                      57       207 #####
    1999  SSE                      70       277 #######
    2001  SSE2                    144       421 ##############
    2003  x86-64 (AMD64)           35       456 ###
    2004  SSE3                     13       469 #
    2006  SSSE3                    16       485 #
    2007  SSE4                     54       539 #####
    2011  AVX                     149       688 ##############
    2013  AVX2                    135       823 #############
    2017  AVX-512                 716      1539 #######################################################################
    2021  AMX                      12      1551 #
    ------------------------------------------------------------------------------

    notes:
     1978 (8086 base) - the original 16-bit set
     1982 (80286 protected) - memory protection, rings
     1985 (80386 (i386)) - 32-bit registers and addressing
     1989 (80486) - atomic CMPXCHG, BSWAP
     1993 (Pentium) - CPUID, RDTSC, CMPXCHG8B
     1996 (MMX) - first SIMD on integers
     1999 (SSE) - SIMD on floats for 3D games
     2001 (SSE2) - wider SIMD, replaces x87 math
     2003 (x86-64 (AMD64)) - 64-bit registers, AMD's gift
     2004 (SSE3) - horizontal adds for audio
     2006 (SSSE3) - shuffle and pack for codecs
     2007 (SSE4) - string and dot-product ops
     2011 (AVX) - 256-bit vectors
     2013 (AVX2) - integer math on 256-bit lanes
     2017 (AVX-512) - 512-bit vectors for HPC and AI
     2021 (AMX) - matrix tiles for AI inference

    a 2026 Intel chip still decodes every one of these 1551 instructions.

The bar for AVX-512 in 2017 jumps off the right side of the table: 716 new instructions in a single release, more than the entire chip had before MMX existed. That one row is a glimpse of what compatibility costs. A modern Intel chip no longer understands just 81 instructions. By this table's count it understands more than 1,500, and every one of them must be decoded correctly even when the program is from 1985. The silicon that decodes instructions on a 2024 Core processor, part of what is called the front end, takes up a noticeable slice of the chip's transistors and burns power every cycle, because it has to be ready for any word from any of those forty-eight years of neighborhoods.
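
One way to see these neighborhoods on a real machine is to ask the CPU which extensions it carries. The sketch below is illustrative and separate from the table program. It relies on the Rust standard library's `is_x86_feature_detected!` macro; the helper name `detected_features` and the choice of nine features to probe are assumptions made for this example. On a non-x86 target the helper simply returns an empty list.

```rust
/// Returns (feature name, supported?) pairs for a handful of x86 extensions.
/// `detected_features` is a hypothetical helper name chosen for this sketch.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detected_features() -> Vec<(&'static str, bool)> {
    // The macro only accepts string literals, so each feature is spelled out.
    vec![
        ("sse", is_x86_feature_detected!("sse")),
        ("sse2", is_x86_feature_detected!("sse2")),
        ("sse3", is_x86_feature_detected!("sse3")),
        ("ssse3", is_x86_feature_detected!("ssse3")),
        ("sse4.1", is_x86_feature_detected!("sse4.1")),
        ("sse4.2", is_x86_feature_detected!("sse4.2")),
        ("avx", is_x86_feature_detected!("avx")),
        ("avx2", is_x86_feature_detected!("avx2")),
        ("avx512f", is_x86_feature_detected!("avx512f")),
    ]
}

/// Fallback for non-x86 targets: there is nothing to probe.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detected_features() -> Vec<(&'static str, bool)> {
    Vec::new()
}

fn main() {
    let features = detected_features();
    if features.is_empty() {
        println!("not an x86 CPU; nothing to report");
    }
    for (name, supported) in features {
        let mark = if supported { "yes" } else { "no" };
        println!("{:<8} {}", name, mark);
    }
}
```

The output depends on the machine. Programs that ship hand-tuned SIMD code typically make a check like this once at startup and pick a code path accordingly, which is how one binary can serve both old and new chips.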

There is a question hidden in the table. Look at 1989 — the 80486 added only 6 instructions, then the Pentium in 1993 added 8. The pace was slowing because the basic city was finished. From 1996 onward the additions are almost all SIMD instructions, which are wide parallel math operations for video, audio, and now machine learning. The shape of the additions tells you what the world was asking the chip to do. The 8086 was for spreadsheets. The 386 was for word processors. SSE was for games. AVX-512 is for training neural networks. Same downtown, new suburbs every time the world wanted a new thing fast.
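
The claim that the additions from 1996 onward are almost all SIMD can be checked against the table's own numbers. The sketch below is illustrative: the counts are copied from the table above, the SIMD flag on each row is this example's own classification (AMX is counted with the SIMD group, x86-64 is not), and `simd_share_since` is a hypothetical helper name.

```rust
/// (year, name, instructions added, counted as SIMD?)
/// Counts come from the table above; the SIMD flag is this sketch's classification.
const ROWS: &[(u16, &str, u32, bool)] = &[
    (1978, "8086 base", 81, false),
    (1982, "80286 protected", 16, false),
    (1985, "80386 (i386)", 39, false),
    (1989, "80486", 6, false),
    (1993, "Pentium", 8, false),
    (1996, "MMX", 57, true),
    (1999, "SSE", 70, true),
    (2001, "SSE2", 144, true),
    (2003, "x86-64 (AMD64)", 35, false),
    (2004, "SSE3", 13, true),
    (2006, "SSSE3", 16, true),
    (2007, "SSE4", 54, true),
    (2011, "AVX", 149, true),
    (2013, "AVX2", 135, true),
    (2017, "AVX-512", 716, true),
    (2021, "AMX", 12, true),
];

/// Returns (SIMD instructions, all instructions) added in `from_year` or later.
fn simd_share_since(from_year: u16) -> (u32, u32) {
    let mut simd = 0;
    let mut all = 0;
    for &(year, _name, added, is_simd) in ROWS {
        if year >= from_year {
            all += added;
            if is_simd {
                simd += added;
            }
        }
    }
    (simd, all)
}

fn main() {
    let (simd, all) = simd_share_since(1996);
    let percent = 100.0 * simd as f64 / all as f64;
    // prints: since 1996: 1366 of 1401 added instructions are SIMD (97.5%)
    println!(
        "since 1996: {} of {} added instructions are SIMD ({:.1}%)",
        simd, all, percent
    );
}
```

Under this classification, the only post-1996 additions outside the SIMD group are the 35 instructions of x86-64.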

The price of the no-demolition rule is dead weight. A chip built on a newer, cleaner instruction set, like Apple's M-series or the ARM cores in your phone, only has to decode roughly 400 instructions in total. A 2024 Intel chip has to decode 1,500-plus, including instructions nobody has written new code for in twenty years. The transistors that handle the rare old instructions sit there using power and warming up the package whether they get used or not. This is the weight that gives the lesson its name. The 1981 deal that put Intel inside the IBM PC became a promise to never make customers rewrite their software, and forty-five years later that promise is still pressing down on every silicon design Intel ships.

The next lesson is about the company that took the opposite bet: that a small, clean instruction set with no historical baggage could beat the city of x86 in the long run.