Bun Was Rewritten in Rust by AI in Six Days. The Lesson Isn't Rust.

A single tweet from Jarred Sumner climbed the Hacker News front page this week with a number that made everyone stop scrolling: Bun's experimental Rust rewrite hits 99.8% test compatibility on Linux x64 glibc. Bun — the JavaScript runtime and toolchain that has spent years written in Zig — now has a parallel implementation in Rust. According to reporting from The Register, the merge added more than a million lines of code across thousands of commits, shrank the binary by several megabytes, fixed a handful of memory leaks along the way, and was generated in large part by Claude AI agents working on a cluster of high-memory machines over roughly six days.

Let that sink in. A million lines of systems code, translated from one memory-management model to another, landing at 99.8% test parity, in under a week, mostly by machines.

The internet did what the internet does and turned this into a fight about Rust versus Zig, about whether AI "really" wrote it, about whether 99.8% is impressive or terrifying. I want to skip that argument, because as a .NET freelancer who does not write a single line of Zig or Rust in my day job, I think the most important takeaway has nothing to do with the languages involved. It has to do with why this was even possible — and that reason applies directly to the C# code you and I ship.

What actually happened

Let me stick to what is well attributed. Bun's runtime was historically written in Zig. Sumner stated that version 1.3.14 would be "the last version in Zig," and a very large branch was merged that reimplements the core in Rust. The stated motivation, per the reporting, is memory safety: after years of chasing use-after-free, double-free and leak bugs in a language without a borrow checker, the team wanted compiler-enforced guarantees. Rust also has a far larger ecosystem and contributor pool than Zig, which matters for a project that wants outside help.

The eye-catching part is the method. Rather than a human team grinding through a million lines over a year, AI agents performed the bulk of the mechanical translation, validated continuously against Bun's existing test suite. The Rust build is canary-only for now while optimization and cleanup land. The 99.8% figure is specifically Linux x64 glibc; other platforms trail.

I am deliberately not claiming the AI did this unsupervised or that it is production-ready. Canary means canary. But the engineering shape of the thing is real and worth studying.

The headline is the AI. The lesson is the test suite.

Here is the part that should change how you work. A large-scale automated rewrite is only verifiable to the degree that you have an executable definition of "correct." Bun could attempt this, and put a number on the remaining gap (0.2% of tests on Linux x64 glibc), because it has a comprehensive test suite that doubles as a specification. The agents weren't trusted; they were checked, mechanically, thousands of times. The test suite was the contract. The AI was just a very fast contractor.

Flip that around. If your codebase cannot answer the question "is the new implementation behaviourally identical to the old one?" with a button press, then no amount of AI capability will make a rewrite — or even an aggressive refactor — safe. The bottleneck was never typing speed. It was always verification.

This is the lens I now apply to every "should we let an agent refactor this?" conversation. The honest first question is: what would prove it still works? If the answer is "we'd click around and hope," you don't have an AI problem, you have a testing problem, and you had it long before AI showed up.

Differential testing: the technique that makes rewrites measurable

The technique closest to what Bun leaned on is differential testing (a close relative of characterization testing): run the old and new implementations against the same inputs and assert they produce identical outputs. You don't need to understand why the old code is correct; you only need the two versions to agree across a huge sample of cases.

This is trivial to set up in .NET, and it is the single most valuable thing you can do before letting anything — human or model — rewrite a critical component. Suppose you have a legacy pricing engine and a new one:

public interface IPricingEngine
{
    decimal Calculate(Order order);
}

public sealed record Order(
    IReadOnlyList<OrderLine> Lines,
    string CustomerTier,
    string? CouponCode);

public sealed record OrderLine(string Sku, int Quantity, decimal UnitPrice);

The differential test doesn't assert specific prices. It asserts that LegacyPricingEngine and RustyNewPricingEngine — or in our world, RewrittenPricingEngine — never disagree:

using System;
using System.Collections.Generic;
using System.Linq;
using Xunit;

public class PricingParityTests
{
    private readonly IPricingEngine _old = new LegacyPricingEngine();
    private readonly IPricingEngine _new = new RewrittenPricingEngine();

    [Theory]
    [MemberData(nameof(GeneratedOrders))]
    public void Implementations_Agree(Order order)
    {
        var expected = _old.Calculate(order);
        var actual = _new.Calculate(order);

        Assert.Equal(expected, actual);
    }

    public static IEnumerable<object[]> GeneratedOrders()
    {
        var rng = new Random(20260629); // fixed seed = reproducible
        var skus = new[] { "A1", "B2", "C3", "D4" };
        var tiers = new[] { "standard", "gold", "platinum" };
        var coupons = new string?[] { null, "SAVE10", "FREESHIP", "INVALID" };

        for (int i = 0; i < 5_000; i++)
        {
            var lines = Enumerable.Range(0, rng.Next(1, 6))
                .Select(_ => new OrderLine(
                    skus[rng.Next(skus.Length)],
                    rng.Next(1, 20),
                    Math.Round((decimal)(rng.NextDouble() * 100), 2)))
                .ToList();

            yield return new object[]
            {
                new Order(lines, tiers[rng.Next(tiers.Length)], coupons[rng.Next(coupons.Length)])
            };
        }
    }
}

Five thousand pseudo-random orders, a fixed seed for reproducibility, and a single equality assertion. When this is green, you have evidence the rewrite preserves behaviour. The moment it goes red, the failing Order is your bug report — fully reproducible, no narrative required. This is precisely the feedback loop Bun's agents ran against, just at a vastly larger scale.

For property-based generation rather than hand-rolled loops, FsCheck (usable from C#) will shrink failing cases to a minimal reproduction automatically, which is gold when the generated input that exposes a disagreement between two implementations is large.
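To make "shrinking" concrete, here is a minimal hand-rolled sketch of the idea. It is not FsCheck's algorithm or API, only a greedy illustration: the OrderShrinker class and its stillFails callback are names I made up for this example, and the two records are repeated from above so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shapes as the article's records, repeated so this sketch stands alone.
public sealed record OrderLine(string Sku, int Quantity, decimal UnitPrice);
public sealed record Order(IReadOnlyList<OrderLine> Lines, string CustomerTier, string? CouponCode);

// Hypothetical helper: a greedy shrinker, far simpler than what FsCheck does.
public static class OrderShrinker
{
    // Drops one order line at a time and keeps any smaller order for which
    // stillFails returns true. Stops when no single removal still fails,
    // or when only one line is left.
    public static Order Shrink(Order failing, Func<Order, bool> stillFails)
    {
        var current = failing;
        var progress = true;

        while (progress && current.Lines.Count > 1)
        {
            progress = false;
            for (int i = 0; i < current.Lines.Count; i++)
            {
                // Copy of the lines with the i-th one removed.
                var fewer = current.Lines.Where((_, index) => index != i).ToList();
                var candidate = current with { Lines = fewer };

                if (stillFails(candidate))
                {
                    current = candidate;
                    progress = true;
                    break; // restart the scan on the smaller order
                }
            }
        }

        return current;
    }
}
```

In the pricing scenario the callback would be something like o => oldEngine.Calculate(o) != newEngine.Calculate(o), so a five-line failing order collapses to the one or two lines that actually trigger the disagreement. A real property-based library also shrinks quantities, prices and strings, which this sketch does not attempt.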

Memory safety is a feature you can buy without a rewrite

The other half of the Bun story is memory safety, and this is where I gently remind my fellow .NET developers that we already live on the comfortable side of this trade-off. The reason Bun is enduring a million-line migration is that Zig, like C, does not stop you from using freed memory. The CLR has given us garbage collection, bounds checking, and type safety for over two decades. The class of bug that motivated this entire rewrite mostly does not exist in idiomatic C#.

Where it can creep back in is the unsafe and interop surface — Span<T> over raw pointers, stackalloc, Marshal, unsafe blocks. If you do reach for those for performance, that is exactly where a borrow-checker-shaped discipline pays off, and where I'd point an AI reviewer first. A representative example of the trade-off:

// Fast, but you now own the safety argument the compiler used to make for you.
public static unsafe int SumUnsafe(ReadOnlySpan<int> data)
{
    int total = 0;
    fixed (int* p = data)
    {
        for (int i = 0; i < data.Length; i++)
            total += p[i]; // no bounds check — a wrong length here is a memory bug
    }
    return total;
}

// The managed version the JIT already optimises well, and which cannot corrupt memory.
public static int SumSafe(ReadOnlySpan<int> data)
{
    int total = 0;
    foreach (var x in data)
        total += x;
    return total;
}

The lesson Bun learned the hard way — that unmanaged memory bugs cost years — is one .NET took off the table by design. Reserve unsafe for measured hot paths and keep it behind a parity test like the one above.
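A parity check for a pair like SumSafe and SumUnsafe could look roughly like the sketch below. SumParity and FirstDisagreement are hypothetical names, and the case count and value range are arbitrary choices of mine; the helper takes delegates so it compiles without the unsafe flag.

```csharp
using System;

// Hypothetical helper: compares a trusted implementation with a faster one.
public static class SumParity
{
    // Returns the first generated input on which the two implementations
    // disagree, or null when they agree on every generated case.
    public static int[]? FirstDisagreement(
        Func<int[], int> reference,
        Func<int[], int> candidate,
        int seed = 20260629,
        int cases = 2_000)
    {
        var rng = new Random(seed); // fixed seed = reproducible

        for (int i = 0; i < cases; i++)
        {
            // Lengths 0..64, so the empty and single-element edges are included.
            var data = new int[rng.Next(0, 65)];
            for (int j = 0; j < data.Length; j++)
                data[j] = rng.Next(-1_000, 1_000);

            if (reference(data) != candidate(data))
                return data; // this array is the reproduction
        }

        return null;
    }
}
```

With the two methods above in scope, the call would be SumParity.FirstDisagreement(d => SumSafe(d), d => SumUnsafe(d)), and a test simply asserts the result is null. Note what this does and does not catch: it detects wrong answers, but an out-of-bounds read that happens to return plausible data can slip through, so it complements careful review of the unsafe block rather than replacing it.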

The trend behind the trend: the toolchain performance era

Bun is not an isolated event. It's the latest entry in a multi-year migration of developer tooling into systems languages: esbuild rewrote bundling in Go and made everything else look slow; Microsoft's own port of the TypeScript compiler to native code chased a roughly 10x speedup; Rust now underpins a generation of linters, formatters and package managers. The industry decided that the tools we run hundreds of times a day deserve native performance.

.NET's answer to this same pressure is Native AOT. If you ship CLI tools or cold-start-sensitive services, you can compile ahead of time to a single native binary with no runtime JIT and dramatically faster startup:

<PropertyGroup>
  <PublishAot>true</PublishAot>
  <InvariantGlobalization>true</InvariantGlobalization>
  <StripSymbols>true</StripSymbols>
</PropertyGroup>

dotnet publish -c Release -r linux-x64

The trade-offs are real and worth stating plainly: reflection-heavy code needs source generators or explicit trimming annotations, dynamic assembly loading is unavailable, the build gets more complex, and you must test the published binary because trimming can remove code paths that your unit tests exercised only under the JIT. But for a tool that has to start fast and ship as one file, AOT gets a .NET developer to the same destination Bun is chasing without leaving the platform.
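As one illustration of the source-generator point, JSON handling is usually the first place reflection bites under AOT. The sketch below uses the System.Text.Json source generator; ToolConfig, ToolJsonContext and ConfigJson are hypothetical names for an imagined CLI tool, and it assumes a .NET 6 or later SDK so the generator runs at build time.

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical settings type for a CLI tool.
public sealed class ToolConfig
{
    public string Name { get; set; } = "";
    public int Retries { get; set; }
}

// The source generator completes this partial class at compile time,
// so serializing ToolConfig needs no runtime reflection.
[JsonSerializable(typeof(ToolConfig))]
public partial class ToolJsonContext : JsonSerializerContext
{
}

public static class ConfigJson
{
    // Both calls pass the generated type info instead of relying on reflection.
    public static string Write(ToolConfig config) =>
        JsonSerializer.Serialize(config, ToolJsonContext.Default.ToolConfig);

    public static ToolConfig? Read(string json) =>
        JsonSerializer.Deserialize(json, ToolJsonContext.Default.ToolConfig);
}
```

The design choice is to route every serialization call through the generated context, so the trimmer can see exactly which types are needed. A round-trip test over ConfigJson, run against the published binary as well as the JIT build, is a cheap way to confirm nothing was trimmed away.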

So should you let AI rewrite your code?

My honest position, sharpened by this story: the question is almost always premature. Before "can AI rewrite this," answer "can anything verify the rewrite." Bun's six-day miracle ran on top of years of accumulated tests. The AI compressed the labour; it did not remove the need for a specification. If you invest in differential and property-based tests for your critical components, you get two payoffs at once — safer ordinary refactors today, and the option to do a Bun-style automated migration tomorrow if you ever need it.

And there's a quieter point worth sitting with. The most impressive thing in that thread is not that a model wrote a million lines of Rust. It's that an organisation had a test suite good enough to pin the remaining gap at 0.2%. That suite is the real asset. The AI is rentable; the verification is not. Build the thing that can tell you the truth about your code, and the rest (language, tooling, even who or what writes it) becomes a choice rather than a leap of faith.

Takeaway: Treat Bun's rewrite as a testing story, not a Rust story. Add differential tests before any large refactor, keep unsafe behind parity checks, reach for Native AOT when startup and footprint matter, and remember that the test suite — not the agent — is what made a million-line rewrite measurable.

Sources: Jarred Sumner's announcement · Hacker News discussion · The Register coverage.
