Background

So, you spend all day making “performance improvements” to make your code go faster

But are you really making a difference? You might ‘feel like it’, but how can you really tell? Ultimately, you need to systematically measure the difference using a benchmark [3]. When you are doing optimization work, that is a level of proof nearly as important as proving the correctness of your code with tests.

And, just like tests, you can have different levels of benchmarks:

  • unit benchmarks
  • end-to-end benchmarks

Let’s evaluate both of these scenarios

Part 1. Creating ‘unit benchmarks’ using vitest bench

If you are already using vitest (a very popular test library), you might be happy to know that it has a built-in subtool called vitest bench.

Simply make a file like file.bench.ts and it will be invoked by vitest bench. API ref https://vitest.dev/api/#bench
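
For example, a minimal bench file might look like this (the sum function here is just a placeholder):

// sum.bench.ts
import { bench, describe } from 'vitest'

function sum(arr: number[]) {
  return arr.reduce((a, b) => a + b, 0)
}

describe('sum', () => {
  const data = Array.from({ length: 10_000 }, (_, i) => i)

  bench('reduce-based sum', () => {
    sum(data)
  })
})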

But how do you create a vitest bench setup to compare two branches?

A simple bash script to help

We can create a script that runs the build process on two branches, defaulting to the main branch and the current branch. I used bash, and it assumes that yarn build outputs to a folder called “dist” [1] [2].

It then renames the built output to dist_branch1 and dist_branch2 for each branch, which also means we can easily add these folders to our .gitignore.

It also writes the actual branch name to a .txt file in each folder, allowing the benchmark itself to report the branch name in its output.

#!/bin/bash

set -e

CURRENT_BRANCH=$(git branch --show-current)
BRANCH1="${1:-main}"
BRANCH2="${2:-$CURRENT_BRANCH}"

if ! git diff --quiet || ! git diff --cached --quiet; then
  echo "Error: Uncommitted changes detected. Please commit or stash your changes first."
  exit 1
fi

rm -rf dist_branch1 dist_branch2

echo "Building $BRANCH1 branch..."
git checkout "$BRANCH1"
yarn
yarn build
mv dist dist_branch1
echo "$BRANCH1" >dist_branch1/branchname.txt

echo "Building $BRANCH2 branch..."
git checkout "$BRANCH2"
yarn
yarn build
mv dist dist_branch2
echo "$BRANCH2" >dist_branch2/branchname.txt

echo "Build complete!"
echo "$BRANCH1 build: esm_branch1/index.js"
echo "$BRANCH2 build: esm_branch2/index.js"

Then in your package.json you can have

{
  "name": "yourpackage",
  "version": "0.0.0",
  "scripts": {
    "build": "yourbuild",
    "prebench": "./scripts/build-both-branches.sh $BRANCH1 $BRANCH2"
    "bench": "vitest bench"
  }
}
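
Vitest picks up *.bench.ts files out of the box, but if you want to be explicit (or keep the built branch folders away from the regular test runner), an optional vitest.config.ts can look something like this. This is just a sketch, not required for the setup above, and the patterns are assumptions to adjust for your layout:

// vitest.config.ts (optional)
import { configDefaults, defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    benchmark: {
      // only pick up *.bench.ts files (this matches the default pattern)
      include: ['**/*.bench.ts'],
    },
    // keep the built branch folders out of regular test discovery
    exclude: [...configDefaults.exclude, 'dist_branch*/**'],
  },
})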

Finally, you can run your benchmark like this:

# default: compare current branch against main
yarn bench

# or, set custom env variables to compare two arbitrary branches, branch1 and branch2
BRANCH1=branch1 BRANCH2=branch2 yarn bench

Then if you have a function in your code that you want to optimize, like this…

// src/index.ts
export function pow(n: number, exp: number) {
  return Math.pow(n, exp)
}

Then you can make a new branch with a genius idea: surely plain multiplication in a loop would be faster

// src/index.ts
export function pow(n: number, exp: number) {
  let total = n
  for (let i = 1; i < exp; i++) {
    total *= n
  }
  return total
}

Then you can make a benchmark like this

import { readFileSync } from 'fs'
import { bench, describe } from 'vitest'

import { pow as pow1 } from '../dist_branch1/index.js'
import { pow as pow2 } from '../dist_branch2/index.js'

const branch1Name = readFileSync('dist_branch1/branchname.txt', 'utf8').trim()
const branch2Name = readFileSync('dist_branch2/branchname.txt', 'utf8').trim()

function benchPow({
  n,
  exp,
  name,
  opts,
}: {
  n: number
  exp: number
  name: string
  opts: {
    iterations?: number
    warmupIterations?: number
  }
}) {
  describe(name, () => {
    bench(
      branch1Name,
      () => {
        pow1(n, exp)
      },
      opts,
    )

    bench(
      branch2Name,
      () => {
        pow2(n, exp)
      },
      opts,
    )
  })
}

benchPow({
  name: 'pow',
  n: 2,
  exp: 10,
  opts: {
    warmupIterations: 100,
    iterations: 1000,
  },
})

That benchmark code is a little more verbose than it strictly needs to be, but it is quite reusable across projects
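
For example, inside the same bench file you can call the helper again with different (arbitrary) inputs to cover more cases:

benchPow({
  name: 'pow, larger exponent',
  n: 2,
  exp: 1000,
  opts: {
    warmupIterations: 100,
    iterations: 1000,
  },
})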

The resulting benchmark report clearly prints which branch is fastest, along with some nice statistics

An example of this is here https://github.com/cmdcolin/simple_benchmark_example

Part 2. Creating ‘end-to-end’ benchmarks using Puppeteer

Creating end-to-end benchmarks is really, IMO, where the rubber hits the road. You have spent all day making micro-optimizations; now it’s time to confirm they make an impact.

With puppeteer, you can test against real, live builds of your webapp. I recommend using production builds (not a dev server) and keeping everything localhost-only to avoid network variability. Note that while I called the earlier setup ‘simple’, this one is generally a little more involved.

Here is an example setup I have used:

  • You create multiple builds of your (web-) app
  • Store each build in a separate sub-directory in the builds/ folder
  • Create this bash script, which runs hyperfine to measure the total time taken by the puppeteer script
#!/bin/bash


BASE_PORT=8000

rm -rf results
mkdir -p results
mkdir -p screenshots

## kill background scripts after finished
## https://spin.atomicobject.com/2017/08/24/start-stop-bash-background-process/
trap "exit" INT TERM
trap "kill 0" EXIT

X=$BASE_PORT
for i in builds/*; do
  npx http-server "$i" -p "$X" -s &
  echo "$X" "$i"
  X=$((X + 1))
done

# give the static servers a moment to come up before the first hyperfine run
sleep 1

declare -a commands=()
declare -a names=()
X=$BASE_PORT
for i in builds/*; do
  build_name=$(basename "$i")
  screenshot_path="screenshots/$build_name"
  commands+=("node scripts/profile_app.ts \"http://localhost:$X/\" \"$screenshot_path\"")
  names+=("-n" "$build_name")
  X=$((X + 1))
done

echo "Running hyperfine with the following commands:"
for cmd in "${commands[@]}"; do
  echo "  - $cmd"
done

hyperfine -i --export-json results/hyperfine.json --warmup 1 --runs 8 "${names[@]}" "${commands[@]}"
echo -e "\n"
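
As a side note, the --export-json file written above can also be post-processed programmatically. Here is a rough sketch (the summarize_results.ts filename is hypothetical, and it assumes hyperfine’s JSON export schema with a results array containing command, mean, and stddev):

// summarize_results.ts — rough sketch assuming hyperfine's JSON export schema
import { readFileSync } from 'fs'

interface HyperfineResult {
  command: string
  mean: number
  stddev: number
}

const { results } = JSON.parse(
  readFileSync('results/hyperfine.json', 'utf8'),
) as { results: HyperfineResult[] }

// print each build, fastest first
for (const r of [...results].sort((a, b) => a.mean - b.mean)) {
  console.log(`${r.command}: ${r.mean.toFixed(2)}s ± ${r.stddev.toFixed(2)}s`)
}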

Then you can have your puppeteer script, which the hyperfine commands above invoke:

// profile_app.ts
import puppeteer from 'puppeteer'

const WAIT_TIMEOUT = 30_000 // 30 seconds

const url = process.argv[2]
const screenshotPath = process.argv[3]
const browser = await puppeteer.launch({
  args: ['--no-sandbox'], // needed on my linux setup, not ideal probably
})
const page = await browser.newPage()
await page.goto(url)

try {
  await page.waitForFunction(
    () =>
      document.querySelectorAll('[data-testid="thing_to_wait_for"]')
        .length === 1,
    {
      timeout: WAIT_TIMEOUT,
    },
  )
  // create screenshots to confirm visually
  await page.screenshot({
    path: screenshotPath + '.png',
  })
} catch (error) {
  // surface failures (e.g. timeouts) so they are visible in the output
  console.error(error)
  process.exitCode = 1
} finally {
  await browser.close()
}

This can be invoked directly as a .ts file with node file.ts (recent versions of Node.js strip types automatically)! Optionally, you can make puppeteer perform user actions like clicking around to test more realistic scenarios. It is good to confirm that you are testing the right thing by checking the output screenshots visually.
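
For example, a small interaction sequence inside profile_app.ts (placed before the screenshot) could look like this; the selectors here are hypothetical and need to be adjusted to your app:

// hypothetical user actions; adjust the selectors to your app
await page.click('[data-testid="open_settings_button"]')
await page.waitForSelector('[data-testid="settings_panel"]', {
  timeout: WAIT_TIMEOUT,
})
await page.type('[data-testid="search_box"]', 'example query')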

Sidenote: Agentically optimizing your code

Formulating tests and benchmarks like this allows AI tools to start iterating automatically, or agentically, to find faster solutions.

You can just ask Claude Code to “find optimizations” and see if it comes up with anything that actually works. It’s not always that good at finding very impactful optimizations, but with a human in the loop you can guide it towards some interesting solutions.

You can even tell Claude to analyze the .cpuprofile files that are generated by node --cpu-prof script.ts. See the footnote here https://github.com/cmdcolin/simple_benchmark_example?tab=readme-ov-file#analyze-cpuprofile
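
If you want a quick look at a profile yourself (outside of Chrome DevTools or an AI agent), a .cpuprofile file is just JSON in the V8 profiler format, so a rough sketch like the one below can rank functions by sample hit count. The field names here assume the standard V8 format, and the filename is hypothetical:

// summarize_cpuprofile.ts — rough sketch; assumes the V8 .cpuprofile JSON format
import { readFileSync } from 'fs'

interface ProfileNode {
  callFrame: { functionName: string; url: string }
  hitCount?: number
}

const profile = JSON.parse(readFileSync(process.argv[2], 'utf8')) as {
  nodes: ProfileNode[]
}

// total up the sample hits per function name
const hits = new Map<string, number>()
for (const node of profile.nodes) {
  const name = node.callFrame.functionName || '(anonymous)'
  hits.set(name, (hits.get(name) ?? 0) + (node.hitCount ?? 0))
}

// print the top 10 hottest functions
for (const [name, count] of [...hits.entries()]
  .sort((a, b) => b[1] - a[1])
  .slice(0, 10)) {
  console.log(count, name)
}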

Happy Thanksgiving

Wild turkeys can run up to 25 miles per hour

[1] It is probably not strictly required to use the compiled artifacts to run the benchmarks; the benchmarks could, for example, just read from the ‘src’ folder. However, using the compiled artifacts is a fairly ‘simple’ way to avoid the collisions you would otherwise run into when checking out the code from each branch.

[2] You might get errors if your branch and main use different sets of dependencies in package.json. In that case, you can temporarily install the union of the libraries on your branch (this should only be needed on your “BRANCH2”).

[3] I say this as someone who has superstitiously implemented hundreds of micro-optimizations only to see absolutely zero effect in an end-to-end benchmark. Conversely, these branch comparison tests have allowed me to ratchet up back-to-back 5-10% improvements into significant gains.