[Screenshot: Cloudflare error logs showing scanner probes]

Those are my access logs. /option.php, /wp-content/plugins/..., /aws-secret.yaml, /storage/keys/stripe.json. Dozens of these per day. Every single day. Bots scanning the entire internet looking for leaked credentials or misconfigured servers.

Usually you return a 404 and move on. But I got to thinking… what if instead of slamming the door, I invited them in and served them a very, very slow cup of coffee?

What’s a tarpit

If you haven’t come across the term before, an HTTP tarpit accepts the connection, returns a 200 OK, and starts streaming a response. Except the response comes at about 13 bytes per second. The scanner’s HTTP client just sits there, connection open, waiting for the download to finish. It never finishes.

Here’s what makes this interesting on Cloudflare Workers specifically: you only pay for CPU time. Sleeping is free. An await new Promise(r => setTimeout(r, 1500)) consumes zero CPU. It just tells the runtime to wake you up later. So the scanner is burning a connection slot, a thread, and memory for however long it’s willing to wait, and I’m burning essentially nothing.

Four Web APIs

The whole thing is built from four standard Web APIs:

  • ReadableStream - the response body. Its start(controller) callback can be async, which is basically the entire trick
  • setTimeout / Promise - the sleep. Zero CPU between ticks
  • TextEncoder - strings to Uint8Array for controller.enqueue()
  • Response - pass the stream as the body and omit Content-Length, so the body goes out with chunked transfer encoding and the client can never know when it ends

That’s it. No special APIs, no WebSockets, no cron triggers.
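To make the list concrete, here is a minimal sketch that wires the four pieces together for a finite string. The names sleep, slowStream, and slowResponse are hypothetical helpers for illustration, not part of the actual worker, and the fixed slice size and delay are assumptions chosen for readability.

```typescript
// Hypothetical helper: resolves after `ms` milliseconds. The await just parks
// the function until the timer fires; no CPU is used while waiting.
const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Streams `text` in slices of `sliceSize` bytes, pausing `delayMs` after each one.
function slowStream(
  text: string,
  sliceSize: number,
  delayMs: number,
): ReadableStream<Uint8Array> {
  const bytes = new TextEncoder().encode(text);
  return new ReadableStream<Uint8Array>({
    // start() may be async, so it can sleep between enqueue() calls
    async start(controller) {
      try {
        for (let i = 0; i < bytes.length; i += sliceSize) {
          controller.enqueue(bytes.slice(i, i + sliceSize));
          await sleep(delayMs);
        }
        controller.close();
      } catch {
        // enqueue() or close() throws once the consumer has cancelled the stream
      }
    },
  });
}

// No Content-Length header is set, so the client cannot tell how much is left.
function slowResponse(text: string): Response {
  return new Response(slowStream(text, 8, 1000), {
    headers: { "Content-Type": "text/plain" },
  });
}
```

This version ends once the string is exhausted. The real tarpit swaps the finite string for an infinite generator.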

The core loop

The tarpit is its own Cloudflare Worker, connected to my main site via a service binding. Stripped down, it looks like this:

const encoder = new TextEncoder();

export default {
  async fetch(request: Request): Promise<Response> {
    const body = new ReadableStream({
      async start(controller) {
        try {
          // generateChunks() is an infinite generator
          for (const chunk of generateChunks()) {
            const bytes = encoder.encode(chunk);
            // one random size per slice, so slices neither overlap nor skip bytes
            for (let i = 0, size = 0; i < bytes.length; i += size) {
              size = rand(1, 32);
              controller.enqueue(bytes.slice(i, i + size));
              // this is the whole trick. sleep 0.5-2s, costs nothing
              await new Promise(r => setTimeout(r, rand(500, 2000)));
            }
          }
        } catch {
          // client disconnected - enqueue() throws
        }
        try { controller.close(); } catch {}
      },
    });

    return new Response(body, {
      headers: { "Content-Type": "text/plain" },
      // no Content-Length = chunked transfer
    });
  },
};

So a scanner requests /.env and gets back what looks like a real .env file, just downloading very slowly. Bytes trickle in. DATABASE_URL=postgres://admin:hunter2@.... The client keeps waiting. The generator is while (true), so it never completes. They’re stuck until they give up or their timeout fires.
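The stripped-down worker leans on two helpers it doesn't show, rand() and generateChunks(). A rough sketch of what they could look like, assuming rand() is inclusive on both ends and the generator yields one line of bait at a time. The ENV_LINES values are made up for illustration.

```typescript
// Assumed behaviour: random integer in [min, max], both ends inclusive.
function rand(min: number, max: number): number {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Hypothetical bait lines. The real generators are format-aware and
// produce far more convincing content than this.
const ENV_LINES: string[] = [
  "APP_ENV=production\n",
  "REDIS_URL=redis://cache.example.internal:6379\n",
  "DB_PASSWORD=not-a-real-password\n",
];

// Infinite generator: it never returns, so a for...of loop over it only
// ends when the loop body throws or breaks.
function* generateChunks(): Generator<string> {
  while (true) {
    yield ENV_LINES[rand(0, ENV_LINES.length - 1)];
  }
}
```

Because the generator is lazy, the infinite loop costs nothing until the stream asks for the next chunk.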

Making the bait good

A tarpit that sends random garbage is easy to spot. So I added format-aware generators. The worker checks what path was requested and picks the right format:

Request path     Content-Type        What they see
/.env            text/plain          Env vars with API keys
/dump.sql        application/sql     CREATE TABLE + INSERT statements
/config.yml      text/yaml           Service configs with credentials
/api/v1/users    application/json    User records with API keys
/wp-admin        text/html           Directory listing with links to more tarpit paths
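The path-to-format mapping in the table could be sketched like this. pickBait and the Bait type are hypothetical names, the matching rules are my guess at a reasonable minimum, and falling back to the .env format for unknown paths is an assumption.

```typescript
// Hypothetical shape: which generator to run and which Content-Type to send.
type Bait = {
  kind: "env" | "sql" | "yaml" | "json" | "html";
  contentType: string;
};

function pickBait(pathname: string): Bait {
  const p = pathname.toLowerCase();
  if (p.endsWith(".sql")) return { kind: "sql", contentType: "application/sql" };
  if (p.endsWith(".yml") || p.endsWith(".yaml")) {
    return { kind: "yaml", contentType: "text/yaml" };
  }
  if (p.startsWith("/api/")) return { kind: "json", contentType: "application/json" };
  if (p.startsWith("/wp-")) return { kind: "html", contentType: "text/html" };
  // Assumed fallback: anything else gets the .env treatment.
  return { kind: "env", contentType: "text/plain" };
}
```

The returned contentType goes straight into the Response headers, and kind selects the generator.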

[Screenshot: curling the tarpit]

Honeytokens

Every generator is seeded with fake credentials that match the exact format of real ones. Not random strings, but structurally valid tokens that will trip regex-based secret scanners:

OPENAI_API_KEY=sk-proj-Hn4kT9xLm2pQwR8yJ...
ANTHROPIC_API_KEY=sk-ant-api03-Bx9mN4vK...
AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG...
STRIPE_SECRET_KEY=sk_live_51H7dK9mN2pL...
GITHUB_TOKEN=ghp_x8Kn4mR9pL2wQ5tJ...
SLACK_BOT_TOKEN=xoxb-123456789012-...
SUPABASE_SERVICE_ROLE_KEY=eyJhbGciOiJIUzI1NiI...
SENDGRID_API_KEY=SG.nOde15FAKEbut-REAL...

The SQL generator spits out INSERT INTO users (email, password_hash, api_key) statements. The YAML one does service configs with nested credentials. The HTML one embeds secrets in comments (<!-- DB_PASSWORD=... -->) and links to more tarpit paths, so crawlers that follow links just dig themselves deeper.
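For illustration, a token generator along these lines would do the job. The prefixes come from the list above. The body lengths and the alphabet are assumptions on my part, and randomChars, fakeToken, and fakeEnvLine are hypothetical names.

```typescript
const ALPHANUM =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

// Random string of `length` characters drawn from `alphabet`.
function randomChars(length: number, alphabet: string = ALPHANUM): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return out;
}

// Known prefix plus a random body. Lengths are assumed, not taken from
// any provider's documentation.
const fakeToken = {
  github: (): string => "ghp_" + randomChars(36),
  stripe: (): string => "sk_live_" + randomChars(24),
  openai: (): string => "sk-proj-" + randomChars(48),
};

// Formats one line of a fake .env file.
function fakeEnvLine(name: string, make: () => string): string {
  return `${name}=${make()}\n`;
}
```

Usage would look like fakeEnvLine("GITHUB_TOKEN", fakeToken.github), which yields a fresh, worthless token on every call.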

If their tooling has automated secret detection, every “finding” needs human review. You’re not just wasting their bandwidth, you’re wasting their analyst’s time (or LLM’s time).

Who gets tarpitted

In my main worker, before anything else:

import { shouldTarpit } from "./tarpit-paths";

export default {
  async fetch(request, env, ctx) {
    const url = new URL(request.url);

    if (
      shouldTarpit(url.pathname) &&
      !request.cf?.botManagement?.verifiedBot
    ) {
      return env.TARPIT.fetch(request);
    }

    // ... normal site handling
  },
};

shouldTarpit() is a regex list. Dotfiles, WordPress paths, PHP probes, config files, debug endpoints, path traversal attempts. Basically everything in the screenshot at the top.
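A partial sketch of what such a list might contain. These patterns are illustrative guesses rather than the actual contents of tarpit-paths, and TARPIT_PATTERNS is a hypothetical name. The dotfile rule skips /.well-known/ so legitimate requests such as ACME challenges are not caught.

```typescript
// Hypothetical pattern list; a real one would be longer.
const TARPIT_PATTERNS: RegExp[] = [
  /(^|\/)\.(?!well-known(\/|$))[^\/]+/, // dotfiles: /.env, /.git/config
  /^\/wp-(admin|content|includes|login)/i, // WordPress paths
  /\.php$/i, // PHP probes
  /\.(ya?ml|sql|ini|bak)$/i, // config files and dumps
  /^\/(debug|actuator|phpinfo)(\/|$)/i, // debug endpoints
];

// True when the path matches any scanner-style pattern.
function shouldTarpit(pathname: string): boolean {
  return TARPIT_PATTERNS.some((pattern) => pattern.test(pathname));
}
```

Only add patterns for paths the real site never serves, since a false positive here traps a real visitor.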

The important bit is !request.cf?.botManagement?.verifiedBot. I obviously don’t want to tarpit Googlebot. Cloudflare’s bot management does the verification server-side using IP/ASN checks, not User-Agent strings. So a scanner spoofing a Googlebot UA still gets caught, because Cloudflare knows it’s not actually coming from Google’s infrastructure.

The architecture

The tarpit is a separate Worker connected via a service binding:

// wrangler.jsonc
{
  "services": [
    { "binding": "TARPIT", "service": "tarpit" }
  ]
}

env.TARPIT.fetch(request) is an in-process call, no network hop. But the tarpit deploys independently with its own config, so if I somehow break it, the main site doesn’t care.

The numbers

With the defaults (1-32 bytes per chunk, 0.5-2 second delays):

  • Average throughput: ~13 bytes/second
  • A 2KB “file” takes ~2.5 minutes
  • The stream is infinite so they stay until their timeout fires

A scanner with a 5 minute timeout burns one connection slot for 5 full minutes, receiving maybe 4KB of very convincing fake credentials.
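The arithmetic behind those figures, as a quick back-of-the-envelope script. It assumes slice sizes and delays are uniformly distributed over the default ranges and ignores the shorter final slice of each chunk, so treat the results as approximations.

```typescript
// Uniform over 1..32 bytes and 0.5..2 seconds: the mean is the midpoint.
const meanSliceBytes = (1 + 32) / 2; // 16.5
const meanDelaySeconds = (0.5 + 2) / 2; // 1.25

// Long-run average throughput.
const bytesPerSecond = meanSliceBytes / meanDelaySeconds; // ≈ 13.2

// How long a payload of `bytes` takes to trickle out.
function secondsToSend(bytes: number): number {
  return bytes / bytesPerSecond;
}

// How much a client receives before giving up after `seconds`.
function bytesReceived(seconds: number): number {
  return seconds * bytesPerSecond;
}

console.log(secondsToSend(2048) / 60); // ≈ 2.6 minutes for a 2KB "file"
console.log(bytesReceived(5 * 60)); // ≈ 3960 bytes in 5 minutes
console.log((5 * 60) / meanDelaySeconds); // 240 sleeps in 5 minutes
```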

My cost is a few milliseconds of CPU across a couple hundred setTimeout calls. Cloudflare Workers bills for CPU time, not wall-clock time. The tarpit spends 99.9% of its time asleep. The Workers Paid plan includes 10 million requests per month (the free plan allows 100,000 per day), and each scanner probe is one long-lived request, not thousands of fast ones.

I don’t know how much of a dent this actually makes in the scanning ecosystem. Probably not much. But a 404 costs you a little and costs the scanner nothing, whereas a slow stream of fake credentials costs you nothing and costs them a thread, a socket, and analyst time reviewing bogus secrets. That tradeoff is good enough for me.