GraphQL authorization and the batching tax: where the schema lies to you

Alias batching: one HTTP request, many operations, one rate-limit hit

REST gives you one URL per thing, so authorization tends to sit on the route. A GraphQL endpoint is one URL for everything, and the request body decides which fields, objects, and operations you touch. That single shift relocates every authorization decision from the router down to the resolvers, and any resolver that forgot to check is a hole the gateway can’t see. Most GraphQL access-control bugs come down to this: the gateway authenticated you, so every resolver assumed someone else had handled authorization.

What introspection gives away

The first move is always introspection. Where it is enabled, a standard introspection query returns the entire schema: every type, every field, every query and mutation, including the ones the UI never calls.

```graphql
query {
  __schema {
    types { name fields { name } }
    mutationType { fields { name args { name } } }
  }
}
```

That output is a map of the attack surface. The mutations the frontend never shows you (deleteUser, setRole, impersonate, internalRefund) are right there with their argument names. Disabling introspection in production is a mild speed bump, not a fix. The operations still exist, and field names are either guessable or leak through error messages: many servers answer a misspelled field with a “Did you mean …?” suggestion. Treat a disabled __schema as “they think this is the control” and keep testing the operations directly.
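A few lines of Python can turn that output into a worklist. The script diffs the schema’s mutations against the ones you have actually seen the frontend send. It is a sketch under three assumptions. First, the input is the parsed JSON response to the query above, in the standard `data.__schema.mutationType.fields` shape. Second, `seen_in_ui` is a set you build yourself from proxy history. Third, the sample schema is hypothetical.

```python
import json

def unexercised_mutations(introspection: dict, seen_in_ui: set[str]) -> dict[str, list[str]]:
    """Return {mutation_name: [argument names]} for mutations the UI never called."""
    mutation_type = introspection["data"]["__schema"].get("mutationType")
    if mutation_type is None:  # __schema.mutationType is null when no mutations exist
        return {}
    return {
        f["name"]: [arg["name"] for arg in f["args"]]
        for f in mutation_type["fields"]
        if f["name"] not in seen_in_ui
    }

# Hypothetical, trimmed introspection response; real ones are much larger.
sample = json.loads("""
{"data": {"__schema": {
  "types": [],
  "mutationType": {"fields": [
    {"name": "updateProfile", "args": [{"name": "input"}]},
    {"name": "setRole", "args": [{"name": "userId"}, {"name": "role"}]},
    {"name": "internalRefund", "args": [{"name": "orderId"}, {"name": "amount"}]}
  ]}
}}}
""")

for name, args in sorted(unexercised_mutations(sample, {"updateProfile"}).items()):
    print(name, args)
```

Everything this prints is an operation to call directly, with the argument names already filled in.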

Field-level authz is where it actually leaks

The high-value bug is usually not a missing top-level check; it’s a single field on an object that returns data the object-level check should have gated. You can read a record you’re allowed to see, but one nested field on it exposes something you aren’t:

```graphql
query {
  order(id: "1001") {
    id
    status
    customer {
      email
      paymentMethods { last4 billingAddress }
    }
  }
}
```

Suppose the order resolver checks that you may view the order (say, as the seller on it), but the nested customer and paymentMethods resolvers trust that you’d never have reached them without permission. Then you read the buyer’s payment data through an order you’re legitimately allowed to see. Each resolver is its own trust boundary. The test is methodical: for every object you can reach, walk every nested field and check whether that field re-derives authorization or just rides on the parent’s.
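Here is a minimal sketch of what “re-derives authorization” looks like in code. It uses plain Python functions, one per resolver, rather than any particular GraphQL server library. The data store, the `Context` type, and the policy that only the buyer may see payment methods are all hypothetical. The point is that `resolve_payment_methods` checks the caller from the context itself instead of trusting that `resolve_order` already ran.

```python
from dataclasses import dataclass

class Forbidden(Exception):
    """Raised when the caller may not resolve a field."""

@dataclass
class Context:
    viewer_id: str  # taken from the verified token, never from query arguments

# Hypothetical data: the seller on an order may view it; payment data is buyer-only.
ORDERS = {"1001": {"id": "1001", "status": "shipped",
                   "customer_id": "u-buyer", "seller_id": "u-seller"}}
CUSTOMERS = {"u-buyer": {"id": "u-buyer", "email": "buyer@example.com"}}
PAYMENT_METHODS = {"u-buyer": [{"last4": "4242", "billingAddress": "1 Main St"}]}

def resolve_order(ctx: Context, order_id: str) -> dict:
    order = ORDERS.get(order_id)
    if order is None or ctx.viewer_id not in (order["customer_id"], order["seller_id"]):
        raise Forbidden("cannot view this order")
    return order

def resolve_customer(ctx: Context, order: dict) -> dict:
    # This field makes its own decision: buyer and seller on this order may see the customer.
    if ctx.viewer_id not in (order["customer_id"], order["seller_id"]):
        raise Forbidden("cannot view this customer")
    return CUSTOMERS[order["customer_id"]]

def resolve_payment_methods(ctx: Context, customer: dict) -> list[dict]:
    # Re-derived from the caller, not inherited: seeing the order is not enough.
    if ctx.viewer_id != customer["id"]:
        raise Forbidden("payment methods are visible to their owner only")
    return PAYMENT_METHODS.get(customer["id"], [])
```

Called as the seller, `resolve_order` and `resolve_customer` succeed and `resolve_payment_methods` raises `Forbidden`. Delete that one check and the seller reads the buyer’s card data, which is exactly the bug described above.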

Aliases and batching defeat rate limits

Rate limiting almost always counts requests. GraphQL lets one request carry many operations, so the counter sees one hit while the backend does a hundred units of work. Aliases are the trick: the same field, many times, under different names, in a single document.

```graphql
mutation {
  a: login(user: "victim", pass: "0000") { token }
  b: login(user: "victim", pass: "0001") { token }
  c: login(user: "victim", pass: "0002") { token }
}
```

One HTTP request, three credential attempts, and the request-counting limiter records a single event. Scale that to hundreds of aliases and you’ve turned a brute-force-protected login into an unprotected one. Query batching (an array of operations in one POST) does the same at the transport layer. When I test any rate-limited GraphQL operation, the first thing I check is whether aliasing or batching slips past the counter, because it almost always does.
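Generating the document is the only tooling this needs. The sketch below builds the aliased form from a list of candidates, and also the JSON-array form for servers that accept batching. The `login(user:, pass:)` signature comes from the example above and is hypothetical, as are the `String!` variable types. Real targets use their own names and types, and not every server accepts an array body. Aliases only need to be unique, valid GraphQL names.

```python
import json

def aliased_document(field: str, user: str, candidates: list[str]) -> str:
    """One mutation document with one aliased call per candidate."""
    lines = ["mutation {"]
    for i, candidate in enumerate(candidates):
        # json.dumps yields a double-quoted, escaped literal that GraphQL also accepts.
        lines.append(
            f"  a{i}: {field}(user: {json.dumps(user)}, pass: {json.dumps(candidate)}) {{ token }}"
        )
    lines.append("}")
    return "\n".join(lines)

def batched_payload(field: str, user: str, candidates: list[str]) -> str:
    """JSON array of separate operations sent as one POST body (transport-level batching)."""
    query = f"mutation($u: String!, $p: String!) {{ {field}(user: $u, pass: $p) {{ token }} }}"
    return json.dumps([{"query": query, "variables": {"u": user, "p": c}} for c in candidates])

pins = [f"{n:04d}" for n in range(3)]  # "0000", "0001", "0002"
print(aliased_document("login", "victim", pins))
```

The aliased form is what slips past a per-request counter. The array form tests whether the server also accepts batching.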

Nested-query DoS

A schema with cyclic relationships (a user has posts, a post has an author, an author has posts) lets you write a query whose cost explodes with depth:

```graphql
query {
  user(id: 1) {
    posts { author { posts { author { posts { id } } } } }
  }
}
```

Each list-returning level multiplies the work by its page size, and the single-object author hops are what let the cycle continue. Without query-depth limits, query-cost analysis, or pagination caps, one modestly sized document can consume disproportionate backend resources. You don’t need to actually take the service down to report it; demonstrating that cost scales multiplicatively with a depth the server accepts is the finding.
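A back-of-the-envelope estimate makes “multiplies” concrete. Treat the objects at each level as a running product of fan-outs, then sum the levels; scalar leaves like `id` add nothing. The sketch makes several assumptions. Every `posts` list returns `n` items and each `author` hop returns exactly one. `n = 100` is a hypothetical page size. The budget in `check_cost` is an arbitrary illustrative number, not a recommendation.

```python
def estimated_objects(fan_outs: list[int]) -> int:
    """Objects resolved across all levels; each level multiplies its parent's count."""
    total, width = 0, 1
    for k in fan_outs:
        width *= k      # objects at this level = objects at parent level * fan-out
        total += width
    return total

def check_cost(fan_outs: list[int], max_objects: int = 10_000) -> int:
    """Reject a query whose estimate exceeds the budget, before executing it."""
    cost = estimated_objects(fan_outs)
    if cost > max_objects:
        raise ValueError(f"estimated cost {cost} exceeds budget {max_objects}")
    return cost

n = 100  # hypothetical page size for every `posts` list
# user -> posts -> author -> posts -> author -> posts, as in the query above
print(estimated_objects([1, n, 1, n, 1, n]))     # 1020201
print(estimated_objects([1, 10, 1, 10, 1, 10]))  # 1221 with pages capped at 10
```

Capping the page size shrinks every factor at once. Each additional `posts { author { ... } }` pair multiplies the total by roughly `n` again, which is why depth limits and pagination caps belong together.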

The one structural fix

All of this traces to the same root: people put the security boundary at the gateway because that’s where it sat in REST, and GraphQL moved the meaningful decisions down to the resolvers. Object-level authorization has to run on the resolver that loads the object, every time. It must be derived from the caller’s token, not inherited from whatever check happened to pass on the way in. Rate limiting has to count operations or query cost, not HTTP requests, or aliases and batching erase it.
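Here is a sketch of a limiter that charges per operation instead of per request. It is a token bucket debited by the number of root fields in every operation of a request, summed across a batched array. It assumes the per-operation root-field counts (after aliases) come from a real GraphQL parser such as graphql-core, not from string matching. The capacity and refill rate are hypothetical. Passing a query-cost estimate instead of root-field counts turns the same bucket into a cost limiter.

```python
import time
from typing import Callable

class OperationLimiter:
    """Token bucket that charges one unit per root field, not one per HTTP request."""

    def __init__(self, capacity: int, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, root_fields_per_operation: list[int]) -> bool:
        """One entry per operation in the (possibly batched) request body."""
        cost = sum(root_fields_per_operation)
        now = self.clock()
        # Refill for the time elapsed since the last check, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_per_sec)
        self.updated = now
        if cost > self.tokens:
            return False  # reject the whole request; nothing in it runs
        self.tokens -= cost
        return True

# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = OperationLimiter(capacity=5, refill_per_sec=0.5, clock=lambda: fake_now[0])
print(limiter.allow([1]))        # True: one login, 4 units left
print(limiter.allow([100]))      # False: one HTTP request, but 100 aliased logins
print(limiter.allow([1, 1, 1]))  # True: a batch of three operations, 1 unit left
```

Rejecting the whole request when it would overdraw the bucket means a 100-alias document cannot partially succeed. A request counter would have seen only three events in the usage above. The bucket sees 104 attempted units and refuses the one that matters.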

The transferable lesson: a GraphQL schema describes what’s possible, the resolvers decide what’s allowed, and the two are only as aligned as the weakest resolver. Walk every field, assume each one is its own trust boundary, and the gaps surface fast.