
What's Overfetching?

Alex Weinle
GraphQL Developer and AWS Architect

At the root of why people recommend moving to GraphQL is a concept called 'overfetching', and on the surface, it sounds pretty simple. In many APIs, when you ask about an object, the API returns everything it knows about it, because it has no way of knowing what you're interested in. GraphQL (like SQL) lets you tell the API specifically what you're after before it runs the lookup and sends the data back to you over the network.

It's as if I walked into a restaurant and told them to bring everything on the menu to the table, and then picked my starter, main course and dessert from that. The rest is all wasted. Naturally, I don't (usually) do that.

But overfetching is worth keeping in mind when designing queries or deciding how your API will fulfil requests. Consider the following query from an imaginary waiter's handheld application.

query foodPricesForOrder {
  meal1: menuItem(id: "Jxkjahdkfh==") {
    id
    description
    longdescription
    calories
    price
    picture
  }
}

Sometimes, behind the scenes, you fulfil a query by grabbing all the fields (say, from a DynamoDB or RDS table) and returning just what the query asked for. Even though that doesn't reduce the computation, it does reduce the data passed across the network.

What if we omitted the longdescription field from the query? We would save quite a bit of network traffic, even though, if all the fields live in the same table, the underlying data-store read won't be noticeably quicker.
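To make that trade-off concrete, here is a minimal Python sketch that is not tied to any real GraphQL server. The row contents and the helpers fetch_menu_item, project and payload_size are all hypothetical stand-ins: the whole row is read from the store either way, and only the requested fields are serialised for the network.

```python
import json

# Hypothetical row, standing in for one item in a DynamoDB or RDS table.
MENU_ITEM_ROW = {
    "id": "Jxkjahdkfh==",
    "description": "Mushroom risotto",
    "longdescription": "Slow-cooked arborio rice with wild mushrooms. " * 20,
    "calories": 640,
    "price": 14.5,
}


def fetch_menu_item(item_id):
    """Pretend data-store read: always returns every column of the row."""
    return dict(MENU_ITEM_ROW)


def project(row, requested_fields):
    """Keep only the fields the query asked for."""
    return {name: row[name] for name in requested_fields if name in row}


def payload_size(item_id, requested_fields):
    """Size in bytes of the JSON that would cross the network."""
    row = fetch_menu_item(item_id)  # the store read is the same either way
    return len(json.dumps(project(row, requested_fields)).encode("utf-8"))


with_long = payload_size("Jxkjahdkfh==", ["id", "description", "longdescription", "price"])
without_long = payload_size("Jxkjahdkfh==", ["id", "description", "price"])
print(with_long, without_long)
```

The second number is smaller because longdescription never leaves the server, even though fetch_menu_item did exactly the same work in both cases.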

It's worth remembering how things used to be with a typical REST API: we would get all the fields of menuItem back. There was no choice. No biggie, you might think, but I'm guessing you already see the elephant in the room: that field called picture. It could well be the base64 version of an image of our prospective meal.

We don't want that on our handheld waiter application, and how wasteful would it be to fetch and encode the image if it lives in a whole separate directory outside the data store? Developer sadness ensues, and in REST, we'd have to avoid the problem by requiring a separate call for the image of your food. Worse yet, the REST solution might slow things down by forcing the client to make a separate API call for every dish on the menu, unless a kind developer has written a specific list function for us.
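The cost of per-dish calls is easy to see by counting round trips. The sketch below uses a hypothetical in-memory FakeApi class rather than a real network client, with made-up dishes; get_dish stands in for a per-item REST endpoint, and get_dishes for either a list endpoint or a single GraphQL query that aliases several menuItem lookups.

```python
class FakeApi:
    """Hypothetical in-memory API that counts round trips instead of using a network."""

    def __init__(self, dishes):
        self.dishes = dishes
        self.round_trips = 0

    def get_dish(self, dish_id):
        """Per-item style: one request per dish."""
        self.round_trips += 1
        return self.dishes[dish_id]

    def get_dishes(self, dish_ids):
        """Batched style: one request covers many dishes."""
        self.round_trips += 1
        return [self.dishes[dish_id] for dish_id in dish_ids]


# Made-up menu of five dishes.
DISHES = {f"dish-{n}": {"id": f"dish-{n}", "price": 10 + n} for n in range(5)}

per_dish = FakeApi(DISHES)
for dish_id in DISHES:
    per_dish.get_dish(dish_id)

batched = FakeApi(DISHES)
batched.get_dishes(list(DISHES))

print(per_dish.round_trips, batched.round_trips)
```

With five dishes the per-item client makes five round trips and the batched client makes one; a real client would pay network latency on each of them.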

GraphQL lets us handle this sensibly: it's smart enough to let me ask for just what I want.

If we attach a resolver to the picture field in AppSync (the Amazon Web Services GraphQL API host), it's only triggered when that field occurs in the query. The same is true if you're using Apollo or any other GraphQL server framework. Critically, as long as we don't include that field in the query, we don't pay the price for it. For those implementing their GraphQL server through other means, this is a critical behaviour to preserve in order to avoid overfetching.
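The idea can be sketched in a few lines of plain Python. This is not AppSync's or Apollo's API; the execute function, the FIELD_RESOLVERS table, the fake image bytes and the call counter are all hypothetical, and only illustrate that a field-level resolver runs when, and only when, its field is requested.

```python
import base64

# Hypothetical counter so we can see how often the expensive work runs.
calls = {"picture": 0}


def load_menu_item(item_id):
    """Cheap fields that live together in one table row (made-up data)."""
    return {"id": item_id, "description": "Mushroom risotto", "price": 14.5}


def resolve_picture(item):
    """Expensive field: stands in for reading and base64-encoding an image file."""
    calls["picture"] += 1
    fake_image = b"fake image bytes for " + item["id"].encode("utf-8")
    return base64.b64encode(fake_image).decode("ascii")


# Fields that have their own resolver instead of coming from the row.
FIELD_RESOLVERS = {"picture": resolve_picture}


def execute(item_id, requested_fields):
    """Resolve only the requested fields; field resolvers run on demand."""
    item = load_menu_item(item_id)
    result = {}
    for name in requested_fields:
        if name in FIELD_RESOLVERS:
            result[name] = FIELD_RESOLVERS[name](item)
        else:
            result[name] = item[name]
    return result


# The waiter's handheld skips the picture, so resolve_picture never runs.
waiter_view = execute("Jxkjahdkfh==", ["id", "price"])
calls_after_waiter = calls["picture"]

# A customer-facing menu asks for the picture and pays for it exactly once.
menu_view = execute("Jxkjahdkfh==", ["id", "picture"])
calls_after_menu = calls["picture"]
```

Here calls_after_waiter is 0 and calls_after_menu is 1: the image is only loaded and encoded for the client that asked for it.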

I'm sure you've got the idea now and can think about which fields to retrieve together for your whole type (like MenuItem) and which to split out behind a field-level resolver.

Let's never over-order again!