Schema-first design, API-first development


I spend a lot of time in AWS, using the web console, awscli, Terraform, and SDKs, particularly boto3 and aws-sdk-go-v2. What I’ve come to appreciate is that, in most cases, anything I can see or do in the console, I can also see or do via any implementation of the API.

Conversely, I’ve come to be very frustrated when schemas are not well documented or published as code, or when there are actions that can’t be performed programmatically. Recently, that frustration has been aimed at AWS Config, where I’m unable to find well-defined schemas for the different event types, and the Configuration Item properties don’t exactly match what I actually get back from the API.

So as I’ve been building out more recent projects, I’m putting more thought and effort into emulating the good, and doing better than the bad, of what I’ve seen from providers like AWS: making a best-effort attempt at using common action prefixes (batch, describe, delete, list, update), and having all service endpoints and objects well documented for both human eyes and machine parsers.

I’m also drawing inspiration from API specs I’ve encountered in the wild that have caused me great frustration in trying to understand them, and from my own experiences with the OpenAPI specification.

I wrote a boilerplate repository that tries to implement my current line of reasoning about how best to enforce a consistent way of developing. The repository itself is meant to compile and run as an example, and to be easy to copy and change as a starting point for new projects.

Note - the boilerplate repository I write about here is described as of d343d75. It may have changed since then.

Frustrations

  • API endpoints that don’t have documentation.
  • Schemas that don’t have documentation.
  • Documentation which is outdated and/or incomplete.
  • Documentation which doesn’t align with reality.
  • Schemas which are not included in documentation.
  • Interfaces which include data that cannot be queried programmatically, i.e. viewing data in the web browser shows me information I can’t get from the API.

My attempt at a solution

I want to build a project which has an API, a web UI, and a CLI tool.

Goals:

  • All schemas must be documented.
  • All endpoints must be documented.
  • Documentation must be both human readable and machine readable.
  • All data access must be via the same defined interface.
  • The API will not produce any schemas which are not defined.
  • It must be possible to generate SDKs.

The result, andrew-womeldorf/connect-boilerplate, is what I’m experimenting with. It’s a bunch of boilerplate around serving a User CRUD-type service. My idea is to be able to clone/fork/copy this as a starting point, and replace the User service with your own service.

Schema First

All methods and schemas are defined as protocol buffer definition files.

Before any storage is chosen, business logic is written, etc., the expectation is for the change to be written as a protobuf definition first.

See proto/user/v1/*.proto for the definition files of the fake User service.

  • user_service.proto defines the methods available on the RPC Service.
  • user.proto defines the core User object schema.
  • {create|delete|get|list|update}_user.proto defines the Request/Response schemas for each method from the user_service.proto.

This roughly follows the 1-1-1 best practice, which I find keeps the files well organized and small enough to be human readable. Where I break from the 1-1-1 best practice is that I keep the Request and Response types for a single method in a single file.
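
For a concrete picture, here’s roughly what a list_users.proto could look like under this layout. The message shapes are illustrative only; the field names are my own placeholders, not copied from the repository:

```protobuf
syntax = "proto3";

package user.v1;

import "user/v1/user.proto";

// Request and Response for a single method live together in one file,
// which is where this layout breaks from strict 1-1-1.
message ListUsersRequest {
  // Hypothetical pagination fields, for illustration.
  int32 page_size = 1;
  string page_token = 2;
}

message ListUsersResponse {
  repeated User users = 1;
  string next_page_token = 2;
}
```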


I don’t hold to schema first so tightly that, once defined, the interface can’t change. Often, once I start implementing the defined interface, I start to find flaws in my original interface, and will modify the interface as I go.

The key takeaway is that the protocol buffer interface is the definition, and everything publicly exposed by the project is defined as a protocol buffer definition first.

API First

I think my implementation of this has the right drive behind it, but there’s still some work to do on getting it properly organized.

The boilerplate project currently has three interfaces to the data:

  1. Connect API (internal/server/user_connect_handler.go)
  2. Web UI (internal/web)
  3. CLI (cmd/cli/user)

All three of these interfaces connect back to a single service implementation (internal/services/user). The service implementation builds around the generated types and methods from the protocol buffer definitions (gen/user/v1).

The Connect API, specifically, is totally generated. It will always be the most complete implementation of the interface, because it always expects the service implementation to have handled the entire defined API. In other words, it’s possible for the Web UI or CLI interface to be missing form fields or flags to handle some documented part of the interface, because those require handwriting the callers to the service implementation. But the Connect API will have the full spec implemented.

Therefore, if you see data coming out of the Web UI or CLI, it’s guaranteed that you can reproduce that data via the API, because the API will always have the full service implementation available to it.


The consequence of this is that business logic should only be defined once - in the service implementation. The interface layers (Connect API, Web UI, CLI) should be thin wrappers around calling the service implementation.

Sanitization, validation, transformation, data access, etc. should all be handled by the service layer.
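
To make that concrete, here is a minimal sketch of the Connect handler acting as a thin wrapper. The module path, type names, and service method signature are assumptions based on the repository layout described above, not copied code:

```go
package server

import (
	"context"

	"connectrpc.com/connect"

	// Package paths assumed from the repo layout; adjust for a real module.
	userv1 "example.com/boilerplate/gen/user/v1"
	"example.com/boilerplate/internal/services/user"
)

// UserConnectHandler delegates every RPC straight to the single service
// implementation, where all business logic lives.
type UserConnectHandler struct {
	svc *user.Service
}

func (h *UserConnectHandler) ListUsers(
	ctx context.Context,
	req *connect.Request[userv1.ListUsersRequest],
) (*connect.Response[userv1.ListUsersResponse], error) {
	// No validation or data access here; the service layer owns all of it.
	out, err := h.svc.ListUsers(ctx, req.Msg)
	if err != nil {
		return nil, err
	}
	return connect.NewResponse(out), nil
}
```

The Web UI and CLI would call the same user.Service methods directly; only the translation between transport and service types differs per interface.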

Miscellaneous

Data Stores

The data storage layer should be tied to the service being deployed. So, for the User service, the data storage is defined in internal/services/user/store/.

My (unverified) intent is to define this boilerplate project as a monorepo for a monolithic deployment, but organized such that individual services could be deployed independently as microservices. I haven’t tried this repository architecture as a way to do that, but I think it would make it relatively simple to deploy each service on its own, or to break a single service out of the monorepo. Therefore, the data storage should be defined per-service. My hope is that this would also force potential “best practices” around the data schema and separation of concerns.

Since this is boilerplate, I added two data stores for the User service: SQLite and DynamoDB. The SQLite store is driven by the sqlc project, so it could very easily be changed to another supported relational database like MySQL or Postgres. I included the DynamoDB store because I wanted to demonstrate launching a testcontainer to mock DynamoDB locally for testing, rather than handwriting stubs/mocks/whateveryoucallthem.
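
A sketch of what that looks like with testcontainers-go. This is standard usage of the library in a _test.go helper, but the details here are mine, not lifted from the repo:

```go
package store_test

import (
	"context"
	"testing"

	"github.com/testcontainers/testcontainers-go"
	"github.com/testcontainers/testcontainers-go/wait"
)

// startDynamoLocal launches a throwaway DynamoDB Local container and
// returns its endpoint URL for the store under test to point at.
func startDynamoLocal(t *testing.T) string {
	ctx := context.Background()
	container, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "amazon/dynamodb-local",
			ExposedPorts: []string{"8000/tcp"},
			WaitingFor:   wait.ForListeningPort("8000/tcp"),
		},
		Started: true,
	})
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(func() { _ = container.Terminate(ctx) })

	endpoint, err := container.Endpoint(ctx, "http")
	if err != nil {
		t.Fatal(err)
	}
	return endpoint
}
```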

The other reason for including more than one data store is to set the foundation for defining the data store as its own interface, and testing implementations against that interface. Part of the intent is to force the design to stay flexible enough that the database type could be swapped under the hood later, with all tests written against every supported database.

The storage interface doesn’t get defined as protocol buffers, because it’s internal to the service implementation and not exposed publicly; consumers of the API don’t (and shouldn’t) care about how the service is implemented, only that the data they send and receive matches the documented interface.
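
As a sketch, the internal interface might look like the following; the method set and the internal User type are assumptions, not copied from the repo:

```go
package store

import "context"

// Store is the per-service storage interface. Both the SQLite and DynamoDB
// implementations satisfy it, and the same test suite runs against each.
type Store interface {
	CreateUser(ctx context.Context, u *User) error
	GetUser(ctx context.Context, id string) (*User, error)
	ListUsers(ctx context.Context) ([]*User, error)
	UpdateUser(ctx context.Context, u *User) error
	DeleteUser(ctx context.Context, id string) error
}

// User is the store's internal representation. It doesn't need to be the
// generated protobuf type, because this interface is never exposed publicly.
type User struct {
	ID   string
	Name string
}
```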

Deployment Architectures

Depending on the project, I sometimes want a long-running server, and sometimes want to use AWS Lambda. So I’ve added build targets to handle both deployment types.

As with other decisions here, both the long-running server and the Lambda use the same implementation under the hood; only the startup process differs. In other words, the server code is implemented once (internal/server/server.go), and then there are two different entrypoints to bootstrap a runtime based on the deployment required.

  • cmd/lambda/ for a Lambda entrypoint.
  • cmd/cli/serve.go for a long-running server.

Each of these does its own setup and configuration of the server, so the two could potentially be configured differently. Or not.
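
For the long-running case, the entrypoint is mostly just wiring. A sketch, where newServer stands in for whatever the shared constructor in internal/server/server.go is actually named:

```go
package main

import (
	"net/http"

	"golang.org/x/net/http2"
	"golang.org/x/net/http2/h2c"
)

// newServer stands in for the shared server constructor; hypothetical name.
func newServer() http.Handler {
	return http.NewServeMux()
}

func main() {
	// h2c allows plaintext HTTP/2, which lets Connect speak gRPC without
	// TLS; ordinary HTTP/1.1 web traffic passes through it unchanged.
	handler := h2c.NewHandler(newServer(), &http2.Server{})
	_ = http.ListenAndServe("localhost:8080", handler)
}
```

The Lambda entrypoint would build the same handler but hand it to a Lambda adapter (something like github.com/akrylysov/algnhsa, for example) instead of calling ListenAndServe.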

Middlewares and Interceptors

Uh… this is likely a lack of experience speaking.

Connect can handle gRPC traffic and HTTP traffic. Its traffic is still handled by the h2c handler, just like other web traffic. Connect has a way of creating gRPC-style interceptors. But… if I add an HTTP middleware like I would for the Web UI, it also gets applied to the gRPC/Connect traffic…

So, for now, I’ve opted to forget about understanding interceptors and keep all middleware as HTTP handlers. Time will tell whether that decision persists.
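
The kind of middleware I mean is a plain net/http wrapper around the whole mux. This logging example (mine, not the repo’s) also shows why it inevitably applies to the Connect traffic: it sits in front of everything the mux serves:

```go
package server

import (
	"log"
	"net/http"
	"time"
)

// withLogging wraps the entire mux, so it sees Web UI requests and
// Connect/gRPC requests alike.
func withLogging(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("%s %s (%s)", r.Method, r.URL.Path, time.Since(start))
	})
}
```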

Thoughts

DevEx

Development feels somewhat tedious still. As an example, the process of adding a filter to the ListUsers method would include:

  1. Update the Request message in proto/user/v1/list_users.proto (sketched after this list).
  2. Parse the filter in internal/services/user/op_ListUsers.go.
  3. Update the store interface, either adding new methods or modifying the existing ListUsers method in internal/services/user/store/store.go and each store implementation.
  4. [Recommended] Add the filter as a field in a web form in the Web UI.
  5. [Recommended] Add the filter as a CLI flag in cmd/cli/user/op_ListUsers.go.
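
For step 1, the proto change might look something like this; the name_filter field is a hypothetical example, matching the sketch from earlier:

```protobuf
message ListUsersRequest {
  int32 page_size = 1;
  string page_token = 2;

  // New: only return users whose name contains this substring.
  string name_filter = 3;
}
```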

However, the interface definition is a first-class citizen. Review of the interface is explicit and machine readable, which enables tools such as the Buf CLI linter to detect breaking changes.

I’ve tended to write projects more like FastAPI, where the code is written first and the documentation is generated from the code. The interface definition is an inference, not treated as a first-class citizen the way protocol buffers are. The documentation, in the case of something like FastAPI, will always align with the code. However, the guarantees to the consumer are easier to break by accident.

It’s not like this is the only way to build a project “API first”, either. But the goal of this architecture was to make the data path very clearly singular. There should be no reason, for example, for the Web UI code to reach into the User service code directly. The Web UI should always and only access data from the User service via the same defined interface implementation that the CLI and the Connect API use.

Codegen

I have two big frustrations with non-protobuf tooling:

  1. Every single codegen tool I’ve used for OpenAPI specs is terrible.
  2. OpenAPI specs are too easy to write incorrectly, or out of spec.

Codegen tools for APIs defined by protocol buffers, be it gRPC or Connect, on the other hand, have always worked for me. Buf, in particular, has built significant tooling around this in its schema registry to make it as easy as possible to generate server stubs and client SDKs from protocol buffer definitions.
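
The generated Connect client is effectively the SDK. A sketch of a consumer, with the generated package paths assumed from the repo layout:

```go
package main

import (
	"context"
	"fmt"
	"net/http"

	"connectrpc.com/connect"

	// Generated package paths assumed, matching the layout described above.
	userv1 "example.com/boilerplate/gen/user/v1"
	"example.com/boilerplate/gen/user/v1/userv1connect"
)

func main() {
	// No handwritten HTTP plumbing: the generated client handles transport,
	// serialization, and error mapping.
	client := userv1connect.NewUserServiceClient(
		http.DefaultClient,
		"http://localhost:8080",
	)
	resp, err := client.ListUsers(
		context.Background(),
		connect.NewRequest(&userv1.ListUsersRequest{}),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(resp.Msg.Users))
}
```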