Concepts

Sentinel is easier to understand once you separate it into a few core concepts. These concepts explain how model access flows through Sentinel, where control lives, and why provider behavior is not treated as interchangeable.

Data plane

The data plane is the runtime path your applications use when they send inference requests through Sentinel.

It is responsible for:

authenticating the Sentinel key
checking endpoint restrictions
evaluating policy and limits
resolving the configured route target
forwarding the request to the selected provider lane
emitting request telemetry and audit signals

In simple terms, the data plane is where request-time decisions happen.

Control plane

The control plane is the configuration layer that shapes how the data plane behaves.

It manages:

tenants, projects, and environments
keys and endpoint restrictions
provider configs and provider secrets
routing policy
policy rules
budgets and rate limits
model catalogs and operational settings

In simple terms, the control plane defines the rules, and the data plane applies them.

Lanes

Sentinel supports multiple gateway lanes because not all providers behave the same way, and not all clients expect the same interface shape.

Sentinel currently includes:

an OpenAI-compatible lane for broad client compatibility
a native Anthropic lane for Anthropic SDK and route parity
a native Google lane for Google GenAI SDK and native route parity

Lanes matter because request shapes, provider capabilities, retry behavior, safety models, and SDK expectations differ across providers. A provider may expose a similar capability over HTTP without matching the exact behavior expected by its native SDK.

Policies

Policies are pre-execution controls that inspect supported request content and decide whether a request should proceed, be blocked, or be annotated before it reaches the provider.

Sentinel does not treat every route the same way:

text generation routes support meaningful pre-execution inspection
file and binary workflows expose narrower inspection surfaces
moderation-style routes can require special handling because the endpoint itself is part of the control pattern

The key idea is that policy evaluation depends on the request shape and the route being used.

Limits and budgets

Sentinel separates short-window traffic controls from spend controls because they solve different problems.

limits constrain request rate, concurrency, or token throughput over a short period
budgets constrain accumulated usage or spend over time

This distinction matters operationally. A rate-limit event is usually about protecting capacity or controlling burst behavior. A budget event is about controlling longer-running usage and spend.

Telemetry versus audit

Telemetry and audit are related, but they serve different purposes.

telemetry supports operational visibility, monitoring, debugging, and trend analysis
audit provides durable evidence of what happened, what decision was made, and why

Telemetry helps teams operate the system. Audit helps teams explain and prove what occurred.

Provider compatibility

Route coverage and SDK compatibility are not the same thing.

A provider lane may support a valid HTTP path without fully reproducing the exact behavior expected by that provider's official SDK. That difference can show up in request shape, error behavior, streaming behavior, or endpoint-specific semantics.

When that distinction matters, Sentinel docs should describe compatibility at the SDK surface level, not just at the route level.

Data plane​

Control plane​

Lanes​

Policies​

Limits and budgets​

Telemetry versus audit​

Provider compatibility​