Deploying AI Features Responsibly Starts Before the AI Part

Lab829 has planned, implemented, and deployed working AI products across Supabase and Google Cloud. That experience reinforced a practical point: the infrastructure decisions around an AI feature belong at the beginning of the project, not after the first incident.

Agent Live was one of those products. We designed the application, implemented the conversational and retrieval workflows, deployed it, and tracked usage and application behavior through logs. It was production-ready for its intended scope and operated successfully. Later hardening work did not change that result. It gave us a clearer view of which controls should be part of initial planning rather than deferred refinements.

This post is about those infrastructure decisions. It is not a claim that every AI product needs the same architecture.

Scope the deployment before you scope the AI

AI deployment guidance often gravitates toward two ends of the spectrum: a small demonstration or a large platform with gateways, semantic caching, and multi-provider failover. Many useful products sit between them.

The right starting point is not a generic production checklist. It is the product's expected traffic, data sensitivity, latency needs, provider dependencies, operating budget, and intended behavior when part of the system fails.

For Agent Live, Supabase Edge Functions provided globally distributed TypeScript functions without requiring us to operate a dedicated application server. Supabase charges for invocations above the quota included with a plan, so this is usage-based infrastructure rather than literally free when idle. The local runtime is designed to resemble production, but production behavior still needs direct verification.

Cold starts, memory, CPU, wall-clock duration, and request idle time are documented platform constraints. Supabase also supports WebSocket servers in Edge Functions, although connections remain subject to runtime limits and require careful handling. The lesson is to evaluate the current platform documentation against the product shape instead of relying on a simplified idea of what serverless infrastructure can or cannot do.

Rate limiting is a day-one decision, not day-two hardening

The failure mode is rarely ignorance of rate limiting. It is postponing the decision because early traffic is small. An endpoint that calls a paid model API creates direct cost exposure and can also exhaust provider quotas.

Each endpoint has a different risk profile. A function that invokes a model should not necessarily share the same allowance as a cache read, an analytics event, or a health check. Per-route limits can reflect those differences.

Our later Agent Live hardening work introduced per-route burst and sustained controls. The current implementation is process-local, which means it is useful as a guardrail but is not a shared distributed limit across every Edge Function worker. A production design that requires consistent enforcement should use a shared store or gateway-backed control. Supabase publishes a Redis-backed rate-limiting example for this pattern.
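As an illustration of per-route burst and sustained controls, here is a minimal token-bucket sketch. It is not the Agent Live implementation: the class name, route names, and limit values are hypothetical. Like the guardrail described above, its state is process-local, so each worker enforces its own counts.

```typescript
// Hypothetical per-route limit: `burst` is the bucket capacity and
// `sustainedPerMinute` is the refill rate.
interface RouteLimit {
  burst: number;
  sustainedPerMinute: number;
}

interface Bucket {
  tokens: number;
  updatedAtMs: number;
}

class ProcessLocalRateLimiter {
  // State lives only in this worker's memory; other workers keep separate counts.
  private buckets = new Map<string, Bucket>();
  private limits: Map<string, RouteLimit>;
  private fallback: RouteLimit;

  constructor(limits: Map<string, RouteLimit>, fallback: RouteLimit) {
    this.limits = limits;
    this.fallback = fallback;
  }

  // Returns true when the request may proceed and consumes one token.
  allow(route: string, clientKey: string, nowMs: number = Date.now()): boolean {
    const limit = this.limits.get(route) ?? this.fallback;
    const key = `${route}|${clientKey}`;
    const bucket = this.buckets.get(key) ?? { tokens: limit.burst, updatedAtMs: nowMs };

    // Refill in proportion to elapsed time, capped at the burst size.
    const elapsedMs = Math.max(0, nowMs - bucket.updatedAtMs);
    const refill = (elapsedMs * limit.sustainedPerMinute) / 60_000;
    bucket.tokens = Math.min(limit.burst, bucket.tokens + refill);
    bucket.updatedAtMs = nowMs;

    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}
```

A model-invoking route could be given a small burst and a low sustained rate, while a health check falls back to a more generous default. Moving the bucket state into a shared store is what turns this guardrail into a consistent limit across workers.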

An AI gateway can become useful when a product needs shared budgets, token-aware limits, provider routing, or coordinated fallbacks. LiteLLM documents request and token limits, budgets, and parallel-request controls, while Bifrost documents routing, governance, rate limiting, and budget management. Scale is one factor, but operational requirements should determine whether the additional component is justified.

Default CORS configuration is built for development

Cross-Origin Resource Sharing controls which browser origins may read responses from a server. A wildcard can be convenient during local development, but production browser access should normally be restricted to known origins.

CORS is not authentication and does not prevent a non-browser client from calling an endpoint. Paid model access still needs server-side authorization, abuse controls, and secret management. Agent Live's later hardening work added configurable origin handling that rejects an accidental production wildcard unless it is explicitly permitted. That is a safer default, but deployed headers and environment settings still need verification.
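The origin handling described above can be sketched roughly as follows. This is an assumed shape, not the Agent Live code: the `CorsConfig` fields and function names are hypothetical, and a real deployment would populate the config from its own environment settings.

```typescript
// Hypothetical configuration shape; field names are assumptions for illustration.
interface CorsConfig {
  allowedOrigins: string[];
  isProduction: boolean;
  allowWildcardInProduction?: boolean;
}

// Fails loudly when a wildcard reaches production without an explicit opt-in.
function validateCorsConfig(config: CorsConfig): void {
  const hasWildcard = config.allowedOrigins.includes("*");
  if (config.isProduction && hasWildcard && !config.allowWildcardInProduction) {
    throw new Error("Wildcard CORS origin in production requires an explicit opt-in");
  }
}

// Builds response headers for one request. Unknown origins get no
// Access-Control-Allow-Origin header, so browsers block the read.
function corsHeadersFor(
  requestOrigin: string | null,
  config: CorsConfig,
): Record<string, string> {
  validateCorsConfig(config);
  if (config.allowedOrigins.includes("*")) {
    return { "Access-Control-Allow-Origin": "*" };
  }
  if (requestOrigin !== null && config.allowedOrigins.includes(requestOrigin)) {
    return { "Access-Control-Allow-Origin": requestOrigin, "Vary": "Origin" };
  }
  return { "Vary": "Origin" };
}
```

Running the validation at startup as well as per request surfaces a misconfiguration at deploy time. None of this replaces server-side authorization, because non-browser clients ignore these headers entirely.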

Embedding rules are a separate browser concern. The Content Security Policy frame-ancestors directive defines which parent pages may embed a document. It can reduce clickjacking and unauthorized embedding, but the allowlist should follow the product's current domains. Retired or unrelated domains should not remain in the policy.
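One way to keep the embedding allowlist honest is to generate the header value from a reviewed list instead of hand-editing a string. The sketch below assumes a hypothetical `buildFrameAncestors` helper and placeholder domains. It accepts only bare HTTPS origins and falls back to 'none' when the list is empty.

```typescript
// Builds a Content-Security-Policy value for the frame-ancestors directive.
// The helper name and the HTTPS-only rule are assumptions for this sketch.
function buildFrameAncestors(allowedParents: string[]): string {
  if (allowedParents.length === 0) {
    return "frame-ancestors 'none'"; // no page may embed this document
  }
  for (const parent of allowedParents) {
    const url = new URL(parent); // throws on malformed entries
    if (url.protocol !== "https:") {
      throw new Error(`Embedding parent must use https: ${parent}`);
    }
    if (url.origin !== parent) {
      throw new Error(`Embedding parent must be a bare origin: ${parent}`);
    }
  }
  return `frame-ancestors ${allowedParents.join(" ")}`;
}

// Example with a placeholder domain:
// buildFrameAncestors(["https://app.example.com"])
// yields "frame-ancestors https://app.example.com"
```

Because the list is data, removing a retired domain is a one-line change that can be reviewed like any other configuration edit.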

AI feature analytics require test isolation by design

Conversational products may log interaction data for quality evaluation, cost tracking, operational diagnosis, and behavior analysis. End-to-end tests can exercise the same paths. Without isolation, test traffic can become indistinguishable from real usage.

Agent Live added an is_e2e_test field and indexes intended to support filtering and cleanup. The repository also shows why a schema field alone is not enough: the Playwright configuration notes that automatic marking is currently disabled, and user-agent heuristics are not a reliable identity mechanism.

A stronger design gives test runs an explicit authenticated marker or separate credential, records a test run identifier at ingest, and excludes marked rows from operational analytics by default. The practical lesson was not that test isolation had been solved completely. It was that analytics provenance needs an end-to-end design, from the test runner through ingestion and reporting.
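A rough sketch of that stronger design, assuming a hypothetical shared test credential and in-memory event records: the test runner presents a token, ingestion stamps the row with a run identifier, and reporting excludes marked rows unless asked otherwise. The camel-case `isE2eTest` mirrors the `is_e2e_test` column; every other name here is illustrative.

```typescript
interface IngestRequest {
  sessionId: string;
  message: string;
  testToken?: string; // hypothetical credential issued only to the test runner
  testRunId?: string; // identifier chosen by the test run
}

interface InteractionEvent {
  sessionId: string;
  message: string;
  isE2eTest: boolean;
  testRunId: string | null;
}

// Marks a row as test traffic only when the request carries the expected
// credential. A plain comparison keeps the sketch short; a real service
// would likely use a constant-time check and a managed secret.
function ingest(req: IngestRequest, expectedTestToken: string): InteractionEvent {
  const isTest =
    expectedTestToken.length > 0 &&
    req.testToken !== undefined &&
    req.testToken === expectedTestToken;
  return {
    sessionId: req.sessionId,
    message: req.message,
    isE2eTest: isTest,
    testRunId: isTest ? (req.testRunId ?? "unlabeled-run") : null,
  };
}

// Operational reporting excludes marked rows by default.
function operationalEvents(
  events: InteractionEvent[],
  includeTests = false,
): InteractionEvent[] {
  return includeTests ? events : events.filter((event) => !event.isE2eTest);
}
```

The run identifier also makes cleanup targeted: rows from a single test run can be removed without touching other data.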

System prompt versioning is a deployment concern, not a configuration detail

The behavior of a conversational AI feature is influenced by the selected model, available context, tools, data, instructions, provider behavior, and sampling configuration. Prompts matter, but they do not make model output deterministic.

Agent Live stored system prompts as database configuration. That made product-level changes possible without rebuilding the application, but it did not automatically provide the same review history and rollback clarity as source-controlled code.

The lesson is to give prompt changes an explicit lifecycle. That might mean versioned prompt records in the database, migrations, an approved configuration repository, or a prompt-management system. The mechanism can vary. What matters is being able to identify which prompt version produced a response, review changes, test them, and restore a previous version when necessary.
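To make that lifecycle concrete, here is a minimal in-memory sketch of versioned prompt records. It is an illustration under assumptions, not how Agent Live stores prompts: the `PromptRegistry` class and its method names are hypothetical, and a real system would persist versions in a database or repository.

```typescript
interface PromptVersion {
  version: number;
  text: string;
  note: string; // reviewer-facing description of the change
  createdAt: string;
}

class PromptRegistry {
  // Versions are append-only; rollback moves a pointer instead of deleting history.
  private versions: PromptVersion[] = [];
  private activeVersion: number | null = null;

  publish(text: string, note: string, createdAt: string = new Date().toISOString()): PromptVersion {
    const record: PromptVersion = { version: this.versions.length + 1, text, note, createdAt };
    this.versions.push(record);
    this.activeVersion = record.version;
    return record;
  }

  rollbackTo(version: number): void {
    if (!this.versions.some((v) => v.version === version)) {
      throw new Error(`Unknown prompt version: ${version}`);
    }
    this.activeVersion = version;
  }

  active(): PromptVersion {
    const found = this.versions.find((v) => v.version === this.activeVersion);
    if (!found) throw new Error("No prompt has been published");
    return found;
  }
}

// Tags each logged response with the prompt version that was active.
function recordResponse(
  registry: PromptRegistry,
  output: string,
): { promptVersion: number; output: string } {
  return { promptVersion: registry.active().version, output };
}
```

Storing the version number alongside each response is what lets a later review tie a behavior change to a specific prompt edit.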

The model call is only one part of the product

A model integration can be the visible center of an AI feature, but it is only one part of the product. Provider failures, retrieval quality, prompt injection, model changes, browser security, traffic control, analytics provenance, and operational visibility all affect whether the feature remains useful.

Agent Live demonstrated that the Lab can take a focused AI product through planning, implementation, deployment, and operation. The later hardening work made the next set of engineering priorities clearer and improved how we would plan a similar product today.

For teams planning a focused AI feature, the useful starting point is a deployment review that treats the model call as one dependency among several. Define traffic boundaries, data handling, failure behavior, observability, test isolation, and change control before implementation choices make them expensive to revisit. See the broader applied AI systems overview, explore AI and machine learning integrations, or start a conversation with Lab829.
