返回 Skill 列表
extension
分类: 开发与工程无需 API Key

building-with-envoy-gateway

使用Envoy Gateway、Gateway API、KEDA自动扩展和Envoy AI Gateway为Kubernetes构建生产流量工程。在实现入口、速率限制、流量路由、TLS、自动扩展或LLM流量管理时使用。

person作者: jakexiaohubgithub

Traffic Engineering with Envoy Gateway

Persona

You are a Platform Engineer specializing in Kubernetes traffic management and API gateway patterns. You've deployed Envoy Gateway in production for high-traffic AI agent platforms. You understand Gateway API as the new Kubernetes standard, Envoy Gateway's extension CRDs, KEDA event-driven autoscaling, and Envoy AI Gateway for LLM traffic. You follow CNCF best practices and can implement the full traffic stack: ingress routing, rate limiting, circuit breaking, TLS/mTLS, autoscaling, and AI-specific traffic management.

When to Use This Skill

Activate when the user mentions:

  • Envoy Gateway, Gateway API, GatewayClass
  • HTTPRoute, GRPCRoute, TCPRoute, TLSRoute
  • BackendTrafficPolicy, ClientTrafficPolicy, SecurityPolicy
  • Rate limiting, circuit breaking, retries, load balancing
  • TLS termination, mTLS, CertManager
  • KEDA, ScaledObject, event-driven autoscaling
  • Envoy AI Gateway, token-based rate limiting, provider fallback
  • Ingress replacement, Traefik, Kong migration
  • Canary deployments, blue-green, traffic splitting
  • HPA, VPA, autoscaling for AI agents

Core Concepts

Gateway API: The New Kubernetes Standard

| Resource | Purpose | Scope | |----------|---------|-------| | GatewayClass | Defines gateway implementation (like StorageClass for networking) | Cluster | | Gateway | Traffic entry point with listeners (ports, protocols, hostnames) | Namespace | | HTTPRoute | L7 routing rules (path, headers, query params, methods) | Namespace | | GRPCRoute | gRPC-specific routing with Protocol Buffers | Namespace | | ReferenceGrant | Cross-namespace resource access control | Namespace |

Envoy Gateway Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Control Plane                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   Gateway    │  │     xDS      │  │    Infra     │       │
│  │  Translator  │──│   Server     │──│   Manager    │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│         │                 │                                  │
│         ▼                 │                                  │
│    Gateway API            │                                  │
│    + Extensions           │                                  │
└───────────────────────────│──────────────────────────────────┘
                            │ xDS Protocol
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                     Data Plane                               │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │ Envoy Proxy  │  │ Envoy Proxy  │  │ Envoy Proxy  │       │
│  │  (replica)   │  │  (replica)   │  │  (replica)   │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘

Envoy Gateway Extension CRDs

| CRD | Purpose | Target | Key Features | |-----|---------|--------|--------------| | BackendTrafficPolicy | Gateway-to-backend traffic | HTTPRoute, Gateway | Rate limiting, retries, circuit breaker, load balancing | | ClientTrafficPolicy | Client-to-gateway connections | Gateway | TLS, timeouts, keepalive, connection limits | | SecurityPolicy | Authentication & authorization | HTTPRoute, Gateway | JWT, OIDC, Basic Auth, IP allowlist, CORS | | EnvoyProxy | Proxy deployment config | GatewayClass | Replicas, resources, telemetry | | Backend | Advanced endpoint config | - | FQDN, mTLS client certs |

Decision Logic

Which Policy for Which Scenario?

| Scenario | Policy | Configuration | |----------|--------|---------------| | Rate limit all traffic globally | BackendTrafficPolicy | rateLimit.global with Redis backend | | Rate limit per-instance (cost-effective) | BackendTrafficPolicy | rateLimit.local | | Retry transient failures | BackendTrafficPolicy | retry.attempts, retry.retryOn | | Circuit breaker for unreliable backends | BackendTrafficPolicy | healthChecks + outlier detection | | TLS termination at gateway | ClientTrafficPolicy | tls.certificateRefs | | Client connection timeouts | ClientTrafficPolicy | timeout.http | | JWT token validation | SecurityPolicy | jwt.providers with JWKS | | SSO with identity provider | SecurityPolicy | oidc.provider | | IP-based access control | SecurityPolicy | authorization.rules with ipAddress |

Authentication Method Selection

Is enterprise SSO needed?
├── Yes → Use OIDC (delegate to identity provider)
└── No → Is stateless API auth acceptable?
    ├── Yes → Use JWT (validate JWKS locally)
    └── No → Is it simple internal API?
        ├── Yes → Use Basic Auth or API Key
        └── No → Use External Authorization service

Rate Limiting Strategy

Need cross-instance coordination?
├── Yes → Global Rate Limit (requires Redis)
│         Use for: org-wide limits, preventing resource exhaustion
└── No → Local Rate Limit (per-proxy bucket)
         Use for: per-region limits, cost-effective protection

Workflow: Full Traffic Stack Setup

1. Install Envoy Gateway via Helm

# Add Helm repo
helm install eg oci://docker.io/envoyproxy/gateway-helm \
  --version v1.6.1 \
  -n envoy-gateway-system \
  --create-namespace

# Verify installation
kubectl wait --for=condition=Available deployment/envoy-gateway \
  -n envoy-gateway-system --timeout=120s

# Install Gateway API CRDs if not present
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.2.1/standard-install.yaml

2. Create GatewayClass and Gateway

# gateway-class.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: envoy-gateway
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
---
# gateway.yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: task-api-gateway
  namespace: default
spec:
  gatewayClassName: envoy-gateway
  listeners:
  - name: http
    protocol: HTTP
    port: 80
    allowedRoutes:
      namespaces:
        from: Same
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: tls-cert
    allowedRoutes:
      namespaces:
        from: Same

3. Create HTTPRoute for Application

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: task-api-route
  namespace: default
spec:
  parentRefs:
  - name: task-api-gateway
  hostnames:
  - "api.example.com"
  rules:
  # API endpoints with versioning
  - matches:
    - path:
        type: PathPrefix
        value: /api/v1/tasks
    backendRefs:
    - name: task-api
      port: 8000

  # Health check endpoint
  - matches:
    - path:
        type: Exact
        value: /health
    backendRefs:
    - name: task-api
      port: 8000

  # Traffic splitting for canary
  - matches:
    - path:
        type: PathPrefix
        value: /api/v2/tasks
    backendRefs:
    - name: task-api-v2
      port: 8000
      weight: 10
    - name: task-api-v1
      port: 8000
      weight: 90

4. Apply Rate Limiting (BackendTrafficPolicy)

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-api-rate-limit
  namespace: default
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: task-api-route

  rateLimit:
    type: Global
    global:
      rules:
      # Per-user rate limit (distinct header)
      - clientSelectors:
        - headers:
          - type: Distinct
            name: x-user-id
        limit:
          requests: 100
          unit: Minute

      # Anonymous users (no x-user-id header)
      - clientSelectors:
        - headers:
          - name: x-user-id
            invert: true
        limit:
          requests: 10
          unit: Minute

  # Retry policy
  retry:
    numRetries: 3
    perRetryTimeout: 5s
    retryOn:
    - "5xx"
    - "reset"
    - "connect-failure"
    backoff:
      baseInterval: 100ms
      maxInterval: 10s

5. Configure Circuit Breaking

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-api-resilience
  namespace: default
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: task-api-route

  healthCheck:
    active:
      type: HTTP
      http:
        path: /health
        expectedStatuses:
        - 200
      interval: 10s
      timeout: 1s
      unhealthyThreshold: 3
      healthyThreshold: 1

  circuitBreaker:
    maxConnections: 100
    maxPendingRequests: 50
    maxRequests: 1000

6. Configure TLS with CertManager

# Install CertManager first
# kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.16.0/cert-manager.yaml

# cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          ingressClassName: envoy
---
# certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: api-tls
  namespace: default
spec:
  secretName: tls-cert
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - api.example.com

7. JWT Authentication (SecurityPolicy)

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: jwt-auth
  namespace: default
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: task-api-route

  jwt:
    providers:
    - name: auth0
      issuer: https://your-tenant.auth0.com/
      audiences:
      - https://api.example.com
      remoteJWKS:
        uri: https://your-tenant.auth0.com/.well-known/jwks.json
      claimToHeaders:
      - claim: sub
        header: x-user-id
      - claim: permissions
        header: x-user-permissions

8. Install KEDA for Autoscaling

# Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace

9. Configure KEDA ScaledObject

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: task-api-scaler
  namespace: default
spec:
  scaleTargetRef:
    name: task-api
    kind: Deployment
  minReplicaCount: 1
  maxReplicaCount: 20

  triggers:
  # Scale based on Prometheus metrics (request rate)
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring:9090
      metricName: http_requests_per_second
      query: sum(rate(envoy_http_downstream_rq_total{envoy_cluster_name="task-api"}[1m]))
      threshold: "100"

  # Scale based on Kafka consumer lag
  - type: kafka
    metadata:
      bootstrapServers: kafka.default:9092
      consumerGroup: task-processors
      topic: task-events
      lagThreshold: "50"

Key Patterns

Traffic Splitting for Canary Deployments

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: canary-route
spec:
  parentRefs:
  - name: api-gateway
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api
    backendRefs:
    # Stable version: 90%
    - name: api-stable
      port: 8000
      weight: 90
    # Canary version: 10%
    - name: api-canary
      port: 8000
      weight: 10

Header-Based A/B Testing

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ab-test-route
spec:
  parentRefs:
  - name: api-gateway
  rules:
  # Beta users (header match)
  - matches:
    - headers:
      - name: x-beta-user
        value: "true"
    backendRefs:
    - name: api-v2
      port: 8000

  # All other users
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: api-v1
      port: 8000

Envoy AI Gateway for LLM Traffic

# For AI agent traffic management
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-router
spec:
  backends:
  # Primary: OpenAI
  - name: openai
    priority: 0
    provider: openai
    model: gpt-4
    auth:
      type: APIKey
      apiKeyRef:
        name: openai-key

  # Fallback: Anthropic
  - name: anthropic
    priority: 1
    provider: anthropic
    model: claude-3-opus
    modelNameOverride: gpt-4
    auth:
      type: APIKey
      apiKeyRef:
        name: anthropic-key

  # Token-based rate limiting
  rateLimit:
    tokenBudget:
      perUser: 100000
      perMinute: 10000

Safety & Guardrails

NEVER

  • Expose management endpoints (health checks, metrics) without authentication
  • Use LocalRateLimit when cross-instance coordination is required
  • Skip TLS for production traffic
  • Set rate limits too high initially (start conservative, increase based on monitoring)
  • Use weight 0 for all backends in traffic splitting (will fail)
  • Deploy without health checks on backends

ALWAYS

  • Start with strict rate limits and loosen based on actual usage
  • Use ReferenceGrant for cross-namespace access
  • Configure health checks before enabling circuit breakers
  • Test canary deployments with small traffic percentages first
  • Monitor 429 (rate limit) and 503 (circuit breaker) responses
  • Use mTLS for backend traffic in production
  • Set appropriate timeouts (start with 30s, tune based on P99)

Cost Engineering

  • KEDA scale-to-zero saves 40-70% on idle workloads
  • Token-based rate limiting prevents LLM cost overruns
  • Local rate limiting avoids Redis costs when global isn't needed
  • Schedule non-production gateways to scale down outside business hours

TaskManager Example

Complete traffic engineering setup for Task API:

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: task-api
  namespace: default
  labels:
    app: task-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: task-api
  template:
    metadata:
      labels:
        app: task-api
    spec:
      containers:
      - name: task-api
        image: task-api:latest
        ports:
        - containerPort: 8000
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 15
          periodSeconds: 20
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: task-api
  namespace: default
spec:
  selector:
    app: task-api
  ports:
  - port: 8000
    targetPort: 8000

Full Gateway Configuration

# Gateway with TLS
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: task-gateway
spec:
  gatewayClassName: envoy-gateway
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    tls:
      mode: Terminate
      certificateRefs:
      - kind: Secret
        name: task-api-tls
---
# HTTPRoute with versioned paths
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: task-route
spec:
  parentRefs:
  - name: task-gateway
  hostnames:
  - tasks.example.com
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api/v1
    backendRefs:
    - name: task-api
      port: 8000
---
# Rate limiting + retries
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
  name: task-traffic
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: task-route
  rateLimit:
    type: Global
    global:
      rules:
      - limit:
          requests: 100
          unit: Second
  retry:
    numRetries: 3
    retryOn: ["5xx", "reset"]
---
# JWT authentication
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: task-auth
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: task-route
  jwt:
    providers:
    - name: task-auth
      issuer: https://auth.example.com
      remoteJWKS:
        uri: https://auth.example.com/.well-known/jwks.json
---
# KEDA autoscaling
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: task-scaler
spec:
  scaleTargetRef:
    name: task-api
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus:9090
      query: sum(rate(http_requests_total{app="task-api"}[1m]))
      threshold: "50"

References

For detailed patterns, see:

  • references/gateway-api-patterns.md - HTTPRoute matching examples
  • references/envoy-gateway-crds.md - Full CRD reference
  • references/keda-scalers.md - KEDA scaler configurations
  • references/ai-gateway.md - Envoy AI Gateway patterns