
otel-demo-architecture

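Architecture reference for the OpenTelemetry Demo system, for debugging service dependencies, port conflicts, memory issues, telemetry gaps, and performance problems. Use it when analyzing failures, understanding service topology, troubleshooting communication issues, or investigating system bottlenecks in the OTel Demo environment.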

Author: jakexiaohub (GitHub)

OpenTelemetry Demo Architecture

Quick reference for the OpenTelemetry Demo microservices system. Focus on service dependencies, critical paths, and common failure patterns.

For detailed observability queries, see the Observability Query Guides section below, which points to the metrics, traces, and logs references.

Service Dependency Matrix

| Service | Language | Depends On | Protocol | Memory Limit |
|---------|----------|------------|----------|--------------|
| frontend | TypeScript | ad, cart, checkout, currency, product-catalog, recommendation, shipping, image-provider | gRPC | 250M |
| checkout | Go | cart, currency, email, payment, product-catalog, shipping, kafka | gRPC, HTTP | 20M |
| cart | .NET | valkey-cart, flagd | - | 160M |
| product-catalog | Go | flagd | - | 20M |
| recommendation | Python | product-catalog, flagd | gRPC | 50M |
| shipping | Rust | quote | HTTP | 20M |
| payment | JavaScript | flagd | - | 120M |
| ad | Java | flagd | gRPC | 300M |
| email | Ruby | - | - | 100M |
| currency | C++ | - | - | 20M |
| quote | PHP | - | - | 40M |
| fraud-detection | Kotlin | kafka, flagd | TCP | - |
| product-reviews | Python | product-catalog, llm, postgresql, flagd | gRPC | - |
| accounting | .NET | kafka, postgresql | TCP | - |
| frontend-proxy | Envoy | frontend, flagd, flagd-ui, image-provider | HTTP | 65M |
| image-provider | nginx | - | - | 120M |
| load-generator | Python | frontend-proxy, flagd | HTTP | 120M |
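When a service fails, the matrix can be read in reverse to estimate which other services may be affected. The following is a minimal sketch of that lookup, assuming the "Depends On" column is complete; the names `DEPENDS_ON` and `affected_by` are illustrative and not part of the demo.

```python
from collections import deque

# Direct dependencies transcribed from the "Depends On" column of the matrix.
DEPENDS_ON = {
    "frontend": ["ad", "cart", "checkout", "currency", "product-catalog",
                 "recommendation", "shipping", "image-provider"],
    "checkout": ["cart", "currency", "email", "payment", "product-catalog",
                 "shipping", "kafka"],
    "cart": ["valkey-cart", "flagd"],
    "product-catalog": ["flagd"],
    "recommendation": ["product-catalog", "flagd"],
    "shipping": ["quote"],
    "payment": ["flagd"],
    "ad": ["flagd"],
    "email": [],
    "currency": [],
    "quote": [],
    "fraud-detection": ["kafka", "flagd"],
    "product-reviews": ["product-catalog", "llm", "postgresql", "flagd"],
    "accounting": ["kafka", "postgresql"],
    "frontend-proxy": ["frontend", "flagd", "flagd-ui", "image-provider"],
    "image-provider": [],
    "load-generator": ["frontend-proxy", "flagd"],
}


def affected_by(failed):
    """Return every service that depends on `failed`, directly or transitively."""
    # Invert the matrix: dependency -> set of services that depend on it.
    dependents = {}
    for service, deps in DEPENDS_ON.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk up the inverted graph.
    seen, queue = set(), deque([failed])
    while queue:
        for parent in dependents.get(queue.popleft(), ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen)
```

For example, `affected_by("quote")` walks from quote up through shipping to checkout, frontend, frontend-proxy, and load-generator. A listed dependency does not prove that the dependent service actually breaks, so treat the result as a list of candidates to inspect.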

Critical Service Paths

User Request Flow:

Internet → frontend-proxy:ENVOY_PORT → frontend → [cart, product-catalog, recommendation, checkout]

Checkout/Order Flow:

checkout → cart (gRPC)
        → currency (gRPC)
        → payment (gRPC)
        → product-catalog (gRPC)
        → shipping (HTTP) → quote (HTTP)
        → email (HTTP)
        → kafka → [accounting, fraud-detection]
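
When debugging a checkout failure, it helps to know which hops and protocols lie between checkout and a given downstream service. The sketch below transcribes the diagram above into a small lookup; `CHECKOUT_FLOW` and `hops_to` are hypothetical names, and the Kafka hops carry `None` because the diagram does not state a protocol for them.

```python
# Edges and protocols transcribed from the Checkout/Order Flow diagram.
# Kafka hops use None: the diagram gives no protocol for them.
CHECKOUT_FLOW = {
    "checkout": [("cart", "gRPC"), ("currency", "gRPC"), ("payment", "gRPC"),
                 ("product-catalog", "gRPC"), ("shipping", "HTTP"),
                 ("email", "HTTP"), ("kafka", None)],
    "shipping": [("quote", "HTTP")],
    "kafka": [("accounting", None), ("fraud-detection", None)],
}


def hops_to(target, start="checkout"):
    """Return the (caller, callee, protocol) hops from start to target.

    Returns an empty list when target == start, and None when the target
    is not reachable on the checkout path.
    """
    stack = [(start, [])]
    while stack:
        node, path = stack.pop()
        if node == target:
            return path
        for callee, protocol in CHECKOUT_FLOW.get(node, []):
            stack.append((callee, path + [(node, callee, protocol)]))
    return None
```

`hops_to("quote")` returns the two HTTP hops through shipping, which suggests inspecting HTTP spans for shipping before looking at quote itself.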

Telemetry Flow:

All services → otel-collector:4317(gRPC)/4318(HTTP) → [jaeger, prometheus, tempo, opensearch]
                                                     → grafana (visualization)
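
A telemetry gap often starts with the collector being unreachable. The sketch below checks whether the two OTLP ports from the flow above accept TCP connections. The hostname `otel-collector` is taken from the diagram and is assumed to resolve from wherever the script runs; the function names are illustrative.

```python
import socket

# Ports from the telemetry flow above.
OTLP_PORTS = {"gRPC": 4317, "HTTP": 4318}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_collector(host="otel-collector"):
    """Map each OTLP protocol name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in OTLP_PORTS.items()}
```

A successful connection only shows that something is listening on the port. It does not confirm that the collector accepts OTLP data or that its pipelines are healthy.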

Observability Stack Details

Metrics Pipeline

Services → otel-collector (OTLP) → prometheus (scrape/remote-write) → grafana

Traces Pipeline

Services → otel-collector (OTLP) → [jaeger, tempo] → grafana

Logs Pipeline

Services → docker json-file → alloy → loki → grafana

Log Configuration

All services use the Docker json-file logging driver:

  • Max size: 5M
  • Max files: 2
  • Auto-rotation
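
These limits put a rough ceiling on how much disk the container logs can occupy, and on how far back the logs reach before rotation discards them. The following sketch works through the arithmetic, assuming one container per service and treating "M" as the unit written above; `log_budget` is a hypothetical helper name.

```python
MAX_SIZE_M = 5   # "Max size: 5M" from the list above
MAX_FILES = 2    # "Max files: 2" from the list above


def log_budget(container_count):
    """Approximate upper bound on json-file log usage, in M, for all containers."""
    return MAX_SIZE_M * MAX_FILES * container_count
```

Each container keeps at most about 10M of logs (`log_budget(1)`), so the 17 rows of the Service Dependency Matrix would total roughly 170M if every service ran as a single container. A chatty service can fill its 10M quickly, so older lines may already be rotated away by the time an incident is investigated.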

Observability Query Guides

Detailed reference guides for querying observability data:

Metrics Guide

See metrics-guide.md for:

  • Complete Prometheus metrics catalog - All 260+ available metrics organized by category
  • Service-specific metrics - Application metrics for each demo service
  • HTTP/RPC metrics - Request rates, latencies, error rates
  • Runtime metrics - Go, .NET, JVM, Node.js, Python runtime instrumentation
  • Container metrics - CPU, memory, network, disk I/O
  • Feature flag metrics - Flag evaluation and impression tracking
  • OTEL Collector metrics - Pipeline health and throughput
  • Common label patterns - Service identification, filtering, aggregation
  • Query patterns - Request rates, error rates, percentiles, top-N

When to use: Building dashboards, investigating performance issues, analyzing resource utilization, monitoring service health.
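
The query patterns listed above (request rates, error rates, percentiles, top-N) follow standard PromQL shapes. The sketch below builds them as strings. The metric names passed in are placeholders, and the default `service_name` grouping label is an assumption; take the real names from metrics-guide.md.

```python
def request_rate(metric, window="5m", by="service_name"):
    """Per-second rate of a counter, summed per value of the `by` label."""
    return f"sum by ({by}) (rate({metric}[{window}]))"


def error_ratio(metric, error_matcher, window="5m"):
    """Share of a counter's rate that matches an error label matcher."""
    errors = f"sum(rate({metric}{{{error_matcher}}}[{window}]))"
    total = f"sum(rate({metric}[{window}]))"
    return f"{errors} / {total}"


def percentile(bucket_metric, q=0.95, window="5m"):
    """Quantile estimate from the _bucket series of a Prometheus histogram."""
    return f"histogram_quantile({q}, sum by (le) (rate({bucket_metric}[{window}])))"


def top_n(expr, n=5):
    """Keep only the n largest series of an instant-vector expression."""
    return f"topk({n}, {expr})"
```

For instance, `top_n(request_rate("demo_requests_total"), 3)` yields `topk(3, sum by (service_name) (rate(demo_requests_total[5m])))`, where `demo_requests_total` is a made-up counter name used only for illustration.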

Traces Guide

See traces-guide.md for:

  • TraceQL syntax and patterns - Complete query language reference
  • Resource attributes - Service, host, process, runtime identification (27 attributes)
  • Span attributes - HTTP, gRPC, database, application-specific (150+ attributes)
  • Event attributes - Exceptions, feature flags, business events
  • Intrinsic attributes - Duration, status, kind, instrumentation
  • Service-specific attributes - Cart, product, payment, shipping, ad service patterns
  • Common query patterns - Error investigation, performance analysis, feature flag impact

When to use: Debugging distributed transactions, investigating latency issues, understanding service dependencies, tracing business transactions.
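
As a starting point for error and latency investigation, a TraceQL span selector can combine a resource attribute with the intrinsic `status` and `duration` fields. The sketch below assembles such a selector as a string. It assumes that `resource.service.name` values match the service names in the dependency matrix; the helper `traceql` is illustrative only.

```python
def traceql(service, errors_only=False, min_duration=None):
    """Build a TraceQL span selector for one service.

    min_duration is a TraceQL duration literal such as "500ms" or "2s".
    """
    conditions = [f'resource.service.name = "{service}"']
    if errors_only:
        conditions.append("status = error")
    if min_duration:
        conditions.append(f"duration > {min_duration}")
    return "{ " + " && ".join(conditions) + " }"
```

`traceql("checkout", errors_only=True, min_duration="500ms")` produces `{ resource.service.name = "checkout" && status = error && duration > 500ms }`, which can be pasted into the Tempo query editor in Grafana.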

Logs Guide

See logs-guide.md for:

  • LogQL syntax and patterns - Complete query language reference
  • Available labels - service_name, container, project (5 labels)
  • Service identification - All 16 application and infrastructure services
  • Common query patterns - Errors, HTTP requests, performance, business events
  • Log parsing - JSON parsing, pattern extraction, label filtering
  • Aggregations - Count, rate, percentiles, error percentages
  • Multi-service analysis - Cross-service errors, communication tracing

When to use: Investigating errors, debugging application logic, analyzing request patterns, root cause analysis.
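
The `service_name` label listed above is enough to express the most common LogQL patterns. The sketch below builds three of them as strings: error lines for one service, an error rate, and a cross-service filter. The search term `error` and the helper names are assumptions; adjust them to the log formats described in logs-guide.md.

```python
def error_lines(service, needle="error"):
    """Log lines from one service that contain `needle` (case-sensitive)."""
    return f'{{service_name="{service}"}} |= "{needle}"'


def error_rate(service, needle="error", window="5m"):
    """Per-second rate of matching log lines over the window."""
    return f"sum(rate({error_lines(service, needle)} [{window}]))"


def multi_service_errors(services, needle="error"):
    """Matching log lines across several services, using a regex label matcher."""
    pattern = "|".join(services)
    return f'{{service_name=~"{pattern}"}} |= "{needle}"'
```

`error_rate("checkout")` yields `sum(rate({service_name="checkout"} |= "error" [5m]))`. The `|=` filter is case-sensitive, so a regex filter such as `|~ "(?i)error"` may be a better fit when services log the word in mixed case.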