Spring AI - Quick Reference

Full Reference: See advanced.md for image generation, multi-modal/vision, advisors/middleware, testing patterns, and prompt templates.

Deep Knowledge: Use mcp__documentation__fetch_docs with technology: spring-ai for comprehensive documentation.

Dependencies

<!-- OpenAI -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Azure OpenAI -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-azure-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Ollama (local) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-ollama-spring-boot-starter</artifactId>
</dependency>

<!-- Vector Store - PGVector -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
</dependency>

Configuration

OpenAI

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
          temperature: 0.7
          max-tokens: 1000
      embedding:
        options:
          model: text-embedding-3-small

Azure OpenAI

spring:
  ai:
    azure:
      openai:
        api-key: ${AZURE_OPENAI_KEY}
        endpoint: ${AZURE_OPENAI_ENDPOINT}
        chat:
          options:
            deployment-name: gpt-4o
            temperature: 0.7

Ollama (Local)

spring:
  ai:
    ollama:
      base-url: http://localhost:11434
      chat:
        options:
          model: llama3
          temperature: 0.7

Basic Chat

@Service
@RequiredArgsConstructor
public class ChatService {

    private final ChatClient chatClient;

    public String chat(String message) {
        return chatClient.prompt()
            .user(message)
            .call()
            .content();
    }

    // With system prompt
    public String chatWithContext(String message) {
        return chatClient.prompt()
            .system("You are a helpful assistant specialized in Spring Boot.")
            .user(message)
            .call()
            .content();
    }

    // With parameters
    public String chatWithParams(String message, String topic) {
        return chatClient.prompt()
            .system(s -> s.text("You are an expert in {topic}.")
                .param("topic", topic))
            .user(message)
            .call()
            .content();
    }
}

ChatClient Builder

@Configuration
public class ChatClientConfig {

    @Bean
    public ChatClient chatClient(ChatClient.Builder builder) {
        return builder
            .defaultSystem("You are a helpful AI assistant.")
            .defaultOptions(ChatOptionsBuilder.builder()
                .withTemperature(0.7)
                .withMaxTokens(1000)
                .build())
            .build();
    }
}

Structured Output

public record BookRecommendation(
    String title,
    String author,
    String genre,
    String summary,
    int rating
) {}

@Service
public class BookService {

    private final ChatClient chatClient;

    public BookRecommendation getRecommendation(String preferences) {
        return chatClient.prompt()
            .user("Recommend a book based on: " + preferences)
            .call()
            .entity(BookRecommendation.class);
    }

    public List<BookRecommendation> getRecommendations(String preferences, int count) {
        return chatClient.prompt()
            .user("Recommend " + count + " books based on: " + preferences)
            .call()
            .entity(new ParameterizedTypeReference<List<BookRecommendation>>() {});
    }
}

Streaming

@Service
public class StreamingChatService {

    private final ChatClient chatClient;

    public Flux<String> streamChat(String message) {
        return chatClient.prompt()
            .user(message)
            .stream()
            .content();
    }

    // WebFlux controller
    @GetMapping(value = "/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> streamResponse(@RequestParam String message) {
        return streamChat(message);
    }
}

Function Calling

@Configuration
public class FunctionConfig {

    @Bean
    @Description("Get current weather for a location")
    public Function<WeatherRequest, WeatherResponse> currentWeather() {
        return request -> weatherService.getWeather(request.location());
    }

    @Bean
    @Description("Search for products by name")
    public Function<ProductSearchRequest, List<Product>> searchProducts() {
        return request -> productService.search(request.query(), request.maxResults());
    }
}

public record WeatherRequest(String location) {}
public record WeatherResponse(String location, double temperature, String conditions) {}

@Service
public class AssistantService {

    private final ChatClient chatClient;

    public String assistWithFunctions(String message) {
        return chatClient.prompt()
            .user(message)
            .functions("currentWeather", "searchProducts")
            .call()
            .content();
    }
}

Embeddings

@Service
@RequiredArgsConstructor
public class EmbeddingService {

    private final EmbeddingModel embeddingModel;

    public float[] getEmbedding(String text) {
        EmbeddingResponse response = embeddingModel.embedForResponse(List.of(text));
        return response.getResult().getOutput();
    }

    public List<float[]> getEmbeddings(List<String> texts) {
        EmbeddingResponse response = embeddingModel.embedForResponse(texts);
        return response.getResults().stream()
            .map(e -> e.getOutput())
            .toList();
    }
}

Vector Store (RAG)

Configuration

spring:
  ai:
    vectorstore:
      pgvector:
        dimensions: 1536
        index-type: HNSW
        distance-type: COSINE_DISTANCE

RAG Query

@Service
@RequiredArgsConstructor
public class RagService {

    private final VectorStore vectorStore;
    private final ChatClient chatClient;

    public String queryWithContext(String question) {
        // Retrieve relevant documents
        List<Document> relevantDocs = vectorStore.similaritySearch(
            SearchRequest.query(question)
                .withTopK(5)
                .withSimilarityThreshold(0.7)
        );

        // Build context
        String context = relevantDocs.stream()
            .map(Document::getContent)
            .collect(Collectors.joining("\n\n"));

        // Generate response with context
        return chatClient.prompt()
            .system("""
                You are a helpful assistant. Answer questions based on the provided context.
                If the answer is not in the context, say "I don't have information about that."

                Context:
                {context}
                """)
            .user(question)
            .call()
            .content();
    }
}

QuestionAnswerAdvisor

@Configuration
public class RagConfig {

    @Bean
    public ChatClient ragChatClient(ChatClient.Builder builder, VectorStore vectorStore) {
        return builder
            .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore))
            .build();
    }
}

// Usage is simple - advisor handles RAG automatically
@Service
public class SimpleRagService {

    private final ChatClient ragChatClient;

    public String answer(String question) {
        return ragChatClient.prompt()
            .user(question)
            .call()
            .content();
    }
}

Best Practices

| Do | Don't | |----|-------| | Use structured output for predictable results | Parse free-form text manually | | Implement proper error handling | Ignore API failures | | Use streaming for long responses | Block on large generations | | Cache embeddings when possible | Regenerate embeddings repeatedly | | Set appropriate token limits | Use unlimited tokens |

Production Checklist

[ ] API keys secured (environment variables)
[ ] Rate limiting implemented
[ ] Error handling and retries
[ ] Token usage monitoring
[ ] Response caching where appropriate
[ ] Vector store properly indexed
[ ] Embedding dimension consistency
[ ] Prompt injection protection
[ ] Cost monitoring and alerts
[ ] Fallback models configured

When NOT to Use This Skill

Raw OpenAI/Anthropic API - Use respective SDKs directly
ML model training - Use Python frameworks (PyTorch, TensorFlow)
Non-Spring applications - Use LangChain or native SDKs
Simple text generation - May be overkill for trivial use cases

Anti-Patterns

| Anti-Pattern | Problem | Solution | |--------------|---------|----------| | Hardcoded API keys | Security risk | Use environment variables | | No token limit | Cost explosion | Set max-tokens appropriately | | Synchronous for long requests | Thread blocking | Use streaming | | Ignoring rate limits | API errors, bans | Implement retry with backoff | | No caching for embeddings | High costs | Cache embeddings locally | | Prompt injection vulnerability | Security risk | Sanitize user input |

Quick Troubleshooting

| Problem | Diagnostic | Fix | |---------|------------|-----| | API key invalid | Check error message | Verify OPENAI_API_KEY env var | | Rate limit exceeded | 429 error | Add retry logic, reduce requests | | Timeout on large prompts | Connection timeout | Use streaming, increase timeout | | Embeddings dimension mismatch | Vector store error | Match embedding model dimensions | | Structured output fails | JSON parse error | Simplify schema, add examples |

Reference Documentation

Spring AI Reference