LangChain4J: A Complete Technical Guide
An LLM Application Development Framework for the Java Ecosystem
Contents
- 1. LangChain4J Overview
- 2. Core Architecture Design
- 3. Core Components in Detail
- 4. Tools and Agents
- 5. Memory
- 6. RAG Implementation
- 7. Agents in Depth
- 8. Callbacks and Monitoring
- 9. Spring Integration
- 10. Model Support
- 11. Vector Stores
- 12. Performance Optimization
- 13. Testing and Evaluation
- 14. Production Best Practices
- 15. FAQ and Solutions
- 16. Summary
1. LangChain4J Overview
1.1 What is LangChain4J
LangChain4J is the Java/Kotlin implementation of the LangChain framework, giving Java developers full-stack capabilities for building LLM applications.
Core positioning:
- General-purpose LLM application framework: covers everything from simple prompts to complex agents
- Java-native: type safety, IDE-friendliness, enterprise-grade features
- Modular design: pull in only the dependencies you need, no fat bundle
- Spring ecosystem fit: deep integration with Spring Boot and Spring AI
1.2 Core Value Proposition
- Developer-friendly: Java's type system, compile-time checks, rich IDE support
- Enterprise-ready: security, monitoring, testability, maintainability
- Flexible composition: modular design; combine Chains, Tools, and Agents freely
- Multi-model support: OpenAI, Azure, Anthropic, HuggingFace, local models, and more
- Production-grade: async, streaming, retries, circuit breaking, caching
1.3 Relationship to Related Frameworks
LangChain (Python) → LangChain4J (Java port)
↓
Spring AI (higher abstraction, simpler)
↓
LlamaIndex (RAG-focused, can complement LangChain4J)
Recommendations:
- Simple RAG: Spring AI or LlamaIndex
- General-purpose LLM applications: LangChain4J
- Complex agents: LangChain4J (stronger agent capabilities)
- Existing Spring stack: prefer Spring AI
1.4 Typical Use Cases
- Intelligent customer service: conversational agents, tool calling (order lookup, returns)
- Data analysis: natural-language database queries (SQL generation)
- Code assistants: code generation, review, explanation
- Document Q&A: RAG applications (production-ready)
- Process automation: multi-step reasoning, chained API calls
- Content generation: report writing, summarization, translation
1.5 Ecosystem Modules
- Core: langchain4j-core
- Model integrations: langchain4j-open-ai, langchain4j-azure-open-ai, langchain4j-anthropic, langchain4j-ollama, etc.
- Tool integrations: langchain4j-tools (search, calculator, database, etc.)
- Vector stores: langchain4j-vector-store-* (Milvus, Qdrant, Pinecone, etc.)
- Spring integration: langchain4j-spring
- Evaluation: langchain4j-evaluation
2. Core Architecture Design
2.1 Design Philosophy
The "LCEL" (LangChain Expression Language) idea:
// declarative chaining, similar to function composition
Chain chain = prompt.then(llm).then(outputParser);
Key principles:
- Composability: every component is independent and freely combinable
- Configurability: all parameters are exposed for tuning
- Testability: every component can be tested in isolation
- Async-first: all operations support asynchrony (CompletableFuture)
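The composition principle can be sketched with plain java.util.function.Function chaining. This is a conceptual illustration only, not the LangChain4J API; ComposableChain and the fake model stage are hypothetical stand-ins:

```java
import java.util.function.Function;

public class ComposableChain {
    // Build a three-stage chain: prompt template -> model -> output parser.
    // fakeLlm is a stand-in; a real chain would call a ChatLanguageModel here.
    public static Function<String, String> buildChain() {
        Function<String, String> promptTemplate = input -> "Q: " + input + "\nA:";
        Function<String, String> fakeLlm = prompt -> prompt + " 42"; // pretend model call
        Function<String, String> outputParser = String::trim;
        // Compose the stages, analogous to prompt.then(llm).then(outputParser)
        return promptTemplate.andThen(fakeLlm).andThen(outputParser);
    }
}
```

Because each stage is an ordinary Function, each can be swapped or unit-tested in isolation, which is exactly the composability and testability principles above.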
2.2 Architecture Layers
┌─────────────────────────────────────────────────────────────┐
│ Applications │
│ (Spring Boot/Quarkus/Plain Java/Microservices) │
└───────────────────────────┬─────────────────────────────────┘
│
┌───────────────────────────┴─────────────────────────────────┐
│ LangChain4J Core │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────────┐ │
│ │ Prompt │ │ LLM │ │ Chain │ │ Agent │ │
│ │ Templates│ │ │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └─────────────┘ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────────┐ │
│ │ Tools │ │ Memory │ │ Retriever │ │ Callbacks │ │
│ └──────────┘ └──────────┘ └──────────┘ └─────────────┘ │
└───────────────────────────┬─────────────────────────────────┘
│
┌───────────────────────────┴─────────────────────────────────┐
│ Model Providers │
│  (OpenAI/Azure/Anthropic/Ollama/HuggingFace/local models)   │
└───────────────────────────────────────────────────────────┘
2.3 Core Abstractions
- ChatLanguageModel: chat model interface (multi-turn conversation, tool calling)
- LanguageModel: plain-text completion model (single turn)
- Prompt: prompt template with variable interpolation
- Chain: chained invocation; several components in series
- Tool: a callable function (external API, database query, etc.)
- Agent: autonomous decision-making; picks tools to complete a task
- Memory: conversation history management
- Retriever: document retrieval (RAG scenarios)
- OutputParser: parses LLM output into structured data
3. Core Components in Detail
3.1 Models (LLMs)
3.1.1 ChatLanguageModel
The main chat model interface:
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.chat.request.ChatRequest;
import dev.langchain4j.model.chat.response.ChatResponse;
ChatLanguageModel model = new OpenAiChatModel(
new OpenAiChatModelOptions.Builder()
.apiKey(System.getenv("OPENAI_API_KEY"))
.modelName("gpt-4-turbo-preview")
.temperature(0.7)
.maxTokens(1000)
.build()
);
ChatResponse response = model.chat(
ChatRequest.builder()
.messages(UserMessage.from("Hello, how are you?"))
.build()
);
String answer = response.aiMessage().text();
3.1.2 LanguageModel
Text completion model (no conversation):
import dev.langchain4j.model.LanguageModel;
LanguageModel lm = new OpenAiLanguageModel(
new OpenAiLanguageModelOptions.Builder()
.apiKey("...")
.modelName("gpt-3.5-turbo-instruct")
.build()
);
String completion = lm.generate("Once upon a time");
3.1.3 Streaming Responses
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
StreamingChatLanguageModel streamingModel = new OpenAiStreamingChatModel(...);
streamingModel.chat(
ChatRequest.builder()
.messages(UserMessage.from("Write a story"))
.build(),
(token) -> {
System.out.print(token); // stream tokens as they arrive
return true; // return false to stop
}
);
3.2 Prompts and Templates
3.2.1 Basic PromptTemplate
import dev.langchain4j.prompt.PromptTemplate;
PromptTemplate promptTemplate = PromptTemplate.from(
"You are a {{role}}. Answer the following question: {{question}}"
);
Prompt prompt = promptTemplate.apply(
Map.of(
"role", "financial advisor",
"question", "What is compound interest?"
)
);
String text = prompt.text(); // the fully rendered prompt
3.2.2 ChatPromptTemplate
Multi-turn conversation template:
import dev.langchain4j.prompt.ChatPromptTemplate;
import dev.langchain4j.prompt.chat.ChatMessage;
import dev.langchain4j.prompt.chat.MessageType;
ChatPromptTemplate chatPrompt = ChatPromptTemplate.builder()
.message(MessageType.SYSTEM, "You are a helpful assistant.")
.message(MessageType.HUMAN, "Hello!")
.message(MessageType.AI, "Hi there!")
.message(MessageType.HUMAN, "{{question}}") // variable placeholder
.build();
Prompt prompt = chatPrompt.apply(
Map.of("question", "What is AI?")
);
Predefined roles:
- MessageType.SYSTEM: system instructions
- MessageType.HUMAN / MessageType.USER: user messages
- MessageType.AI / MessageType.ASSISTANT: assistant messages
3.2.3 Few-Shot Prompting
ChatPromptTemplate fewShot = ChatPromptTemplate.builder()
.message(MessageType.HUMAN, "Translate: Hello")
.message(MessageType.AI, "你好")
.message(MessageType.HUMAN, "Translate: {{text}}")
.build();
3.2.4 Prompt String Interpolation
Mustache-style syntax:
You are a {{role}}.
Context: {{context}}
Question: {{question}}
Answer:
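To make the interpolation concrete, here is a minimal plain-Java renderer for the {{variable}} syntax. It is a sketch of the idea, not LangChain4J's implementation; MiniTemplate is a hypothetical name:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniTemplate {
    private static final Pattern VAR = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    // Replace every {{name}} with its value from the map; missing keys fail fast
    public static String render(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Missing variable: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Failing fast on a missing variable is the useful property here: a typo in a template key surfaces as an exception instead of a silently broken prompt.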
3.3 OutputParsers
Parse the LLM's text output into structured data.
3.3.1 A Simple Parser
import dev.langchain4j.output.parsers.AbstractOutputParser;
OutputParser<String> parser = new AbstractOutputParser<>() {
@Override
public String parse(String text) {
return text.trim(); // custom parsing logic
}
};
3.3.2 JsonOutputParser
Parsing JSON:
import dev.langchain4j.output.parsers.JsonOutputParser;
import com.fasterxml.jackson.annotation.JsonProperty;
public class Answer {
@JsonProperty
public String answer;
@JsonProperty
public List<String> sources;
}
JsonOutputParser<Answer> parser = JsonOutputParser.from(Answer.class);
Prompt prompt = PromptTemplate.from(
"Answer the question. Respond in JSON format: {{format_instructions}}\n\nQuestion: {{question}}",
Map.of("format_instructions", parser.getFormatInstructions())
).apply(...);
String jsonResponse = model.generate(prompt.text());
Answer answer = parser.parse(jsonResponse);
3.3.3 RegexOutputParser
import dev.langchain4j.output.parsers.RegexOutputParser;
RegexOutputParser<Answer> parser = RegexOutputParser.from(
Answer.class,
".*Answer: (.*?)\\. Sources: (.*?)(?:\\n|$).*"
);
3.3.4 EnumOutputParser
import dev.langchain4j.output.parsers.EnumOutputParser;
enum Category { SPORTS, TECHNOLOGY, FINANCE, HEALTH }
EnumOutputParser<Category> parser = EnumOutputParser.from(Category.class);
3.4 Chains
3.4.1 A Simple Chain
PromptTemplate promptTemplate = PromptTemplate.from(
"Translate '{{text}}' to {{language}}"
);
Chain chain = promptTemplate.then(model).then(new StringTrimmerOutputParser());
String result = chain.execute(
Map.of("text", "Hello", "language", "French")
);
// Output: "Bonjour"
3.4.2 LLMChain
The most common pattern: Prompt + LLM + Parser:
import dev.langchain4j.chain.LLMChain;
LLMChain chain = LLMChain.builder()
.prompt(promptTemplate)
.llm(model)
.outputParser(parser)
.build();
Answer answer = chain.execute(
Map.of("question", "What is RAG?")
).orElseThrow();
3.4.3 Functional Chains (LCEL style)
Chain chain = PromptTemplate.from("{{input}}")
.then(model)
.then(outputParser);
String result = chain.run("Hello");
3.4.4 Composite Chains
Several chains executed in sequence:
Chain firstChain = PromptTemplate.from("Q: {{q}}\nA:").then(model);
Chain secondChain = PromptTemplate.from("Summarize: {{text}}").then(model);
Chain combinedChain = firstChain.andThen(secondChain);
String finalOutput = combinedChain.run("What is Java?");
// Steps: firstChain generates an answer → secondChain summarizes it
3.4.5 Parallel Chains
List<Chain> parallelChains = List.of(chain1, chain2, chain3);
List<String> results = ChainExecutionUtils.executeAll(parallelChains);
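Setting the ChainExecutionUtils helper aside, the underlying idea is plain concurrent fan-out. A stdlib-only sketch, with chains modeled as Function<String, String> and ParallelChains a hypothetical name:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class ParallelChains {
    // Run every chain on the same input concurrently; results keep list order
    public static List<String> executeAll(List<Function<String, String>> chains, String input) {
        ExecutorService pool = Executors.newFixedThreadPool(chains.size());
        try {
            List<CompletableFuture<String>> futures = chains.stream()
                    .map(c -> CompletableFuture.supplyAsync(() -> c.apply(input), pool))
                    .toList();
            // join() blocks until each future completes
            return futures.stream().map(CompletableFuture::join).toList();
        } finally {
            pool.shutdown();
        }
    }
}
```

Since LLM calls are I/O-bound, running independent chains concurrently cuts latency to roughly the slowest single chain.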
3.5 Retrievers
3.5.1 VectorStoreRetriever
import dev.langchain4j.retriever.DocumentRetriever;
import dev.langchain4j.store.embedding.VectorStore;
VectorStore vectorStore = ...; // initialize the vector store (Milvus/Pinecone/etc.)
DocumentRetriever retriever = VectorStoreRetriever.builder()
.vectorStore(vectorStore)
.similaritySearchRequest(
SimilaritySearchRequest.builder()
.topK(5)
.build()
)
.build();
List<Content> relevantDocuments = retriever.retrieve("What is RAG?");
3.5.2 HybridRetriever
Hybrid vector + keyword retrieval:
import dev.langchain4j.retriever.HybridRetriever;
HybridRetriever retriever = HybridRetriever.builder()
.vectorRetriever(vectorRetriever)
.keywordRetriever(keywordRetriever)
.rankFusionStrategy(RerankingStrategy.RECIPROCAL_RANK_FUSION)
.build();
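Reciprocal Rank Fusion itself is simple to state: a document's fused score is the sum of 1/(k + rank) over every ranking it appears in. A minimal sketch (RrfFusion is a hypothetical helper, not part of LangChain4J):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RrfFusion {
    // score(doc) = sum over rankings of 1 / (k + rank), rank starting at 1
    public static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return fused;
    }
}
```

With the conventional k = 60, a document ranked moderately by both retrievers overtakes one ranked first by only one of them, which is exactly the behavior hybrid retrieval wants.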
3.5.3 Custom Retrievers
import dev.langchain4j.retriever.DocumentRetriever;
public class MyRetriever implements DocumentRetriever {
private final VectorStore vectorStore;
@Override
public List<Content> retrieve(String query) {
// custom retrieval logic
return vectorStore.similaritySearch(query);
}
}
4. Tools and Agents
4.1 Tools
4.1.1 Built-in Tools
- CalculatorTool: math
- SearchTool: search engines (Tavily, Google, Bing)
- HttpRequestTool: HTTP requests
- SqlQueryTool: SQL queries
- FileSystemTool: file operations
- WeatherTool: weather lookups
- and more
4.1.2 Custom Tools
import dev.langchain4j.agent.tool.Tool;
public class OrderTool implements Tool {
@Override
public String name() {
return "getOrderStatus";
}
@Override
public String description() {
return "Get the status of an order by order ID";
}
@Override
public String execute(@JsonProperty("orderId") String orderId) {
// call a business API
Order order = orderService.getOrder(orderId);
return order.getStatus();
}
@Override
public ToolParameter[] parameters() {
return new ToolParameter[] {
ToolParameter.builder()
.name("orderId")
.description("The order ID")
.type(ToolParameterType.STRING)
.required(true)
.build()
};
}
}
4.1.3 ToolRegistry
import dev.langchain4j.agent.tool.ToolRegistry;
ToolRegistry registry = ToolRegistry.builder()
.add(new CalculatorTool())
.add(new WeatherTool())
.add(new OrderTool())
.build();
4.2 Agents
4.2.1 ZeroShotAgent
Single-step decisions: pick a tool and execute it:
import dev.langchain4j.agent.Agent;
import dev.langchain4j.agent.tool.ToolExecutor;
Agent agent = Agent.builder()
.chatLanguageModel(model)
.tools(tools)
.build();
String result = agent.run("What's the weather in Paris and multiply by 2?");
// Automatically calls: WeatherTool(Paris) → Calculator(...)
4.2.2 ReAct Agent
Multi-step reasoning (Reason + Act):
import dev.langchain4j.agent.ChainOfThoughtsThoughtGenerator;
import dev.langchain4j.agent.ChainOfThoughtsThoughtExtractor;
import dev.langchain4j.agent.ReActAgent;
ReActAgent agent = ReActAgent.builder()
.chatLanguageModel(model)
.tools(tools)
.maxIterations(10) // maximum number of steps
.thoughtExtractor(new ChainOfThoughtsThoughtExtractor())
.build();
AgentExecutionResult result = agent.execute("Complex question...");
The Thought loop:
- Thought: reason about the next step
- Action: choose a tool and its arguments
- Observation: the tool's result
- Repeat until done
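The loop above can be sketched with a fake model that emits either an Action or a final Answer. Everything here (MiniReActLoop, the "Action:"/"Answer:" line format) is a hypothetical illustration of the control flow, not the ReActAgent implementation:

```java
import java.util.Map;
import java.util.function.Function;

public class MiniReActLoop {
    // Each iteration: the "model" proposes an action, we run the tool,
    // append the observation, and stop when the model answers directly.
    public static String run(Function<String, String> model,
                             Map<String, Function<String, String>> tools,
                             String question, int maxIterations) {
        String transcript = "Question: " + question;
        for (int i = 0; i < maxIterations; i++) {
            String step = model.apply(transcript); // e.g. "Action: calc|6*7" or "Answer: 42"
            if (step.startsWith("Answer: ")) {
                return step.substring("Answer: ".length());
            }
            String[] action = step.substring("Action: ".length()).split("\\|", 2);
            String observation = tools.get(action[0]).apply(action[1]);
            transcript += "\n" + step + "\nObservation: " + observation;
        }
        throw new IllegalStateException("Max iterations reached");
    }
}
```

The maxIterations cap mirrors the builder option above: without it, a model that never emits an Answer would loop and burn tokens forever.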
4.2.3 ConversationAgent
A conversational agent with memory:
import dev.langchain4j.agent.ConversationalAgent;
import dev.langchain4j.memory.ChatMemory;
ConversationalAgent agent = ConversationalAgent.builder()
.chatLanguageModel(model)
.tools(tools)
.chatMemory(memory) // memory component
.build();
agent.execute("Hi, I'm John");
agent.execute("What's my name?"); // remembers the name
4.2.4 Controlling Agent Execution
AgentExecutionResult result = agent.execute(
AgentExecutorRequest.builder()
.input("Calculate 123 * 456")
.interrupt(Runnable::run) // interruption callback
.callback(new AgentCallback() {
@Override
public void onTool(String toolName, String input, String output) {
log.info("Tool {} called with {}, returned {}", toolName, input, output);
}
@Override
public void onThought(String thought) {
log.debug("Thought: {}", thought);
}
})
.build()
);
5. Memory
5.1 ChatMemory
Manages conversation history:
import dev.langchain4j.memory.chat.ChatMemory;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
ChatMemory memory = MessageWindowChatMemory.withMaxMessages(10);
// keep the 10 most recent messages
memory.add(UserMessage.from("Hello"));
memory.add(AiMessage.from("Hi! How can I help?"));
// history is automatically included in the next turn
ChatResponse response = model.chat(
ChatRequest.builder()
.messages(memory.messages(), UserMessage.from("How are you?"))
.build()
);
5.2 Memory Implementations
5.2.1 MessageWindowChatMemory
A fixed-size message window:
ChatMemory memory = MessageWindowChatMemory.withMaxMessages(20);
5.2.2 TokenWindowChatMemory
A window bounded by token count:
ChatMemory memory = TokenWindowChatMemory.withMaxTokens(2000);
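The eviction logic behind a token window is straightforward: append the new message, then drop the oldest messages until the estimated total fits. A stdlib sketch with a crude whitespace token estimate (real implementations use a proper tokenizer; TokenWindowMemory is a hypothetical name):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class TokenWindowMemory {
    private final Deque<String> messages = new ArrayDeque<>();
    private final int maxTokens;

    public TokenWindowMemory(int maxTokens) {
        this.maxTokens = maxTokens;
    }

    // Crude estimate: whitespace-separated words stand in for tokens
    private static int estimateTokens(String message) {
        return message.split("\\s+").length;
    }

    // Add a message, then evict oldest messages until the window fits
    public void add(String message) {
        messages.addLast(message);
        int total = messages.stream().mapToInt(TokenWindowMemory::estimateTokens).sum();
        while (total > maxTokens && messages.size() > 1) {
            total -= estimateTokens(messages.removeFirst());
        }
    }

    public List<String> messages() {
        return List.copyOf(messages);
    }
}
```

Evicting from the front keeps the most recent turns, which is usually what a conversation needs; a system message would typically be pinned outside the window.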
5.2.3 ChatMemoryStore (custom persistence)
import dev.langchain4j.memory.chat.ChatMemoryStore;
import dev.langchain4j.memory.chat.ChatMemory;
ChatMemoryStore store = new ChatMemoryStore() {
@Override
public List<ChatMessage> getMessages(ChatMemoryId id) {
// read from a database or Redis
return loadFromDB(id);
}
@Override
public void updateMessages(ChatMemoryId id, List<ChatMessage> messages) {
// persist to a database or Redis
saveToDB(id, messages);
}
@Override
public void deleteMessages(ChatMemoryId id) {
deleteFromDB(id);
}
};
ChatMemory memory = ChatMemory.builder()
.chatMemoryStore(store)
.maxMessages(50)
.build();
5.3 Conversation ID Management
ChatMemoryId memoryId = ChatMemoryId.of("user-123", "conversation-456");
memory.add(memoryId, UserMessage.from("Hello"));
5.4 Summarizing Memory
Prevents unbounded growth:
import dev.langchain4j.memory.chat.SummarizingChatMemory;
ChatMemory memory = SummarizingChatMemory.with(
model, // the LLM used for summarization
TokenWindowChatMemory.withMaxTokens(2000),
SummarizationStrategy.WITHIN_TIME_WINDOW // e.g. summarize once every 10 minutes
);
6. RAG Implementation
6.1 The Complete RAG Pipeline
// 1. Load documents
Document document = Document.from("path/to/file.pdf");
List<Document> documents = List.of(document);
// 2. Split into chunks
TextSplitter splitter = RecursiveCharacterTextSplitter.builder()
.chunkSize(512)
.chunkOverlap(50)
.build();
List<TextSegment> segments = splitter.split(documents);
// 3. Embed and store
EmbeddingModel embeddingModel = new OpenAiEmbeddingModel(...);
VectorStore vectorStore = new PineconeVectorStore(...);
EmbeddingStore<TextSegment> embeddingStore = new EmbeddingStoreAdapter<>(vectorStore);
embeddingModel.embedAll(segments, embeddingStore);
// 4. Retrieve
RetrievalQuery query = RetrievalQuery.query("What is RAG?");
List<Content> relevant = embeddingStore.findRelevant(query, 5);
// 5. Build the prompt
PromptTemplate promptTemplate = PromptTemplate.from(
"Context:\n{{context}}\n\nQuestion: {{question}}\nAnswer:"
);
Prompt prompt = promptTemplate.apply(
Map.of(
"context", relevant.stream().map(Content::text).collect(Collectors.joining("\n---\n")),
"question", "What is RAG?"
)
);
// 6. Generate the answer
String answer = model.generate(prompt.text());
// 7. Return (answer + cited sources)
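Step 2's overlapping chunking can be sketched in a few lines. OverlapSplitter is a hypothetical stand-in for RecursiveCharacterTextSplitter and splits on raw characters only, without the recursive separator logic:

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapSplitter {
    // Fixed-size chunks where each chunk repeats the last `overlap`
    // characters of the previous one, preserving cross-chunk context
    public static List<String> split(String text, int chunkSize, int overlap) {
        if (overlap >= chunkSize) {
            throw new IllegalArgumentException("overlap must be < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            chunks.add(text.substring(start, Math.min(start + chunkSize, text.length())));
            if (start + chunkSize >= text.length()) {
                break; // last chunk reached the end of the text
            }
        }
        return chunks;
    }
}
```

The overlap is what keeps a sentence straddling a chunk boundary retrievable from either side; chunkSize 512 with overlap 50, as configured above, is a common starting point.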
6.2 Using RetrievalQAChain (the packaged wrapper)
import dev.langchain4j.chain.qa.ConversationalRetrievalChain;
ConversationalRetrievalChain chain = ConversationalRetrievalChain.builder()
.embeddingStore(embeddingStore)
.embeddingModel(embeddingModel)
.chatLanguageModel(model)
.chatMemory(memory)
.build();
Answer answer = chain.execute("What is RAG?");
6.3 Reranking
import dev.langchain4j.retriever.Reranker;
Reranker reranker = new CrossEncoderReranker(
"BAAI/bge-reranker-base"
);
List<Content> retrieved = retriever.retrieve(query);
List<ScoredContent<Content>> reranked = reranker.rerank(query, retrieved);
7. Agents in Depth
7.1 Tool Calling (function calling)
Supports OpenAI function calling:
ChatLanguageModel model = new OpenAiChatModel(
OpenAiChatModelOptions.builder()
.modelName("gpt-4-turbo-preview")
.functionRegistry(registry) // register tools
.build()
);
// the model decides on its own when to call a tool
ChatResponse response = model.chat(
ChatRequest.builder()
.messages(UserMessage.from("What's the weather in Tokyo?"))
.build()
);
if (response.aiMessage().hasToolExecutionRequests()) {
List<ToolExecutionRequest> requests = response.aiMessage().toolExecutionRequests();
for (ToolExecutionRequest request : requests) {
String result = registry.execute(request); // execute the tool
// feed the result back to the model
}
}
7.2 Planning
Decomposing complex tasks:
import dev.langchain4j.agent.planner.Planner;
import dev.langchain4j.agent.planner.SimplePlanner;
Planner planner = SimplePlanner.builder()
.chatLanguageModel(model)
.tools(tools)
.maxIterations(10)
.build();
Plan plan = planner.plan("Analyze quarterly sales and send report via email");
for (Step step : plan.steps()) {
// execute the step
}
7.3 Multi-Agent Collaboration
import dev.langchain4j.chain.MultiPromptRouterChain;
import dev.langchain4j.chain.MultiRouteChain;
// route to different expert agents
MultiPromptRouterChain router = MultiPromptRouterChain.builder()
.llm(model)
.promptTemplates(Map.of(
"legal", legalPromptTemplate,
"finance", financePromptTemplate,
"tech", techPromptTemplate
))
.build();
String result = router.execute(question);
8. Callbacks and Monitoring
8.1 The Callback Mechanism
Intercept key events:
import dev.langchain4j.callback.*;
CallbackManager callbackManager = new CallbackManager.Builder()
.addListener(new StdOutCallbackListener()) // print to the console
.addListener(new LoggingCallbackListener())
.addListener(new MetricsCallbackListener())
.build();
// pass the manager to every component
ChatLanguageModel model = new OpenAiChatModel(..., callbackManager);
Chain chain = new Chain(..., callbackManager);
Agent agent = new Agent(..., callbackManager);
// callbacks fire during execution
chain.execute(...);
8.2 Custom Listeners
public class MetricsListener extends AbstractCallbackListener {
private final MeterRegistry meterRegistry;
private Timer.Sample sample;
@Override
public void onLLMStart(LLMStartData data) {
sample = Timer.start(meterRegistry); // start timing the request
}
@Override
public void onLLMEnd(LLMEndData data) {
sample.stop(meterRegistry.timer("llm.request"));
meterRegistry.counter("llm.tokens").increment(data.tokenUsage().total());
}
@Override
public void onChainStart(ChainStartData data) {
// record chain start
}
@Override
public void onChainEnd(ChainEndData data) {
// record chain end and duration
}
}
8.3 Distributed Tracing
OpenTelemetry integration:
import dev.langchain4j.callback.opus.OpusCallback;
OpusCallback opus = OpusCallback.builder()
.applicationName("my-rag-app")
.environment("production")
.build();
CallbackManager callbackManager = CallbackManager.builder()
.addListener(opus)
.build();
9. Spring Integration
9.1 Spring Boot Starter
Minimal setup:
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-spring-boot-starter</artifactId>
<version>0.30.0</version>
</dependency>
langchain4j:
open-ai:
api-key: ${OPENAI_API_KEY}
model-name: gpt-4-turbo-preview
temperature: 0.7
embedding:
model-name: text-embedding-ada-002
vector-store:
type: pinecone
api-key: ${PINECONE_API_KEY}
index-name: my-index
9.2 Auto-configured Beans
@SpringBootApplication
public class MyApp {
@Bean
public ChatLanguageModel chatLanguageModel() {
return OpenAiChatModel.builder()
.apiKey(env.get("OPENAI_API_KEY"))
.modelName("gpt-4")
.build();
}
@Bean
public EmbeddingModel embeddingModel() {
return OpenAiEmbeddingModel.builder()
.apiKey(env.get("OPENAI_API_KEY"))
.build();
}
@Bean
public VectorStore vectorStore(EmbeddingModel embeddingModel) {
return PineconeVectorStore.builder()
.apiKey(env.get("PINECONE_API_KEY"))
.indexName("my-index")
.embeddingModel(embeddingModel)
.build();
}
}
9.3 Configuration Approaches Compared
Java config vs Spring Boot auto-configuration:
// plain Java configuration
ChatLanguageModel model = OpenAiChatModel.builder().build();
// Spring Boot style (auto-injected)
@Autowired
private ChatLanguageModel chatLanguageModel;
// multiple models side by side
@Bean
@Qualifier("gpt4")
public ChatLanguageModel gpt4() { ... }
@Bean
@Qualifier("claude")
public ChatLanguageModel claude() { ... }
9.4 Comparison with Spring AI
LangChain4J vs Spring AI:
| Aspect | LangChain4J | Spring AI |
|---|---|---|
| Feature completeness | ★★★★★ | ★★★☆☆ |
| Spring integration | ★★★★☆ | ★★★★★ |
| Agent capabilities | ★★★★★ | ★★☆☆☆ |
| Enterprise features | ★★★★★ | ★★★★☆ |
| Learning curve | Steep | Gentle |
- Choose Spring AI for: simple RAG, quick results, deep Spring integration
- Choose LangChain4J for: complex agents, advanced features, production-grade monitoring
9.5 @ConfigurationProperties
@ConfigurationProperties(prefix = "langchain4j")
public class LangChain4JProperties {
private OpenAi openAi = new OpenAi();
private VectorStore vectorStore = new VectorStore();
// getters & setters
public static class OpenAi {
private String apiKey;
private String modelName;
// ...
}
}
10. Model Support
10.1 Cloud APIs
10.1.1 OpenAI
ChatLanguageModel model = new OpenAiChatModel(
OpenAiChatModelOptions.builder()
.apiKey("sk-...")
.modelName("gpt-4-turbo-preview")
.temperature(0.7)
.maxTokens(2000)
.organizationId("org-...") // optional
.build()
);
- Supported: gpt-4, gpt-4-turbo, gpt-3.5-turbo, embedding models
- Streaming: OpenAiStreamingChatModel
- Function calling: Tools are registered automatically
10.1.2 Azure OpenAI
ChatLanguageModel model = new AzureOpenAiChatModel(
AzureOpenAiChatModelOptions.builder()
.endpoint("https://my-resource.openai.azure.com")
.apiKey("...")
.deploymentName("gpt-4-deployment")
.apiVersion("2024-02-15-preview")
.build()
);
10.1.3 Anthropic (Claude)
ChatLanguageModel model = new AnthropicChatModel(
AnthropicChatModelOptions.builder()
.apiKey("sk-ant-...")
.modelName("claude-3-opus-20240229")
.maxTokens(4000)
.temperature(0.7)
.build()
);
- Supported: claude-3-opus/sonnet/haiku
10.1.4 Google PaLM / Gemini
ChatLanguageModel model = new GoogleAiChatModel(
GoogleAiChatModelOptions.builder()
.apiKey("...")
.modelName("gemini-pro")
.temperature(0.8)
.build()
);
10.2 Local Inference
10.2.1 Ollama
Run open-source models locally:
ChatLanguageModel model = new OllamaChatModel(
OllamaChatModelOptions.builder()
.baseUrl("http://localhost:11434")
.modelName("llama2:13b") // or codellama, mistral
.temperature(0.7)
.build()
);
Advantages:
- No API fees
- Data never leaves your network
- Low latency (local network)
Requirements: a GPU (recommended) or enough RAM
10.2.2 HuggingFace
Using Transformers:
ChatLanguageModel model = new HuggingFaceChatModel(
HuggingFaceChatModelOptions.builder()
.accessToken("hf_...")
.modelId("meta-llama/Llama-2-13b-chat-hf")
.temperature(0.7)
.build()
);
Note: the JVM needs plenty of memory (>32 GB)
10.2.3 LocalAI
A local model server compatible with the OpenAI API:
ChatLanguageModel model = new OpenAiChatModel(
OpenAiChatModelOptions.builder()
.baseUrl("http://localhost:8080/v1") // LocalAI endpoint
.apiKey("not-needed")
.modelName("llama2:13b")
.build()
);
10.3 Embedding Models
EmbeddingModel embeddingModel = new OpenAiEmbeddingModel(
OpenAiEmbeddingModelOptions.builder()
.apiKey("...")
.modelName("text-embedding-3-small")
.build()
);
// single text
List<Float> embedding = embeddingModel.embed("Hello world");
// batch
List<List<Float>> embeddings = embeddingModel.embedAll(List.of("text1", "text2"));
A local embedding model:
EmbeddingModel embeddingModel = new HuggingFaceEmbeddingModel(
HuggingFaceEmbeddingModelOptions.builder()
.modelId("BAAI/bge-small-en-v1.5")
.build()
);
11. Vector Stores
11.1 Supported Vector Databases
Full list of supported stores:
- Pinecone (managed cloud)
- Weaviate (open source + cloud)
- Qdrant (open source + cloud)
- Milvus (open source + cloud)
- Elasticsearch (hybrid full-text + vector)
- Azure AI Search
- OpenSearch
- Cassandra/ScyllaDB
- Redis (RediSearch module)
- Neo4j (graph + vector)
- Chroma (lightweight)
- Vespa
- Custom: implement the VectorStore interface
11.2 Vector Store Usage
import dev.langchain4j.store.embedding.VectorStore;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.RetrievedEmbedding;
// the generic interface
EmbeddingStore<TextSegment> store = new PineconeVectorStore(...);
// add an entry
store.add(Embeddable.from(segment, embedding));
// similarity search
List<RetrievedEmbedding<TextSegment>> results = store.findRelevant(
query, 5
);
11.3 Vector Search with Metadata Filtering
import dev.langchain4j.store.embedding.MetadataFilter;
import dev.langchain4j.store.embedding.MetadataFilterBuilder;
List<RetrievedEmbedding<TextSegment>> results = store.findRelevant(
query,
5,
MetadataFilterBuilder
.metadata("author", "John Doe")
.and(MetadataFilterBuilder.metadata("date", ">2024-01-01"))
.build()
);
11.4 Vector Store Configuration Examples
11.4.1 Pinecone
VectorStore store = PineconeVectorStore.builder()
.apiKey(System.getenv("PINECONE_API_KEY"))
.indexName("my-index")
.environment("us-east-1")
.embeddingModel(embeddingModel)
.build();
11.4.2 Milvus
VectorStore store = MilvusVectorStore.builder()
.host("localhost")
.port(19530)
.database("default")
.collectionName("my_collection")
.embeddingModel(embeddingModel)
.build();
11.4.3 PostgreSQL + pgvector
VectorStore store = PgVectorVectorStore.builder()
.host("localhost")
.port(5432)
.database("mydb")
.user("user")
.password("pass")
.tableName("embeddings")
.dimension(1536)
.build();
12. Performance Optimization
12.1 Batch Embedding
List<String> texts = List.of("text1", "text2", ...);
List<List<Float>> embeddings = embeddingModel.embedAll(texts); // one batched API call
for (int i = 0; i < texts.size(); i++) {
store.add(Embeddable.from(texts.get(i), embeddings.get(i)));
}
12.2 Caching Strategies
12.2.1 Embedding Cache
EmbeddingModel cachingEmbeddingModel = EmbeddingModelAdapters.cache(
embeddingModel,
Caffeine.newBuilder()
.maximumSize(10_000)
.expireAfterAccess(1, TimeUnit.HOURS)
.build()
);
12.2.2 LLM Response Cache
ChatLanguageModel cachingModel = ChatLanguageModelAdapters.cache(
model,
Caffeine.newBuilder()
.maximumSize(1_000)
.expireAfterWrite(10, TimeUnit.MINUTES)
.build()
);
Note: cache invalidation is keyed on a hash of the prompt
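The prompt-hash keying mentioned above can be sketched with a SHA-256 key and an access-ordered LinkedHashMap as a tiny LRU. This is a conceptual stand-in for the Caffeine-backed adapter; PromptCache is a hypothetical name:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class PromptCache {
    private final Map<String, String> cache;
    private final Function<String, String> model;

    public PromptCache(Function<String, String> model, int maxEntries) {
        this.model = model;
        // Access-ordered LinkedHashMap gives simple LRU eviction
        this.cache = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Key the cache on a SHA-256 hash of the prompt text
    private static String hash(String prompt) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(prompt.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public synchronized String generate(String prompt) {
        // Only call the model on a cache miss
        return cache.computeIfAbsent(hash(prompt), k -> model.apply(prompt));
    }
}
```

Note the caveat this implies: caching only helps deterministic, repeated prompts; with temperature > 0 a cached answer freezes one particular sample.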
12.3 Async and Non-blocking Calls
CompletableFuture<String> future = model.generateAsync("Hello");
future.thenAccept(answer -> {
// handle the answer
});
12.4 Connection Pool Configuration
// HTTP client configuration
HttpClientBuilderCustomizer customizer = httpClientBuilder ->
httpClientBuilder
.maxConnections(100)
.maxConnectionsPerRoute(50)
.connectionTimeout(10, TimeUnit.SECONDS)
.responseTimeout(60, TimeUnit.SECONDS);
12.5 Timeouts and Retries
import dev.langchain4j.retry.RetryHelper;
RetryHelper retryHelper = RetryHelper.builder()
.maxAttempts(3)
.initialInterval(100, TimeUnit.MILLISECONDS)
.maxInterval(5, TimeUnit.SECONDS)
.multiplier(2.0)
.build();
String result = retryHelper.execute(() -> model.generate(prompt));
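The RetryHelper configuration above maps directly onto a classic exponential-backoff loop. A stdlib-only sketch (Backoff is a hypothetical name, and real code would retry only on retryable errors such as rate limits, not on every exception):

```java
import java.util.function.Supplier;

public class Backoff {
    // Retry a flaky call: wait initialMillis, then initialMillis*multiplier, ...
    // capped at maxMillis, giving up after maxAttempts
    public static <T> T retry(Supplier<T> call, int maxAttempts,
                              long initialMillis, long maxMillis, double multiplier) {
        long delay = initialMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delay);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted during backoff", ie);
                    }
                    delay = Math.min((long) (delay * multiplier), maxMillis);
                }
            }
        }
        throw last; // all attempts failed; surface the last error
    }
}
```

Production versions usually also add jitter so that many clients hitting the same rate limit do not retry in lockstep.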
12.6 Cost Control
// track token usage
TokenUsage tokenUsage = response.tokenUsage();
log.info("Tokens: prompt={}, completion={}, total={}",
tokenUsage.inputTokenCount(),
tokenUsage.outputTokenCount(),
tokenUsage.totalTokenCount()
);
// budget limit
if (tokenUsage.totalTokenCount() > MAX_DAILY_TOKENS) {
throw new BudgetExceededException();
}
13. Testing and Evaluation
13.1 Unit Tests
@Test
void testChain() {
// Mock LLM
ChatLanguageModel mockModel = mock(ChatLanguageModel.class);
when(mockModel.generate(any()))
.thenReturn(ChatResponse.from(AiMessage.from("mocked answer")));
// test the chain
LLMChain chain = LLMChain.builder()
.prompt(promptTemplate)
.llm(mockModel)
.build();
String result = chain.execute(Map.of("question", "test")).orElseThrow();
assertEquals("mocked answer", result);
}
13.2 Evaluation Framework
import dev.langchain4j.evaluation.Evaluator;
import dev.langchain4j.evaluation.qa.QAEvaluator;
QAEvaluator evaluator = QAEvaluator.builder()
.chatLanguageModel(model) // use GPT-4 as the judge
.build();
EvaluationResult result = evaluator.evaluate(
EvaluationInput.builder()
.referenceAnswer("Expected answer")
.actualAnswer("Generated answer")
.query("Question")
.build()
);
result.criteria().forEach(criteria -> {
log.info("{}: {}", criteria.name(), criteria.score());
});
Evaluation dimensions:
- Correctness: is the answer right?
- Faithfulness: grounded in the provided context, no hallucination
- Relevance: how relevant the answer is to the query
- Coherence: fluency and logical consistency
- Toxicity: harmful content
14. Production Best Practices
14.1 Configuration Management
@Configuration
public class LangChainConfig {
@Value("${langchain4j.openai.api-key}")
private String apiKey;
@Bean
public ChatLanguageModel chatLanguageModel() {
return OpenAiChatModel.builder()
.apiKey(apiKey)
.modelName(env.getProperty("langchain4j.openai.model", "gpt-4"))
.temperature(
env.getProperty("langchain4j.openai.temperature", Double.class, 0.7)
)
.callTimeout(30, TimeUnit.SECONDS)
.maxRetries(3)
.build();
}
}
14.2 Multi-environment Configuration
# application-dev.yml
langchain4j:
open-ai:
api-key: ${OPENAI_API_KEY_DEV}
model-name: gpt-3.5-turbo
# application-prod.yml
langchain4j:
open-ai:
api-key: ${OPENAI_API_KEY_PROD}
model-name: gpt-4-turbo
max-retries: 5
14.3 Health Checks
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
@Component
public class LangChainHealthIndicator implements HealthIndicator {
@Autowired
private ChatLanguageModel model;
@Override
public Health health() {
try {
// a simple probe call
model.generate("test");
return Health.up().build();
} catch (Exception e) {
return Health.down(e).build();
}
}
}
14.4 Exposing Metrics (Micrometer)
@Bean
public MeterRegistryCustomizer<MeterRegistry> langchainMetrics() {
return registry -> {
Counter.builder("langchain.requests.total")
.description("Total LLM requests")
.register(registry);
Timer.builder("langchain.request.duration")
.description("LLM request duration")
.register(registry);
};
}
15. FAQ and Solutions
Q1: How do I choose a model?
- Best quality: GPT-4, Claude 3 Opus
- Best value: GPT-4o mini, Claude 3 Haiku
- Chinese-optimized: domestic LLMs (Tongyi, Wenxin, ChatGLM) via OpenAI-compatible APIs
- Free to self-host: Llama 3, Mistral (local via Ollama)
Q2: How do I handle long contexts?
- Use a long-context model (GPT-4-32k, Claude 200k)
- Chunked retrieval (RAG)
- Use TokenWindowChatMemory to cap memory size
- Summarization to compress history
Q3: How do I reduce token consumption?
- Trim prompts (remove redundancy)
- Cache common results
- Use smaller models for simple tasks
- Cap response length (maxTokens)
- Return only the top-relevant chunks when retrieving
Q4: How do I debug prompts?
import dev.langchain4j.prompt.PromptTemplate;
PromptTemplate prompt = PromptTemplate.from("{{input}}");
String rendered = prompt.apply(Map.of("input", "test"));
log.debug("Prompt: {}", rendered); // inspect the prompt actually sent
Q5: How do I stream responses over WebSocket?
@ServerEndpoint("/chat/stream")
public class ChatWebSocket {
@OnMessage
public void onMessage(String message, Session session) {
streamingModel.chat(
ChatRequest.builder()
.messages(UserMessage.from(message))
.build(),
token -> {
session.getBasicRemote().sendText(token);
return true;
}
);
}
}
Q6: How do I handle exceptions?
try {
String response = model.generate(prompt);
} catch (Exception e) {
if (e instanceof RateLimitException) {
// rate limited: wait and retry
} else if (e instanceof InvalidRequestException) {
// bad request: log and degrade gracefully
return fallbackResponse();
} else if (e instanceof RuntimeException) {
// network timeout, etc.
throw new ServiceUnavailableException("LLM service down", e);
}
}
Q7: How do I ensure thread safety?
- ChatLanguageModel is stateless and thread-safe
- ChatMemory is not thread-safe; use a separate instance per session
- PromptTemplate is immutable and thread-safe
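Since memory instances must not be shared across sessions, a common pattern is a thread-safe registry that lazily creates one memory per session id. A stdlib sketch (SessionMemoryRegistry is hypothetical, with List<String> standing in for ChatMemory):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SessionMemoryRegistry {
    // One memory per session id; the registry is safe to share across threads,
    // and each per-session list is wrapped so its own mutations are synchronized
    private final Map<String, List<String>> memories = new ConcurrentHashMap<>();

    public void add(String sessionId, String message) {
        List<String> memory = memories.computeIfAbsent(
                sessionId, id -> Collections.synchronizedList(new ArrayList<>()));
        memory.add(message);
    }

    // Return an immutable snapshot so callers cannot mutate the session state
    public List<String> messages(String sessionId) {
        return List.copyOf(memories.getOrDefault(sessionId, List.of()));
    }
}
```

computeIfAbsent guarantees that two concurrent first requests for the same session still end up sharing a single memory instance.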
Q8: How do I support multi-tenancy?
public class TenantAwareChatService {
public String chat(String tenantId, String message) {
// 1. pick a model per tenant (different quotas)
ChatLanguageModel model = modelRegistry.getModelForTenant(tenantId);
// 2. tenant-scoped memory (include the tenant in the Redis key)
ChatMemory memory = getMemoryForTenant(tenantId);
// 3. tenant permission checks (tool filtering)
List<Tool> allowedTools = getToolsForTenant(tenantId);
// 4. audit logging
auditLog(tenantId, message);
return model.chat(...);
}
}
Q9: How do I evaluate RAG quality?
- Human evaluation: random sampling and scoring
- Automated evaluation: GPT-4 as judge (Relevancy, Faithfulness)
- A/B testing: compare chunking strategies
- User feedback: collect 👍/👎 votes
Q10: Which metrics should production monitoring cover?
- Business: QPS, latency (P50/P95/P99), error rate, user satisfaction
- Model: token usage, cost, per-model traffic share, rate-limit hits
- Vector store: query latency, QPS, memory/CPU, hit rate
- Application: JVM memory, GC counts, thread pool state, DB connections
16. Summary
Core strengths of LangChain4J
- Feature-complete: covers the full LLM application space (Prompt, Chain, Agent, Memory, RAG)
- Java-native: type safety, compile-time checks, enterprise-grade
- Production-ready: async, streaming, monitoring, retries, caching
- Multi-model: OpenAI, Azure, Anthropic, local Ollama, and more
- Spring integration: Spring Boot starter, auto-configuration
Versus the alternatives
- LangChain (Python): same feature set and a more mature ecosystem, but the Java port trails behind it
- Spring AI: simpler, deep Spring integration, good for quick RAG
- LlamaIndex: RAG-focused with stronger retrieval; can serve as a Retriever component alongside LangChain4J
Suggested learning path
- Getting started (1-2 days): PromptTemplate + LLMChain + simple RAG
- Intermediate (1 week): Agents + Tools + Memory + complex chains
- Advanced (1 month): custom components, production deployment, performance tuning
- Expert (months): extending the framework, reading the source
Good fit for
- Java shops: backend services, microservices, enterprise applications
- Existing Spring Boot apps: seamless integration
- Complex agent logic: tool calling, multi-step reasoning
- Demanding production environments: monitoring, stability, maintainability
Not recommended when
- Simple RAG: Spring AI or LlamaIndex is simpler
- Python teams: use LangChain Python directly
- Hard real-time stream processing: use a dedicated streaming framework (Flink)
Document version: v1.0
Last updated: March 2025
Reference: https://langchain4j.dev