✅ Install the Python HTTP request library:

pip install requests

If you plan to use frameworks such as LangChain or LlamaIndex, install them as well:

pip install langchain langchain-community llama-index llama-index-llms-ollama

Make sure Ollama is already running locally. You can verify it by running a model:

ollama run llama2

Or check that the server is listening:

curl http://localhost:11434
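
If the server is up, that curl returns "Ollama is running". You can run the same check from Python; a minimal sketch, where the helper name and timeout are our own choices:

import requests

def ollama_is_up(base_url="http://localhost:11434"):
    # The Ollama server answers GET / with a plain-text status message
    try:
        return requests.get(base_url, timeout=2).ok
    except requests.ConnectionError:
        return False

print(ollama_is_up())  # True if the server is listening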


🧪 2. Basic usage: calling the REST API directly

📥 Example: a basic Q&A request

import requests

url = "http://localhost:11434/api/generate"
data = {
    "model": "llama2",
    "prompt": "What is quantum computing?",
    "stream": False  # return the full answer in a single JSON object
}

# Non-streaming responses carry the generated text in the "response" field
response = requests.post(url, json=data)
print(response.json()["response"])
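
The /generate endpoint also accepts sampling parameters through a nested "options" object. A short sketch; treat the chosen values as illustrative, since the exact set of supported keys depends on your Ollama version:

import requests

data = {
    "model": "llama2",
    "prompt": "What is quantum computing?",
    "stream": False,
    # Sampling options are passed in a nested "options" object
    "options": {
        "temperature": 0.2,   # lower = more deterministic output
        "num_predict": 256    # cap on the number of generated tokens
    }
}

response = requests.post("http://localhost:11434/api/generate", json=data)
print(response.json()["response"])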


🌊 3. Streaming output mode

Suitable for displaying text as it is generated (e.g., the typing effect in chat apps):

import json
import requests

url = "http://localhost:11434/api/generate"
data = {
    "model": "llama2",
    "prompt": "Tell me a corny joke",
    "stream": True
}

# With stream=True, each line of the response body is one JSON object
# containing the next chunk of text in its "response" field
with requests.post(url, json=data, stream=True) as r:
    for line in r.iter_lines():
        if line:
            chunk = json.loads(line)
            print(chunk.get("response", ""), end="", flush=True)
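
If you also need the full answer afterwards, collect the chunks as they arrive; the final streamed object carries "done": true. A sketch along those lines:

import json
import requests

url = "http://localhost:11434/api/generate"
data = {"model": "llama2", "prompt": "Tell me a corny joke", "stream": True}

full_text = []
with requests.post(url, json=data, stream=True) as r:
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        full_text.append(chunk.get("response", ""))
        if chunk.get("done"):  # the last object signals the end of the stream
            break

print("".join(full_text))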


💬 4. The chat endpoint: conversations with context

import requests

url = "http://localhost:11434/api/chat"
data = {
    "model": "llama2",
    "messages": [
        {"role": "system", "content": "You are a programming expert"},
        {"role": "user", "content": "What is a Python decorator?"},
        {"role": "assistant", "content": "A decorator is a function..."},
        {"role": "user", "content": "Can you give me an example?"}
    ],
    "stream": False
}

# The chat endpoint returns the reply under message.content
response = requests.post(url, json=data)
print(response.json()["message"]["content"])


🧱 5. Wrapping it in a Python function

import requests

def ollama_chat(model, messages, stream=False):
    """Send a chat request to the local Ollama server and return the reply."""
    url = "http://localhost:11434/api/chat"
    payload = {
        "model": model,
        "messages": messages,
        "stream": stream
    }
    # Note: this helper only handles non-streaming responses;
    # for stream=True, parse line-delimited JSON as in section 3
    res = requests.post(url, json=payload)
    res.raise_for_status()
    return res.json()["message"]["content"]

# Usage example
history = [
    {"role": "system", "content": "You are a travel consultant"},
    {"role": "user", "content": "Recommend a European country to visit in April"}
]

reply = ollama_chat("llama2", history)
print(reply)
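
Because the chat endpoint is stateless, context is kept by appending each reply to the history you send with the next request. A short continuation of the example above (the follow-up question is just an illustration):

# Continue the conversation: append the model's reply, then ask a follow-up
history.append({"role": "assistant", "content": reply})
history.append({"role": "user", "content": "What is the weather like there in April?"})

follow_up = ollama_chat("llama2", history)
print(follow_up)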


🔌 6. Integration with LangChain (local inference)

from langchain_community.llms import Ollama

llm = Ollama(model="llama2", base_url="http://localhost:11434")
# invoke() is the current call style; calling llm(...) directly is deprecated
print(llm.invoke("Describe artificial intelligence in one sentence"))
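
The same Ollama LLM plugs into LangChain's composition syntax. A minimal sketch piping a prompt template into the model; the template text and the "topic" variable name are arbitrary choices:

from langchain_core.prompts import PromptTemplate
from langchain_community.llms import Ollama

prompt = PromptTemplate.from_template("Explain {topic} in one sentence.")
llm = Ollama(model="llama2", base_url="http://localhost:11434")

# PromptTemplate | LLM composes a runnable chain (LCEL)
chain = prompt | llm
print(chain.invoke({"topic": "artificial intelligence"}))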


🔄 7. Integration with LlamaIndex (RAG + local model)

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.ollama import Ollama  # requires the llama-index-llms-ollama package

llm = Ollama(model="llama2")
documents = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(llm=llm)

response = query_engine.query("Find the historical background in the documents")
print(response)
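
Note that building the index also requires an embedding model, which defaults to OpenAI's API. For a fully local pipeline, you can route embeddings through Ollama as well; a sketch assuming the llama-index-embeddings-ollama package is installed and a nomic-embed-text model has been pulled:

from llama_index.core import Settings, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama

# Both generation and embeddings now run locally via Ollama
Settings.llm = Ollama(model="llama2")
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")

documents = SimpleDirectoryReader("docs").load_data()
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("Find the historical background"))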


📚 8. Example use cases

Use case                     Method                         Notes
Simple Q&A / writing tool    requests + /generate           Build a CLI or GUI writing assistant
Multi-turn Q&A               requests + /chat               Multi-turn dialogue with context
Multi-document Q&A           LlamaIndex + Ollama            Q&A over PDF/Markdown documents
Web chat framework           FastAPI / Flask + requests     Expose a Web API service (see the sketch below)
LangChain toolchain          langchain_community.llms       Agents, RAG, prompt engineering, etc.
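
The "Web chat framework" row can be sketched in a few lines: a minimal FastAPI service that forwards a prompt to the local Ollama server. The /ask path and field names are our own choices, not part of any API:

import requests
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Ask(BaseModel):
    prompt: str

@app.post("/ask")
def ask(body: Ask):
    # Forward the prompt to the local Ollama server and relay the answer
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama2", "prompt": body.prompt, "stream": False},
    )
    return {"answer": r.json()["response"]}

# Run with: uvicorn app:app --reload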

✅ Summary

  • Ollama exposes a lightweight REST API, making Python integration straightforward
  • Combined with LangChain and LlamaIndex, it supports building complex AI applications
  • Everything runs locally: no internet connection required, secure and efficient