Fetching Embeddings over a URL with Python

`requests`

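The requirement is roughly this: internal data should not be uploaded to any third-party website, so I wanted to deploy both the embedding model and the LLM locally.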

So I used llama-server to serve both BGE-m3 and Deepseek-R1-Distill-Llama-8B. The former is exposed as the embedding model at http://localhost:8081, and the latter as the LLM at http://localhost:8080.

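That left one small problem: how do I fetch an embedding from Python? My solution was simply to send the request with the requests library. Fortunately, the llama-server provided by llama.cpp is compatible with the OpenAI API.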

import requests

embedding_url = "http://localhost:8081/v1/embeddings"
# OpenAI-compatible embeddings endpoint
api_key = "not_used"
# The server is deployed locally, so no API key was configured
data = {
    "input": "the text to embed",
    "model": "BGE-m3",  # name of the locally deployed model
}
headers = {
    "Authorization": f"Bearer {api_key}",
    "content-type": "application/json",
}

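result = requests.post(
    embedding_url,
    json=data,  # send the dict as a JSON body; str(data) gives a Python repr with single quotes, which is not valid JSON
    headers=headers,  # optional here, since no API key is configured on the local server
    timeout=30,
)
result.raise_for_status()  # fail early on an HTTP error instead of a confusing KeyError below

embedding = result.json()["data"][0]["embedding"]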
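
The request above embeds a single string. As a rough sketch of how this could be wrapped for reuse, the snippet below sends several texts in one call and compares two of the resulting vectors. It assumes the server accepts a list for `input` and returns one entry per text with an `index` field, as the OpenAI embeddings format does; I have not verified this against every llama-server version. The names `parse_embeddings`, `embed` and `cosine_similarity` are hypothetical helpers made up for this example, not part of requests or llama.cpp.

```python
import math

EMBEDDING_URL = "http://localhost:8081/v1/embeddings"  # endpoint used earlier in this post


def parse_embeddings(payload: dict) -> list[list[float]]:
    """Extract vectors from an OpenAI-style response, ordered by their `index` field."""
    # Assumption: each item carries an `index`; if it is missing, the original order is kept.
    items = sorted(payload["data"], key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


def embed(texts: list[str], model: str = "BGE-m3", timeout: float = 30.0) -> list[list[float]]:
    """Request embeddings for several texts in a single HTTP call (hypothetical helper)."""
    import requests  # imported lazily so the pure helpers work without the dependency

    response = requests.post(
        EMBEDDING_URL,
        json={"input": texts, "model": model},  # assumption: list input is accepted
        timeout=timeout,
    )
    response.raise_for_status()
    return parse_embeddings(response.json())


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

With the local server running, something like `vectors = embed(["first text", "second text"])` followed by `cosine_similarity(vectors[0], vectors[1])` would give a similarity score for the two texts.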