🤗 LLM Suggestions in Argilla with Hugging Face Inference Endpoints

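We are excited to introduce suggestions in Argilla powered by Hugging Face Inference Endpoints! Starting with Argilla v1.13.0, anyone can add suggestions to the records of a Feedback Dataset with just a few lines of code. This turns the annotation task into a quick validate-and-correct process, which reduces the time needed to produce high-quality datasets.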

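Hugging Face's Inference Endpoints make it easier than ever to serve any ML model on the Hub. You only need to select the model to serve, your preferred cloud provider and region, and the instance type to use. Within minutes, you can have a running inference endpoint.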

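Thanks to Argilla's integration with the Hugging Face Spaces templates, announced earlier in 🚀 Launching Argilla on Hugging Face Spaces, you can also have an Argilla instance up and running in a few clicks. This lets you keep the whole workflow within the Hugging Face ecosystem.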

In this post, we will show how to set up an Argilla instance in Hugging Face Spaces, deploy a Hugging Face Inference Endpoint to serve Llama 2 7B Chat, and integrate it with Argilla to add suggestions to an Argilla dataset.

With fewer than 10 lines of code, you can use Hugging Face Inference Endpoints to automatically add LLM-powered suggestions to the records of an Argilla dataset!

🚀 Deploy Argilla in Spaces

You can self-host Argilla using one of the many deployment options, sign up for Argilla Cloud, or launch an Argilla instance on Hugging Face Spaces with the one-click deployment button.

🚀 Deploy the Llama 2 Inference Endpoint

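Now we can set up the Hugging Face Inference Endpoint. It lets us easily serve any model on dedicated, fully managed infrastructure, while keeping costs low through a secure, compliant, and flexible production solution.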

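As mentioned above, we will use the 7B-parameter variant of Llama 2 in Hugging Face's format, fine-tuned for chat completion. You can find this model at meta-llama/llama-2-7b-chat-hf. Other variants are also available on the Hugging Face Hub at https://hugging-face.cn/meta-llama.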

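Note: At the time of writing, to use Llama 2 you must first visit Meta's website and accept their license terms and acceptable use policy. Only then can you request access to the Llama 2 models through Meta's Llama 2 organization on the Hugging Face Hub.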

First, we need to make sure the Inference Endpoint is up and running. Once we have retrieved its URL, we can start sending requests to it.

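Before sending requests to the Inference Endpoint, we need to know which system prompt to use and how to format our prompts. Since we are using meta-llama/llama-2-7b-chat-hf, we need to look up the prompt format that was used to fine-tune it and replicate that format when sending inference requests. For more information on Llama 2, see the Hugging Face Blog post Llama 2 is here - get it on Hugging Face.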

system_prompt = (
    "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible,"
    " while being safe. Your answers should not include any harmful, unethical, racist, sexist,"
    " toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased"
    " and positive in nature.\nIf a question does not make any sense, or is not factually coherent,"
    " explain why instead of answering something not correct. If you don't know the answer to a"
    " question, please don't share false information."
)
base_prompt = "<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{prompt} [/INST]"
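To see what the endpoint will actually receive, the template can be filled in for a sample instruction. The following is only a minimal sketch: `build_prompt` and the sample instruction are hypothetical names introduced here for illustration, and the system prompt is shortened for brevity.

```python
# Shortened system prompt, for illustration only; use the full one in practice.
system_prompt = "You are a helpful, respectful and honest assistant."
base_prompt = "<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{prompt} [/INST]"


def build_prompt(prompt: str) -> str:
    """Wrap a user instruction in the Llama 2 chat template (hypothetical helper)."""
    return base_prompt.format(system_prompt=system_prompt, prompt=prompt)


# Hypothetical sample instruction; any record's "prompt" field would go here.
formatted = build_prompt("Write a haiku about data labeling.")
print(formatted)
```

Printing the formatted string is a quick way to check that the `<<SYS>>` block and the `[INST]` markers end up where the chat model expects them before any request is sent.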

Once the prompts are defined, we are ready to instantiate the InferenceClient from huggingface_hub, which we will later use to send requests to the Inference Endpoint through its text_generation method.

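The following code snippet shows how to retrieve an existing Feedback Dataset from our Argilla instance, and how to use the InferenceClient from huggingface_hub to send requests to the deployed Inference Endpoint in order to add suggestions to the records in the dataset.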

import argilla as rg
from huggingface_hub import InferenceClient

# Connect to the Argilla instance and load the existing dataset
rg.init(api_url="<ARGILLA_SPACE_URL>", api_key="<ARGILLA_OWNER_API_KEY>")
dataset = rg.FeedbackDataset.from_argilla("<ARGILLA_DATASET>", workspace="<ARGILLA_WORKSPACE>")

# Client pointing at the deployed Inference Endpoint
client = InferenceClient("<HF_INFERENCE_ENDPOINT_URL>", token="<HF_TOKEN>")

system_prompt = (
    "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible,"
    " while being safe. Your answers should not include any harmful, unethical, racist, sexist,"
    " toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased"
    " and positive in nature.\nIf a question does not make any sense, or is not factually coherent,"
    " explain why instead of answering something not correct. If you don't know the answer to a"
    " question, please don't share false information."
)
base_prompt = "<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{prompt} [/INST]"


def generate_response(prompt: str) -> str:
    # Wrap the raw prompt in the Llama 2 chat template before sending it
    prompt = base_prompt.format(system_prompt=system_prompt, prompt=prompt)
    response = client.text_generation(
        prompt, details=True, max_new_tokens=512, top_k=30, top_p=0.9,
        temperature=0.2, repetition_penalty=1.02, stop_sequences=["</s>"],
    )
    return response.generated_text


# Attach one model-generated suggestion to every record in the dataset
for record in dataset.records:
    record.update(
        suggestions=[
            {
                "question_name": "response",
                "value": generate_response(prompt=record.fields["prompt"]),
                "type": "model",
                "agent": "llama-2-7b-hf-chat",
            },
        ],
    )
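Before writing suggestions to a live dataset, it can help to exercise the same logic without any network access. The sketch below is a dry run under stated assumptions: `FakeResponse`, `FakeClient`, and `build_suggestion` are hypothetical stand-ins introduced here (they are not part of Argilla or huggingface_hub) and mimic only the `text_generation(...).generated_text` shape used in the snippet above.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in exposing only the attribute the snippet reads."""
    generated_text: str


class FakeClient:
    """Hypothetical offline client; accepts and ignores generation parameters."""

    def text_generation(self, prompt: str, **kwargs) -> FakeResponse:
        return FakeResponse(generated_text=f"(stub reply, prompt was {len(prompt)} chars)")


def build_suggestion(client, prompt: str) -> dict:
    """Build the suggestion payload for one record (hypothetical helper)."""
    response = client.text_generation(prompt, details=True, max_new_tokens=512)
    return {
        "question_name": "response",
        "value": response.generated_text.strip(),
        "type": "model",
        "agent": "llama-2-7b-hf-chat",
    }


# Inspect the payload that would be attached to a record
suggestion = build_suggestion(FakeClient(), "What is Argilla?")
print(suggestion)
```

Swapping `FakeClient()` for the real client should leave the payload shape unchanged, so the dictionary can be inspected once offline and then reused in the update loop.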