How to Host a Large Language Model Locally

Jun 5, 2023

Author: 区块新看点 (chengwf.com), WEB3 DID: link3.to/chengwf88888

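Large language models (LLMs) are a subset of artificial intelligence (AI) designed to understand and generate natural language. They are trained on large amounts of text data and use complex algorithms to learn the patterns of language. LLMs have been used to build a range of applications, from chatbots to language translation services. In this article, we discuss how to set up an LLM on a single computer by fine-tuning a pre-trained model, and we provide source code examples.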

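Prerequisites

Before we begin, make sure your computer meets the following requirements:

A powerful computer with at least 16 GB of RAM and 1 TB of free storage
A modern GPU with at least 8 GB of video memory, such as an NVIDIA GeForce or AMD Radeon
A Linux operating system, such as Ubuntu or CentOS
Basic knowledge of the command-line interface (CLI) and the Python programming language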
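
The RAM and storage requirements above can be checked with a few lines of Python. This is a minimal sketch that assumes a Linux system; the function names and the slightly relaxed RAM threshold are illustrative choices, not part of the original article, and the GPU check is left out because it depends on the vendor's tools.

```python
import os
import shutil

# Thresholds taken from the prerequisites list. The RAM threshold is set a
# little below 16 (an assumption) because a nominal 16 GB machine usually
# reports slightly less usable memory.
MIN_RAM_GB = 15
MIN_DISK_GB = 1000  # 1 TB


def total_ram_gb():
    """Return total physical RAM in GiB (Linux, via sysconf)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 1024**3


def free_disk_gb(path="."):
    """Return the free space, in GiB, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / 1024**3


def check_requirements(ram_gb, disk_gb):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB found, {MIN_RAM_GB} GB needed")
    if disk_gb < MIN_DISK_GB:
        problems.append(f"Disk: {disk_gb:.1f} GB free, {MIN_DISK_GB} GB needed")
    return problems


if __name__ == "__main__":
    found = check_requirements(total_ram_gb(), free_disk_gb())
    for line in found or ["All checks passed"]:
        print(line)
```

Separating the measurement functions from check_requirements keeps the comparison logic easy to test with made-up numbers.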

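Step 1: Install Anaconda

Anaconda is a popular open-source platform for data science and machine learning. It includes a variety of tools and libraries that we will use to create the LLM. To install Anaconda, follow these steps:

Download the Anaconda installer from the official website: https://www.anaconda.com/products/individual
Open a terminal and navigate to the directory where the installer was downloaded
Run the installer by typing the following command: bash Anaconda3-2021.11-Linux-x86_64.sh
Follow the on-screen instructions to complete the installation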

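Step 2: Create a Python Environment

We will create a Python environment in which to install the libraries and dependencies the LLM needs. To create the environment, follow these steps:

Open a terminal and enter the following command: conda create --name lm python=3.8
Activate the environment by entering the following command: conda activate lm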
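
To confirm that the environment is active, a short script can compare the running interpreter with what the two commands above should have produced. This is a sketch: it relies on the CONDA_DEFAULT_ENV variable that conda sets on activation, and the function name environment_problems is hypothetical.

```python
import os
import sys


def environment_problems(expected_env="lm", expected_python=(3, 8),
                         environ=None, version=None):
    """Return a list of mismatches between the current and expected setup."""
    environ = os.environ if environ is None else environ
    version = sys.version_info[:2] if version is None else tuple(version)

    problems = []
    # conda exports the name of the active environment in CONDA_DEFAULT_ENV.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(f"active conda environment is {active!r}, "
                        f"expected {expected_env!r}")
    if version != tuple(expected_python):
        problems.append(f"Python {version[0]}.{version[1]} is running, "
                        f"expected {expected_python[0]}.{expected_python[1]}")
    return problems


if __name__ == "__main__":
    for line in environment_problems() or ["Environment looks correct"]:
        print(line)
```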

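Step 3: Install TensorFlow

TensorFlow is an open-source platform for building and training machine learning models. We will use TensorFlow to create the LLM. To install TensorFlow, follow these steps:

Enter the following command in the terminal: pip install tensorflow
Verify that TensorFlow is installed by importing it in Python: import tensorflow as tf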
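
The import check can be extended to also report the installed version and whether TensorFlow sees a GPU. The following is a small sketch; the function name tensorflow_summary is illustrative, and it returns None instead of failing when TensorFlow is not installed.

```python
import importlib


def tensorflow_summary():
    """Return TensorFlow's version and visible GPUs, or None if not installed."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return None
    # list_physical_devices reports the devices TensorFlow can use.
    gpus = tf.config.list_physical_devices("GPU")
    return {"version": tf.__version__, "gpus": [gpu.name for gpu in gpus]}


if __name__ == "__main__":
    summary = tensorflow_summary()
    if summary is None:
        print("TensorFlow is not installed in this environment")
    else:
        print("TensorFlow", summary["version"])
        print("GPUs:", summary["gpus"] or "none detected")
```

An empty GPU list usually means training will fall back to the CPU, which is far slower for a model of this size.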

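Step 4: Download a Pre-trained LLM

Training an LLM from scratch requires a great deal of computing power and time. Fortunately, pre-trained models are available that we can fine-tune for our specific use case. One of the most popular pre-trained LLMs is GPT-2 (Generative Pre-trained Transformer 2). To download the pre-trained GPT-2 model, follow these steps:

Open a terminal and enter the following command: git clone https://github.com/openai/gpt-2.git
Navigate to the gpt-2 directory by typing: cd gpt-2
Download the pre-trained model by typing the following command: python download_model.py 117M (current versions of the repository refer to this smallest model as 124M)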
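
After the download finishes, it is worth confirming that the model directory is complete before moving on. The sketch below assumes the model was saved under models/117M inside the gpt-2 directory and that the download consists of the seven files listed in EXPECTED_FILES; both the helper names and that file list are assumptions to verify against your copy of the repository.

```python
import json
import os

# Files the download script is assumed to fetch for each model.
EXPECTED_FILES = [
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
]


def missing_model_files(model_dir):
    """Return the expected files that are absent from model_dir."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(model_dir, name))]


def load_hparams(model_dir):
    """Read the model's hyperparameters (for example n_ctx and n_vocab)."""
    with open(os.path.join(model_dir, "hparams.json"), "r", encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    model_dir = os.path.join("models", "117M")  # assumed download location
    missing = missing_model_files(model_dir)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Hyperparameters:", load_hparams(model_dir))
```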

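Step 5: Fine-tune the Pre-trained LLM

Fine-tuning a pre-trained LLM means training the model on a specific dataset for a specific task. In our example, we will fine-tune the pre-trained GPT-2 model on a text generation task. To fine-tune the model, follow these steps:

Create a new directory for the fine-tuned model by typing the following command: mkdir my_model
Navigate to the my_model directory by typing: cd my_model
Create a new Python file by typing the following command: touch train.py
Open the train.py file in a text editor and paste in the following code: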

import argparse
import json
import os

import tensorflow as tf

# Define the command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--dataset_path", type=str, required=True, help="Path to the dataset")
parser.add_argument("--model_path", type=str, required=True, help="Path to the pre-trained model")
parser.add_argument("--output_path", type=str, required=True, help="Path to save the fine-tuned model")
parser.add_argument("--batch_size", type=int, default=16, help="Batch size for training")
parser.add_argument("--epochs", type=int, default=1, help="Number of epochs to train for")
args = parser.parse_args()

# Load the hyperparameters of the pre-trained GPT-2 model
with open(os.path.join(args.model_path, "hparams.json"), "r") as f:
    hparams = json.load(f)


def model_fn(features, labels, mode, params):
    # Placeholder: the article does not show model_fn. It must build the GPT-2
    # graph and return a tf.estimator.EstimatorSpec for the given mode.
    raise NotImplementedError("Supply a GPT-2 model_fn before training")


model = tf.compat.v1.estimator.Estimator(
    model_fn=model_fn,
    model_dir=args.output_path,
    params=hparams,
    config=tf.compat.v1.estimator.RunConfig(
        save_checkpoints_steps=5000,
        keep_checkpoint_max=10,
        save_summary_steps=5000,
    ),
)


# Define the input function for the dataset
def input_fn(mode):
    dataset = tf.data.TextLineDataset(args.dataset_path)
    # Truncate each line to at most n_ctx bytes
    dataset = dataset.map(lambda x: tf.strings.substr(x, 0, hparams["n_ctx"]))
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.batch(args.batch_size)
    # No repeat(): each model.train() call reads the dataset once,
    # so one call corresponds to one epoch and the loop below terminates.
    return dataset


# Define the training function
def train():
    for epoch in range(args.epochs):
        model.train(input_fn=lambda: input_fn(tf.estimator.ModeKeys.TRAIN))
        print(f"Epoch {epoch + 1} completed.")


# Start the training
train()

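In the code above, we define command-line arguments for the dataset path, the pre-trained model path, the output path, the batch size, and the number of epochs to train for. We then load the pre-trained GPT-2 model's hyperparameters, build the estimator, and define the input function for the dataset. Finally, we define the training function and start training. Note that the listing does not implement model_fn; an Estimator model function that builds the GPT-2 graph has to be supplied before the script will run.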
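
Because the input function reads the dataset with tf.data.TextLineDataset, the training file needs one example per line. The following sketch shows one way to prepare such a file from a list of strings; the helper name write_line_dataset and the file name data.txt are illustrative assumptions.

```python
def write_line_dataset(examples, path):
    """Write one training example per line and return how many were written.

    Internal line breaks and runs of whitespace are collapsed to single
    spaces so that each example occupies exactly one line.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            line = " ".join(example.split())
            if line:  # skip examples that are empty after cleaning
                f.write(line + "\n")
                count += 1
    return count


if __name__ == "__main__":
    samples = ["The first training\nexample.", "   ", "A second   example."]
    print(write_line_dataset(samples, "data.txt"), "examples written")
```

With the file in place, training could be started with a command such as python train.py --dataset_path data.txt --model_path ../models/117M --output_path ./output, assuming my_model was created inside the gpt-2 directory as in the steps above.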

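Step 6: Generate Text with the Fine-tuned LLM

Once the LLM has been fine-tuned, we can use it to generate text. To generate text, follow these steps:

Open a terminal and navigate to the my_model directory by typing: cd my_model
Create a new Python file by typing the following command: touch generate.py
Open the generate.py file in a text editor and paste in the following code: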

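import argparse
import json
import os

import numpy as np
import tensorflow as tf

# Define the command-line arguments.
# NOTE: the start of this script was cut off in the source. The argument names
# are inferred from how args is used below; the defaults are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument("--model_path", type=str, required=True, help="Path to the fine-tuned model")
parser.add_argument("--length", type=int, default=100, help="Number of tokens to generate")
parser.add_argument("--temperature", type=float, default=1.0, help="Sampling temperature")
args = parser.parse_args()

# Load the fine-tuned model
with open(os.path.join(args.model_path, "hparams.json"), "r") as f:
    hparams = json.load(f)


def model_fn(features, labels, mode, params):
    # Placeholder: the article does not show model_fn. In PREDICT mode it must
    # return an EstimatorSpec whose predictions include a "logits" entry.
    raise NotImplementedError("Supply a GPT-2 model_fn before generating")


# Placeholder: the article does not show how tokenizer is created. It must
# provide convert_tokens_to_ids() and convert_ids_to_tokens().
tokenizer = None

model = tf.compat.v1.estimator.Estimator(
    model_fn=model_fn, model_dir=args.model_path, params=hparams
)


# Define the generation function
def generate_text(length, temperature):
    # Assumes the tokenizer knows this token; GPT-2's stock vocabulary
    # only defines <|endoftext|>.
    start_token = "<|startoftext|>"
    tokens = tokenizer.convert_tokens_to_ids([start_token])
    while len(tokens) < length:
        # Feed at most the last n_ctx tokens as a batch holding one sequence
        prediction_input = np.array(tokens[-hparams["n_ctx"]:], dtype=np.int32)[None, :]
        predictions = model.predict(
            input_fn=lambda: tf.data.Dataset.from_tensors(prediction_input)
        )
        output = next(predictions)["logits"]
        # Temperature-scaled softmax over the logits of the last position
        logits = output[-1] / temperature
        logits = logits - np.max(logits)
        probs = np.exp(logits) / np.sum(np.exp(logits))
        token = int(np.random.choice(hparams["n_vocab"], p=probs))
        tokens.append(token)
    output_text = tokenizer.convert_ids_to_tokens(tokens)
    # "▁" is a SentencePiece-style word-boundary marker; GPT-2's own BPE
    # vocabulary marks a leading space with "Ġ" instead.
    output_text = "".join(output_text).replace("▁", " ")
    output_text = output_text.replace(start_token, "")
    return output_text


# Generate text
text = generate_text(args.length, args.temperature)
print(text)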

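In the code above, we define command-line arguments for the path to the fine-tuned model, the length of the generated text, and the sampling temperature. We then load the fine-tuned model and define the generation function. Finally, we call generate_text to generate text and print the result. Note that the listing does not implement model_fn or the tokenizer object; both have to be supplied before the script will run.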
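
The sampling step inside generate_text can be studied on its own, without a model. The sketch below reimplements it with only the standard library, so the function names and the example logits are illustrative rather than part of the original script. It divides the logits by the temperature, applies a numerically stable softmax, and draws one token index from the resulting distribution.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Draw one token index according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    example_logits = [2.0, 1.0, 0.0]
    for t in (0.5, 1.0, 2.0):
        rounded = [round(p, 3) for p in softmax_with_temperature(example_logits, t)]
        print(f"temperature={t}: {rounded}")
    print("sampled index:", sample_token(example_logits, 1.0))
```

A temperature below 1 concentrates probability on the highest-scoring tokens and makes the output more predictable, while a temperature above 1 flattens the distribution and makes the output more varied.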

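Conclusion

In this article, we learned how to use TensorFlow and the GPT-2 architecture to create a large language model (LLM) on a single computer. We first installed TensorFlow and downloaded the GPT-2 code from the OpenAI GitHub repository. We then fine-tuned the pre-trained GPT-2 model on a dataset. Finally, we used the fine-tuned LLM to generate text.

By following the steps in this article, you can create your own LLM and generate text for a variety of tasks, such as language translation, chatbots, and content generation.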

CC BY-NC-ND 2.0
