
news 2025/11/4 5:30:57

Introduction to GPT-1

GPT-1 (Generative Pre-trained Transformer) is a model proposed by OpenAI in 2018 that combines pre-training and fine-tuning to handle text understanding and text generation tasks. It is built on the Transformer architecture and introduces the following innovations:

- Transfer learning for NLP: with very little task-specific data, a pre-trained model can perform well on concrete downstream tasks.
- Language modeling as the pre-training task: the model is trained with unsupervised learning on a large text corpus.
- Fine-tuning for specific tasks: the pre-trained model is adapted to supervised tasks.

Like BERT, GPT-1 follows the pre-train + fine-tune approach: it is first pre-trained on a large amount of unlabeled text and then fine-tuned on a small amount of labeled data. GPT-1 differs from BERT, however, in its pre-training task and in its model structure.

During pre-training, GPT-1 learns to predict the next token given all the tokens seen so far. This task is called "autoregressive language modeling". A simple example:

- Input sequence: "The sun rises in the"
- Original sentence in the training data: "The sun rises in the east"
- Target output: "east"

The input sequence is fed to the GPT model, which predicts a probability distribution over the vocabulary for the next token. The correct token "east" acts as a "pseudo-label" that supervises training.

Model architecture

GPT mainly uses the Transformer Decoder architecture. Because there is no Encoder, the Multi-Head Attention layer that computes attention scores between the Encoder and the Decoder is removed from the Transformer Decoder block.

Masked Multi-Head Self-Attention

Masked Multi-Head Self-Attention (MMSA) is a variant of Multi-Head Attention. The main difference is its masking mechanism, which prevents the model from "cheating" by looking at future tokens. A mask is applied to the attention score matrix so that the current token can only attend to itself and to the tokens before it in the sequence. The scores of future tokens are set to negative infinity (in practice, a very large negative number) before the Softmax step, so that their attention weights after Softmax are exactly 0.

Why is the masking mechanism so important? In an autoregressive task the model must generate tokens one at a time, from left to right, and must not use future information when predicting the next token.

Loss function

GPT uses Cross-Entropy Loss as its loss function. Cross-entropy is a good fit for this task because it penalizes incorrect predictions by measuring the distance between the predicted probability distribution and the true distribution. It is naturally suited to multi-class classification, where the model selects one token out of a large vocabulary.

Model input

As with BERT, the input to GPT-1 is a sentence or a sentence pair, with special tokens added:

- [BOS]: marks the start of the sequence (written as [START] in the paper) and is added at the very beginning.
- [EOS]: marks the end of the sequence (written as [EXTRACT] in the paper) and is added at the very end. For classification tasks, the output corresponding to this special token is fed into the output layer; one can think of this token as learning the semantic information of the whole sentence.
- [SEP]: separates the two sentences in a sentence pair.

GPT embeddings likewise fall into three kinds: Token Embedding, Position Embedding, and Segment Embedding.

GPT-1 model parameters

Model architecture:

- 12 Transformer Decoder blocks
- hidden_size of 768 (the dimension of the model's input and output vectors)
- 12 attention heads
- FFN dimension of 3072
- Vocabulary size of 40000
- Sequence length of 512 (the context window length)

Training configuration:

- Adam optimizer with betas (0.9, 0.99)
- Learning rate: maximum of 2.5 × 10^-4, with 2000 warm-up steps, then annealed to 0 with a cosine schedule
- Batch size: 64
- Gradient clipping: 1.0
- Dropout rate: 0.1

Training process: 100,000 steps, taking about 30 days on 8 NVIDIA V100 GPUs. The model has 117M parameters in total. Weights are initialized from N(0, 0.02), and the weight decay is 0.01.

Downstream tasks

GPT unifies the template for downstream tasks under a generative formulation: the hidden state of the last token ([EOS] or [EXTRACT]) is passed to an additional output layer that predicts the classification label. The tasks include:

- Text classification (sentiment classification, news classification)
- Textual entailment (deciding whether a hypothesis follows from a premise)
- Semantic similarity between texts
- Multiple choice (choosing among several candidate continuations)

Fine-tuning GPT-1 for sentiment classification with MindSpore

    # Install the mindnlp 0.4.0 suite
    # !pip install mindnlp
    # !pip uninstall soundfile -y
    # !pip install download
    # !pip install jieba
    # !pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/2.3.1/MindSpore/unified/aarch64/mindspore-2.3.1-cp39-cp39-linux_aarch64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple

    import os

    import numpy as np
    import mindspore
    from mindspore.dataset import text, GeneratorDataset, transforms
    from mindspore import nn

    from mindnlp.dataset import load_dataset
    from mindnlp.engine import Trainer

    # Load the IMDB dataset
    imdb_ds = load_dataset('imdb', split=['train', 'test'])
    imdb_train = imdb_ds['train']
    imdb_test = imdb_ds['test']

    imdb_train.get_dataset_size()

    def process_dataset(dataset, tokenizer, max_seq_len=512, batch_size=4, shuffle=False):
        is_ascend = mindspore.get_context('device_target') == 'Ascend'

        def tokenize(text):
            # On Ascend, pad every sample to max_seq_len so shapes stay static
            if is_ascend:
                tokenized = tokenizer(text, padding='max_length', truncation=True, max_length=max_seq_len)
            else:
                tokenized = tokenizer(text, truncation=True, max_length=max_seq_len)
            return tokenized['input_ids'], tokenized['attention_mask']

        if shuffle:
            dataset = dataset.shuffle(batch_size)

        # Map the dataset: tokenize the text and cast the labels
        dataset = dataset.map(operations=[tokenize], input_columns="text", output_columns=['input_ids', 'attention_mask'])
        dataset = dataset.map(operations=transforms.TypeCast(mindspore.int32), input_columns="label", output_columns="labels")

        # Batch the dataset (pad per batch when not running on Ascend)
        if is_ascend:
            dataset = dataset.batch(batch_size)
        else:
            dataset = dataset.padded_batch(batch_size, pad_info={'input_ids': (None, tokenizer.pad_token_id),
                                                                 'attention_mask': (None, 0)})

        return dataset

    from mindnlp.transformers import OpenAIGPTTokenizer

    # Tokenizer
    gpt_tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')

    # Add special tokens, including <pad>
    special_tokens_dict = {
        "bos_token": "<bos>",
        "eos_token": "<eos>",
        "pad_token": "<pad>",
    }
    num_added_toks = gpt_tokenizer.add_special_tokens(special_tokens_dict)

    # To keep the walkthrough quick, use only one tenth of the original
    # training set for training and evaluation
    imdb_train, _ = imdb_train.split([0.1, 0.9], randomize=False)

    # Split the training set into train and validation sets
    imdb_train, imdb_val = imdb_train.split([0.7, 0.3])

    dataset_train = process_dataset(imdb_train, gpt_tokenizer, shuffle=True)
    dataset_val = process_dataset(imdb_val, gpt_tokenizer)
    dataset_test = process_dataset(imdb_test, gpt_tokenizer)

    # Load the GPT sequence classification model and set the number of classes to 2
    from mindnlp.transformers import OpenAIGPTForSequenceClassification
    from mindnlp import evaluate

    # Set up the GPT model for sequence classification with 2 output labels (binary classification).
    model = OpenAIGPTForSequenceClassification.from_pretrained('openai-gpt', num_labels=2)

    # Set the padding token ID in the model configuration to match the tokenizer's padding token ID.
    model.config.pad_token_id = gpt_tokenizer.pad_token_id

    # Resize the token embedding layer to account for the 3 added special tokens.
    model.resize_token_embeddings(model.config.vocab_size + 3)

    from mindnlp.engine import TrainingArguments

    # Define training arguments.
    training_args = TrainingArguments(
        output_dir="gpt_imdb_finetune",   # Directory to save model checkpoints and outputs.
        evaluation_strategy="epoch",      # Evaluate the model at the end of each epoch.
        save_strategy="epoch",            # Save model checkpoints at the end of each epoch.
        logging_strategy="epoch",         # Log metrics and progress at the end of each epoch.
        load_best_model_at_end=True,      # Load the best model (based on evaluation metrics) at the end of training.
        num_train_epochs=1.0,             # Number of training epochs (1 for quick experimentation).
        learning_rate=2e-5                # Learning rate for the optimizer.
    )

    # Load the accuracy metric for evaluation.
    metric = evaluate.load("accuracy")

    # Define a function to compute metrics during evaluation.
    def compute_metrics(eval_pred):
        logits, labels = eval_pred                 # Unpack predictions (logits) and true labels.
        predictions = np.argmax(logits, axis=-1)   # Convert logits to class predictions using argmax.
        return metric.compute(predictions=predictions, references=labels)  # Compute accuracy.

    # Initialize the Trainer with the model, training arguments, datasets, and metric function.
    trainer = Trainer(
        model=model,                      # The GPT model to be fine-tuned.
        args=training_args,               # Training configuration arguments.
        train_dataset=dataset_train,      # Training dataset (already preprocessed and tokenized).
        eval_dataset=dataset_val,         # Validation dataset for evaluation.
        compute_metrics=compute_metrics   # Metric computation function for evaluation.
    )

    # Start training
    trainer.train()

    # Evaluate on the test set
    trainer.evaluate(dataset_test)
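
As a supplement to the Masked Multi-Head Self-Attention description, here is a minimal sketch of how a causal mask is usually applied. It is plain Python for a single head and a tiny score matrix, not the MindNLP implementation; the function name `causal_softmax` and the score values are made up for illustration. Future positions are set to negative infinity before Softmax, which is what makes their weights come out as 0.

```python
import math

def causal_softmax(scores):
    """Apply a causal mask to a square score matrix, then softmax each row."""
    n = len(scores)
    weights = []
    for i in range(n):
        # Position i may attend to positions 0..i; later positions get -inf.
        masked = [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        row_max = max(masked)  # subtract the row max for numerical stability
        exps = [math.exp(s - row_max) for s in masked]  # exp(-inf) is 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical raw attention scores for a 3-token sequence
scores = [[0.5, 1.0, -0.3],
          [0.2, 0.1, 0.9],
          [1.5, -0.7, 0.4]]
attn = causal_softmax(scores)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row still sums to 1, but every entry above the diagonal is 0, so a token never receives information from tokens that come after it.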
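
The "The sun rises in the" example and the Cross-Entropy Loss section can be tied together with a small worked example. This is only a sketch: the four-word vocabulary and the logits are invented, and a real model scores tens of thousands of tokens at every position. The loss for one position is the negative log of the probability the model assigns to the correct next token.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target_id):
    """Cross-entropy for one position: -log p(target token)."""
    probs = softmax(logits)
    return -math.log(probs[target_id])

# Hypothetical toy vocabulary and model scores after "The sun rises in the"
vocab = ["east", "west", "morning", "sky"]
logits = [2.0, 1.0, 0.5, -1.0]

loss_good = next_token_loss(logits, vocab.index("east"))  # label is the true next token
loss_bad = next_token_loss(logits, vocab.index("sky"))    # what the loss would be for an unlikely label
print(f"loss if the label is 'east': {loss_good:.3f}")
print(f"loss if the label is 'sky':  {loss_bad:.3f}")
```

The loss is small when the model already puts high probability on the pseudo-label and large otherwise, which is the penalty the loss function section describes.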
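
The 117M parameter figure can be roughly checked from the architecture numbers listed earlier (12 blocks, hidden size 768, FFN dimension 3072, vocabulary 40000, sequence length 512). The sketch below is a back-of-the-envelope count under assumptions that are not stated in the text: every linear layer has a bias, each block has two LayerNorms, the output projection shares weights with the token embedding, and segment embeddings are not counted separately. The function name `gpt1_param_count` is hypothetical.

```python
def gpt1_param_count(n_layer=12, d_model=768, d_ff=3072, vocab=40000, n_ctx=512):
    """Approximate parameter count for a GPT-1 style decoder-only Transformer."""
    token_emb = vocab * d_model   # token embedding (assumed tied with the output layer)
    pos_emb = n_ctx * d_model     # learned position embedding
    # Self-attention: fused Q/K/V projection plus the output projection, with biases
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward network: d_model -> d_ff -> d_model, with biases
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two LayerNorms per block, each with a scale and a shift vector
    layer_norms = 2 * (2 * d_model)
    per_block = attn + ffn + layer_norms
    return token_emb + pos_emb + n_layer * per_block

total = gpt1_param_count()
print(f"{total:,} parameters (about {total / 1e6:.1f}M)")
```

Under these assumptions the count lands slightly below the quoted 117M; the small gap is plausibly explained by the vocabulary size being rounded to 40000 in the list above.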
