diff --git "a/docs/\347\254\254\345\215\201\347\253\240/NLP\345\237\272\347\241\200.ipynb" "b/docs/\347\254\254\345\215\201\347\253\240/NLP\345\237\272\347\241\200.ipynb" new file mode 100644 index 000000000..498882d2c --- /dev/null +++ "b/docs/\347\254\254\345\215\201\347\253\240/NLP\345\237\272\347\241\200.ipynb" @@ -0,0 +1,720 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "119ec186", + "metadata": {}, + "source": [ + "# Word Embeddings (Concepts)" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "f8e5639e", + "metadata": {}, + "source": [ + "### Before learning what word embeddings are, consider this: how does a computer recognize human input? \n", + "A computer parses its input into binary code (0s and 1s), turning human language into machine language that it can process. \n", + "First, a concept: **one-hot encoding**. For a given dimension, each vector has exactly one entry equal to 1 and all other entries 0, e.g. the 5-dimensional vector [0,0,0,0,1]. \n", + "As an analogy: when we learn Chinese in kindergarten or primary school, we first learn characters and words, and they are stored somewhere in our brain. \n", + "\n", + "A child has just learned four words --> [我] [特别] [喜欢] [学习] (I, especially, like, studying) \n", + "\n", + "The computer can then give the child a one-hot encoding of dimension 4. \n", + "Chinese text is first segmented into words: 我 特别 喜欢 学习 \n", + "We then assign 我->[1 0 0 0], 特别->[0 1 0 0], 喜欢->[0 0 1 0], 学习->[0 0 0 1] \n", + "For the sentence 我喜欢学习 (I like studying), adding up the one-hot vectors of its words gives [1 0 1 1] \n", + "This raises two problems: \n", + "1. As the child's vocabulary grows to thousands or tens of thousands of words, vectors built this way become extremely high-dimensional and sparse. \n", + "2. Near-synonyms such as 能, 会 and 可以 (all roughly “can”) get mutually orthogonal one-hot vectors (the dot product of any two distinct one-hot vectors is 0), so the encoding captures no similarity between words and cannot relate them to one another. \n", + "One-hot encoding is therefore not a good word-embedding method. \n", + "\n", + "Next we introduce **dense representations** \n", + "A dense representation is also a fixed-length vector, like a one-hot encoding, but its entries are real numbers rather than a single 1 among 0s, and its dimension can be much smaller than the vocabulary size, e.g. [0.45,0.65,0.14,1.15,0.97] " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "4db86da3", + "metadata": {}, + "source": [ + "# Bag-of-Words Representation" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "44dc9252", + "metadata": {}, + "source": [ + "As the name suggests, a bag-of-words representation puts our vocabulary into a “bag”: word order is ignored, and a text is represented only by which words it contains (and how often). The vocabulary can be built as follows." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "823f8f2d", + "metadata": {}, + "outputs": [], + "source": [ + "corpus = [\"i like reading\", \"i love drinking\", \"i hate playing\", \"i do nlp\"]  # our corpus\n", + "word_list = ' '.join(corpus).split()  # split every sentence into words\n", + "word_list = list(sorted(set(word_list)))  # deduplicate and sort to get the vocabulary\n", + "word_dict = {w: i for i, w in enumerate(word_list)}  # word -> index\n", + "number_dict = {i: w for i, w in enumerate(word_list)}  # index -> word" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "8eaeb37d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'do': 0,\n", + " 'drinking': 1,\n", + " 'hate': 2,\n", + " 'i': 3,\n", + " 'like': 4,\n", + " 'love': 5,\n", + " 'nlp': 6,\n", + " 'playing': 7,\n", + " 'reading': 8}" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word_dict" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "2bf380c8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{0: 'do',\n", + " 1: 'drinking',\n", + " 2: 'hate',\n", + " 3: 'i',\n", + " 4: 'like',\n", + " 5: 'love',\n", + " 6: 'nlp',\n", + " 7: 'playing',\n", + " 8: 
'reading'}" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "number_dict" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "90e0ef43", + "metadata": {}, + "source": [ + "From this vocabulary we can build one-hot encodings of dimension 9, as shown below (built here with np.eye; scikit-learn's OneHotEncoder can also be used)." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "9821ed2a", + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "voc_size = len(word_dict)\n", + "identity = np.eye(voc_size)  # row k of the identity matrix is the one-hot vector for index k\n", + "bow = [identity[word_dict[name]] for name in word_list]  # one one-hot vector per vocabulary word" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "03f1f12f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[array([1., 0., 0., 0., 0., 0., 0., 0., 0.]),\n", + " array([0., 1., 0., 0., 0., 0., 0., 0., 0.]),\n", + " array([0., 0., 1., 0., 0., 0., 0., 0., 0.]),\n", + " array([0., 0., 0., 1., 0., 0., 0., 0., 0.]),\n", + " array([0., 0., 0., 0., 1., 0., 0., 0., 0.]),\n", + " array([0., 0., 0., 0., 0., 1., 0., 0., 0.]),\n", + " array([0., 0., 0., 0., 0., 0., 1., 0., 0.]),\n", + " array([0., 0., 0., 0., 0., 0., 0., 1., 0.]),\n", + " array([0., 0., 0., 0., 0., 0., 0., 0., 1.])]" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "bow" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "086a5fd2", + "metadata": {}, + "source": [ + "# N-gram: A Statistical Language Model\n", + "An N-gram model is a language model that uses statistics about which words occur together to predict the next word. It models sequences of n consecutive words in a text and predicts each word from the n-1 words before it. For example, if a text contains the consecutive words “the cat sat on”, an N-gram model might predict that the continuation is “the mat” or “a hat”. \n", + "\n", + "The accuracy of an N-gram model depends on the quality and quantity of the text used to train it. If the training text contains many grammatical and spelling errors, the model's predictions may be inaccurate too. If the training text is small, the model may also fail to capture the complexity of the language. \n", + "\n", + "**Advantages of N-gram models:**\n", + "\n", + "Simple: the concept is easy to understand and the model is easy to implement. \n", + "Captures local correlations: because it conditions on the preceding n-1 words, the model captures correlations between neighbouring words. \n", + "Can be trained on existing corpora: N-gram models can be estimated from large existing corpora, such as Google's N-gram datasets, which can substantially improve their accuracy. \n", + "\n", + "**Disadvantages of N-gram models:**\n", + "\n", + "Needs a lot of training text: with a small corpus, many valid n-grams never appear in training (data sparsity), so without smoothing they get zero probability and accuracy suffers. \n", + "Sensitive to noise: the model is estimated directly from its corpus, so grammatical and spelling errors in the corpus carry over into its predictions. \n", + "Limited context and no notion of similarity: the model only looks at the previous n-1 words, so it cannot capture longer-range dependencies, and because words are treated as discrete symbols, what it learns about one word does not carry over to similar words. " + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "1f5ad65b", + "metadata": {}, + "source": [ + "# NNLM: Feed-forward Neural Network Language Model\n", + "The feed-forward neural network language model below demonstrates the use of a **sliding window**: it reads the first n_step words of each sentence and predicts the word that follows." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "7bddfa77", + "metadata": {}, + "outputs": [], + "source": [ + "# import the required libraries\n", + "import numpy as np\n", + "import torch\n", + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "from tqdm import tqdm\n", + "from torch.autograd import Variable\n", + "dtype = torch.FloatTensor" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "29f23588", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['i',\n", + " 'like',\n", + " 'reading',\n", + " 'i',\n", + " 'love',\n", + " 'drinking',\n", + " 'i',\n", + " 'hate',\n", + " 'playing',\n", + " 'i',\n", + " 'do',\n", + " 'nlp']" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "corpus = [\"i like reading\", \"i love drinking\", \"i hate playing\", \"i do nlp\"]\n", + "\n", + "word_list = ' '.join(corpus).split()\n", + "word_list" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "12b58886", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 1000 cost = 1.010682\n", + "epoch: 2000 cost = 0.695155\n", + "epoch: 3000 cost = 0.597085\n", + "epoch: 4000 cost = 0.531892\n", + "epoch: 5000 cost = 0.376044\n", + "epoch: 6000 cost = 0.118038\n", + "epoch: 7000 cost = 0.077081\n", + "epoch: 8000 cost = 0.053636\n", + "epoch: 9000 cost = 0.038089\n", + "epoch: 10000 cost = 0.027224\n", + "[['i', 
'like'], ['i', 'love'], ['i', 'hate'], ['i', 'do']] -> ['studying', 'datawhale', 'playing', 'nlp']\n" + ] + } + ], + "source": [ + "# build the corpus we need\n", + "corpus = [\"i like studying\", \"i love datawhale\", \"i hate playing\", \"i do nlp\"]\n", + "\n", + "word_list = ' '.join(corpus).split()  # split the corpus into words, e.g. ['i', 'like', 'studying', 'i', ..., 'nlp']\n", + "word_list = list(sorted(set(word_list)))  # deduplicate with set, then sort into a list\n", + "# print(word_list)\n", + "\n", + "word_dict = {w: i for i, w in enumerate(word_list)}  # word -> index\n", + "number_dict = {i: w for i, w in enumerate(word_list)}  # index -> word\n", + "# print(word_dict)\n", + "# print(number_dict)\n", + "\n", + "n_class = len(word_dict)  # vocabulary size, used to size the embedding table and the output layer\n", + "\n", + "m = 2  # word embedding dimension\n", + "n_step = 2  # sliding-window size: number of context words fed to the model\n", + "n_hidden = 2  # hidden layer dimension\n", + "\n", + "\n", + "def make_batch(sentence):  # the corpus is tiny, so the whole training set is used as a single batch\n", + "    input_batch = []\n", + "    target_batch = []\n", + "\n", + "    for sen in sentence:\n", + "        word = sen.split()\n", + "        context = [word_dict[n] for n in word[:-1]]  # all words but the last are the context\n", + "        target = word_dict[word[-1]]  # the last word is the prediction target\n", + "\n", + "        input_batch.append(context)\n", + "        target_batch.append(target)\n", + "\n", + "    return input_batch, target_batch\n", + "\n", + "\n", + "class NNLM(nn.Module):  # a minimal NNLM language model\n", + "    def __init__(self):\n", + "        super(NNLM, self).__init__()\n", + "        self.embed = nn.Embedding(n_class, m)  # lookup table: word index -> m-dimensional vector\n", + "        self.W = nn.Parameter(torch.randn(n_step * m, n_hidden).type(dtype))\n", + "        self.d = nn.Parameter(torch.randn(n_hidden).type(dtype))\n", + "\n", + "        self.U = nn.Parameter(torch.randn(n_hidden, n_class).type(dtype))\n", + "        self.b = nn.Parameter(torch.randn(n_class).type(dtype))\n", + "\n", + "    def forward(self, x):\n", + "        x = self.embed(x)  # 4 x 2 x 2\n", + "        x = x.view(-1, n_step * m)  # concatenate the context embeddings: 4 x 4\n", + "        tanh = torch.tanh(self.d + torch.mm(x, self.W))  # 4 x 2\n", + "        output = self.b + torch.mm(tanh, self.U)  # scores over the vocabulary: 4 x n_class\n", + "        return output\n", + "\n", + "model = NNLM()\n", + "\n", + "criterion = nn.CrossEntropyLoss() 
# loss function\n", + "optimizer = optim.Adam(model.parameters(), lr=0.001)  # optimizer\n", + "\n", + "input_batch, target_batch = make_batch(corpus)  # training inputs and labels\n", + "input_batch = Variable(torch.LongTensor(input_batch))\n", + "target_batch = Variable(torch.LongTensor(target_batch))\n", + "\n", + "for epoch in range(10000):  # training loop\n", + "    optimizer.zero_grad()\n", + "\n", + "    output = model(input_batch)  # input: 4 x 2\n", + "\n", + "    loss = criterion(output, target_batch)\n", + "\n", + "    if (epoch + 1) % 1000 == 0:\n", + "        print('epoch:', '%04d' % (epoch + 1), 'cost = {:.6f}'.format(loss.item()))\n", + "\n", + "    loss.backward()\n", + "    optimizer.step()\n", + "\n", + "predict = model(input_batch).data.max(1, keepdim=True)[1]  # prediction: index of the highest-scoring word\n", + "\n", + "print([sen.split()[:2] for sen in corpus], '->', [number_dict[n.item()] for n in predict.squeeze()])" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "93d8cd2f", + "metadata": {}, + "source": [ + "# Word2Vec: the Skip-gram and CBOW Architectures\n", + "The distributed (dense) representations mentioned earlier can be learned with Word2Vec, which mainly comes in two variants: \n", + "The skip-gram model predicts the surrounding words from the center word. \n", + "The CBOW (continuous bag-of-words) model predicts the center word from its surrounding words. \n", + "The cell below trains a simple skip-gram model with a full softmax over the vocabulary." + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "066f68a0", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 11%|█ | 10615/100000 [00:02<00:24, 3657.80it/s]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 10000 cost = 1.955088\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 21%|██ | 20729/100000 [00:05<00:21, 3758.47it/s]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 20000 cost = 1.673096\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 30%|███ | 30438/100000 [00:08<00:18, 3710.13it/s]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 30000 cost = 2.247422\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 41%|████ | 40638/100000 [00:11<00:15, 3767.87it/s]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 40000 cost = 2.289902\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 50%|█████ | 50486/100000 [00:13<00:13, 3713.98it/s]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 50000 cost = 2.396217\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 61%|██████ | 60572/100000 [00:16<00:11, 3450.47it/s]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 60000 cost = 1.539688\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 71%|███████ | 70638/100000 [00:19<00:07, 3809.11it/s]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 70000 cost = 1.638879\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 80%|████████ | 80403/100000 [00:21<00:05, 3740.33it/s]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 80000 cost = 2.279797\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + " 90%|█████████ | 90480/100000 [00:24<00:02, 3680.03it/s]" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 90000 cost = 1.992100\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████| 100000/100000 [00:27<00:00, 3677.35it/s]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "epoch: 100000 cost = 1.307715\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "[Figure: scatter plot of the word vectors]" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "print\n" + ] + }, + { + "data": { + "text/plain": [ + "[Figure]" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import torch.nn as nn\n", + "import torch.optim as optim\n", + "import numpy as np\n", + "import torch\n", + "import matplotlib.pyplot as plt\n", + "from tqdm import tqdm\n", + "\n", + "# our corpus\n", + "sentences = ['i like dog', 'i like cat', 'i like animal', 'dog is animal', 'cat is animal',\n", + "             'dog like meat', 'cat like meat', 'cat like fish', 'dog like meat', 'i like apple',\n", + "             'i hate apple', 'i like movie', 'i like read', 'dog like bark', 'dog like cat']\n", + "\n", + "word_sequence = ' '.join(sentences).split()  # flatten the corpus into a single list of words\n", + "#print(word_sequence)\n", + "\n", + "word_list = sorted(set(word_sequence))  # build the vocabulary (sorted so word indices are reproducible)\n", + "#print(word_list)\n", + "\n", + "# next, number every word in the vocabulary; row k of np.eye(voc_size) is then the one-hot encoding of word k\n", + "\n", + "# dictionary: word -> index\n", + "word_dict = {w: i for i, w in enumerate(word_list)}\n", + "#print(word_dict)\n", + "# index -> word\n", + "index_dict = {i: w for i, w in enumerate(word_list)}\n", + "#print(index_dict)\n", + "\n", + "batch_size = 2\n", + "voc_size = len(word_list)\n", + "\n", + "skip_grams = []\n", + "# every sentence has exactly 3 words, so i = 1, 4, 7, ... is the middle word of each sentence\n", + "# and its neighbours i-1 and i+1 never cross a sentence boundary\n", + "for i in range(1, len(word_sequence) - 1, 3):\n", + "    target = word_dict[word_sequence[i]]  # id of the center word\n", + "    context = [word_dict[word_sequence[i - 1]], word_dict[word_sequence[i + 1]]]  # ids of the two context words\n", + "\n", + "    for w in context:\n", + "        skip_grams.append([target, w])\n", + "\n", + "embedding_size = 10\n", + "\n", + "\n", + "class Word2Vec(nn.Module):\n", + "    def __init__(self):\n", + "        super(Word2Vec, self).__init__()\n", + "        # maps a word's one-hot encoding to its word vector\n", + "        self.W1 = nn.Parameter(torch.rand(voc_size, embedding_size))\n", + "        # maps a word vector to scores over the vocabulary\n", + "        self.W2 = nn.Parameter(torch.rand(embedding_size, voc_size))\n", + "\n", + "    def forward(self, x):\n", + "        hidden_layer = torch.matmul(x, self.W1)\n", + "        output_layer = torch.matmul(hidden_layer, self.W2)\n", + "        return output_layer\n", + "\n", + "\n", + "model = Word2Vec()\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = optim.Adam(model.parameters(), lr=1e-5)\n", + "\n", + "#print(len(skip_grams))\n", + "\n", + "# sample a random mini-batch: one-hot center words as inputs, context word ids as labels\n", + "def random_batch(data, size):\n", + "    random_inputs = []\n", + "    random_labels = []\n", + "    random_index = np.random.choice(range(len(data)), size, replace=False)\n", + "\n", + "    for i in random_index:\n", + "        random_inputs.append(np.eye(voc_size)[data[i][0]])  # one-hot vector taken from a row of the identity matrix\n", + "        random_labels.append(data[i][1])\n", + "\n", + "    return random_inputs, random_labels\n", + "\n", + "# training loop\n", + "for epoch in tqdm(range(100000)):\n", + "    input_batch, target_batch = random_batch(skip_grams, batch_size)  # X -> y\n", + "    input_batch = torch.Tensor(np.array(input_batch))  # stack into one array first; building a tensor from a list of arrays is slow\n", + "    target_batch = torch.LongTensor(target_batch)\n", + "\n", + "    optimizer.zero_grad()\n", + "\n", + "    output = model(input_batch)\n", + "\n", + "    loss = criterion(output, target_batch)\n", + "    if (epoch + 1) % 10000 == 0:\n", + "        print('epoch:', '%04d' % (epoch + 1), 'cost =', '{:.6f}'.format(loss.item()))\n", + "\n", + "    loss.backward()\n", + "    optimizer.step()\n", + "\n", + "# scatter-plot each word at the first two coordinates of its vector\n", + "# (embedding_size is 10, so this shows only 2 of the 10 dimensions, not a true 2-D projection)\n", + "W1 = model.W1.detach()\n", + "for i, label in enumerate(word_list):\n", + "    x, y = float(W1[i][0]), float(W1[i][1])\n", + "    plt.scatter(x, y)\n", + "    plt.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points', ha='right', va='bottom')\n", + "plt.show()" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "1edccf25", + "metadata": {}, + "source": [ + "Common evaluation metrics in natural language processing include the following: \n", + "\n", + "**Accuracy**: \n", + "Accuracy is one of the simplest and most common metrics: the proportion of all samples that the model classifies correctly. \n", + "\n", + "**Precision and Recall**: \n", + "Precision and recall are metrics for evaluating binary classifiers. Precision is the proportion of samples predicted as positive that are actually positive, TP / (TP + FP); recall is the proportion of actual positives that the model predicts as positive, TP / (TP + FN). \n", + "\n", + "**F1-Score**: \n", + "The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and combines both. A higher F1 score means the model strikes a better balance between precision and recall. \n", + "\n", + "**Confusion Matrix**: \n", + "A confusion matrix cross-tabulates the actual classes against the classes predicted by the model (for binary classification, the counts of TP, FP, FN and TN), which makes classifier performance easy to inspect; accuracy, precision, recall and related metrics can all be computed from it. \n", + "\n", + "**ROC Curve and AUC (Receiver Operating Characteristic Curve and Area Under Curve)**: \n", + "The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies.\n", + "AUC is the area under the ROC curve and summarizes the model's classification performance across all thresholds. \n", + "\n", + "**BLEU (Bilingual Evaluation Understudy)**: \n", + "BLEU is a metric for machine translation quality; it scores a candidate translation by its n-gram overlap with one or more reference translations. \n", + "\n", + "**Perplexity**: \n", + "Perplexity is commonly used to evaluate language models and measures how hard it is for the model to predict a given sequence: it is the exponential of the average negative log-likelihood per token. The lower the perplexity, the better the model predicts the sequence. \n", + "\n", + "These metrics are not fixed: which ones to use depends on the task and its requirements, and individual NLP tasks may use other, task-specific metrics. " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "pytorch", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.9 (default, Aug 31 2020, 12:42:55) \n[GCC 7.3.0]" + }, + "vscode": { + "interpreter": { + "hash": "7648c2b9d25760d0d65f53f9b9a34de48caa24d8265d64b0ff81e2f2641d528d" + } + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}
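The word-embedding cell of the notebook in this diff builds the vector for 我喜欢学习 by adding one-hot vectors by hand. Below is a minimal Python sketch of that computation. It assumes the sentence is already segmented into 我 / 喜欢 / 学习 (the notebook does not name a segmenter), and the names `vocab`, `one_hot` and `bag_of_words` are illustrative, not taken from the notebook.

```python
import numpy as np

# Vocabulary from the notebook's example: 我 (I), 特别 (especially), 喜欢 (like), 学习 (studying)
vocab = ["我", "特别", "喜欢", "学习"]
word_to_index = {w: i for i, w in enumerate(vocab)}
one_hot = np.eye(len(vocab), dtype=int)  # row i is the one-hot vector of word i


def bag_of_words(tokens):
    """Sum the one-hot vectors of already-segmented tokens (hypothetical helper)."""
    vec = np.zeros(len(vocab), dtype=int)
    for tok in tokens:
        vec += one_hot[word_to_index[tok]]
    return vec


# 我喜欢学习, segmented by hand (assumption: no real segmenter is used here)
print(bag_of_words(["我", "喜欢", "学习"]))  # [1 0 1 1]

# Distinct one-hot vectors are orthogonal, so their dot product is 0
print(one_hot[0] @ one_hot[1])  # 0
```

The second print illustrates the similarity problem from the notebook: every pair of distinct words, including near-synonyms such as 能 and 可以, would score 0.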
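The N-gram cell in the notebook describes prediction from word statistics but shows no code. Here is a minimal bigram (n = 2) sketch on the notebook's four-sentence corpus, using unsmoothed maximum-likelihood estimates P(w | prev) = count(prev, w) / count(prev, *). The `<s>`/`</s>` boundary markers and the helper names are assumptions of this sketch, not part of the notebook.

```python
from collections import Counter, defaultdict

corpus = ["i like reading", "i love drinking", "i hate playing", "i do nlp"]

# Count bigrams; <s> and </s> are sentence-boundary markers added by this sketch
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    for prev, nxt in zip(tokens, tokens[1:]):
        bigram_counts[prev][nxt] += 1


def next_word_probs(prev):
    """MLE estimate P(next | prev) = count(prev, next) / count(prev, *)."""
    counts = bigram_counts.get(prev, Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # prev never seen: an unsmoothed model has no estimate
    return {w: c / total for w, c in counts.items()}


def sentence_prob(sentence):
    """Chain rule under the bigram (Markov) assumption: multiply P(w_k | w_{k-1})."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        prob *= next_word_probs(prev).get(nxt, 0.0)
    return prob


print(next_word_probs("i"))  # {'like': 0.25, 'love': 0.25, 'hate': 0.25, 'do': 0.25}
print(sentence_prob("i like reading"))  # 0.25
print(sentence_prob("i like nlp"))  # 0.0: the bigram (like, nlp) never occurs
```

The zero probability for “i like nlp” is the data-sparsity problem listed among the disadvantages; smoothing (for example add-one) is the usual remedy and is deliberately left out here.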
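The Word2Vec cell trains only skip-gram, and only with the middle word of each 3-word sentence as the center. As a sketch of how training examples differ between skip-gram and CBOW, the code below builds both from a symmetric window at every position, one sentence at a time so that no window crosses a sentence boundary. `window=1` and the function names are illustrative assumptions.

```python
def window_indices(n_words, i, window):
    """Indices of the words within `window` positions of i, excluding i itself."""
    return [j for j in range(max(0, i - window), min(n_words, i + window + 1)) if j != i]


def skip_gram_pairs(sentences, window=1):
    """Skip-gram: one (center, context) pair per neighbour; the center predicts each neighbour."""
    pairs = []
    for sentence in sentences:
        words = sentence.split()
        for i, center in enumerate(words):
            for j in window_indices(len(words), i, window):
                pairs.append((center, words[j]))
    return pairs


def cbow_examples(sentences, window=1):
    """CBOW: one (context words, center) example per position; the neighbours predict the center."""
    examples = []
    for sentence in sentences:
        words = sentence.split()
        for i, center in enumerate(words):
            context = [words[j] for j in window_indices(len(words), i, window)]
            examples.append((context, center))
    return examples


print(skip_gram_pairs(["i like dog"]))
# [('i', 'like'), ('like', 'i'), ('like', 'dog'), ('dog', 'like')]
print(cbow_examples(["i like dog"]))
# [(['like'], 'i'), (['i', 'dog'], 'like'), (['like'], 'dog')]
```

Skip-gram turns each sentence into several single-word prediction problems, while CBOW produces one example per position whose context vectors are typically averaged (or summed) before the output layer.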
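The metric definitions at the end of the notebook can be checked with a few lines of plain Python. This is a hedged sketch: the labels below are toy values made up for illustration, and perplexity is computed from per-token probabilities supplied by hand rather than from a trained model. `binary_metrics` and `perplexity` are hypothetical helper names.

```python
import math


def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts plus accuracy, precision, recall and F1 (label 1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def perplexity(token_probs):
    """exp of the average negative log-probability assigned to each token."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


# Toy labels, made up for illustration
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(binary_metrics(y_true, y_pred))  # tp=2, fp=1, fn=1, tn=2, so all four scores are 2/3

# A model that spreads probability uniformly over 4 words has perplexity 4
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # 4.0
```

The uniform example gives an intuition for the notebook's statement that lower perplexity is better: perplexity k behaves like guessing uniformly among k words at each step.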