AI Archive - Page 4 of 4 - laumy's Study Notes

Layers and Blocks

In short, as the figures below show: in the first figure, the 5 neurons in the middle form a layer; in the second, 3 layers form a block; in the third, 3 blocks form the whole model.

Layers

A layer is the basic computational unit of a neural network. It applies a specific transformation to its input, such as a linear map or a nonlinear activation: it takes input data and produces an output. A layer either holds learnable parameters (for example the weight and bias of a fully connected layer) or performs a parameter-free operation (for example an activation function). The output shape may differ from the input shape; a fully connected layer, for instance, maps dimension d_in to d_out.

Fully connected layer

    layer = nn.Linear(4, 5)  # input dimension 4, output dimension 5
    X = torch.randn(3, 4)    # input shape (3, 4)
    output = layer(X)        # output shape (3, 5)

nn.Linear(4, 5) takes two arguments: the first is the feature dimension of the input (4 here) and the second is the feature dimension of the output (5 here). These are feature dimensions, not sample counts: with a feature dimension of 4, the input may have shape [2, 4] or [6, 4], that is, 2 or 6 samples with 4 features each.

Activation layer

    layer = nn.ReLU()                          # parameter-free operation
    output = layer(torch.tensor([-1, 2, -3]))  # output: [0, 2, 0]

An activation function is a layer in its own right. It is the part of the network that introduces nonlinearity, which lets the network learn more complex mappings. Without activation functions a neural network can only represent linear functions; with them it can represent more complex patterns and therefore performs better on tasks such as classification and regression.

Custom layers

A custom layer is a layer that the user implements for the needs of a specific task. Unlike the built-in layers (fully connected, convolutional), a custom layer can extend the model with arbitrary logic, performing special operations during training and inference or changing the behavior of a standard layer. Typical uses are custom regularization, custom losses, or special activation functions.

In PyTorch a custom layer is usually implemented by subclassing torch.nn.Module and defining:

__init__: the parameters or sub-layers the layer needs.
forward: how data passes through the layer and what is computed.

Parameter-free layers

A parameter-free layer contains no trainable parameters. It usually performs a fixed operation, such as an activation function, a normalization, or a mathematical transformation.

    import torch
    import torch.nn as nn

    # Subclass nn.Module
    class custom_relu(nn.Module):
        def __init__(self):
            super().__init__()

        def forward(self, x):
            # Element-wise max(x, 0)
            return torch.maximum(x, torch.tensor(0.0))

    layer = custom_relu()
    input_data = torch.randn(3, 3)
    print(input_data)
    output_data = layer(input_data)
    print(output_data)

Running the code gives:

    tensor([[ 0.9986, -0.8549, -0.2031],
            [ 0.8380,  0.6925, -0.9164],
            [ 0.5807, -0.5719,  1.1864]])
    tensor([[0.9986, 0.0000, 0.0000],
            [0.8380, 0.6925, 0.0000],
            [0.5807, 0.0000, 1.1864]])

Layers with parameters

A parameterized layer contains learnable parameters and performs a computation that depends on weights or biases, such as a linear transformation or a convolution. These parameters are optimized during training.

    import torch
    import torch.nn as nn

    class custom_linear_layer(nn.Module):
        def __init__(self, input_dim, output_dim):
            super().__init__()
            # nn.Parameter registers the tensors as trainable parameters
            self.weights = nn.Parameter(torch.randn(input_dim, output_dim))
            self.bias = nn.Parameter(torch.randn(output_dim))

        def forward(self, x):
            return torch.matmul(x, self.weights) + self.bias

    layer = custom_linear_layer(3, 2)
    input_data = torch.randn(5, 3)
    print(input_data)
    output_data = layer(input_data)
    print(output_data)

Output:

    tensor([[-0.7047,  1.8763,  1.8934],
            [-0.1341,  0.4411,  0.2252],
            [ 1.0531,  0.2556, -0.0045],
            [-0.9485,  1.9396, -0.3373],
            [-0.4364,  0.4522, -0.3176]])
    tensor([[ 2.2790, -0.5707],
            [ 0.0157, -0.3939],
            [-0.7449, -0.6362],
            [ 0.1973, -1.3335],
            [-0.3929, -0.5201]], grad_fn=<AddBackward0>)

Blocks

A block is a composite module made of several layers. It encapsulates repeated or complex logic and makes the model structure modular. A block has a forward method that defines forward propagation, can nest other blocks or layers to form a hierarchy, and inherits from nn.Module, so it supports parameter management and automatic gradient computation.

The Sequential container

    block = nn.Sequential(
        nn.Linear(4, 5),
        nn.ReLU(),
        nn.Linear(5, 3)
    )  # three sub-layers: linear -> activation -> linear

The Sequential container defines a network module as an ordered chain: the sub-modules are combined in the order they are listed, and forward propagation runs through them in that order.

Input: a tensor of shape (batch_size, 4), that is, batch_size samples with 4 features each.
First linear layer: the input passes through nn.Linear(4, 5) and the shape becomes (batch_size, 5).
ReLU activation: nn.ReLU sets all negative values to 0 and leaves positive values unchanged; the shape stays (batch_size, 5).
Second linear layer: nn.Linear(5, 3) changes the shape to (batch_size, 3).

    import torch
    import torch.nn as nn

    block = nn.Sequential(
        nn.Linear(4, 5),  # each sample has 4 input features, mapped to 5
        nn.ReLU(),
        nn.Linear(5, 3))  # 3 output features

    input_data = torch.randn(2, 4)
    print(input_data)
    output_data = block(input_data)
    print(output_data)

Output:

    tensor([[ 0.3054,  1.0160, -1.7137, -0.3744],
            [-0.6882, -0.3049, -1.2769,  0.2835]])
    tensor([[ 0.4485,  0.6298, -0.1949],
            [ 0.1992,  0.1609, -0.2480]], grad_fn=<AddmmBackward0>)

Custom blocks

In PyTorch a custom block is a custom layer or model block created by subclassing nn.Module. You can combine several operations or implement specific functionality to build your own network module. To create one:

Subclass nn.Module: this is the base class of all neural network modules in PyTorch.
Define __init__: create and initialize the layers, for example nn.Linear or nn.Conv2d.
Define forward: describe how the input data flows through those layers.

    import torch
    import torch.nn as nn

    class custom_block(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(4, 5)
            self.fc2 = nn.Linear(5, 3)
            self.relu = nn.ReLU()

        def forward(self, x):
            x = self.fc1(x)
            x = self.relu(x)
            x = self.fc2(x)
            return x

    # Use a separate name for the instance so the class is not shadowed
    block = custom_block()
    input_data = torch.randn(2, 4)
    print(input_data)
    output_data = block(input_data)
    print(output_data)

Output:

    tensor([[-0.4663,  0.9429, -0.2072, -1.7672],
            [ 0.6028, -0.2563, -0.3493,  1.2657]])
    tensor([[-0.0273, -0.1265, -0.2595],
            [ 0.1276, -0.0837, -0.4265]], grad_fn=<AddmmBackward0>)

Complex blocks

To be added.

Parameter management

In deep learning, parameter management means handling the parameters of a module so that they are updated properly during training and treated correctly in each phase (training, validation, testing). Good parameter management improves the efficiency and stability of training.

Parameter access

    import torch
    from torch import nn

    net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
    X = torch.rand(size=(2, 4))
    y = net(X)
    print(y)
    print(net[2].state_dict())
    print(type(net[2].bias))
    print(net[2].bias)
    print(net[2].bias.data)
    print(net[2].weight.grad)

Output (the gradient is None because no backward pass has been run yet):

    tensor([[-0.1428],
            [-0.1919]], grad_fn=<AddmmBackward0>)
    OrderedDict([('weight', tensor([[-0.3178, -0.2009, -0.1120,  0.1502,  0.0054, -0.0864,  0.2142, -0.0564]])), ('bias', tensor([-0.0326]))])  # net[2].state_dict()
    <class 'torch.nn.parameter.Parameter'>  # type(net[2].bias)
    Parameter containing:
    tensor([-0.0326], requires_grad=True)   # net[2].bias
    tensor([-0.0326])                       # net[2].bias.data
    None                                    # net[2].weight.grad

All parameters can also be accessed at once:

    print(*[(name, param.shape) for name, param in net[0].named_parameters()])
    print(*[(name, param.shape) for name, param in net.named_parameters()])

Output:

    ('weight', torch.Size([8, 4])) ('bias', torch.Size([8]))
    ('0.weight', torch.Size([8, 4])) ('0.bias', torch.Size([8])) ('2.weight', torch.Size([1, 8])) ('2.bias', torch.Size([1]))

print also shows the structure of the model:

    print(net)

Output:

    Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=1, bias=True)
    )

Parameter initialization

    def init_normal(m):
        if type(m) == nn.Linear:
            nn.init.normal_(m.weight, mean=1, std=0.01)
            nn.init.zeros_(m.bias)

    net.apply(init_normal)
    net[0].weight.data[0], net[0].bias.data[0]

Output:

    (tensor([0.9942, 0.9995, 0.9971, 0.9903]), tensor(0.))

The code defines a function init_normal and applies it to every nn.Linear layer: the weights are drawn from a Gaussian with mean 1 and standard deviation 0.01, and the biases are set to 0.

Parameter tying

Parameter tying means sharing the same parameters across several layers, as in this example.

    shared = nn.Linear(8, 8)
    net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
                        shared, nn.ReLU(),
                        shared, nn.ReLU(),
                        nn.Linear(8, 1))
    net(X)
    print(net)
    print(net[2].weight.data[0] == net[4].weight.data[0])
    net[2].weight.data[0, 0] = 100
    print(net[2].weight.data[0] == net[4].weight.data[0])

Output:

    Sequential(
      (0): Linear(in_features=4, out_features=8, bias=True)
      (1): ReLU()
      (2): Linear(in_features=8, out_features=8, bias=True)
      (3): ReLU()
      (4): Linear(in_features=8, out_features=8, bias=True)
      (5): ReLU()
      (6): Linear(in_features=8, out_features=1, bias=True)
    )
    tensor([True, True, True, True, True, True, True, True])
    tensor([True, True, True, True, True, True, True, True])

The parameters of layer 2 and layer 4 are the same object. They do not merely have equal values: when one is changed, the other changes with it.

Saving parameters

PyTorch provides save and load for writing to and reading from files.

    import torch
    from torch import nn
    from torch.nn import functional as F

    x = torch.arange(4)
    print(x)
    torch.save(x, 'x-file')
    x2 = torch.load('x-file')
    print(x2)

Output:

    tensor([0, 1, 2, 3])
    tensor([0, 1, 2, 3])

Model parameters can be saved during training in the same way.

    class nlp(nn.Module):
        def __init__(self):
            super().__init__()
            self.hidden = nn.Linear(20, 256)
            self.output = nn.Linear(256, 10)

        def forward(self, x):
            return self.output(F.relu(self.hidden(x)))

    net = nlp()
    print(net)
    X = torch.randn(size=(2, 20))
    print(X)
    Y = net(X)
    print(Y)
    torch.save(net.state_dict(), 'nlp.params')

    clone = nlp()
    clone.load_state_dict(torch.load('nlp.params'))
    clone.eval()  # switch to evaluation mode (note the parentheses)
    Y_clone = clone(X)
    Y_clone == Y

Output:

    nlp(
      (hidden): Linear(in_features=20, out_features=256, bias=True)
      (output): Linear(in_features=256, out_features=10, bias=True)
    )
    tensor([[-1.3927, -1.9475, -0.6044, -0.5835, -0.5661, -0.4240, -1.4481, -0.0627,  0.7437,  1.0465,
              0.1806,  0.1096, -1.2199,  1.1642,  1.0633,  1.3925,  0.3849,  0.9443, -0.4781,  0.6522],
            [ 1.2506, -0.7369,  0.7148, -0.3734,  1.3801,  0.4163, -1.3707,  0.5407, -0.1734, -1.1068,
             -0.1630,  1.2899,  0.4753,  0.7332,  0.5401, -0.4011, -0.5356, -0.5833,  0.8288, -0.5439]])
    tensor([[-0.6972, -0.0666,  0.5621, -0.4620, -0.1545,  0.2283,  0.1647,  0.1879,  0.1907, -0.1658],
            [-0.2174,  0.2586,  0.2867, -0.2213, -0.0090,  0.0687, -0.0382, -0.0477, -0.3194,  0.1438]], grad_fn=<AddmmBackward0>)
    tensor([[True, True, True, True, True, True, True, True, True, True],
            [True, True, True, True, True, True, True, True, True, True]])

In this example, torch.save(net.state_dict(), 'nlp.params') saves the parameters, and load_state_dict(torch.load('nlp.params')) reads them back into a new instance. Saving parameters this way gives you a backup of the trained model, so training can resume from the last saved state.

This post is a set of study notes on <动手学深度学习 V2> (Dive into Deep Learning).
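The shape changes described for the Sequential example can be checked by hand. The following is a minimal sketch in plain Python (no PyTorch required); the helper name `linear_stack_summary` is hypothetical and not part of any library. It assumes every layer is a fully connected layer with a bias, so a layer mapping d_in to d_out holds d_in * d_out + d_out parameters, and it ignores parameter-free layers such as ReLU because they change neither the shape nor the parameter count.

```python
def linear_stack_summary(batch_size, dims):
    """Trace shapes and count parameters for a chain of linear layers.

    dims = [4, 5, 3] stands for Linear(4, 5) followed by Linear(5, 3).
    """
    shapes = [(batch_size, dims[0])]
    n_params = 0
    for d_in, d_out in zip(dims, dims[1:]):
        n_params += d_in * d_out + d_out  # weight matrix plus bias vector
        shapes.append((batch_size, d_out))
    return shapes, n_params


shapes, n_params = linear_stack_summary(2, [4, 5, 3])
print(shapes)    # [(2, 4), (2, 5), (2, 3)]
print(n_params)  # 43
```

The 43 parameters are 4*5 + 5 = 25 for the first layer and 5*3 + 3 = 18 for the second, which should match the shapes that named_parameters() reports for such a network.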

🕒 2025-07-05 📁 Deep Learning 👤 laumy 🔥 325 popularity
Setting Up an AI Development Environment on Windows

Anaconda can be thought of as a building in which you create many separate rooms for AI work: each room can hold a different Python version and keep its own set of environment variables.

Step 1: Download and install Anaconda from https://www.anaconda.com/

Step 2: Set the environment variables

The environment variables depend on where the software is actually installed; here it is installed on drive D. If running conda info in cmd prints the conda configuration, the environment variables are set correctly.

Step 3: Create an environment

Open the Anaconda Prompt terminal. Before creating the development environment, switch to the Tsinghua mirror channels.

    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/linux-64/
    conda config --set show_channel_urls yes

Then create the environment:

    conda create -n py39_test python=3.9 -y

-n sets the name of the environment, python=3.9 installs Python 3.9, and -y answers yes to every confirmation prompt during installation.

Step 4: Activate the environment

    conda activate py39_test

Step 5: Install the base packages

    pip install -r requirements.txt

pip install reads the package list from requirements.txt, which contains:

    contourpy==1.3.0
    cycler==0.12.1
    filelock==3.16.1
    fonttools==4.55.3
    fsspec==2024.12.0
    importlib_resources==6.5.2
    Jinja2==3.1.5
    kiwisolver==1.4.7
    MarkupSafe==3.0.2
    matplotlib==3.9.4
    mpmath==1.3.0
    networkx==3.2.1
    numpy==2.0.2
    packaging==24.2
    pandas==2.2.3
    pillow==11.1.0
    pyparsing==3.2.1
    python-dateutil==2.9.0.post0
    pytz==2024.2
    six==1.17.0
    sympy==1.13.1
    torch==2.5.1
    torchaudio==2.5.1
    torchvision==0.20.1
    typing_extensions==4.12.2
    tzdata==2024.2
    zipp==3.21.0

pip list shows the installed packages.

Step 6: Install PyCharm. Download link: https://www.jetbrains.com/pycharm/download/?section=windows
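After the installation it can be useful to confirm from inside the activated environment that the key packages are importable. The following is a small sketch using only the Python standard library; the function name `missing_packages` and the list of packages checked are assumptions chosen for illustration, not part of the original setup steps.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Python version of the active environment (expected to be 3.9.x for py39_test)
print(sys.version_info[:3])

# Top-level import names of a few packages from requirements.txt
required = ["torch", "torchvision", "numpy", "pandas", "matplotlib"]
print("missing:", missing_packages(required))
```

An empty list after "missing:" suggests that the packages were installed into the environment that is currently active; a non-empty list usually means pip ran in a different environment.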

🕒 2025-07-05 📁 AI Applications 👤 laumy 🔥 107 popularity
Forward Propagation, Backward Propagation, and Computational Graphs

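Forward Propagation

Forward propagation is the computation that takes a neural network from input data to an output prediction. It applies the weights ($W$) and biases ($b$) layer by layer, produces the prediction $y'$, and then computes the loss $L$.

Model definition

$$ y' = W \cdot x + b $$

Loss function (mean squared error)

$$ L = \frac{1}{n} \sum_{i=1}^{n} (y'(i) - y_{\text{true}}(i))^2 $$

Example

Input data: $x = [1.0, 2.0]$
True labels: $y_{\text{true}} = [3.0, 5.0]$
Initial parameters: $W = 1.0, \, b = 0.5$

Forward computation

Predictions: $y'(1) = 1.0 \cdot 1.0 + 0.5 = 1.5, \quad y'(2) = 1.0 \cdot 2.0 + 0.5 = 2.5$
Loss: $L = \frac{1}{2} \left[ (1.5 - 3)^2 + (2.5 - 5)^2 \right] = \frac{1}{2} (2.25 + 6.25) = 4.25$

Computational Graph

A computational graph is a data structure that represents the computation performed during forward propagation. Nodes represent mathematical operations (such as addition and multiplication), and edges represent the data (tensors) flowing between them. For the linear regression model above, the graph is:

    input x → (Multiply W) → (Add b) → prediction y' → (Subtract y_true) → square error → sum and average → loss L

Nodes: multiplication, addition, squaring, summation, averaging.
Edges: data flow (such as $x, W, b, y', L$).

Backward Propagation

Backward propagation uses the chain rule to start from the loss $L$ and compute, in reverse, the gradients $\frac{\partial L}{\partial W}$ and $\frac{\partial L}{\partial b}$ of each parameter $(W, b)$. For the linear regression model:

Gradient of the loss with respect to the prediction

$$ \frac{\partial L}{\partial y'(i)} = \frac{2}{n} (y'(i) - y_{\text{true}}(i)) $$

Gradient of the prediction with respect to the parameters

For the weight $W$: $\frac{\partial y'(i)}{\partial W} = x(i)$
For the bias $b$: $\frac{\partial y'(i)}{\partial b} = 1$

Combined gradients

Weight gradient: $\frac{\partial L}{\partial W} = \sum_{i=1}^{n} \frac{\partial L}{\partial y'(i)} \cdot \frac{\partial y'(i)}{\partial W} = \frac{2}{n} \sum_{i=1}^{n} (y'(i) - y_{\text{true}}(i)) \cdot x(i)$
Bias gradient: $\frac{\partial L}{\partial b} = \sum_{i=1}^{n} \frac{\partial L}{\partial y'(i)} \cdot \frac{\partial y'(i)}{\partial b} = \frac{2}{n} \sum_{i=1}^{n} (y'(i) - y_{\text{true}}(i))$

Backward propagation example

Using the results of the forward pass, first compute the error terms:

$$ y'(1) - y_{\text{true}}(1) = 1.5 - 3 = -1.5, \quad y'(2) - y_{\text{true}}(2) = 2.5 - 5 = -2.5 $$

Then compute the gradients:

Weight gradient: $\frac{\partial L}{\partial W} = \frac{2}{2} [(-1.5) \cdot 1.0 + (-2.5) \cdot 2.0] = 1.0 \cdot (-1.5 - 5) = -6.5$
Bias gradient: $\frac{\partial L}{\partial b} = \frac{2}{2} [(-1.5) + (-2.5)] = 1.0 \cdot (-4) = -4$

PyTorch example

    import torch

    # Parameters (gradient tracking enabled)
    W = torch.tensor(1.0, requires_grad=True)
    b = torch.tensor(0.5, requires_grad=True)

    # Input data
    x = torch.tensor([1.0, 2.0])
    y_true = torch.tensor([3.0, 5.0])

    # Forward propagation
    y_pred = W * x + b
    loss = torch.mean((y_pred - y_true) ** 2)

    # Backward propagation
    loss.backward()

    # Print the gradients
    print(f"dL/dW: {W.grad}")  # prints tensor(-6.5)
    print(f"dL/db: {b.grad}")  # prints tensor(-4.0)

Key points

Dynamic computational graph: PyTorch builds the graph automatically during forward propagation.
Triggering backward propagation: calling .backward() traverses the graph backwards from the loss node and computes the gradient of every tensor with requires_grad=True.
Gradient storage: the results are stored in the .grad attribute of each tensor.

Summary:

| Concept | Role | In the example |
|---|---|---|
| Forward propagation | Computes the prediction and the loss | $y' = W \cdot x + b, L = 4.25$ |
| Computational graph | Records every operation and provides the path for backward propagation | A data structure made of multiplication, addition, squaring, summation, and averaging |
| Backward propagation | Computes parameter gradients with the chain rule | $\frac{\partial L}{\partial W} = -6.5$ |

    input x
       │
       ▼
     [W*x]  → multiplication (graph node)
       │
       ▼
     [+b]   → addition (graph node)
       │
       ▼
    prediction y' → [squared loss] → mean loss L
       │                                  ▲
       └──────────────────────────────────┘
            backward propagation (gradients flow back)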
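To see that nothing beyond the chain rule is involved, the same worked example can be reproduced without autograd. This is a minimal sketch in plain Python; the variable names are chosen for illustration and the data and initial parameters are those of the worked example.

```python
# Data and initial parameters from the worked example
xs = [1.0, 2.0]
ys = [3.0, 5.0]
W, b = 1.0, 0.5
n = len(xs)

# Forward pass: predictions, errors, mean squared error
preds = [W * x + b for x in xs]
errors = [p - y for p, y in zip(preds, ys)]
loss = sum(e ** 2 for e in errors) / n

# Backward pass: dL/dy'(i) = (2/n) * error, then the chain rule
dL_dpred = [2.0 / n * e for e in errors]
dL_dW = sum(g * x for g, x in zip(dL_dpred, xs))  # dy'(i)/dW = x(i)
dL_db = sum(dL_dpred)                             # dy'(i)/db = 1

print(loss, dL_dW, dL_db)  # 4.25 -6.5 -4.0
```

The three numbers agree with the hand computation and with the values that W.grad and b.grad hold after loss.backward().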

🕒 2025-07-04 📁 Deep Learning 👤 laumy 🔥 321 popularity
Gradient Computation

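What is a gradient

The gradient describes the direction in which a multivariate function changes fastest at a given point, together with the rate of that change. In deep learning, gradients are used to optimize model parameters (such as the weights and biases of a neural network) by minimizing a loss function with algorithms such as gradient descent.

For a multivariate function $f(x_1, x_2, \dots, x_n)$, the gradient is the vector of its partial derivatives with respect to each variable:

$$ \nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right) $$

where $\nabla f$ is the gradient (read "nabla f") and $\frac{\partial f}{\partial x_i}$ is the partial derivative of $f$ with respect to $x_i$.

An intuitive view of the gradient

Take the two-variable function $f(x, y) = x^2 + y^2$. Its gradient is:

$$ \nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (2x, 2y) $$

At the point $(1, 1)$ the gradient is $(2, 2)$, meaning the function grows fastest in the direction $(2, 2)$ at that point. To minimize $f(x, y)$, move along the negative gradient $-(2, 2)$, that is, update the parameters:

$$ x \leftarrow x - \alpha \cdot 2x $$

$$ y \leftarrow y - \alpha \cdot 2y $$

where $\alpha$ is the learning rate.

The role of the gradient in machine learning

In machine learning, the gradient expresses how sensitive the loss function is to the model parameters. For parameters $W$ (weight) and $b$ (bias), the gradient $\nabla L$ has two components:

$$ \nabla L = \left( \frac{\partial L}{\partial W}, \frac{\partial L}{\partial b} \right) $$

Updating the parameters along the negative gradient (gradient descent) lowers the loss step by step.

Gradient descent example

Goal: minimize the following function (the loss of a linear regression on one sample).

$$ L(W, b) = (W \cdot x + b - y_{\text{true}})^2 $$

Assume

$$ x = 2, \quad y_{\text{true}} = 4, \quad W = 1, \quad b = 0.5 $$

Compute the prediction:

$$ y_{\text{pred}} = W \cdot x + b = 1 \cdot 2 + 0.5 = 2.5 $$

Compute the loss:

$$ L = (y_{\text{pred}} - y_{\text{true}})^2 = (2.5 - 4)^2 = 2.25 $$

Compute the gradient:

$$ \frac{\partial L}{\partial W} = 2 (y_{\text{pred}} - y_{\text{true}}) \cdot x = 2 (2.5 - 4) \cdot 2 = -6.0 $$

$$ \frac{\partial L}{\partial b} = 2 (y_{\text{pred}} - y_{\text{true}}) = 2 (2.5 - 4) = -3.0 $$

The gradient is

$$ \nabla L = (-6.0, -3.0) $$

Parameter update (learning rate $\alpha = 0.1$):

$$ W_{\text{new}} = W - \alpha \cdot \frac{\partial L}{\partial W} = 1 - 0.1 \cdot (-6.0) = 1.6 $$

$$ b_{\text{new}} = b - \alpha \cdot \frac{\partial L}{\partial b} = 0.5 - 0.1 \cdot (-3.0) = 0.8 $$

Deriving the gradient

The formula $\frac{\partial L}{\partial W} = 2 (y_{\text{pred}} - y_{\text{true}}) \cdot x$ used above is the partial derivative of the loss $L$ with respect to the parameter $W$. Let us derive it step by step. The loss function is:

$$ L(W, b) = (W \cdot x + b - y_{\text{true}})^2 $$

where $W$ is the weight, $b$ is the bias, $x$ is the input, and $y_{\text{true}}$ is the true label. We want the partial derivative $\frac{\partial L}{\partial W}$.

Step 1: Define the loss function

The loss is the squared error between the prediction and the true value:

$$ L(W, b) = (y_{\text{pred}} - y_{\text{true}})^2 $$

where $y_{\text{pred}} = W \cdot x + b$ is the model's prediction. This loss is a quadratic function, and the goal is to minimize it.

Step 2: Apply the chain rule

To differentiate $L$ with respect to $W$, apply the chain rule:

$$ \frac{\partial L}{\partial W} = \frac{\partial L}{\partial y_{\text{pred}}} \cdot \frac{\partial y_{\text{pred}}}{\partial W} $$

Step 3: Compute each partial derivative

First part: $\frac{\partial L}{\partial y_{\text{pred}}}$. Since the loss is a squared error,

$$ L = (y_{\text{pred}} - y_{\text{true}})^2 $$

differentiating with respect to $y_{\text{pred}}$ gives:

$$ \frac{\partial L}{\partial y_{\text{pred}}} = 2(y_{\text{pred}} - y_{\text{true}}) $$

Second part: $\frac{\partial y_{\text{pred}}}{\partial W}$. Since $y_{\text{pred}} = W \cdot x + b$, differentiating with respect to $W$ gives:

$$ \frac{\partial y_{\text{pred}}}{\partial W} = x $$

Step 4: Combine the results

$$ \frac{\partial L}{\partial W} = 2(y_{\text{pred}} - y_{\text{true}}) \cdot x $$

Step 5: Substitute the numbers

With $x = 2$, $y_{\text{true}} = 4$, $W = 1$, and $b = 0.5$, first compute the prediction $y_{\text{pred}}$:

$$ y_{\text{pred}} = W \cdot x + b = 1 \cdot 2 + 0.5 = 2.5 $$

Then substitute into the gradient formula:

$$ \frac{\partial L}{\partial W} = 2(2.5 - 4) \cdot 2 = 2(-1.5) \cdot 2 = -6.0 $$

So the partial derivative of $L$ with respect to $W$ is $-6.0$.

Summary: complicated gradients can be computed with the chain rule. In this example we first set $y_{\text{pred}} = W \cdot x + b$. The partial derivative with respect to $W$ then becomes $\frac{\partial L}{\partial W} = \frac{\partial L}{\partial y_{\text{pred}}} \cdot \frac{\partial y_{\text{pred}}}{\partial W}$, so we can compute $\frac{\partial L}{\partial y_{\text{pred}}}$ and $\frac{\partial y_{\text{pred}}}{\partial W}$ separately, which is much simpler. Since $\frac{\partial L}{\partial y_{\text{pred}}} = 2(y_{\text{pred}} - y_{\text{true}})$ and $\frac{\partial y_{\text{pred}}}{\partial W} = x$, we get $\frac{\partial L}{\partial W} = 2(y_{\text{pred}} - y_{\text{true}}) \cdot x$. The partial derivative therefore follows from the prediction, the true value, and the input, where the prediction itself depends on the current weight and bias. The derivative with respect to $b$ is obtained the same way, which gives the full gradient $\nabla L = \left( \frac{\partial L}{\partial W}, \frac{\partial L}{\partial b} \right)$.

PyTorch example

In PyTorch, gradients are computed automatically by automatic differentiation (Autograd):

    import torch

    # Parameters (gradient tracking enabled)
    W = torch.tensor(1.0, requires_grad=True)
    b = torch.tensor(0.5, requires_grad=True)

    # Input data
    x = torch.tensor(2.0)
    y_true = torch.tensor(4.0)

    # Forward propagation
    y_pred = W * x + b
    loss = (y_pred - y_true) ** 2

    # Backward propagation computes the gradients
    loss.backward()

    # Print the gradients
    print(f"dL/dW: {W.grad}")  # prints tensor(-6.0)
    print(f"dL/db: {b.grad}")  # prints tensor(-3.0)

| Concept | Mathematical expression | Meaning |
|---|---|---|
| Gradient definition | ∇f = (∂f/∂x₁, …) | The direction of fastest change of a multivariate function and its rate |
| Gradient descent | W ← W − α ⋅ ∂L/∂W | Update parameters along the negative gradient to minimize the loss |
| PyTorch automatic differentiation | loss.backward() | Backward propagation computes the gradient of every parameter and stores it in .grad |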
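A derived gradient can be sanity-checked numerically by nudging each parameter slightly and measuring how the loss changes. The following is a minimal sketch in plain Python using central differences; the function names `loss_fn` and `numerical_grad` and the step size `eps` are assumptions for illustration. It uses the same numbers as the example (x = 2, y_true = 4, W = 1, b = 0.5, learning rate 0.1).

```python
def loss_fn(W, b, x=2.0, y_true=4.0):
    """Squared error of the one-sample linear model."""
    return (W * x + b - y_true) ** 2


def numerical_grad(f, W, b, eps=1e-6):
    """Approximate (dL/dW, dL/db) with central differences."""
    dW = (f(W + eps, b) - f(W - eps, b)) / (2 * eps)
    db = (f(W, b + eps) - f(W, b - eps)) / (2 * eps)
    return dW, db


W, b, lr = 1.0, 0.5, 0.1
dW, db = numerical_grad(loss_fn, W, b)

# One gradient descent step
W_new, b_new = W - lr * dW, b - lr * db

print(round(dW, 4), round(db, 4))        # -6.0 -3.0
print(round(W_new, 4), round(b_new, 4))  # 1.6 0.8
```

The approximations agree with the analytic values $-6.0$ and $-3.0$. Autograd does not work this way (it applies the chain rule exactly), but finite differences are a convenient independent check.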

🕒 2025-07-03 📁 Deep Learning 👤 laumy 🔥 262 popularity
Activation Functions

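Concept

So far we have mainly used linear models, but linear models have many limitations: many of the problems we want to model cannot be fitted by a linear model alone, as in the example below. To fit the function drawn in red, no adjustment of W and b in a linear model will work; fitting such a function requires a nonlinear function.

As the figure above shows, such a function can be fitted by summing functions ①, ② and ③ and adding a bias b. Where do ①, ② and ③ come from? Each is wx+b passed through a sigmoid, and that sigmoid is what we call an activation function.

The main role of an activation function is to introduce nonlinearity, so that the network can handle more complex problems and does not degenerate into a linear model. Without activation functions a neural network cannot exploit its learning and representational power. Choosing a suitable activation function matters for both training and final performance.

Common activation functions

ReLU

Formula: $ \text{ReLU}(x) = \max(0, x) $

    import torch
    from d2l import torch as d2l

    x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)
    y = torch.relu(x)
    d2l.plot(x.detach(), y.detach(), 'x', 'relu(x)', figsize=(5, 2.5))

ReLU is widely used because it is cheap to compute: it needs no exponentials, which are comparatively expensive.

Differentiating ReLU shows that the derivative is 0 for negative inputs and 1 for positive inputs. y.backward computes this derivative; here the derivative is simply the gradient, and its value at each position of x is the gradient at that position.

    y.backward(torch.ones_like(x), retain_graph=True)
    d2l.plot(x.detach(), x.grad, 'x', 'grad of relu', figsize=(5, 2.5))

Sigmoid

Formula: $ \sigma(x) = \frac{1}{1 + e^{-x}} $

    y = torch.sigmoid(x)
    d2l.plot(x.detach(), y.detach(), 'x', 'sigmoid(x)', figsize=(5, 2.5))

Tanh

Formula: $ \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} $

    y = torch.tanh(x)
    d2l.plot(x.detach(), y.detach(), 'x', 'tanh(x)', figsize=(5, 2.5))

Softmax

Formula: $ \text{Softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} $

In the earlier chapter we used softmax for multi-class classification in softmax regression, but softmax can also be viewed as an activation function. It turns the outputs of a network into a probability distribution: each class output lies between 0 and 1 and the outputs sum to 1. For example, z = [2.0, 1.0, 0.1] becomes approximately [0.66, 0.24, 0.10] after softmax. If the network outputs scores for three classes, this means the first class has the highest predicted probability, about 66%.

In summary, softmax is an activation function used specifically in the output layer of multi-class problems. It helps the model produce a probability distribution on which the classification decision is based.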
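The four formulas are short enough to evaluate directly. Below is a minimal sketch in plain Python with the standard `math` module, written only to make the formulas concrete; it is scalar code for illustration, not how PyTorch implements these functions.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def softmax(zs):
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-3.0), relu(2.0))  # 0.0 2.0
print(sigmoid(0.0))           # 0.5
print(tanh(0.0))              # 0.0

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The softmax output sums to 1 and keeps the ordering of the input scores, so the class with the largest score also receives the largest probability.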

🕒 2025-07-02 📁 Deep Learning 👤 laumy 🔥 260 popularity
Softmax Regression Implementation

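What is softmax regression

Softmax regression, also called multinomial logistic regression, is a classification algorithm for multi-class problems. It extends logistic regression to the case of more than two output classes. Softmax regression uses the softmax function to turn the per-class outputs into a probability distribution, so that each output value represents the probability of a class and the probabilities sum to 1.

An example: consider a problem with 3 classes, apple, banana, and orange. For each input sample (say, an image), the softmax regression model outputs three values, one probability per class, which together form a probability distribution. For instance:

Probability of apple: 0.6
Probability of banana: 0.3
Probability of orange: 0.1

These probabilities sum to 1, and the model classifies the input as apple because that probability is the largest.

The softmax function

For each class $k$ we compute a score $z_k$ and then turn it into a probability. The score is usually a linear combination of the input $\mathbf{x}$ and the weight vector $\mathbf{w}_k$ of that class: $ z_k = \mathbf{w}_k^T \mathbf{x} + b_k $, where $\mathbf{w}_k$ is the weight of class $k$, $b_k$ is the bias, and $\mathbf{x}$ is the input feature vector. The softmax function converts these scores $z_k$ into probabilities:

$ P(y = k | \mathbf{x}) = \frac{e^{z_k}}{\sum_{j=1}^K e^{z_j}} $

$ P(y = k | \mathbf{x}) $ is the probability that input $\mathbf{x}$ belongs to class $k$.
$ z_k $ is the score of class $k$.
$ \sum_{j=1}^K e^{z_j} $ is the sum of the exponentials of all class scores, which ensures the probabilities sum to 1.

Cross-entropy loss

To train a softmax regression model we use the cross-entropy loss to measure the difference between the predictions and the true labels:

$ L(\theta) = - \sum_{i=1}^N \sum_{k=1}^K y_{ik} \log P(y = k | \mathbf{x}_i) $

where:
- $ N $ is the number of samples in the training set.
- $ y_{ik} $ is the label saying whether sample $ i $ belongs to class $ k $ (1 or 0).
- $ P(y = k | \mathbf{x}_i) $ is the probability that input $ \mathbf{x}_i $ belongs to class $ k $.

Softmax implementation example

Reading the data

Install d2l first:

    pip install d2l==0.16

Then:

    import torch
    from IPython import display
    from d2l import torch as d2l

    batch_size = 256
    train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

This uses d2l.load_data_fashion_mnist() to load the Fashion-MNIST dataset. load_data_fashion_mnist is a utility function in the d2l library that loads Fashion-MNIST and returns data iterators for the training and test sets: train_iter iterates over the training set and test_iter over the test set. A data iterator is an object that loads data in batches during training and evaluation; batch_size sets the number of samples per batch.

The following code displays some of the input data:

    n = 10
    for X, y in train_iter:
        break
    X_selected = X[0:n].reshape((n, 28, 28))
    titles = [f'Label: {int(label)}' for label in y[0:n]]
    d2l.show_images(X_selected, 1, n, titles=titles)

Defining the model

The softmax function

Softmax is computed in three steps:

Exponentiate every element (with exp).
Sum each row (each sample of the minibatch is one row) to get the normalization constant of each sample.
Divide each row by its normalization constant so that the result sums to 1.

    def softmax(X):
        X_exp = torch.exp(X)
        print(X_exp)       # debug print; remove before training
        partition = X_exp.sum(1, keepdim=True)
        print(partition)   # debug print; remove before training
        return X_exp / partition

For example:

    X = torch.normal(0, 1, (2, 5))  # 2x5 matrix drawn from a normal distribution
    print(X)
    X_prob = softmax(X)
    X_prob, X_prob.sum(1)

Output:

    # the generated data
    tensor([[ 0.3141,  0.5186, -0.6949,  0.5918, -2.2370],
            [-0.3814,  0.8092, -0.1959,  0.7489,  1.8790]])
    # torch.exp(X): e^x of every element
    tensor([[1.3690, 1.6797, 0.4991, 1.8072, 0.1068],
            [0.6829, 2.2460, 0.8221, 2.1146, 6.5472]])
    # X_exp.sum(1, keepdim=True): sum of each row
    tensor([[ 5.4618],
            [12.4129]])
    # each row divided by its normalization constant, so each row sums to 1
    (tensor([[0.2506, 0.3075, 0.0914, 0.3309, 0.0196],
             [0.0550, 0.1809, 0.0662, 0.1704, 0.5275]]),
     tensor([1., 1.]))

Model and parameters

    num_inputs = 784
    num_outputs = 10

    W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)
    b = torch.zeros(num_outputs, requires_grad=True)

    def net(X):
        return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)

The model is still a linear model, with a softmax applied on top. Mathematically: $ \hat{y} = \text{softmax}(X W + b) $

$ X \in \mathbb{R}^{n \times 784} $ is the input matrix, where $ n $ is the number of samples.
$ W \in \mathbb{R}^{784 \times 10} $ is the weight matrix.
$ b \in \mathbb{R}^{10} $ is the bias vector.
$ \hat{y} \in \mathbb{R}^{n \times 10} $ is the output matrix; each row holds the predicted class probabilities of one sample.

The softmax formula is $ \text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}} $, where $ z_i $ is the score of one class and $ j $ runs over all classes (10 in this example). Softmax converts every output into a probability, and the probabilities of all classes sum to 1.

Defining the loss function

    def cross_entropy(y_hat, y):
        return - torch.log(y_hat[range(len(y_hat)), y])

y_hat: the model's prediction, a probability distribution produced by softmax. Its shape is (batch_size, num_classes), where batch_size is the number of samples and num_classes the number of classes. Each row holds one sample's predicted probabilities for all classes.

y: the indices of the true labels, with shape (batch_size,); each entry is the index of the true class of one sample.

y_hat[range(len(y_hat)), y]: picks from y_hat the predicted probability of the true class of each sample. range(len(y_hat)) generates the row indices 0 to batch_size-1, one per sample, and y selects the column of the true class.

torch.log(...): takes the logarithm of the selected probabilities. The log in the cross-entropy loss measures the gap between the predicted probability and the true label.

Negative sign: cross-entropy is computed as a negative log-likelihood, so the result is negated.

The loss formula is $ L = - \frac{1}{n} \sum_{i=1}^{n} \log(\hat{y}_{i, y_i}) $. The function above returns the per-sample term $-\log(\hat{y}_{i, y_i})$; the sum or mean over the batch is taken later in the training loop.

Classification accuracy

Classification accuracy is the number of correctly predicted samples divided by the total number of samples (len(y)). It can be read as the fraction of correct predictions: if 1 of 2 input images is recognized correctly, the accuracy is 1/2 = 0.5.

Consider an example. y_hat, the model's prediction, is usually a 2-D matrix of shape (number of samples, number of classes). With 2 samples (input images) and 3 classes (cat, dog, pig), the output could be [[0.1, 0.2, 0.7], [0.3, 0.4, 0.3]]: one probability distribution per sample, [0.1, 0.2, 0.7] for sample 1 and [0.3, 0.4, 0.3] for sample 2. The true label y is a 1-D vector whose elements are the correct class indices, for example [2, 1], where 2 stands for pig and 1 for dog.

How do we compare y_hat with y? We take the index of the largest probability in each sample's distribution, using argmax(axis=1) to find the most likely class along each row. For example, [[0.1, 0.2, 0.7], [0.3, 0.4, 0.3]] gives [2, 1]: the maximum of the first row is 0.7 at index 2, and the maximum of the second row is 0.4 at index 1. Now the two can be compared: if y = [2, 1] and y_hat yields [2, 1], every prediction is correct.

    def accuracy(y_hat, y):
        if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:
            y_hat = y_hat.argmax(axis=1)
        # y_hat now holds class indices such as [2, 1]; compare with y
        # and return the number of correct predictions
        cmp = y_hat.type(y.dtype) == y
        return float(cmp.type(y.dtype).sum())

This function returns the number of correct predictions. With y_hat = [[0.1, 0.2, 0.7], [0.3, 0.4, 0.3]] and y = [2, 2], accuracy returns 1: y_hat.argmax(axis=1) gives [2, 1], which differs from the true labels [2, 2] in one place, so the second sample is predicted wrongly. The classification accuracy is then 1/2 = 0.5.

Training

Accumulator, Animator, and evaluate_accuracy are helper utilities from the d2l book; the code below takes them from the d2l package.

    def train_epoch_ch3(net, train_iter, loss, updater):
        # Put the model in training mode
        if isinstance(net, torch.nn.Module):
            net.train()
        # Sum of training loss, number of correct predictions, number of samples
        metric = d2l.Accumulator(3)
        for X, y in train_iter:
            # Compute gradients and update parameters
            y_hat = net(X)
            l = loss(y_hat, y)
            if isinstance(updater, torch.optim.Optimizer):
                # Built-in PyTorch optimizer and loss function
                updater.zero_grad()
                l.mean().backward()
                updater.step()
            else:
                # Custom optimizer and loss function
                l.sum().backward()
                updater(X.shape[0])
            metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())
        # Return training loss and training accuracy
        return metric[0] / metric[2], metric[1] / metric[2]

    def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):
        animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],
                                legend=['train loss', 'train acc', 'test acc'])
        for epoch in range(num_epochs):
            # Training loss and training accuracy
            train_metrics = train_epoch_ch3(net, train_iter, loss, updater)
            # Test accuracy
            test_acc = d2l.evaluate_accuracy(net, test_iter)
            # Plot the three values
            animator.add(epoch + 1, train_metrics + (test_acc,))
        train_loss, train_acc = train_metrics
        assert train_loss < 0.5, train_loss
        assert train_acc <= 1 and train_acc > 0.7, train_acc
        assert test_acc <= 1 and test_acc > 0.7, test_acc

    lr = 0.1

    def updater(batch_size):
        return d2l.sgd([W, b], lr, batch_size)

    num_epochs = 10
    train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)

The training progress is displayed as follows:

train loss: the training loss, that is, the value of the loss function. It is the model's average loss on the training set, measured with a loss function such as cross-entropy or mean squared error. The smaller the loss, the better the model fits the training data; it reflects the gap between the predictions and the true labels.

train acc: training accuracy, the fraction of correct predictions on the training set, computed by comparing the predictions with the true labels: training accuracy = number of correctly predicted samples / total number of samples. A higher training accuracy means a better fit to the training data and reflects how well the model has learned the training set.

test acc: test accuracy, the accuracy on a test set the model has not seen. Unlike training accuracy, it evaluates the model's ability to generalize and reflects how well it predicts new data. A high test accuracy means the model not only performs well on the training set but also generalizes to unseen data.

Prediction

    def predict_ch3(net, test_iter, n=6):
        for X, y in test_iter:
            break
        trues = d2l.get_fashion_mnist_labels(y)
        preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))
        titles = [true + '\n' + pred for true, pred in zip(trues, preds)]
        d2l.show_images(
            X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])

    predict_ch3(net, test_iter)

This uses the trained model to show how it performs on actual samples.

Summary

1. Formula and code

Formula: y = softmax(XW + b)
Code: y = softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)

2. Input and output example

Input: X = torch.Size([256, 1, 28, 28]) --> X = torch.Size([256, 784]). XW must be a valid matrix product, so X is reshaped with X.reshape((-1, W.shape[0])).
W = torch.Size([784, 10])
b = torch.Size([10])
Output: y_hat = torch.Size([256, 10]), from the matrix product ([256, 784]) * ([784, 10]) = ([256, 10]).

Below is the first row of the output, that is, the prediction for the first input sample. The last entry is 0.99613; it is the entry at index 9, which in Fashion-MNIST is the "ankle boot" class, so the model predicts that class for the first sample with about 99.6% probability.

    tensor([4.8291e-06, 1.2489e-07, 3.7127e-06, 2.1127e-07, 1.3334e-06, 2.6440e-03,
            1.9091e-05, 8.7211e-04, 3.2460e-04, 9.9613e-01])

This post is a set of study notes on <动手学深度学习 V2> (Dive into Deep Learning).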
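One practical caveat of computing softmax exactly as the formula is written: exponentiating large scores can overflow. A common remedy, sketched here in plain Python as an illustration rather than as the method used in these notes, is to subtract the largest score before exponentiating, which leaves the result mathematically unchanged. The function names `stable_softmax` and `cross_entropy_single` are hypothetical.

```python
import math


def stable_softmax(zs):
    """Softmax that subtracts max(zs) first so exp never sees a large argument."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_single(probs, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(probs[label])


# math.exp(1000.0) would raise OverflowError, but the shifted version is fine
scores = [1000.0, 1001.0, 1002.0]
probs = stable_softmax(scores)
print([round(p, 4) for p in probs])               # [0.09, 0.2447, 0.6652]
print(round(cross_entropy_single(probs, 2), 4))   # 0.4076
```

Only the differences between scores matter, so [1000, 1001, 1002] gives the same distribution as [0, 1, 2]. Library loss functions typically combine the log and the softmax for the same reason.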

🕒 2025-07-01 📁 Deep Learning 👤 laumy 🔥 263 popularity
Linear Regression Implementation

Linear regression

Given a dataset and its labels, a linear regression model fits the mapping from the data to the labels with a function. The model can be written as y = wx + b, where x and w are vectors. The goal is to find the weight w and the bias b that bring the model closest to the pattern in the data, so that it predicts more accurately.

Linear regression example

PyTorch has built-in support for linear regression; here we implement it from scratch to deepen our understanding.

Reading the dataset

    import random
    import torch

    def data_iter(batch_size, features, labels):
        num_examples = len(features)          # number of samples, e.g. 1000
        indices = list(range(num_examples))   # index list [0, ..., 999]
        random.shuffle(indices)               # shuffle so the data is read in random order, e.g. [77, 99, 0, 13, ...]
        for i in range(0, num_examples, batch_size):  # from 0 to num_examples in steps of batch_size
            # Turn the indices from i to i + batch_size into a PyTorch tensor
            batch_indices = torch.tensor(
                indices[i: min(i + batch_size, num_examples)])
            print(batch_indices)
            # Each iteration yields a tuple (features_batch, labels_batch):
            # features_batch is a tensor with the features of this batch,
            # labels_batch holds the corresponding labels
            yield features[batch_indices], labels[batch_indices]

The function data_iter takes the dataset (x) and its labels (y) as input and splits them into minibatches of size batch_size. Drawing small batches exploits the parallelism of the GPU: the samples of a batch can be pushed through the model in parallel, and later the gradient of each sample's loss can also be computed in parallel.

    batch_size = 10
    for X, y in data_iter(batch_size, features, labels):
        print(X, '\n', y)
        break

Output:

    tensor([940,  41, 385, 262, 655, 402, 317, 256, 984, 644])   # print(batch_indices)
    tensor([[-0.9666,  0.8299],
            [-1.8890,  0.1645],
            [ 0.0274, -0.6944],
            [ 2.0289,  0.7227],
            [ 1.0077,  0.6674],
            [ 1.8692,  0.5002],
            [-0.9469,  1.7404],
            [ 0.8589, -0.5467],
            [ 1.1260,  0.1262],
            [-0.6988, -0.0683]])
    tensor([[-0.5347],
            [-0.1296],
            [ 6.6105],
            [ 5.7961],
            [ 3.9675],
            [ 6.2448],
            [-3.5983],
            [ 7.7625],
            [ 6.0183],
            [ 3.0294]])

Where do the training data and labels come from? Normally they are derived from the target you want to train on. For convenience, this chapter generates a dataset with a function.

    def synthetic_data(w, b, num_examples):
        X = torch.normal(0, 1, (num_examples, len(w)))
        y = torch.matmul(X, w) + b
        y += torch.normal(0, 0.01, y.shape)
        return X, y.reshape((-1, 1))

Given w and b, this function generates the x and y of the model y = wx + b; our task is then to recover w and b by training.

Generating the input X: torch.normal(mean, std, size) draws data from a normal distribution. Here mean=0 and std=1. (num_examples, len(w)) is the shape of the tensor: num_examples is the number of samples and len(w) is the number of features per sample (the length of the weight vector w). So X is a matrix of shape (num_examples, len(w)) containing features sampled from the standard normal distribution.

Generating the labels y: torch.matmul(X, w) multiplies the features X by the weights w. The result is a tensor of shape (num_examples,) holding each sample's prediction without the bias. + b adds the bias b to every prediction, which gives the final label y. This is the linear regression formula y = Xw + b.

Adding noise: torch.normal(0, 0.01, y.shape) generates a noise term with the same shape as y, drawn from a normal distribution with mean 0 and standard deviation 0.01. This makes the generated data more realistic: real data usually contains some error or noise, so a small random perturbation is added to y.

Returning the data: X is the generated feature data. y.reshape((-1, 1)) converts y into a column vector of shape (num_examples, 1).

The code below generates a synthetic linear dataset: the features X are sampled from the standard normal distribution, and the labels y come from y = Xw + b with a little noise added.

    true_w = torch.tensor([2, -3.4])
    true_b = 4.2
    features, labels = synthetic_data(true_w, true_b, 10)
    print('features:', features, '\nfeatures len', len(features), '\nlabel:', labels)

Output:

    features: tensor([[ 4.3255e-01, -1.4288e+00],
            [ 2.2412e-01, -1.8749e-01],
            [-5.6843e-01,  1.0930e+00],
            [ 1.3660e+00, -1.8141e-03],
            [ 3.9331e-01, -2.4553e-02],
            [-6.3184e-01, -8.4748e-01],
            [-1.7891e-02, -1.4018e+00],
            [-4.8070e-01,  8.5689e-01],
            [ 2.0670e+00,  3.8301e-02],
            [ 1.7682e+00,  1.9595e-01]])
    features len 10
    label: tensor([[ 9.9307],
            [ 5.2856],
            [-0.6669],
            [ 6.9439],
            [ 5.0759],
            [ 5.8344],
            [ 8.9642],
            [ 0.3175],
            [ 8.2140],
            [ 7.0458]])

Defining the model

Our model is y = Xw + b: multiply the input features X by the weights w and add the bias. Xw is a vector and b is a scalar; when a scalar is added to a vector it is added to every component, which is the broadcasting mechanism.

    def linreg(X, w, b):
        return torch.matmul(X, w) + b

Before optimizing the model parameters with stochastic gradient descent, the parameters need initial values. Below, w is initialized randomly from a normal distribution and b is initialized to zero.

    w = torch.normal(0, 0.01, size=(2,1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    w, b

Defining the loss function

The loss measures the gap between the value y' that the model computes for the sampled input X and the true value y. Here we use the squared loss, loss = (y' - y)^2 / 2. In full:

$ L(w, b) = \frac{1}{n} \sum_{i=1}^{n} \ell^{(i)}(w, b) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{2} \left( w^\top x^{(i)} + b - y^{(i)} \right)^2 \right) $

    def squared_loss(y_hat, y):
        return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

Optimizing the parameters

The update rule is:

$ (w, b) \leftarrow (w, b) - \frac{\eta}{|B|} \sum_{i \in B} \nabla_{(w, b)} \ell^{(i)}(w, b) $

The gradient is the partial derivative of the loss with respect to the parameter in question: for $w$ it is the partial derivative with respect to $w$, and for $b$ the partial derivative with respect to $b$. $x$ is the concrete sampled value (not a variable), and $B$ is the sampled minibatch, a fixed number of training samples. In detail, the update for $w$ is:

$ w \leftarrow w - \frac{\eta}{|B|} \sum_{i \in B} \frac{\partial \ell^{(i)}}{\partial w}(w, b) = w - \frac{\eta}{|B|} \sum_{i \in B} x^{(i)} \left( w^\top x^{(i)} + b - y^{(i)} \right) $

and the update for $b$ is:

$ b \leftarrow b - \frac{\eta}{|B|} \sum_{i \in B} \frac{\partial \ell^{(i)}}{\partial b}(w, b) = b - \frac{\eta}{|B|} \sum_{i \in B} \left( w^\top x^{(i)} + b - y^{(i)} \right) $

These formulas show that the gradient is the sum of the per-sample gradients over a batch: the parameters are updated once per batch, not once per sample.

    def sgd(params, lr, batch_size):
        with torch.no_grad():
            for param in params:
                param -= lr * param.grad / batch_size  # param.grad is the gradient
                param.grad.zero_()

Where does param.grad come from? It is computed automatically by the framework, as the next chapter explains.

Training

Before training, initialize the parameters:

    w = torch.normal(0, 0.01, size=(2,1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    w, b

Then start training:

    lr = 0.03
    num_epochs = 5000
    net = linreg
    loss = squared_loss

    for epoch in range(num_epochs):
        for X, y in data_iter(batch_size, features, labels):
            l = loss(net(X, w, b), y)  # minibatch loss on X and y
            # l has shape (batch_size, 1), not a scalar. All elements of l are
            # summed, and the gradient with respect to [w, b] is computed from the sum
            l.sum().backward()
            sgd([w, b], lr, batch_size)  # update the parameters with their gradients
        print('true_w', true_w, 'w', w, '\ntrue_b', true_b, 'b', b)
        with torch.no_grad():
            train_l = loss(net(features, w, b), labels)
            print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

    print(f'w的估计误差: {true_w - w.reshape(true_w.shape)}')  # estimation error of w
    print(f'b的估计误差: {true_b - b}')                         # estimation error of b

Why can l.sum().backward() compute and store the gradients automatically? When w and b were initialized, requires_grad=True was set. When the loss is computed, net(X, w, b) produces the prediction y_hat, and the loss function combines it with the true value y into a computational graph. Calling l.sum().backward() makes PyTorch's autograd system start backward propagation from the scalar loss l.sum(), compute the gradients of w and b, and store them in w.grad and b.grad.

    tensor([4, 3, 9, 6, 8, 2, 5, 7, 1, 0])    # batch_size = 10
    true_w tensor([ 2.0000, -3.4000]) w tensor([[ 2.0056],[-3.4014]], requires_grad=True)
    true_b 4.2 b tensor([4.1983], requires_grad=True)
    epoch 1, loss 0.000056
    tensor([5, 6, 2, 4, 0, 7, 8, 1, 9, 3])
    true_w tensor([ 2.0000, -3.4000]) w tensor([[ 2.0056],[-3.4014]], requires_grad=True)
    true_b 4.2 b tensor([4.1983], requires_grad=True)
    epoch 2, loss 0.000056
    tensor([8, 5, 6, 9, 7, 4, 2, 1, 0, 3])
    true_w tensor([ 2.0000, -3.4000]) w tensor([[ 2.0056],[-3.4014]], requires_grad=True)
    true_b 4.2 b tensor([4.1983], requires_grad=True)
    epoch 3, loss 0.000056
    tensor([9, 3, 5, 2, 8, 0, 7, 4, 6, 1])
    true_w tensor([ 2.0000, -3.4000]) w tensor([[ 2.0056],[-3.4014]], requires_grad=True)
    true_b 4.2 b tensor([4.1983], requires_grad=True)
    epoch 4, loss 0.000056
    tensor([8, 1, 5, 3, 0, 6, 2, 4, 9, 7])
    true_w tensor([ 2.0000, -3.4000]) w tensor([[ 2.0056],[-3.4014]], requires_grad=True)
    true_b 4.2 b tensor([4.1983], requires_grad=True)
    epoch 5, loss 0.000056
    tensor([3, 7, 4, 0, 6, 9, 2, 1, 5, 8])
    true_w tensor([ 2.0000, -3.4000]) w tensor([[ 2.0056],[-3.4014]], requires_grad=True)
    true_b 4.2 b tensor([4.1983], requires_grad=True)
    epoch 6, loss 0.000056
    tensor([7, 4, 6, 1, 0, 5, 3, 8, 9, 2])
    true_w tensor([ 2.0000, -3.4000]) w tensor([[ 2.0056],[-3.4014]], requires_grad=True)
    true_b 4.2 b tensor([4.1983], requires_grad=True)

Estimation errors:

    w的估计误差: tensor([-0.0056,  0.0014], grad_fn=<SubBackward0>)
    b的估计误差: tensor([0.0017], grad_fn=<RsubBackward1>)

This post is a set of study notes on <动手学深度学习 V2> (Dive into Deep Learning).
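The update formulas for $w$ and $b$ can also be applied literally, without autograd, which makes it explicit where the gradient values come from. The following is a minimal sketch in plain Python under stated assumptions: the data are noise-free, there are 100 samples, and the seed, learning rate, batch size, and epoch count are chosen for illustration only.

```python
import random

random.seed(0)
true_w, true_b = [2.0, -3.4], 4.2

# Noise-free synthetic data: y = w . x + b
features = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(100)]
labels = [true_w[0] * x[0] + true_w[1] * x[1] + true_b for x in features]

w, b = [0.0, 0.0], 0.0
lr, batch_size, num_epochs = 0.03, 10, 100

for epoch in range(num_epochs):
    indices = list(range(len(features)))
    random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        grad_w, grad_b = [0.0, 0.0], 0.0
        for i in batch:
            x, y = features[i], labels[i]
            err = w[0] * x[0] + w[1] * x[1] + b - y  # w^T x + b - y
            grad_w[0] += x[0] * err                  # sum of x * err over the batch
            grad_w[1] += x[1] * err
            grad_b += err                            # sum of err over the batch
        # One update per minibatch, scaled by lr / |B|
        w[0] -= lr * grad_w[0] / len(batch)
        w[1] -= lr * grad_w[1] / len(batch)
        b -= lr * grad_b / len(batch)

print(w, b)
```

With these settings the estimates should end up very close to the true values 2, -3.4, and 4.2. The sums accumulated in grad_w and grad_b are what l.sum().backward() leaves in w.grad and b.grad in the PyTorch version.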

🕒 2025-07-01 📁 Deep Learning 👤 laumy 🔥 popularity 318
A Brief Analysis of Xiaozhi AI Voice Interaction

app start 主要是初始化板级、显示、WiFi连接、音频codec、编解码、协议、音效、唤醒几个环节。 auto& board = Board::GetInstance(); //获取板级实例 SetDeviceState(kDeviceStateStarting);//设置出事状态为kDeviceStateStarting /* Setup the display */ auto display = board.GetDisplay(); //获取显示实例 /* Setup the audio codec */ auto codec = board.GetAudioCodec();//获取codec实例 opus_decode_sample_rate_ = codec->output_sample_rate();//获取当前codec的采样率 opus_decoder_ = std::make_unique<OpusDecoderWrapper>(opus_decode_sample_rate_, 1);//初始化opus解码，设置解码采样率 opus_encoder_ = std::make_unique<OpusEncoderWrapper>(16000, 1, OPUS_FRAME_DURATION_MS);//初始化opus编码，设置采样率16Khz // For ML307 boards, we use complexity 5 to save bandwidth // For other boards, we use complexity 3 to save CPU //根据板级来设置opus编码的复杂度 if (board.GetBoardType() == "ml307") { ESP_LOGI(TAG, "ML307 board detected, setting opus encoder complexity to 5"); opus_encoder_->SetComplexity(5); } else { ESP_LOGI(TAG, "WiFi board detected, setting opus encoder complexity to 3"); opus_encoder_->SetComplexity(3); } //如果codec的采样率不是16Khz，需要进行重采样，下面是重采样初始化。 if (codec->input_sample_rate() != 16000) { input_resampler_.Configure(codec->input_sample_rate(), 16000); reference_resampler_.Configure(codec->input_sample_rate(), 16000); } //注册codec输入音频的回调，表示有录音的pcm，触发mainloop处理。 codec->OnInputReady([this, codec]() { BaseType_t higher_priority_task_woken = pdFALSE; xEventGroupSetBitsFromISR(event_group_, AUDIO_INPUT_READY_EVENT, &higher_priority_task_woken); return higher_priority_task_woken == pdTRUE; }); //注册codec输出音频的回调，表示有录音的pcm，触发mainloop处理。 codec->OnOutputReady([this]() { BaseType_t higher_priority_task_woken = pdFALSE; xEventGroupSetBitsFromISR(event_group_, AUDIO_OUTPUT_READY_EVENT, &higher_priority_task_woken); return higher_priority_task_woken == pdTRUE; }); //启动硬件codec，使能录音和播放。 codec->Start(); //开启一个mainloop线程，处理主要逻辑 /* Start the main loop */ xTaskCreate([](void* arg) { Application* app = (Application*)arg; app->MainLoop(); vTaskDelete(NULL); }, "main_loop", 4096 * 2, this, 4, nullptr); //等待WiFi连接好 /* 
Wait for the network to be ready */ board.StartNetwork(); // Initialize the protocol display->SetStatus(Lang::Strings::LOADING_PROTOCOL);//显示正在加载协议根据使用MQTT还是Websocet来选择通信协议 #ifdef CONFIG_CONNECTION_TYPE_WEBSOCKET protocol_ = std::make_unique<WebsocketProtocol>(); #else protocol_ = std::make_unique<MqttProtocol>(); #endif //注册网络接收异常回调函数 protocol_->OnNetworkError([this](const std::string& message) { SetDeviceState(kDeviceStateIdle); Alert(Lang::Strings::ERROR, message.c_str(), "sad", Lang::Sounds::P3_EXCLAMATION); }); //注册接收音频的回调函数，接收到音频后，往加入解码队列 protocol_->OnIncomingAudio([this](std::vector<uint8_t>&& data) { std::lock_guard<std::mutex> lock(mutex_); if (device_state_ == kDeviceStateSpeaking) { audio_decode_queue_.emplace_back(std::move(data)); } }); //注册接收协议打开音频的回调，主要是下发解码的的属性信息，包括采样率等。 protocol_->OnAudioChannelOpened([this, codec, &board]() { board.SetPowerSaveMode(false); if (protocol_->server_sample_rate() != codec->output_sample_rate()) { ESP_LOGW(TAG, "Server sample rate %d does not match device output sample rate %d, resampling may cause distortion", protocol_->server_sample_rate(), codec->output_sample_rate()); } SetDecodeSampleRate(protocol_->server_sample_rate()); auto& thing_manager = iot::ThingManager::GetInstance(); protocol_->SendIotDescriptors(thing_manager.GetDescriptorsJson()); std::string states; if (thing_manager.GetStatesJson(states, false)) { protocol_->SendIotStates(states); } }); //注册音频的关闭回调 protocol_->OnAudioChannelClosed([this, &board]() { board.SetPowerSaveMode(true); Schedule([this]() { auto display = Board::GetInstance().GetDisplay(); display->SetChatMessage("system", ""); SetDeviceState(kDeviceStateIdle); }); }); //注册json解析回调，通知文本，状态等信息 protocol_->OnIncomingJson([this, display](const cJSON* root) { // Parse JSON data auto type = cJSON_GetObjectItem(root, "type"); //文字转语音的状态，包括start，stop，sentence_start/stop（句子开始结束）， if (strcmp(type->valuestring, "tts") == 0) { auto state = cJSON_GetObjectItem(root, "state"); if (strcmp(state->valuestring, 
"start") == 0) { Schedule([this]() { aborted_ = false; if (device_state_ == kDeviceStateIdle || device_state_ == kDeviceStateListening) { SetDeviceState(kDeviceStateSpeaking); } }); } else if (strcmp(state->valuestring, "stop") == 0) { Schedule([this]() { if (device_state_ == kDeviceStateSpeaking) { background_task_->WaitForCompletion(); if (keep_listening_) { protocol_->SendStartListening(kListeningModeAutoStop); SetDeviceState(kDeviceStateListening); } else { SetDeviceState(kDeviceStateIdle); } } }); //句子开始 } else if (strcmp(state->valuestring, "sentence_start") == 0) { auto text = cJSON_GetObjectItem(root, "text"); if (text != NULL) { ESP_LOGI(TAG, "<< %s", text->valuestring); Schedule([this, display, message = std::string(text->valuestring)]() { display->SetChatMessage("assistant", message.c_str()); }); } } =//stt：语音转文字信息 } else if (strcmp(type->valuestring, "stt") == 0) { auto text = cJSON_GetObjectItem(root, "text"); if (text != NULL) { ESP_LOGI(TAG, ">> %s", text->valuestring); Schedule([this, display, message = std::string(text->valuestring)]() { display->SetChatMessage("user", message.c_str()); }); } } else if (strcmp(type->valuestring, "llm") == 0) { auto emotion = cJSON_GetObjectItem(root, "emotion"); if (emotion != NULL) { Schedule([this, display, emotion_str = std::string(emotion->valuestring)]() { display->SetEmotion(emotion_str.c_str()); }); } } else if (strcmp(type->valuestring, "iot") == 0) { auto commands = cJSON_GetObjectItem(root, "commands"); if (commands != NULL) { auto& thing_manager = iot::ThingManager::GetInstance(); for (int i = 0; i < cJSON_GetArraySize(commands); ++i) { auto command = cJSON_GetArrayItem(commands, i); thing_manager.Invoke(command); } } } }); //启动协议 protocol_->Start(); //检测OTA的版本，如果版本比较低则进行升级 // Check for new firmware version or get the MQTT broker address ota_.SetCheckVersionUrl(CONFIG_OTA_VERSION_URL); ota_.SetHeader("Device-Id", SystemInfo::GetMacAddress().c_str()); ota_.SetHeader("Client-Id", board.GetUuid()); 
ota_.SetHeader("Accept-Language", Lang::CODE); auto app_desc = esp_app_get_description(); ota_.SetHeader("User-Agent", std::string(BOARD_NAME "/") + app_desc->version); xTaskCreate([](void* arg) { Application* app = (Application*)arg; app->CheckNewVersion(); vTaskDelete(NULL); }, "check_new_version", 4096 * 2, this, 2, nullptr); #if CONFIG_USE_AUDIO_PROCESSOR //初始化音频处理，主要是降噪，回声消除，VAD检测等。 audio_processor_.Initialize(codec->input_channels(), codec->input_reference()); audio_processor_.OnOutput([this](std::vector<int16_t>&& data) { background_task_->Schedule([this, data = std::move(data)]() mutable { opus_encoder_->Encode(std::move(data), [this](std::vector<uint8_t>&& opus) { //如果启动了音效处理，注册ouput的输出回调。 Schedule([this, opus = std::move(opus)]() { protocol_->SendAudio(opus); }); }); }); }); //注册VAD状态变化 audio_processor_.OnVadStateChange([this](bool speaking) { if (device_state_ == kDeviceStateListening) { Schedule([this, speaking]() { if (speaking) { voice_detected_ = true; } else { voice_detected_ = false; } auto led = Board::GetInstance().GetLed(); led->OnStateChanged();//只点个灯？？ }); } }); #endif #if CONFIG_USE_WAKE_WORD_DETECT //启动唤醒检测，初始化唤醒 wake_word_detect_.Initialize(codec->input_channels(), codec->input_reference()); //唤醒词处理回调函数，其中获取到的唤醒词是字符串，还包括获取处理唤醒词的音频编解码 //唤醒词音频部分是否仅仅是唤醒词部分，还包含其他内容数据？需要确认 wake_word_detect_.OnWakeWordDetected([this](const std::string& wake_word) { Schedule([this, &wake_word]() { //如果是idle状态，主要逻辑是，处理业务为连接网络，编码唤醒词，重开唤醒检测 //推送唤醒的音频数据和预料字符串到云端服务器。 if (device_state_ == kDeviceStateIdle) { SetDeviceState(kDeviceStateConnecting); //将唤醒音频内容进行编码 wake_word_detect_.EncodeWakeWordData(); if (!protocol_->OpenAudioChannel()) { //重新再次打开唤醒检测， wake_word_detect_.StartDetection(); return; } //哪些情况会停止唤醒检测：1 检测到唤醒词后会停止。2.处于listening的时候会停止。3.OTA升级过程会停止 std::vector<uint8_t> opus; //编码并将唤醒数据推送到服务器（除了唤醒词可能还包括说话数据？） // Encode and send the wake word data to the server while (wake_word_detect_.GetWakeWordOpus(opus)) { protocol_->SendAudio(opus); } //发送唤醒词的字符串 // Set the 
chat state to wake word detected protocol_->SendWakeWordDetected(wake_word); ESP_LOGI(TAG, "Wake word detected: %s", wake_word.c_str()); keep_listening_ = true; SetDeviceState(kDeviceStateIdle); } else if (device_state_ == kDeviceStateSpeaking) { //如果说话状态，则将说话进行停止，设置一个停止标志位，并发送停止speak给服务不要再发opus了？ AbortSpeaking(kAbortReasonWakeWordDetected); } else if (device_state_ == kDeviceStateActivating) { SetDeviceState(kDeviceStateIdle); } }); }); //启动唤醒检测 wake_word_detect_.StartDetection(); #endif //设置状态为IDLE状态 SetDeviceState(kDeviceStateIdle); esp_timer_start_periodic(clock_timer_handle_, 1000000); mainloop void Application::MainLoop() { while (true) { auto bits = xEventGroupWaitBits(event_group_, SCHEDULE_EVENT | AUDIO_INPUT_READY_EVENT | AUDIO_OUTPUT_READY_EVENT, pdTRUE, pdFALSE, portMAX_DELAY); //处理录音音频处理，将收到的音频做处理送到队列 if (bits & AUDIO_INPUT_READY_EVENT) { InputAudio(); } //处理云端音频处理，将编码的音频进行解码送播放器 if (bits & AUDIO_OUTPUT_READY_EVENT) { OutputAudio(); } //处理其他任务的队列 if (bits & SCHEDULE_EVENT) { std::unique_lock<std::mutex> lock(mutex_); std::list<std::function<void()>> tasks = std::move(main_tasks_); lock.unlock(); for (auto& task : tasks) { task(); } } } } 录音通路录音处理 // I2S收到音频，触发app应用注册的回调函数通知函数codec->OnInputReady,如下 //通知有数据了，实际读数据通过Read去读。 IRAM_ATTR bool AudioCodec::on_recv(i2s_chan_handle_t handle, i2s_event_data_t *event, void *user_ctx) { auto audio_codec = (AudioCodec*)user_ctx; if (audio_codec->input_enabled_ && audio_codec->on_input_ready_) { return audio_codec->on_input_ready_(); } return false; } //通过eventsetbit触发通知mainloop线程处理音频 codec->OnInputReady([this, codec]() { BaseType_t higher_priority_task_woken = pdFALSE; xEventGroupSetBitsFromISR(event_group_, AUDIO_INPUT_READY_EVENT, &higher_priority_task_woken); return higher_priority_task_woken == pdTRUE; }); //在mainloop中触发Application::InputAudio() void Application::InputAudio() { //获取codec的实例 auto codec = Board::GetInstance().GetAudioCodec(); std::vector<int16_t> data; //获取codec的音频pcm数据存到data中。 if 
(!codec->InputData(data)) { return;//如果数据为空，直接返回 } //如果采样率不是16Khz，需要进行重采样 if (codec->input_sample_rate() != 16000) { if (codec->input_channels() == 2) { auto mic_channel = std::vector<int16_t>(data.size() / 2); auto reference_channel = std::vector<int16_t>(data.size() / 2); for (size_t i = 0, j = 0; i < mic_channel.size(); ++i, j += 2) { mic_channel[i] = data[j]; reference_channel[i] = data[j + 1]; } auto resampled_mic = std::vector<int16_t>(input_resampler_.GetOutputSamples(mic_channel.size())); auto resampled_reference = std::vector<int16_t>(reference_resampler_.GetOutputSamples(reference_channel.size())); input_resampler_.Process(mic_channel.data(), mic_channel.size(), resampled_mic.data()); reference_resampler_.Process(reference_channel.data(), reference_channel.size(), resampled_reference.data()); data.resize(resampled_mic.size() + resampled_reference.size()); for (size_t i = 0, j = 0; i < resampled_mic.size(); ++i, j += 2) { data[j] = resampled_mic[i]; data[j + 1] = resampled_reference[i]; } } else { auto resampled = std::vector<int16_t>(input_resampler_.GetOutputSamples(data.size())); input_resampler_.Process(data.data(), data.size(), resampled.data()); data = std::move(resampled); } } //如果启动了唤醒检测，判断唤醒检测是否还在运行，如果还在运行将当前的数据合并到唤醒 //检测的buffer中。 #if CONFIG_USE_WAKE_WORD_DETECT if (wake_word_detect_.IsDetectionRunning()) { wake_word_detect_.Feed(data); //会将当前的数据喂给AFE接口，用于做唤醒词 //唤醒词也直接送到云端了？？？ } #endif //如果打开了音效处理，将音频数据push到音效处理中，直接返回 #if CONFIG_USE_AUDIO_PROCESSOR if (audio_processor_.IsRunning()) { audio_processor_.Input(data); } #else //如果没有打开音效处理，判断当前的状态是否是监听状态，如果是将音频进行编码 //然后推送到远端服务中。 if (device_state_ == kDeviceStateListening) { background_task_->Schedule([this, data = std::move(data)]() mutable { opus_encoder_->Encode(std::move(data), [this](std::vector<uint8_t>&& opus) { Schedule([this, opus = std::move(opus)]() { protocol_->SendAudio(opus); }); }); }); } #endif } 音效处理以下是音效处理过程 //将数据喂给AFE模块，当处理完了之后会触发回调？ void AudioProcessor::Input(const 
std::vector<int16_t>& data) { input_buffer_.insert(input_buffer_.end(), data.begin(), data.end()); auto feed_size = afe_iface_->get_feed_chunksize(afe_data_) * channels_; while (input_buffer_.size() >= feed_size) { auto chunk = input_buffer_.data(); afe_iface_->feed(afe_data_, chunk); input_buffer_.erase(input_buffer_.begin(), input_buffer_.begin() + feed_size); } } void AudioProcessor::AudioProcessorTask() { auto fetch_size = afe_iface_->get_fetch_chunksize(afe_data_); auto feed_size = afe_iface_->get_feed_chunksize(afe_data_); ESP_LOGI(TAG, "Audio communication task started, feed size: %d fetch size: %d", feed_size, fetch_size); while (true) { //获取到PROCESSOR_RUNNING后，不会清除bit（第三个参数），也就说会再次得到运行。 //也就是说AudioProcessor::Start()后，这个会循环运行，直到调用Stop清除。 xEventGroupWaitBits(event_group_, PROCESSOR_RUNNING, pdFALSE, pdTRUE, portMAX_DELAY); //等待获取处理后的数据。 auto res = afe_iface_->fetch_with_delay(afe_data_, portMAX_DELAY); if ((xEventGroupGetBits(event_group_) & PROCESSOR_RUNNING) == 0) { continue; } if (res == nullptr || res->ret_value == ESP_FAIL) { if (res != nullptr) { ESP_LOGI(TAG, "Error code: %d", res->ret_value); } continue; } // VAD state change if (vad_state_change_callback_) { if (res->vad_state == VAD_SPEECH && !is_speaking_) { is_speaking_ = true; vad_state_change_callback_(true); } else if (res->vad_state == VAD_SILENCE && is_speaking_) { is_speaking_ = false; vad_state_change_callback_(false); } } //获取到数据，将数据回调给app->audio_processor_.OnOutput if (output_callback_) { output_callback_(std::vector<int16_t>(res->data, res->data + res->data_size / sizeof(int16_t))); } } } //处理的音效数据的回调，将数据进行编码，然后推送到云端服务器。 audio_processor_.OnOutput([this](std::vector<int16_t>&& data) { background_task_->Schedule([this, data = std::move(data)]() mutable { opus_encoder_->Encode(std::move(data), [this](std::vector<uint8_t>&& opus) { Schedule([this, opus = std::move(opus)]() { protocol_->SendAudio(opus); }); }); }); }); 播放通路 //1. 
通过解析输入的json来启动状态的切换。 protocol_->OnIncomingJson([this, display](const cJSON* root) { // Parse JSON data auto type = cJSON_GetObjectItem(root, "type"); if (strcmp(type->valuestring, "tts") == 0) { auto state = cJSON_GetObjectItem(root, "state"); //收到云端音频，云端会发送start，需要切换到speaking状态。 if (strcmp(state->valuestring, "start") == 0) { Schedule([this]() { aborted_ = false; if (device_state_ == kDeviceStateIdle || device_state_ == kDeviceStateListening) { SetDeviceState(kDeviceStateSpeaking); } }); //本次话题结束后，云端会发送stop，可切换到idle。 } else if (strcmp(state->valuestring, "stop") == 0) { Schedule([this]() { if (device_state_ == kDeviceStateSpeaking) { background_task_->WaitForCompletion(); if (keep_listening_) { protocol_->SendStartListening(kListeningModeAutoStop); SetDeviceState(kDeviceStateListening); } else { SetDeviceState(kDeviceStateIdle); } } }); } else if (strcmp(state->valuestring, "sentence_start") == 0) { auto text = cJSON_GetObjectItem(root, "text"); if (text != NULL) { ESP_LOGI(TAG, "<< %s", text->valuestring); Schedule([this, display, message = std::string(text->valuestring)]() { display->SetChatMessage("assistant", message.c_str()); }); } } //2.解析到云端的json后，会发生状态的迁移 void Application::SetDeviceState(DeviceState state) { if (device_state_ == state) { return; } clock_ticks_ = 0; auto previous_state = device_state_; device_state_ = state; ESP_LOGI(TAG, "STATE: %s", STATE_STRINGS[device_state_]); // The state is changed, wait for all background tasks to finish background_task_->WaitForCompletion(); //如果后台有线程还在运行，等待运行结束 auto& board = Board::GetInstance(); auto codec = board.GetAudioCodec(); auto display = board.GetDisplay(); auto led = board.GetLed(); led->OnStateChanged(); switch (state) { case kDeviceStateUnknown: case kDeviceStateIdle: //idle状态，显示"待命" display->SetStatus(Lang::Strings::STANDBY); display->SetEmotion("neutral"); #if CONFIG_USE_AUDIO_PROCESSOR //关掉音效处理 audio_processor_.Stop(); #endif #if CONFIG_USE_WAKE_WORD_DETECT //开启语音唤醒检测 
wake_word_detect_.StartDetection(); #endif break; case kDeviceStateConnecting: //连接状态，表示连接服务器 display->SetStatus(Lang::Strings::CONNECTING); display->SetEmotion("neutral"); display->SetChatMessage("system", ""); break; case kDeviceStateListening: //说话状态，显示说话中 display->SetStatus(Lang::Strings::LISTENING); display->SetEmotion("neutral"); //复位解码器，清除掉原来的 ResetDecoder(); //复位编码器的状态 opus_encoder_->ResetState(); #if CONFIG_USE_AUDIO_PROCESSOR //启动音效处理（回声消除？） audio_processor_.Start(); #endif #if CONFIG_USE_WAKE_WORD_DETECT //关闭唤醒检测 wake_word_detect_.StopDetection(); #endif //更新IOT状态 UpdateIotStates(); if (previous_state == kDeviceStateSpeaking) { // FIXME: Wait for the speaker to empty the buffer vTaskDelay(pdMS_TO_TICKS(120)); } break; case kDeviceStateSpeaking: display->SetStatus(Lang::Strings::SPEAKING); //复位解码器 ResetDecoder(); //使能codec输出 codec->EnableOutput(true); #if CONFIG_USE_AUDIO_PROCESSOR //音效处理停止 audio_processor_.Stop(); #endif #if CONFIG_USE_WAKE_WORD_DETECT //开启唤醒检测 wake_word_detect_.StartDetection(); #endif break; default: // Do nothing break; } } //3. 接收云端音频数据的回调，如果是speak状态，将数据入队到队列 protocol_->OnIncomingAudio([this](std::vector<uint8_t>&& data) { std::lock_guard<std::mutex> lock(mutex_); if (device_state_ == kDeviceStateSpeaking) { audio_decode_queue_.emplace_back(std::move(data)); } }); //4.当音频输出准备好后，不会不断的调用这个回调？？触发mainloop调用OutputAudio codec->OnOutputReady([this]() { BaseType_t higher_priority_task_woken = pdFALSE; xEventGroupSetBitsFromISR(event_group_, AUDIO_OUTPUT_READY_EVENT, &higher_priority_task_woken); return higher_priority_task_woken == pdTRUE; }); //5. 
output处理 void Application::OutputAudio() { auto now = std::chrono::steady_clock::now(); auto codec = Board::GetInstance().GetAudioCodec(); const int max_silence_seconds = 10; std::unique_lock<std::mutex> lock(mutex_); //判断解码队列是否为空，如果为空，把codec输出关了，也就是不要再触发回调 if (audio_decode_queue_.empty()) { // Disable the output if there is no audio data for a long time if (device_state_ == kDeviceStateIdle) { auto duration = std::chrono::duration_cast<std::chrono::seconds>(now - last_output_time_).count(); if (duration > max_silence_seconds) { codec->EnableOutput(false); } } return; } //如果是在监听状态，清除掉解码队列，直接返回 if (device_state_ == kDeviceStateListening) { audio_decode_queue_.clear(); return; } //获取编码的数据 last_output_time_ = now; auto opus = std::move(audio_decode_queue_.front()); audio_decode_queue_.pop_front(); lock.unlock(); //将解码数据添加到调度中进行解码播放 background_task_->Schedule([this, codec, opus = std::move(opus)]() mutable { //如果禁止标志位置起，直接退出。在打断唤醒的时候回置起 if (aborted_) { return; } std::vector<int16_t> pcm; //解码为pcm if (!opus_decoder_->Decode(std::move(opus), pcm)) { return; } //如果云端的采样率和codec采样率不一样，进行重采样。 // Resample if the sample rate is different if (opus_decode_sample_rate_ != codec->output_sample_rate()) { int target_size = output_resampler_.GetOutputSamples(pcm.size()); std::vector<int16_t> resampled(target_size); output_resampler_.Process(pcm.data(), pcm.size(), resampled.data()); pcm = std::move(resampled); } //播放音频 codec->OutputData(pcm); }); }
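The state transitions driven by the server's "tts" messages can be summarized in a few lines. The following is a simplified Python model of that logic, assuming only the behavior visible in the OnIncomingJson callback above: "start" moves idle or listening to speaking, and "stop" moves speaking to listening or idle depending on keep_listening_. The function name next_state_on_tts and the string state names are hypothetical; the firmware itself is C++ and also waits for background tasks and notifies the server.

```python
import json

# State names are illustrative stand-ins for the kDeviceState* enum values.
IDLE, CONNECTING, LISTENING, SPEAKING = "idle", "connecting", "listening", "speaking"


def next_state_on_tts(state, message, keep_listening):
    """Return the device state after a server JSON message of type "tts"."""
    payload = json.loads(message)
    if payload.get("type") != "tts":
        return state  # other message types do not change the state in this model
    tts_state = payload.get("state")
    if tts_state == "start" and state in (IDLE, LISTENING):
        return SPEAKING
    if tts_state == "stop" and state == SPEAKING:
        # After playback the device either keeps listening or goes back to idle
        return LISTENING if keep_listening else IDLE
    return state


state = IDLE
for msg in ('{"type": "tts", "state": "start"}', '{"type": "tts", "state": "stop"}'):
    state = next_state_on_tts(state, msg, keep_listening=True)
    print(state)
```

Keeping the transition rule as a pure function makes it easy to check each edge of the state machine in isolation, which is harder to do on the device where the transitions are scheduled onto the main loop.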

🕒 2025-04-03 📁 AI Applications 👤 laumy 🔥 popularity 781
Deploying DeepSeek Locally with 2 Commands

The environment is CentOS. The deployment steps are below.

Command 1: install ollama

Install command:

curl -fsSL https://ollama.com/install.sh | sh

Install log:

>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> Enabling and starting ollama service...
Created symlink from /etc/systemd/system/default.target.wants/ollama.service to /etc/systemd/system/ollama.service.
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.

Command 2: download the deepseek model

Install command:

ollama run deepseek-r1:7b

Once the download finishes, the command drops straight into the interactive console:

pulling manifest
pulling 96c415656d37... 100% ▕████████████████▏ 4.7 GB
pulling 369ca498f347... 100% ▕████████████████▏ 387 B
pulling 6e4c38e1172f... 100% ▕████████████████▏ 1.1 KB
pulling f4d24e9138dd... 100% ▕████████████████▏ 148 B
pulling 40fb844194b2... 100% ▕████████████████▏ 487 B
verifying sha256 digest
writing manifest
success
>>> hello
<think>

</think>

Hello! How can I assist you today? 😊

>>> 你好
<think>

</think>

你好！有什么我可以帮助你的吗？😊

>>> 你是什么模型
<think>

</think>

您好！我是由中国的深度求索（DeepSeek）公司开发的智能助手DeepSeek-R1。如您有任何任何问题，我会尽我所能为您提供帮助。

Adding --verbose when running the model prints performance statistics, for example:

total duration: 379.511567ms
load duration: 14.749448ms
prompt eval count: 60 token(s)
prompt eval duration: 15.863495ms
prompt eval rate: 3782.27 tokens/s
eval count: 64 token(s)
eval duration: 322.980292ms
eval rate: 198.15 tokens/s

total duration: total time of 379.51 ms, from the start of the request to the end of the response.
load duration: model loading time of 14.75 ms, which may include model initialization or data loading.
prompt eval count: the input prompt was parsed into 60 tokens.
prompt eval duration: parsing the prompt took 15.86 ms, which reflects how efficiently the model preprocesses the input text.
prompt eval rate: a prompt parsing rate of 3782.27 tokens/s, which counts as high performance (rates in the thousands of tokens/s are usually considered excellent).
eval count: 64 output tokens were generated.
eval duration: generation took 322.98 ms, the largest part of the total time.
eval rate: a generation rate of 198.15 tokens/s, a typical LLM inference speed (hundreds of tokens/s is a common range).

Deploying by importing GGUF

This approach imports a model in GGUF format. GGUF models can be obtained from Hugging Face (https://huggingface.co/) or from ModelScope (https://modelscope.cn/models). First download a GGUF model from either site; deployment then takes two steps.

Create the model

Use create and point it at the model's modelfile.

ollama create qwen2.5:7b -f qwen2.5-7b.modelfile

The modelfile content is shown below. It specifies the path to the model. A modelfile can also describe the model's parameters, which are not covered here.

FROM "./qwen2.5-7b-instruct-q4_0.gguf"

Run the model

List the models:

ollama list

Run the model. The verbose flag prints performance statistics.

ollama run qwen2.5:7b --verbose

Models can also be downloaded from the official ollama library with ollama pull, see https://ollama.com/search

Enabling API access

After installation the ollama API is only available at 127.0.0.1:11434. To make it reachable from other machines, edit /etc/systemd/system/ollama.service and set OLLAMA_HOST=0.0.0.0:

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin"
Environment="OLLAMA_HOST=0.0.0.0"

[Install]
WantedBy=default.target

Then restart the service:

systemctl daemon-reload
systemctl restart ollama

Check that it started correctly:

sudo netstat -tulpn | grep 11434 # confirm it is listening on 0.0.0.0:11434

Remote API call examples

# Query the API version (verifies connectivity)
curl http://<server-public-IP>:11434/api/version

# Send a generate request
curl http://localhost:11434/api/generate -d "{\"model\": \"deepseek-r1:7b\", \"prompt\": \"为什么草是绿的\"}"

Reference: https://github.com/datawhalechina/handy-ollama/blob/main/docs/C4/1.%20Ollama%20API%20%E4%BD%BF%E7%94%A8%E6%8C%87%E5%8D%97.md

Enabling web chat

Install docker

A web chat interface requires Open WebUI, which in turn needs docker, so install docker first.

(1) Update the system

sudo yum update -y

(2) Docker needs a few dependency packages, which can be installed with:

sudo yum install -y yum-utils device-mapper-persistent-data lvm2

(3) Switch to a local mirror

sudo yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
sed -i 's/download.docker.com/mirrors.aliyun.com\/docker-ce/g' /etc/yum.repos.d/docker-ce.repo
yum makecache fast

(4) Install docker

sudo yum install -y docker-ce

(5) Start docker and enable it at boot

sudo systemctl start docker
sudo systemctl enable docker

(6) Verify

sudo docker --version
systemctl status docker

Install Open WebUI with docker

Pull and run the Open WebUI container, mapping container port 8080 to host port 3000:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

If port 3000 is already in use, the command fails, and running it again with a different port then reports the following error:

docker run -d -p 6664:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
docker: Error response from daemon: Conflict. The container name "/open-webui" is already in use by container "88f6e12e8e3814038911c30d788cb222d0792a9fc0af45f41140e07186e62a16". You have to remove (or rename) that container to be able to reuse that name.

This is a Docker container name conflict: the name /open-webui is already held by an existing container, so a new container cannot be started under the same name. To fix it:

(1) List the existing containers:

docker ps -a
88f6e12e8e38 ghcr.io/open-webui/open-webui:main "bash start.sh" 3 minutes ago Created open-webui

(2) Stop and remove the existing container, then run the docker run command again:

docker stop open-webui
docker rm open-webui

Open https://xxx:6664 in a browser and complete the setup to start using it.
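The same generate request can be sent from a script instead of curl. Here is a minimal Python sketch using only the standard library, assuming the server from the steps above is reachable on port 11434. The helper names build_generate_request and generate are illustrative, and "stream": False is added so that the server returns a single JSON object rather than a stream of chunks.

```python
import json
import urllib.request


def build_generate_request(host, model, prompt):
    """Build the POST request for ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"http://{host}:11434/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(host, model, prompt, timeout=60):
    """Send the request and return the generated text (needs a running server)."""
    request = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]


request = build_generate_request("localhost", "deepseek-r1:7b", "为什么草是绿的")
print(request.full_url)
print(request.data.decode("utf-8"))
```

Calling generate("localhost", "deepseek-r1:7b", "为什么草是绿的") performs the actual request. Building the body with json.dumps also avoids the shell quoting that the curl form needs.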

🕒 2025-02-10 📁 AI Applications 👤 laumy 🔥 popularity 442
Trying Out the Doubao LLM API

Prerequisites

An API key and an inference endpoint have to be created first.

Getting an API key: https://www.volcengine.com/docs/82379/1361424#f79da451
Creating an inference endpoint: https://www.volcengine.com/docs/82379/1099522

Install the Python environment

Python 2.7 or later is required. Running python --version shows the current Python version. The version here is already 3.8.10:

python3 --version
Python 3.8.10

Next, install the Doubao SDK:

pip install volcengine-python-sdk
Collecting volcengine-python-sdk
Downloading volcengine-python-sdk-1.0.118.tar.gz (3.1 MB)
|████████████████████████████████| 3.1 MB 9.7 kB/s
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from volcengine-python-sdk) (2019.11.28)
Requirement already satisfied: python-dateutil>=2.1 in /usr/lib/python3/dist-packages (from volcengine-python-sdk) (2.7.3)
Requirement already satisfied: six>=1.10 in /usr/lib/python3/dist-packages (from volcengine-python-sdk) (1.14.0)
Requirement already satisfied: urllib3>=1.23 in /usr/lib/python3/dist-packages (from volcengine-python-sdk) (1.25.8)
Building wheels for collected packages: volcengine-python-sdk
Building wheel for volcengine-python-sdk (setup.py) ...
done Created wheel for volcengine-python-sdk: filename=volcengine_python_sdk-1.0.118-py3-none-any.whl size=10397043 sha256=c4546246eb0ef4e1c68e8047c6f2773d601821bd1acb7bc3a6162919f161423b Stored in directory: /home/apple/.cache/pip/wheels/d2/dc/23/70fa1060e1a527a290fc87a35469401b7588cdb51a2b75797d Successfully built volcengine-python-sdk Installing collected packages: volcengine-python-sdk Successfully installed volcengine-python-sdk-1.0.118 需要更新 pip install --upgrade 'volcengine-python-sdk[ark]' Requirement already up-to-date: volcengine-python-sdk[ark] in /home/apple/.local/lib/python3.8/site-packages (1.0.118) Requirement already satisfied, skipping upgrade: urllib3>=1.23 in /usr/lib/python3/dist-packages (from volcengine-python-sdk[ark]) (1.25.8) Requirement already satisfied, skipping upgrade: six>=1.10 in /usr/lib/python3/dist-packages (from volcengine-python-sdk[ark]) (1.14.0) Requirement already satisfied, skipping upgrade: python-dateutil>=2.1 in /usr/lib/python3/dist-packages (from volcengine-python-sdk[ark]) (2.7.3) Requirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from volcengine-python-sdk[ark]) (2019.11.28) Collecting cryptography<43.0.4,>=43.0.3; extra == "ark" Downloading cryptography-43.0.3-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.0 MB) |████████████████████████████████| 4.0 MB 1.7 MB/s Collecting httpx<1,>=0.23.0; extra == "ark" Downloading httpx-0.28.1-py3-none-any.whl (73 kB) |████████████████████████████████| 73 kB 1.0 MB/s Collecting pydantic<3,>=1.9.0; extra == "ark" Downloading pydantic-2.10.4-py3-none-any.whl (431 kB) |████████████████████████████████| 431 kB 1.6 MB/s Collecting anyio<5,>=3.5.0; extra == "ark" Downloading anyio-4.5.2-py3-none-any.whl (89 kB) |████████████████████████████████| 89 kB 1.8 MB/s Collecting cffi>=1.12; platform_python_implementation != "PyPy" Downloading cffi-1.17.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (446 kB) 
|████████████████████████████████| 446 kB 1.2 MB/s Collecting httpcore==1.* Downloading httpcore-1.0.7-py3-none-any.whl (78 kB) |████████████████████████████████| 78 kB 1.8 MB/s Requirement already satisfied, skipping upgrade: idna in /usr/lib/python3/dist-packages (from httpx<1,>=0.23.0; extra == "ark"->volcengine-python-sdk[ark]) (2.8) Collecting pydantic-core==2.27.2 Downloading pydantic_core-2.27.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB) |████████████████████████████████| 2.0 MB 1.0 MB/s Collecting typing-extensions>=4.12.2 Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB) Collecting annotated-types>=0.6.0 Downloading annotated_types-0.7.0-py3-none-any.whl (13 kB) Collecting exceptiongroup>=1.0.2; python_version < "3.11" Downloading exceptiongroup-1.2.2-py3-none-any.whl (16 kB) Collecting sniffio>=1.1 Downloading sniffio-1.3.1-py3-none-any.whl (10 kB) Collecting pycparser Downloading pycparser-2.22-py3-none-any.whl (117 kB) |████████████████████████████████| 117 kB 2.9 MB/s Collecting h11<0.15,>=0.13 Downloading h11-0.14.0-py3-none-any.whl (58 kB) |████████████████████████████████| 58 kB 3.3 MB/s Installing collected packages: pycparser, cffi, cryptography, h11, httpcore, exceptiongroup, typing-extensions, sniffio, anyio, httpx, pydantic-core, annotated-types, pydantic WARNING: The script httpx is installed in '/home/apple/.local/bin' which is not on PATH. Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 
Successfully installed annotated-types-0.7.0 anyio-4.5.2 cffi-1.17.1 cryptography-43.0.3 exceptiongroup-1.2.2 h11-0.14.0 httpcore-1.0.7 httpx-0.28.1 pycparser-2.22 pydantic-2.10.4 pydantic-core-2.27.2 sniffio-1.3.1 typing-extensions-4.12.2

Testing

Single-image test

vim test.py

import os
# Install the Ark SDK with: pip install volcengine-python-sdk[ark]
from volcenginesdkarkruntime import Ark

# Replace with your own model inference endpoint
model="ep-20250101121404-stw4s"

# Initialize the Ark client; the API key is read from an environment variable
client = Ark(
    api_key=os.getenv('ARK_API_KEY'),
)

# Create a chat request
response = client.chat.completions.create(
    # The ID of the inference endpoint where the vision-understanding model is deployed
    model = model,
    messages = [
        {
            "role": "user",  # the message role is "user"
            "content": [  # list of message parts
                {"type": "text", "text":"这张图片讲了什么？"},  # text part
                {
                    "type": "image_url",  # image part
                    # URL of the image the model should interpret
                    "image_url": {"url": "http://www.laumy.tech/wp-content/uploads/2024/12/wp_editor_md_7a3e5882d13fb51eecfaaf7fc8c53b59.jpg"}
                },
            ],
        }
    ],
)

print(response.choices[0])

Running it returns the following result:

python3 test.py
Choice( finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage( content='这张图片展示了一个WebRTC（Web实时通信）的流程示意图，涉及到PC（个人计算机）、MQTT代理（mqtt broker）和CEMARA设备。以下是流程图的主要步骤： \n\n1. **PC端操作**： \n - **连接和订阅**：PC端首先进行连接（connect），然后订阅相关主题（"webrtc/id/jsonrpc"和"webrtc/id/jsonrpc-replay"）。 \n - **发布消息**：PC端发布消息（pub），发送"offer"请求（offer (req)）。 \n - **接收消息**：PC端接收来自MQTT代理的消息，包括"message"事件和相关的应答（res）。 \n - **创建应答**：PC端创建应答（pc.createAnswer），并设置远程描述（pc.setRemoteDescription）。 \n\n2. **STUN/TURN服务器交互**： \n - **STUN/TURN绑定请求和应答**：在STUN/TURN服务器上，PC端发起绑定请求（binding req）和应答（binding res），获取SDP（Session Description Protocol）信息。 \n - **ANSWER请求和应答**：PC端发送ANSWER请求（anser (req)），并接收ANSWER应答（anser (res)）。 \n\n3. **检查和连接过程**： \n - **检查连接**：PC端按照优先级顺序检查连接的顺畅性（host、srflx、relay）。 \n - **连接完成**：经过一系列的检查和交互，PC端与CEMARA设备成功连接（CONNECTED）。\n\n4. 
**数据交互和完成**： \n - **数据交互**：PC端和CEMARA设备开始进行数据交互（agent_send和agent_recv）。 \n - **完成状态**：数据交互完成后，流程进入“COMPLETED”状态，表示整个WebRTC通信过程结束。 \n\n整个流程图清晰地展示了WebRTC通信过程中PC端与MQTT代理以及STUN/TURN服务器之间的交互过程，包括连接、消息发布、应答接收、绑定请求、检查连接等步骤，最终实现了PC端与CEMARA设备的数据通信。', role='assistant', function_call=None, tool_calls=None, audio=None ) )
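The messages argument in test.py has a fixed shape: one user message whose content mixes a text part and an image_url part. The following is a small sketch of a helper that builds that structure, assuming nothing beyond the payload shown in test.py. The name build_vision_messages is hypothetical and not part of the Ark SDK.

```python
def build_vision_messages(question, image_url):
    """Build a single-user-message payload with one text part and one image part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


messages = build_vision_messages(
    "这张图片讲了什么？",
    "http://www.laumy.tech/wp-content/uploads/2024/12/wp_editor_md_7a3e5882d13fb51eecfaaf7fc8c53b59.jpg",
)
print(messages[0]["content"][0]["text"])
```

The result can be passed as messages to client.chat.completions.create. Since the printed Choice object carries the text in message.content, printing response.choices[0].message.content instead of response.choices[0] should show only the model's answer.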

🕒 2025-01-01 📁 AI Applications 👤 laumy 🔥 popularity 1701
