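I was going to ask that senior student whether I should set up a local environment or run everything on Colab.

The laptop has a 4070 mobile GPU, which can handle all of the CS231n assignments smoothly and efficiently. The experience should be better than the free tier of Colab.

So I'll run locally, which is also a good chance to learn how to set up an environment.

Learning and doing side by side is the route recommended by the course and by most students: after finishing each lecture, try the corresponding part of the code so that theory and practice go together.

To avoid weird problems: the lecture slides currently live in D:\Documents_Private\大学\大二上\cs231n, a path that contains Chinese characters, which is not good. So I created a cs231n folder directly on the D: drive.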

git remote add origin https://github.com/1JI0O/cs231n_assignments.git
git push --set-upstream origin master

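Environment setup

Software Setup: configuring the local environment

Even if Anaconda/Miniconda is already installed on the C: drive, a new environment can still be placed on the D: drive by giving a prefix when it is created: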

conda create --prefix D:\conda_envs\cs231n python=3.7

An environment created this way also has to be activated by its path:

conda activate D:\conda_envs\cs231n

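Damn, now I remember: creating and activating environments has to be done from the Anaconda Prompt.

About requirements.txt: numpy matplotlib jupyter scipy torch torchvision tqdm

The torch packages have to match the local CUDA version; see the PyTorch Get Started page.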

pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu126

Switched conda to a mirror:

conda config --add channels https://mirrors.ustc.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.ustc.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.ustc.edu.cn/anaconda/cloud/conda-forge/
conda config --set show_channel_urls yes

Still broken. I removed the mirror and it was still broken, so I decided to try mamba and installed it into the base environment.

mamba create -p D:\conda_envs\cs231n python=3.7

mamba activate D:\conda_envs\cs231n

Great, the Anaconda Prompt also has to be run as administrator. Anyway, mamba installed successfully, and it is fast, really fast! conda is to mamba what the 550C is to the 550W.

mamba shell init --shell cmd.exe --root-prefix=~/.local/share/mamba

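The mamba initialization command has run correctly. What remains to be done:

Close all open command-line windows (CMD, Anaconda Prompt, PowerShell, etc.).
Open a new CMD window.
Activate the cs231n environment again.

Well, Python 3.7 is thoroughly outdated; at least 3.8 is needed, so let's go straight to 3.9.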

mamba env remove -p D:\conda_envs\cs231n

Again, this has to be run from an administrator prompt.

mamba create -p D:\conda_envs\cs231n python=3.9

mamba activate D:\conda_envs\cs231n

To install PyTorch, this is still the command to use:

pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu126

2025-10-30 08:25. Finally, everything is installed.

jupyter notebook

Assignment 1

Run the dataset download script from WSL:

/mnt/d/cs231n/assignment1/cs231n/datasets$ ./get_datasets.sh

KNN

imageio also has to be installed.

num_training = 5000
mask = list(range(num_training))
X_train = X_train[mask]
y_train = y_train[mask]

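Here mask = list(range(num_training)) is equivalent to [0, 1, 2, ..., 4999].
X_train[mask] selects the first 5000 samples of the original training set X_train (indices 0 through 4999).
Likewise, y_train[mask] selects the first 5000 labels.

The future package also has to be installed:

from past.builtins import xrange is how the CS231n code stays compatible with both Python 2 and Python 3, so the future library is required.
Installing future also provides the past module.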

First, open cs231n/classifiers/k_nearest_neighbor.py and implement the function compute_distances_two_loops that uses a (very inefficient) double loop over all pairs of (test, train) examples and computes the distance matrix one element at a time. So the .py file has to be implemented first and then tested from the notebook.

L2 distance (Euclidean distance)

d_{2} (I_{1}, I_{2}) = \sqrt{\sum _{p} (I_{1}^{p} - I_{2}^{p})^{2}}

Inline Question 1

Notice the structured patterns in the distance matrix, where some rows or columns are visibly brighter. (Note that with the default color scheme black indicates low distances while white indicates high distances.)

What in the data is the cause behind the distinctly bright rows?
What causes the columns?

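Each row corresponds to one test sample. A particularly bright row means the distances computed for that test sample are all large: the sample may not resemble any class in the training set, or it simply differs a lot from the training data.

Each column corresponds to one training sample. A bright column means that training sample is unusual and is dissimilar to most of the test samples.

Reading the image: each point is the L2 distance between test sample i and training sample j, with [i, j] being the row and column.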

# Expand the square: (x - y)**2 = x**2 + y**2 - 2*x*y
# (num_test, 1)
X_square = np.sum(X**2, axis=1, keepdims=True)
# (1, num_train)
X_train_square = np.sum(self.X_train**2, axis=1, keepdims=True).T
# (num_test, num_train)
# @ is matrix multiplication (inner products of all test/train pairs)
cross_term = X @ self.X_train.T

dists = np.sqrt(X_square + X_train_square - 2 * cross_term)
# X_square is broadcast to (num_test, num_train); every column is the same.
# X_train_square is broadcast to (num_test, num_train); every row is the same.
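
To sanity-check the expansion, here is a minimal sketch on random toy data that compares the fully vectorized distances with a plain double loop. The function names and toy shapes are made up for illustration and are not the assignment's API; the np.maximum guard is my own addition, since rounding can push the value under the square root slightly below zero.

```python
import numpy as np

def dists_two_loops(X, X_train):
    """Reference implementation: one (test, train) pair at a time."""
    num_test, num_train = X.shape[0], X_train.shape[0]
    dists = np.zeros((num_test, num_train))
    for i in range(num_test):
        for j in range(num_train):
            dists[i, j] = np.sqrt(np.sum((X[i] - X_train[j]) ** 2))
    return dists

def dists_no_loops(X, X_train):
    """Vectorized version using (x - y)^2 = x^2 + y^2 - 2xy and broadcasting."""
    X_square = np.sum(X ** 2, axis=1, keepdims=True)            # (num_test, 1)
    X_train_square = np.sum(X_train ** 2, axis=1)[np.newaxis]   # (1, num_train)
    cross_term = X @ X_train.T                                  # (num_test, num_train)
    squared = X_square + X_train_square - 2 * cross_term
    # Guard against tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(squared, 0.0))

rng = np.random.default_rng(0)
X_test_toy = rng.standard_normal((4, 6))
X_train_toy = rng.standard_normal((5, 6))
difference = np.linalg.norm(dists_two_loops(X_test_toy, X_train_toy)
                            - dists_no_loops(X_test_toy, X_train_toy))
print("Frobenius norm of the difference:", difference)
```

The difference should be at the level of floating-point noise, which is the same kind of check the notebook runs between the loop versions.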

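2025-10-30 19:31

Finished 1-1, kNN.
I feel so inefficient and slow, damn it.

Sigh, committing and pushing through the VS Code GUI seems to cause problems, so I'll stick to commit and push from the terminal.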

Softmax

mamba activate D:\conda_envs\cs231n
D:
jupyter notebook

About mask

mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

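Purpose: set aside part of the original training set (X_train, y_train) as a validation set, used for hyperparameter tuning, early stopping, and so on.
How the mask is built: it is the index range [num_training, num_training + num_validation) (for example 49000 to 49999), i.e. the last 1000 data points of the original training set become the validation set.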
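
A small sketch of the same split on toy arrays (the array contents and the sizes 7 and 3 are invented; the real notebook uses 49000 and 1000 on CIFAR-10). It shows that indexing with a range or a list of indices picks whole rows.

```python
import numpy as np

# Toy stand-ins for the CIFAR-10 arrays: 10 "images" with 3 features each.
X_all = np.arange(30).reshape(10, 3)
y_all = np.arange(10)

num_training = 7
num_validation = 3

# Validation set: indices [num_training, num_training + num_validation).
mask = range(num_training, num_training + num_validation)
X_val = X_all[mask]
y_val = y_all[mask]

# Training set: the first num_training indices.
mask = list(range(num_training))
X_train = X_all[mask]
y_train = y_all[mask]

print(X_train.shape, X_val.shape)  # (7, 3) (3, 3)
print(y_val)                       # [7 8 9]
```

Taking the validation slice before overwriting X_train matters: once X_train is cut down to the first num_training rows, the later rows are no longer reachable through it.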

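How to compute the softmax gradient

For sample i and each class j, the contribution to the gradient of column W[:, j] is:
- if j is the correct class: gradient = X[i] * (p[j] - 1)
- if j is not the correct class: gradient = X[i] * p[j]
where p = exp(scores) / sum(exp(scores)), i.e. the class probabilities.

The derivative of the regularization term reg * sum(W * W) with respect to W is 2 * reg * W (the provided code for this question uses L2 regularization).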
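
The two cases and the regularization term can be put together in a loop-based sketch and checked against a finite difference. This is my own minimal version under the shapes W: (D, C), X: (N, D), y: (N,); the function name mirrors the assignment but the body is not the official solution, and the toy data is random.

```python
import numpy as np

def softmax_loss_naive(W, X, y, reg):
    """Loop-based softmax loss and gradient. W: (D, C), X: (N, D), y: (N,)."""
    num_train, num_classes = X.shape[0], W.shape[1]
    loss = 0.0
    dW = np.zeros_like(W)
    for i in range(num_train):
        scores = X[i].dot(W)
        scores -= np.max(scores)                     # numerical stability
        p = np.exp(scores) / np.sum(np.exp(scores))
        loss += -np.log(p[y[i]])
        for j in range(num_classes):
            # (p[j] - 1) for the correct class, p[j] otherwise.
            dW[:, j] += X[i] * (p[j] - float(j == y[i]))
    loss = loss / num_train + reg * np.sum(W * W)
    dW = dW / num_train + 2 * reg * W                # derivative of reg * sum(W * W)
    return loss, dW

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 3)) * 0.1
X = rng.standard_normal((8, 5))
y = rng.integers(0, 3, size=8)
loss, dW = softmax_loss_naive(W, X, y, reg=0.1)

# Centered finite difference for a single weight entry.
h = 1e-5
W_plus, W_minus = W.copy(), W.copy()
W_plus[2, 1] += h
W_minus[2, 1] -= h
numeric = (softmax_loss_naive(W_plus, X, y, 0.1)[0]
           - softmax_loss_naive(W_minus, X, y, 0.1)[0]) / (2 * h)
print("analytic:", dW[2, 1], "numeric:", numeric)
```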

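Why the softmax gradient is split into cases

When computing the softmax gradient, the cases $i = j$ and $i \neq j$ have to be handled separately.
This is because each softmax output depends not only on its own input but on the inputs of all classes.

1. Mathematical definition of softmax

For an input vector $z$, the softmax output $y_{i}$ is:

y_{i} = \frac{e ^{z_{i}}}{\sum _{k} e ^{z_{k}}}

2. Computing the gradient (the Jacobian matrix)

We want the partial derivative of $y_{i}$ with respect to $z_{j}$, $\frac{\partial y _{i}}{\partial z _{j}}$.

When $i = j$: $\frac{\partial y _{i}}{\partial z _{j}} = y_{i} \cdot (1 - y_{i})$
When $i \neq j$: $\frac{\partial y _{i}}{\partial z _{j}} = - y_{i} \cdot y_{j}$

3. Why the cases are needed

For $i = j$, the derivative measures the effect of an input on its own output: $z_{i}$ appears in both the numerator and the denominator.
For $i \neq j$, it measures the effect of another class's input on the current output: $z_{j}$ appears only in the denominator.

The denominator of softmax is the sum of the exponentials of all class inputs, which is why every output is coupled to every input.

4. Summary

The softmax gradient is split into cases because every output is affected by all of the inputs, and the two cases together describe each entry of the Jacobian exactly.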
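
Both cases fit into one matrix expression, J = diag(y) - y y^T: the diagonal gives y_i (1 - y_i) and the off-diagonal entries give -y_i y_j. The sketch below (helper names and the input vector are my own choices) checks that against a numerical Jacobian.

```python
import numpy as np

def softmax(z):
    shifted = z - np.max(z)
    e = np.exp(shifted)
    return e / np.sum(e)

def softmax_jacobian(z):
    """J[i, j] = dy_i / dz_j = y_i * (delta_ij - y_j)."""
    y = softmax(z)
    return np.diag(y) - np.outer(y, y)

z = np.array([1.0, 2.0, 0.5])
J = softmax_jacobian(z)

# Numerical Jacobian by centered differences, one column (input z_j) at a time.
h = 1e-6
J_num = np.zeros((3, 3))
for j in range(3):
    dz = np.zeros(3)
    dz[j] = h
    J_num[:, j] = (softmax(z + dz) - softmax(z - dz)) / (2 * h)

print("max abs difference:", np.max(np.abs(J - J_num)))
```

Each column of J sums to zero, which reflects that the outputs always sum to 1: raising one probability has to lower the others.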

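Subtracting the maximum when computing softmax

scores -= np.max(scores)

Subtracting the maximum before softmax prevents overflow or underflow in the exponentials; it keeps the computation numerically stable and does not change the result.

Why does subtracting the maximum not change the result?

Subtracting a constant C multiplies both the numerator and the denominator of softmax by the same factor $e^{-C}$, which cancels: $\frac{e ^{x_{i} - C}}{\sum _{j} e ^{x_{j} - C}} = \frac{e ^{x_{i}}}{\sum _{j} e ^{x_{j}}}$
So whatever constant is subtracted, the result is the same.

Why choose the maximum?

Choosing the maximum makes every $x_{i} - \max (x)$ negative or zero, so the exponentials never get large and overflow is avoided.
It is the safest and most common choice.

Avoiding overflow

Obvious: every exponent is at most 0, so every exponential is at most 1.

Avoiding underflow

Suppose all the scores are very small, say -1000, -1001, -1002. Subtracting the maximum, -1000, turns them into 0, -1, -2, which are no longer tiny, so the exponentials do not all round to 0 and the denominator stays nonzero.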
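
A short sketch of both failure modes, using the score values from the note (the function names are mine). The naive version produces nan in both cases, while the shifted version gives proper probabilities.

```python
import numpy as np

def softmax_naive(x):
    e = np.exp(x)
    return e / np.sum(e)

def softmax_stable(x):
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

big = np.array([1000.0, 1001.0, 1002.0])       # exp overflows to inf
small = np.array([-1000.0, -1001.0, -1002.0])  # exp underflows to 0

# Silence the floating-point warnings so the nan results are easy to see.
with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    print(softmax_naive(big))    # inf / inf -> nan
    print(softmax_naive(small))  # 0 / 0 -> nan

print(softmax_stable(big))
print(softmax_stable(small))
```

The two stable results contain the same three probabilities in opposite order, because after the shift the inputs are [-2, -1, 0] and [0, -1, -2].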

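What X.shape[0] and X[i] mean

X is the input data, with shape (N, D):
- N: the number of samples (the size of the minibatch)
- D: the feature dimension of each sample (for example, the length of a flattened image)
X.shape[0] is the number of samples N, i.e. how many data points there are.
X[i] is the feature vector of the i-th sample, with shape (D,).
In other words, X[i] is row i, which holds all the features of that data point.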

X = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

X.shape[0] is 3, so there are 3 samples.
X[0] is [1, 2, 3], the first sample.

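Does the transpose of X go first or second?

In X.T.dot(dscores), the transpose of X comes first and dscores comes second.
This follows from the usual shape rule of matrix multiplication: (D, N) x (N, C) = (D, C).

Details

Assume:

X has shape (N, D), where N is the number of samples and D the number of features.
dscores has shape (N, C), where N is the number of samples and C the number of classes.

To compute dW, the gradient of the softmax loss with respect to the weights, each column (class) has to accumulate, over all samples, the sample's feature vector times that sample's score gradient for the class. So:

X.T has shape (D, N)
dscores has shape (N, C)

The product X.T.dot(dscores) has shape (D, C), exactly the shape of the weight matrix W.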
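
A minimal shape check on random toy data (the sizes N = 4, D = 5, C = 3 and the labels are invented for illustration). It also confirms that the single matrix product equals the explicit sum over samples of outer products.

```python
import numpy as np

N, D, C = 4, 5, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))
W = rng.standard_normal((D, C)) * 0.1
y = np.array([0, 2, 1, 2])

scores = X.dot(W)                                   # (N, C)
scores -= scores.max(axis=1, keepdims=True)
probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

dscores = probs.copy()
dscores[np.arange(N), y] -= 1                       # p - 1 at the correct class
dscores /= N

dW = X.T.dot(dscores)                               # (D, N) x (N, C) = (D, C)

# The same thing written as an explicit sum over samples.
dW_loop = np.zeros_like(W)
for i in range(N):
    dW_loop += np.outer(X[i], dscores[i])

print(dW.shape, np.allclose(dW, dW_loop))
```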

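Sigh, ipynb

Every time a Jupyter Notebook is reopened or its kernel is restarted (Restart Kernel), the earlier cells have to be run again from the top, in order, so that all variables, functions, and imports are loaded. Otherwise later cells are likely to fail because of missing dependencies or undefined variables.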

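Linear classifier

The softmax implemented above is one of the components used by the linear classifier; the linear classifier .py file still has to be implemented.

Why is loss just pass?

The Softmax subclass overrides loss, so the subclass's loss is what actually gets called.
Calling LinearClassifier().train(...) directly fails, because the base-class loss contains only pass.
With Softmax().train(...), the softmax loss logic is called and the program runs normally.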
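
The mechanism can be shown with a stripped-down sketch. These two classes only borrow the names; they are not the assignment's actual code (the real train() takes more arguments and runs SGD), and the returned values are dummies.

```python
class LinearClassifier:
    def train(self, X, y):
        # train() is written once in the base class and relies on self.loss().
        loss, grad = self.loss(X, y)
        return loss

    def loss(self, X, y):
        pass  # placeholder: subclasses are expected to override this


class Softmax(LinearClassifier):
    def loss(self, X, y):
        # Dummy values standing in for the real softmax loss and gradient.
        return 1.0, [0.0]


print(Softmax().train(None, None))    # dispatches to Softmax.loss

try:
    LinearClassifier().train(None, None)
except TypeError as err:
    # pass makes loss() return None, and None cannot be unpacked into (loss, grad).
    print("base class fails:", err)
```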

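The "bias trick"

If the bias is kept as a separate b, training has to learn both W and b. But b can be treated as one extra column of W:

Append a constant 1 to the input x (i.e. [x_1, x_2, …, x_D, 1]),
add one extra column to the weight matrix W to hold the bias,
and Wx + b becomes W′[x; 1].
During training, that extra column of W′ learns what used to be the bias.

This folds the bias into the weights: only one larger weight matrix is optimized and b needs no separate handling. In the assignment code, where scores are computed as X.dot(W) with samples as rows, the same trick shows up as an extra column of ones in X and an extra row in W.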
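
A quick numerical check of the trick in both layouts, on random toy data (all shapes and names here are made up for illustration): the column-vector form W′[x; 1], and the samples-as-rows form used with X.dot(W).

```python
import numpy as np

rng = np.random.default_rng(0)
D, C = 4, 3
W = rng.standard_normal((C, D))
b = rng.standard_normal(C)
x = rng.standard_normal(D)

# Column-vector convention: W' = [W | b], x' = [x; 1].
W_ext = np.hstack([W, b[:, np.newaxis]])   # (C, D + 1)
x_ext = np.append(x, 1.0)                  # (D + 1,)
print(np.allclose(W.dot(x) + b, W_ext.dot(x_ext)))

# Row-vector convention (samples as rows): ones column in X, extra row in W.
X = rng.standard_normal((5, D))
X_ext = np.hstack([X, np.ones((X.shape[0], 1))])   # (N, D + 1)
Wt_ext = np.vstack([W.T, b])                       # (D + 1, C)
print(np.allclose(X.dot(W.T) + b, X_ext.dot(Wt_ext)))
```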

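The code block that searches for the best hyperparameter combination

# Provided as a reference. You may or may not want to change these hyperparameters
learning_rates = [1e-7, 1e-6, 1e-5]
regularization_strengths = [2.5e4, 1e4, 0.5e4]

With the original settings, where each list has only 2 candidate values, the later "Visualize the learned weights for each class." output looks like a mess. It takes a few more candidate values before the weights start to look like something: blurry templates of the features each class has in common.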

Two-Layer Neural Network

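In backward, each layer computes dx so that the gradient can keep propagating backward through the chain rule, and computes dw and db so that the parameters can be updated. Any layer with learnable parameters (such as w and b) has to compute dw and db on every backward pass for the update.

out = x_row.dot(w) + b
With respect to x_row, .dot(w) acts like multiplication by a constant: a small change dx_row in the input changes the output by dx_row.dot(w). Running this backward gives
dx_row = dout.dot(w.T)

For dw and db, differentiate the same expression out = x_row.dot(w) + b: dw = x_row.T.dot(dout) and db = dout.sum(axis=0).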
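
A self-contained sketch of the affine layer's forward and backward passes, with a finite-difference check on one entry of dw. The function names follow the assignment's naming, but this is my own minimal version on random toy data, not the reference solution; sum(out * dout) is used as a stand-in scalar loss so that its gradient with respect to out is exactly dout.

```python
import numpy as np

def affine_forward(x, w, b):
    x_row = x.reshape(x.shape[0], -1)        # flatten each sample to a row
    out = x_row.dot(w) + b
    return out, (x, w, b)

def affine_backward(dout, cache):
    x, w, b = cache
    x_row = x.reshape(x.shape[0], -1)
    dx = dout.dot(w.T).reshape(x.shape)      # passed on to the previous layer
    dw = x_row.T.dot(dout)                   # used to update w
    db = dout.sum(axis=0)                    # used to update b
    return dx, dw, db

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 4))           # N = 2, each sample has 12 features
w = rng.standard_normal((12, 5))
b = rng.standard_normal(5)
dout = rng.standard_normal((2, 5))

out, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

# Finite-difference check of one entry of dw.
h = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[3, 2] += h
w_minus[3, 2] -= h
numeric = (np.sum(affine_forward(x, w_plus, b)[0] * dout)
           - np.sum(affine_forward(x, w_minus, b)[0] * dout)) / (2 * h)
print(dx.shape, dw.shape, db.shape, abs(dw[3, 2] - numeric))
```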

loss, dscores = softmax_loss(scores, y)
loss += 0.5 * self.reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
 
dh1, dW2, db2 = affine_backward(dscores, fc_cache2)
dW2 += self.reg * W2
# The regularization term was added to the loss separately, so its derivative can be added separately too: d_loss / d_W2 gains self.reg * W2
 
da1 = relu_backward(dh1, relu_cache1)
 
dX, dW1, db1 = affine_backward(da1, fc_cache1)
dW1 += self.reg * W1

from cs231n.solver import Solver
 
solver = Solver(
    model, 
    data, 
    update_rule='sgd',
    optim_config={
        'learning_rate': 1e-4,
    },
    lr_decay=0.95,
    num_epochs=10,
    batch_size=100,
    print_every=100,
    verbose=True
)
solver.train()

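With the learning rate set to 1e-2 here, the loss overflows numerically and accuracy is very low, so that value is too large.

Trying every hyperparameter combination would probably take a long time. The best model already reaches 55% accuracy on the test set, which meets the requirement, so I simply stopped the search there. Good enough.

Damn, installing the CUDA build of PyTorch earlier seems to have been pointless so far: PyTorch isn't used yet, and the data never goes onto the GPU.

2025-11-08 16:29. Finished 1-3, the two-layer neural network.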

Higher Level Representations: Image Features

The notebook features.ipynb will examine the improvements gained by using higher-level representations as opposed to using raw pixel values.

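X_train_feats = extract_features(X_train, feature_fns, verbose=True): with verbose=True, progress is printed.

Raise lr, lower reg, and everything changes completely!

I had to laugh at how annoying that was.

Yes, exactly right:
if the learning rate is too small and the number of iterations is not large enough, the network barely learns at all and performance is poor. This is especially noticeable with MLPs like the ones in CS231n.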

# learning_rates = [5e-4, 1e-3, 2e-3, 5e-3, 1e-2]
learning_rates = [0.05, 0.1, 0.2, 0.5]

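Accuracy had been stuck around 0.3, lower than the plain softmax classifier above, so I increased lr and the effect was immediate.

Also, about the reg values:

Overly strong regularization suppresses an MLP's expressive power and makes it degenerate into a "complicated linear model". Ideally regularization should constrain the network lightly without killing its ability to adapt; a rule of thumb is the order of 1e-3.

Shallow (simple) models can withstand strong regularization; deeper ones are more sensitive.

A linear model has few parameters and weak expressive power, so it does not overfit much to begin with, and stronger regularization can actually improve generalization.
A two-layer (or deeper) MLP has more degrees of freedom, so it is more easily suppressed by a large weight decay, which leaves the network unable to learn or with all activations dead.

2025-11-15 16:40. Finished 1-4.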

Training a fully connected network

The notebook FullyConnectedNets.ipynb will walk you through implementing the fully connected network.

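A 5-layer network needs a somewhat larger initial weight_scale than a 2-layer network; otherwise the gradients shrink as they cascade through the layers and nothing is learned.
The suggestion was to raise weight_scale to 1e-2 in one step and start experimenting from there.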
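
To get a feel for the effect, here is a rough sketch (not the assignment's FullyConnectedNet; the layer width of 100, the batch of 200 samples, and the helper name are arbitrary assumptions) that pushes random inputs through 5 ReLU layers and reports how large the final activations are. When the activations collapse toward zero, the gradients flowing back through them become tiny as well.

```python
import numpy as np

def last_layer_std(weight_scale, num_layers=5, width=100, seed=0):
    """Std of the activations after num_layers ReLU layers with N(0, weight_scale^2) weights."""
    rng = np.random.default_rng(seed)
    h = rng.standard_normal((200, width))
    for _ in range(num_layers):
        W = rng.standard_normal((width, width)) * weight_scale
        h = np.maximum(0, h.dot(W))          # affine (no bias) followed by ReLU
    return h.std()

for scale in (1e-3, 1e-2, 1e-1):
    print(f"weight_scale={scale:g}: activation std after 5 layers = {last_layer_std(scale):.3e}")
```

Each layer multiplies the signal by a factor that grows with weight_scale, so a small scale compounds into a vanishing signal over 5 layers much faster than over 2.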

2025-11-17 23:40. The "Update rules" section is not done yet.

CS231n Deep Learning for Computer Vision: see the Momentum Update section.

    mu = config["momentum"]
    lr = config["learning_rate"]

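    v = mu * v - lr * dw      # update the velocity first
    # keep moving a bit in the previous direction, then add the effect of the current gradient
    next_w = w + v            # then update the parameters with the velocity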
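
For context, a self-contained sketch of the whole update rule, including where v comes from and where it is stored for the next step. The signature and the config keys follow the style of the assignment's optim.py as I remember it, so treat them as assumptions; the toy objective is my own.

```python
import numpy as np

def sgd_momentum(w, dw, config=None):
    """One SGD-with-momentum step. config holds learning_rate, momentum and velocity."""
    if config is None:
        config = {}
    config.setdefault("learning_rate", 1e-2)
    config.setdefault("momentum", 0.9)
    v = config.get("velocity", np.zeros_like(w))

    mu = config["momentum"]
    lr = config["learning_rate"]
    v = mu * v - lr * dw          # update the velocity first
    next_w = w + v                # then move the parameters along the velocity

    config["velocity"] = v        # remember the velocity for the next step
    return next_w, config

# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
config = None
for _ in range(200):
    w, config = sgd_momentum(w, w, config)
print(w)
```

On this toy problem w spirals in toward the minimum at the origin; the velocity keeps part of the previous step, which is what lets momentum carry the update through flat or noisy regions.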

Next, implement the best model.

(Epoch 6 / 20) train acc: 0.413000; val_acc: 0.422000
(Iteration 2301 / 7640) loss: 1.863238
(Iteration 2401 / 7640) loss: 1.961564

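hidden_dims = [512, 256, 128]   # could also try [256, 128, 64] or larger
weight_scale = 2e-2             # pick from 2e-2, 1e-2, 5e-3
learning_rate = 5e-4            # 1e-3, 5e-4, 3e-4, etc.
reg = 0.15                      # regularization strength; try 0.1 to 0.2
dropout = 0.4                   # adjustable between 0.4 and 0.6
use_batchnorm = True            # turning it off sometimes helps too, but it is usually more stable on

It has basically stopped improving, so there is no point in training any further.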

hidden_dims=[512, 512, 512, 512]
reg=0
learning_rate=1e-3
weight_scale=3e-2

Try it without regularization.

(Epoch 3 / 20) train acc: 0.560000; val_acc: 0.491000

Nice, these parameters are quite good.

(Epoch 4 / 20) train acc: 0.556000; val_acc: 0.504000

Yay!

2025-11-19 08:37

Validation set accuracy:  0.52
Test set accuracy:  0.518

Nice, these parameters work well and already meet the requirement. With that, Assignment 1 is done.

Assignment 2

In this assignment you will practice writing backpropagation code, and training Neural Networks and Convolutional Neural Networks. The goals of this assignment are as follows:

Implement Batch Normalization and Layer Normalization for training deep networks.
Implement Dropout to regularize networks.
Understand the architecture of Convolutional Neural Networks and get practice with training them.
Gain experience with a major deep learning framework, PyTorch.
Understand and implement RNN networks. Combine them with CNN networks for image captioning.

PyTorch is in there, nice. But it only comes in at Q4: PyTorch on CIFAR-10, so I plan to speed through the first three questions.

Q1: Batch Normalization

In notebook BatchNormalization.ipynb you will implement batch normalization, and use it to train deep fully connected networks.

For some reason the dataset folder has to be copied over manually from assignment1; the automatic download script does not work.

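Summary by GPT-4.1 of the differences between Batch Normalization and Layer Normalization:

The normalization axis differs

Batch Normalization
- For each feature (usually each channel), the mean and variance are computed over a batch of samples.
- For example, for each image channel (R/G/B), the mean and variance of that channel are computed over all images in the batch, and the corresponding channel of every image is standardized with those statistics.
Layer Normalization
- For each individual sample, the mean and variance are computed over all of that sample's feature dimensions.
- For example, one image is normalized over its own pixel/channel features, without reference to any other image.

Use cases differ

BatchNorm
- Depends on the batch size; it works poorly when the batch is too small.
- Suited to convolutional neural networks (CNNs), deep feed-forward networks, and similar models.
- Statistics are computed differently in training and inference (inference uses running averages).
LayerNorm
- Independent of the batch size and usable with a batch size of 1; suited to sequence models such as RNNs and Transformers (attention).
- Behaves the same in training and inference.

Main pros and cons

BatchNorm
- Pros: speeds up convergence and improves generalization.
- Cons: sensitive to the batch size, with a large drop in quality for very small batches; not well suited to RNNs and other sequence tasks.
LayerNorm
- Pros: works with small batches; handles variable-length inputs and sequence modeling better.
- Cons: converges more slowly than BatchNorm in some settings.

Differences in the code

BatchNorm: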

mean = np.mean(x, axis=0)  # (N, D) -> (D,)
var = np.var(x, axis=0)    # (N, D) -> (D,)
x_hat = (x - mean) / np.sqrt(var + eps)

Here axis=0 means each column (feature) is normalized.

LayerNorm：

mean = np.mean(x, axis=1, keepdims=True)  # (N, D) -> (N, 1)
var = np.var(x, axis=1, keepdims=True)    # (N, D) -> (N, 1)
x_hat = (x - mean) / np.sqrt(var + eps)

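Here axis=1 means each row (sample) is normalized.

Summary

BatchNorm: normalizes each feature, with statistics taken across samples.
LayerNorm: normalizes each sample, with statistics taken across features.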
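
A small runnable comparison on a random (N, D) array (toy data of my own; the learnable scale and shift, gamma and beta, and BatchNorm's running averages are left out to keep it short). It shows which axis ends up with zero mean in each case, and that LayerNorm of one sample does not depend on the rest of the batch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6)) * 3.0 + 5.0    # N = 4 samples, D = 6 features
eps = 1e-5

# BatchNorm (training-time statistics): one mean/var per feature, across samples.
bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# LayerNorm: one mean/var per sample, across features.
ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)

print("BatchNorm column means:", np.round(bn.mean(axis=0), 6))
print("LayerNorm row means:   ", np.round(ln.mean(axis=1), 6))

# Normalizing the first sample on its own gives the same row as in the full batch.
first = x[:1]
ln_single = (first - first.mean(axis=1, keepdims=True)) / np.sqrt(first.var(axis=1, keepdims=True) + eps)
print(np.allclose(ln_single, ln[:1]))
```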

Q2: Dropout

The notebook Dropout.ipynb will help you implement dropout and explore its effects on model generalization.

2025-11-20 16:09. Pushed. This question was much easier than the earlier ones.

Q3: Convolutional Neural Networks

In the notebook ConvolutionalNetworks.ipynb you will implement several new layers that are commonly used in convolutional networks.

The fast convolution implementation depends on a Cython extension; to compile it, run the cell below. Next, save the Colab notebook (`File > Save`) and **restart the runtime** (`Runtime > Restart runtime`). You can then re-execute the preceding cells from top to bottom and skip the cell below as you only need to run it once for the compilation step.

Based on that: inside the cs231n environment, go to the directory that contains setup.py.

python setup.py build_ext --inplace
mamba install Cython
python setup.py build_ext --inplace

Huh, this one has no inline question at all. 2025-11-20 19:02, finished. That completes the hand-written NumPy part; next comes the PyTorch era!

Q4: PyTorch on CIFAR-10

For this part, you will be working with PyTorch, a popular and powerful deep learning framework.

Open up PyTorch.ipynb. There, you will learn how the framework works, culminating in training a convolutional network of your own design on CIFAR-10 to get the best performance you can.

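Great, this one uses the GPU through PyTorch's native calls. Great, haha.

Not quite: there are some package problems. I first backed up the current environment to D:\conda_envs\cs231n_env_backup.yml.

Damn, looks like I need to build a new environment.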

mamba create -p D:\conda_envs\pytorch-cs231n python=3.9
mamba activate D:\conda_envs\pytorch-cs231n
mamba install numpy scipy matplotlib jupyter notebook
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu126

But I still ran into this error:

 DLL load failed while importing _imaging: 操作系统无法运行 %1。 (The operating system cannot run %1.)

Following the CSDN blog post "ImportError: DLL load failed while importing _imaging: 操作系统无法运行 %1", I reinstalled Pillow:

pip uninstall Pillow
pip install Pillow

After that it worked:

using device: cuda

Damn, the old environment takes up 7.89 GB. I'll delete it when I get a chance.

jcjohnson/pytorch-examples: Simple examples to introduce PyTorch. Its README is a good way to get started with PyTorch.

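Why use `with torch.no_grad():`

In PyTorch, operations on tensors that require gradients are recorded in a computational graph by default, so that automatic differentiation (the backward pass) can be run later.

But when updating parameters (e.g. w1 -= learning_rate * w1.grad), we only want to adjust the parameter values using the current gradients; the update itself should not be recorded in the graph. Outside a no_grad block, PyTorch refuses an in-place update of a leaf tensor that requires grad (it raises a RuntimeError), and an out-of-place update such as w1 = w1 - learning_rate * w1.grad would replace w1 with a non-leaf tensor that drags its history along, which confuses the next backward pass and wastes memory.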

with torch.no_grad():

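Inside this block nothing is tracked or recorded in the computational graph; the code just modifies the parameter values, which is both efficient and correct.

Why zero the gradients manually with `zero_()` (Manually zero the gradients)

w1.grad and w2.grad hold the gradients accumulated by previous backward passes. If they are not cleared, the next backward() adds its gradients on top of the existing ones, which is usually not what we want.

The reason for clearing them manually is: "after each update we want to start over with fresh gradients." So: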

w1.grad.zero_()
w2.grad.zero_()

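That way every backward pass starts from a clean gradient buffer and no stale gradients are accumulated by accident.

with torch.no_grad(): keeps the parameter update out of the computational graph, so it is a pure numerical update.
w1.grad.zero_(): prevents gradients from accumulating; clearing them after each update keeps the gradients correct.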
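
Both points fit into a tiny manual training loop. This is a sketch with an invented linear-regression problem (the data, the learning rate of 0.1, and the 200 steps are arbitrary choices of mine), followed by a two-line demonstration of what happens when the gradient is not zeroed.

```python
import torch

torch.manual_seed(0)
x = torch.randn(16, 3)
y = x @ torch.tensor([[1.0], [-2.0], [0.5]])      # targets from a known linear map

w = torch.zeros(3, 1, requires_grad=True)
learning_rate = 0.1

for step in range(200):
    loss = ((x @ w - y) ** 2).mean()
    loss.backward()                 # fills w.grad (adds to whatever is already there)

    with torch.no_grad():           # the update itself is not recorded in the graph
        w -= learning_rate * w.grad

    w.grad.zero_()                  # start the next iteration from a clean gradient

print(loss.item())

# Without zeroing, two backward passes accumulate into the same .grad buffer.
v = torch.ones(2, requires_grad=True)
(v * 3).sum().backward()
(v * 3).sum().backward()
print(v.grad)                       # tensor([6., 6.])
```

Each backward pass on (v * 3).sum() contributes a gradient of 3 per element, so two passes without zeroing leave 6, which is exactly the accumulation the notes warn about.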

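About PyTorch: Custom nn Modules

A custom Module only has to define the forward pass (how inputs become outputs).
The loss and the backward pass are handled by PyTorch's autograd; you just call the methods yourself.

2025-11-20 21:22

ChatGPT training framework: TensorFlow and JAX · GitHub Copilot
jcjohnson/pytorch-examples: Simple examples to introduce PyTorch. With 哈基米's help I have more or less gone through PyTorch once; for the upcoming A2 Q4 I think I should write most of it by hand myself to get familiar with it.

保姆级 PyTorch 数据处理教程(1)：DataLoader (Zhihu): worth a look.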

Q5: Image Captioning with Vanilla RNNs

The notebook RNN_Captioning_pytorch.ipynb will walk you through the implementation of vanilla recurrent neural networks and apply them to image captioning on COCO.

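If a Python program fails with ModuleNotFoundError: No module named 'past', the past module is missing. It is provided by the future package, so installing future fixes the error.

ModuleNotFoundError: No module named 'h5py'

ModuleNotFoundError: No module named 'imageio'

Only after these were installed as well did the notebook run.

D:\cs231n\assignment2\cs231n\datasets/coco_captioning: sigh, something is wrong with this path, and the dataset has to be downloaded manually.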

%cd /content/drive/My\ Drive/$FOLDERNAME/cs231n/datasets/
!bash get_coco_dataset.sh
%cd /content/drive/My\ Drive/$FOLDERNAME

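This has to be run manually, and it has to be run inside WSL:

harold@LAPTOP-SNLLGVHI:/mnt/d/cs231n/assignment2/cs231n/datasets$ ./get_coco_dataset.sh

When pushing to the GitHub repository, this dataset folder should be excluded. Sigh, the download also needs a system-wide proxy.

About .gitignore

I had used backslashes (\) as on Windows, but .gitignore patterns must use forward slashes (/).
To ignore a whole folder, end the pattern with /. Sigh.

Sigh, I don't understand why the archive wraps its contents in two levels of folders.

2025-11-22 18:27. The data is only now sorted out; the preparation work is done.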

import torch
a = torch.randn(3, 4)
b = torch.randn(4, 5)
c = a @ b  # 等价于 torch.matmul(a, b)

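What the heck, I didn't know this syntax existed.

The first diagnosis: "99% of the time, this kind of problem is a memory leak caused by loss_history storing tensors." And then the kernel died: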

[I 2025-11-22 21:16:50.344 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports
[W 2025-11-22 21:16:50.344 ServerApp] kernel 547c3492-21c2-4ae5-a8bf-cf7f1fe0765e restarted
[I 2025-11-22 21:16:50.348 ServerApp] Starting buffering for 547c3492-21c2-4ae5-a8bf-cf7f1fe0765e:2bf0016b-e819-45b9-a7fc-9ac7e9a78d49
[I 2025-11-22 21:16:50.365 ServerApp] Connecting to kernel 547c3492-21c2-4ae5-a8bf-cf7f1fe0765e.
[I 2025-11-22 21:16:50.365 ServerApp] Restoring connection for 547c3492-21c2-4ae5-a8bf-cf7f1fe0765e:2bf0016b-e819-45b9-a7fc-9ac7e9a78d49
[I 2025-11-22 21:17:29.725 ServerApp] Saving file at /cs231n/assignment2/RNN_Captioning_pytorch.ipynb
OMP: Error #15: Initializing libomp.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/
[I 2025-11-22 21:19:02.356 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports
[W 2025-11-22 21:19:02.359 ServerApp] kernel 547c3492-21c2-4ae5-a8bf-cf7f1fe0765e restarted
[I 2025-11-22 21:19:02.365 ServerApp] Starting buffering for 547c3492-21c2-4ae5-a8bf-cf7f1fe0765e:2bf0016b-e819-45b9-a7fc-9ac7e9a78d49
[I 2025-11-22 21:19:02.379 ServerApp] Connecting to kernel 547c3492-21c2-4ae5-a8bf-cf7f1fe0765e.
[I 2025-11-22 21:19:02.379 ServerApp] Restoring connection for 547c3492-21c2-4ae5-a8bf-cf7f1fe0765e:2bf0016b-e819-45b9-a7fc-9ac7e9a78d49
[I 2025-11-22 21:19:30.695 ServerApp] Saving file at /cs231n/assignment2/RNN_Captioning_pytorch.ipynb

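Sigh, it is that classic problem again; I remember hitting it in my note "2508实习（存疑） - d2l". It is an OpenMP threading-library conflict, common on Windows when PyTorch, NumPy, scikit-learn, MKL, OpenCV and similar packages are installed side by side.
It has nothing to do with how the loss is recorded; the kernel crash comes from conflicting low-level dynamic libraries.

Temporary workaround: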

import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

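Recommended fix: keep only one OpenMP runtime library

Uninstall redundant copies of numpy, opencv, scikit-learn, etc., so that each package comes from a single official conda or pip install.
Do not mix pip and conda installs of the same package.
Prefer conda for PyTorch, numpy, scipy, and other scientific computing packages.

After that it worked.

2025-11-22 21:31. Done.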

Assignment 3

You will use PyTorch for the majority of this homework.

Good.

Q1: Image Captioning with Transformers

The notebook Transformer_Captioning.ipynb will walk you through the implementation of a Transformer model and apply it to image captioning on COCO.

Q2: Self-Supervised Learning for Image Classification

In the notebook Self_Supervised_Learning.ipynb, you will learn how to leverage self-supervised pretraining to obtain better performance on image classification tasks. When first opening the notebook, go to Runtime > Change runtime type and set Hardware accelerator to GPU.

Q3: Denoising Diffusion Probabilistic Models

In the notebook DDPM.ipynb, you will implement a Denoising Diffusion Probabilistic Model (DDPM) and apply it to image generation.

Q4: CLIP and Dino

In the notebook CLIP_DINO.ipynb, you will implement CLIP and DINO, two self-supervised learning methods that leverage large amounts of unlabeled data to learn visual representations.

1ji0o's Blog
