Problem 1
Experiment objective
Generate MNIST handwritten digits with a variational autoencoder, meeting the following requirements:
It is recommended to randomly initialize the model parameters from a Gaussian distribution, which can mitigate part of the mode-collapse problem.
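The Gaussian initialization recommended above can be sketched as follows in PyTorch. The layer sizes (784 → 400 → 20-dimensional latent space) and the standard deviation 0.02 are illustrative assumptions, not values specified in the notes:

```python
import torch
import torch.nn as nn

# Minimal VAE for 28x28 MNIST images (hypothetical sizes:
# 784 inputs -> 400 hidden units -> 20-dimensional latent space).
class VAE(nn.Module):
    def __init__(self, latent_dim=20):
        super().__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc_mu = nn.Linear(400, latent_dim)
        self.fc_logvar = nn.Linear(400, latent_dim)
        self.fc2 = nn.Linear(latent_dim, 400)
        self.fc3 = nn.Linear(400, 784)

    def encode(self, x):
        h = torch.relu(self.fc1(x))
        return self.fc_mu(h), self.fc_logvar(h)

    def reparameterize(self, mu, logvar):
        # Sample z = mu + sigma * eps with eps ~ N(0, I).
        std = torch.exp(0.5 * logvar)
        return mu + std * torch.randn_like(std)

    def decode(self, z):
        return torch.sigmoid(self.fc3(torch.relu(self.fc2(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def gaussian_init(module):
    # Draw every weight from N(0, 0.02^2) and zero the biases;
    # the std value is an assumption for illustration.
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)
        nn.init.zeros_(module.bias)

model = VAE()
model.apply(gaussian_init)  # recursively initialize all Linear submodules
```

`nn.Module.apply` walks the module tree, so a single call covers every `nn.Linear` layer without listing them by hand.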
"With frameworks this complete, using a model is the common move; designing a model is the solid move; but constructing the training dataset, my friend, that is the brilliant move."
The end goal is a knowledge-graph-style RAG model; the data for training the knowledge base will consist mainly of papers.
If the actor being trained and the actor interacting with the environment are the same, we call it on-policy. In other words, if the actor learns from its own experience, it is on-policy; if the actor learns by watching other actors' experience, it is off-policy.
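The on-policy/off-policy distinction above can be sketched as two data flows. All function bodies here (`collect_episode`, `policy_update`) are hypothetical placeholders standing in for a real RL algorithm:

```python
import random
from collections import deque

def collect_episode(policy):
    # Placeholder: a real implementation would roll out `policy`
    # in an environment and record (state, action, reward) tuples.
    return [("state", "action", 1.0)]

def policy_update(policy, transitions):
    # Placeholder: a real implementation would apply a gradient step.
    return policy

# On-policy: train only on data the current policy just generated,
# then discard that data.
def on_policy_step(policy):
    transitions = collect_episode(policy)
    return policy_update(policy, transitions)

# Off-policy: train on transitions sampled from a replay buffer that
# may have been filled by older or different (behavior) policies.
replay_buffer = deque(maxlen=10_000)

def off_policy_step(policy, behavior_policy):
    replay_buffer.extend(collect_episode(behavior_policy))
    batch = random.sample(replay_buffer, min(len(replay_buffer), 32))
    return policy_update(policy, batch)
```

The key difference is visible in the arguments: the on-policy step takes a single policy, while the off-policy step separates the policy being trained from the policy that generated the experience.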
Meta-learning means "learning to learn": while ordinary deep learning learns a function f, meta-learning learns how to obtain the function f.
Reinforcement learning does not rely on labeled answers; it can tackle problems that have no standard answer (or where even humans don't know the best solution).
The whole purpose of a machine learning problem is to find a function, and reinforcement learning is no exception.
This is a SOTA model for binary classification and regression on AMPs (antimicrobial peptides), built on PyTorch.