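# Assignment 3

In this assignment, you will first write code that reproduces, step by step, the GCN example described in the lecture slides.
You will then use that computation to implement a naive GCN model yourself and test it on the Cora dataset.
Finally, you will run CogDL's GCN model on the same dataset for comparison.

This assignment requires [CogDL](https://github.com/THUDM/cogdl): `pip install cogdl`

To use the GPU version, install the GPU build of [PyTorch](https://pytorch.org/get-started/locally/) first, then install cogdl.

This assignment was prepared by the Zhipu GNN Center and the course team, with technical support from the CogDL team.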


In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F

**Part 1: Manually simulate the GCN computation and training process.**

---



1. Compute the normalized adjacency matrix normA from the initial adjacency matrix A (self-loops are added to A first).


In [2]:
A = torch.tensor([[0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 0, 1], [1, 0, 1, 0]])
A = A + torch.eye(4)
print("A=", A)
#############################
##### Fill in the blank #####
#############################
# Compute the degree matrix D, then normalize A to obtain normA
D = 
normA = 
print("normA=", normA)

A= tensor([[1., 1., 1., 1.],
        [1., 1., 1., 0.],
        [1., 1., 1., 1.],
        [1., 0., 1., 1.]])
normA= tensor([[0.2500, 0.2887, 0.2500, 0.2887],
        [0.2887, 0.3333, 0.2887, 0.0000],
        [0.2500, 0.2887, 0.2500, 0.2887],
        [0.2887, 0.0000, 0.2887, 0.3333]])
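
The printed normA is consistent with the symmetric normalization D^(-1/2) A D^(-1/2): for example, the entry between a degree-4 node and a degree-3 node is 1/sqrt(4*3) ≈ 0.2887. As a minimal sketch of that normalization, assuming this symmetric form, the example below normalizes a hypothetical 3-node path graph. The helper name `sym_normalize` and the toy graph are illustrative and not part of the assignment.

```python
import torch

def sym_normalize(adj: torch.Tensor) -> torch.Tensor:
    """Return D^{-1/2} adj D^{-1/2}, where D is the diagonal degree matrix of adj."""
    deg = adj.sum(dim=1)              # row sums = node degrees (self-loops included)
    d_inv_sqrt = deg.pow(-0.5)        # assumes every degree is positive
    # Scaling rows and columns by d_inv_sqrt equals multiplying by the diagonal matrices.
    return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

# Hypothetical path graph 0 - 1 - 2, with self-loops added as in the cell above.
toy_A = torch.tensor([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0]]) + torch.eye(3)
toy_normA = sym_normalize(toy_A)
print(toy_normA)
```

Because the same scaling is applied to rows and columns, the result stays symmetric whenever the input is symmetric.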


2. Compute the first-layer output H1 from the initial features X, the model parameter W1, and the normalized adjacency matrix normA.

In [3]:
H0 = X = torch.FloatTensor([[1,0], [0,1], [1,0], [1,1]])
W1 = torch.tensor([[1, -0.5], [0.5, 1]], requires_grad=True)
#############################
##### Fill in the blank #####
#############################
# Compute H1 from normA, H0, and W1
H1 = 
print("H1=", H1)

H1= tensor([[1.0774, 0.1830],
        [0.7440, 0.0447],
        [1.0774, 0.1830],
        [1.0774, 0.0000]], grad_fn=<ReluBackward0>)
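
The `grad_fn=<ReluBackward0>` in the output indicates that the first layer ends with a ReLU. A hedged sketch of one such propagation step, ReLU(normA · H · W), on hypothetical toy tensors follows; the names `toy_normA`, `toy_H`, `toy_W` and their values are made up for illustration.

```python
import torch

# Hypothetical 2-node graph whose normalized adjacency averages the two nodes.
toy_normA = torch.tensor([[0.5, 0.5],
                          [0.5, 0.5]])
toy_H = torch.tensor([[1.0, 0.0],
                      [0.0, 1.0]])          # one-hot input features
toy_W = torch.tensor([[1.0, -2.0],
                      [3.0, 0.0]], requires_grad=True)

# Aggregate neighbor features, apply the linear map, then the nonlinearity.
toy_out = torch.relu(toy_normA @ toy_H @ toy_W)
print(toy_out)
```

Here the pre-activation is [[2, -1], [2, -1]], so the ReLU zeroes the second column.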


3. Compute the second-layer output H2 and the final output Z.

In [4]:
W2 = torch.tensor([[0.5, -0.5], [1, 0.5]], requires_grad=True)
#############################
##### Fill in the blank #####
#############################
# Compute H2 and Z from normA, H1, and W2
H2 = 
print("H2=", H2)
Z = 
print("Z=", Z)

H2= tensor([[ 0.6366, -0.4800],
        [ 0.5556, -0.3747],
        [ 0.6366, -0.4800],
        [ 0.5962, -0.4377]], grad_fn=<MmBackward0>)
Z= tensor([[0.7534, 0.2466],
        [0.7171, 0.2829],
        [0.7534, 0.2466],
        [0.7377, 0.2623]], grad_fn=<SoftmaxBackward0>)
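
As a quick consistency check on the output above, the sketch below applies a row-wise softmax to the printed H2 values and compares the result with the printed Z. Both tensors are copied by hand from the output and rounded to four decimals, so the match is only approximate; `H2_printed`, `Z_printed`, and `Z_check` are illustrative names.

```python
import torch
import torch.nn.functional as F

# H2 and Z as printed above, rounded to four decimals.
H2_printed = torch.tensor([[0.6366, -0.4800],
                           [0.5556, -0.3747],
                           [0.6366, -0.4800],
                           [0.5962, -0.4377]])
Z_printed = torch.tensor([[0.7534, 0.2466],
                          [0.7171, 0.2829],
                          [0.7534, 0.2466],
                          [0.7377, 0.2623]])

# dim=1 normalizes across classes, so each row (node) sums to 1.
Z_check = F.softmax(H2_printed, dim=1)
print(Z_check)
print(torch.allclose(Z_check, Z_printed, atol=1e-3))
```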


4. Compute the loss.

In [5]:
Y = torch.LongTensor([0, 1, 0, 0])
#############################
##### Fill in the blank #####
#############################
# Compute the final loss from the output Z and the labels Y
loss =
print(loss.item())

0.5333563685417175
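
The printed loss is consistent with the mean negative log-likelihood of each node's true class. The sketch below recomputes it from the rounded Z printed earlier, so the value agrees only to about three decimals. `Z_printed` and `Y_check` are hand-copied stand-ins, not the live tensors from the notebook.

```python
import torch
import torch.nn.functional as F

Z_printed = torch.tensor([[0.7534, 0.2466],
                          [0.7171, 0.2829],
                          [0.7534, 0.2466],
                          [0.7377, 0.2623]])
Y_check = torch.LongTensor([0, 1, 0, 0])

# Pick each node's probability for its true class, then average -log(p).
p_true = Z_printed[torch.arange(len(Y_check)), Y_check]
loss_manual = -torch.log(p_true).mean()

# F.nll_loss expects log-probabilities and computes the same mean.
loss_nll = F.nll_loss(torch.log(Z_printed), Y_check)
print(loss_manual.item(), loss_nll.item())
```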


5. Backpropagate from the loss, then inspect the gradients of the model parameters W1 and W2.

In [6]:
loss.backward(retain_graph=True)
print(W1)
print(W1.grad)
print(W2)
print(W2.grad)

tensor([[ 1.0000, -0.5000],
        [ 0.5000,  1.0000]], requires_grad=True)
tensor([[-0.0352,  0.0085],
        [-0.0088,  0.0052]])
tensor([[ 0.5000, -0.5000],
        [ 1.0000,  0.5000]], requires_grad=True)
tensor([[-0.0396,  0.0396],
        [ 0.0018, -0.0018]])
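
Once gradients are available, a training step moves each parameter against its gradient. The assignment stops at inspecting `W1.grad` and `W2.grad`, so the following is only a sketch of one plain gradient-descent update, using a hypothetical parameter `w` and an assumed learning rate `lr = 0.1`.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)   # hypothetical parameter
toy_loss = (w ** 2).sum()                           # gradient is 2 * w
toy_loss.backward()

lr = 0.1                                            # assumed learning rate
with torch.no_grad():                               # the update itself must not be tracked
    w -= lr * w.grad
w.grad.zero_()                                      # clear the gradient before the next backward pass
print(w)
```

Part 2 delegates this update to `torch.optim.Adam`, which uses a more elaborate rule than the plain step shown here.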


**Part 2: Run the Cora dataset with the GCN model you implemented**

---



1. Load the Cora dataset from cogdl (x holds the node features, y the labels, and the masks define the train/validation/test split).

In [7]:
from cogdl.datasets import build_dataset_from_name

dataset = build_dataset_from_name("cora")
data = dataset[0]
print(data)
n = data.x.shape[0]
edge_index = torch.stack(data.edge_index)
A = torch.sparse_coo_tensor(edge_index, torch.ones(edge_index.shape[1]), (n, n)).to_dense()

Graph(x=[2708, 1433], y=[2708], train_mask=[2708], val_mask=[2708], test_mask=[2708], edge_index=[2, 10556])
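
The cell above converts the edge list into a dense n × n matrix. For Cora that is 2708 × 2708 float32 entries, roughly 29 MB, most of which are zeros. A minimal sketch of the same conversion on a hypothetical 3-node edge list follows; the `toy_` names are illustrative.

```python
import torch

# Directed edges (row -> col) of a hypothetical undirected path 0 - 1 - 2,
# stored once per direction, as a 2 x num_edges index tensor.
toy_edge_index = torch.tensor([[0, 1, 1, 2],
                               [1, 0, 2, 1]])
toy_n = 3

toy_A = torch.sparse_coo_tensor(
    toy_edge_index,
    torch.ones(toy_edge_index.shape[1]),   # weight 1 for every edge
    (toy_n, toy_n),
).to_dense()
print(toy_A)
```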


2. Train the GCN model you implemented (fill the forward method of the GCN model with the code you wrote in Part 1).

In [8]:
import math
import copy
from tqdm import tqdm

def accuracy(y_pred, y_true):
    y_true = y_true.squeeze().long()
    preds = y_pred.max(1)[1].type_as(y_true)
    correct = preds.eq(y_true).double()
    correct = correct.sum().item()
    return correct / len(y_true)

class GCN(nn.Module):

    def __init__(
        self,
        in_feats,
        hidden_size,
        out_feats,
    ):
        super(GCN, self).__init__()
        self.out_feats = out_feats
        self.W1 = nn.Parameter(torch.FloatTensor(in_feats, hidden_size))
        self.W2 = nn.Parameter(torch.FloatTensor(hidden_size, out_feats))
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1.0 / math.sqrt(self.out_feats)
        torch.nn.init.uniform_(self.W1, -stdv, stdv)
        torch.nn.init.uniform_(self.W2, -stdv, stdv)

    def forward(self, A, X):
        n = X.shape[0]
        A = A + torch.eye(n, device=X.device)
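        #############################
        ##### Fill in the blank #####
        #############################
        # Compute normA, H1, and H2 in turn, then return H2. Note: Z is not needed here, because the loss is
        # usually computed directly from H2 and Y (F.cross_entropy applies log-softmax internally).
        # Use self.W1 and self.W2 to access the model parameters.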
        

        return H2


hidden_size = 64
# Convert the class count to a plain int so the parameter shapes are not built from a tensor.
model = GCN(data.x.shape[1], hidden_size, int(data.y.max()) + 1)

if torch.cuda.is_available():
    device = torch.device("cuda")
    model = model.to(device)
    A = A.to(device)
    data.apply(lambda x: x.to(device))

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
epoch_iter = tqdm(range(100), position=0, leave=True)
best_model = None
best_loss = 1e8
for epoch in epoch_iter:
    model.train()
    optimizer.zero_grad()
    logits = model(A, data.x)
    loss = F.cross_entropy(logits[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    train_loss = loss.item()

    model.eval()
    with torch.no_grad():
        logits = model(A, data.x)
        val_loss = F.cross_entropy(logits[data.val_mask], data.y[data.val_mask]).item()
        val_acc = accuracy(logits[data.val_mask], data.y[data.val_mask])
        if val_loss < best_loss:
            best_loss = val_loss
            best_model = copy.deepcopy(model)

    epoch_iter.set_description(f"Epoch: {epoch:03d}, Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")

with torch.no_grad():
    logits = best_model(A, data.x)
    val_acc = accuracy(logits[data.val_mask], data.y[data.val_mask])
    test_acc = accuracy(logits[data.test_mask], data.y[data.test_mask])
print("Val Acc", val_acc)
print("Test Acc", test_acc)

Epoch: 099, Train Loss: 0.0112, Val Loss: 0.7280, Val Acc: 0.7720: 100%|██████████| 100/100 [02:32<00:00,  1.52s/it]


Val Acc 0.774
Test Acc 0.802


3. Run the Cora dataset with cogdl's GCN model, then compare the two runs and think about why they differ (mainly in training time).

In [9]:
from cogdl.utils import set_random_seed
from cogdl.models.nn import GCN
from cogdl.trainer import Trainer
from cogdl.wrappers import fetch_model_wrapper, fetch_data_wrapper

set_random_seed(100)
model = GCN(
    in_feats=data.num_features,
    hidden_size=64,
    out_feats=data.num_classes,
    num_layers=2,
    dropout=0.3,
    activation="relu"
)

if torch.cuda.is_available():
    device = "cuda"
    device_ids = [0]
else:
    device = "cpu"
    device_ids = None
mw_class = fetch_model_wrapper("node_classification_mw")
dw_class = fetch_data_wrapper("node_classification_dw")
optimizer_cfg = dict(lr=0.01, weight_decay=0)
model_wrapper = mw_class(model, optimizer_cfg)
dataset_wrapper = dw_class(dataset)
trainer = Trainer(epochs=100,
                  early_stopping=False,
                  cpu=device=="cpu",
                  device_ids=device_ids)
ret = trainer.run(model_wrapper, dataset_wrapper)


  0%|          | 0/100 [00:00<?, ?it/s]

Model Parameters: 92231


Epoch: 100, train_loss:  0.0150, val_acc:  0.7800: 100%|██████████| 100/100 [00:04<00:00, 22.95it/s]


Saving 34-th model to ./checkpoints/model.pt ...
Loading model from ./checkpoints/model.pt ...
{'test_acc': 0.814, 'val_acc': 0.794}
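
One plausible contributor to the gap in training time is that the naive model multiplies by a dense 2708 × 2708 matrix on every forward pass, while a graph with 10556 edges has very few nonzero entries. The sketch below only shows that a sparse product gives the same result as the dense one on a hypothetical toy matrix. It is not a description of how CogDL implements its GCN, and the `toy_` names are illustrative.

```python
import torch

toy_dense = torch.tensor([[0.5, 0.5, 0.0],
                          [0.5, 0.0, 0.5],
                          [0.0, 0.5, 0.5]])
toy_X = torch.arange(6, dtype=torch.float32).reshape(3, 2)

toy_sparse = toy_dense.to_sparse()          # COO format, stores only the nonzeros
out_dense = toy_dense @ toy_X
out_sparse = torch.sparse.mm(toy_sparse, toy_X)

print(torch.allclose(out_dense, out_sparse))
```

The sparse product only touches the stored nonzeros, so its cost grows with the number of edges rather than with n². Whether that fully explains the timing difference observed here would need to be measured.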
