Graph Convolutional Networks for NLP Using Comet

Boluwatife Victor O.
Published in Heartbeat · Jun 6, 2023



Graph Convolutional Networks (GCNs) are a type of neural network that operates on graphs, which are mathematical structures consisting of nodes and edges. GCNs have been successfully applied to many domains, including computer vision and social network analysis.

In recent years, researchers have also explored GCNs for natural language processing (NLP) tasks such as text classification, sentiment analysis, and entity recognition. GCNs combine graph-based representations of text with convolution operations, which lets a model exploit the relationships between words or documents instead of treating them as independent inputs.

This article provides a brief overview of GCNs for NLP tasks and shows how to implement them using PyTorch and Comet. By the end, you will understand the GCN architecture and the steps for implementing an NLP project with Comet’s experiment tracking tool.

Background

Before diving into GCNs for NLP, let’s first discuss the basics of GCNs. In a traditional neural network, the input is a vector or a tensor, and the output is also a vector or a tensor. In a GCN, however, the input is a graph. The graph structure is represented by an adjacency matrix, where each entry indicates the presence or absence of an edge between two nodes. In addition, each node is associated with a feature vector that represents the node’s attributes.

A GCN consists of multiple layers, each of which applies a graph convolution operation to the input graph. The graph convolution operation is similar to the convolution operation in a traditional convolutional neural network, but it operates on the graph structure instead of the spatial structure. In each layer, the node features are updated based on the features of their neighboring nodes. The output of the final layer is a graph with updated node features, which can be used for downstream tasks such as classification or regression.
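To make this concrete, here is a minimal sketch of a single graph convolution step on a toy three-node graph in plain PyTorch. The graph, feature values, and weight matrix are invented purely for illustration:

import torch

# Toy graph: 3 nodes, edges 0-1 and 1-2, plus self-loops so each
# node keeps its own features during aggregation
adj = torch.tensor([[1., 1., 0.],
                    [1., 1., 1.],
                    [0., 1., 1.]])

# One 2-dimensional feature vector per node (arbitrary values)
features = torch.tensor([[1., 0.],
                         [0., 1.],
                         [1., 1.]])

# Symmetric normalization: D^(-1/2) A D^(-1/2)
deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
adj_norm = deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)

# One graph convolution step: aggregate neighbor features, then
# apply a learned linear transform and a nonlinearity
weight = torch.randn(2, 4)
hidden = torch.relu(adj_norm @ features @ weight)
print(hidden.shape)  # torch.Size([3, 4]): updated features per node

Each node’s new 4-dimensional feature vector is a weighted mix of its own features and its neighbors’, which is exactly the per-layer update described above.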

Once the GCN is trained, it is easier to process new graphs and make predictions about them. Now that we have a basic understanding of GCNs, let’s consider GCNs for Natural Language Processing (NLP) tasks.

Using GCNs for Natural Language Processing (NLP)

GCNs work for various NLP tasks, such as text classification, sentiment analysis, and entity recognition. The main idea is to represent the text data as a graph and then use a GCN to process it.

In this representation, each word is a node in the graph, and each edge represents a syntactic or semantic relationship between the words. The edge weights can be determined based on the distance between the words in the sentence or the similarity between their meanings.
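As a small illustration, here is one way such a word graph could be built with NetworkX, connecting words that appear next to each other in a sentence (simple adjacency standing in for a richer syntactic or semantic relationship):

import networkx as nx

sentence = "graph networks can process text as graphs".split()

word_graph = nx.Graph()
word_graph.add_nodes_from(set(sentence))

# Connect each word to the word that follows it in the sentence
for w1, w2 in zip(sentence, sentence[1:]):
    word_graph.add_edge(w1, w2)

print(word_graph.edges())

In practice, the edges would come from a dependency parse or from embedding similarity rather than raw word order.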

To illustrate this better, we will implement a text classification task using PyTorch and Comet. We will use the Cora dataset, which consists of academic papers and their classification labels. Each paper is represented as a bag of words, where each word is a feature, and its value is the frequency of occurrence in the paper. We will construct a graph based on the citation links between the papers and use GCNs to classify the papers.

Prerequisites

To follow along with this tutorial, you will need the following:

  1. Basic knowledge of Python and deep learning.
  2. Some familiarity with PyTorch and Comet, as these are the tools we will use to implement the GCN. Kindly consult the Comet documentation to better understand how to integrate Comet with PyTorch.
  3. Some familiarity with NLP concepts such as tokenization, embeddings, and text classification is recommended.
  4. A development environment such as Jupyter Notebook or Visual Studio Code.

Now that we have the prerequisites, let’s dive into implementing GCNs for NLP.


Implementation

Install and import libraries

Let’s first set up Comet for experiment tracking. Comet is a platform for experiment tracking and reproducibility in machine learning. It allows users to easily track and compare different experiments, visualize results, and collaborate with others.

First, we need to create an account and install the Comet package:

!pip install comet_ml

Then we will initiate a Comet experiment:

from comet_ml import Experiment

experiment = Experiment(api_key="your_api_key", project_name="gcns-for-nlp")

Replace your_api_key with your Comet API key, which can be found on the settings page of your Comet account. Replace gcns-for-nlp with the name of your project.
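Optionally, we can also log the run’s hyperparameters up front so they appear alongside the metrics in the Comet UI. The values here match the training setup used later in this article (the hidden size is the one we pick when building the model):

# Log hyperparameters so the run is self-documenting in Comet
experiment.log_parameters({
    "learning_rate": 0.01,
    "hidden_dim": 16,
    "epochs": 100,
})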

Load and Preprocess Cora Dataset

Next, we will load the dataset for our text classification project. We will load the Cora dataset and preprocess it by constructing a graph based on the citation links between the papers. Download the Cora dataset here.

After downloading and extracting the dataset, we can load it using the following code:

import os
import networkx as nx
import torch
from torch.utils.data import Dataset

class CoraDataset(Dataset):
    def __init__(self, root):
        self.root = root
        self.features = []   # bag-of-words vector for each paper
        self.labels = []     # class index for each paper
        self.edges = []      # citation links as (source, target) node indices
        self.paper2id = {}   # maps raw paper IDs to contiguous node indices
        self.label2id = {}   # maps label strings to class indices
        self.num_labels = 0

        self._load_data()

    def _load_data(self):
        # Each line of cora.content is: <paper_id> <binary word features> <label>
        with open(os.path.join(self.root, "cora.content"), "r") as f:
            for line in f:
                parts = line.strip().split("\t")
                paper_id = parts[0]
                words = parts[1:-1]
                label = parts[-1]

                # Assign the paper a contiguous node index and store its features
                self.paper2id[paper_id] = len(self.paper2id)
                self.features.append([float(w) for w in words])

                # Add the label to the vocabulary
                if label not in self.label2id:
                    self.label2id[label] = self.num_labels
                    self.num_labels += 1
                self.labels.append(self.label2id[label])

        # Each line of cora.cites is: <cited_paper_id> <citing_paper_id>
        with open(os.path.join(self.root, "cora.cites"), "r") as f:
            for line in f:
                source, target = line.strip().split("\t")
                self.edges.append((self.paper2id[source], self.paper2id[target]))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return torch.tensor(self.features[idx]), self.labels[idx]

    def get_graph(self):
        graph = nx.Graph()
        graph.add_nodes_from(range(len(self.labels)))
        graph.add_edges_from(self.edges)
        return graph

    def get_adjacency_matrix(self):
        graph = self.get_graph()
        adjacency_matrix = nx.adjacency_matrix(graph)
        return torch.Tensor(adjacency_matrix.toarray())

# Load the data
dataset = CoraDataset("cora")

This code defines a PyTorch dataset class for the Cora dataset. The dataset consists of two files:

  1. cora.content contains the bag-of-words representation and the label for each paper.
  2. cora.cites contains the citation links between the papers.

We load these files and store the data in a format that PyTorch can use directly.
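As a quick sanity check, we can inspect what was loaded. The standard Cora release has 2,708 papers, 1,433 binary word features per paper, and 7 classes:

print(len(dataset))               # number of papers
print(len(dataset.features[0]))   # word features per paper
print(dataset.num_labels)         # number of classes

adj = dataset.get_adjacency_matrix()
print(adj.shape)                  # (num_papers, num_papers)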

Constructing the Graph

Constructing the graph typically comes before building the GCN model. To apply GCNs to an NLP task, we first need to represent the text data as a graph. This involves constructing nodes and edges that capture the relationships between the words or documents in the dataset. Once we have constructed the graph, we can then use it as input to a GCN model for classification, regression, or any other downstream NLP task.

To construct the graph, we will use the NetworkX library, a Python package that provides a convenient way to create and manipulate graphs. We will first create an empty graph and then add nodes for each paper:

import networkx as nx

# Create an empty graph
graph = nx.Graph()

# Add a node for each paper
for i in range(len(paper_ids)):
    graph.add_node(paper_ids[i])

Next, we will add edges between papers based on their citation links:

# Add edges based on citation links
for i in range(len(paper_ids)):
    for j in range(len(paper_ids)):
        if i != j and citations[i][j] == 1:
            graph.add_edge(paper_ids[i], paper_ids[j])

Here, paper_ids is the list of paper identifiers, and citations is a matrix where each entry represents the presence or absence of a citation link between two papers. We iterate over all pairs of papers and add an edge between them if there is a citation link.

Finally, we will add node attributes to represent the bag-of-words features of each paper:

# Add node attributes for bag-of-words features
for i in range(len(paper_ids)):
    graph.nodes[paper_ids[i]]["features"] = features[i]

Here, features is a matrix where each row represents the bag-of-words features of a paper. We iterate over all papers and add a features attribute to their corresponding nodes.

With these steps, we have constructed the graph representation of the Cora dataset. We can now use this graph as input to a GCN model for text classification.
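Before moving on, here is a minimal sketch of how the graph and feature matrix built above can be turned into the dense tensors the model in the next section expects (assuming the graph and features objects from the snippets above):

import numpy as np
import torch

# Dense adjacency matrix of the citation graph built above;
# node order follows insertion order, matching the feature rows
adj_matrix = torch.Tensor(nx.to_numpy_array(graph))

# Node feature matrix: one bag-of-words row per paper
features = torch.Tensor(np.array(features))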

Build GCN Model

Now that we have our data, let’s build the GCN model. We will define a simple GCN model that consists of two GCN layers followed by a fully connected layer.

import torch.nn as nn
import torch.nn.functional as F

class GCN(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(GCN, self).__init__()

        self.gc1 = GraphConvolution(input_dim, hidden_dim)
        self.gc2 = GraphConvolution(hidden_dim, output_dim)
        self.fc = nn.Linear(output_dim, output_dim)

    def forward(self, adj_matrix, features):
        x = self.gc1(adj_matrix, features)
        x = F.relu(x)
        x = self.gc2(adj_matrix, x)
        x = F.relu(x)
        x = self.fc(x)
        return x

We also need to define the GraphConvolution layer, which applies the graph convolution operation to the input features.

class GraphConvolution(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(GraphConvolution, self).__init__()

        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, adj_matrix, features):
        # Add self-loops so each node keeps its own features, then
        # symmetrically normalize: D^(-1/2) (A + I) D^(-1/2)
        adj_matrix = adj_matrix + torch.eye(adj_matrix.size(0), device=adj_matrix.device)
        deg_inv_sqrt = adj_matrix.sum(dim=1).pow(-0.5)
        adj_norm = deg_inv_sqrt.unsqueeze(1) * adj_matrix * deg_inv_sqrt.unsqueeze(0)

        # Aggregate neighbor features and apply the learned linear transform
        return adj_norm @ self.linear(features)
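With both classes defined, we can instantiate the model. For Cora, the input dimension is the number of word features per paper (1,433 in the standard release) and the output dimension is the number of classes (7); the hidden size is a free hyperparameter, and 16 is a common choice for this dataset:

# Move everything to the available device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = GCN(input_dim=1433, hidden_dim=16, output_dim=7).to(device)
adj_matrix = adj_matrix.to(device)
features = features.to(device)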

Train the model using Comet

After defining the model, we can train it. The training loop below calls train and test helper functions; because each GCN layer consumes the full adjacency matrix, these helpers run full-batch, with one forward pass over the whole graph per epoch.
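Here is a minimal sketch of those helpers. The train_idx/test_idx split is a hypothetical choice introduced for this sketch (the sizes loosely follow the common Cora benchmark split of 140 training and 1,000 test nodes):

# Labels from the CoraDataset loaded earlier, plus a hypothetical
# random split of node indices into training and test sets
labels = torch.tensor(dataset.labels).to(device)
perm = torch.randperm(len(dataset))
train_idx, test_idx = perm[:140], perm[-1000:]

def train(model, adj_matrix, features, labels, idx, optimizer):
    model.train()
    optimizer.zero_grad()
    output = model(adj_matrix, features)  # full-batch forward pass
    loss = F.cross_entropy(output[idx], labels[idx])
    loss.backward()
    optimizer.step()
    return loss.item()

def test(model, adj_matrix, features, labels, idx):
    model.eval()
    with torch.no_grad():
        output = model(adj_matrix, features)
        loss = F.cross_entropy(output[idx], labels[idx])
        acc = (output[idx].argmax(dim=1) == labels[idx]).float().mean().item()
    return loss.item(), acc

With the helpers in place, the training loop looks like this: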

import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

optimizer = optim.Adam(model.parameters(), lr=0.01)
scheduler = StepLR(optimizer, step_size=20, gamma=0.5)

for epoch in range(1, 101):
    train_loss = train(model, adj_matrix, features, labels, train_idx, optimizer)
    test_loss, test_acc = test(model, adj_matrix, features, labels, test_idx)
    scheduler.step()
    experiment.log_metric("train_loss", train_loss, step=epoch)
    experiment.log_metric("test_loss", test_loss, step=epoch)
    experiment.log_metric("test_acc", test_acc, step=epoch)

We use the Adam optimizer with a learning rate of 0.01 and a StepLR scheduler that halves the learning rate every 20 epochs. We train the model for 100 epochs, logging the training loss, test loss, and test accuracy to Comet after each epoch.

Evaluation

After training the model, we can evaluate its performance on the test set using the following code:

from sklearn.metrics import f1_score

model.eval()
with torch.no_grad():
    # One forward pass over the whole graph, then select the test nodes
    output = model(adj_matrix, features)
    predicted = output[test_idx].argmax(dim=1)

y_true = labels[test_idx].cpu().numpy()
y_pred = predicted.cpu().numpy()

f1 = f1_score(y_true, y_pred, average="micro")
experiment.log_metric("test_f1_micro", f1)

We set the model to evaluation mode and disable gradient computation with torch.no_grad(). We then run a single forward pass over the graph, select the predictions for the test nodes, compute the micro-averaged F1 score against the true labels, and log it to Comet.

Conclusion

In this article, we have introduced Graph Convolutional Networks (GCNs) for natural language processing (NLP) tasks and demonstrated their implementation using PyTorch and Comet. We have shown how to use GCNs to classify academic papers on the Cora dataset based on their citation links and bag-of-words features.

GCNs offer a powerful framework for NLP tasks that can leverage the graph structure of text data to capture rich relationships between words and documents. By combining GCNs with Comet, we can track and optimize our models’ performance, making it easier to experiment with different architectures and hyperparameters.

References

  1. Papers with Code | Graph Convolutional Network
  2. Sun, K., Zhang, R., Mensah, S., Mao, Y., & Liu, X. (2019). Aspect-Level Sentiment Analysis Via Convolution over Dependency Tree
  3. Lu, Z., Du, P., & Nie, J.-Y. (2020). VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification
  4. Comet Documentation
  5. Adjacency Matrix
  6. Papers with Code | Cora dataset
  7. GeeksforGeeks | NetworkX

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to providing premier educational resources for data science, machine learning, and deep learning practitioners. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Comet, an MLOps platform that enables data scientists & ML teams to track, compare, explain, & optimize their experiments. We pay our contributors, and we don’t sell ads.

If you’d like to contribute, head on over to our call for contributors. You can also sign up to receive our weekly newsletter (Deep Learning Weekly), check out the Comet blog, join us on Slack, and follow Comet on Twitter and LinkedIn for resources, events, and much more that will help you build better ML models, faster.
