A Transformer Neural Network for Sentiment Analysis.
Author: Ivan Bongiorni - 2022-09-25.
Open this tutorial on Google Colaboratory.
The structure of this tutorial is loosely based on this official Keras Notebook.
Let’s see how to build and train a Keras model containing a TransformerLayer from maximal, and take a look at the OriginalTransformerSchedule, the learning rate schedule from the original Transformer paper.
First, I will import the main libraries I need:
import warnings
import numpy as np
warnings.filterwarnings("ignore", category=np.VisibleDeprecationWarning)
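Note that on recent NumPy versions (2.0 and later) the top-level np.VisibleDeprecationWarning alias was removed in favor of np.exceptions.VisibleDeprecationWarning. A version-tolerant filter looks like this (a minimal sketch, assuming a NumPy recent enough to expose np.exceptions whenever the old alias is missing):

# Fall back to np.exceptions if the top-level alias is gone (NumPy >= 2.0)
vis_dep_warning = getattr(np, "VisibleDeprecationWarning", None) or np.exceptions.VisibleDeprecationWarning
warnings.filterwarnings("ignore", category=vis_dep_warning)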
We need TensorFlow for the model structure:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, GlobalAveragePooling1D, Dropout, Dense
And then we can import maximal layers.
The central class in this tutorial is the Transformer layer. In order to include a TransformerLayer in a Keras model we need to import a PositionalEmbedding layer too. This layer will produce embeddings of the words and of their positions in the sequence, to inform our Attention mechanism. Additionally, the learning rate schedule of the original Transformer paper is shown for demonstration purposes (see the sketch after compiling the model below).
import maximal
from maximal.layers import PositionalEmbedding, TransformerLayer
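If maximal is not installed yet (for example in a fresh Colab runtime), it can be installed from PyPI first; the package name below matches the import:

!pip install maximal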
Load the IMDB dataset for Sentiment Analysis
vocab_size = 20000  # only consider the top 20k most frequent words
maxlen = 200  # only consider the first 200 tokens of each review
(x_train, y_train), (x_val, y_val) = tf.keras.datasets.imdb.load_data(num_words=vocab_size)
print(len(x_train), "Training sequences")
print(len(x_val), "Validation sequences")
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_val = tf.keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)
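The reviews come pre-tokenized as integer sequences. To sanity-check what the model will see, we can map one review back to words using Keras’ word index; by default the three lowest indices are reserved for padding, start-of-sequence and unknown tokens, hence the offset of 3:

# Decode the first training review back to text
word_index = tf.keras.datasets.imdb.get_word_index()
index_to_word = {index + 3: word for word, index in word_index.items()}
index_to_word.update({0: "<pad>", 1: "<start>", 2: "<unk>"})
print(" ".join(index_to_word.get(token, "<unk>") for token in x_train[0]))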
Build the model
Let’s first specify some hyperparameters: the Transformer’s hidden size (depth), the number of Attention Heads, and the size of the internal Pointwise Feed-Forward Net:
model_depth = 32  # hidden size (embedding dimension)
num_heads = 4     # number of Attention Heads
ff_dim = 32       # size of the Pointwise Feed-Forward Net
And then we can specify the model:
model = Sequential([
    Input(shape=(maxlen,)),
    PositionalEmbedding(maxlen, vocab_size, model_depth),
    TransformerLayer(model_depth, num_heads, ff_dim),
    GlobalAveragePooling1D(),  # average over the sequence dimension
    Dropout(0.1),
    Dense(20, activation="relu"),
    Dropout(0.1),
    Dense(2, activation="softmax")  # two classes: negative / positive
])
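Before compiling, it is worth inspecting the layer stack and parameter count:

model.summary()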
We are now ready to compile our model:
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"]
)
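As for the OriginalTransformerSchedule mentioned above: it implements the warm-up learning rate schedule of the original Transformer paper. If you want to see what it computes, or if the import path differs in your maximal version, here is a minimal hand-rolled equivalent (a sketch of the standard formula lr = d_model^(-0.5) * min(step^(-0.5), step * warmup_steps^(-1.5)), not maximal’s own class):

class TransformerSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    # Warm-up schedule from "Attention Is All You Need" (sketch, not maximal's class)
    def __init__(self, d_model, warmup_steps=4000):
        super().__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        return tf.math.rsqrt(self.d_model) * tf.minimum(
            tf.math.rsqrt(step), step * (self.warmup_steps ** -1.5))

# Drop-in replacement for the constant-rate Adam above:
# model.compile(
#     optimizer=tf.keras.optimizers.Adam(TransformerSchedule(model_depth)),
#     loss=tf.keras.losses.SparseCategoricalCrossentropy(),
#     metrics=["accuracy"]
# )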
Training
Now the model is ready for training:
history = model.fit(
    x_train, y_train, batch_size=32, epochs=4, validation_data=(x_val, y_val)
)
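Once training finishes, we can evaluate on the validation set and run the model on a single review. Since the head is a 2-unit softmax, the predicted class is the argmax of the two output probabilities (in the IMDB dataset, label 1 means positive):

loss, accuracy = model.evaluate(x_val, y_val, verbose=0)
print(f"Validation accuracy: {accuracy:.4f}")

probabilities = model.predict(x_val[:1])  # shape: (1, 2)
print("Predicted sentiment:", "positive" if np.argmax(probabilities[0]) == 1 else "negative")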