maximal

A Python library built on TensorFlow 2, providing models and layers for implementing custom Transformer neural networks.

SelfAttention()

Implements Scaled Dot-Product Attention as in the original Transformer paper ("Attention Is All You Need", Vaswani et al., 2017), in the self-attention setting where Q, K, and V are all the same tensor: Attention(Q, K, V) = softmax(Q Kᵀ / √d_k) V.

Inherits from tensorflow.keras.layers.Layer.
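The argument lists aren't reproduced here, but the description above pins down the computation: with Q = K = V = x, the layer returns softmax(Q Kᵀ / √d_k) V. The following is a minimal, illustrative sketch of that mechanism in plain TensorFlow, not maximal's actual implementation; the class name NaiveSelfAttention is a placeholder.

```python
import tensorflow as tf

# Illustrative sketch only -- not maximal's implementation.
class NaiveSelfAttention(tf.keras.layers.Layer):
    """Scaled dot-product attention where Q, K, and V are the same tensor."""

    def call(self, x):
        # x: (batch, seq_len, d_model); here Q = K = V = x.
        d_k = tf.cast(tf.shape(x)[-1], x.dtype)
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        scores = tf.matmul(x, x, transpose_b=True) / tf.math.sqrt(d_k)  # (batch, seq_len, seq_len)
        weights = tf.nn.softmax(scores, axis=-1)  # each row sums to 1
        return tf.matmul(weights, x)              # (batch, seq_len, d_model)


x = tf.random.normal((2, 10, 64))     # (batch, seq_len, d_model)
print(NaiveSelfAttention()(x).shape)  # (2, 10, 64)
```

In full Transformer blocks, Q, K, and V are usually first derived from the input through separate learned Dense projections before this formula is applied; the division by √d_k keeps the dot products from growing with the feature dimension and saturating the softmax.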

Arguments

__init__ arguments:

call arguments:

Returns