A two-tower sparse neural network (TTSN for short) is a neural network architecture that consists of two separate branches, or “towers,” of layers. Each tower processes a different type of input, and the outputs of the two towers are then combined to produce the final output.

The term “sparse” is commonly used in machine learning to describe models that have a large number of parameters but only a relatively small number of active (non-zero) parameters at any given time.

Applications

TTSNs have a wide range of applications. Here are some examples:

  1. Recommendation systems: Two-tower neural networks can be used to build recommendation systems that suggest products, services, or content to users based on their past behavior or preferences. In this case, one tower represents the user, while the other tower represents the item being recommended (a minimal sketch follows this list).

  2. Natural language processing: Two-tower neural networks can also be used for various NLP tasks, such as sentiment analysis, language translation, and named entity recognition. For example, one tower can represent the input text, while the other tower represents the context or task.

  3. Computer vision: Two-tower neural networks are often used in computer vision tasks, such as image classification, object detection, and segmentation. In this case, one tower can represent the image features, while the other tower represents the object class or category.

  4. Fraud detection: Two-tower neural networks can also be used to detect fraudulent activities, such as credit card fraud, insurance fraud, or cyber attacks. One tower can represent the user behavior, while the other tower represents the historical patterns of fraudulent behavior.
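
For the recommendation case, one common variant scores a user–item pair with a dot product between the two tower outputs (the Implementation section below describes a concatenation-based alternative). Below is a minimal sketch in PyTorch; the class name, feature dimensions, and layer sizes are illustrative assumptions, not a prescribed design.

```python
import torch
import torch.nn as nn

class TwoTowerRecommender(nn.Module):
    """Illustrative two-tower recommender; all sizes are assumptions."""

    def __init__(self, user_dim=32, item_dim=48, embed_dim=16):
        super().__init__()
        # User tower: maps user features into a shared embedding space.
        self.user_tower = nn.Sequential(
            nn.Linear(user_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )
        # Item tower: maps item features into the same embedding space.
        self.item_tower = nn.Sequential(
            nn.Linear(item_dim, 64), nn.ReLU(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, user_features, item_features):
        u = self.user_tower(user_features)   # (batch, embed_dim)
        v = self.item_tower(item_features)   # (batch, embed_dim)
        # Score each user-item pair with a dot product of the tower outputs.
        return (u * v).sum(dim=-1)

model = TwoTowerRecommender()
scores = model(torch.randn(4, 32), torch.randn(4, 48))
print(scores.shape)  # torch.Size([4])
```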

Implementation

Each tower consists of a sequence of layers, and we can represent the output of the k-th layer in the first tower as h_k(x) and the output of the k-th layer in the second tower as g_k(y).
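
To make the notation concrete, here is a minimal PyTorch sketch of a single tower as a stack of layers, where h_k(x) is simply the intermediate output after the k-th module (counting activations as layers for simplicity). The layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# One tower as a stack of layers; sizes are illustrative.
tower = nn.Sequential(
    nn.Linear(10, 32), nn.ReLU(),
    nn.Linear(32, 16), nn.ReLU(),
)

x = torch.randn(1, 10)
h = x
# h_k(x): the output after the k-th layer, obtained by applying layers in order.
for k, layer in enumerate(tower, start=1):
    h = layer(h)
    print(f"h_{k}(x) shape: {tuple(h.shape)}")
```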

The outputs of the two towers are then combined in some way to produce the final output. One common approach is to concatenate the outputs of the last layer in each tower and feed them into a final layer. We can represent this final layer as f([h_L(x), g_L(y)]), where L is the number of layers in each tower and [h_L(x), g_L(y)] denotes the concatenation of the outputs of the last layer in each tower.
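
Putting the pieces together, here is a minimal PyTorch sketch of a two-tower model that concatenates the last-layer outputs h_L(x) and g_L(y) and passes them through a final layer f. The input and hidden sizes, and the class name TwoTowerNet, are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoTowerNet(nn.Module):
    """Two towers combined by concatenation; all sizes are assumptions."""

    def __init__(self, x_dim=10, y_dim=20, hidden=32, out_dim=1):
        super().__init__()
        self.tower_h = nn.Sequential(   # produces h_L(x)
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.tower_g = nn.Sequential(   # produces g_L(y)
            nn.Linear(y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.f = nn.Linear(2 * hidden, out_dim)  # final layer f

    def forward(self, x, y):
        h_L = self.tower_h(x)
        g_L = self.tower_g(y)
        combined = torch.cat([h_L, g_L], dim=-1)  # [h_L(x), g_L(y)]
        return self.f(combined)

model = TwoTowerNet()
out = model(torch.randn(4, 10), torch.randn(4, 20))
print(out.shape)  # torch.Size([4, 1])
```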

To encourage sparsity in the model, we can use regularization techniques such as L1 regularization, which pushes many of the weights in the network toward zero, or dropout, which randomly zeroes activations during training. These techniques can help reduce the number of active parameters in the model and improve its generalization performance.
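
As a concrete illustration, here is one way an L1 penalty might be added to the training loss in PyTorch; the stand-in model, loss function, and the value of the l1_lambda hyperparameter are illustrative assumptions.

```python
import torch
import torch.nn as nn

def l1_penalty(model: nn.Module) -> torch.Tensor:
    """Sum of absolute weight values; added to the loss to encourage sparsity."""
    return sum(p.abs().sum() for p in model.parameters())

# Illustrative training step; l1_lambda is a tunable hyperparameter.
model = nn.Linear(8, 1)          # stand-in for a two-tower model
criterion = nn.MSELoss()
x, y = torch.randn(4, 8), torch.randn(4, 1)

l1_lambda = 1e-4
loss = criterion(model(x), y) + l1_lambda * l1_penalty(model)
loss.backward()
```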

The precise mathematical details of a TTSN depend on the particular architecture and regularization techniques used, but the above should give you a general idea of how the model works.