Why I Chose Graph Neural Networks Over Transformers for Financial Context

When your data is relationships, not sequences - a case for GNNs in financial advisory workflows.

Everyone’s building with transformers. I get it - the attention mechanism is elegant, scaling laws are generous, and the tooling is mature. But when I set out to build a context management system for financial advisory workflows, I went with graph neural networks instead. Not because GNNs are fashionable (they’re really not), but because the data structure of financial advisory relationships is fundamentally a graph. Forcing it into a sequence destroys information. Simple as that.

Here’s the technical case, the architecture, and what I found.

The Problem: Context in Financial Advisory

When a wealth manager sits with a client, the conversation pulls from a dense web of information: the client's portfolio, their family structure, tax situation, upcoming life events, risk tolerance, past decisions and how those played out, regulatory constraints, market conditions. A good advisor holds all of this in their head and draws on whatever's relevant in the moment.

Building an AI system that does this means maintaining context across multiple dimensions at once. So the question becomes: what’s the right computational structure to represent all of that?

Why Transformers Struggle Here

Transformers represent context as a sequence of tokens with pairwise attention. This works beautifully for language, code, and a lot of reasoning tasks. But financial advisory context has properties that make sequential representation lossy.

Long-horizon client history. A client relationship might span 10+ years. The relevant context includes decisions made in 2015, portfolio changes in 2018, a conversation about estate planning in 2021, and a market event last week. In a transformer, this becomes a very long sequence with critical information scattered across distant positions. Yes, context windows are growing (GPT-4 Turbo at 128K tokens, Claude at 200K), but the issue isn't window size - it's attention dilution. The model has to figure out that a tax-loss harvesting decision from three years ago is highly relevant to today's conversation about capital gains, despite the thousands of intervening tokens.

Multi-entity relationships. Financial advisory rarely involves a single individual. You’ve got joint accounts, family trusts, dependent portfolios, corporate-personal linkages - a whole web of entities where actions on one node affect constraints and opportunities at other nodes. A husband’s retirement triggers rebalancing in the wife’s portfolio. A parent’s estate plan affects the children’s tax exposure. In a transformer, these relationships exist only implicitly in the text. In a graph, they’re first-class structural elements.

Product interdependencies. Financial products interact in ways that aren’t sequential at all. Insurance coverage affects optimal equity allocation (higher coverage enables higher risk tolerance). Tax-saving instrument lock-ins affect liquidity planning. Loan EMIs constrain SIP capacity. These are structural dependencies. Change one node in the product graph and the constraints propagate through edges to other nodes.
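The propagation idea is easy to sketch outside any ML framework. The toy example below (node names and relations are illustrative, not from the production system) walks outgoing edges to find every node a change can reach:

```python
# Toy sketch of constraint propagation along typed edges.
# Directed edges: (source, relation, target). Names are illustrative.
edges = [
    ("home_loan", "constrains", "sip_capacity"),
    ("term_insurance", "enables", "equity_allocation"),
    ("elss_lock_in", "constrains", "liquidity"),
]

def affected_nodes(changed, edges):
    """Return every node reachable from `changed` via outgoing edges."""
    out, frontier = set(), [changed]
    while frontier:
        node = frontier.pop()
        for src, _, dst in edges:
            if src == node and dst not in out:
                out.add(dst)
                frontier.append(dst)
    return out

print(affected_nodes("home_loan", edges))  # {'sip_capacity'}
```

Changing the loan node touches exactly the nodes its edges reach - nothing else needs recomputing, which is the structural property the sequential representation gives up.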

The Knowledge Graph Structure

Here’s how I represent a client’s financial context as a graph:

graph TD
    C[Client Node<br/>Demographics, Risk Profile] --> P1[MF Portfolio<br/>Rs 45L, 60% Equity]
    C --> P2[Insurance<br/>Term: 1Cr, Health: 10L]
    C --> P3[Real Estate<br/>Flat: 1.2Cr, Loan: 40L]
    C --> P4[FDs & Debt<br/>Rs 15L across 3 banks]

    C --> L1[Life Event<br/>Child entering college 2025]
    C --> L2[Life Event<br/>Retirement target 2035]

    C --> G1[Goal<br/>Education Fund: 25L by 2025]
    C --> G2[Goal<br/>Retirement Corpus: 5Cr by 2035]

    C --> F1[Family: Spouse<br/>Separate portfolio: Rs 30L]
    C --> F2[Family: Parent<br/>Dependent, medical costs]

    M[Market Context<br/>Nifty, Rates, Inflation] --> P1
    M --> P4

    P1 -.->|funds goal| G1
    P1 -.->|funds goal| G2
    P2 -.->|enables risk in| P1
    P3 -.->|EMI constrains| P1
    L1 -.->|triggers| G1
    L2 -.->|triggers| G2
    F2 -.->|requires liquidity from| P4
    F1 -.->|joint tax planning| C

    style C fill:#e94560,stroke:#1a1a2e,color:#fff
    style M fill:#0f3460,stroke:#1a1a2e,color:#fff
    style P1 fill:#1a1a2e,stroke:#e94560,color:#fff
    style P2 fill:#1a1a2e,stroke:#e94560,color:#fff
    style P3 fill:#1a1a2e,stroke:#e94560,color:#fff
    style P4 fill:#1a1a2e,stroke:#e94560,color:#fff
    style L1 fill:#16213e,stroke:#0f3460,color:#fff
    style L2 fill:#16213e,stroke:#0f3460,color:#fff
    style G1 fill:#16213e,stroke:#0f3460,color:#fff
    style G2 fill:#16213e,stroke:#0f3460,color:#fff
    style F1 fill:#16213e,stroke:#e94560,color:#fff
    style F2 fill:#16213e,stroke:#e94560,color:#fff

Each node has a feature vector encoding its current state. Each edge has a type (owns, constrains, funds, enables, triggers) and a weight representing relationship strength. The graph evolves over time - nodes update as market conditions change, life events occur, or the client takes action.
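As a concrete sketch of that data model - typed nodes with feature vectors, typed weighted edges - something like the following would do. The field names and values here are illustrative; the actual schema isn't shown in the post:

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    node_type: str    # e.g. "portfolio", "goal", "life_event"
    features: list    # feature vector encoding current state

@dataclass
class Edge:
    src: str
    dst: str
    edge_type: str    # "owns", "constrains", "funds", "enables", "triggers"
    weight: float = 1.0  # relationship strength

# Illustrative instances mirroring the diagram above
portfolio = Node("mf_portfolio", "portfolio", [45.0, 0.6])   # Rs 45L, 60% equity
goal = Node("education_fund", "goal", [25.0, 2025.0])        # 25L by 2025
funds = Edge("mf_portfolio", "education_fund", "funds", weight=0.8)
```

Updating a node is then a local operation: a market move rewrites one feature vector, and the edges decide where the implications travel.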

GNN Message Passing for Financial Context

The core mechanism is message passing - each node updates its representation by aggregating information from its neighbors. So a portfolio node’s representation ends up being informed by the goals it funds, the constraints imposed by loans, the risk capacity that insurance enables, and the market conditions affecting its value. Everything talks to everything it’s connected to.

Here’s a simplified implementation:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import MessagePassing
from torch_geometric.data import Data

class FinancialContextGNN(MessagePassing):
    """
    GNN layer for financial entity context propagation.
    Each node updates its representation by aggregating messages
    from connected financial entities, weighted by edge type.
    """
    def __init__(self, node_dim: int, edge_types: int, hidden_dim: int = 128):
        super().__init__(aggr='add')

        # Edge-type-specific message transforms
        self.edge_transforms = nn.ModuleList([
            nn.Linear(node_dim, hidden_dim) for _ in range(edge_types)
        ])

        # Node update function
        self.node_update = nn.GRUCell(hidden_dim, node_dim)

        # Attention mechanism for neighbor importance
        self.attention = nn.Sequential(
            nn.Linear(2 * node_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1)
        )

    def forward(self, x, edge_index, edge_type):
        """
        x: Node features [num_nodes, node_dim]
        edge_index: Graph connectivity [2, num_edges]
        edge_type: Edge type indices [num_edges]
        """
        # Propagate messages
        aggregated = self.propagate(
            edge_index, x=x, edge_type=edge_type
        )

        # Update node representations using GRU
        # (preserves information across message-passing rounds)
        x_updated = self.node_update(aggregated, x)
        return x_updated

    def message(self, x_i, x_j, edge_type):
        """
        Compute messages from source nodes (j) to target nodes (i).
        Messages are transformed based on relationship type.
        """
        # Apply edge-type-specific transform
        messages = torch.zeros(x_j.size(0),
                              self.edge_transforms[0].out_features,
                              device=x_j.device)

        for etype in range(len(self.edge_transforms)):
            mask = edge_type == etype
            if mask.any():
                messages[mask] = self.edge_transforms[etype](x_j[mask])

        # Compute attention weights
        alpha = self.attention(torch.cat([x_i, x_j], dim=-1))
        alpha = F.leaky_relu(alpha)

        return messages * torch.sigmoid(alpha)


class FinancialContextManager(nn.Module):
    """
    Multi-layer GNN that builds rich context representations
    for financial advisory conversations.
    """
    def __init__(self, node_dim=64, edge_types=6, num_layers=3):
        super().__init__()

        self.node_encoder = nn.Linear(128, node_dim)  # raw features → embedding

        self.gnn_layers = nn.ModuleList([
            FinancialContextGNN(node_dim, edge_types)
            for _ in range(num_layers)
        ])

        self.context_head = nn.Linear(node_dim, node_dim)

    def forward(self, data: Data) -> torch.Tensor:
        x = self.node_encoder(data.x)

        for layer in self.gnn_layers:
            x = layer(x, data.edge_index, data.edge_type)
            x = F.relu(x)

        # The client node (index 0) now contains propagated context
        # from all connected entities
        client_context = self.context_head(x[0])
        return client_context


# Edge type encoding
EDGE_TYPES = {
    'owns': 0,
    'constrains': 1,
    'funds': 2,
    'enables': 3,
    'triggers': 4,
    'family_link': 5,
}

One design choice I want to call out: using a GRUCell for node updates rather than a simple linear layer. Financial context has temporal dynamics. A node’s representation should evolve incrementally as new information arrives, not get recomputed from scratch every time. The GRU’s gating mechanism lets the model decide how much new information to incorporate versus how much existing context to keep. This maps well to how advisory context actually works - a market crash doesn’t invalidate everything you know about a client, but it does change which portfolio attributes matter most right now.
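The gating intuition can be caricatured with scalars. In this sketch (illustrative arithmetic, not the model's learned weights), a sigmoid-squashed gate interpolates between old state and new information:

```python
import math

def gru_style_update(old_state, new_info, update_gate):
    """Scalar caricature of a GRU update: the gate decides how much
    new information to blend in versus how much old context to keep."""
    z = 1 / (1 + math.exp(-update_gate))   # sigmoid squashes gate to (0, 1)
    return (1 - z) * old_state + z * new_info

# Strongly positive gate -> mostly new info; strongly negative -> mostly old.
print(gru_style_update(0.9, 0.1, 4.0))   # close to 0.1
print(gru_style_update(0.9, 0.1, -4.0))  # close to 0.9
```

The real GRUCell learns these gates per dimension, so different attributes of a node can turn over at different rates.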

Three message-passing layers means each node can incorporate information from entities up to three hops away. In practice, that’s enough for most context chains you’d care about: “Client’s parent (hop 1) has medical expenses (hop 2) that require liquidity from the client’s debt portfolio (hop 3).”
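That hop-count claim is just graph reachability: k message-passing layers let information travel at most k edges. A small BFS over an illustrative adjacency list (node names made up to match the example chain) makes it concrete:

```python
from collections import deque

def k_hop_neighborhood(adj, start, k):
    """Nodes reachable from `start` in at most k hops."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # out of message-passing rounds
        for nbr in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen - {start}

# Illustrative chain: client -> parent -> medical costs -> debt portfolio
adj = {
    "client": ["parent"],
    "parent": ["client", "medical_costs"],
    "medical_costs": ["parent", "debt_portfolio"],
    "debt_portfolio": ["medical_costs"],
}
print(k_hop_neighborhood(adj, "client", 3))
# debt_portfolio is reachable in 3 hops but not in 2.
```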

The India-Specific Graph Structures

There are patterns in Indian financial advisory that make graph representation especially valuable.

Joint family financial structures. The Hindu Undivided Family (HUF) is a uniquely Indian entity - a tax unit comprising multiple individuals with shared ancestral property. The financial graph of an HUF involves multiple clients with partially overlapping portfolios, shared tax optimization goals, and messy inheritance dynamics. Try representing this in a transformer: “this SIP is owned by Rajesh individually, but this property is owned by the HUF of which Rajesh is the karta, and the rental income flows to the HUF account but Rajesh can draw from it for portfolio rebalancing.” That needs elaborate textual description. In a graph, it’s just a set of ownership edges with entity-type attributes.
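Here's that HUF structure as data - hypothetical entity names, relation labels chosen for illustration - to show how the prose above collapses into a handful of typed edges:

```python
# Hypothetical HUF ownership graph as (source, relation, target) triples.
huf_edges = [
    ("rajesh", "owns", "sip_account"),
    ("rajesh_huf", "owns", "ancestral_property"),
    ("rajesh", "karta_of", "rajesh_huf"),
    ("ancestral_property", "rental_income_to", "huf_account"),
    ("rajesh", "can_draw_from", "huf_account"),
]

def entities_linked_to(person, edges):
    """All entities a person is directly connected to, regardless of relation."""
    return sorted({dst for src, _, dst in edges if src == person})

print(entities_linked_to("rajesh", huf_edges))
# ['huf_account', 'rajesh_huf', 'sip_account']
```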

NRI dual portfolios. An NRI (Non-Resident Indian) client typically has a domestic portfolio (NRO/NRE accounts, Indian mutual funds, property) and a foreign portfolio (401k/pension, foreign equities, foreign property). These are linked by repatriation rules, FEMA regulations, DTAA tax treaty provisions, and currency hedging considerations. The graph captures these cross-border edges naturally. And when something changes - say FEMA limits on repatriation get updated - the impact ripples through to the domestic investment strategy via message passing, without needing to re-encode the entire context from scratch.

GNN vs. Transformer: Empirical Results

I ran both architectures on the same task: given a client graph and a new conversation turn, predict which context elements are most relevant to surface for the advisor. The metric was recall@5 - of the context elements that human advisors actually referenced in their responses, how many were in the model’s top-5 predictions?
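For reference, the metric itself is straightforward to compute. The element names and numbers below are made-up illustrations, not data from the experiment:

```python
def recall_at_k(predicted, referenced, k=5):
    """Fraction of human-referenced context elements in the model's top-k."""
    if not referenced:
        return 0.0
    top_k = set(predicted[:k])
    return len(top_k & set(referenced)) / len(referenced)

# Illustrative: the advisor referenced 4 elements; 3 appear in the model's top 5.
predicted = ["goal_education", "mf_portfolio", "tax_harvest_2021",
             "term_insurance", "market_nifty", "fd_portfolio"]
referenced = ["goal_education", "mf_portfolio", "tax_harvest_2021", "home_loan"]
print(recall_at_k(predicted, referenced))  # 0.75
```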

| Architecture | Recall@5 | Latency (p95) | Context staleness |
| --- | --- | --- | --- |
| GPT-4 (full context in prompt) | 0.71 | 2.8 s | N/A (stateless) |
| Fine-tuned transformer (sequential) | 0.74 | 180 ms | High after 30 turns |
| GNN (3-layer, graph context) | 0.82 | 45 ms | Low (incremental updates) |

A few things stood out.

The GNN’s structural bias turns out to be the right inductive bias here. By encoding relationships explicitly, the model doesn’t have to learn from data that “insurance enables equity risk” - the edge type encodes that structural prior, and the model just learns the magnitude. More data-efficient. More interpretable.

Latency matters more than I expected. A wealth manager using an AI copilot during a client meeting can’t wait 2.8 seconds for context retrieval. At 45ms, the GNN is viable as an always-on context engine that updates after every conversation turn. That difference between 2.8 seconds and 45 milliseconds changes what you can build.

Context staleness is the transformer’s hidden failure mode. This was the finding that surprised me most. The sequential transformer’s performance degraded noticeably after 30+ conversation turns - context from early turns just got diluted by later information. The GNN doesn’t have this problem because the graph persists. Entities updated early in the conversation keep their representation. New information modifies node states incrementally through message passing rather than competing for attention with all previous tokens.

The Hybrid Architecture

In practice, I don’t use the GNN in isolation. The production system is a hybrid - the GNN maintains the persistent client context graph and retrieves relevant subgraphs, which get serialized and included in the LLM’s prompt as structured context. The LLM generates the natural language advisory response; the GNN makes sure it has the right context to work with.

This separation matters. LLMs are great at natural language generation, reasoning over provided context, and handling novel queries. GNNs are great at maintaining structured state, propagating constraint updates, and retrieving relevant subgraphs efficiently. Asking either to do the other’s job produces worse results.

So the flow is: conversation turn arrives, GNN identifies the relevant subgraph (45ms), nodes and edges get serialized into structured context, and the LLM generates a response grounded in that context. When new information comes in - a new life event, an updated portfolio value, a changed risk preference - the graph updates incrementally and the GNN propagates implications through message passing. No need to reprocess the entire history.
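A minimal sketch of the serialization step, assuming a simple text format (the real scheme isn't shown here; function and field names are hypothetical):

```python
def serialize_subgraph(nodes, edges):
    """Render a retrieved subgraph as structured text for an LLM prompt."""
    lines = ["## Client context (retrieved subgraph)"]
    for node_id, attrs in nodes.items():
        lines.append(f"- {node_id}: {attrs}")
    for src, rel, dst in edges:
        lines.append(f"- {src} --[{rel}]--> {dst}")
    return "\n".join(lines)

# Illustrative subgraph the GNN might retrieve for a goal-funding question
nodes = {"mf_portfolio": "Rs 45L, 60% equity", "education_goal": "25L by 2025"}
edges = [("mf_portfolio", "funds", "education_goal")]
print(serialize_subgraph(nodes, edges))
```

Because the edges arrive pre-labeled, the LLM reasons over explicit relationships rather than having to infer them from a transcript.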

When Not to Use GNNs

To be fair about the limitations: if your financial application is primarily about analyzing documents (earnings calls, filings, research reports), transformers are the right choice. If you’re building a chatbot for retail banking FAQ, you don’t need a GNN. The graph approach pays off specifically when you have persistent entity relationships that evolve over time and constrain each other. Which is exactly what financial advisory is.

The broader point is about matching architecture to data structure, not chasing what’s trending on arXiv. Financial advisory data is a graph. Represent it as one.
