How to Process 10K Bank Transactions Thru an LLM

What are LLM Context Limits and How to Overcome Them?

LLMs like GPT have changed how we think about AI, providing insights and generating text based on the context they are given on top of extensive training data. However, these models have a limitation known as the "context window", which can be thought of as the model's working memory. This article explores the significance of context windows, their limitations, and strategies for overcoming these constraints.

What is the Context Window?

The context window of an LLM is the amount of text it can "remember" or consider when generating a response. This limit is measured in "tokens". In English, the following rough equivalences hold (a quick way to count tokens programmatically is sketched after this list):

  • 1 token ≈ 4 characters
  • 1 token ≈ 0.75 words
  • 100 tokens ≈ 75 words
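As a rough illustration, OpenAI's tiktoken library can count how many tokens a given string consumes. This is a minimal sketch, assuming tiktoken is installed (`pip install tiktoken`):

```python
# Minimal sketch: counting tokens with tiktoken.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")

text = "0002 2024-04-02 Purchase of office supplies 150.00 0.00"
tokens = encoding.encode(text)

# Each element of `tokens` is one token the model would consume for this text.
print(len(tokens))
```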

The latest GPT-4 models can consider significantly more tokens at once than their predecessors. Detailed context window sizes for the various models are available in OpenAI's model guide.

Why is the Context Window Important?

Imagine you want to analyze a document of 500 words, but your model's context window can only accommodate 250 words at a time. In that case, only the latter half of the document remains in the model's context, so any analysis or response it generates will only reflect knowledge of the second half. This can result in incomplete understanding and responses that seem out of context.

Imagine you're cooking a complex recipe but can only remember a few steps and ingredients at a time. Like a limited context window, this might cause you to forget key spices or miss important steps, potentially leading to a less successful dish.

Another analogy for the effect of a small context window is trying to compose a coherent text message on your phone while under the influence of alcohol. When you revisit what you've written the next morning, you often find the sentences disjointed and lacking cohesion.

How do you pack more into your Context Window?

At a high level, the goal of overcoming context window limitations is to reduce the number of tokens required to convey the same message and meaning. There are two benefits:

  1. Preserve context on a large amount of data or text.
  2. Reduce API charges, since most LLMs charge on a per-token basis (a back-of-the-envelope sketch follows this list).
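To make the per-token pricing point concrete, here is a hypothetical back-of-the-envelope sketch; the tokens-per-row figures and the price are placeholder assumptions, not quotes for any particular model:

```python
# Hypothetical back-of-the-envelope estimate; the price and tokens-per-row
# figures below are placeholder assumptions, not real pricing for any model.
def estimate_cost(num_tokens: int, price_per_1k_tokens: float) -> float:
    """Approximate API cost of sending `num_tokens` input tokens."""
    return num_tokens / 1000 * price_per_1k_tokens

raw_tokens = 10_000 * 20        # ~10K transaction rows at roughly 20 tokens each
aggregated_tokens = 12 * 15     # 12 monthly summary rows at roughly 15 tokens each

price = 0.01                    # placeholder price per 1K input tokens
print(estimate_cost(raw_tokens, price))         # 2.0
print(estimate_cost(aggregated_tokens, price))  # 0.0018
```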

Data Compression

At Truewind, we are developing an assistant for a client that interfaces with a database of financial records. This assistant can access and "call" tools, which in this scenario are database queries that fetch data. A typical request might ask for the "evolution of accounts in 2023," and the assistant would display the financial data as a graph.
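As a rough illustration of this pattern (not our actual implementation), here is a minimal sketch using the OpenAI chat completions API with function calling; the tool name `fetch_account_evolution` and its schema are hypothetical:

```python
# Hypothetical sketch of an assistant with a database-query tool.
# The tool name and schema are illustrative, not a real implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [
    {
        "type": "function",
        "function": {
            "name": "fetch_account_evolution",
            "description": "Query the financial database for account balances over a period.",
            "parameters": {
                "type": "object",
                "properties": {
                    "year": {"type": "integer", "description": "Fiscal year to query, e.g. 2023"}
                },
                "required": ["year"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Show me the evolution of accounts in 2023"}],
    tools=tools,
)

# If the model decides to use the tool, it returns structured arguments that the
# application runs against the database before replying to the user.
print(response.choices[0].message.tool_calls)
```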

However, the data comes back as individual transactions like this:

| ID | Date | Description | Debit (USD) | Credit (USD) |
| --- | --- | --- | --- | --- |
| 0001 | 2024-04-01 | Opening Balance | 0.00 | 0.00 |
| 0002 | 2024-04-02 | Purchase of office supplies | 150.00 | 0.00 |
| 0003 | 2024-04-03 | Client Invoice #123 | 0.00 | 2,000.00 |
| 0004 | 2024-04-04 | Bank Fee | 15.00 | 0.00 |
| 0005 | 2024-04-05 | Rent Payment | 1,200.00 | 0.00 |
| 0006 | 2024-04-06 | Sale of Product XYZ | 0.00 | 1,500.00 |
| 0007 | 2024-04-07 | Coffee for office | 45.00 | 0.00 |
| 0008 | 2024-04-08 | Internet bill | 100.00 | 0.00 |
| 0009 | 2024-04-09 | Transfer to savings | 500.00 | 0.00 |
| ... | ... | ... | ... | ... |
| 9999 | 2024-12-28 | Client Invoice #124 | 0.00 | 750.00 |

You can imagine this table extending to hundreds or thousands of transactions, well over any context window limit. To manage the data efficiently, the assistant uses a data aggregation pipeline that rolls account balances up into monthly summaries with growth rates, significantly reducing the volume of data processed. The result of such an aggregation looks like the following:

| Month | Balance (USD) | MoM Growth Rate (%) |
| --- | --- | --- |
| January | 10,000.00 | - |
| February | 12,345.67 | 23.46 |
| March | 9,876.54 | -19.99 |
| April | 13,210.78 | 33.79 |
| May | 10,005.89 | -24.27 |
| June | 14,567.12 | 45.59 |
| July | 11,234.56 | -22.85 |
| August | 15,000.33 | 33.61 |
| September | 12,250.00 | -18.34 |
| October | 16,789.45 | 37.06 |
| November | 13,555.55 | -19.29 |
| December | 18,000.99 | 32.85 |
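Here is a minimal sketch of that aggregation step in Python with pandas, assuming the transactions are loaded into a DataFrame with lowercase versions of the columns above and numeric debit/credit values; the column names are illustrative:

```python
import pandas as pd

def summarize_monthly(transactions: pd.DataFrame) -> pd.DataFrame:
    """Collapse raw transactions into monthly balances with month-over-month growth."""
    df = transactions.copy()
    df["date"] = pd.to_datetime(df["date"])
    # Net movement per transaction: credits increase the balance, debits decrease it.
    df["net"] = df["credit"] - df["debit"]
    # Sum the movements per calendar month, then take a running total as the balance.
    monthly_net = df.groupby(df["date"].dt.to_period("M"))["net"].sum()
    summary = monthly_net.cumsum().to_frame("balance")
    summary["mom_growth_pct"] = summary["balance"].pct_change() * 100
    return summary.round(2)

# Thousands of transaction rows collapse into at most 12 rows per year,
# which fits comfortably inside any context window.
```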

The Future of Context Windows

In my opinion, the issue of context window limits in large language models is a temporary technical limitation. Drawing a parallel to the early days of computing where memory was scarce, I see a similar trajectory with context windows. As technology advances, these limits are likely to expand significantly, making current optimization concerns for context windows less relevant over time.

Similarly, the evolution of programming practices offers a relevant analogy. Early programmers had to optimize code to fit within strict CPU and memory limits due to hardware constraints. Today, with substantial improvements in hardware, these constraints are less pressing. Likewise, as context windows in LLMs expand, the focus on optimizing within these limits will likely decrease, allowing engineers to focus on the business problems.
