AI word count limit – Why it sucks for legal


A major technical limitation for AI in the legal industry is word count. Large language models (like GPT) and the products built on top of them just cannot competently process more than roughly 10 pages. That might sound like plenty for most use cases, but in the legal context it is a short contract or decision. The 10-page limit places real constraints on when and how legal professionals can use AI.

How does an AI word count limit affect legal tech?

You might be able to name some AI tools that claim to summarize documents far longer than 10 pages. In short, they use tricks to get past this limit, but they just don’t perform very well. These tools often lose the ability to consider overarching themes in the text, and they’re particularly poor at tracking concepts sprinkled throughout a document. For law, this is problematic because the start of a document may reference the very end of it by naming provisions, exhibits, definitions, etc.

Even today, this structural limit strangles GPT4. The difficulty is that it is often glossed over by startups that unscrupulously advertise their “world class AI”, and it is easily missed when testing a product for the first time. For example, we’ve all likely given ChatGPT a spin. You might have picked a short blog post or news article for ChatGPT to summarize and been impressed with the results. Suddenly, the idea of summarizing long textbooks or court decisions seems like a real time saver, with no one to warn you that your mileage may vary once the document gets longer.

Try other uses instead of summarizing!

As of the publication of this blog post, I would not recommend any AI summarizer for the legal profession. We’ll discuss some promising developments closer to the end of this post, but the key takeaway is that using AI effectively requires an understanding of the technology. With the correct tools, you can do a lot with long documents, just not great summaries. Even past 10 pages, there are very effective tools for pulling information out, responding to questions about the text, writing short headnotes, etc.

AI word count? Token size?

The technical term for this AI word count limit is token size. GPT3 supports 4K tokens, and the latest GPT4 supports 8K. Roughly converted, each token is about 3/4 of a word, so GPT4 has roughly a 6,000-word limit, which equates to about 10 pages single-spaced. Those 10 pages are shared between the input and the output. This means that if you put a 9-page article into GPT, it only has 1 page remaining to write the summary. GPT will literally cut itself off mid-sentence when it hits the token limit, while commercial tools sometimes try to smooth out the end with another algorithm.
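To make the arithmetic concrete, here is a minimal sketch using OpenAI’s tiktoken library to count a document’s tokens and see how much of the window is left for the response. The file name and the budget split are assumptions for illustration only.

```python
# Minimal sketch: count a document's tokens with tiktoken and see how much of
# an 8K context window is left over for the model's answer.
# "contract.txt" is a hypothetical 9-page input.
import tiktoken

CONTEXT_WINDOW = 8192  # GPT4's 8K limit, shared by input and output

enc = tiktoken.encoding_for_model("gpt-4")

document = open("contract.txt").read()
input_tokens = len(enc.encode(document))
remaining = CONTEXT_WINDOW - input_tokens

print(f"Input: {input_tokens} tokens (~{int(input_tokens * 0.75)} words)")
print(f"Room left for the summary: {remaining} tokens")
```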

If you search online, you’ll find lots of folks asking why OpenAI imposes this limit when it’s clear GPT could at least finish its sentence. It’s true that OpenAI imposes the 4K and 8K token limits, and that the model could technically generate beyond them. However, the limit is not arbitrary: it reflects how much information can be kept in memory with the software and hardware that OpenAI uses. It is also partly a function of the training material. AI needs to learn, and there are far more sub-10-page materials out there. As a result, the 4K and 8K limits are hard limits placed to ensure the quality of the response. This limit is shared across virtually all AI providers, but it can be improved upon.

How do the current tools get around the AI word count limit?

Recursion is the most common method to get around this limit. The idea is to break the long document into chunks, process each chunk separately, and join the results together. With recursive summaries, you might start with a 20-page document and break it down into 4 separate 5-page chunks. You summarize each one to half its length and join the results together. If that isn’t enough, you take your shortened 10-page document, split it into two, and try again. This approach is why AI-generated summaries are totally ineffective for the legal profession. The first chunk could contain the definitions, and they are ignored by the time you reach the third chunk where a defined term is actually used. The model is never able to look at the document as a whole.
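For the curious, here is a simplified sketch of what that recursion looks like in code. The summarize() function is a hypothetical stand-in for a single LLM call; the point is that each chunk is condensed in isolation, which is exactly how cross-references get lost.

```python
# Simplified sketch of recursive summarization. summarize() is a hypothetical
# stand-in for one LLM call that condenses its input to about half the length.
def summarize(text: str) -> str:
    """Placeholder for a single LLM call; each chunk is summarized blind."""
    raise NotImplementedError

def split_into_chunks(text: str, max_words: int) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def recursive_summary(text: str, target_words: int = 1500, chunk_words: int = 3000) -> str:
    # Keep summarizing chunk by chunk until the whole thing fits the target.
    while len(text.split()) > target_words:
        chunks = split_into_chunks(text, chunk_words)
        text = "\n".join(summarize(chunk) for chunk in chunks)
    return text
```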

More sophisticated providers build on top of this simplistic approach to band-aid over the limitation, but they still perform poorly. To maintain some level of cohesion, they pre-analyze the text by noting important ideas and storing them in a separate space that the AI can access. There are many approaches to this: storing keywords, extracting important ideas, or even manually choosing what might show up later. This is becoming a lucrative area, with tools like Langchain, Weaviate, and Pinecone providing different solutions. Legal tech AI providers either gravitate towards the cheapest tool (in-house use of Langchain) or make assumptions about what might show up in the document.
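As a rough illustration of the “separate space” idea (not any particular vendor’s implementation), the sketch below embeds each chunk up front and retrieves only the chunks most similar to a question. The embed() function is a hypothetical wrapper around whatever embedding API you use.

```python
# Rough sketch of retrieval over a pre-analyzed document. embed() is a
# hypothetical wrapper around an embedding API; a vector database
# (Weaviate, Pinecone, etc.) plays the role of the plain list used here.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder returning a fixed-size vector for a piece of text."""
    raise NotImplementedError

def build_index(chunks: list[str]) -> list[tuple[str, np.ndarray]]:
    return [(chunk, embed(chunk)) for chunk in chunks]

def most_relevant(question: str, index: list[tuple[str, np.ndarray]], k: int = 3) -> list[str]:
    q = embed(question)
    def cosine(v: np.ndarray) -> float:
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    ranked = sorted(index, key=lambda item: cosine(item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]  # only these chunks reach the model
```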

The case of ChatGPT

ChatGPT and other chatbots are another interesting case because the very medium masks the word count limitation. What many don’t realize is that ChatGPT is a long-running transcript that gets fed in its entirety into the model each time you ask a question. ChatGPT has no inherent memory; you are just resubmitting the history of the conversation along with your next question. To put this in clearer terms, if you are halfway through a conversation with ChatGPT, you are actually asking the model to complete the next line in a transcript of two people who have talked their way to that point. For other chatbots, this means you have fewer words to work with the further along you are in the conversation. ChatGPT, however, keeps a rolling limit, only “remembering” the last 10 pages of your conversation.
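A small sketch of that rolling window, under the assumption that the whole transcript is re-sent on every turn. count_tokens() is a crude stand-in for a real tokenizer, and the sample conversation is made up.

```python
# Sketch of a rolling chat window: the full transcript is re-sent each turn,
# and the oldest turns are dropped once the token budget is exceeded.
def count_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def trim_history(history: list[dict], budget: int) -> list[dict]:
    kept, used = [], 0
    for turn in reversed(history):            # keep the most recent turns
        cost = count_tokens(turn["content"])
        if used + cost > budget:
            break                             # older turns are "forgotten"
        kept.append(turn)
        used += cost
    return list(reversed(kept))

history = [
    {"role": "user", "content": "Summarize clause 4 of the lease."},
    {"role": "assistant", "content": "Clause 4 covers rent escalation..."},
    {"role": "user", "content": "How does it interact with clause 12?"},
]
prompt_messages = trim_history(history, budget=6000)  # everything re-sent each time
```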

Larger Token Limits Incoming (with Limitations)

The reason I started thinking about AI word count is that Anthropic recently released a 100K token limit for their AI model Claude. This is a whopping 125-page limit to work with, and suddenly, summarizing documents with LLMs makes a lot more sense. Claude is a competitor to ChatGPT and GPT4. Others have conducted detailed comparisons online, but the performance lands somewhere between GPT3.5 and GPT4. My own tests produced very impressive summaries for short documents, and I have a hard time telling Claude’s responses apart from GPT4’s. When it comes to longer documents, Claude seems to have been trained on fewer examples, so while it is very usable and smart, the results are not as impressive as GPT4’s.

Unfortunately, training material is not the only problem with Claude 100K. Unlike every other LLM out there, Claude 100K cannot output more than 2K tokens. You can put a 125-page document into the model as input, but your summary can’t be longer than 2.5 pages. It was incredibly frustrating to see such a powerful model dangled in front of me, only to find it suffers from a crippling drawback. And I was disappointed because the short summaries it generated for long documents were actually very good, just not detailed enough for most purposes.

After the initial disappointment, however, I realized that Claude is actually capable of a lot, even if summaries aren’t on the list. I found that submitting a long document to the model and asking questions is probably the best way to use it. 2.5 pages of text is a healthy answer to a focused question, and you can ask multiple questions to get what you need.
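Here is a sketch of that question-driven workflow. ask_claude() is a hypothetical wrapper around Anthropic’s API, and the file name and questions are made up; the point is that the whole document goes in once per question, and each answer stays within the short output limit.

```python
# Sketch of a question-driven workflow over one long document.
# ask_claude() is a hypothetical wrapper around Anthropic's API;
# "merger_agreement.txt" is a made-up example file.
def ask_claude(prompt: str, max_output_tokens: int = 2000) -> str:
    """Placeholder for a single call to Claude 100K."""
    raise NotImplementedError

document = open("merger_agreement.txt").read()

questions = [
    "List every defined term and the sections where it is used.",
    "Which provisions reference Exhibit B?",
    "Summarize each party's termination rights.",
]

for question in questions:
    answer = ask_claude(f"{document}\n\nQuestion: {question}")
    print(question, "\n", answer, "\n")
```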

Don’t forget about GPT4!

Another development worth mentioning is that GPT4 has an upcoming 32K token model. I haven’t been able to get access; even among the more qualified within my network, OpenAI has severely limited invites. I’m excited to give this one a try because a 40-page limit will help considerably. Even if it doesn’t match Claude’s 125-page limit, I believe GPT4 will allow you to split the token limit between input and output however you’d like, which gives it an edge for generating summaries in some cases.
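To illustrate what splitting the budget could look like, here is a sketch using the openai Python library’s chat completion call. Access to a “gpt-4-32k” model, the file name, and the specific numbers are assumptions on my part.

```python
# Sketch of splitting a 32K context between input and output, assuming the
# openai Python library's chat completion call. Access to a "gpt-4-32k"
# model and the specific numbers here are assumptions for illustration.
import openai

CONTEXT_WINDOW = 32768
OUTPUT_BUDGET = 8000   # reserve roughly 10 pages of the window for the summary

document = open("long_decision.txt").read()  # hypothetical ~25-page input

response = openai.ChatCompletion.create(
    model="gpt-4-32k",
    messages=[{"role": "user", "content": "Summarize in detail:\n\n" + document}],
    max_tokens=OUTPUT_BUDGET,  # caps the output; the rest is left for the input
)
print(response["choices"][0]["message"]["content"])
```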

Final Thoughts

If you aren’t excited about AI, you aren’t digging into it deeply enough. It’s not magic, and we can’t expect the impossible. However, limitations that looked like impossible hurdles just a few months ago are already being addressed. Claude and GPT4 are big leaps for AI in the legal industry. I can only hope we continue to keep open minds about what is coming.