Unless you have been living under a rock, you have probably seen huge amounts of buzz around large language models like GPT-4 and their impact, potential or otherwise, on the legal sector.

And yes, there is huge potential there. But there is also a lot of buzz where maybe there shouldn’t be. GPT-4 is a mind-blowing advancement in AI, but it is not synonymous with magic.

Recently, we have enjoyed applying GPT-4 to the work of lawyers and testing its practicality. We did a whole webinar on its applications to the legal drafting process and summarized our findings in a recent blog post.

This time, we looked at contract review and integrated GPT-4 into ClauseBuddy, enabling it to do fully automated markup based on general instructions. Compare it to a senior lawyer asking a junior lawyer to make specific changes.

Though our own product focus is on legal drafting and not reviewing, our experience with MS Word plugins (thanks to ClauseBuddy) allowed us to quickly build a GPT-4 powered review module inside of MS Word. Since we have no horse in that race, we can provide a more unbiased view of the state of the technology.

Below is a summary of our findings. If you prefer watching our video on this topic, you can find it here.

Changing the definition of confidential information

We started out with a basic NDA between two fictional entities – “Bank” and “Counterparty”.

Our first markup request was to amend the definition of Confidential Information, so that it applies to all information that should reasonably be considered “confidential”, as opposed to just information that is explicitly marked as “confidential”.

Here’s what GPT-4 did with that:

Not a bad response. That got us intrigued to test this markup functionality further across an entire document. 

Making the NDA unilateral

In our case, we are dealing with a mutual NDA. We therefore asked GPT-4 to “make the entire NDA unilateral in favour of Bank Ltd”. Here is how that went:

In short:

  • GPT-4 changes the title of the document to “Unilateral Non-Disclosure Agreement”. Not necessary, but not wrong either.
  • It removes Counterparty from the introduction clause – taking the word “unilateral” a bit too far. A contract with a single party is not much of a contract... Later on, it does reference Counterparty again, though.
  • In the definition list, and throughout a large part of the rest of the document, it replaces the term “Disclosing Party” with “Bank” and the term “Receiving Party” with “Counterparty”. This is exactly what we want and showcases that GPT-4 does understand to a certain degree what a “unilateral” NDA is.
  • Throughout the rest of the document, however, we see that it is inconsistent in its amendments to the terminology. Sometimes it replaces “Disclosing/Receiving Party”, sometimes it does not. It is unclear whether this is just sloppiness or whether it has a reason for skipping some terms.

In conclusion, GPT-4 seems to know what “unilateral (NDA)” means, but does not understand the term deeply enough on a legal-technical level to consistently make the correct changes. It also suffers from some sloppiness issues.

The latter could not have stemmed from the length of the document, though that would be a reasonable assumption to make. As you can also see in the video of this test, the document is only 3 pages long (around 8,000 characters), and we are using the special 32,000-token version of GPT-4. The normal GPT-4 (the one you can access as a consumer) is limited to 8,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words, meaning this version can process roughly 24,000 words. That sounds like plenty, but note that the input document, the comments/modifications, and some hidden instructions that the software prepares all need to fit together within that limit. So in practice, we are able to review contracts of up to about 10-15 pages.
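That budgeting exercise can be made concrete. Below is a minimal sketch of a context-window check based on the rule of thumb above (1,000 tokens ≈ 750 words); the limit, the reply budget, and the function names are illustrative assumptions, not ClauseBuddy’s actual implementation, and a real tokenizer would give more precise counts.

```python
# Rough context-window budgeting, using the rule of thumb from the text:
# 1,000 tokens is about 750 words. All names and figures below are
# illustrative assumptions, not any product's actual implementation.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~0.75 words per token."""
    return round(len(text.split()) / 0.75)

def fits_context(document: str, instructions: str,
                 limit: int = 32_000, reply_budget: int = 4_000) -> bool:
    """Document + instructions + expected reply must share one window."""
    used = estimate_tokens(document) + estimate_tokens(instructions) + reply_budget
    return used <= limit

# A ~1,300-word NDA fits comfortably in a 32,000-token window:
nda = "word " * 1300
print(fits_context(nda, "Make the NDA unilateral in favour of Bank Ltd."))
```

In practice you would swap `estimate_tokens` for the model’s real tokenizer, since the words-to-tokens ratio varies with language and formatting.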

Do note that it’s just a matter of time before this limit is raised. In fact, Anthropic’s LLM “Claude” already offers up to 100,000 tokens.

General analysis of an NDA

In a third interaction with the NDA, we decided to up the ante again by asking GPT-4 to perform an abstract analysis of the document, formulated as “Identify the most problematic clauses”.

We deliberately kept this prompt vague, to see what GPT-4 would come up with and how it defines “problematic” for itself. While stopping short of a full legal/commercial review, it does offer a few useful suggestions, if not always correct ones.

Problem 1: the definition of “Affiliate” is outdated

GPT-4 starts off by saying that the definition of “Affiliate” is outdated. Not only is this a good reflex to have when considering “problematic” clauses, it is also correct. The definition references the old Belgian Company Code, which is no longer in force.

Problem 2: the definition of “Confidential Information” may be overly broad

GPT-4 provided the following comment to the base definition of Confidential Information we looked at earlier.

At face value, this is the right reflex. Indeed, an overly broad definition of “Confidential Information” could very well be problematic. But is the concern valid here?

Taking a look at the definition itself, we see that there is indeed a very lengthy list of information that could potentially be considered “Confidential Information”, but only where that information is explicitly marked “confidential”. This greatly diminishes the scope: in practice, most of the examples listed in the definition will not carry such an explicit “confidential” marker.

Problem 3: the obligations for the receiving party may be too strict

This is a close one. It is right to flag the Obligations clause – it is the meat of the agreement, after all, especially for the receiving party. Whether the obligations are actually problematic is another question; one could argue both ways.

Problem 4: the Inside Information annex is not attached

This is a very good find by GPT-4. There is indeed no annex on “Inside Information”, even though the document references one. This is the kind of thing you would otherwise have to go through the document with a fine-tooth comb to find. GPT-4’s advanced language processing understands what an annex is, and recognises that it is absent from this document. A highly useful tip!

In conclusion, GPT delivers some useful information, though not all of it is correct, so its output should be reviewed critically. This confirms what most frequent GPT users have already experienced for themselves: do not rely on GPT for anything outside your own area of expertise. GPT is a great co-pilot, but only if the pilot knows what they are doing.

General analysis of a licensing agreement

Now, NDAs are highly common documents and so GPT should have been trained on a vast amount of them. We therefore asked for another general “problematic clauses” review, but this time of a software licensing agreement.

Problem 1: There is no definition for “[…]”

The first comment we receive is that there apparently is no definition for Confidential Information. This is another great example of that “close but no cigar” feeling we had earlier. It understands capitalised terms to mean definitions, it even understands that these definitions are typically set out in a definition list, but it fails to see the bold in-line definition a few lines into the paragraph.

Funnily enough, it does get it right for a lot of other definitions, for example the term “Intellectual Property Rights”.

GPT provides no real substantive comments on legal/commercial issues here, though the definition checks it does offer are certainly useful.

Detailed analysis of a licensing agreement

So what if we helped GPT along a bit and zoomed in on the issues we wanted its assistance with?

For this exercise, GPT was given the prompt:

Identify the problematic clauses. A clause is problematic if:

  • Liability cap is equal to or below 50,000 EUR
  • Confidentiality obligations are mutual
  • Dispute resolution is not done via arbitration under CEPANI rules
  • Notice period for termination for the client is equal to or more than 1 month
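The four rules above are mechanical enough to be expressed as machine-checkable predicates. The sketch below illustrates this; the clause-parameter dictionary is a hypothetical structure of our own, and extracting those parameters from the contract text is exactly the hard part we delegated to GPT-4.

```python
# The four review rules from the prompt, as simple predicates over a
# hypothetical dictionary of extracted clause parameters. Extracting the
# parameters from free-form contract text is the hard, GPT-shaped part.

def problematic(clause: dict) -> list[str]:
    """Return the list of rules that fire for this contract."""
    issues = []
    if clause.get("liability_cap_eur", float("inf")) <= 50_000:
        issues.append("liability cap at or below 50,000 EUR")
    if clause.get("confidentiality_mutual"):
        issues.append("confidentiality obligations are mutual")
    if clause.get("dispute_resolution") != "CEPANI arbitration":
        issues.append("dispute resolution is not CEPANI arbitration")
    if clause.get("client_notice_period_months", 0) >= 1:
        issues.append("client notice period of 1 month or more")
    return issues

contract = {
    "liability_cap_eur": 40_000,
    "confidentiality_mutual": True,
    "dispute_resolution": "Brussels courts",
    "client_notice_period_months": 3,
}
print(problematic(contract))  # all four rules fire for this example
```

The point of the sketch: once the parameters are structured, the rules themselves are trivial. The value GPT-4 adds (or fails to add) lies entirely in reading the clauses correctly.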

In short, GPT is able to correct two of the four issues:

(1) it includes the CEPANI arbitration rules in the dispute resolution clause, and

(2) it implements the liability cap.

However, it only follows these instructions superficially, without paying attention to the finer nuances and layers of the clauses. GPT includes the liability cap, replacing the initial “amount equal to the annual fees”. Yet GPT made this change without knowing what the annual fees are, or whether they were below or above 50,000 EUR. As for the other two issues, GPT did not make any corrections, instead merely identifying that the confidentiality obligations are indeed mutual and that the notice period is too long.


Anonymizing a document

Finally, we wanted to know whether GPT could reliably anonymize a document. This is one of the most difficult tasks when it comes to document alteration on a grand scale. We have extensively scoured the market for tools that can do this, hoping to incorporate one into our own solutions. The result? No company has, as of yet, been able to solve it with guaranteed 100% accuracy.

Is GPT-4 better?

It starts out by doing a lot of things right.

GPT-4 correctly recognises that the document is a Software License Agreement and accordingly changes the parties to “Licensor” and “Licensee”. However, the parties are still referred to by their respective names in places, specifically the party “Johnson”. The agreement also references an Acquired Company whose full name (“Chell LLC”) is anonymized, but whose short reference name throughout the document (“Chell”) is not.

Over the course of the document, we again see some of the sloppiness that we saw in previous exercises. Some references are anonymised, others are not. Finally, the signature clause, which contains the full name of the parties, is not anonymised.
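The failure mode is easy to reproduce even without an LLM. The sketch below is a deliberately naive find-and-replace anonymizer (the party names are the fictional ones from our test; the function is our own illustration, not any product’s code): the known full names are replaced, but abbreviated references slip through, which is exactly the inconsistency we saw.

```python
import re

# A deliberately naive find-and-replace anonymizer, illustrating the
# failure mode observed in the test: full party names are replaced,
# but short references to the same party survive untouched.

def naive_anonymize(text: str, parties: dict[str, str]) -> str:
    """Replace each known full name with its placeholder. Nothing else."""
    for full_name, placeholder in parties.items():
        text = re.sub(re.escape(full_name), placeholder, text)
    return text

parties = {"Chell LLC": "[Acquired Company]"}
sample = "Chell LLC (hereinafter 'Chell') warrants that Chell owns the IP."
print(naive_anonymize(sample, parties))
# The full name is gone, but the short reference 'Chell' remains.
```

Reliable anonymization needs coreference resolution, not just string matching: the tool has to understand that “Chell” and “Chell LLC” denote the same entity, which is where both naive tooling and, evidently, GPT-4 still stumble.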

The problem, unfortunately, has not been solved yet. 


Conclusion

From these exercises, we can confirm what we already uncovered from our experiments with unleashing GPT-4 on the legal drafting process – it is really good at language manipulation exercises but should not be exclusively relied on for strategic and logical reasoning. It knows how to find and adjust specific language, it has an understanding of the language it is changing, it knows to make the necessary changes and, in fact, gets close to being accurate. But on quite a few more nuanced exercises, it falls short.

At this moment, GPT-4 cannot replace legal professionals in the context of a full legal review — the results simply are not there. At most, it can be helpful to do your own review and then ask GPT-4 to come in to see whether you have missed anything. Of course, the danger is that people will start with GPT-4’s analysis and, because it is so confident in its answer, automatically assume it knows everything, and then neglect to do a detailed analysis themselves.

However, rather than being a technological challenge, this raises challenges with regards to process and organization: How should we apply GPT? How should individual lawyers use it?

At this moment in time, the technology is not ready. But that may very well be different in a few months. The key difference maker in legal service delivery going forward is certainly going to be the kind of practical experience that isn’t readily available in GPT’s training data. For example: how to structure the parameters of a license in a license agreement (like global or territorial, remunerated or unremunerated, etc. in light of the commercial position of the parties), the different kinds of ways you can embed different legal nuances in a liability clause (like having a normal cap, carve-outs, supercaps, etc. and when to choose which), and just generally how to trade bargaining chips in the negotiation process.

This is the kind of knowledge experienced lawyers develop over the course of years of reviewing these kinds of documents. It is expertise that is typically not available for LLMs to ingest. Lawyers develop it through osmosis. Some very good senior associates or partners take the time to deliberately train juniors in this kind of stuff, but most expect you to just pick it up as you go. Obviously, not everything like this can be put to paper, but that does not mean there is not a lot of unrealized potential.

The bottom line: now is the time to start collecting this knowledge in a structured database, so that you are ready for when this technology is fully up to speed. For law firms especially, that is how you retain your competitive advantage over a tool that all your clients will soon be able to use basically for free.