Unpacking GenAI’s Role in Contract Testing

Hello and welcome to part 2 of our PactFlow AI Testing Blog Series.

In our last blog, “The Case for Contract Testing: Cutting Through API Integration Complexity”, we looked at how isolated testing leads to redundancy. We also explored the testing pyramid, highlighted overlapping concerns, and explained how leveraging contract testing (CT) in your testing arsenal enables graceful system evolution.

To read more about the traditional pain points of contract testing, you can recap by reading part 1 of this blog series.

With the rise of Generative AI, organizations and practitioners have been wondering how best to harness these emerging technologies to alleviate common problems across many workflows, and contract testing is no exception.

The differences between a good contract testing implementation and a bad one are stark. At its best, you can make changes with confidence and sleep soundly while on call, knowing the phone won’t be ringing because of one of your endpoints. At its worst, contract testing hell can leave you wishing you had never started, and will invariably have the bean-counters at your door, asking why that ROI never came to fruition. Read Alejandro Pena’s guide to understanding the ROI of contract testing here.

The Potential of AI in Contract Testing

So, what potential lies in this generation of generative AI, and how might it tackle some of the complexities we see? Let us look at some examples.

  • Automating repetitive tasks: quickly scaffolding contract tests lets users concentrate on the contract content and the intent of the test, rather than on writing the generic boilerplate required every time (see the sketch after this list).
  • Reducing human error: given the context of the code under test, generated code can ensure that the request and response expectations captured in the tests match the data used in the code, limiting stale data and helping tests stay current as API clients change.
  • Predictive insights: augmenting AI models through RAG (Retrieval-Augmented Generation) with data points from your organization, such as API documentation, API observability (OTel or similar) and contract tests, would potentially allow users to interrogate the dataset in a way previously considered too exhaustive to attempt manually. Imagine, for instance, a RAG-backed AI drawing a workflow diagram of your APIs, with the ability to drill down into individual service usage, helping you identify zombie APIs or reduce API duplication. We may also see models with knowledge of contract testing, or of specific tools, that can advise on the makeup of a contract test and highlight areas for improvement, acting as a contract-testing pairing buddy.
  • Improved time to market: contract testing tools are themselves ever evolving, with new feature requests and issues being raised. Generative AI may allow rapid prototyping of ideas and proofs of concept to bring initial value to users. Once initial value is found, especially in open source, contributors will often improve upon the solution until it meets their needs. In this scenario, generative AI would play a role in empowering and enticing users with some tangible code, rather than a GitHub ticket for a new feature request.
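To make the first of those points concrete, here is a minimal sketch of the kind of boilerplate a scaffolding assistant could generate, based on the current Pact-JS consumer DSL and a Jest-style runner; `UserApiClient` and its `getUser` method are hypothetical stand-ins for your own client code.

```typescript
// Generic consumer-test boilerplate a scaffolding tool could generate.
// Only the interaction details and the client call vary from test to test.
// `UserApiClient` / `getUser` are hypothetical stand-ins for your own code.
import { PactV3 } from "@pact-foundation/pact";
import { UserApiClient } from "../src/userApiClient";

const provider = new PactV3({
  consumer: "WebApp",
  provider: "UserService",
});

describe("UserService contract", () => {
  it("returns a user by ID", () => {
    // Declare the expected request/response pair (the contract content).
    provider
      .uponReceiving("a request for user 1")
      .withRequest({ method: "GET", path: "/users/1" })
      .willRespondWith({ status: 200, body: { id: 1, name: "Mary" } });

    // Run the client against the Pact mock server and assert on the result.
    return provider.executeTest(async (mockServer) => {
      const client = new UserApiClient(mockServer.url);
      const user = await client.getUser(1);
      expect(user.name).toBe("Mary");
    });
  });
});
```

Everything outside the interaction definition and the client call is generic plumbing, which is exactly the part worth automating.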

Now, to be clear, this is not a definitive or exhaustive list; there will be many more use cases to be discovered, ratified and validated. Hopefully it whets your appetite.

So, if this list is more aspirational, what about the here and now? There are tools on the market today that offer AI capabilities for all things, if you believe the marketing hype. In the next section, we will look at some of them, along with their benefits and caveats.

Limitations of Current Generative AI Solutions

If you are a gamer, you will have noticed that graphics cards have been scarce for several years now, first snapped up by cryptocurrency miners and more recently deployed in huge swathes in server farms all over the world. The reason has made headline news and become dinner-table conversation. It is, of course, Generative AI: whether it’s curated prompting via OpenAI’s ChatGPT, image creation with Midjourney, or code creation via Microsoft’s Copilot, everyone seems to have tried it in some fashion, even if only for five minutes to pique their curiosity.

Large Language Models (LLMs) are trained on vast datasets that take significant time and compute to curate and process. Once a model is trained, it is released and used as-is.

The software components that make up the building blocks of our applications evolve over time: interfaces may change, documentation will be updated, and breaking changes will occur. Best practices may also emerge that are not captured in the documentation itself, but are passed on through users, blog posts and other media.

An LLM, however, is a snapshot of a certain view of the world at a certain point in time. It only knows the data it has been trained on; it is not topped up as the world moves around it.

ChatGPT 3.5, for example, which is used in Copilot, is said to have a training cut-off date of 2021.

Most projects have seen significant improvements and advances in the space of three years. In that time, the Pact contract testing framework has migrated from a shared Ruby core to a shared Rust core, and client languages such as Pact-JS, Pact-JVM and others have had DSL changes. New features have also been introduced to the Pact framework.
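To illustrate the scale of that drift, here is a hedged sketch contrasting the older, object-literal Pact-JS DSL (v9 era) with the fluent builder style that accompanied the Rust core (v10+); the shapes are simplified and indicative rather than exhaustive.

```typescript
// Sketch only: simplified shapes based on publicly documented Pact-JS APIs.
import { Pact, PactV3 } from "@pact-foundation/pact";

// Pact-JS v9-era DSL: object-literal interactions and an explicit
// mock-server lifecycle (setup / verify / finalize).
it("fetches user 1 (v9 style)", async () => {
  const provider = new Pact({ consumer: "WebApp", provider: "UserService", port: 1234 });
  await provider.setup();
  await provider.addInteraction({
    state: "a user with ID 1 exists",
    uponReceiving: "a request for user 1",
    withRequest: { method: "GET", path: "/users/1" },
    willRespondWith: { status: 200, body: { id: 1 } },
  });
  // ...call your API client against http://localhost:1234 and assert...
  await provider.verify();
  await provider.finalize();
});

// Pact-JS v10+ (Rust core) DSL: a fluent builder, with the mock-server
// lifecycle handled for you by executeTest.
it("fetches user 1 (v10+ style)", async () => {
  const provider = new PactV3({ consumer: "WebApp", provider: "UserService" });
  provider
    .given("a user with ID 1 exists")
    .uponReceiving("a request for user 1")
    .withRequest({ method: "GET", path: "/users/1" })
    .willRespondWith({ status: 200, body: { id: 1 } });

  await provider.executeTest(async (mockServer) => {
    // ...call your API client against mockServer.url and assert...
  });
});
```

A model trained before the migration only knows the first style, which is why its output so often fails against current libraries.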

None of this knowledge is present in the generative AI models you’ll find on the market today, which means that while they can generate a Pact test that looks plausible, it will either flatly refuse to compile with the latest Pact libraries or require you to fall back to older, compatible versions of the code.

Once we have the right versions of the libraries and have corrected the DSL so that the test compiles, we often find that Pact best practice is not being followed: the test may not define provider states or leverage the power of matchers, and the expectations may be too strict, or too loose. The sketch below shows what those practices look like.
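For comparison, here is a minimal sketch of a current Pact-JS consumer interaction combining a provider state with MatchersV3 matchers; the interaction itself is illustrative.

```typescript
// Sketch of Pact best practice: provider states plus matchers, so the
// contract checks shape and format rather than brittle literal values.
import { PactV3, MatchersV3 } from "@pact-foundation/pact";
const { like, eachLike, regex } = MatchersV3;

const provider = new PactV3({ consumer: "WebApp", provider: "UserService" });

provider
  // The provider state tells the provider team what data to seed
  // before verifying this interaction.
  .given("a user with ID 1 exists")
  .uponReceiving("a request for user 1")
  .withRequest({ method: "GET", path: "/users/1" })
  .willRespondWith({
    status: 200,
    // Matchers assert on type and format, keeping the contract neither
    // too strict (exact literals everywhere) nor too loose (no body checks).
    body: like({
      id: 1,
      name: "Mary",
      email: regex(".+@.+", "mary@example.com"),
      roles: eachLike("admin"),
    }),
  });
```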

All of this diagnosis takes time, which detracts from the main aim of writing the test in the first place: in a test-driven development context, where Pact is often used, that aim is to apply pressure to the design of our code. We have found many times that Pact (and other types of testing) has forced us to rethink how our code is structured, to make it more testable.

We understand that generative AI has great potential for creating the initial scaffolding for a test, but to be truly effective it requires up-to-date knowledge of the following:

  • Pact client language DSLs
  • Pact consumer test best practices
  • The latest Pact features
  • Your own code, to ensure that the Pact expectations are correct and to avoid putting an undue burden on your provider when verifying the generated contracts

Conclusion: AI’s Potential but Not a Silver Bullet

So, wrapping up: generative AI offers a stepping stone toward a more automated future in contract testing, but it is not the complete solution just yet.

It can aid practitioners engaging in contract testing in a multitude of ways, helping them tackle their biggest pain points around the learning curve, test creation, maintenance and scalability. However, to fully leverage that potential, we would need to augment today’s solutions by providing, at a minimum, up-to-date knowledge of Pact’s client DSLs, features, and best practices.

In the third and final post of this blog series, Matt Fellows will explain how the PactFlow team explored the value curve of combining contract testing and AI to deliver PactFlow’s AI-augmented contract testing solution, built to offer a more effective, specialized toolset for contract testing. You don’t want to miss it.

Thanks for reading, we cannot wait to see what you build!