Recently I wrote about writing a Golang application with a conversational Artificial Intelligence service, ChatGPT. This is a follow-up article about trying a similar approach for testing. I am not presuming familiarity with Golang for this article, nor claiming a high degree of expertise in it myself. The code discussed in both articles is here.

Context

I won’t repeat what I said previously about being a platform engineer, but one area where the discipline often differs from application software development is unit tests. Unit tests can be valuable in application development to check that behaviour matches expectations. Consider the following examples from Wikipedia. We specify a feature, e.g.:

Feature: Withdraw Money from ATM

    A user with an account at a bank would like to withdraw money from an ATM.

    Provided he has a valid account and debit or credit card, he is allowed to make the transaction. The ATM will tend the requested amount of money, return his card, and subtract amount of the withdrawal from the user's account.

We can then devise scenarios to test our implementation of that feature, e.g. we might specify the following (in Gherkin language). The utility here is obvious:

Scenario: Eric wants to withdraw money from his bank account at an ATM
    Given Eric has a valid Credit or Debit card
    And his account balance is $100
    When he inserts his card
    And withdraws $45
    Then the ATM returns $45
    And his account balance is $55
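
In Go, the scenario above might translate into a check along the following lines. This is only an illustrative sketch: the account type and its withdraw method are hypothetical names invented for this example and have nothing to do with the application discussed later.

```go
package main

import (
	"errors"
	"fmt"
)

// account is a hypothetical type used only for this illustration.
type account struct {
	balance int // whole dollars
}

// errInsufficientFunds is returned when a withdrawal exceeds the balance.
var errInsufficientFunds = errors.New("insufficient funds")

// withdraw returns the amount dispensed and subtracts it from the balance.
// The balance is left untouched if the request cannot be met.
func (a *account) withdraw(amount int) (int, error) {
	if amount > a.balance {
		return 0, errInsufficientFunds
	}
	a.balance -= amount
	return amount, nil
}

func main() {
	// Given Eric's account balance is $100
	eric := &account{balance: 100}

	// When he withdraws $45
	dispensed, err := eric.withdraw(45)

	// Then the ATM returns $45 and his account balance is $55
	fmt.Println(dispensed, eric.balance, err) // prints "45 55 <nil>"
}
```

The Given, When and Then steps map directly onto setup, call and comparison, which is the shape most unit tests take.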

This can be an emotive area, but frankly it often doesn’t make sense to write unit tests when deploying infrastructure at scale. Suppose you are deploying a new AWS account, with networking and Kubernetes clusters, in code. How could you meaningfully test it? You could and should deploy a test example somewhere, but it isn’t feasible to test every available feature of each of these resources each time. There are simply too many, and if you tried it would take so long that you wouldn’t be doing anything else. Take as an example Terraform, a popular tool for deploying cloud-based infrastructure. There have been various efforts to implement testing frameworks for it, but none of them has really taken off for unit testing, largely for this reason. Yes, you can test a single resource declaration, e.g. a firewall rule, but as soon as you move on to full-scale plans and modules it becomes pointless. It’s a similar story with any other tool interacting with those APIs: Ansible, CloudFormation, etc. You trust the cloud provider’s API responses and that your thing is ‘there’; you’ll soon know if it isn’t!

Partly because of this I am not in the habit of writing unit tests, despite seeing their value in application development. Application software development is a different beast, however, and Golang integrates unit testing very smoothly. You don’t have to pick and evaluate some external testing framework. With Golang, testing comes included.

The Implementation

I decided to extend the application that I had recently written in Golang together with ChatGPT to include some on-board unit tests, again using ChatGPT. The start was great. I presented the application code, asked for some tests, and ChatGPT:

  • Suggested some tests
  • Started creating them

It didn’t suggest how to integrate the tests. For a basic use case I would have expected a companion file something_test.go (the _test.go part is important, since go test only picks up files with that suffix). OK, I knew that myself and could do that.

The First Problem

ChatGPT was able to create tests for a couple of the functions written in the code, but ran into trouble with functions provided by external packages, e.g. promptui, which had been wrapped for this specific use case. It didn’t seem to recognise the difference between our own functions (which we could edit) and external package functions (which we couldn’t). It wanted to modify the input to the package functions to support the proposed tests, and got stuck in repetitive loops trying to resolve errors in the tests or in the modified application code. It seemed it couldn’t understand that it didn’t ‘own’ those parts of the code, even when told.

At this point I can see sceptics smiling at the naivety of the situation. I could see that it wasn’t working and I had some idea why, but I wasn’t sure what I ‘should’ do. ChatGPT of course didn’t have meta-awareness: for better or worse, it didn’t know when to give up. So I did my own research the old-fashioned way.

There is a school of thought that you should not be testing features provided by external functions, e.g. for my specific use case: ‘You should not try to test promptui as it is expected to be tested by its author.’ Having seen the difficulty of trying to do otherwise, I decided to follow this advice: I asked ChatGPT to test each function in turn ‘without changing it’ and gave up on testing my function based around promptui.

The Second Problem

ChatGPT was unable to effectively test the function askForNumber as it stood. Naively this is a simple, generic and obvious function which you would expect to come up frequently:

// ask the user for a number and suggest a default
// (uses the fmt and strconv packages from the standard library)
func askForNumber(prompt string, defaultValue int) int {
	fmt.Printf("%s %d]: ", prompt, defaultValue)

	// Read the user's input
	var input string
	fmt.Scanln(&input)

	// If the input is empty, return the default value
	if input == "" {
		return defaultValue
	}

	// Otherwise, try to convert the input to an integer
	value, err := strconv.Atoi(input)
	if err != nil {
		// If the conversion fails, return the default value
		return defaultValue
	}

	// The conversion succeeded, so return the user's number
	return value
}

The simplicity is deceptive. Because the function reads directly from standard input, a test has no simple way to supply the user’s answer. Tests were suggested, but they invariably couldn’t produce a suitable test value that would not in turn break when tested. I again researched this separately and found that simple testing wasn’t going to work: essentially I’d have to set up a proper mocking environment to test ‘properly’. By the time you get to this point there are philosophical questions as to what in fact you are testing, but mocking would also likely tackle the third problem.

The Third Problem

Writing individual unit tests for individual functions was ‘ok’ for the three functions I could get ChatGPT to write simple tests for, but it left func main and two others (the promptui wrapper and the askForNumber function above) uncovered. It’s good practice to decompose your code into separate functions anyway, but it didn’t seem ‘worth it’ to pursue this route further with my main function, especially considering the second problem. With these three unit tests I got the following test coverage reported:

go test -cover
PASS
coverage: 26.3% of statements
ok      github.com/joshuamkite/tarot-card-shuffle-draw  0.003s

Can ChatGPT Do Mocking?

I asked ChatGPT:

can you produce a mock environment to test this application

and I got:

Yes, to create a mock environment to test this application, you could use a mocking framework such as mockgen or gomock to generate mock implementations of any dependencies or external services that the application uses. …

Unfortunately this turned out to be a tar pit beyond ChatGPT’s capacity. It asked me to install mockgen but then I had exchanges like:

[Me, presenting error from a suggestion]:

mockgen -version
v1.6.0

mockgen -destination=promptui_mock_test.go -source=github.com/manifoldco/promptui -type=Select
flag provided but not defined: -type
...

[ChatGPT]:

It looks like the version of the mockgen command you are using does not support the -type flag. The -type flag was introduced in version 1.4.0 of the mockgen command, and you are using version 1.6.0, which does not support this flag.

This is clearly nonsensical: a flag introduced in version 1.4.0 would not be missing from the later 1.6.0. I suppose I could have proceeded at this point to do my own deep dive into mocking with Go, and maybe at some point I will, but that wasn’t the main point here. I had wanted to see how far I could get implementing straightforward testing of what was fundamentally a toy-level application with AI assistance.

Conclusions

It seemed I had arrived at a limit for ChatGPT’s current ability:

  • It was possible to generate simple unit tests for some functions (26.3% statement coverage), but I had to apply my own external rules on:
    • What degree of departure from the original application code was acceptable - I wound up with ‘none’.
    • How long to pursue a working test before researching elsewhere or giving up.
    • Which parts of the code to test.
  • (Perhaps unsurprisingly) ChatGPT was not at this time able to originate mocking as a solution for testing or, when prompted, to implement it effectively in this case.

As per ‘The Takeaway’ from my previous article:

  • Context presented, dialogue, and interactive iteration are important.
  • A clear and well understood goal is certainly helpful.
  • Design choices became far more important than implementation choices.

I still think that, as I previously wrote, AI as demonstrated by ChatGPT is an amazing tool that marks a major inflection point in tech (and elsewhere). I also think that this exercise has reinforced my sentiment that ‘As ever, the people who are able, adaptable, ready and willing to move up the stack will be fine.’