No art in code

Move fast and try not to break too much

  • There’s a lot of gatekeeping coming from programmers these days. It seems that now software engineering isn’t just about writing code, but about having an architectural overview and product insight. Don’t worry, our jobs are safe: we’ll be cleaning up code after AI makes a mess! While possibly true, it is of note that these statements are being made en masse as AI coding becomes more prominent.

    There are two major flaws in this logic. The first is the assumption that AI coding is not going to get better. It will. Fast.

    Secondly, in the land of AI, code quality isn’t an issue. We invented code quality essentially to a) stop ourselves from making mistakes and b) make it easier to build larger systems using patterns.

    Take the example of duplicating a “magic” string. In human terms that’s a no-no, because it is easy to make a mistake when having to update it. It can also be used inconsistently in different places, e.g. “colour” in one part of the code and “color” in another, leading to bugs. So we protect our human selves by creating reference variables.

    Machines don’t have that problem. It takes them basically no effort to update strings in multiple places. Correctly. That’s what they do, what they’ve always done, and what they’re very good at.
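    To make the magic-string example concrete, here is a minimal sketch. The names (`FIELD_COLOR`, `style_payload`, `read_style`) are invented for illustration.

```python
# Human-safe version: one named constant, referenced everywhere,
# instead of the raw "magic" string repeated across the codebase.
FIELD_COLOR = "color"

def style_payload(value: str) -> dict:
    # Build a payload keyed by the shared constant.
    return {FIELD_COLOR: value}

def read_style(payload: dict) -> str:
    # Read it back using the same constant.
    return payload[FIELD_COLOR]

# Had one function used "colour" and the other "color", this round trip
# would fail with a KeyError instead of returning the value.
assert read_style(style_payload("red")) == "red"
```

    The constant exists purely to protect the human author from the inconsistency described above; a machine updating every occurrence correctly gets the same safety for free.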

    Yes, that’s just one example of a pattern we use that machines don’t need to care about. The rest of the examples are easy to extrapolate.

    Now, as long as humans and AI are in the same codebase, code quality does matter. Kinda in the same way a parent has to mash food for a baby to make it easier to eat. AI is expected to “write good code” so that we can understand it and contribute too.

    Codebases where it’s only AI writing? When did you last peruse some bytecode generated by a compiler to confirm the level of code quality was acceptable to you?

  • Software Engineering is a product bottleneck. We need to be ruthless in our pursuit of, well, not being that bottleneck. I’ve been using (for long enough I consider it battle-tested) a mechanism that helps a lot and wanted to share it.

    tl;dr Unit tests can be a great tool but, depending on how they’re implemented, can:

    • slow down development of a reasonably-sized system, by discouraging healthy ongoing refactoring
    • provide false comfort and lose a lot of regression safety during refactors.

    Let’s start by being sure we are on the same page about what unit tests are.

    A unit test is generally considered to be a test of the smallest functional piece, or unit, of code. In the practical sense of the work environment, this often comes down to a set of tests per file, where the “export” of that file is a class or function(s).

    The idea here is that if each unit of code is behaving as it should, then the system as a whole will behave correctly. The logic here is pretty much flawless in its simplicity. It’s probably the reason why unit tests have become the key ingredient of minimum viable software engineering – there is a legitimate confidence that can be allowed here, i.e. they’re not useless!
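    For concreteness, here is what such a file-level unit test typically looks like. `apply_discount` is an invented stand-in for a file’s single export.

```python
def apply_discount(price: float, percent: float) -> float:
    """The "unit" under test: one exported function in its own file."""
    return round(price * (1 - percent / 100), 2)

# The unit test: if each unit behaves, the reasoning goes,
# the system built from those units behaves too.
def test_apply_discount():
    assert apply_discount(100.0, 10) == 90.0
    assert apply_discount(19.99, 0) == 19.99

test_apply_discount()
```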

    Unit tests can be used for two purposes.

    1. TDD: Making sure the code is matching your mental model of what it should do, and for flushing out dependencies.
    2. Regression Protection: Making sure a change in one part of the tested unit does not negatively affect existing functionality in the rest of the unit.

    TDD is fine. Do it if it helps you. Move on.

    The bit about regression protection has a flaw, in that it only actually focuses on the “unit”, not the actual user-facing functionality of the system.

    If the codebase were never to change again, or indeed only required minor bug fixes, then everything would be fine. But living/growing/working codebases are not static. As the needs of the business evolve, so too does the codebase.

    The crux of this argument is the following: Any value in a unit test is lost as soon as that code is no longer used.

    Sounds obvious. And it is.

    That piece of code is likely to have accumulated (edge?) cases for bugs or requirements, where it was deemed the most sensible place to make a change. There might be only a few tests or there might be many. If a refactor is introduced which no longer uses this code path, then in theory those tests should be transferred to their new home (or, more likely, homes) in a diligent way. This doesn’t work, because the tests are built around the structure of the code, not the functional requirements.

    Inevitably what happens is the bugs resurface. New tests are written. New code is changed. Old bugs are fixed. Again.

    Even if the change was made in the most diligent way and all tests were correctly transferred, this would be a completely unnecessary (and very slow) step if only the tests were written higher up.

    Where is “higher up”? For most of us writing web or native mobile apps with some kind of a back end service, that means:

    • tests at the API layer and
    • tests at the highest page/screen level visual component

    Mostly we can get by with these*.

    Now these higher level tests of course only work if they are:

    • Fast – they need to run pre-commit. Every commit
    • Solid – they need to run pre-commit. Every commit

    This basically means dependencies need to be local to the dev machine and ideally in-memory. There will always be cases where this isn’t possible, and I’m not solving for those here. But for most of us writing some SaaS or SaaS-backed app, with a relatively standard database, this is pretty easily achievable. If in-memory dependencies are not possible for databases, I’d go as far as to argue that running one in a container in the background on the dev machine is fine, as long as CI tests are run the same way. The point is we need to move quickly, in a practical way, not a perfect way.

    The process is straightforward: for each test, reset the data, populate what is needed, and test.

    And it ONLY works in this way. This wouldn’t work if for example we used a backup of production data. There would be too many assumptions about the state of the data in there, and the tests would have to be too simplistic or general in order to avoid false negatives.
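    A minimal sketch of the reset–populate–test cycle, with an in-memory SQLite database standing in for the local dependency. The `orders` table is invented for illustration.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    # Reset: a fresh, empty schema for every test -- no assumptions
    # about pre-existing state, unlike a production backup.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    return conn

def test_order_total():
    conn = make_db()                                          # reset
    conn.execute("INSERT INTO orders (total) VALUES (10.0)")  # populate
    conn.execute("INSERT INTO orders (total) VALUES (15.0)")
    (total,) = conn.execute("SELECT SUM(total) FROM orders").fetchone()
    assert total == 25.0                                      # test
    conn.close()

test_order_total()
```

    Because each test builds exactly the state it needs, it can be as specific as the issue demands.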

    On the flipside, once we have a mechanism where the data is populated in the test setup, we can always populate just enough data to reproduce the state needed to test a very specific issue.

    Is this going to be slower than pure unit tests? Sure. So slow that it hurts our dev time? No chance.

    With all logic tested at a higher level, we are freed up in a way that encourages small ongoing refactoring, which naturally leads to a healthier codebase, which lends itself to moving faster (and breaking more things!). We can build a bit more freely knowing the functional requirement is still solid.

    Now, let’s be fair and talk about drawbacks. I don’t think there are any, to be honest. It could be argued that unit tests help “signpost” the bug at the relevant point where it is fixed – that they make the failure more obvious at the code being changed. This has some value, but not much…

    In a reasonably-sized codebase there are likely many places where a fix (/hack/workaround, depending on who you ask) can be implemented. Even if the code is notionally similar, unit tests won’t enforce the fix being in the “correct” location. Whether unit tests are there or not, that level of grouping would require familiarity with the codebase.

    One potential downfall is that there will be a LOT of tests in relatively FEW places. That means big test files. Likely this will manifest as a lot of sample-data-with-expected-outcome type tests. This potentially makes it harder to see the edge cases being tested while “reading code” (<- nobody does that). It does mean some reasoning will be required when a test turns red, in order to be clear on why it is broken. This is a real thing, but in practice I have found it to be very manageable.

    Controversial summary incoming: Testing at the top definitely does not encourage “good” architecture. Neither does it require it, nor does it care about it. The point is that whatever is happening under the hood is producing the correct results. Speed will follow.

    Thanks for reading this far.

    * In practice, unit tests are great for calculation-heavy sections of the system, e.g. a price comparison algorithm, or a parser of some sort. I find these tend to work mainly because that code becomes pretty central due to its complexity, so it is less likely to be made redundant. It still has the inherent problem of potentially being refactored out of existence.
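    An invented example of that calculation-heavy case: a tiny price parser, where a classic unit test genuinely earns its keep.

```python
def parse_price(text: str) -> float:
    """Parse strings like '£1,234.56' into a float."""
    # Strip whitespace, a leading currency symbol, and thousands separators.
    cleaned = text.strip().lstrip("£$€").replace(",", "")
    return float(cleaned)

# Edge cases live here, so per-unit tests are worth their upkeep.
assert parse_price("£1,234.56") == 1234.56
assert parse_price("  $99  ") == 99.0
```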