My Testing Philosophy When Coding
When programming, what tests should we write?
Never Forget What Tests Should Do
A good test suite should help us more than it hurts us.
Tests help us when they give us confidence to refactor, by failing when we neglected to properly handle the same inputs that the old code was known to handle correctly. They help us by saving us time that we might otherwise spend manually testing those inputs.
Tests hurt us when they “flake” (fail sporadically and without good reason). Tests hurt us when the suite takes too long to run, wasting developer time and focus and slowing iteration speeds. Tests hurt us when they add undue friction to development. If tests fail every time we make an improvement to our product, then those tests are not testing the right things.
The absolute clearest “win” a test can provide is a guarantee that a particular bug will never present itself again. For this reason, it’s a good rule of thumb that every bug fix should come with an “initially failing test”: a test that would fail if it were run before the fix was committed to the codebase.
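As a sketch of the idea, suppose a hypothetical slugify function originally omitted the trim() call, so a title with surrounding whitespace produced a slug like "-hello-world-". The bug fix and its initially failing test might look like this (slugify and the exact bug are invented for illustration):

```typescript
// Fixed version: the bug fix was adding trim() before collapsing whitespace.
function slugify(title: string): string {
  return title.toLowerCase().trim().replace(/\s+/g, "-");
}

// The initially failing test: run against the pre-fix version (no trim()),
// this assertion fails. With the fix in place it passes, and the bug can
// never silently return.
console.assert(
  slugify("  Hello World  ") === "hello-world",
  "regression: surrounding whitespace must not leak into the slug"
);
```

The point is not the specific assertion library; it’s that the test encodes the exact input from the bug report, so it documents the bug and guards against its return at the same time.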
On the other end, programmers should not write tests that test incidental effects or side-products of their code. For a web product, there is no value in testing that an element has the same class names on it that it did yesterday. What matters is that a button still looks like a button. Tests shouldn’t fail just because the CSS was refactored a bit, or because the auto-generated classes from a technology like CSS Modules or Fela have changed.
Other Thoughts on Testing
The Elm community has been excited about fuzz testing for a while now. The idea is that functions that expect to take a number (say) should be able to handle any number as input, so instead of hard-coding one or two example input numbers into the test, we could have the test framework throw a lot of random numbers at it and check that it doesn’t blow up. This makes even more sense with strings than with numbers. Either way, it’s a check on the programmer’s tendency to only write code that deals with the idealized inputs the programmer had in mind at the time.
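A hand-rolled sketch of the idea in TypeScript (real suites would use elm-test’s fuzzers, or a library like fast-check in JavaScript; the median function here is just an invented example under test):

```typescript
// Hypothetical function under test.
function median(xs: number[]): number {
  if (xs.length === 0) throw new Error("median of empty list");
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Throw 1,000 random inputs at it and check an invariant, rather than
// asserting exact values for one or two hand-picked examples.
for (let run = 0; run < 1000; run++) {
  const xs = Array.from(
    { length: 1 + Math.floor(Math.random() * 20) },
    () => Math.random() * 2000 - 1000
  );
  const m = median(xs);
  // Invariant: the median must lie within the range of its inputs.
  console.assert(
    m >= Math.min(...xs) && m <= Math.max(...xs),
    "median out of input range"
  );
}
```

Note the shift in mindset: instead of “does it return 4 for this input,” the test asks “does a property that must always hold actually hold for inputs I never thought of.”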
It’s better to use a more reliable abstraction than to write a test. Again, the Elm community is leading here, with their emphasis on correctly modeling the problem with expressive types. This eliminates whole categories of errors at compile time. Why test when you can make a failure mode impossible instead?
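The same idea translates to TypeScript with discriminated unions. Here is a sketch of the “RemoteData” pattern popularized in the Elm community (the describe function and its strings are invented for illustration): the type makes it impossible to reach for data while a request is still loading, so there is no test to write for that failure mode.

```typescript
// Every state a network request can be in, and nothing else.
type RemoteData<T> =
  | { status: "notAsked" }
  | { status: "loading" }
  | { status: "failure"; error: string }
  | { status: "success"; data: T };

function describe(users: RemoteData<string[]>): string {
  switch (users.status) {
    case "notAsked":
      return "Click to load users.";
    case "loading":
      return "Loading...";
    case "failure":
      return `Error: ${users.error}`;
    case "success":
      // `data` is only accessible in this branch; the compiler enforces it.
      return `${users.data.length} users loaded.`;
    // No default needed: the compiler checks that every case is covered,
    // and will flag this switch if a new status is ever added to the type.
  }
}
```

A boolean-flag version of this (isLoading, hasError, maybe-null data) would need tests for every contradictory combination; the union type deletes those combinations from the program entirely.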
Sometimes, linting can be a good alternative to writing tests. For example, we could write tests asserting that a certain component features an <img> tag with a proper alt attribute description, or we could just enable eslint-plugin-jsx-a11y and get this check, and many more, across our whole codebase.
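As a sketch, a minimal .eslintrc fragment enabling the plugin’s recommended rule set might look like this (assuming eslint-plugin-jsx-a11y is installed as a dev dependency; projects on ESLint’s newer flat config would express the same thing in eslint.config.js):

```json
{
  "plugins": ["jsx-a11y"],
  "extends": ["plugin:jsx-a11y/recommended"]
}
```

With this in place, a missing alt attribute is flagged in the editor and in CI across every component, without a single component-specific test.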
In front-end development, a good “style guide” or “component library” that includes example usages demonstrating all the supported configurations and states of each component should be considered a form of testing. It can be quickly scrolled through to check that things still look and work as expected, which is more meaningful than attempting to test visual components with code, and more likely to catch issues. Automated tests should target behavior and functionality, not appearance.
Visual regression testing (pixel snapshotting & diffing) of relatively low-level components is a good mix of automation and visual checking.
Snapshot testing (asserting that the rendered HTML is unchanged) loses value the higher the level at which it runs. If we snapshot-test an entire form, and that form uses a submit button component, radio buttons, checkboxes, and a rich text editor, then when a co-worker updates any one of those components, our snapshot test will fail. Is that necessary? Our form should still work just fine, assuming we can trust that its sub-components still work. We don’t want our co-worker to have to update our snapshot, do we? What value is there? It’s just friction.
Be skeptical of code coverage tools, which more or less encourage programmers to write a test for every line of non-test code they write, because these tools don’t take linting, type systems, or common sense into account.