An Eager Approach to Refactoring
Published: 2024-09-03
Updated: 2024-09-03
Any software engineer with more than a couple of years of experience has stumbled upon spaghetti code, a Frankenstein or God class, or some neglected piece of code that is overburdened by years of added functionality that engineers heaped upon it without care. I am talking about code that wasn't designed, but grew organically, and has become something that has lost all coherence with itself to the point where it needs an overhauling refactor.
Likewise, any software engineer who has encountered this sort of code has had to wrestle with what to do about it or whether to do something about it, and reaching the right conclusions in that critical moment will have long-lasting and sometimes unforeseen consequences.
Reasoning your way out of the various logical traps that might dissuade you from making needed refactors is half the battle. Not refactoring is usually the quickest path, but I believe it is usually the wrong one. Unfortunately, engineers are often under heavy time constraints to get tasks done, and refactoring is rarely a task that gets prioritized. Even if refactoring work were prioritized more, it requires a more advanced skill set and level of experience to accomplish safely than the average junior engineer possesses. That means that it demands attention from the engineers who already have the greatest demands on their time, and those engineers are generally more risk averse because they have more to be responsible for. Nevertheless, all of these reasons to not make needed changes to poorly written code don't really stand up in the face of one undeniable fact: bad code only gets worse over time.
This post makes the case that as software engineers, we should take a more eager approach to refactoring. It also addresses some of the most common reasons we convince ourselves not to refactor bad code, namely time and risk, and suggests ways we can mitigate those costs to make doing the needed refactors immediately a no-brainer.
The Reasons we Don't Refactor
Following the Well Trodden Path
Less experienced engineers may recognize the problem of poorly architected or poorly written code when they come across it, but then proceed to make it worse in order to accomplish the task they have been assigned. After all, it was done like that before so it must be an accepted practice, right? This viewpoint is understandable up to a point. Engineers like to follow patterns, even bad ones, because they are well trodden paths that one can reasonably assume are safe (until one day they bring production down due to some obscure corner case). In the face of complexity, and when we lack the confidence or clarity to make the necessary design decisions, we often fall back on this habit to guide us.
Refactoring Takes Time
More experienced engineers may see the problem and know that it really should be taken care of, but tell themselves they don't have the time. That's also understandable - lack of time is probably how the problem started, and it's usually how the problem continues to go unaddressed. This excuse (because that's what it boils down to) often makes the engineer do the exact same thing the junior engineer would do (just add to the problem instead of fixing it), in which case the greater experience and technical ability of the senior engineer brings absolutely no benefit. This raises the question: if a senior engineer isn't being paid a higher salary to apply their greater experience and capability to situations like these, then what are they being paid more for? I know this sounds overly harsh, and of course a senior engineer's salary isn't only justified by their ability to identify and refactor bad code, but this is a scenario where I would expect a senior engineer's greater technical ability to really make an impact, and every time that doesn't happen is a missed opportunity.
As for the lack of time, well that to me suggests wider problems pertaining to the business and how work is prioritized. If an organization doesn't prioritize maintenance and code quality improvements, then there is probably a lack of communication between those who can identify problems that need to be fixed and those who make decisions about what work gets done. I am not going to address the organizational problems that may prevent us from making needed refactors - that's a bigger topic. But I have found that the communication breakdown often lies with the individual engineer who is in the code. We as engineers often have a lot more sway than we realize, especially when dealing with people in other departments who rely on us to enact whatever plan or strategy they have concocted for the business.
Experienced engineers should be vocal about the health of critical software the business relies on. The business ignores us at its own peril. Whenever I think about this role that engineers should play in their organization, I remember the story told by Matthew Heusser in the Foreword of Robert C. Martin's book, The Clean Coder, in which he relates how a product manager he was working with would happily pressure the development team to get things done faster, but refused to pressure the legal team because he considered them to be "professionals". The lack of credibility development teams have in the eyes of the rest of their organization can be a big obstacle when it comes to getting maintenance and code improvement work prioritized. That credibility can be rebuilt, in part, by not folding under pressure when priorities from other departments are pushed at the expense of needed maintenance. Part of being "professional" as an engineer is to trust your knowledge and experience of how software must be maintained, and bring those assets to bear to help the organization make good choices. Non-engineering types don't have the full picture, and the software development process involves many, many nuances that only experienced engineers can grasp. It's our job to help our organizations understand the consequences of their decisions from a technical standpoint, including the long term consequences of not refactoring code that is in need of a refactor.
Refactors are Risky
The last reason that we often don't engage in needed refactors is the risk involved. If the messy code is critical to the wider system or application, it can be hair raising to make big changes to it, and so we tend to err on the side of caution by avoiding those changes. This mindset, however, seems a bit backwards to me. Messy code that might be risky to refactor will usually become more so over time, so, to the risk avoidant, shouldn't it make more sense to fix the problem sooner rather than later? And I am by no means advocating for a cavalier or reckless approach to making changes to improve code quality - far from it. Any refactor that I might advocate for that would change mission critical code would be backed by thorough tests and carried out with extreme caution. Any amount of risk can be mitigated, which is why risk shouldn't be something that holds us back from making needed changes.
The risk involved with refactoring code is also why it takes experienced, savvy engineers to do this work. But more experienced engineers can sometimes be more risk averse, which again can have the unfortunate effect of making their experience and greater technical ability go to waste. Who better to manage the risk involved with refactoring mission critical code than a senior engineer who knows how to test thoroughly and ensure the changes don't break things? If a senior engineer takes the same path as a junior engineer, working around code that needs refactoring instead of performing the needed refactor because they are afraid of the risks, then are they truly living up to the "senior" part of their title? Again I know that sounds harsh, and context matters - not every decision to leave bad code unchanged is the wrong one - but this is what I believe it means to be "professional" as a software engineer when it comes to risk. We should manage risk like professionals, not run away from it or use it as an excuse to not do what needs doing.
The Costs of Not Refactoring
Refactoring Becomes Impossible
Have you ever worked in code that literally couldn't be refactored due to its behavior depending on its poor structure? I have seen this happen when conditional logic is overused to the point where the mere order of conditional expressions dictates control flow, rather than the conditions themselves, making consolidation or separation of concerns impossible. This situation negatively impacted the organization I was a part of when I couldn't implement new features in a timely manner, or at all, because the code I needed to change was literally unchangeable due to its fragility and obscure logic. Knowing what I know now (many years later), there is a slim chance I would be able to isolate every behavior that would need to be preserved and carry out an overhauling refactor, but it would be an extremely painful and arduous task. It would have been far, far better for someone to take the initiative and implement a good, scalable pattern for that code way before it got to that state.
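To make the "order dictates control flow" problem concrete, here is a tiny Python sketch. The function names, the order fields, and the business rules are all hypothetical, made up purely for illustration. Both functions contain exactly the same two conditions, and the only difference is the order in which they are checked.

```python
# Hypothetical example: the two conditions overlap, so whichever one is
# checked first silently wins. The "rule" lives in the ordering, not in
# the conditions themselves.

def classify_v1(order):
    if order["total"] > 100:
        return "priority"
    if order["is_return"]:
        return "returns"
    return "standard"


def classify_v2(order):
    # Same conditions as classify_v1, checked in the opposite order.
    if order["is_return"]:
        return "returns"
    if order["total"] > 100:
        return "priority"
    return "standard"


# An order that satisfies both conditions exposes the hidden dependency.
large_return = {"total": 250, "is_return": True}
print(classify_v1(large_return))  # priority
print(classify_v2(large_return))  # returns
```

With two conditions the implicit rule ("large returns count as priority") is easy to spot. With dozens of overlapping conditions spread across a long chain, nobody can reorder, merge, or extract anything without changing behavior somewhere.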
Intended Behavior Becomes a Mystery
Have you ever followed the logic of spaghetti code and found that half the possible outcomes were completely accidental (or seemed that way), making it impossible to know what behavior to preserve and what can be removed? This can happen when code is reused inappropriately and separation of concerns is lost, and when niche cases are injected carelessly and not properly isolated. This can lead to similar consequences where the code can't be changed, because nobody knows what behavior is actually being used. I encountered this scenario as well, and the only way I could determine what behaviors needed to be preserved was to analyze logs to see what permutations of initial inputs were ever passed so that I could safely disregard permutations that were handled by accident. Had I simply tacked on the new behavior the business was asking for without doing the necessary refactor, the problem would have only gotten worse for my future self and other engineers who would need to make further changes. Eventually, the original intent of the code would become so opaque to the uninitiated that the code itself couldn't even tell them what it was for, and our sadly non-existent documentation for this particular application wasn't going to come to the rescue.
Time is Repeatedly Wasted Trying to Debug and Understand Bad Code
I am sure you can remember coming across a piece of poorly written code that you have had to look at before and not remembering how it worked, so you find yourself diving into the rabbit hole once more to understand it. Sometimes even well-written code can be like that because the problems we solve can be complicated and don't always have simple solutions. But poorly written code is guaranteed to be like that even for simple problems that do have simple solutions. And if you have wasted time rediscovering what some gnarly mess of code does, imagine how many people before you have wasted that time, and how many people after you will waste that time if you don't take the initiative and fix it. Now imagine how much that time is costing the business, and think about how to articulate that to stakeholders.
Mitigating Risk and Saving Time
Building a Thorough Testing Harness
The nice thing about refactoring is that you should already have a baseline of behavior. Your number one priority during a refactoring project should be to preserve the functionality the code already provides. The only exception to this is any behavior that you can prove beyond doubt isn't being used (this is something that can happen a lot when code is written poorly).
Establishing the baseline behavior of your code takes a lot of analysis. Use tests to verify that your code adheres to that baseline as you refactor. How effective your tests are at doing this is entirely dependent on how thorough they are. Use every possible parameter and write test cases for every possible path your code may take. Only by doing this will your tests provide the assurance you need while making the needed and often drastic changes to your code. Isolating every possible logical path will also help you know what to look for when trying to identify code paths that are entirely unused and can be removed. This step is a lot of work, but absolutely critical. Depending on the quality of tests you already have in place, it can easily take up the majority of the time you spend on the refactoring project.
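Here is a minimal sketch of what that kind of exhaustive baseline check might look like in Python. Everything in it is hypothetical: `legacy_shipping_cost`, `refactored_shipping_cost`, their parameters, and the pricing rules are invented for illustration. The idea is simply to enumerate every combination of inputs (including boundary values) and confirm the refactored code returns exactly what the legacy code returns.

```python
from itertools import product


def legacy_shipping_cost(is_member, is_express, weight_kg):
    # Hypothetical legacy code: the result depends on the order of these checks.
    cost = 0
    if weight_kg > 10:
        cost = 20
    if is_express:
        cost = cost + 15
    if is_member and not is_express:
        cost = 0
    elif is_member:
        cost = cost - 5
    if cost == 0 and not is_member:
        cost = 5
    return cost


def refactored_shipping_cost(is_member, is_express, weight_kg):
    # Hypothetical refactor: one explicit rule per customer/speed combination.
    base = 20 if weight_kg > 10 else 0
    if is_member:
        return (base + 10) if is_express else 0
    if is_express:
        return base + 15
    return base or 5  # non-members always pay a minimum of 5


def find_mismatches(old, new, cases):
    """Return (args, expected, actual) for every case where new differs from old."""
    mismatches = []
    for args in cases:
        expected, actual = old(*args), new(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


# Every flag combination, with weights on both sides of the 10 kg boundary.
CASES = list(product([False, True], [False, True], [0, 10, 10.5, 50]))

mismatches = find_mismatches(legacy_shipping_cost, refactored_shipping_cost, CASES)
assert mismatches == [], mismatches
```

In a real project the same comparison would live in your test framework of choice, and the legacy outputs would usually be recorded once and stored so the tests keep working after the old code is deleted.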
As you analyze the code you plan to refactor and ensure that you have a complete testing harness around it, look for behavior that you may not need. If you have decent logs from your code running in production, use them to determine which behavior is actually used. When code gets messy, there can be many branching pathways that come into existence by accident. By getting clarity on exactly what behavior is intentional, you can save time and energy that would otherwise be spent trying to preserve functionality that isn't used.
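As a rough illustration of that log analysis, the Python sketch below counts which combinations of inputs actually show up in production logs. The log format (one JSON object per line), the `event` name, and the field names are assumptions made up for this example, so adapt them to whatever your logs really contain.

```python
import json
from collections import Counter
from itertools import product

# Hypothetical log excerpt: one JSON object per line, plus some noise.
SAMPLE_LOG = """\
{"event": "quote", "is_member": true, "is_express": false}
{"event": "quote", "is_member": false, "is_express": false}
{"event": "quote", "is_member": true, "is_express": false}
{"event": "login", "user": "abc"}
not json at all
"""


def count_input_combinations(lines, event, fields):
    """Count how often each combination of field values was logged for an event."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not structured log records
        if not isinstance(record, dict) or record.get("event") != event:
            continue
        counts[tuple(record.get(field) for field in fields)] += 1
    return counts


seen = count_input_combinations(
    SAMPLE_LOG.splitlines(), "quote", ["is_member", "is_express"]
)

# Compare what was observed against every combination the code could receive.
all_combinations = set(product([False, True], repeat=2))
never_seen = sorted(all_combinations - set(seen))

print(seen.most_common())  # [((True, False), 2), ((False, False), 1)]
print(never_seen)          # [(False, True), (True, True)]
```

Keep in mind that logs only cover the period they were retained for. A combination missing from the logs is a candidate for removal that still needs to be confirmed, not proof on its own that the behavior is unused.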
Releasing Changes Strategically
Refactoring can and should take time, which is why it is hard to prioritize and make time for. Complicated refactors shouldn't be rushed, especially during the implementation and testing phases. The flip side of this, however, is that time can and should be saved in the review and release steps of the process.
Refactors that are especially difficult or complicated are best approached incrementally and iteratively during implementation. When it comes to code review and deployment, however, it can save time and limit risk to condense your changes into fewer pull requests and fewer code releases and deployments. This is a scenario where I actually favor making larger pull requests rather than following the common advice to keep them small. You should still plan and carry out your work in small increments, but present it for review and release it in a more holistic way where the improvements you are bringing to the code are clear and obvious to the reviewer rather than half-baked.
Half-done refactors often don't look great in code-review. Refactoring messy code often involves making repeated rounds of small changes, and when those changes are viewed by themselves without the context of the larger on-going process they can appear aimless and random. Only when the refactor is complete does the full picture come together and the drastic improvements being made become self-evident. This is why making many pull requests with incremental changes can be a poor use of time for both you and reviewers.
Releasing incremental changes to production as part of an ongoing refactor is problematic for similar reasons. Not only does it cost time to release code multiple times instead of just once (depending on how streamlined your release and quality assurance processes are), but releasing code that is in a transitional state where the intended design is not fully realized could be more risky than finalizing that design before releasing any changes.
Each pull request requires a certain amount of overhead in terms of releasing, quality assurance, risk, etc. Especially when it comes to quality assurance, it can be better to focus your time and energy on the final product - what will actually live in production long term - than to spend time and energy testing incremental stages of the refactor. With each step of the refactoring process that introduces new changes, all bets are off in terms of validating that the code does what it should. Better, then, to do most of the testing and validation at the final stage when it will mean the most.
When to Refactor
So far, I have been speaking as if we should always be refactoring bad code as soon as we become aware of it. But in reality, we have other priorities to manage. The approach I advocate for, then, is to build into your planning the flexibility that allows you to stop and implement a needed refactor if you would otherwise be making the code worse in order to accomplish your assigned tasks. In other words, we should always leave the code better than we found it, or at the very least, no worse than it was when we found it. Sometimes that means changing our immediate plans, as we realize that a refactor is necessary before we can make progress. This should be the default expectation - just part of doing our job as professionals - and whatever systems or processes we have around us should be flexible enough to handle it.
This approach requires a high degree of trust in engineers as well as professionalism on their part, meaning the engineers need to earn the credibility required to be convincing when they explain to stakeholders why a task is taking a bit longer than it otherwise would. Engineers need to have the soft skills necessary to articulate the benefits of carrying out a refactor before implementing other changes the business wants, and explain why the refactor should be prioritized as a prerequisite for those changes. Software engineers are masters of their domain and stewards over the code they spend every day working with, so they should act that way.
If soft skills and professionalism fail to create the needed space to refactor bad code that will ultimately harm the business, and finding another company to work for isn't an option (not even joking - poor engineering culture shouldn't be tolerated by self-respecting software engineers), then you can always pad your time estimates and do refactors on the down-low. Obviously this isn't ideal for anyone, but at least it keeps your conscience clean.
Conclusion
Clearly there will be times to make exceptions to the rule, and engineers should be able to consider all aspects of the organization's interests, but as professionals we must raise concerns that nobody else can raise, and solve the problems that nobody else can solve. That means we should refactor bad code eagerly and stop making excuses. If we don't, our organizations will operate with incomplete information and a lower quality of code that can end up being the difference between overall success and overall failure, or perhaps greatness and mediocrity. Many businesses have failed due to the inability to innovate and produce quality software products in a timely manner - problems that refactoring poorly written code can go a long way to mitigate.
So to all my fellow software engineers, go forth and refactor! Thanks for reading.