• Blog >
  • Fix it quick then fix it right

I write bad code. So do you. So do our colleagues. Whether you realise it yet or not, the project you’re working on right now contains bad code. This isn’t as terrible as it first seems. The code behind a site doesn’t have to be perfect to provide value. In fact, the bad code might never cause a problem. It might only become a problem when new functionality is required later and the way something has been done makes it more difficult to add the required change. Situations like this are ripe for refactoring.

Wait, what’s refactoring?

For our purposes, refactoring code means to change the code in some way without changing what it does (developers like arguing about the exact definition). When refactoring code as part of a bug fix or new functionality, it is much better to do the refactoring changes, check everything still works and then to make the functionality change. Any change to code has the potential to introduce a new bug and separating refactoring from other changes makes it much easier to identify the source of new bugs.

Hang on, if the functionality doesn’t change why bother?

Good question, voice in my head. Removing unnecessary complexity, making variable or function names clearer or simply fixing coding standards all provide benefits the next time someone has to work on that code. These common bad practices are known as code smells. Various resources, such as Sourcemaking, provide lists of code smells with descriptions, common causes and the benefits of fixing them.

Why are you telling me all this?

Jeez, impatient much? Fine, I’ll get to the point. Nothing I’ve said so far is groundbreaking. However, I often use the principles of refactoring before the code I’m writing gets anywhere near the live site or even the QA/development environment.

The hardest part of fixing bugs can be finding the exact source of the problem. My first reaction to a bug description can often be “There’s no way that could happen”. Except, well, whatever it is is happening so clearly my understanding of the system is flawed. In this situation it can be quite difficult to work out where to start so I turn to my trusty friend xdebug.

I’m not going to explain how to use xdebug (this blog post needs to finish at some point) but it allows you to step through parts of the code line by line and to check values at specific moments during the execution of the system. However, it is safe to say that the place where you first notice something wrong will not be the right place to fix it. This is especially true with complex frameworks like Drupal or Magento. After all, “Never hack core” is a common refrain in Drupal for a reason.

However, it is a valid approach to hardcode a change to some problematic code there and then to prove that it does fix the problem. At this point, a bad developer would call it a day, commit the change to their version control system (or push it straight to live if they’re really bad) and call it a day.

We’re not bad developers though so we want to work back from the quick, hacky fix to the right fix. It’s at this point that I would suggest reframing the problem as a refactoring. Immediately, I find this relaxing. I’m no longer trying to fix something that’s broken. I’m just making it better. This is a far less urgent which allows adrenaline to drop and makes it easier for me to look at the bigger picture.

Can you explain that with a hypothetical example?

It’s like you read my mind. Let’s imagine a Drupal site that has a custom content locking system to only allows one user to edit a node at once. However, QA are reporting that multiple users can edit a node at the same time.

Suppose we’ve not worked on this site for 12 months and, from a quick look at the codebase, we can’t find a module that seems obviously related to content locking so we break out xdebug. Stepping through the code we find that when the node is loaded a custom property $node->is_lckd exists but is set to NULL. We check the database and there’s a custom column on the node table called is_lckd and, for this node, it is set to TRUE. Aha! We have found a problem. Perhaps if the property was correct then the rest of the locking process would kick in and everything would work again.

The most hacky but quickest fix would be to do add a new implementation of hook_node_load() that was something like:

function project_util_node_load(&$nodes, $types) {
 foreach ($nodes as &$node) {
   $is_locked = db_query("SELECT is_lckd FROM {node} WHERE nid =
:nid", array(':nid' => $node->nid));
   $node->is_lckd = reset($is_locked->fetchCol())
 }
}

We clear caches, load the page, edit the content and excellent, everything still runs. We test another user in a different browser and they can’t edit the content. Job done. High fives all round.

Well, hang on. We’re better than this, remember? However, we now know that if the worst happened we could hotfix the bug while finding a better fix. If the locking functionality was business critical this we could indeed commit this fix while we worked on it. However, it should be considered a priority to find out the root cause rather than it becoming a separate ticket to go into a backlog that will never be prioritised.

So, deep breath, back to the problem. Arbitrary functions in a utility module normally mean that the functions could live somewhere better. This is similar to The Blob code smell applied to functions in files/modules rather than methods in classes. In fact, looking at the code we’ve written, it feels like the module that implements the locking should be doing this already.

Now that we know there’s a custom property involved and we know the name of it, we should be able to find the code that was meant to be doing this in the first place. Turns out the original developer had used the abbreviation lckd for locked which is why we couldn’t find the code from looking at the codebase. Within the module content_lckd we find a function suspiciously similar to ours but where they had mixed the uses of is_lckd and is_locked so that the value was always NULL.

We can safely delete our code, fix this code and, while we’re at it, refactor the module to use locked for every usage of locked (remembering an update hook and schema changes for the database) so that similar problems are easier to find next time. We quietly ignore the fact that the developer who committed the original code was us and tell everyone we fixed the problem.

That was a very convenient example.

I know, I know. To be honest, even with the abbreviated variable name we’d have probably found the module at the start (or asked someone who knew the project better). Examples in blog posts always end up being perfect for illustrating the point at hand and less good at representing the real world. Hopefully, it helped to show that fixing the problem in a crude way first can give us the mental space to step back and find a better solution.

● ● ●