Patching Production Drupal Sites With hook_update_N() is Risky

Submitted by david.stinemetze on Fri, 12/07/2018 @ 8:57am

I've been using Drupal in some capacity for a decade now. I'm always learning new things about all the new capabilities and features, yet I was slightly caught off guard the other day because I realized something rather important about a feature that has been around Drupal longer than I have.

hook_update_N()

Anyone who has done any sort of custom module development in Drupal is probably familiar with hook_update_N() functions. If you've only spent your time site building or theming then you may not have been exposed to them directly, but you've definitely seen the effects of them. Whenever you navigate to /update.php or run drush updb to perform module updates, you are presented with a list of all of the available updates. hook_update_N() functions are one of the ways Drupal is made aware of those updates, and were the only way prior to Drupal 8.

I won't spend time explaining Drupal's hook system, but I will explain the N portion of that function. N is merely a placeholder for some 4 digit number. The first digit aligns with the Drupal version the module works with (e.g. Drupal 8 would be 8###). The second digit aligns with the the major module release version (e.g. 8.x-2.x would be 82##), and the last two digits are just sequential counters. 

For example, the 8.x-1.25 version of the admin_toolbar module has a admin_toolbar_update_8001() function located in admin_toolbar.install. Those updates can be seen here:update.phpdrush updb

Once these updates are ran, Drupal stores the numeric value of the last update ran in the database for each module. This prevents Drupal from rerunning the same update twice. If the last update ran for a module was 8103, then it will only attempt to update if it sees a hook_update_N() function with a value of N greater than that. For instance, hook_update_8009() would not run, but hook_update_8104() would.

Patching Modules

Drupal has a very active community. The issue queues for core and contrib modules are ever-growing. Contributions to modules are typically made in the form of patches. Unfortunately the mere presence of a patch, doesn't mean project maintainers will ever use that code. The amount of time between when a patch is uploaded, code review is performed and the patch is merged is indeterminate. In fact, many patches never make it into the project, often for very good reasons. Yet business needs sometime necessitate some expedience on delivering the desired functionality a patch may provide. As a result, these patches get applied to production environments.

Many composer-based projects make use of cweagans/composer-patches to patch packages, including Acquia's Lightning Drupal Distribution.

Here's an example composer.json file from that project's documentation: 

{
  "require": {
    "cweagans/composer-patches": "~1.0",
    "drupal/drupal": "~8.2"
  },
  "config": {
    "preferred-install": "source"
  },
  "extra": {
    "patches": {
      "drupal/drupal": {
        "Add startup configuration for PHP server": "https://www.drupal.org/files/issues/add_a_startup-1543858-30.patch"
      }
    }
  }
}

This simple mechanism allows you to patch a module without giving much extra thought, but that mind-set can be problematic.

hook_update_N() + patching = risky

Imagine a module called ms_frizzle (My two-year-old loves watching the old Magic School Bus cartoons on Netflix, so yes, it is a relevant pop-culture reference, thank you very much).

The version you currently have installed has previously executed one hook_update_N() function:

/**
 * Surfing on a sound wave.
 */
function ms_frizzle_update_8001() {
  // This function has some code allowing you to surf on a sound wave through some sciencey magic.
}

At this point Drupal should have 8001 stored as the schema version for this module.

Now suppose you apply a patch that adds a second hook_update_N() function, so your code looks something like this:

/**
 * Surfing on a sound wave.
 */
function ms_frizzle_update_8001() {
  // This function has some code allowing you to surf on a sound wave through some sciencey magic.
}

/**
 * Take your second right past Mars.
 */
function ms_frizzle_update_8002() {
  // Go past mars, skip a street and turn right. If you've hit Jupiter, you've gone too far.
}

This will update this module's schema version to 8002

Now imagine, a few weeks later, the module maintainers update the module again with completely unrelated code. The module maintainers are completely unaware that you've applied this patch to your system so they end up using the same hook_update_N() function name, but it does something completely different.

/**
 * Surfing on a sound wave.
 */
function ms_frizzle_update_8001() {
  // This function has some code allowing you to surf on a sound wave through some sciencey magic.
}

/**
 * Swinging through the stars.
 */
function ms_frizzle_update_8002() {
  // This function has some code swings through the stars with a bunch of grade school students.
}

So hopefully you see the issue. 8002 has already ran so it won't ever run the new code. You may think, "Wait! This would cause a merge conflict, so we'd catch it before the update hooks even run." But consider what typically happens in this type of scenario: The patch gets rerolled. That means our original 8002 hook now becomes 8003.

/**
 * Surfing on a sound wave.
 */
function ms_frizzle_update_8001() {
  // This function has some code allowing you to surf on a sound wave through some sciencey magic.
}

/**
 * Swinging through the stars.
 */
function ms_frizzle_update_8002() {
  // This function has some code swings through the stars with a bunch of grade school students.
}

/**
 * Take your second right past Mars.
 */
function ms_frizzle_update_8003() {
  // Go past mars, skip a street and turn right. If you've hit Jupiter, you've gone too far.
}

What this means is 8003 will then run, but we've actually already ran that code. 

So not only have we missed swinging through the stars, but we've taken the second right past Mars twice.

For good measure, suppose this happens again (mainly because I just need to finish this verse of the theme song with the correct lyrical sequence).

You'd end up with: 

/**
 * Surfing on a sound wave.
 */
function ms_frizzle_update_8001() {
  // This function has some code allowing you to surf on a sound wave through some sciencey magic.
}

/**
 * Swinging through the stars.
 */
function ms_frizzle_update_8002() {
  // This function has some code swings through the stars with a bunch of grade school students.
}

/**
 * Take a left at your intestine.
 */
function ms_frizzle_update_8003() {
  // Whoa, see that intestine right there? The school bus needs to turn left.
}

/**
 * Take your second right past Mars.
 */
function ms_frizzle_update_8004() {
  // Go past mars, skip a street and turn right. If you've hit Jupiter, you've gone too far.
}


So what we wanted to happen, and what actually happened are two different things: 

Desired behavior

  1. Surfing on a sound wave.
  2. Swinging through the stars.
  3. Take a left at your intestine.
  4. Take your second right past Mars.

Actual behavior

  1. Surfing on a sound wave.
  2. Take your second right past Mars.
  3. Take your second right past Mars.
  4. Take your second right past Mars.

The Magic School Bus is not where it is supposed to be!

So What Do We Do?

1. Vet the patch

If you find yourself in this scenario, the first question should always be, "Do I really need this patch?" Don't just blindly apply patches. Look at them and see what they do. This doesn't just apply to patches containing hook_update_N() functions. Patches should always be properly vetted before getting applied. Only apply a patch if you're sure you can trust it.

2. Rewrite the patch

Drupal 8 introduced a separate hook for performing updates: hook_post_update_NAME(). It is useful because it allows you to define hooks using alphanumeric characters. This can help reduce the risk of defining the same update hook twice across multiple issues. Look again at this screenshot from above:

drush updb

As you can see here, while admin_toolbar is using hook_update_N(), the node and system modules are using hook_post_update_NAME() functions.

I like to take things a step further and add Drupal issue numbers or JIRA tickets as part of the function name to provide a unique namespace (just be careful as execution sequence is alphanumeric).

For example: 

/**
 * LIZ-994: Navigate a Nostril.
 */
function ms_frizzle_post_update_liz_994_navigate_a_nostril() {
}

/**
 * Drupal.org Issue #2937227: Raft a River of Lava.
 */
function ms_frizzle_post_update_2937227_raft_a_river_of_lava() {
}

hook_post_update_NAME() is a nice and useful alternative to hook_update_N() in many cases, but it can't be used all the time. hook_update_N() should still be used if the data model is changing. That means you may not be able to always do this.

3. Relocate patched update hooks to a custom module

If you absolutely decide you need that hook_update_N() function, you could strip it from the patch and relocate the functionality into your own custom module. However, you need to ensure that you know what will happen if the original patch, or some derivative version of it, is eventually merged into a future version of the module. That code, or competing code, could end up getting executed twice, which could have adverse effects.

4. Investigate existing patches

If you are already patching either core or contrib modules on your site, you should investigate your current patches. To help with this, I took this opportunity to build a small little composer plugin called widgetsburritos/drupal-patch-checker (PRs welcome!). It looks through patch files for references to new hook_update_N() functions.

Install it by running:

composer require --dev widgetsburritos/drupal-patch-checker

Then add the following to the your composer.json file:

    "scripts": {
        "check:patch": [
              "WidgetsBurritos\\DrupalPatchChecker\\DrupalPatchChecker::checkComposerFile"
         ],
    }


This allows you to investigate your patches at any time by running the following command: 

composer run check:patch

Example output:

$ composer run check:patch
> WidgetsBurritos\DrupalPatchChecker\DrupalPatchChecker::checkComposerFile
Script WidgetsBurritos\DrupalPatchChecker\DrupalPatchChecker::checkComposerFile handling the check:patch event terminated with an exception

  [Exception]                                                                                                              
  patches/language_hierarchy/language_hierarchy-limit_views_results-2825851-14.patch contains hook_update_N() on Line 50.  

If you find that you're currently running any patches that contain a hook_update_N() function, evaluate your options. Take a look at any changes that may have been made to the module and update accordingly.  

This can also be used with CI tools like Jenkins or Travis.

If you want to take things a step further and prevent patches containing hook_update_N() from getting added to your project altogether, modify your composer.json file one more time: 

    "scripts": {
        "check:patch": [
              "WidgetsBurritos\\DrupalPatchChecker\\DrupalPatchChecker::checkComposerFile"
         ],
         "post-install-cmd": [
              "composer run check:patch"
         ],
         "post-update-cmd": [
              "composer run check:patch"
         ],
    }

This will cause composer install and composer update to fail if it detects invalid patches.

5. Help project maintainers

The best solution is to not patch modules at all, but rather continue to get new functionality into the modules themselves. Project maintainers are busy people. They can't review every patch right away. Help by thoroughly testing and reviewing patches. If a module has gone stale, consider trying to assume ownership of the project. Drupal's community is great, and it's incumbent upon all of us to keep it that way.

Have you experienced this type of issue before? How did you tackle it? Got any better ideas on how to solve for it? Leave your thoughts below.