Upgrade procedure

Upgrades use helm upgrade against the same release name and namespace you installed with. The chart's migration Job runs first as a Helm pre-upgrade hook; if it fails, the upgrade aborts and the old deployment keeps running. There is no half-upgraded state.


When to upgrade

  • Build expiration approaching. Each image carries a 45-day build expiration; the application warns when 14 days remain and refuses to start once the build has expired. Plan upgrades on a cadence comfortably inside that window, typically every 30 days.
  • Security patches. Vendor releases out-of-band when a CVE affects a shipped component. Subscribe to release notifications.
  • Feature releases. Quarterly minor versions bring new functionality and chart values; check the release notes for values you may want to set.

Pre-upgrade checklist

Before running helm upgrade:

  • [ ] Read the release notes for every version between your current and target. Pay attention to:
    • Required values.yaml changes (new required fields, renamed fields, deprecations).
    • Database migration scope (any long-running migrations? data backfills?).
    • Feature flags that change default behavior.
  • [ ] Back up the database. Run a pg_dump (or use your postgres operator's backup mechanism) and verify the dump is restorable; the migration Job is irreversible without a backup. (A sketch of this step and the drift check follows the checklist.)
  • [ ] Back up Active Storage if the release notes call out changes to upload handling.
  • [ ] Capture current state. Run helm get values <release> -n <ns> and save the output: those are the values actually in effect. Diff against your stored values.yaml to catch drift.
  • [ ] Confirm cluster capacity. A rolling upgrade temporarily doubles the application pod count; verify your namespace ResourceQuota allows it.
  • [ ] Plan a maintenance window if migrations are heavy. The release notes call out migrations expected to take longer than ~30 seconds.
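
A minimal sketch of the backup and drift-capture items above, assuming Bundled mode with an in-cluster Postgres pod; the pod, user, and database names are placeholders for your own:

# Dump the database from the bundled Postgres pod (custom format)
oc exec -n <namespace> <postgres-pod> -- \
  pg_dump -U <db-user> -Fc <db-name> > pre-upgrade.dump

# Sanity-check the archive. Listing its contents proves it is readable;
# only a test restore into a scratch database proves it is restorable.
pg_restore --list pre-upgrade.dump > /dev/null && echo "dump is readable"

# Capture the values actually in effect and diff against your stored copy
helm get values <release-name> -n <namespace> > running-values.yaml
diff running-values.yaml values.yaml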

Running the upgrade

From a published Helm repository

helm repo update
helm upgrade <release-name> <vendor>/<chart-name> \
  --namespace <namespace> \
  --values values.yaml \
  --version <new-chart-version>
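
Helm watches pre-upgrade hooks under its timeout (5 minutes by default) and aborts the upgrade if the hook has not finished. If the release notes flag a long migration, raise the ceiling with the standard --timeout flag; the 30m below is an arbitrary value, so size it against the release notes' estimate:

helm upgrade <release-name> <vendor>/<chart-name> \
  --namespace <namespace> \
  --values values.yaml \
  --version <new-chart-version> \
  --timeout 30m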

From a chart tarball

helm upgrade <release-name> <chart-name>-<new-version>.tgz \
  --namespace <namespace> \
  --values values.yaml

What happens during upgrade

  1. Helm renders the new templates against your values.yaml.
  2. The migration Job runs first (Helm pre-upgrade hook).
    • On success: Helm continues to step 3.
    • On failure: Helm aborts. The running deployment is unchanged. You have time to investigate without an outage. See "Migration failure recovery" below.
  3. Application Deployment rolls out the new image. By default, Kubernetes uses RollingUpdate strategy — one new pod comes up ready before an old pod is terminated, so there's no downtime for multi-replica deployments. Single-replica deployments have a brief (~30s) outage during the rollout.
  4. PostgreSQL / Redis subcharts (Bundled mode) upgrade only if the new chart pins different subchart versions or you changed their configuration in values.yaml. Check release notes for PostgreSQL major-version upgrades, which may require manual data migration.

A successful upgrade typically completes in 2–10 minutes; long migrations can stretch longer.
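
To watch the sequence live, run these from a second terminal. Both are standard oc commands against the resources the chart creates:

# Watch the pre-upgrade migration Job appear and complete
oc get jobs -n <namespace> -w

# Then follow the Deployment rollout until all pods run the new image
oc rollout status deploy/<release-name> -n <namespace>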

What persists across an upgrade

  • Database content (users, content, audit history, license assignments): preserved. Migrations are additive.
  • Active Storage uploads: preserved (the PVC carries helm.sh/resource-policy: keep).
  • White-label settings (brand name, logo, banner text, use-agreement, logging mode): preserved if the admin saved them via /admin/site_settings/edit. The brand-name field shows an "admin-locked" badge once saved; that lock survives upgrades and prevents future builds from overwriting your customization. If the brand still shows "build-provisioned," the next upgrade carrying a different build APP_NAME will change it, so click Save once on the form to lock it.
  • License assignments: preserved. The 14-day minimum-hold floor is enforced across upgrades; locked assignments stay locked at their original assigned_until timestamp.
  • values.yaml: preserved only if you keep supplying it. Pass --values values.yaml (as in the commands above) or --reuse-values; otherwise helm upgrade reverts unset values to chart defaults. Always re-read the release-notes diff between versions before passing fresh values.
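
To confirm the keep policy on the uploads volume, list the PVCs and read the annotation. The PVC name is chart-specific, so look it up first:

# Find the uploads PVC, then check its Helm resource-policy annotation
oc get pvc -n <namespace>
oc get pvc <uploads-pvc-name> -n <namespace> \
  -o jsonpath='{.metadata.annotations.helm\.sh/resource-policy}'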

Verifying the upgrade

# Helm release reports the new revision
helm list -n <namespace>

# Application reports the new build SHA at the bottom of the page
# in the running UI, or via:
oc get deploy/<release-name> -n <namespace> \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# Migration Job for this revision succeeded
oc get jobs -n <namespace>

# Pods are running the new image
oc get pods -n <namespace> -o wide

Sign in to the application and confirm the UI loads without errors. Smoke-test the workflows your team relies on.


Rollback

Rolling back the chart

helm rollback <release-name> <previous-revision> -n <namespace>

helm history <release-name> -n <namespace> shows the revision numbers.

Important: helm rollback reverts the chart manifests and re-applies the previous image. It does not roll back the database. If the upgrade ran a migration that altered schema or data, the previous image may not be able to run against the new schema.

Rolling back including the database

If the migration was destructive (e.g., dropped a column the old image still reads), you need to:

  1. helm rollback to the previous chart revision.
  2. Restore the database from the backup taken in the pre-upgrade checklist.
  3. Verify the application starts and reads data correctly.
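
A sketch of the restore path, assuming the custom-format pg_dump from the pre-upgrade checklist and a bundled Postgres pod; names are placeholders:

# Stop the application so nothing writes mid-restore
oc scale deploy/<release-name> -n <namespace> --replicas=0

# Stream the dump into pg_restore inside the Postgres pod
oc exec -i -n <namespace> <postgres-pod> -- \
  pg_restore -U <db-user> -d <db-name> --clean --if-exists < pre-upgrade.dump

# Bring the rolled-back application back up
oc scale deploy/<release-name> -n <namespace> --replicas=<replica-count>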

This is a manual procedure with downtime; release notes will explicitly call out when an upgrade is "rollback-safe" (no backwards-incompatible migrations) vs "forward-only" (rollback requires database restore).


Migration failure recovery

When the pre-upgrade migration Job fails, Helm aborts and the running deployment is unchanged. The failed Job remains for inspection.

Investigating

# Find the failed Job
oc get jobs -n <namespace>

# Read its logs
oc logs job/<release-name>-migrate -n <namespace>

# Read the description for events
oc describe job/<release-name>-migrate -n <namespace>

Common causes:

  • Insufficient database privileges. The migration Job runs as the application's database user; if the new release requires schema changes (CREATE TABLE, ALTER TABLE), the user must have those privileges (see the sketch after this list).
  • Lock timeout. A heavy concurrent query held a lock the migration needed. Retry after the contending workload completes.
  • Validation failure. Migration tried to enforce a constraint on existing data that doesn't satisfy it. Release notes will call out required pre-migration data cleanup.
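
For the privileges case, a hedged example of the fix; the role and schema names are placeholders, and you should grant only what the release notes say the migration needs:

# CREATE TABLE needs CREATE on the schema; ALTER TABLE additionally
# requires that the role own the table being altered.
oc exec -n <namespace> <postgres-pod> -- \
  psql -U postgres -d <db-name> \
  -c 'GRANT CREATE ON SCHEMA public TO <app-db-user>;'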

Retrying

  1. Fix the root cause (grant privileges, wait out the lock, clean up data).
  2. Delete the failed Job: oc delete job/<release-name>-migrate -n <namespace>
  3. Re-run the upgrade: helm upgrade ...

The Job is a Helm hook with hook-delete-policy: before-hook-creation,hook-succeeded; before-hook-creation means Helm would also clear the failed Job automatically on the next attempt, so deleting it manually before the retry is safe and expected.
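
To see those annotations exactly as rendered for your release:

# Print the rendered hook manifests, annotations included
helm get hooks <release-name> -n <namespace>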


Schema-dependent code in releases

A release's image bundles two things together: the new code and the new migration files. The chart sequences them so migrations run before the deployment rolls the new image (see "What happens during upgrade" above) — the pre-upgrade hook is what makes this work. There are two failure modes worth understanding so you don't accidentally take down a deployment that would otherwise be fine:

Initializer that reads a not-yet-migrated column

If a Rails initializer (anything in config/initializers/) reads a column that a migration in the same release is about to add, the new image will raise NoMethodError on the missing column during Rails boot. With pre-upgrade migrations this isn't a problem, since the column exists by the time the deployment rolls, but the same code running under bin/rails runner or any other out-of-band invocation against an unmigrated DB will crash. The host's initializers guard against this with a column-presence check:

# Excerpt from a config/initializers file. Runs inside the initializer's
# block form (e.g. Rails.application.config.to_prepare), where `next`
# exits the block early instead of raising at boot.
unless SiteSetting.column_names.include?("app_name_source")
  Rails.logger.info "[SiteSetting] column not present yet; deferring"
  next
end

The pattern: any initializer touching a recently-added column should gate on column_names.include?(...) so a Rails boot against an older schema (asset precompile, debug pod, manual rails console against a restored backup) doesn't crash. The standard rescue ActiveRecord::StatementInvalid / NoDatabaseError chain catches the missing-table case but not the missing-column-on-existing-table case — that surfaces as plain NoMethodError on the AR record getter.

Destructive migrations during a rolling upgrade

Pre-upgrade migrations run while the old application pods are still serving traffic. Additive changes (new column, new table, new index) are safe: the old code simply ignores the new fields. But a destructive migration (drop column, rename column, change column type, add a NOT NULL constraint to existing data) can break the old pods mid-upgrade.

The canonical safe pattern for destructive schema changes is the two-deploy approach:

  1. Ship a release that stops reading the column (or starts writing the new shape and reads either old or new). Deploy. Verify all pods are running this code.
  2. Ship a release with the migration that drops/renames/retypes the column. Now no running code references the dropped shape.

Release notes call out destructive migrations explicitly. If you see a "this release contains a destructive schema change" line, plan for the two-deploy sequence rather than a single helm upgrade.
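
As a sketch, with placeholder version numbers, the sequence is two ordinary upgrades with a verification gate between them:

# Deploy 1: code that no longer reads the doomed column
helm upgrade <release-name> <vendor>/<chart-name> -n <namespace> \
  --values values.yaml --version <version-N>
oc rollout status deploy/<release-name> -n <namespace>  # gate: every pod on N

# Deploy 2: the migration that drops/renames/retypes the column
helm upgrade <release-name> <vendor>/<chart-name> -n <namespace> \
  --values values.yaml --version <version-N-plus-1>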


Upgrading between modes

See two-modes.md §"Migrating from Bundled to Production". This is a substantive data-migration exercise, not a routine upgrade — plan it as a project rather than a helm upgrade.


Version-specific upgrade notes

Release-specific notes — breaking changes, renamed values, required data cleanups — accumulate here as the chart evolves. Always read the entries between your current chart version and the target before running helm upgrade.

(No version-specific notes yet — current chart line is 0.1.x.)