# Upgrade procedure
Upgrades use `helm upgrade` against the same release name and
namespace you installed with. The chart's migration Job runs first
as a Helm `pre-upgrade` hook; if it fails, the upgrade aborts and
the old deployment keeps running. There is no half-upgraded state.
## When to upgrade
- Build expiration approaching. Each image carries a 45-day build expiration; the application warns at 14 days and refuses to start past expiration. Plan upgrades on a cadence comfortably inside that window — typically every 30 days.
- Security patches. Vendor releases out-of-band when a CVE affects a shipped component. Subscribe to release notifications.
- Feature releases. Quarterly minor versions bring new functionality and chart values; check the release notes for values you may want to set.
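The 45-day expiration window above can be turned into a quick check. A minimal sketch, assuming GNU `date`; the `BUILD_DATE` value is a hypothetical placeholder, read the real build date from wherever your vendor stamps it (image label, version page):

```shell
# Sketch only: how many days remain before a build's 45-day expiration.
EXPIRY_DAYS=45
WARN_DAYS=14

days_left() {
  # $1 = build date, YYYY-MM-DD
  built=$(date -d "$1" +%s)
  now=$(date +%s)
  echo $(( EXPIRY_DAYS - (now - built) / 86400 ))
}

left=$(days_left "2024-01-01")   # hypothetical build date
if [ "$left" -le 0 ]; then
  echo "expired: the application will refuse to start"
elif [ "$left" -le "$WARN_DAYS" ]; then
  echo "warning window: $left days left; upgrade now"
else
  echo "ok: $left days until build expiration"
fi
```

Wiring this into monitoring keeps the "typically every 30 days" cadence honest rather than relying on someone noticing the in-app warning.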
## Pre-upgrade checklist
Before running `helm upgrade`:
- [ ] Read the release notes for every version between your current and target. Pay attention to:
  - Required `values.yaml` changes (new required fields, renamed fields, deprecations).
  - Database migration scope (any long-running migrations? data backfills?).
  - Feature flags that change default behavior.
- [ ] Back up the database. Run a `pg_dump` (or use your Postgres operator's backup mechanism) and verify the dump is restorable. The migration Job is irreversible without a backup.
- [ ] Back up Active Storage if the release notes call out changes to upload handling.
- [ ] Capture current state. Run `helm get values <release> -n <ns>` and save a copy of the actual running values. Diff against your stored `values.yaml` to catch drift.
- [ ] Confirm cluster capacity. A rolling upgrade temporarily doubles the application pod count; verify your namespace ResourceQuota allows it.
- [ ] Plan a maintenance window if migrations are heavy. The release notes call out migrations expected to take longer than ~30 seconds.
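The capture and backup steps above lend themselves to a script. A sketch, assuming database connectivity via a `DATABASE_URL` environment variable; the function names are illustrative, not part of the chart:

```shell
# Sketch: pre-upgrade state capture. Function names are illustrative.
capture_state() {
  release=$1 ns=$2
  # Save the values actually in effect, then diff against the stored file.
  helm get values "$release" -n "$ns" --all > "running-values-$(date +%F).yaml"
  diff "running-values-$(date +%F).yaml" values.yaml \
    || echo "drift detected; reconcile before upgrading"
}

backup_db() {
  # Custom-format dump so pg_restore can restore it later.
  pg_dump --format=custom --file="backup-$(date +%F).dump" "$DATABASE_URL"
}
```

Keeping the dated files together makes it obvious which dump pairs with which upgrade attempt if you need the restore path later.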
## Upgrade procedure
### From a published Helm repository
```shell
helm repo update
helm upgrade <release-name> <vendor>/<chart-name> \
  --namespace <namespace> \
  --values values.yaml \
  --version <new-chart-version>
```
### From a chart tarball
```shell
helm upgrade <release-name> <chart-name>-<new-version>.tgz \
  --namespace <namespace> \
  --values values.yaml
```
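If you prefer the upgrade to undo itself on failure, Helm's `--atomic` and `--timeout` flags can wrap either form. A sketch, wrapped in a function so nothing runs until you call it:

```shell
# Sketch: upgrade that waits for resources to become ready and rolls
# back automatically if the upgrade fails.
safe_upgrade() {
  release=$1 chart=$2 version=$3 ns=$4
  helm upgrade "$release" "$chart" \
    --namespace "$ns" \
    --values values.yaml \
    --version "$version" \
    --atomic \
    --timeout 15m   # leave headroom for the migration hook
}
```

Note that an automatic rollback reverts manifests only; as with manual `helm rollback`, it does not restore the database.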
## What happens during upgrade
1. Helm renders the new templates against your `values.yaml`.
2. The migration Job runs first (Helm `pre-upgrade` hook).
   - On success: Helm continues to step 3.
   - On failure: Helm aborts. The running deployment is unchanged. You have time to investigate without an outage. See "Migration failure recovery" below.
3. The application Deployment rolls out the new image. By default, Kubernetes uses the `RollingUpdate` strategy: one new pod comes up ready before an old pod is terminated, so there's no downtime for multi-replica deployments. Single-replica deployments have a brief (~30s) outage during the rollout.
4. PostgreSQL / Redis subcharts (Bundled mode) upgrade only if their chart version changed in your `values.yaml`. Check release notes for Postgres major-version upgrades, which may require manual data migration.
A successful upgrade typically completes in 2–10 minutes; long migrations can stretch longer.
## What persists across an upgrade
| State | Behavior on upgrade |
|---|---|
| Database content (users, content, audit history, license assignments) | Preserved. Migrations are additive. |
| Active Storage uploads | Preserved (PVC has `helm.sh/resource-policy: keep`). |
| White-label brand name, logo, banner text, use-agreement, logging mode | Preserved if the admin saved them via `/admin/site_settings/edit`. The brand-name field shows an "admin-locked" badge once saved; that lock survives upgrades and prevents future builds from overwriting your customization. If the brand still shows "build-provisioned", the next upgrade carrying a different build `APP_NAME` will change it; click Save once on the form to lock it. |
| License assignments | Preserved. The 14-day minimum-hold floor is enforced across upgrades: locked assignments stay locked at their original `assigned_until` timestamp. |
| `values.yaml` | Preserved by Helm (`--reuse-values` if you don't pass new ones). Always re-read the release-notes diff between versions before passing fresh `--values`. |
## Verifying the upgrade
```shell
# Helm release reports the new revision
helm list -n <namespace>

# Application reports the new build SHA at the bottom of the page
# in the running UI, or via:
oc get deploy/<release-name> -n <namespace> \
  -o jsonpath='{.spec.template.spec.containers[0].image}'

# Migration Job for this revision succeeded
oc get jobs -n <namespace>

# Pods are running the new image
oc get pods -n <namespace> -o wide
```
Sign in to the application and confirm the UI loads without errors. Smoke-test the workflows your team relies on.
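A minimal smoke-test loop you can adapt for the "UI loads without errors" check; the URL, retry count, and interval are placeholders:

```shell
# Sketch: poll the application until it answers HTTP 200, up to ~1 minute.
smoke_test() {
  url=$1
  for _ in $(seq 1 12); do
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || echo 000)
    if [ "$code" = "200" ]; then
      echo "ok: $url is serving"
      return 0
    fi
    sleep 5
  done
  echo "failed: $url not healthy after 60s" >&2
  return 1
}
```

This only proves the front door opens; it does not replace smoke-testing the workflows your team relies on.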
## Rollback
### Rolling back the chart
```shell
helm rollback <release-name> <previous-revision> -n <namespace>
```
`helm history <release-name> -n <namespace>` shows the revision numbers.
**Important:** `helm rollback` reverts the chart manifests and re-applies the previous image. It does not roll back the database. If the upgrade ran a migration that altered schema or data, the previous image may not be able to run against the new schema.
### Rolling back including the database
If the migration was destructive (e.g., dropped a column the old image still reads), you need to:

1. `helm rollback` to the previous chart revision.
2. Restore the database from the backup taken in the pre-upgrade checklist.
3. Verify the application starts and reads data correctly.
This is a manual procedure with downtime; release notes will explicitly call out when an upgrade is "rollback-safe" (no backwards-incompatible migrations) vs "forward-only" (rollback requires database restore).
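The forward-only recovery can be sketched as a pair of steps. The `pg_restore` invocation and the `DATABASE_URL` convention are assumptions; prefer your Postgres operator's restore flow if you have one:

```shell
# Sketch: chart rollback followed by database restore from the
# pre-upgrade dump. Expect downtime while this runs.
rollback_forward_only() {
  release=$1 revision=$2 ns=$3 dump=$4
  # Stop the app so nothing writes to the database during the restore.
  oc scale "deploy/$release" -n "$ns" --replicas=0
  # Replace the schema and data with the pre-upgrade dump.
  # --clean --if-exists drops objects before recreating them.
  pg_restore --clean --if-exists --dbname="$DATABASE_URL" "$dump"
  # Roll the chart back; this also restores the replica count.
  helm rollback "$release" "$revision" -n "$ns"
}
```

Rehearse this once against a throwaway namespace before you ever need it under pressure.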
## Migration failure recovery
When the pre-upgrade migration Job fails, Helm aborts and the
running deployment is unchanged. The failed Job remains for
inspection.
### Investigating
```shell
# Find the failed Job
oc get jobs -n <namespace>

# Read its logs
oc logs job/<release-name>-migrate -n <namespace>

# Read the description for events
oc describe job/<release-name>-migrate -n <namespace>
```
Common causes:
- Insufficient database privileges. The migration Job runs as the application's database user; if the new release requires schema changes (`CREATE TABLE`, `ALTER TABLE`), the user must have those privileges.
- Lock timeout. A heavy concurrent query held a lock the migration needed. Retry after the contending workload completes.
- Validation failure. The migration tried to enforce a constraint that existing data doesn't satisfy. Release notes will call out required pre-migration data cleanup.
### Retrying
1. Fix the root cause (grant privileges, wait out the lock, clean up data).
2. Delete the failed Job: `oc delete job/<release-name>-migrate -n <namespace>`
3. Re-run the upgrade: `helm upgrade ...`
The Job is a Helm hook with `hook-delete-policy: before-hook-creation,hook-succeeded`, so deletion before retry is the expected workflow.
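The retry is mechanical enough to script once the root cause is fixed. A sketch, assuming the Job name follows the `<release>-migrate` pattern shown above:

```shell
# Sketch: delete the failed hook Job, then retry the upgrade.
retry_upgrade() {
  release=$1 chart=$2 ns=$3
  # --ignore-not-found makes the retry idempotent if the Job was
  # already cleaned up.
  oc delete "job/${release}-migrate" -n "$ns" --ignore-not-found
  helm upgrade "$release" "$chart" -n "$ns" --values values.yaml
}
```

Only run this after the root cause is addressed; retrying into the same failure just produces another dead Job.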
## Schema-dependent code in releases
A release's image bundles two things together: the new code and the
new migration files. The chart sequences them so migrations run
before the deployment rolls the new image (see "What happens
during upgrade" above) — the pre-upgrade hook is what makes this
work. There are two failure modes worth understanding so you don't
accidentally take down a deployment that would otherwise be fine:
### Initializer that reads a not-yet-migrated column
If a Rails initializer (anything in `config/initializers/`) reads a column that the migration in the same release is about to add, the new image's Rails boot will raise `NoMethodError` on the missing column. With pre-upgrade migrations this isn't a problem (the column exists by the time the deployment rolls), but the same code running in a `bin/rails runner` or any out-of-band invocation against an unmigrated DB will crash. The host's initializers guard against this with a column-presence check:
```ruby
unless SiteSetting.column_names.include?("app_name_source")
  Rails.logger.info "[SiteSetting] column not present yet; deferring"
  next
end
```
The pattern: any initializer touching a recently added column should gate on `column_names.include?(...)` so a Rails boot against an older schema (asset precompile, debug pod, manual `rails console` against a restored backup) doesn't crash. The standard `rescue ActiveRecord::StatementInvalid` / `NoDatabaseError` chain catches the missing-table case but not the missing-column-on-existing-table case; that surfaces as a plain `NoMethodError` on the ActiveRecord getter.
### Destructive migrations during a rolling upgrade
`pre-upgrade` migrations run while the old application pods are still serving traffic. Additive changes (new column, new table, new index) are safe: the old code simply ignores the new fields. But a destructive migration (drop column, rename column, change column type, add a `NOT NULL` constraint to existing data) can break the old pods mid-upgrade.
The canonical safe pattern for destructive schema changes is the two-deploy approach:

1. Ship a release that stops reading the column (or starts writing the new shape and reads either old or new). Deploy. Verify all pods are running this code.
2. Ship a release with the migration that drops/renames/retypes the column. Now no running code references the dropped shape.
Release notes call out destructive migrations explicitly. If you see a "this release contains a destructive schema change" line, plan for the two-deploy sequence rather than a single `helm upgrade`.
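Operationally, the two-deploy sequence is just two gated `helm upgrade` runs. A sketch; the chart reference and `<N>` / `<N+1>` versions are placeholders:

```shell
# Sketch: two-deploy sequence for a destructive schema change.
two_deploy() {
  release=$1 ns=$2
  # Deploy 1: release <N> ships code that no longer reads the doomed
  # column. --wait blocks until the rollout is ready.
  helm upgrade "$release" vendor/chart --version "<N>" \
    -n "$ns" --values values.yaml --wait
  # Confirm every pod is on the new code before shipping the migration.
  oc rollout status "deploy/$release" -n "$ns"
  # Deploy 2: release <N+1> carries the destructive migration; no
  # running code references the old shape anymore.
  helm upgrade "$release" vendor/chart --version "<N+1>" \
    -n "$ns" --values values.yaml --wait
}
```

The `oc rollout status` gate is the important part: skipping it collapses the two deploys back into the unsafe single-upgrade case.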
## Upgrading between modes
See `two-modes.md` §"Migrating from Bundled to Production". This is a substantive data-migration exercise, not a routine upgrade; plan it as a project rather than a `helm upgrade`.
## Version-specific upgrade notes
Release-specific notes (breaking changes, renamed values, required data cleanups) accumulate here as the chart evolves. Always read the entries between your current chart version and the target before running `helm upgrade`.
(No version-specific notes yet — current chart line is 0.1.x.)