Red Hat Developer Sandbox — gotchas
The Sandbox is Red Hat's free 30-day-renewable shared OpenShift
cluster (https://sandbox.redhat.com). It's the cheapest place to
run a Bundled-mode install for evaluation or for our own end-to-end
validation runs against a real OpenShift, but it has a handful of
behaviors that don't apply on a customer-managed OpenShift cluster
and have cost real recovery time during validation work.
This doc captures them so the next pass through the validation queue doesn't have to rediscover them. None of these issues affect customer deployments — they're Sandbox-tenant policy, not chart or application defects.
member-operator owns .spec.replicas
The Sandbox runs a controller called member-operator that scales idle
developer workloads down to zero. It claims field ownership of
.spec.replicas on every Deployment it watches via Kubernetes
server-side apply. After the first scale-down, any subsequent
helm upgrade that tries to set replicas — even implicitly via
replicaCount in the chart values — fails with:
UPGRADE FAILED: conflict occurred while applying object
caldredge-dev/<release>-deltamap-host: conflict with "member-operator"
using apps/v1: .spec.replicas
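To confirm who owns the field before picking a workaround, the
server-side-apply field managers can be inspected directly. A quick check
(deployment name and namespace are placeholders; the grep is just a
convenience, and the flag needs a reasonably recent oc):

```
# Lists field managers on the Deployment; member-operator should show up
# against f:spec -> f:replicas in the managedFields output.
oc get deployment <release>-deltamap-host -n <ns> -o yaml --show-managed-fields \
  | grep -B2 -A6 member-operator
```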
Workarounds:
- Don't set `replicaCount` in `helm upgrade --set` flags. If you need to
  scale, use `oc scale deployment/<name> --replicas=N` — that command takes
  ownership of the field cleanly without triggering the server-side-apply
  conflict.
- Use `helm upgrade --reuse-values` with no `--set replicaCount=...`. The
  replica field stays whatever member-operator (or your last `oc scale`)
  set it to.
- `--force` on helm discards the field-manager state but is destructive of
  other in-flight changes; not recommended.
This isn't a problem on a customer-managed OpenShift — there's no
external controller fighting .spec.replicas.
Idle-eviction to zero
After ~24h of no traffic to the Route, member-operator scales the
Deployment + StatefulSets to 0 replicas. PVCs are kept; helm
release status stays deployed. When you come back, you'll see:
oc get pods -n <namespace> # → No resources found
oc get deploy -n <namespace> # → READY 0/0 AVAILABLE 0
Recovery:
oc scale deployment/<release>-deltamap-host --replicas=1 -n <ns>
oc scale statefulset/extdb-postgresql --replicas=1 -n <ns> # if Bundled mode
oc scale statefulset/extredis-master --replicas=1 -n <ns>
The application pod won't go Ready until postgres + redis are back —
the wait-for-postgres init container handles this gracefully but
the app's normal liveness probe will start failing after ~60 s on a
slow Sandbox node, so don't panic if you see 0/1 Running for a
minute or two while the data-store pods finish booting.
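If the namespace holds more workloads than the three above, or the names
drift between releases, a bulk re-scale loop works too (sketch; assumes
everything in the namespace is meant to run at one replica):

```
# Scale every Deployment and StatefulSet in the namespace back to 1 replica,
# then watch until the app pod reports Ready.
NS=<namespace>
for obj in $(oc get deploy,statefulset -n "$NS" -o name); do
  oc scale "$obj" --replicas=1 -n "$NS"
done
oc get pods -n "$NS" -w
```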
Token TTL is short
The oc login --token=... session is good for a few hours, then the
cluster starts returning:
the server has asked for the client to provide credentials
Refresh: open
https://oauth-openshift.apps.<cluster>.openshiftapps.com/oauth/token/display
in a browser; it shows a fresh oc login command line you can paste.
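A cheap way to check whether the current session is still valid before
starting a long helm run (sketch):

```
# oc whoami fails once the token has expired.
if ! oc whoami >/dev/null 2>&1; then
  echo "oc token expired; grab a fresh login command from the OAuth page above" >&2
fi
```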
oc debug doesn't help when the initializer crashes
oc debug deploy/<name> -- <command> clones the deployment's pod
spec but overrides the command. Useful for one-off bin/rails
invocations. Limit: the override only changes the command, not
the boot path. bin/rails db:migrate still has to load
Rails.application.config.environment, which loads every
initializer. If an initializer crashes (e.g. reads a column that
doesn't exist yet), oc debug -- bin/rails db:migrate fails the same
way the deployment's normal pod fails.
The runbook oc debug is right for: connecting to a working DB,
rendering a one-off report, exec'ing an isolated maintenance task.
Wrong for: bypassing a boot-time crash to "just run migrations" —
the boot is the crash.
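For the cases it is right for, a typical invocation looks like this
(release name is a placeholder; `db:migrate:status` stands in for whatever
one-off task you actually need):

```
# Throwaway clone of the app pod, reporting migration status against the live DB.
oc debug deploy/<release>-deltamap-host -n <ns> -- bin/rails db:migrate:status
```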
Wedged-release recovery
When helm upgrade times out waiting for a crashlooping deployment
to become Ready, the release lands in STATUS: pending-upgrade and
no further upgrade can run until it's cleared.
Cleanest path on Sandbox:
helm rollback <release> <last-good-revision> -n <ns> --no-hooks
--no-hooks skips the post-rollback migration Job. Useful for
clearing the wedge fast; you'd typically follow with a real
helm upgrade once the underlying issue (image bug, missing migration,
broken values) is resolved.
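The <last-good-revision> comes from the release history; the newest revision
whose status is deployed is usually the one you want (names are placeholders):

```
helm history <release> -n <ns>     # newest revision with STATUS "deployed" is the target
helm rollback <release> <revision> -n <ns> --no-hooks
helm status <release> -n <ns>      # should report STATUS: deployed again
```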
If rollback can't recover (very stale schema vs. very new code), full uninstall + reinstall is the Sandbox-only escape hatch:
helm uninstall <release> <db-release> <redis-release> -n <ns>
oc delete pvc --all -n <ns>   # only if you want a clean DB
helm install <db-release> ... -f values.yaml
helm install <redis-release> ... -f values.yaml
helm install <release> ./<chart-path> -f values.yaml
This is destructive — loses application data, license file, admin user. Acceptable on Sandbox; never the right call on a customer-managed cluster.
Pull policy + the :latest tag
The Sandbox install uses image.tag: latest and pullPolicy:
Always. With those defaults, oc rollout restart always pulls the
newest digest under :latest. On customer clusters the recommended
pattern is to pin to a specific version tag (e.g. :v0.2.3) and
bump it via helm upgrade --set image.tag=v0.2.4 — gives proper
release boundaries and rollback semantics. The :latest flow is fine
for our own validation but not what a customer should run.
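As chart values, the customer-side pattern would look roughly like this
(the tag is illustrative; key names follow the `image.tag` / `pullPolicy`
values already used above, but check the chart's values.yaml for the exact
nesting):

```yaml
image:
  tag: v0.2.3              # pinned; bump deliberately via helm upgrade --set image.tag=...
  pullPolicy: IfNotPresent
```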
Reauth + token rotation
The Sandbox's oc token rotates every ~24h. Scripts that exec
oc commands need to re-login when their token expires. Symptom:
memcache.go:265 ... unhandled Error ... the server has asked for
the client to provide credentials on the first oc call after a
gap. Re-run oc login per the "Token TTL" section above.
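For scripts, a thin guard in front of each batch of oc calls is usually
enough; it can't silently re-auth (the fresh token has to come from the
OAuth page), but it fails fast with a clear message instead of
half-completing a run (sketch):

```
# Refuse to run oc commands once the session has expired.
oc_checked() {
  if ! oc whoami >/dev/null 2>&1; then
    echo "oc session expired; re-run the login command from the OAuth token page" >&2
    return 1
  fi
  oc "$@"
}

oc_checked get pods -n <namespace>
```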
What this doc is not
These are Sandbox-specific issues. None of them apply to a customer running OpenShift on their own infrastructure:
| Sandbox issue | Customer-managed OpenShift |
|---|---|
| member-operator `.spec.replicas` ownership | No such controller |
| Idle-eviction to zero | No idle eviction |
| 24h token TTL | Customer-controlled (typically days/weeks via cluster auth integration) |
| `:latest` image tag | Customers should pin to specific versions |
So when validation runs surface a Sandbox-specific behavior, log it
here, not in troubleshooting.md — the customer-facing docs stay
focused on the customer environment.