@Julien-Ben (Collaborator) commented Dec 17, 2025

Summary

Fixes CLOUDP-306333: When scaling down, removed hosts keep appearing in OM UI and monitoring agents keep trying to reach them.

Problem

The operator was not sending DELETE requests to the /hosts endpoint when scaling down. This affected multiple deployment types (the ticket was initially opened for AppDB).

Solution

On each reconcile, we now:

  1. Get all monitored hosts from OM API
  2. Compute desired hosts from the current desired state
  3. Remove hosts that are monitored but not desired

When fetching monitored hosts, we rely on the assumption that one OM project = one deployment.
The goal of this design is to be idempotent. If the operator crashes in the middle of a reconciliation, the next reconciliation still compares what we have (OM state) with what we want, so no removal is lost.
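
As an illustration of the three steps above, here is a minimal sketch of the "monitored but not desired" comparison as a plain set difference. The function name `hostsToRemove` and the hostnames are hypothetical and are not taken from the operator's code; the real implementation fetches the monitored list from the OM API.

```go
package main

import (
	"fmt"
	"sort"
)

// hostsToRemove returns the hostnames that are currently monitored but are
// no longer part of the desired state. Hypothetical helper for illustration.
func hostsToRemove(monitored, desired []string) []string {
	// Index the desired hosts for O(1) membership checks.
	want := make(map[string]struct{}, len(desired))
	for _, h := range desired {
		want[h] = struct{}{}
	}

	var stale []string
	for _, h := range monitored {
		if _, ok := want[h]; !ok {
			stale = append(stale, h)
		}
	}
	// Sort so the result does not depend on the order returned by the API.
	sort.Strings(stale)
	return stale
}

func main() {
	// Made-up hostnames: a replica set scaled down from 3 to 2 members.
	monitored := []string{"rs-0.svc", "rs-1.svc", "rs-2.svc"}
	desired := []string{"rs-0.svc", "rs-1.svc"}
	fmt.Println(hostsToRemove(monitored, desired))
}
```

Because the function only looks at the two input lists and keeps no state between calls, running it again after a crash yields the same result until the stale hosts are actually gone.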

Some earlier approaches in the controllers computed the diff inside the reconciliation loop itself, based on the member count before and after scaling.
For example, in the RS controller:

hostsBefore := getAllHostsForReplicas(rs, membersNumberBefore)
hostsAfter := getAllHostsForReplicas(rs, scale.ReplicasThisReconciliation(rs))

Decisions to highlight

  • We do not error out when host removal fails, only log a warning.
  • If a cluster is unreachable (unhealthy), we do not remove its hosts from monitoring.
  • The idempotent approach implies that if a host is added to monitoring manually (outside of Kubernetes), the operator will clean it up on reconciliation.

New Tests

  • Unit tests for GetAllMonitoredHostnames and RemoveUndesiredMonitoringHosts
  • E2E tests verify host count after scale-down

Proof of Work

Tests pass.

Checklist

  • Have you linked a Jira ticket and/or is the ticket in the title?
  • Have you checked whether your Jira ticket required DOCSP changes?
  • Have you added a changelog file?

@github-actions

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.6.2 Release Notes

Bug Fixes

  • Fixed an issue where hosts were not removed from Ops Manager monitoring when scaling down MongoDB or AppDB deployments.
