CLOUDP-306333: Remove monitoring hosts on downscaling #652

Julien-Ben · 2025-12-17T16:02:03Z

Summary

Fixes CLOUDP-306333: When scaling down, removed hosts keep appearing in OM UI and monitoring agents keep trying to reach them.

Problem

The operator was not sending DELETE requests to the /hosts endpoint when scaling down. This affected multiple deployment types (the ticket was initially opened for AppDB).

Solution

On each reconcile, we now:

Get all monitored hosts from OM API
Compute desired hosts from the current desired state
Remove hosts that are monitored but not desired

When fetching monitored hosts, we rely on the assumption that one OM project = one deployment.
The goal of this design is to be idempotent. If the operator crashes in the middle of a reconciliation, the next reconcile still compares what we have (OM state) with what we want.
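
To illustrate the "monitored but not desired" step, here is a minimal sketch of the set difference. The function name `undesiredHosts` and the plain string slices are assumptions for illustration only; they are not the operator's actual types or the real signature of `RemoveUndesiredMonitoringHosts`.

```go
package main

import "sort"

// undesiredHosts is a hypothetical helper: it returns every hostname that is
// currently monitored (as reported by OM) but absent from the desired state.
// The result is sorted so that logs and tests are deterministic.
func undesiredHosts(monitored, desired []string) []string {
	// Index the desired hostnames for O(1) membership checks.
	want := make(map[string]struct{}, len(desired))
	for _, h := range desired {
		want[h] = struct{}{}
	}

	var out []string
	for _, h := range monitored {
		if _, ok := want[h]; !ok {
			out = append(out, h)
		}
	}
	sort.Strings(out)
	return out
}
```

Because the inputs are only the OM state and the desired state, calling this again after a crash yields the same result, with no dependency on a "members before" count kept in the reconcile loop.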

Some previous approaches in controllers were doing a diff inside the reconciliation loop itself.
For example in RS controller:

hostsBefore := getAllHostsForReplicas(rs, membersNumberBefore)
hostsAfter := getAllHostsForReplicas(rs, scale.ReplicasThisReconciliation(rs))

Decisions to highlight

We do not error out when host removal fails, only log a warning.
If a cluster is unreachable (unhealthy), we do not remove its hosts from monitoring.
The idempotent approach implies that if a host is added to monitoring manually (outside of Kubernetes), the operator will remove it on the next reconciliation.
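
The first two decisions can be sketched as a best-effort removal loop. Everything named here (`hostRemover`, `host`, `removeHostsBestEffort`, `fakeRemover`) is hypothetical and only illustrates the behavior described above; in particular, skipping hosts by cluster name is an assumed way of modeling "do not remove hosts of an unreachable cluster", not the operator's actual code.

```go
package main

import (
	"errors"
	"log"
)

// hostRemover is a hypothetical stand-in for the part of the OM client
// that sends the DELETE request for a monitored host.
type hostRemover interface {
	RemoveHost(hostname string) error
}

// host pairs a monitored hostname with the member cluster it runs in
// (assumed shape, for illustration).
type host struct {
	Hostname string
	Cluster  string
}

// removeHostsBestEffort removes undesired hosts, skipping clusters marked
// unhealthy and logging a warning (instead of returning an error) when a
// removal fails. It returns the number of hosts actually removed.
func removeHostsBestEffort(c hostRemover, undesired []host, unhealthy map[string]bool) int {
	removed := 0
	for _, h := range undesired {
		if unhealthy[h.Cluster] {
			// The cluster is unreachable: keep its hosts in monitoring.
			continue
		}
		if err := c.RemoveHost(h.Hostname); err != nil {
			// Warn only; the reconcile is not failed and the next one retries.
			log.Printf("warning: failed to remove host %s from monitoring: %v", h.Hostname, err)
			continue
		}
		removed++
	}
	return removed
}

// fakeRemover is a test double that fails for selected hostnames.
type fakeRemover struct {
	failOn  map[string]bool
	removed []string
}

func (f *fakeRemover) RemoveHost(hostname string) error {
	if f.failOn[hostname] {
		return errors.New("simulated OM API failure")
	}
	f.removed = append(f.removed, hostname)
	return nil
}
```

Since cleanup runs on every reconcile, a host whose removal failed is simply picked up again next time, which is why a warning is enough here.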

New Tests

Unit tests for GetAllMonitoredHostnames and RemoveUndesiredMonitoringHosts
E2E tests verify host count after scale-down

Proof of Work

Tests pass.

Checklist

Have you linked a Jira ticket and/or is the ticket in the title?
Have you checked whether your Jira ticket required DOCSP changes?
Have you added a changelog file?

github-actions · 2025-12-17T16:02:58Z

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.6.2 Release Notes

Bug Fixes

Fixed an issue where hosts were not removed from Ops Manager monitoring when scaling down MongoDB or AppDB deployments.

Julien-Ben added 18 commits December 15, 2025 17:01

Add unit tests for host removal on downscaling

46aefdf

E2E test for host removal

1995665

Modify tests

9c914a5

Update mocked OM client

a01ff0e

Shared function

918bf9e

Fix for AppDB

a88fb9b

Fix for MC RS

fce8fcf

Fix for RS

0c6ef10

Fix for sharded clusters

5e03870

Changelog entry

8a9ce8d

Fix E2E test expected host count for multi-cluster mode

10b5b5f

Cleanup hosts on every reconciliation

f0f7932

RemoveUndesiredMonitoringHosts

e5bf6ea

Orphaned hosts test

84dd908

Rely on constraint one OM project = one deployment

c2947fb

Rely on OM API for sharded too

aa35ff1

Cleanup hosts in Multi replica set controller

cf3990a

Warn only when monitoring removal fails

a57ce56
