mirror of https://github.com/helm/helm
The previous fix (increasing timeout / reducing deletion delay) did not
work because the flakiness is not a timing problem at all.
Root cause: fluxcd/cli-utils' HasSynced() returns true as soon as the
initial list item is *popped* from DeltaFIFO, which is before AddFunc
delivers the ResourceUpdateEvent to the collector. This creates a race in
which the SyncEvent can reach the statusObserver *before* the pod's
Current status has been recorded. When that happens:
- statusObserver sees pod as Unknown
- Unknown is skipped for WaitForDelete (by design, to handle resources
that were already deleted before watching started)
- AggregateStatus([], NotFoundStatus) == NotFoundStatus → cancel()
- The watch context is cancelled before DeleteFunc can fire
- Final check: pod still Current → error
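A minimal self-contained sketch of the failure mode above, using toy types
(not the real cli-utils API or Helm identifiers): Unknown statuses are
skipped for WaitForDelete, so an observer that only sees Unknown aggregates
to the NotFound default and triggers cancel() prematurely.

```go
package main

import "fmt"

// Status is a toy stand-in for cli-utils kstatus status values.
type Status string

const (
	Unknown  Status = "Unknown"
	Current  Status = "Current"
	NotFound Status = "NotFound"
)

// aggregateForDelete mirrors the WaitForDelete aggregation described above:
// Unknown resources are skipped (they may have been deleted before watching
// started), and an empty remainder aggregates to the NotFound default.
func aggregateForDelete(statuses []Status) Status {
	var relevant []Status
	for _, s := range statuses {
		if s == Unknown {
			continue // skipped by design for WaitForDelete
		}
		relevant = append(relevant, s)
	}
	if len(relevant) == 0 {
		return NotFound // AggregateStatus([], NotFoundStatus)
	}
	// Any resource still present keeps the aggregate at Current.
	return Current
}

func main() {
	// Race outcome: the SyncEvent arrives before the pod's Current status
	// is recorded, so the observer still sees Unknown -> NotFound -> cancel().
	fmt.Println(aggregateForDelete([]Status{Unknown}))

	// Intended order: the pod is recorded Current first, so the wait
	// continues until DeleteFunc fires.
	fmt.Println(aggregateForDelete([]Status{Current}))
}
```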
The test intent is to verify that waitForDeleteCtx (not the cancelled
generalCtx) is selected. A non-existent resource satisfies this:
- With waitForDeleteCtx=Background(): informer syncs with empty list
→ Unknown → cancel → success ✓
- With generalCtx (cancelled, wrong): context immediately done
→ ctx.Err() appended → error returned ✓
Remove the goroutine-based deletion and the pod creation to eliminate
the race while preserving the context-selection assertion.
Signed-off-by: Terry Howe <terrylhowe@gmail.com>
pull/32016/head
parent: 4c0d21f53f
commit: a7f84439aa