Skip to content

Automated Test: dual-storage-enhanced #334

Closed

Conversation

admin-coderabbit
Copy link
Owner

@admin-coderabbit admin-coderabbit commented Feb 4, 2026

This pull request was automatically created by @coderabbitai/e2e-reviewer.

Batch created pull request.

Summary by CodeRabbit

  • New Features

    • Added Mode 3 dual-write functionality with asynchronous writes to legacy storage backend, improving write performance.
    • Implemented List operation support for storage reads.
  • Improvements

    • Enhanced observability with detailed duration tracking and contextual logging for all storage operations.
  • Tests

    • Expanded test coverage for Mode 3 dual-writer across file, unified, and etcd storage backends.

* Dual writer: mode 3

* Add integration tests for playlits in mode 3

* Remove todo

* Update pkg/apiserver/rest/dualwriter_mode3.go

Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>

* Admin: Fixes an issue where user accounts could not be enabled (#88117)

Fix: unable to enable user

* [REVIEW] FInish mode 3 and add tests

* Improve logging

* Update dependencies

* Update pkg/apiserver/rest/dualwriter_mode3_test.go

Co-authored-by: maicon <maiconscosta@gmail.com>

* remove test assertion

* Use mode log when dual writer is initiated

---------

Co-authored-by: Arati R. <33031346+suntala@users.noreply.github.com>
Co-authored-by: gonvee <gonvee@qq.com>
Co-authored-by: maicon <maiconscosta@gmail.com>
@coderabbit-eval
Copy link

coderabbit-eval bot commented Feb 4, 2026

📝 Walkthrough

Walkthrough

The changes implement asynchronous dual-write semantics for mode 3 in an API server's dual-writer, enabling synchronous writes to storage with delegated legacy writes to background goroutines with timeout handling. Tests are comprehensively rewritten and expanded to cover mode 3 across multiple storage backends.

Changes

Cohort / File(s) Summary
Mode 3 Core Implementation
pkg/apiserver/rest/dualwriter_mode3.go
Introduces async dual-write semantics where Create/Update/Delete/DeleteCollection perform synchronous storage operations and defer legacy writes to background goroutines with 10-second timeouts. Adds List method for storage-only reads. Includes extensive duration tracking and logging with mode label throughout all operations.
Mode 3 Test Suite
pkg/apiserver/rest/dualwriter_mode3_test.go
Replaces scaffold with comprehensive table-driven tests exercising Create, Get, List, Delete, DeleteCollection, and Update operations. Uses mock storage interfaces and real Prometheus registry, covering both success and error paths.
Mode 1 Test Cleanup
pkg/apiserver/rest/dualwriter_mode1_test.go
Removes local Prometheus registry creation, relying instead on package-level registry scope.
Integration Test Coverage
pkg/tests/apis/playlist/playlist_test.go
Adds mode 3 dual-writer test variants across file, unified, and etcd storage backends with KubernetesPlaylists feature toggle and explicit collection clearing for etcd paths.

Sequence Diagram

sequenceDiagram
    participant Client
    participant DualWriterMode3
    participant Storage
    participant LegacyStorage
    participant Background as Background<br/>Goroutine
    participant Metrics

    Client->>DualWriterMode3: Create(ctx, obj)
    DualWriterMode3->>Metrics: recordStartTime()
    DualWriterMode3->>Storage: Create(ctx, obj)
    Storage-->>DualWriterMode3: result, error
    DualWriterMode3->>Metrics: recordStorageDuration()
    DualWriterMode3->>Background: spawn goroutine<br/>(10s timeout)
    Background->>LegacyStorage: Create(ctx, obj)
    LegacyStorage-->>Background: result, error
    Background->>Metrics: recordLegacyDuration()
    DualWriterMode3-->>Client: Storage result + error

    Client->>DualWriterMode3: Delete(ctx, name, opts)
    DualWriterMode3->>Storage: Delete(ctx, name, opts)
    Storage-->>DualWriterMode3: result, error
    DualWriterMode3->>Metrics: recordStorageDuration()
    DualWriterMode3->>Background: spawn goroutine<br/>(10s timeout)
    Background->>LegacyStorage: Delete(ctx, name, opts)
    LegacyStorage-->>Background: result, error
    Background->>Metrics: recordLegacyDuration()
    DualWriterMode3-->>Client: Storage result
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A rabbit hops through dual-write dreams so bright,
Mode three awakens—async paths take flight!
Storage springs forward, legacy trails behind,
With goroutines dancing and timeouts aligned.
Lists bloom new, and tests proliferate wide,
Hippity-hop, the dual-write with pride! 🌟

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Automated Test: dual-storage-enhanced' is vague and does not clearly describe the main changes, using non-descriptive terms that obscure the actual modifications. Clarify the title to explicitly describe the primary changes, such as 'Add async dual-write semantics for Mode3 and expand integration tests' or similar to make the main objective clear.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dual-storage-enhanced

Comment @coderabbit eval help to get the list of available commands and usage tips.

@coderabbit-eval coderabbit-eval bot closed this Feb 4, 2026
Copy link

@coderabbit-eval coderabbit-eval bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@pkg/apiserver/rest/dualwriter_mode3.go`:
- Line 106: The metric label is inconsistent: replace the incorrect use of name
with options.Kind in the success path so both error and success calls use the
same label; update the call to d.recordStorageDuration(false, mode3Str,
options.Kind, method, startStorage) (the function/variable names to locate are
recordStorageDuration, mode3Str, name, method, startStorage and options.Kind) so
metrics use options.Kind uniformly.
- Line 97: The context is being created with the struct logger d.Log instead of
the method-scoped logger variable log (which carries the contextual fields
name/kind/method); update the klog.NewContext call in dualwriter_mode3.go (the
line currently using d.Log) to use the local variable log so it matches Create,
Get, Update, and DeleteCollection and preserves the method-specific context.
- Around line 165-166: The legacy delete call is recording its duration with
recordStorageDuration instead of the legacy-specific metric; update the call
after d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions) to
call recordLegacyDuration(err != nil, mode3Str, options.Kind, method,
startLegacy) (i.e., replace recordStorageDuration with recordLegacyDuration) so
the legacy operation duration is recorded correctly for the legacy path tied to
startLegacy.
- Around line 44-46: The error branch after Storage.Create is recording the
wrong metric: replace the call to d.recordLegacyDuration(true, mode3Str,
options.Kind, method, startStorage) with d.recordStorageDuration(true, mode3Str,
options.Kind, method, startStorage) so storage failures are logged with the
storage duration metric; locate the error handling immediately after the
Storage.Create call where log.Error(err, "unable to create object in storage")
and return created, err occur and swap the metric function accordingly.
- Around line 128-130: The error handling after Storage.Update is recording the
wrong metric: replace the call to d.recordLegacyDuration(...) with
d.recordStorageDuration(...) so storage failures are recorded correctly;
specifically in the Update flow where Storage.Update returns err, change the
invocation of recordLegacyDuration to recordStorageDuration with the same
arguments (e.g. true, mode3Str, options.Kind, method, startStorage) to ensure
the storage-duration metric is emitted on error.
🧹 Nitpick comments (4)
pkg/apiserver/rest/dualwriter_mode3_test.go (3)

119-120: Inconsistent Prometheus registry usage.

This test creates a local p variable that shadows the package-level p defined in dualwriter_mode1_test.go. Other tests in this file (TestMode3_Create, TestMode3_List, TestMode3_Delete, TestMode3_DeleteCollection, TestMode3_Update) use the package-level p consistently.

♻️ Proposed fix to use package-level registry
 		if tt.setupStorageFn != nil {
 			tt.setupStorageFn(m, tt.input)
 		}

-		p := prometheus.NewRegistry()
 		dw := NewDualWriter(Mode3, ls, us, p)

37-44: Test case setup is misleading for Mode3 behavior.

The test case "error when creating object in the unified store fails" uses setupLegacyFn to configure the mock, but in Mode3, Storage is called synchronously first while Legacy is called asynchronously. Since both legacyStoreMock and storageMock share the same mock.Mock instance, this works by coincidence.

For clarity and to align with Mode3's semantics (Storage is primary), consider renaming the setup function or using setupStorageFn to make the test intent clearer.

♻️ Proposed fix to clarify test intent
 		{
 			name:  "error when creating object in the unified store fails",
 			input: failingObj,
-			setupLegacyFn: func(m *mock.Mock, input runtime.Object) {
+			setupStorageFn: func(m *mock.Mock) {
 				m.On("Create", mock.Anything, failingObj, mock.Anything, mock.Anything).Return(nil, errors.New("error"))
 			},
 			wantErr: true,
 		},

47-77: Consider verifying async legacy operations.

Mode3 delegates legacy writes to background goroutines. The current tests don't verify that the legacy store mock was called, which means the async behavior isn't being validated. If the legacy call fails silently or isn't executed at all, these tests wouldn't catch it.

Consider adding a short sleep or using mock.AssertExpectations with a timeout to verify the async legacy calls complete as expected.

pkg/apiserver/rest/dualwriter_mode3.go (1)

50-57: Async goroutines capture parent context which may be cancelled.

The background goroutines capture ctx from the parent scope and derive a timeout context from it via context.WithTimeoutCause. If the parent context gets cancelled after the function returns (e.g., when the HTTP request completes), the derived context will inherit that cancellation.

While this pattern appears intentional for fire-and-forget legacy writes and is consistent with Mode1, consider using context.WithoutCancel(ctx) (available in Go 1.21+) as the parent to ensure legacy operations aren't prematurely cancelled when the request completes.

Also applies to: 108-114, 134-141, 161-167

Comment on lines 44 to 46
log.Error(err, "unable to create object in storage")
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
return created, err

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Incorrect metric recording on storage error.

When Storage.Create fails, the code calls recordLegacyDuration but should call recordStorageDuration. The error occurred in storage, not legacy.

🐛 Proposed fix
 	if err != nil {
 		log.Error(err, "unable to create object in storage")
-		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
+		d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
 		return created, err
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
log.Error(err, "unable to create object in storage")
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
return created, err
log.Error(err, "unable to create object in storage")
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
return created, err
🤖 Prompt for AI Agents
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 44 - 46, The error
branch after Storage.Create is recording the wrong metric: replace the call to
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage) with
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage) so
storage failures are logged with the storage duration metric; locate the error
handling immediately after the Storage.Create call where log.Error(err, "unable
to create object in storage") and return created, err occur and swap the metric
function accordingly.

func (d *DualWriterMode3) Delete(ctx context.Context, name string, deleteValidation rest.ValidateObjectFunc, options *metav1.DeleteOptions) (runtime.Object, bool, error) {
var method = "delete"
log := d.Log.WithValues("name", name, "kind", options.Kind, "method", method)
ctx = klog.NewContext(ctx, d.Log)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistent logger usage in context.

This line uses d.Log instead of the method-scoped log variable that includes contextual values (name, kind, method). Other methods like Create (line 39), Get (line 66), Update (line 123), and DeleteCollection (line 150) correctly use klog.NewContext(ctx, log).

🐛 Proposed fix
-	ctx = klog.NewContext(ctx, d.Log)
+	ctx = klog.NewContext(ctx, log)
🤖 Prompt for AI Agents
In `@pkg/apiserver/rest/dualwriter_mode3.go` at line 97, The context is being
created with the struct logger d.Log instead of the method-scoped logger
variable log (which carries the contextual fields name/kind/method); update the
klog.NewContext call in dualwriter_mode3.go (the line currently using d.Log) to
use the local variable log so it matches Create, Get, Update, and
DeleteCollection and preserves the method-specific context.

}
d.recordStorageDuration(false, mode3Str, name, method, startStorage)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Inconsistent metric label: name used instead of options.Kind.

Line 103 correctly uses options.Kind for the error case, but line 106 passes name for the success case. This will result in inconsistent metric labels.

🐛 Proposed fix
-	d.recordStorageDuration(false, mode3Str, name, method, startStorage)
+	d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
d.recordStorageDuration(false, mode3Str, name, method, startStorage)
d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)
🤖 Prompt for AI Agents
In `@pkg/apiserver/rest/dualwriter_mode3.go` at line 106, The metric label is
inconsistent: replace the incorrect use of name with options.Kind in the success
path so both error and success calls use the same label; update the call to
d.recordStorageDuration(false, mode3Str, options.Kind, method, startStorage)
(the function/variable names to locate are recordStorageDuration, mode3Str,
name, method, startStorage and options.Kind) so metrics use options.Kind
uniformly.

Comment on lines +128 to +130
log.Error(err, "unable to update in storage")
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
return res, async, err

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Incorrect metric recording on storage error.

Same issue as in Create: when Storage.Update fails, the code calls recordLegacyDuration but should call recordStorageDuration.

🐛 Proposed fix
 	if err != nil {
 		log.Error(err, "unable to update in storage")
-		d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
+		d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
 		return res, async, err
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
log.Error(err, "unable to update in storage")
d.recordLegacyDuration(true, mode3Str, options.Kind, method, startStorage)
return res, async, err
log.Error(err, "unable to update in storage")
d.recordStorageDuration(true, mode3Str, options.Kind, method, startStorage)
return res, async, err
🤖 Prompt for AI Agents
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 128 - 130, The error
handling after Storage.Update is recording the wrong metric: replace the call to
d.recordLegacyDuration(...) with d.recordStorageDuration(...) so storage
failures are recorded correctly; specifically in the Update flow where
Storage.Update returns err, change the invocation of recordLegacyDuration to
recordStorageDuration with the same arguments (e.g. true, mode3Str,
options.Kind, method, startStorage) to ensure the storage-duration metric is
emitted on error.

Comment on lines +165 to +166
_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Incorrect metric function for legacy duration.

This calls recordStorageDuration but is recording the legacy operation's duration (note startLegacy). Should call recordLegacyDuration.

🐛 Proposed fix
 		_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
-		d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
+		d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
d.recordStorageDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
_, err := d.Legacy.DeleteCollection(ctx, deleteValidation, options, listOptions)
d.recordLegacyDuration(err != nil, mode3Str, options.Kind, method, startLegacy)
🤖 Prompt for AI Agents
In `@pkg/apiserver/rest/dualwriter_mode3.go` around lines 165 - 166, The legacy
delete call is recording its duration with recordStorageDuration instead of the
legacy-specific metric; update the call after d.Legacy.DeleteCollection(ctx,
deleteValidation, options, listOptions) to call recordLegacyDuration(err != nil,
mode3Str, options.Kind, method, startLegacy) (i.e., replace
recordStorageDuration with recordLegacyDuration) so the legacy operation
duration is recorded correctly for the legacy path tied to startLegacy.

Sign in to join this conversation on GitHub.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant