Skip to content

Automated Test: db-cleanup-optimized #324

Closed
wants to merge 1 commit into from

Conversation

admin-coderabbit
Copy link
Owner

@admin-coderabbit admin-coderabbit commented Feb 4, 2026

This pull request was automatically created by @coderabbitai/e2e-reviewer.

Batch created pull request.

Summary by CodeRabbit

  • Bug Fixes

    • Improved the annotation cleanup mechanism to handle large batches more efficiently and reliably during concurrent database operations.
  • Chores

    • Increased the frequency of automated annotation cleanup from every 10 minutes to every 1 minute, enabling faster removal of outdated annotations.

… deadlocks on MySQL (#80329)

* Split subquery when cleaning annotations

* update comment

* Raise batch size, now that we pay attention to it

* Iterate in batches

* Separate cancellable batch implementation to allow for multi-statement callbacks, add overload for single-statement use

* Use split-out utility in outer batching loop so it respects context cancellation

* guard against empty queries

* Use SQL parameters

* Use same approach for tags

* drop unused function

* Work around parameter limit on sqlite for large batches

* Bulk insert test data in DB

* Refactor test to customise test data creation

* Add test for catching SQLITE_MAX_VARIABLE_NUMBER limit

* Turn annotation cleanup test to integration tests

* lint

---------

Co-authored-by: Sofia Papagiannaki <1632407+papagian@users.noreply.github.com>
@coderabbit-eval
Copy link

coderabbit-eval bot commented Feb 4, 2026

📝 Walkthrough

Walkthrough

Refactors annotation cleanup logic to process deletions in batches with context cancellation support, avoiding deadlocks under concurrent load. Updates cleanup trigger frequency from 10 to 1 minute and introduces integration tests with database state cleanup and batched insertion patterns.

Changes

Cohort / File(s) Summary
Batched Cleanup Refactoring
pkg/services/annotations/annotationsimpl/xorm_store.go
Replaces single-statement DELETE operations with looped batched deletion using fetchIDs and deleteByIDs helpers. Adds context cancellation awareness and SQLite parameter limit handling (inlines IDs when batch exceeds 999 for SQLite). Applies same pattern to time-based, count-based, and orphaned tag cleanup.
Cleanup Frequency Adjustment
pkg/services/cleanup/cleanup.go
Changes cleanup ticker interval from 10 minutes to 1 minute, increasing cleanup frequency.
Integration Test Refactoring
pkg/services/annotations/annotationsimpl/cleanup_test.go
Converts cleanup test to integration variant with test table-driven configuration (batch sizes, annotation counts). Introduces batched insertion using InsertMulti for test data. Adds context-aware database cleanup with error aggregation. Includes scenario testing for batches exceeding SQLite parameter limits.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 In batches small we hop and clean,
No deadlocks now in between,
One minute ticks, the schedule flies,
SQLite limits caught our eyes,
Annotations tidied, tests aligned! 🧹✨

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Automated Test: db-cleanup-optimized' is vague and generic, using non-descriptive phrasing that doesn't clearly convey the main changes in the changeset. Use a more specific title that describes the primary change, such as 'Refactor annotation cleanup with batched processing and integration tests' or 'Optimize database cleanup with batch operations and improved test coverage'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch db-cleanup-optimized

Comment @coderabbit eval help to get the list of available commands and usage tips.

@coderabbit-eval coderabbit-eval bot closed this Feb 4, 2026
Copy link

@coderabbit-eval coderabbit-eval bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@pkg/services/annotations/annotationsimpl/xorm_store.go`:
- Around line 534-538: The current routine cleanup logging uses r.log.Error for
normal batch progress (calls like r.log.Error("Annotations to clean by time",
... ) and r.log.Error("cleaned annotations by time", ... )) which will flood
error alerts; change these to r.log.Info or r.log.Debug as appropriate, stop
logging the ids slice (log only its length or sample) and remove the redundant
"err" field where err is known to be nil before the log; apply the same change
pattern to the other occurrences around the deleteByIDs calls (references:
r.log.Error, deleteByIDs, the "Annotations to clean by time" and "cleaned
annotations by time" messages) so only real errors use r.log.Error and routine
progress logs use Info/Debug with compact payloads.

In `@pkg/services/cleanup/cleanup.go`:
- Line 77: The ticker interval in Run (ticker := time.NewTicker(...)) is 1
minute while clean() uses a 9-minute context timeout, causing
overlapping/back-to-back runs; update them to be consistent by either restoring
the ticker to the previous longer interval (e.g., set ticker :=
time.NewTicker(time.Minute * 10)) or reducing the context timeout inside clean()
to something under 1 minute (e.g., <1m) so the context timeout is always shorter
than the ticker interval; ensure you update both the ticker declaration and the
corresponding timeout logic used by clean() (and similar logic referenced in
lines 88-94) so they remain aligned.
🧹 Nitpick comments (3)
pkg/services/annotations/annotationsimpl/cleanup_test.go (1)

237-271: Manual ID assignment couples test setup to DB internals.

Line 241 sets ID: int64(i + 1), which relies on the DB accepting explicitly provided IDs and not conflicting with auto-increment sequences. This works for SQLite but may break if the test DB engine changes or if rows from a previous subtest leave the auto-increment counter advanced (though DELETE doesn't reset it). Since the IDs are also baked into newAnnotationTags at line 270, any mismatch would silently produce orphaned tags.

Consider either removing explicit ID assignment and querying back for inserted IDs, or documenting this assumption.

pkg/services/annotations/annotationsimpl/xorm_store.go (2)

536-538: Variable names x, y are cryptic for returned values.

x, y don't convey meaning. Use affected, err (or batchAffected, batchErr to avoid shadowing) for readability.

Example fix for one occurrence
-			x, y := r.deleteByIDs(ctx, "annotation", ids)
-			r.log.Error("cleaned annotations by time", "count", len(ids), "affected", x, "err", y)
-			return x, y
+			batchAffected, batchErr := r.deleteByIDs(ctx, "annotation", ids)
+			r.log.Debug("cleaned annotations by time", "count", len(ids), "affected", batchAffected, "err", batchErr)
+			return batchAffected, batchErr

Also applies to: 556-558, 578-580


605-618: SQLite ID-inlining branch uses O(n²) string concatenation.

Lines 609–612 build the ID list via repeated fmt.Sprintf("%s, %d", ...), which creates a new string allocation on every iteration. With the test batch size of 32767, this becomes expensive. Use strings.Builder or strings.Join instead.

Also, the condition at line 608 checks r.cfg.AnnotationCleanupJobBatchSize > sqliteParameterLimit (the configured maximum), not len(ids) (the actual count). This means even a batch of 5 IDs would take the inlining path if the configured batch size exceeds 999.

Proposed fix using strings.Builder
-	if r.db.GetDBType() == migrator.SQLite && r.cfg.AnnotationCleanupJobBatchSize > sqliteParameterLimit {
-		values := fmt.Sprint(ids[0])
-		for _, v := range ids[1:] {
-			values = fmt.Sprintf("%s, %d", values, v)
-		}
-		sql = fmt.Sprintf(`DELETE FROM %s WHERE id IN (%s)`, table, values)
+	if r.db.GetDBType() == migrator.SQLite && len(ids) > sqliteParameterLimit {
+		var b strings.Builder
+		b.WriteString(strconv.FormatInt(ids[0], 10))
+		for _, v := range ids[1:] {
+			b.WriteByte(',')
+			b.WriteString(strconv.FormatInt(v, 10))
+		}
+		sql = fmt.Sprintf(`DELETE FROM %s WHERE id IN (%s)`, table, b.String())

Comment on lines +534 to +538
r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err)

affected, err := r.executeUntilDoneOrCancelled(ctx, sql)
x, y := r.deleteByIDs(ctx, "annotation", ids)
r.log.Error("cleaned annotations by time", "count", len(ids), "affected", x, "err", y)
return x, y

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

r.log.Error used for routine operational logging — will pollute error logs.

These log lines (lines 534, 537, 554, 557, 576, 579) all use Error level for what is normal batch-progress logging. In production, this will flood error dashboards/alerts every cleanup cycle. Use Debug or Info level instead.

Proposed fix (showing pattern for all occurrences)
-			r.log.Error("Annotations to clean by time", "count", len(ids), "ids", ids, "cond", cond, "err", err)
+			r.log.Debug("Annotations to clean by time", "count", len(ids), "cond", cond)

-			r.log.Error("cleaned annotations by time", "count", len(ids), "affected", x, "err", y)
+			r.log.Debug("cleaned annotations by time", "count", len(ids), "affected", x, "err", y)

-			r.log.Error("Annotations to clean by count", "count", len(ids), "ids", ids, "cond", cond, "err", err)
+			r.log.Debug("Annotations to clean by count", "count", len(ids), "cond", cond)

-			r.log.Error("cleaned annotations by count", "count", len(ids), "affected", x, "err", y)
+			r.log.Debug("cleaned annotations by count", "count", len(ids), "affected", x, "err", y)

-		r.log.Error("Tags to clean", "count", len(ids), "ids", ids, "cond", cond, "err", err)
+		r.log.Debug("Tags to clean", "count", len(ids), "cond", cond)

-		r.log.Error("cleaned tags", "count", len(ids), "affected", x, "err", y)
+		r.log.Debug("cleaned tags", "count", len(ids), "affected", x, "err", y)

Additionally, note that lines 534 and 554 log "err", err where err is always nil at that point (it was already checked on line 531/551 and would have returned). The ids slice is also logged, which can be enormous for large batches — consider omitting it or logging only the count.

Also applies to: 554-558, 576-579

🤖 Prompt for AI Agents
In `@pkg/services/annotations/annotationsimpl/xorm_store.go` around lines 534 -
538, The current routine cleanup logging uses r.log.Error for normal batch
progress (calls like r.log.Error("Annotations to clean by time", ... ) and
r.log.Error("cleaned annotations by time", ... )) which will flood error alerts;
change these to r.log.Info or r.log.Debug as appropriate, stop logging the ids
slice (log only its length or sample) and remove the redundant "err" field where
err is known to be nil before the log; apply the same change pattern to the
other occurrences around the deleteByIDs calls (references: r.log.Error,
deleteByIDs, the "Annotations to clean by time" and "cleaned annotations by
time" messages) so only real errors use r.log.Error and routine progress logs
use Info/Debug with compact payloads.

@@ -74,7 +74,7 @@ func (j cleanUpJob) String() string {
func (srv *CleanUpService) Run(ctx context.Context) error {
srv.cleanUpTmpFiles(ctx)

ticker := time.NewTicker(time.Minute * 10)
ticker := time.NewTicker(time.Minute * 1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Ticker interval (1 min) vs. cleanup timeout (9 min) mismatch.

The ticker now fires every 1 minute, but clean() still uses a 9-minute context timeout. Since Run is single-threaded, a long-running cleanup will cause the next tick to fire immediately upon completion, leading to back-to-back cleanup cycles. The previous 10-minute interval matched the 9-minute timeout, leaving a 1-minute buffer. Consider reducing the timeout to something below 1 minute, or keeping the ticker interval above the timeout.

Also applies to: 88-94

🤖 Prompt for AI Agents
In `@pkg/services/cleanup/cleanup.go` at line 77, The ticker interval in Run
(ticker := time.NewTicker(...)) is 1 minute while clean() uses a 9-minute
context timeout, causing overlapping/back-to-back runs; update them to be
consistent by either restoring the ticker to the previous longer interval (e.g.,
set ticker := time.NewTicker(time.Minute * 10)) or reducing the context timeout
inside clean() to something under 1 minute (e.g., <1m) so the context timeout is
always shorter than the ticker interval; ensure you update both the ticker
declaration and the corresponding timeout logic used by clean() (and similar
logic referenced in lines 88-94) so they remain aligned.

Sign in to join this conversation on GitHub.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant