Skip to content

Automated Test: authz-service-improve-caching-pr #323

Conversation

admin-coderabbit
Copy link
Owner

@admin-coderabbit admin-coderabbit commented Feb 4, 2026

This pull request was automatically created by @coderabbitai/e2e-reviewer.

Batch created pull request.

Summary by CodeRabbit

  • Refactor

    • Enhanced authorization service with improved caching strategy including denial-based cache entries and identity-aware permission lookups for reduced latency.
    • Updated client initialization to support remote authorization with configurable caching and tracing options.
  • Tests

    • Added comprehensive tests for cache behavior validation in authorization checks and list operations, including hit/miss scenarios and denial caching.

* remove the use of client side cache for in-proc authz client

Co-authored-by: Gabriel MABILLE <gabriel.mabille@grafana.com>

* add a permission denial cache, fetch perms if not in either of the caches

Co-authored-by: Gabriel MABILLE <gabriel.mabille@grafana.com>

* Clean up tests

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>

* Cache tests

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>

* Add test to list + cache

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>

* Add outdated cache test

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>

* Re-organize metrics

Co-authored-by: Ieva <ieva.vasiljeva@grafana.com>

---------

Co-authored-by: Gabriel MABILLE <gabriel.mabille@grafana.com>
@coderabbit-eval
Copy link

coderabbit-eval bot commented Feb 4, 2026

📝 Walkthrough

Walkthrough

These changes introduce a dual-layer caching system for authorization checks in the RBAC service. A new denial cache is added to fast-path denied permission requests, alongside a unified caching pathway for identity permissions. The client construction is updated to use a remote authz client initialized with a no-op cache option.

Changes

Cohort / File(s) Summary
Client Initialization
pkg/services/authz/rbac.go
Replaces local RBAC client construction with remote-authz client via authzlib.NewClient, configured with NoopCache and tracer. Adds exported NoopCache struct implementing cache interface with no-op Set/Delete and Get that returns not-found.
Cache Utilities
pkg/services/authz/rbac/cache.go
Adds userPermDenialCacheKey() function that generates cache keys for denial-specific entries in format: namespace + ".perm_" + userUID + "_" + action + "_" + name + "_" + parent.
Service Caching Logic
pkg/services/authz/rbac/service.go
Introduces permDenialCache field to Service struct. Adds getCachedIdentityPermissions() method to centralize identity permission retrieval from cache for Anonymous/User/ServiceAccount identities. Implements fast-path denial check in Check method before permission computation. Updates permission-denied paths to populate denial cache for short-circuit on future checks. Refactors Check and List methods to use unified caching pathway with metrics tracking.
Test Coverage
pkg/services/authz/rbac/service_test.go
Removes test case for cached permissions in getUserPermissions. Adds TestService_CacheCheck to verify Check method uses cached permissions, falls back to store on miss, handles outdated cache, and respects denial entries. Adds TestService_CacheList to verify List method assembles results from cached permissions. Updates service setup to initialize permDenialCache with short TTL.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Service
    participant DenialCache as Denial Cache
    participant PermCache as Perm Cache
    participant Store as Store/DB

    Client->>Service: Check(namespace, user, action, resource)
    activate Service
    
    Service->>DenialCache: Get(denial cache key)
    activate DenialCache
    DenialCache-->>Service: Found (denied)
    deactivate DenialCache
    
    alt Denial cached
        Service->>Service: Return denied (fast-path)
        Service-->>Client: Denied
    else Cache miss
        Service->>PermCache: getCachedIdentityPermissions(user)
        activate PermCache
        PermCache-->>Service: Permissions (if cached)
        deactivate PermCache
        
        alt Permissions cached
            Service->>Service: Evaluate against cached perms
        else Cache miss
            Service->>Store: Fetch permissions
            activate Store
            Store-->>Service: Permissions
            deactivate Store
            Service->>PermCache: Cache permissions
        end
        
        Service->>Service: Evaluate request
        
        alt Permission granted
            Service-->>Client: Allowed
        else Permission denied
            Service->>DenialCache: Set(denial cache key)
            Service-->>Client: Denied
        end
    end
    
    deactivate Service
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A cache within a cache, how clever! swish swish
When permissions are denied, we dash away quick—
No need to check again, we remember each "no,"
While identity permissions help authorization flow. ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Automated Test: authz-service-improve-caching-pr' is vague and generic, using 'Automated Test:' as a prefix that obscures the actual technical change being made. Revise the title to clearly describe the main technical change, such as 'Add denial cache and improve caching strategy in RBAC authz service' to better convey the purpose of the changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch authz-service-improve-caching-pr

Comment @coderabbit eval help to get the list of available commands and usage tips.

@coderabbit-eval coderabbit-eval bot closed this Feb 4, 2026
Copy link

@coderabbit-eval coderabbit-eval bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@pkg/services/authz/rbac/service_test.go`:
- Around line 973-995: The test's claim that the perm cache "allows" access is
wrong because it sets map[string]bool{"dashboards:uid:dash1": false}; update the
test to set the permission entry to true so it actually grants access in the
perm cache and thus demonstrates that the explicit denial in s.permDenialCache
(userPermDenialCacheKey) short-circuits Check; locate the test block around func
t.Run "Should deny on explicit cache deny entry" and change the s.permCache.Set
call (using userPermCacheKey) to store a map with the dashboard key set to true
so the final require/assert still expects resp.Allowed == false.

In `@pkg/services/authz/rbac/service.go`:
- Around line 180-192: List currently uses getCachedIdentityPermissions and
accepts its result as authoritative, unlike Check which falls through to
getIdentityPermissions when cache indicates denial; update the List function to
mirror Check's behavior by detecting a cached "deny" or stale/empty permission
result from s.getCachedIdentityPermissions(ctx, listReq.Namespace,
listReq.IdentityType, listReq.UserUID, listReq.Action) and then calling
s.getIdentityPermissions(ctx, listReq.Namespace, listReq.IdentityType,
listReq.UserUID, listReq.Action) to refresh from the DB; ensure you update the
metrics labels the same way Check does and preserve the existing error logging
(ctxLogger.Error with subject and error) and return semantics when the DB fetch
fails.
- Around line 116-121: The denial cache (permDenialCache / permDenialKey) is
checked before the permission cache which can cause stale denials to override
newly granted permissions; change the logic to consult the permission cache
(e.g., s.permCache.Get or the existing permission lookup path) before
short-circuiting on s.permDenialCache.Get, or if keeping the denial-first order,
validate the denial entry against the permission cache/DB (invalidate denial if
a positive permission exists) so grants are not blocked while perm cache TTLs
expire.
🧹 Nitpick comments (3)
pkg/services/authz/rbac/cache.go (1)

26-32: Cache key prefix collision between userPermCacheKey and userPermDenialCacheKey.

Both functions share the same namespace + ".perm_" + userUID + "_" prefix and use _ as separator without escaping. Since both caches share the same underlying cache.Cache, a theoretical collision exists: userPermCacheKey(ns, uid, action+"_"+name+"_"+parent) would produce the same key as userPermDenialCacheKey(ns, uid, action, name, parent).

In practice this is unlikely because action values use colons (e.g., dashboards:read) and UIDs are constrained, but a distinct prefix (e.g., .perm_deny_) for the denial key would eliminate the ambiguity entirely.

🛡️ Suggested fix
 func userPermDenialCacheKey(namespace, userUID, action, name, parent string) string {
-	return namespace + ".perm_" + userUID + "_" + action + "_" + name + "_" + parent
+	return namespace + ".perm_deny_" + userUID + "_" + action + "_" + name + "_" + parent
 }
pkg/services/authz/rbac/service.go (2)

116-137: Check flow correctly falls through to DB on stale cache — but metrics label is misleading.

When cached permissions are found but deny access, execution falls through to the full DB fetch (supporting the "outdated cache" scenario tested in TestService_CacheCheck). However, the metric at line 137 records this as a cache miss ("false"), even though the cache was hit but simply didn't grant access. This conflates "cache miss" with "cache hit, stale/insufficient."

Consider using a third label value or a separate metric to distinguish "cache miss" from "cache hit but stale," as this will aid debugging cache effectiveness in production.


342-368: getCachedIdentityPermissions may trigger a DB call for user identifiers.

For TypeUser/TypeServiceAccount, this function calls s.GetUserIdentifiers(ctx, ns, userID) at line 356, which can fall through to the database if the identity isn't cached yet. This is a side effect in what is conceptually a "cache-only" lookup path. It won't cause correctness issues (the result is cached for subsequent calls), but it makes the function's behavior less predictable for callers who expect it to be a pure cache read.

Comment on lines +973 to +995
t.Run("Should deny on explicit cache deny entry", func(t *testing.T) {
s := setupService()

s.idCache.Set(ctx, userIdentifierCacheKey("org-12", "test-uid"), *userID)

// Explicitly deny access to the dashboard
s.permDenialCache.Set(ctx, userPermDenialCacheKey("org-12", "test-uid", "dashboards:read", "dash1", "fold1"), true)

// Allow access to the dashboard to prove this is not checked
s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": false})

resp, err := s.Check(ctx, &authzv1.CheckRequest{
Namespace: "org-12",
Subject: "user:test-uid",
Group: "dashboard.grafana.app",
Resource: "dashboards",
Verb: "get",
Name: "dash1",
Folder: "fold1",
})
require.NoError(t, err)
assert.False(t, resp.Allowed)
})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test comment is misleading — perm cache value false doesn't actually "allow" access.

Line 982 sets map[string]bool{"dashboards:uid:dash1": false}, with the comment "Allow access to the dashboard to prove this is not checked." However, a false value in the scope map would deny access anyway. To truly prove that the denial cache short-circuits before the permission cache is consulted, set the value to true:

🧪 Suggested fix for a more convincing test
-		// Allow access to the dashboard to prove this is not checked
-		s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": false})
+		// Set permission cache to allow access — proves denial cache takes precedence
+		s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": true})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
t.Run("Should deny on explicit cache deny entry", func(t *testing.T) {
s := setupService()
s.idCache.Set(ctx, userIdentifierCacheKey("org-12", "test-uid"), *userID)
// Explicitly deny access to the dashboard
s.permDenialCache.Set(ctx, userPermDenialCacheKey("org-12", "test-uid", "dashboards:read", "dash1", "fold1"), true)
// Allow access to the dashboard to prove this is not checked
s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": false})
resp, err := s.Check(ctx, &authzv1.CheckRequest{
Namespace: "org-12",
Subject: "user:test-uid",
Group: "dashboard.grafana.app",
Resource: "dashboards",
Verb: "get",
Name: "dash1",
Folder: "fold1",
})
require.NoError(t, err)
assert.False(t, resp.Allowed)
})
t.Run("Should deny on explicit cache deny entry", func(t *testing.T) {
s := setupService()
s.idCache.Set(ctx, userIdentifierCacheKey("org-12", "test-uid"), *userID)
// Explicitly deny access to the dashboard
s.permDenialCache.Set(ctx, userPermDenialCacheKey("org-12", "test-uid", "dashboards:read", "dash1", "fold1"), true)
// Set permission cache to allow access — proves denial cache takes precedence
s.permCache.Set(ctx, userPermCacheKey("org-12", "test-uid", "dashboards:read"), map[string]bool{"dashboards:uid:dash1": true})
resp, err := s.Check(ctx, &authzv1.CheckRequest{
Namespace: "org-12",
Subject: "user:test-uid",
Group: "dashboard.grafana.app",
Resource: "dashboards",
Verb: "get",
Name: "dash1",
Folder: "fold1",
})
require.NoError(t, err)
assert.False(t, resp.Allowed)
})
🤖 Prompt for AI Agents
In `@pkg/services/authz/rbac/service_test.go` around lines 973 - 995, The test's
claim that the perm cache "allows" access is wrong because it sets
map[string]bool{"dashboards:uid:dash1": false}; update the test to set the
permission entry to true so it actually grants access in the perm cache and thus
demonstrates that the explicit denial in s.permDenialCache
(userPermDenialCacheKey) short-circuits Check; locate the test block around func
t.Run "Should deny on explicit cache deny entry" and change the s.permCache.Set
call (using userPermCacheKey) to store a map with the dashboard key set to true
so the final require/assert still expects resp.Allowed == false.

Comment on lines +116 to +121
permDenialKey := userPermDenialCacheKey(checkReq.Namespace.Value, checkReq.UserUID, checkReq.Action, checkReq.Name, checkReq.ParentFolder)
if _, ok := s.permDenialCache.Get(ctx, permDenialKey); ok {
s.metrics.permissionCacheUsage.WithLabelValues("true", checkReq.Action).Inc()
s.metrics.requestCount.WithLabelValues("false", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc()
return &authzv1.CheckResponse{Allowed: false}, nil
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Denial cache takes absolute precedence — could serve stale denials after permission grants.

The denial cache short-circuits before even checking the permission cache. If an admin grants a user access, the denial cache entry (up to 30s TTL) will continue to reject requests even though the permission cache would have expired and the DB would return the updated permissions.

Since the permission cache and denial cache share the same TTL but are keyed differently (perm cache by action only, denial cache by action+name+parent), there's no coordinated invalidation. This is likely an acceptable trade-off given the short TTL, but it should be documented and considered if TTLs are ever increased.

🤖 Prompt for AI Agents
In `@pkg/services/authz/rbac/service.go` around lines 116 - 121, The denial cache
(permDenialCache / permDenialKey) is checked before the permission cache which
can cause stale denials to override newly granted permissions; change the logic
to consult the permission cache (e.g., s.permCache.Get or the existing
permission lookup path) before short-circuiting on s.permDenialCache.Get, or if
keeping the denial-first order, validate the denial entry against the permission
cache/DB (invalidate denial if a positive permission exists) so grants are not
blocked while perm cache TTLs expire.

Comment on lines +180 to 192
permissions, err := s.getCachedIdentityPermissions(ctx, listReq.Namespace, listReq.IdentityType, listReq.UserUID, listReq.Action)
if err == nil {
s.metrics.permissionCacheUsage.WithLabelValues("true", listReq.Action).Inc()
} else {
s.metrics.permissionCacheUsage.WithLabelValues("false", listReq.Action).Inc()

permissions, err = s.getIdentityPermissions(ctx, listReq.Namespace, listReq.IdentityType, listReq.UserUID, listReq.Action)
if err != nil {
ctxLogger.Error("could not get user permissions", "subject", req.GetSubject(), "error", err)
s.metrics.requestCount.WithLabelValues("true", "true", req.GetVerb(), req.GetGroup(), req.GetResource()).Inc()
return nil, err
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Inconsistent caching strategy between Check and List.

Check (lines 123–136) falls through to the database when cached permissions deny access, handling the "outdated cache" case. List, however, directly uses cached permissions without any fallthrough. This means:

  • Check on resource X: cache says no → DB fetch → finds new permission → allows
  • List: cache says no X → returns stale list without X

This could lead to a brief inconsistency where Check allows access to a resource that List doesn't include. Consider applying the same fallthrough logic or documenting this as intentional.

🤖 Prompt for AI Agents
In `@pkg/services/authz/rbac/service.go` around lines 180 - 192, List currently
uses getCachedIdentityPermissions and accepts its result as authoritative,
unlike Check which falls through to getIdentityPermissions when cache indicates
denial; update the List function to mirror Check's behavior by detecting a
cached "deny" or stale/empty permission result from
s.getCachedIdentityPermissions(ctx, listReq.Namespace, listReq.IdentityType,
listReq.UserUID, listReq.Action) and then calling s.getIdentityPermissions(ctx,
listReq.Namespace, listReq.IdentityType, listReq.UserUID, listReq.Action) to
refresh from the DB; ensure you update the metrics labels the same way Check
does and preserve the existing error logging (ctxLogger.Error with subject and
error) and return semantics when the DB fetch fails.

Sign in to join this conversation on GitHub.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant