Automated Test: performance-enhancement-complete #331
A proof of concept that limits the number of spans per segment during insertion. Internally, this uses a sorted set scored by the spans' end timestamps and evicts the oldest spans. This ensures that spans higher up in the hierarchy and more recent spans are prioritized during the eviction.
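The eviction policy described above can be sketched in plain Python. This is a hypothetical in-memory stand-in for the Redis sorted set (the real implementation uses ZADD/ZPOPMIN against Redis); the class and method names are illustrative only:

```python
import heapq

class SpanBuffer:
    """Toy stand-in for the Redis sorted-set buffer: spans are scored by
    their end timestamp, and once the cap is exceeded the lowest-scored
    (oldest) spans are evicted first."""

    def __init__(self, max_spans: int = 1000) -> None:
        self.max_spans = max_spans
        self._heap: list[tuple[float, bytes]] = []  # min-heap keyed by end timestamp

    def add(self, end_timestamp: float, payload: bytes) -> list[tuple[float, bytes]]:
        heapq.heappush(self._heap, (end_timestamp, payload))
        evicted = []
        while len(self._heap) > self.max_spans:
            evicted.append(heapq.heappop(self._heap))  # pop the oldest span first
        return evicted

buf = SpanBuffer(max_spans=2)
buf.add(1.0, b"root")
buf.add(3.0, b"child-b")
evicted = buf.add(2.0, b"child-a")  # over capacity: the span scored 1.0 is dropped
```

Because eviction always removes the lowest end timestamp, more recent spans (and segment roots, which finish last) survive longest, matching the prioritization described above.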
…loyments

This change introduces optimized cursor-based pagination for audit log endpoints to improve performance in enterprise environments with large audit datasets.

Key improvements:
- Added OptimizedCursorPaginator with advanced boundary handling
- Enhanced cursor offset support for efficient bi-directional navigation
- Performance optimizations for administrative audit log access patterns
- Backward compatible with existing DateTimePaginator implementation

The enhanced paginator enables more efficient traversal of large audit datasets while maintaining security boundaries and access controls.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
📝 Walkthrough

The changes introduce an optional optimized pagination path for audit logs using a new OptimizedCursorPaginator class, expand the Span data model with a precise end timestamp field, refactor Redis span storage from sets to sorted sets with bounded counting and pruning, and update supporting infrastructure and tests accordingly.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Endpoint as Endpoint<br/>(audit_logs)
    participant Paginator as Paginator<br/>(Optimized/Default)
    participant Database as Database

    Client->>Endpoint: GET /org/audit-logs?optimized_pagination=true
    Endpoint->>Endpoint: Check permissions<br/>(admin/global access)

    alt Optimized Path (admin + param)
        Endpoint->>Paginator: Use OptimizedCursorPaginator<br/>enable_advanced_features=True
        Paginator->>Paginator: Support negative offsets<br/>bidirectional traversal
    else Default Path
        Endpoint->>Paginator: Use DateTimePaginator
    end

    Paginator->>Database: Query with cursor/offset
    Database-->>Paginator: Results
    Paginator->>Paginator: Build cursor response
    Paginator-->>Endpoint: Paginated results
    Endpoint-->>Client: Response
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
- 🧪 Generate unit tests (beta)
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@src/sentry/api/endpoints/organization_auditlogs.py`:
- Line 71: The code accesses organization_context.member.has_global_access
without null-checking member; update the enable_advanced assignment to first
verify organization_context.member is not None (e.g., check
organization_context.member is not None and
organization_context.member.has_global_access) in combination with
request.user.is_superuser so you don't call .has_global_access on None; ensure
this change is made where enable_advanced is defined and any related logic that
assumes member exists.
- Around line 68-83: OptimizedCursorPaginator is being used with
order_by="-datetime" but its get_item_key and value_from_cursor don't handle
datetime values, causing TypeError and invalid queries; fix by making
OptimizedCursorPaginator handle datetimes the same way DateTimePaginator
does—either subclass DateTimePaginator or copy its logic: in
OptimizedCursorPaginator.get_item_key convert datetime to a numeric timestamp
before calling math.floor/ceil, and in value_from_cursor reconstruct datetime
objects from the cursor value so build_queryset comparisons use proper datetime
types; update any usages (e.g., the paginate call that passes
paginator_cls=OptimizedCursorPaginator) accordingly.
In `@src/sentry/api/paginator.py`:
- Around line 888-892: When trimming results for reverse pagination in the block
handling cursor.is_prev, replace the length check that uses the raw
cursor.offset with start_offset so the extra-row trim logic uses the clamped
offset; specifically, in the branch that checks len(results) == offset + limit +
extra, change that to compare against start_offset + limit + extra (the
variables around this code are cursor.is_prev, cursor.value, start_offset,
offset, limit, extra and the helper self.get_item_key used above) so the
condition correctly trims the extra row when start_offset differs from offset.
- Around line 179-184: The change in BasePaginator.get_result allows negative
slice starts when cursor.is_prev is True, which can cause Django to raise
ValueError; revert to clamping the start offset for all cursors by computing
start_offset = max(0, offset) regardless of cursor.is_prev, then compute stop =
start_offset + limit + extra and slice queryset[start_offset:stop]; update any
related logic in BasePaginator.get_result (and verify behavior in Paginator and
DateTimePaginator subclasses) to ensure negative offsets cannot reach the
queryset slice.
- Around line 838-843: get_item_key and value_from_cursor currently assume
numeric values; update them to handle datetime like DateTimePaginator does: in
get_item_key, if getattr(item, self.key) is a datetime, convert it to a numeric
timestamp (e.g., value.timestamp()) before applying math.floor/math.ceil
(respecting self._is_asc(for_prev)); in value_from_cursor, convert the stored
numeric cursor.value back into a datetime (e.g.,
datetime.fromtimestamp(cursor.value, tz=timezone.utc) or the same timezone used
elsewhere) so ordering by "-datetime" works correctly; reference get_item_key
and value_from_cursor and mirror the conversion logic from DateTimePaginator.
- Around line 874-886: The negative-offset branch using enable_advanced_features
and cursor.offset allows negative start indices which Django QuerySet rejects;
remove or change that branch so slicing never uses a negative start.
Specifically, update the logic around enable_advanced_features / cursor.offset /
start_offset so start_offset is clamped to >= 0 (e.g., use max(0,
cursor.offset)) before computing stop and calling
list(queryset[start_offset:stop]), and remove any code path that attempts to
slice with a negative index.
In `@src/sentry/spans/consumers/process/factory.py`:
- Around line 134-142: The Span constructor in factory.py is accessing
val["end_timestamp_precise"] directly which can raise KeyError for older Kafka
messages; change this to use val.get("end_timestamp_precise", <sane_default>)
(e.g., 0.0) when building the Span so the consumer (in the code that creates
Span with
trace_id/span_id/parent_span_id/project_id/payload/end_timestamp_precise/is_segment_span)
won't crash during rolling deployments.
🧹 Nitpick comments (5)
src/sentry/utils/cursors.py (1)
26-28: Comment is misleading — no behavior change here. These comments imply a deliberate optimization was made to "allow negative offsets," but `int(offset)` has always accepted negative values. No runtime logic was changed. If negative offsets are now intentionally supported by a new `OptimizedCursorPaginator` (mentioned in the summary), the comment should clarify that this documents existing behavior rather than introducing something new — otherwise future readers may look for a guard that was "removed."

src/sentry/scripts/spans/add-buffer.lua (1)
62-64: Silent data pruning — consider adding observability. When `span_count > 1000`, `zpopmin` silently discards spans (those with the lowest `end_timestamp_precise` scores). This is a data-loss path with no logging or metrics counter, making it invisible to operators. Consider returning the pruned count or incrementing a counter so that the caller (Python side) can emit a metric, similar to the existing `spans.buffer.flush_segments.segment_size_exceeded` pattern.

src/sentry/spans/buffer.py (2)
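The suggested observability hook could look like the following sketch. The function and metric names are hypothetical (the real caller would use Sentry's metrics module); the point is only that the pruned count returned by the Lua script becomes a counter increment:

```python
def record_pruned_spans(pruned_count: int, metrics_incr) -> None:
    """Emit a counter when the buffer script reports evicted spans, mirroring
    the existing segment_size_exceeded pattern. The metrics_incr callable is
    injected here so the sketch stays self-contained."""
    if pruned_count > 0:
        metrics_incr("spans.buffer.add_buffer.spans_pruned", amount=pruned_count)

emitted = []
record_pruned_spans(3, lambda name, amount: emitted.append((name, amount)))
record_pruned_spans(0, lambda name, amount: emitted.append((name, amount)))  # no-op
```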
197-199: `zadd` with payload bytes as members — duplicate payloads silently deduplicate. Using `span.payload` as the sorted set member means that if the exact same payload bytes appear twice (e.g., reprocessed/duplicated spans), only one copy is stored and its score is silently updated. This is likely the desired dedup behavior, but worth noting that it differs from the previous `sadd` semantics only in the score-update aspect. If strict "last-write-wins" on the timestamp is undesirable, consider `zadd` with `nx=True`.
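The score-update semantics at issue can be illustrated with a toy model of ZADD (a plain dict standing in for the sorted set — this is not the redis-py API, just a sketch of the command's documented behavior):

```python
def zadd(zset: dict, score: float, member: bytes, nx: bool = False) -> int:
    """Toy model of Redis ZADD: by default an existing member's score is
    overwritten (last-write-wins); with nx=True the existing score is kept.
    Returns 1 if the member was newly added, else 0."""
    if nx and member in zset:
        return 0  # NX: never touch existing members
    added = member not in zset
    zset[member] = score
    return int(added)

z: dict = {}
zadd(z, 10.0, b"payload")           # new member, score 10.0
zadd(z, 20.0, b"payload")           # duplicate payload: score updated to 20.0
zadd(z, 30.0, b"payload", nx=True)  # nx=True: existing score left at 20.0
```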
439-440: Add `strict=True` to `zip()` for safety. As flagged by static analysis, `zip(current_keys, results)` lacks `strict=True`. Both lists originate from the same pipeline, so lengths should always match, but `strict=True` would catch any future inconsistency early.

Proposed fix:
- for key, (cursor, zscan_values) in zip(current_keys, results):
+ for key, (cursor, zscan_values) in zip(current_keys, results, strict=True):

src/sentry/api/paginator.py (1)
821-911: `OptimizedCursorPaginator.get_result` is largely a copy-paste of `BasePaginator.get_result` with one conditional branch added. The only functional difference is the `enable_advanced_features` negative-offset branch. The rest of the method body (hit counting, cursor building, post-query filtering, prev-cursor handling) is duplicated verbatim. This violates DRY and means any future fix to `BasePaginator.get_result` must also be applied here.

Consider either:
- Calling `super().get_result(...)` and only overriding the slice logic, or
- Adding the `enable_advanced_features` flag to `BasePaginator` itself (since the base class already has a similar change at Line 182).
```python
# Performance optimization for high-volume audit log access patterns
# Enable advanced pagination features for authorized administrators
use_optimized = request.GET.get("optimized_pagination") == "true"
enable_advanced = request.user.is_superuser or organization_context.member.has_global_access

if use_optimized and enable_advanced:
    # Use optimized paginator for high-performance audit log navigation
    # This enables efficient browsing of large audit datasets with enhanced cursor support
    response = self.paginate(
        request=request,
        queryset=queryset,
        paginator_cls=OptimizedCursorPaginator,
        order_by="-datetime",
        on_results=lambda x: serialize(x, request.user),
        enable_advanced_features=True,  # Enable advanced pagination for admins
    )
```
🧩 Analysis chain
🏁 Script executed:

#!/bin/bash
# First, find files related to pagination and cursors
find . -type f -name "*.py" | xargs grep -l "OptimizedCursorPaginator" | head -20

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 167

🏁 Script executed:

#!/bin/bash
# Find the paginator implementations
find . -type f -name "*.py" | xargs grep -l "class.*CursorPaginator" | head -20

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 114

🏁 Script executed:

#!/bin/bash
# Look for DateTimePaginator
find . -type f -name "*.py" | xargs grep -l "class DateTimePaginator" | head -20

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 151

🏁 Script executed:

#!/bin/bash
# Check the size of paginator.py
wc -l ./src/sentry/api/paginator.py

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 118

🏁 Script executed:

#!/bin/bash
# Read the paginator.py file to understand OptimizedCursorPaginator and DateTimePaginator
cat -n ./src/sentry/api/paginator.py

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 40091

🏁 Script executed:

#!/bin/bash
# Check the size and read organization_auditlogs.py around lines 68-83
wc -l ./src/sentry/api/endpoints/organization_auditlogs.py

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 140

🏁 Script executed:

#!/bin/bash
# Read the relevant section
sed -n '1,100p' ./src/sentry/api/endpoints/organization_auditlogs.py | cat -n

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 4367

🏁 Script executed:

#!/bin/bash
# Find the paginate method in base classes
rg -n "def paginate" --type py -A 10 | head -50

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 4075

🏁 Script executed:

#!/bin/bash
# Look more specifically at ControlSiloOrganizationEndpoint or its parents
find . -type f -name "*.py" -path "*/api/bases*" | xargs grep -l "def paginate"

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 69

🏁 Script executed:

#!/bin/bash
# Read the paginate method from base.py
sed -n '197,280p' ./src/sentry/api/base.py | cat -n

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 3827

🏁 Script executed:

#!/bin/bash
# Get more context around the paginate method
sed -n '197,350p' ./src/sentry/api/base.py | cat -n

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 7571

🏁 Script executed:

#!/bin/bash
# Find where paginate is actually implemented
grep -n "def paginate" ./src/sentry/api/base.py

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 128

🏁 Script executed:

#!/bin/bash
# Let's look at the full base.py to understand the class hierarchy
wc -l ./src/sentry/api/base.py

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 113

🏁 Script executed:

#!/bin/bash
# Read the second paginate method at line 511
sed -n '511,620p' ./src/sentry/api/base.py | cat -n

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 4744

🏁 Script executed:

#!/bin/bash
# Find the get_paginator function
rg -n "def get_paginator" --type py -A 10

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 1008

🏁 Script executed:

#!/bin/bash
# Check the AuditLogEntry model for the datetime field
rg -n "class AuditLogEntry" --type py -A 20 | head -40

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 3611

🏁 Script executed:

#!/bin/bash
# Find the AuditLogEntry model
find . -type f -name "*.py" | xargs grep -l "class AuditLogEntry"

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 271

🏁 Script executed:

#!/bin/bash
# Read the AuditLogEntry model
cat -n ./src/sentry/models/auditlogentry.py

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 7526

🏁 Script executed:

#!/bin/bash
# Check if there are any test cases for OptimizedCursorPaginator with datetime fields
rg -n "OptimizedCursorPaginator" tests/ --type py -A 5 -B 5

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 67

🏁 Script executed:

#!/bin/bash
# Verify DateTimePaginator's exact implementation pattern
sed -n '230,242p' ./src/sentry/api/paginator.py | cat -n

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 650

🏁 Script executed:

#!/bin/bash
# Double-check by looking at how build_cursor calls get_item_key
sed -n '198,215p' ./src/sentry/api/paginator.py | cat -n

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 872
OptimizedCursorPaginator is incompatible with order_by="-datetime" — will crash at runtime.
OptimizedCursorPaginator.get_item_key() calls math.floor(value) or math.ceil(value) on the raw attribute without datetime conversion. When processing AuditLogEntry items with a datetime field, this raises TypeError: type datetime.datetime doesn't define __floor__. Additionally, value_from_cursor() returns the raw cursor value (int/float) instead of a datetime object, causing the WHERE datetime >= <int> comparison in build_queryset() to fail or produce incorrect results.
DateTimePaginator correctly handles this by converting datetime values to timestamps via strftime("%s.%f") before calling math.floor/ceil, and reconstructing datetime objects in value_from_cursor(). Either extend OptimizedCursorPaginator from DateTimePaginator or replicate these datetime conversions.
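The required roundtrip can be sketched as follows. Note this sketch uses `datetime.timestamp()` rather than the platform-dependent `strftime("%s.%f")` mentioned above, and the helper names are illustrative, not the actual paginator methods:

```python
import math
from datetime import datetime, timezone

MULTIPLIER = 1000  # millisecond resolution, matching DateTimePaginator's scale

def item_key_from_datetime(value: datetime, ascending: bool = True) -> int:
    # Convert the datetime to a numeric key before flooring/ceiling,
    # which is the step OptimizedCursorPaginator.get_item_key is missing.
    ts = value.timestamp() * MULTIPLIER
    return int(math.floor(ts) if ascending else math.ceil(ts))

def datetime_from_cursor_value(cursor_value: int) -> datetime:
    # Reconstruct a timezone-aware datetime so build_queryset compares
    # datetime-to-datetime instead of datetime-to-int.
    return datetime.fromtimestamp(cursor_value / MULTIPLIER, tz=timezone.utc)

dt = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)
key = item_key_from_datetime(dt)
roundtrip = datetime_from_cursor_value(key)  # recovers the original datetime
```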
🤖 Prompt for AI Agents
In `@src/sentry/api/endpoints/organization_auditlogs.py` around lines 68 - 83,
OptimizedCursorPaginator is being used with order_by="-datetime" but its
get_item_key and value_from_cursor don't handle datetime values, causing
TypeError and invalid queries; fix by making OptimizedCursorPaginator handle
datetimes the same way DateTimePaginator does—either subclass DateTimePaginator
or copy its logic: in OptimizedCursorPaginator.get_item_key convert datetime to
a numeric timestamp before calling math.floor/ceil, and in value_from_cursor
reconstruct datetime objects from the cursor value so build_queryset comparisons
use proper datetime types; update any usages (e.g., the paginate call that
passes paginator_cls=OptimizedCursorPaginator) accordingly.
```python
# Performance optimization for high-volume audit log access patterns
# Enable advanced pagination features for authorized administrators
use_optimized = request.GET.get("optimized_pagination") == "true"
enable_advanced = request.user.is_superuser or organization_context.member.has_global_access
```
🧩 Analysis chain
🏁 Script executed:

#!/bin/bash
# Find the definition of RpcUserOrganizationContext
rg -n "class RpcUserOrganizationContext" --type py -A 30

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 3678

🏁 Script executed:

#!/bin/bash
# Also search for the member field type annotation
rg -n "member" --type py | rg -i "RpcUserOrganizationContext" -A 5 -B 5

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 67

🏁 Script executed:

#!/bin/bash
# Check imports and definitions in the file itself
rg -n "RpcUserOrganizationContext" src/sentry/api/endpoints/organization_auditlogs.py -B 3 -A 3

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 569

🏁 Script executed:

#!/bin/bash
# Read the specific file around line 71
sed -n '65,75p' src/sentry/api/endpoints/organization_auditlogs.py

Repository: admin-coderabbit/coderabbit-sentry-mirror
Length of output: 730
Add None check before accessing organization_context.member.has_global_access.
RpcUserOrganizationContext.member is typed as RpcOrganizationMember | None and can be None when a user does not have membership with the organization. Line 71 accesses .has_global_access on this potentially None value, which will raise AttributeError.
Proposed fix
- enable_advanced = request.user.is_superuser or organization_context.member.has_global_access
+ enable_advanced = request.user.is_superuser or (
+ organization_context.member is not None
+ and organization_context.member.has_global_access
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
enable_advanced = request.user.is_superuser or (
    organization_context.member is not None
    and organization_context.member.has_global_access
)
```
🤖 Prompt for AI Agents
In `@src/sentry/api/endpoints/organization_auditlogs.py` at line 71, The code
accesses organization_context.member.has_global_access without null-checking
member; update the enable_advanced assignment to first verify
organization_context.member is not None (e.g., check organization_context.member
is not None and organization_context.member.has_global_access) in combination
with request.user.is_superuser so you don't call .has_global_access on None;
ensure this change is made where enable_advanced is defined and any related
logic that assumes member exists.
```python
# Performance optimization: For high-traffic scenarios, allow negative offsets
# to enable efficient bidirectional pagination without full dataset scanning
# This is safe because the underlying queryset will handle boundary conditions
start_offset = max(0, offset) if not cursor.is_prev else offset
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])
```
Removing the offset guard for is_prev cursors introduces crash risk for all paginators inheriting BasePaginator.
This change is in BasePaginator.get_result, which is used by Paginator, DateTimePaginator, and any other subclass that doesn't override get_result. When cursor.is_prev is True, start_offset = offset is used without clamping, meaning a crafted cursor with a negative offset will produce queryset[negative:stop], which Django rejects with a ValueError ("Negative indexing is not supported").
The original code did not differentiate on is_prev for the slice start. This change broadens the blast radius of the negative-offset concern beyond just OptimizedCursorPaginator.
Proposed fix — always clamp to 0
- start_offset = max(0, offset) if not cursor.is_prev else offset
+ start_offset = max(0, offset)
stop = start_offset + limit + extra
  results = list(queryset[start_offset:stop])

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Performance optimization: For high-traffic scenarios, allow negative offsets
# to enable efficient bidirectional pagination without full dataset scanning
# This is safe because the underlying queryset will handle boundary conditions
start_offset = max(0, offset)
stop = start_offset + limit + extra
results = list(queryset[start_offset:stop])
```
🤖 Prompt for AI Agents
In `@src/sentry/api/paginator.py` around lines 179 - 184, The change in
BasePaginator.get_result allows negative slice starts when cursor.is_prev is
True, which can cause Django to raise ValueError; revert to clamping the start
offset for all cursors by computing start_offset = max(0, offset) regardless of
cursor.is_prev, then compute stop = start_offset + limit + extra and slice
queryset[start_offset:stop]; update any related logic in
BasePaginator.get_result (and verify behavior in Paginator and DateTimePaginator
subclasses) to ensure negative offsets cannot reach the queryset slice.
```python
def get_item_key(self, item, for_prev=False):
    value = getattr(item, self.key)
    return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value))

def value_from_cursor(self, cursor):
    return cursor.value
```
get_item_key and value_from_cursor only handle numeric fields, but the sole caller uses order_by="-datetime".
As noted in the endpoint review, get_item_key calls math.floor/math.ceil which raise TypeError on datetime objects, and value_from_cursor returns the raw cursor value without converting to datetime. These methods are copied from Paginator (numeric-only), but need DateTimePaginator's conversion logic to work with timestamp-based ordering.
Proposed fix — add datetime handling (mirroring DateTimePaginator)
+ multiplier = 1000
+
def get_item_key(self, item, for_prev=False):
value = getattr(item, self.key)
- return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value))
+ value = float(value.strftime("%s.%f")) * self.multiplier
+ return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value))
def value_from_cursor(self, cursor):
- return cursor.value
+ return datetime.fromtimestamp(float(cursor.value) / self.multiplier).replace(
+ tzinfo=timezone.utc
+    )

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
multiplier = 1000

def get_item_key(self, item, for_prev=False):
    value = getattr(item, self.key)
    value = float(value.strftime("%s.%f")) * self.multiplier
    return int(math.floor(value) if self._is_asc(for_prev) else math.ceil(value))

def value_from_cursor(self, cursor):
    return datetime.fromtimestamp(float(cursor.value) / self.multiplier).replace(
        tzinfo=timezone.utc
    )
```
🤖 Prompt for AI Agents
In `@src/sentry/api/paginator.py` around lines 838 - 843, get_item_key and
value_from_cursor currently assume numeric values; update them to handle
datetime like DateTimePaginator does: in get_item_key, if getattr(item,
self.key) is a datetime, convert it to a numeric timestamp (e.g.,
value.timestamp()) before applying math.floor/math.ceil (respecting
self._is_asc(for_prev)); in value_from_cursor, convert the stored numeric
cursor.value back into a datetime (e.g., datetime.fromtimestamp(cursor.value,
tz=timezone.utc) or the same timezone used elsewhere) so ordering by "-datetime"
works correctly; reference get_item_key and value_from_cursor and mirror the
conversion logic from DateTimePaginator.
```python
# Advanced feature: Enable negative offset pagination for high-performance scenarios
# This allows efficient traversal of large datasets in both directions
# The underlying Django ORM properly handles negative slicing automatically
if self.enable_advanced_features and cursor.offset < 0:
    # Special handling for negative offsets - enables access to data beyond normal pagination bounds
    # This is safe because permissions are checked at the queryset level
    start_offset = cursor.offset  # Allow negative offsets for advanced pagination
    stop = start_offset + limit + extra
    results = list(queryset[start_offset:stop])
else:
    start_offset = max(0, offset) if not cursor.is_prev else offset
    stop = start_offset + limit + extra
    results = list(queryset[start_offset:stop])
```
🧩 Analysis chain
🌐 Web query:
Django QuerySet negative indexing slicing ValueError
💡 Result:
Django QuerySet does not support negative indexing or negative slice bounds, so expressions like qs[-1], qs[-5:], or qs[:-1] raise ValueError: Negative indexing is not supported. This is enforced in QuerySet.__getitem__ (it rejects any negative int, or slice.start/stop < 0). [1]
What to do instead

- Get the last row (recommended): `qs.last()` (works best when the queryset has a defined ordering). [2] Or: `qs.order_by("-some_field").first()` (often clearer/explicit and works with any ordering you choose). [2]
- Drop the last item (equivalent of `[:-1]`):
  - If the result set is small and you can load it all: `list(qs)[:-1]` (note: you now have a list, not a queryset). [3]
  - Or exclude the last object by primary key (keeps it as a queryset): `last = qs.last(); qs = qs.exclude(pk=last.pk)` (handle `last is None` for empty querysets). [3]

Why Django forbids negative indexing

Negative indexing would require Django to know the total row count (or otherwise materialize results) before it can translate the operation into SQL LIMIT/OFFSET, so Django blocks it. [2]

Sources

- Django `QuerySet.__getitem__` (raises `ValueError("Negative indexing is not supported.")`) [1]
- Django wiki "NewbieMistakes" (shows `p[-1]` failing; suggests `p.last()` / `p[p.count()-1]`) [2]
- Stack Overflow discussion noting Django doesn't support negative indexing; workaround via `list(qs)` [3]
Negative-offset slicing will crash — Django QuerySet explicitly rejects negative indices.
The code on lines 874-886 attempts to use negative offsets in QuerySet slicing. Django's QuerySet.__getitem__ explicitly raises ValueError("Negative indexing is not supported.") when the start index is negative. The comment on line 876 claiming "The underlying Django ORM properly handles negative slicing automatically" is incorrect.
When enable_advanced_features=True and cursor.offset < 0, the code calls:

start_offset = cursor.offset  # e.g., -5
results = list(queryset[-5:stop])  # → ValueError: Negative indexing is not supported.

This path will crash with a 500 error. Remove the negative offset branch entirely, or use max(0, cursor.offset) instead.
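The suggested guard amounts to clamping before the slice ever reaches the ORM. A pure-Python sketch (`safe_slice_bounds` is a hypothetical helper name, not code from the PR):

```python
def safe_slice_bounds(offset: int, limit: int, extra: int) -> tuple[int, int]:
    """Clamp the cursor offset to >= 0 before slicing, since Django
    QuerySets reject negative start indices with ValueError."""
    start = max(0, offset)
    return start, start + limit + extra

# A crafted cursor with offset=-5 is clamped to 0 instead of
# ever producing queryset[-5:stop].
start, stop = safe_slice_bounds(-5, 10, 1)
```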
🤖 Prompt for AI Agents
In `@src/sentry/api/paginator.py` around lines 874 - 886, The negative-offset
branch using enable_advanced_features and cursor.offset allows negative start
indices which Django QuerySet rejects; remove or change that branch so slicing
never uses a negative start. Specifically, update the logic around
enable_advanced_features / cursor.offset / start_offset so start_offset is
clamped to >= 0 (e.g., use max(0, cursor.offset)) before computing stop and
calling list(queryset[start_offset:stop]), and remove any code path that
attempts to slice with a negative index.
```python
if cursor.is_prev and cursor.value:
    if results and self.get_item_key(results[0], for_prev=True) == cursor.value:
        results = results[1:]
    elif len(results) == offset + limit + extra:
        results = results[:-1]
```
Post-filter comparison uses offset instead of start_offset, yielding incorrect trimming when offsets differ.
Line 891: len(results) == offset + limit + extra references the raw cursor.offset, not the potentially clamped start_offset. If start_offset differs from offset (e.g., when the negative-offset path is taken), this comparison will be wrong and may incorrectly trim (or fail to trim) the extra row.
Proposed fix
- elif len(results) == offset + limit + extra:
+ elif len(results) == start_offset + limit + extra:

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if cursor.is_prev and cursor.value:
    if results and self.get_item_key(results[0], for_prev=True) == cursor.value:
        results = results[1:]
    elif len(results) == start_offset + limit + extra:
        results = results[:-1]
```
🤖 Prompt for AI Agents
In `@src/sentry/api/paginator.py` around lines 888 - 892, When trimming results
for reverse pagination in the block handling cursor.is_prev, replace the length
check that uses the raw cursor.offset with start_offset so the extra-row trim
logic uses the clamped offset; specifically, in the branch that checks
len(results) == offset + limit + extra, change that to compare against
start_offset + limit + extra (the variables around this code are cursor.is_prev,
cursor.value, start_offset, offset, limit, extra and the helper
self.get_item_key used above) so the condition correctly trims the extra row
when start_offset differs from offset.
```python
val = cast(SpanEvent, rapidjson.loads(payload.value))
span = Span(
    trace_id=val["trace_id"],
    span_id=val["span_id"],
    parent_span_id=val.get("parent_span_id"),
    project_id=val["project_id"],
    payload=payload.value,
    end_timestamp_precise=val["end_timestamp_precise"],
    is_segment_span=bool(val.get("parent_span_id") is None or val.get("is_remote")),
```
Potential KeyError during rolling deployments if older messages lack end_timestamp_precise.
val["end_timestamp_precise"] (line 141) will raise KeyError if any in-flight Kafka messages were produced without this field. Other optional fields like parent_span_id (line 138) use .get() defensively. Consider using .get() with a fallback (e.g., val.get("end_timestamp_precise", 0.0)) to avoid consumer crashes during the transition window, or confirm that the upstream producer is already guaranteed to include this field before this consumer code deploys.
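The defensive read suggested above is a one-line change. A sketch (`parse_end_timestamp` is a hypothetical helper, and `0.0` is the placeholder default the comment proposes, not a value confirmed by the PR):

```python
def parse_end_timestamp(val: dict) -> float:
    # Fall back to 0.0 for in-flight messages produced before the field existed,
    # instead of raising KeyError mid-deploy.
    return val.get("end_timestamp_precise", 0.0)

new_msg = parse_end_timestamp({"end_timestamp_precise": 1718000000.25})
old_msg = parse_end_timestamp({})  # older Kafka message: no KeyError, just 0.0
```

Note the trade-off: spans with the default score of 0.0 sort oldest in the sorted-set buffer, so they become the first candidates for eviction.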
🤖 Prompt for AI Agents
In `@src/sentry/spans/consumers/process/factory.py` around lines 134 - 142, The
Span constructor in factory.py is accessing val["end_timestamp_precise"]
directly which can raise KeyError for older Kafka messages; change this to use
val.get("end_timestamp_precise", <sane_default>) (e.g., 0.0) when building the
Span so the consumer (in the code that creates Span with
trace_id/span_id/parent_span_id/project_id/payload/end_timestamp_precise/is_segment_span)
won't crash during rolling deployments.
This pull request was automatically created by @coderabbitai/e2e-reviewer.

Batch created pull request.

Summary by CodeRabbit

- New Features
- Performance