Conversation

@keith-turner
Contributor

Added per-scan tracing statistics for the following:

  • The count of compressed bytes read from DFS (when all data is read from cache this will be zero).
  • The count of uncompressed bytes read from DFS or cache
  • The count of bytes returned (after iterators filter)
  • The count of key/values read before any filtering
  • The count of key/values returned after filtering
  • The count of seeks done
  • Statistics for cache hits, misses, and bypasses. A bypass is when an rfile block is read w/o using the cache.

The statistics are included in a tracing span that wraps reading each batch of key values in a tablet server or scan server. So if a scan reads 10 batches of key values, then 10 spans will be emitted for tracing data. Each span will include the statistics for that batch.
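
The one-span-per-batch model can be sketched with a toy tracer. This is purely illustrative — `ToySpan`, `read_scan`, and the even-number "filter" are hypothetical stand-ins, not Accumulo's implementation (which uses OpenTelemetry spans):

```python
from dataclasses import dataclass, field

@dataclass
class ToySpan:
    """Stand-in for a tracing span carrying key/value attributes."""
    name: str
    attributes: dict = field(default_factory=dict)

def read_scan(batches, emitted_spans):
    """Read each batch inside its own span, attaching that batch's stats only."""
    results = []
    for batch in batches:
        span = ToySpan(name="scan.read.batch")
        kept = [kv for kv in batch if kv % 2 == 0]  # pretend iterator filtering
        # Stats are scoped to this batch, not the whole scan.
        span.attributes["accumulo.entries.read"] = len(batch)
        span.attributes["accumulo.entries.returned"] = len(kept)
        emitted_spans.append(span)
        results.extend(kept)
    return results

spans = []
read_scan([[1, 2, 3], [4, 5], [6]], spans)
print(len(spans))  # one span per batch -> 3
```

A scan that reads 10 batches would emit 10 such spans, matching the description above.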

@keith-turner keith-turner added this to the 2.1.5 milestone Dec 9, 2025
@keith-turner
Contributor Author

I have been testing this using Jaeger. The following json was pulled from Jaeger for a single span; it shows some of the new tags in this PR. It shows a scan batch read 560,364 compressed bytes from DFS, which turned into 1,109,757 uncompressed bytes read, and finally returned 54 bytes to the user after filtering. It shows that 78,343 key/values were read before any filtering was done by the scan, and after filtering only 3 key/values were returned. The data cache was turned off for the table; the stats show that 12 rfile data blocks were read bypassing the cache. The index cache was turned on; the 1 rfile index block read shows up as a cache hit.

                   "tags": [
                        {
                            "key": "accumulo.bytes.read",
                            "type": "int64",
                            "value": 1109757
                        },
                        {
                            "key": "accumulo.bytes.read.file",
                            "type": "int64",
                            "value": 560364
                        },
                        {
                            "key": "accumulo.bytes.returned",
                            "type": "int64",
                            "value": 54
                        },
                        {
                            "key": "accumulo.cache.data.bypasses",
                            "type": "int64",
                            "value": 12
                        },
                        {
                            "key": "accumulo.cache.data.hits",
                            "type": "int64",
                            "value": 0
                        },
                        {
                            "key": "accumulo.cache.data.misses",
                            "type": "int64",
                            "value": 0
                        },
                        {
                            "key": "accumulo.cache.index.bypasses",
                            "type": "int64",
                            "value": 0
                        },
                        {
                            "key": "accumulo.cache.index.hits",
                            "type": "int64",
                            "value": 1
                        },
                        {
                            "key": "accumulo.cache.index.misses",
                            "type": "int64",
                            "value": 0
                        },
                        {
                            "key": "accumulo.entries.read",
                            "type": "int64",
                            "value": 78343
                        },
                        {
                            "key": "accumulo.entries.returned",
                            "type": "int64",
                            "value": 3
                        },
                        {
                            "key": "accumulo.executor",
                            "type": "string",
                            "value": "default"
                        },
                        {
                            "key": "accumulo.extent",
                            "type": "string",
                            "value": "1;04fe;04de"
                        },
                        {
                            "key": "accumulo.seeks",
                            "type": "int64",
                            "value": 1
                        },
                        {
                            "key": "accumulo.server",
                            "type": "string",
                            "value": "localhost:9997"
                        },
                        {
                            "key": "accumulo.table.id",
                            "type": "string",
                            "value": "1"
                        },
                        {
                            "key": "otel.scope.name",
                            "type": "string",
                            "value": "org.apache.accumulo"
                        },
                        {
                            "key": "otel.scope.version",
                            "type": "string",
                            "value": "2.1.5-SNAPSHOT"
                        },
                        {
                            "key": "span.kind",
                            "type": "string",
                            "value": "internal"
                        },
                        {
                            "key": "thread.id",
                            "type": "int64",
                            "value": 1215
                        },
                        {
                            "key": "thread.name",
                            "type": "string",
                            "value": "Client: localhost:58832 User: root Start: 1765238204805 Tablet: 1;04fe;04de"
                        }
                    ],

I wanted to see if it's possible to pull the data for all spans from Jaeger and sum them up. This allows seeing the stats for a single scan as it hits many servers across the cluster. I wrote the following script to do this.

#!/bin/bash

YSERVER=$1
TRACE_ID=$2

trace_tmp="$(mktemp -t trace.XXXXXXXX.json)"

curl -s "http://$YSERVER:16686/api/traces/$TRACE_ID?prettyPrint=false" > "$trace_tmp"

echo "Sum of all traces"
jq -r '.data[0].spans[] | .tags | .[] | [.key,.type,.value] | @csv' "$trace_tmp" | grep int64 | grep -v thread.id | awk -F ',' '{a[$1] += $3} END{for (i in a) print "\t" i, a[i]}' | sort
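
For environments without jq, the same rollup can be sketched in Python. This assumes the JSON shape that the jq path `.data[0].spans[].tags` implies (fetching from Jaeger is left out; the sample document below is hand-made for illustration):

```python
import json
from collections import defaultdict

def sum_int64_tags(trace_json, skip=("thread.id",)):
    """Sum every int64 tag across all spans of the first trace,
    mirroring the jq/grep/awk pipeline in the script above."""
    totals = defaultdict(int)
    for span in trace_json["data"][0]["spans"]:
        for tag in span["tags"]:
            if tag["type"] == "int64" and tag["key"] not in skip:
                totals[tag["key"]] += tag["value"]
    return dict(totals)

# Tiny hand-made example with two spans:
doc = {"data": [{"spans": [
    {"tags": [{"key": "accumulo.seeks", "type": "int64", "value": 1},
              {"key": "thread.id", "type": "int64", "value": 99}]},
    {"tags": [{"key": "accumulo.seeks", "type": "int64", "value": 2}]},
]}]}
print(sum_int64_tags(doc))  # {'accumulo.seeks': 3}
```

Note that `thread.id` is excluded from the sum just as the script's `grep -v thread.id` does, since summing thread ids is meaningless.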

I ran the script against a trace of a scan that filtered data on 128 tablets across 4 tservers. That worked well: the 333 entries returned matches how many key/values the scan actually returned, so that was a good test. The json from Jaeger had at least 128 spans, like the example above, that needed to be rolled up.

$ ./sum_scan_traces.sh localhost c22f939eb5dc1ba294b6aeae7941ebb5
Sum of all traces
	"accumulo.bytes.read" 162803236
	"accumulo.bytes.read.file" 82245238
	"accumulo.bytes.returned" 5972
	"accumulo.cache.data.bypasses" 1694
	"accumulo.cache.data.hits" 0
	"accumulo.cache.data.misses" 0
	"accumulo.cache.index.bypasses" 0
	"accumulo.cache.index.hits" 191
	"accumulo.cache.index.misses" 0
	"accumulo.entries.read" 9988264
	"accumulo.entries.returned" 333
	"accumulo.seeks" 128

@keith-turner
Contributor Author

There are a few TODOs still in the PR. Some of them I need to open issues for and remove. Some of them are questions I had while working on this.

@dlmarion
Contributor

How is the traceId created? Is it something created by OpenTelemetry, or is it something that we create? I created #2956 a while back in an effort to help users correlate scan information across the servers. Allowing the client to set the id, and then using that id in the log messages and traces could help users single out issues or performance information for a single scan.

@dlmarion
Contributor

"accumulo.bytes.read" 162803236
"accumulo.bytes.read.file" 82245238
"accumulo.bytes.returned" 5972
"accumulo.cache.data.bypasses" 1694
"accumulo.cache.data.hits" 0
"accumulo.cache.data.misses" 0
"accumulo.cache.index.bypasses" 0
"accumulo.cache.index.hits" 191
"accumulo.cache.index.misses" 0
"accumulo.entries.read" 9988264
"accumulo.entries.returned" 333
"accumulo.seeks" 128

Suggest adding scan to the names where it makes sense. For example, accumulo.entries.read, could be relevant for compactions as well. Maybe it doesn't matter in the tracing because it's per thread of execution? It would certainly matter for any metrics emitted as those will be aggregated.

@keith-turner
Contributor Author

How is the traceId created? Is it something created by OpenTelemetry, or is it something that we create? I created #2956 a while back in an effort to help users correlate scan information across the servers. Allowing the client to set the id, and then using that id in the log messages and traces could help users single out issues or performance information for a single scan.

It seems like when a new span is created and there is no active trace on the thread, OpenTelemetry will create a new random trace id. That trace id can be retrieved from the span. If a user creates a span on the client side that wraps their scan, then that trace id will be available to anything server side that creates a span and asks for the trace id.
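
The propagation mechanics can be illustrated with the W3C `traceparent` header format that OpenTelemetry uses to carry the trace id across RPC boundaries. This is a minimal pure-Python sketch of the format only, not Accumulo's actual propagation code:

```python
import re
import secrets

def new_traceparent():
    """Build a header like OpenTelemetry would when no trace is active:
    version 00, random 128-bit trace id, random 64-bit span id, sampled flag."""
    trace_id = secrets.token_hex(16)  # 32 hex chars
    span_id = secrets.token_hex(8)    # 16 hex chars
    return f"00-{trace_id}-{span_id}-01"

def trace_id_of(traceparent):
    """Pull the trace id back out on the receiving side, as a server-side
    span could via its span context."""
    m = re.fullmatch(r"00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}", traceparent)
    if not m:
        raise ValueError("malformed traceparent")
    return m.group(1)

hdr = new_traceparent()
print(trace_id_of(hdr) == hdr.split("-")[1])  # True
```

The client-side span fixes the trace id; every server-side span created under it reports the same id, which is what lets the rollup script above correlate spans across servers.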

@dlmarion
Contributor

How is the traceId created? Is it something created by OpenTelemetry, or is it something that we create? I created #2956 a while back in an effort to help users correlate scan information across the servers. Allowing the client to set the id, and then using that id in the log messages and traces could help users single out issues or performance information for a single scan.

It seems like when a new span is created and there is no active trace on the thread, OpenTelemetry will create a new random trace id. That trace id can be retrieved from the span. If a user creates a span on the client side that wraps their scan, then that trace id will be available to anything server side that creates a span and asks for the trace id.

Ok, so a client application that also uses OpenTelemetry could start a span before calling the Accumulo client. The span will have a randomly generated trace id, but they could set an attribute on the span that they create with information from their application that will help them identify the thread of execution on the server side. I looked at one resource that says Jaeger "Tags" in their UI map to the OpenTelemetry attributes.

@keith-turner
Contributor Author

keith-turner commented Dec 15, 2025

Ok, so a client application that also uses OpenTelemetry could start a span before calling the Accumulo client.

I added an IT in 2b79fd9 that spins up an external process that traces two scans (in the class ScanTraceClient). When the test runs it sees the two trace ids for each scan.

@keith-turner
Contributor Author

Suggest adding scan to the names where it makes sense. For example, accumulo.entries.read, could be relevant for compactions as well. Maybe it doesn't matter in the tracing because it's per thread of execution? It would certainly matter for any metrics emitted as those will be aggregated.

Made that change in ae2abb7. These attributes are only used in spans related to scans, so they are scoped to scans in that way. It's probably good to have good attribute names across all span types though.

Contributor

Not sure what you modified here exactly, but commons-io also has a counting input stream. See https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BoundedInputStream.html.

Contributor Author

I can look into that. I added two methods, one to reset the count to zero and another to get the wrapped input stream.
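
The behavior of those two methods can be sketched as follows. This is in Python purely to illustrate the idea — the actual change is to a Java counting input stream, and the names here are mine:

```python
import io

class CountingReader:
    """Wraps a stream and counts bytes read. Supports resetting the count
    to zero and retrieving the wrapped stream -- the two methods described
    above, which BoundedInputStream does not appear to offer."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self.count = 0

    def read(self, n=-1):
        data = self._wrapped.read(n)
        self.count += len(data)  # track bytes actually returned
        return data

    def reset_count(self):
        """Reset the byte count to zero without disturbing the stream."""
        self.count = 0

    def get_wrapped(self):
        """Expose the underlying stream."""
        return self._wrapped

r = CountingReader(io.BytesIO(b"abcdef"))
r.read(4)
print(r.count)   # 4
r.reset_count()
print(r.count)   # 0
```

Resetting the count is what lets the same wrapped stream report per-batch byte counts rather than a running total.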

Contributor Author

I do not see a way to reset the count w/ BoundedInputStream.

Comment on lines +348 to +352
<dependency>
<groupId>io.opentelemetry.javaagent</groupId>
<artifactId>opentelemetry-javaagent</artifactId>
<!-- This version was selected because it aligns with the version of open telemetry that Accumulo 2.1 is using. -->
<version>2.14.0</version>
Contributor

I wonder if we should add this to the assemble pom and distribute it (assuming it's allowed). That would alleviate having to try and keep the Terraform code in accumulo-testing up to date (apache/accumulo-testing#293 for example).

Contributor Author
@keith-turner keith-turner Dec 15, 2025

I was not sure if we should use the latest version of the agent. I looked at these release notes and found a version that corresponded to the version of OpenTelemetry that Accumulo 2.1 is currently using. However, I am not sure if that matters. There are many later versions of the agent.

Comment on lines 321 to 323
private static final AttributeKey<Long> ENTRIES_READ_KEY =
AttributeKey.longKey("accumulo.scan.entries.read");
private static final AttributeKey<Long> SEEKS_KEY = AttributeKey.longKey("accumulo.scan.seeks");
Contributor

I wonder if we should place these in one location, like we do with the Metrics class.

Contributor

It might be good to make them public too; I saw that the same strings are used in an IT, which could reference the variables instead.

Contributor Author

Made that change in 7ae602a

Contributor
@dlmarion dlmarion left a comment

Looks good

keith-turner added a commit to keith-turner/accumulo that referenced this pull request Dec 16, 2025
CachableBlockFile.Reader.getBCFile() always read the RFile root block
from the index cache even if it would not use it. Modified the method to
only read from the index cache when the data is actually needed.
Noticed this while working on apache#6010, saw that for every data block
read from an rfile it would also read from the index cache.

Ran mvn verify -Psunny w/ these changes.
@keith-turner keith-turner merged commit f844dc0 into apache:2.1 Dec 16, 2025
8 checks passed
@keith-turner keith-turner deleted the scan-tracing2 branch December 16, 2025 19:53
keith-turner added a commit to keith-turner/accumulo that referenced this pull request Dec 16, 2025
CachableBlockFile.Reader.getBCFile() always read the RFile root block
from the index cache even if it would not use it. Modified the method to
only read from the index cache when the data is actually needed.
Noticed this while working on apache#6010, saw that for every data block
read from an rfile it would also read from the index cache.

Ran mvn verify -Psunny w/ these changes.

# Conflicts:
#	core/src/main/java/org/apache/accumulo/core/file/blockfile/impl/CachableBlockFile.java
keith-turner added a commit to keith-turner/accumulo that referenced this pull request Dec 16, 2025
CachableBlockFile.Reader.getBCFile() always read the RFile root block
from the index cache even if it would not use it. Modified the method to
only read from the index cache when the data is actually needed.
Noticed this while working on apache#6010, saw that for every data block
read from an rfile it would also read from the index cache.

Adjusted ScanTracingIT because before this change there was an rfile
index cache read per rfile data block read.  After this change the
number of rfile index reads align w/ the number of rfiles opened, which
in this case aligns with the number of tablets in the test tables.

The existing behavior was likely not a performance problem if the data
was in cache, but could potentially cause thread contention and extra
CPU usage. So it's probably good to fix; it is also nice to make the
tracing stats align better w/ expectations.
keith-turner added a commit that referenced this pull request Dec 18, 2025
CachableBlockFile.Reader.getBCFile() always read the RFile root block
from the index cache even if it would not use it. Modified the method to
only read from the index cache when the data is actually needed.
Noticed this while working on #6010, saw that for every data block
read from an rfile it would also read from the index cache.

Adjusted ScanTracingIT because before this change there was an rfile
index cache read per rfile data block read.  After this change the
number of rfile index reads align w/ the number of rfiles opened, which
in this case aligns with the number of tablets in the test tables.

The existing behavior was likely not a performance problem if the data
was in cache, but could potentially cause thread contention and extra
CPU usage. So it's probably good to fix; it is also nice to make the
tracing stats align better w/ expectations.