adds per scan tracing statistics #6010
Added per scan tracing statistics for the following:

* The count of compressed bytes read from DFS (when all data is read from cache this will be zero).
* The count of uncompressed bytes read from DFS or cache.
* The count of bytes returned (after iterators filter).
* The count of key/values read before any filtering.
* The count of key/values returned after filtering.
* The count of seeks done.
* Statistics for cache hits, misses, and bypasses. A bypass is when an rfile block is read w/o using the cache.

The statistics are included in a tracing span that wraps reading each batch of key values in a tablet or scan server. So if a scan reads 10 batches of key values, then 10 spans will be emitted for tracing data. Each span will include the statistics for that batch.
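To make the per-batch model concrete, here is a small Python sketch (the class and field names are hypothetical illustrations, not Accumulo's actual implementation): counters are kept per batch, and one span-like record of attributes is emitted for each batch read.

```python
from dataclasses import dataclass

@dataclass
class BatchStats:
    """Counters tracked for one batch of key/values (hypothetical names)."""
    bytes_read_file: int = 0   # compressed bytes read from DFS
    bytes_read: int = 0        # uncompressed bytes read from DFS or cache
    bytes_returned: int = 0    # bytes returned after iterator filtering
    entries_read: int = 0      # key/values read before any filtering
    entries_returned: int = 0  # key/values returned after filtering
    seeks: int = 0

def emit_spans(batches):
    """Emit one 'span' (modeled here as a dict of attributes) per batch,
    mirroring how a scan that reads N batches produces N tracing spans."""
    spans = []
    for stats in batches:
        spans.append({
            "accumulo.bytes.read": stats.bytes_read,
            "accumulo.entries.read": stats.entries_read,
            "accumulo.entries.returned": stats.entries_returned,
            "accumulo.seeks": stats.seeks,
        })
    return spans
```

A scan reading three batches would therefore produce three such records, each scoped to its own batch.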
I have been testing this using Jaeger. The following JSON was pulled from Jaeger for a single span and shows some of the new tags in this PR. The scan batch read 560,364 bytes from DFS, which decompressed to 1,109,757 bytes read and finally returned 54 bytes to the user after filtering. It shows that 78,343 key values were read before any filtering was done by the scan, and after filtering only 3 key values were returned. The data cache was turned off for the table; the stats show that 12 rfile data blocks were read bypassing the cache. The index cache was turned on, and the 1 rfile index block read shows up as a cache hit.

"tags": [
{
"key": "accumulo.bytes.read",
"type": "int64",
"value": 1109757
},
{
"key": "accumulo.bytes.read.file",
"type": "int64",
"value": 560364
},
{
"key": "accumulo.bytes.returned",
"type": "int64",
"value": 54
},
{
"key": "accumulo.cache.data.bypasses",
"type": "int64",
"value": 12
},
{
"key": "accumulo.cache.data.hits",
"type": "int64",
"value": 0
},
{
"key": "accumulo.cache.data.misses",
"type": "int64",
"value": 0
},
{
"key": "accumulo.cache.index.bypasses",
"type": "int64",
"value": 0
},
{
"key": "accumulo.cache.index.hits",
"type": "int64",
"value": 1
},
{
"key": "accumulo.cache.index.misses",
"type": "int64",
"value": 0
},
{
"key": "accumulo.entries.read",
"type": "int64",
"value": 78343
},
{
"key": "accumulo.entries.returned",
"type": "int64",
"value": 3
},
{
"key": "accumulo.executor",
"type": "string",
"value": "default"
},
{
"key": "accumulo.extent",
"type": "string",
"value": "1;04fe;04de"
},
{
"key": "accumulo.seeks",
"type": "int64",
"value": 1
},
{
"key": "accumulo.server",
"type": "string",
"value": "localhost:9997"
},
{
"key": "accumulo.table.id",
"type": "string",
"value": "1"
},
{
"key": "otel.scope.name",
"type": "string",
"value": "org.apache.accumulo"
},
{
"key": "otel.scope.version",
"type": "string",
"value": "2.1.5-SNAPSHOT"
},
{
"key": "span.kind",
"type": "string",
"value": "internal"
},
{
"key": "thread.id",
"type": "int64",
"value": 1215
},
{
"key": "thread.name",
"type": "string",
"value": "Client: localhost:58832 User: root Start: 1765238204805 Tablet: 1;04fe;04de"
}
],

Wanted to see if it's possible to pull the data from all spans from Jaeger and sum them up. This allows seeing the stats for a single scan as it hits many servers across the cluster. Wrote the following script to do this.

```bash
#!/bin/bash
YSERVER=$1
TRACE_ID=$2
trace_tmp="$(mktemp -t trace.XXXXXXXX.json)"
curl -s "http://$YSERVER:16686/api/traces/$TRACE_ID?prettyPrint=false" > "$trace_tmp"
echo "Sum of all traces"
jq -r '.data[0].spans[] | .tags | .[] | [.key,.type,.value] | @csv' "$trace_tmp" | grep int64 | grep -v thread.id | awk -F ',' '{a[$1] += $3} END{for (i in a) print "\t" i, a[i]}' | sort
```

Ran the script against a traced scan that filtered data on 128 tablets across 4 tservers. That worked well, and the 333 entries returned matched how many key values the scan actually returned, so that was a good test. The JSON from Jaeger had at least 128 spans, like the example above, that needed to be rolled up.
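The same rollup can be sketched in stdlib Python against the Jaeger `/api/traces/{id}` response shape shown above, in case jq is unavailable (function name is mine, not part of the PR):

```python
from collections import defaultdict

def sum_int64_tags(trace_json, skip=("thread.id",)):
    """Sum every int64-typed tag across all spans of the first trace,
    mirroring the jq/awk pipeline above (thread.id is excluded since
    summing thread ids is meaningless)."""
    totals = defaultdict(int)
    for span in trace_json["data"][0]["spans"]:
        for tag in span["tags"]:
            if tag["type"] == "int64" and tag["key"] not in skip:
                totals[tag["key"]] += tag["value"]
    return dict(totals)

# A tiny two-span example in the Jaeger response shape:
trace = {"data": [{"spans": [
    {"tags": [{"key": "accumulo.entries.returned", "type": "int64", "value": 3},
              {"key": "thread.id", "type": "int64", "value": 7}]},
    {"tags": [{"key": "accumulo.entries.returned", "type": "int64", "value": 5}]},
]}]}
```

Feeding the real trace JSON (e.g. `json.load(open(trace_tmp))`) through `sum_int64_tags` gives the same per-key totals as the shell pipeline.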
There are a few TODOs still in the PR. Some of them I need to open issues for and then remove. Some of them are questions I had while working on this.
How is the traceId created? Is it something created by OpenTelemetry, or is it something that we create? I created #2956 a while back in an effort to help users correlate scan information across the servers. Allowing the client to set the id, and then using that id in the log messages and traces, could help users single out issues or performance information for a single scan.
Suggest adding
Seems like when a new span is created and there is no active trace on the thread, OpenTelemetry will create a new random trace id. That trace id can be retrieved from the span. If a user creates a span on the client side that wraps their scan, then that trace id will be available to anything server side that creates a span and asks for the trace id.
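For context, an OpenTelemetry trace id is a random 128-bit value rendered as 32 lowercase hex characters (the W3C Trace Context format), and the all-zero id is invalid. A minimal sketch of generating one the way an SDK does when no trace is active:

```python
import secrets

def new_trace_id() -> str:
    """Generate a random 16-byte trace id, hex-encoded per the W3C Trace
    Context format; retry in the (astronomically unlikely) all-zero case,
    which the spec treats as invalid."""
    while True:
        tid = secrets.token_bytes(16)
        if any(tid):
            return tid.hex()
```

A client-side span created before calling Accumulo would carry such an id, which server-side spans then inherit.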
Ok, so a client application that also uses OpenTelemetry could start a span before calling the Accumulo client. The span will have a randomly generated trace id, but they could set an attribute on the span they create with information from their application that will help them identify the thread of execution on the server side. I looked at one resource that says Jaeger "Tags" in their UI map to OpenTelemetry attributes.
I added an IT in 2b79fd9 that spins up an external process that traces two scans (in the class ScanTraceClient). When the test runs it sees the two trace ids for each scan. |
Made that change in ae2abb7. These attributes are only used in spans related to scans, so they are scoped to scans in that way. It's probably good to have consistent attribute names across all span types though.
core/src/main/java/org/apache/accumulo/core/trace/ScanInstrumentation.java
Not sure what you modified here exactly, but commons-io also has a counting input stream. See https://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/BoundedInputStream.html.
I can look into that. I added two methods, one to reset the count to zero and another to get the wrapped input stream.
Do not see a way to reset the count w/ BoundedInputStream
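For comparison, here is a minimal counting-stream sketch (in Python rather than Java, with hypothetical names) showing the two operations discussed above: resetting the count to zero and getting the wrapped stream, the reset being what BoundedInputStream does not appear to expose.

```python
import io

class CountingStream:
    """Wraps a binary stream and counts the bytes read through it."""

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._count = 0

    def read(self, size=-1):
        data = self._wrapped.read(size)
        self._count += len(data)
        return data

    def get_count(self):
        return self._count

    def reset_count(self):
        """Reset the byte count to zero, e.g. between scan batches so each
        span reports only its own batch's bytes."""
        self._count = 0

    def get_wrapped(self):
        """Return the underlying stream."""
        return self._wrapped
```

Resetting between batches is what lets per-batch spans report deltas instead of cumulative totals.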
```xml
<dependency>
  <groupId>io.opentelemetry.javaagent</groupId>
  <artifactId>opentelemetry-javaagent</artifactId>
  <!-- This version was selected because it aligns with the version of open telemetry that Accumulo 2.1 is using. -->
  <version>2.14.0</version>
</dependency>
```
I wonder if we should add this to the assemble pom and distribute it (assuming it's allowed). That would alleviate having to try and keep the Terraform code in accumulo-testing up to date (apache/accumulo-testing#293 for example).
Was not sure if we should use the latest version of the agent. I looked at the release notes and found a version that corresponds to the OpenTelemetry version Accumulo is currently using. However, I am not sure if that matters. There are many later versions of the agent.
```java
private static final AttributeKey<Long> ENTRIES_READ_KEY =
    AttributeKey.longKey("accumulo.scan.entries.read");
private static final AttributeKey<Long> SEEKS_KEY = AttributeKey.longKey("accumulo.scan.seeks");
```
I wonder if we should place these in one location, like we do with the Metrics class.
It might be good to make them public too; I saw that the same strings are used in an IT, which could then reference the variables instead.
Made that change in 7ae602a
test/src/main/java/org/apache/accumulo/test/tracing/ScanTracingIT.java
…gIT.java Co-authored-by: Dave Marion <dlmarion@apache.org>
dlmarion left a comment:
Looks good
CachableBlockFile.Reader.getBCFile() always read the RFile root block from the index cache even if it would not use it. Modified the method to only read from the index cache when the data is actually needed. Noticed this while working on #6010: saw that for every data block read from an rfile it would also read from the index cache. Adjusted ScanTracingIT because, before this change, there was an rfile index cache read per rfile data block read. After this change the number of rfile index reads aligns w/ the number of rfiles opened, which in this case aligns with the number of tablets in the test tables. The existing behavior was likely not a performance problem if the data was in cache, but could potentially cause thread contention and extra CPU usage. So it's probably good to fix, and also nice to make the tracing stats align better w/ expectations. Ran mvn verify -Psunny w/ these changes.