Skip to content

Conversation

@kevinrr888
Copy link
Member

This adds checks when adding an iterator that the given iterator does not conflict with any existing iterators. Conflict meaning same name or same priority. Iterators can be added several ways, and previously only TableOperations.attachIterator and NamespaceOperations.attachIterator would check for conflicts. This adds iterator conflict checks to:

  • Scanner.addScanIterator
  • TableOperations.setProperty
  • TableOperations.modifyProperties
  • NewTableConfiguration.attachIterator
  • NamespaceOperations.attachIterator (was previously only checking for conflicts with iterators in the namespace, now also checks for conflicts with iterators in the tables of the namespace)
  • NamespaceOperations.setProperty
  • NamespaceOperations.modifyProperties

This also accounts for the several ways in which conflicts can arise:

  • Iterators that are attached directly to a table (either through TableOperations.attachIterator, TableOperations.setProperty, or TableOperations.modifyProperties)
  • Iterators that are attached to a namespace, inherited by a table (either through NamespaceOperations.attachIterator, NamespaceOperations.setProperty, or NamespaceOperations.modifyProperties)
  • Conflicts with default table iterators (if the table has them)
  • Adding the exact iterator already present should not fail

This commit also adds a new IteratorConflictsIT to test all of the above.

Part of #6030

This commit adds checks when adding an iterator that the given iterator does not conflict with any existing iterators. Conflict meaning same name or same priority. Iterators can be added several ways, and previously only TableOperations.attachIterator and NamespaceOperations.attachIterator would check for conflicts. This commit adds iterator conflict checks to:
- Scanner.addScanIterator
- TableOperations.setProperty
- TableOperations.modifyProperties
- NewTableConfiguration.attachIterator

Note that this does not add conflict checks to NamespaceOperations.setProperty or NamespaceOperations.modifyProperties, these will be done in another commit.

This commit also accounts for the several ways in which conflicts can arise:
- Iterators that are attached directly to a table (either through TableOperations.attachIterator, TableOperations.setProperty, or TableOperations.modifyProperties)
- Iterators that are attached to a namespace, inherited by a table (either through NamespaceOperations.attachIterator, NamespaceOperations.setProperty, or NamespaceOperations.modifyProperties)
- Conflicts with default table iterators (if the table has them)
- Adding the exact iterator already present should not fail

This commit also adds a new IteratorConflictsIT to test all of the above.

Part of apache#6030
Adds conflict checks to:
- NamespaceOperations.attachIterator (was previously only checking for conflicts with iterators in the namespace, now also checks for conflicts with iterators in the tables of the namespace)
- NamespaceOperations.setProperty (check conflicts with namespace iterators and all tables in the namespace)
- NamespaceOperations.modifyProperties (check conflicts with namespace iterators and all tables in the namespace)

New tests to IteratorConflictsIT to test the above
@kevinrr888 kevinrr888 added this to the 2.1.5 milestone Jan 7, 2026
@kevinrr888 kevinrr888 self-assigned this Jan 7, 2026
Copy link
Member Author

@kevinrr888 kevinrr888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From running sunny day tests and all the tests I have changed in this PR, noticed that I unknowingly added new permission requirements to at least TableOperations.create() (new required permission ALTER_NAMESPACE) and Scanner.addScanIterator() (new required permission ALTER_TABLE). I imagine this is a blocker for these changes at this point, but let me know if it's not. I'll look into an alternative to avoid these permissions. See changes to ConditionalWriterIT, ScanIteratorIT, and ShellServerIT for examples of the failures I encountered.

Comment on lines 221 to 226
try {
TableOperationsHelper.checkIteratorConflicts(noDefaultsPropMap, setting, scopes);
TableOperationsHelper.checkIteratorConflicts(propertyMap, setting, scopes);
} catch (AccumuloException e) {
throw new IllegalStateException(String.format(
throw new AccumuloException(String.format(
"conflict with default table iterator: scopes: %s setting: %s", scopes, setting), e);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor mistake when I had originally written this to throw IllegalStateException. Changed to AccumuloException. This is called exclusively by TableOperations.create(), so I thought AccumuloException would be a more appropriate exception for this. This does change a public API method signature, so this may not be desired. If that is the case, I can change this to IllegalArgumentException, which is probably more appropriate than IllegalStateException

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general adding or removing checked exceptions from public API methods will always result in source and binary incompatibilities. When adding checked exceptions any calling code that does not already catch or throw the exception will now fail at compile or runtime. When removing checked exceptions it can cause code that catches them unnecessarily to fail. So would not want to make this change in 2.1 or even 4.0. The only way I know to change checked exceptions cleanly is to deprecate the method and add a new method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured there would be an issue with this change. Ill change to IllegalArgumentException

Comment on lines -219 to +239
Preconditions.checkState(valInPropMap == null || valInPropMap.equals(dv), String.format(
"conflict for property %s : %s (default val) != %s (set val)", dk, dv, valInPropMap));
});
if (valInPropMap != null && !valInPropMap.equals(dv)) {
throw new AccumuloException(String.format(
"conflict for property %s : %s (default val) != %s (set val)", dk, dv, valInPropMap));
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about changing from IllegalStateException to AccumuloException

Comment on lines +48 to +50
* <p>
* - {@link Scanner#addScanIterator(IteratorSetting)}
* <p>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iterator conflict tests I added for addScanIterator really only test ScannerImpl impl of ScannerOptions not sure if its worthwhile to add testing for all impls

*
* @param props the parent namespace config
*/
public void configureInheritedIteratorProps(Map<String,String> props) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this had to be public to be accessible in TableOperationsImpl... I think this is fine since a similar thing is done for getProperties(), but let me know if this addition to the public API for 2.1 is a problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the code is only needed by internal code, then its best to avoid adding to the public API. Would it be possible to do somethnig like the following instead in TableOperationsImpl.create() (or maybe even in the manager create table code, it usually better to do validation on the server side for things like this rather than trust the client to do it)?

  var tableProps = ntc.getProperties();
  var namespaceProps =  context.namespaceOperations().getNamespaceProperties(ns);
 // TODO throw exception if namespace and table iter props conflict

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, working on doing the validation server side (which removes the need for this method). I started working on this change because getting the namespace properties within TableOperations.create() results in requiring the user to have the ALTER_NAMESPACE permission, which we don't want for a create() table call

Comment on lines -207 to +222
TableOperationsHelper.checkIteratorConflicts(noDefaultsPropMap, setting, scopes);
TableOperationsHelper.checkIteratorConflicts(propertyMap, setting, scopes);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could remove noDefaultsPropMap since I pushed the check for equality into checkIteratorConflicts

Comment on lines +382 to +388
String valStr = String.format("%s,%s", setting.getPriority(), setting.getIteratorClass());
Map<String,String> optionConflicts = new TreeMap<>();
// skip if the setting is present in the map... not a conflict if exactly the same
if (props.containsKey(nameStr) && props.get(nameStr).equals(valStr)
&& IteratorConfigUtil.containsSameIterOpts(props, setting, optStr)) {
continue;
}
Copy link
Member Author

@kevinrr888 kevinrr888 Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is the same as before except the addition of "valStr" and this if check.

Moved here since same code was used for TableOperationsHelper.checkIteratorConflicts and NamespaceOperationsHelper.checkIteratorConflicts.

@kevinrr888
Copy link
Member Author

Transferring to WIP until I resolve #6040 (review)

@kevinrr888 kevinrr888 marked this pull request as draft January 8, 2026 15:53
private Consumer<IteratorSetting> getScanIteratorValidator(Callable<String> tableNameGetter) {
return (givenIter) -> {
try {
tableOperations().checkIteratorConflicts(tableNameGetter.call(), givenIter,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every time an iterator is added to a scanner on the client side this code wil make an RPC to get the tables full config and then check it. This could be expensive and slow down code that is current creating a scanner and adding lots of iterators.

Instead of doing all this validation client side, we could potentially do it server side and not make any changes to the client side code. We are already passing iterators to servers for the scan RPCs, we could potentialy check for the conflicts in the following code and throw an exception if any are seen.

// Scan time iterator options were set, so need to merge those with pre-parsed table
// iterator options.
iterOpts = new HashMap<>(pic.getOpts().size() + scanParams.getSsio().size());
iterInfos = new ArrayList<>(pic.getIterInfo().size() + scanParams.getSsiList().size());
IteratorConfigUtil.mergeIteratorConfig(iterInfos, iterOpts, pic.getIterInfo(),
pic.getOpts(), scanParams.getSsiList(), scanParams.getSsio());

Not sure if that is workable, would be good to rule this in or out as an option. Would be nice to efficiently do the check leveraging the existing merge code if possible.

Comment on lines 221 to 226
try {
TableOperationsHelper.checkIteratorConflicts(noDefaultsPropMap, setting, scopes);
TableOperationsHelper.checkIteratorConflicts(propertyMap, setting, scopes);
} catch (AccumuloException e) {
throw new IllegalStateException(String.format(
throw new AccumuloException(String.format(
"conflict with default table iterator: scopes: %s setting: %s", scopes, setting), e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general adding or removing checked exceptions from public API methods will always result in source and binary incompatibilities. When adding checked exceptions any calling code that does not already catch or throw the exception will now fail at compile or runtime. When removing checked exceptions it can cause code that catches them unnecessarily to fail. So would not want to make this change in 2.1 or even 4.0. The only way I know to change checked exceptions cleanly is to deprecate the method and add a new method.

*
* @param props the parent namespace config
*/
public void configureInheritedIteratorProps(Map<String,String> props) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the code is only needed by internal code, then its best to avoid adding to the public API. Would it be possible to do somethnig like the following instead in TableOperationsImpl.create() (or maybe even in the manager create table code, it usually better to do validation on the server side for things like this rather than trust the client to do it)?

  var tableProps = ntc.getProperties();
  var namespaceProps =  context.namespaceOperations().getNamespaceProperties(ns);
 // TODO throw exception if namespace and table iter props conflict

@ctubbsii
Copy link
Member

ctubbsii commented Jan 8, 2026

Discussed iterator conflicts today, and here's a summary of some key points:

  1. Conflict within the config: In configuration, no two iterators at the same scope (scan, minc, majc) may be able to have the same priority.
    • This applies only to the complete view of the TableConfiguration, with all inherited properties from parent configs (namespace, system, ...), so it is okay, for example, if a table config set at the namespace level is overridden in part at the table level, so that the one single iterator at that scope and priority has configuration that spans across two levels of the configs. What is important is that the resulting view of the TableConfiguration when trying to construct an iterator stack, will not show any two different iterators at the same scope with the same priority.
    • Checks could be in place when editing table/namespace configuration to ensure a priority isn't "doubled up". A user who wishes to replace iteratorA with iteratorB at the same priority would have to remove iteratorA before adding iteratorB, or would have to use modifyProperties to atomically mutate the properties to remove and add at the same time, in order to avoid an error. This alone, however, does not guarantee that there isn't a conflict. If iteratorA had been set at the namespaceN.tableT, but iteratorB was being added to namespaceN, we would have to check that there isn't a conflict with any of the tables in namespaceN. That's not exactly practical, so we may just want to check that there isn't a conflict at the level being modified, and rely on later checks when setting up the iterator stack to verify that there isn't a conflict overall.
    • Note: this probably would be easier to deconflict if our iterator configs used a different property key scheme that was more overrideable atomically, like table.<scope>.iterator.<priority>=class,opt1key=opt1val,opt2key=opt2val,.... so that it wouldn't be possible to have conflict between namespace and table configs, because one would fully override the other. But, that's not what we have today.
  2. Conflict in user-supplied iterators for a specific scan/compaction: No two iterators in a single client-initiated operation's settings may have the same priority.
    • This is for things like scan-time iterators set in the API for a scan, and passed over the RPC, rather than iterators set in configuration on the server-side.
    • This also applies to any other place where we might be able to specify iterators that aren't in the configuration (compactions, conditional mutations, etc.)
    • We could check for conflicts set on the scanner easily, but would have to rely on the server-side setting up the iterator stack to ensure no conflicts between the user iterators and those set on the table.
  3. Conflict between configured iterators and user-supplied iterators: The complete iterator stack for an operation may not have any iterators running at the same priority, regardless of whether it came from the configuration or from the client API/RPC request.
    • To address this, we can simply check the full iterator stack when it is being constructed on the server-side, and fail the operation if any priorities are reused, regardless of where they came from.
    • Alternatively, we could treat one as overriding another, but I don't think that's a very good idea.
    • As a follow-on improvement here, we could treat all configured iterators as higher priority than all client operation-specific (scan-time/compaction-time) iterators:
      1. Instead of ordering three configured iterators and two user-supplied iterators by priority alone, as in C1, U2, C3, U4, C5, we would instead order them as C1, C3, C5, U2, U4.
      2. This enables stronger security guarantees by preventing a user-supplied iterator from seeing data that is filtered out in a administrator-configured iterator.
      3. This prevents bugs that could be caused by a user-supplied iterator that transforms data in a way that a subsequent administrator-configured iterator won't be able to handle.
      4. This is a behavior change, and may break some people's (ill-advised) uses, but I think it is better overall.
      5. This would also open the possibility of having a cleaner client-side API, because you don't actually have to specify priority numbers on the client. Instead, clients only need to order user-supplied iterators with respect to other user-supplied iterators, and won't need a priority number to indicate a global ordering that includes the configured iterators for a table. So, we could have an API something like: scanner.map(iterator1).map(iterator2).map(iterator3).scan().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants