[feature](search) Add multi-field search support with fields parameter#59845
[feature](search) Add multi-field search support with fields parameter#59845airborne12 merged 6 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 31901 ms |
TPC-DS: Total hot run time: 175733 ms |
FE Regression Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
run buildall |
TPC-H: Total hot run time: 31243 ms |
TPC-DS: Total hot run time: 172085 ms |
ClickBench: Total hot run time: 26.74 s |
Add `fields` parameter to the search() function, allowing queries to search
across multiple fields with a single query term. Similar to Elasticsearch's
query_string `fields` parameter.
Usage examples:
- search('hello', '{"fields":["title","content"]}')
-> Equivalent to: (title:hello OR content:hello)
- search('hello world', '{"fields":["title","content"],"default_operator":"and"}')
-> Equivalent to: (title:hello OR content:hello) AND (title:world OR content:world)
- search('a AND b', '{"fields":["title","content"],"mode":"lucene"}')
-> Multi-field with Lucene boolean semantics
Key changes:
- Extended SearchOptions with `fields` array and `isMultiFieldMode()`
- Added common helper methods for DRY compliance (parseWithVisitor, expandItemAcrossFields)
- Added FieldTrackingVisitor interface for polymorphic visitor handling
- Added parseDslMultiFieldMode() and parseDslMultiFieldLuceneMode()
- Added comprehensive unit tests (15+ test cases)
- Added regression tests (18 test cases)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…cross_fields behavior - Add test data id=9 to verify cross_fields vs best_fields semantics - Add Test 2b: multi_field_multi_term_and_lucene to test default_operator:and with mode:lucene - Add Test 11b: multi_field_cross_fields_verify to explicitly verify cross_fields behavior - Update expected output file with new test case results Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
0b42651 to
7c0d178
Compare
|
run buildall |
TPC-H: Total hot run time: 31305 ms |
TPC-DS: Total hot run time: 173919 ms |
ClickBench: Total hot run time: 26.8 s |
…_fields/cross_fields modes
Add support for "type" option in multi-field search DSL to control how terms
are matched across fields:
- best_fields (default): All terms must match within the SAME field
Example: "machine learning" -> (title:machine AND title:learning) OR
(content:machine AND content:learning)
- cross_fields: Terms can match across DIFFERENT fields
Example: "machine learning" -> (title:machine OR content:machine) AND
(title:learning OR content:learning)
This aligns with Elasticsearch's default behavior where best_fields is the
default mode. Users can explicitly set type:"cross_fields" when they need
terms to be distributed across multiple fields.
Also includes:
- Input validation in setType() with IllegalArgumentException for invalid values
- Explicit type checking to prevent silent fallback behavior
- Unit tests for type parameter parsing and expansion
- Regression tests for both modes in standard and Lucene modes
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
run buildall |
TPC-H: Total hot run time: 32084 ms |
TPC-DS: Total hot run time: 173383 ms |
ClickBench: Total hot run time: 26.81 s |
|
run cloud_p0 |
|
run vault_p0 |
|
run external |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
FE Regression Coverage ReportIncrement line coverage |
…sses Make fields private with proper getter/setter methods in QsNode, QsPlan, and QsFieldBinding classes. This follows Java encapsulation best practices and allows for future validation or transformation logic. - Make all fields private with @JsonProperty getters for serialization - Add setters for QsNode.field and QsFieldBinding.fieldName (needed for field name normalization in RewriteSearchToSlots) - Update all usages to use getter/setter methods instead of direct field access - Add Javadoc to QsNode constructors Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…dling - Add SearchDslSyntaxException for clearer DSL syntax error messages - Add @nullable annotations to public parseDsl method parameters - Replace deprecated ANTLRInputStream with CharStreams.fromString() - Remove duplicate fields validation in expandMultiFieldDsl methods - Add context (actual fields value) to validation exception messages - Improve error handling with specific catch blocks for different exception types (syntax errors, argument errors, internal errors) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…mize StringBuilder - Add @JsonProperty/@JsonSetter annotations to SearchOptions for declarative JSON mapping - Simplify parseOptions() from 58 lines to 20 lines using JSON_MAPPER.readValue() - Add validate() method for mutual exclusion and range checks - Add minimum_should_match negative value validation - Reuse StringBuilder with setLength(0) instead of creating new instances in tokenizeDsl() Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
run buildall |
TPC-H: Total hot run time: 31480 ms |
TPC-DS: Total hot run time: 174157 ms |
ClickBench: Total hot run time: 26.75 s |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
|
PR approved by at least one committer and no changes requested. |
#59845) ### What problem does this PR solve? Issue Number: close #xxx Related PR: #59394 Problem Summary: This PR adds `fields` and `type` parameters to the SEARCH function, allowing queries to search across multiple fields with a single query term. This is similar to Elasticsearch's multi_match query with `best_fields` and `cross_fields` types. #### Multi-Field Search Support ```sql -- Single term across multiple fields (best_fields mode - default) SELECT * FROM docs WHERE search('hello', '{"fields":["title","content"]}'); -- Equivalent to: (title:hello) OR (content:hello) -- Multi-term with AND operator (best_fields mode - default) SELECT * FROM docs WHERE search('hello world', '{"fields":["title","content"],"default_operator":"and"}'); -- Equivalent to: (title:hello AND title:world) OR (content:hello AND content:world) -- Multi-term with cross_fields mode SELECT * FROM docs WHERE search('hello world', '{"fields":["title","content"],"default_operator":"and","type":"cross_fields"}'); -- Equivalent to: (title:hello OR content:hello) AND (title:world OR content:world) -- Combined with Lucene mode SELECT * FROM docs WHERE search('machine AND learning', '{"fields":["title","content"],"mode":"lucene","minimum_should_match":0}'); ``` #### Type Parameter Options | Type | Description | Behavior | |------|-------------|----------| | `best_fields` (default) | All terms must match within the **SAME** field | `"hello world"` → `(title:hello AND title:world) OR (content:hello AND content:world)` | | `cross_fields` | Terms can match across **DIFFERENT** fields | `"hello world"` → `(title:hello OR content:hello) AND (title:world OR content:world)` | **Key features:** - `type` parameter controls how terms are matched across fields - `best_fields` (default): Finds documents where all terms appear in the same field - ideal for relevance ranking - `cross_fields`: Treats multiple fields as one big field - ideal for name searches across first_name/last_name - Compatible with both standard mode and Lucene boolean mode - `fields` and `default_field` are mutually exclusive - Supports functions (EXACT, ANY, ALL) across fields - Supports wildcard queries across fields **Behavior examples:** | Query | Fields | Type | Expanded DSL | |-------|--------|------|--------------| | `hello` | `["title","content"]` | best_fields | `(title:hello) OR (content:hello)` | | `hello world` (AND) | `["title","content"]` | best_fields | `(title:hello AND title:world) OR (content:hello AND content:world)` | | `hello world` (AND) | `["title","content"]` | cross_fields | `(title:hello OR content:hello) AND (title:world OR content:world)` | | `EXACT(foo bar)` | `["title","content"]` | any | `(title:EXACT(foo bar) OR content:EXACT(foo bar))` | | `hello AND category:tech` | `["title","content"]` | any | `(title:hello OR content:hello) AND category:tech` | **Use case examples:** - **Product search**: Use `best_fields` when searching product name and description - prefer products where query terms appear together - **Person name search**: Use `cross_fields` when searching first_name and last_name - "John Smith" should match documents with `first_name:John` and `last_name:Smith` ### Release note - Add multi-field search support for SEARCH function (`fields` parameter) - Add `type` parameter with `best_fields` (default) and `cross_fields` modes - `best_fields`: All terms must match within the same field (default, matches Elasticsearch behavior) - `cross_fields`: Terms can match across different fields - Compatible with Lucene mode for MUST/SHOULD/MUST_NOT semantics
apache#59845) ### What problem does this PR solve? Issue Number: close #xxx Related PR: apache#59394 Problem Summary: This PR adds `fields` and `type` parameters to the SEARCH function, allowing queries to search across multiple fields with a single query term. This is similar to Elasticsearch's multi_match query with `best_fields` and `cross_fields` types. #### Multi-Field Search Support ```sql -- Single term across multiple fields (best_fields mode - default) SELECT * FROM docs WHERE search('hello', '{"fields":["title","content"]}'); -- Equivalent to: (title:hello) OR (content:hello) -- Multi-term with AND operator (best_fields mode - default) SELECT * FROM docs WHERE search('hello world', '{"fields":["title","content"],"default_operator":"and"}'); -- Equivalent to: (title:hello AND title:world) OR (content:hello AND content:world) -- Multi-term with cross_fields mode SELECT * FROM docs WHERE search('hello world', '{"fields":["title","content"],"default_operator":"and","type":"cross_fields"}'); -- Equivalent to: (title:hello OR content:hello) AND (title:world OR content:world) -- Combined with Lucene mode SELECT * FROM docs WHERE search('machine AND learning', '{"fields":["title","content"],"mode":"lucene","minimum_should_match":0}'); ``` #### Type Parameter Options | Type | Description | Behavior | |------|-------------|----------| | `best_fields` (default) | All terms must match within the **SAME** field | `"hello world"` → `(title:hello AND title:world) OR (content:hello AND content:world)` | | `cross_fields` | Terms can match across **DIFFERENT** fields | `"hello world"` → `(title:hello OR content:hello) AND (title:world OR content:world)` | **Key features:** - `type` parameter controls how terms are matched across fields - `best_fields` (default): Finds documents where all terms appear in the same field - ideal for relevance ranking - `cross_fields`: Treats multiple fields as one big field - ideal for name searches across first_name/last_name - Compatible with both standard mode and Lucene boolean mode - `fields` and `default_field` are mutually exclusive - Supports functions (EXACT, ANY, ALL) across fields - Supports wildcard queries across fields **Behavior examples:** | Query | Fields | Type | Expanded DSL | |-------|--------|------|--------------| | `hello` | `["title","content"]` | best_fields | `(title:hello) OR (content:hello)` | | `hello world` (AND) | `["title","content"]` | best_fields | `(title:hello AND title:world) OR (content:hello AND content:world)` | | `hello world` (AND) | `["title","content"]` | cross_fields | `(title:hello OR content:hello) AND (title:world OR content:world)` | | `EXACT(foo bar)` | `["title","content"]` | any | `(title:EXACT(foo bar) OR content:EXACT(foo bar))` | | `hello AND category:tech` | `["title","content"]` | any | `(title:hello OR content:hello) AND category:tech` | **Use case examples:** - **Product search**: Use `best_fields` when searching product name and description - prefer products where query terms appear together - **Person name search**: Use `cross_fields` when searching first_name and last_name - "John Smith" should match documents with `first_name:John` and `last_name:Smith` ### Release note - Add multi-field search support for SEARCH function (`fields` parameter) - Add `type` parameter with `best_fields` (default) and `cross_fields` modes - `best_fields`: All terms must match within the same field (default, matches Elasticsearch behavior) - `cross_fields`: Terms can match across different fields - Compatible with Lucene mode for MUST/SHOULD/MUST_NOT semantics
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #59394
Problem Summary:
This PR adds
fieldsandtypeparameters to the SEARCH function, allowing queries to search across multiple fields with a single query term. This is similar to Elasticsearch's multi_match query withbest_fieldsandcross_fieldstypes.Multi-Field Search Support
Type Parameter Options
best_fields(default)"hello world"→(title:hello AND title:world) OR (content:hello AND content:world)cross_fields"hello world"→(title:hello OR content:hello) AND (title:world OR content:world)Key features:
typeparameter controls how terms are matched across fieldsbest_fields(default): Finds documents where all terms appear in the same field - ideal for relevance rankingcross_fields: Treats multiple fields as one big field - ideal for name searches across first_name/last_namefieldsanddefault_fieldare mutually exclusiveBehavior examples:
hello["title","content"](title:hello) OR (content:hello)hello world(AND)["title","content"](title:hello AND title:world) OR (content:hello AND content:world)hello world(AND)["title","content"](title:hello OR content:hello) AND (title:world OR content:world)EXACT(foo bar)["title","content"](title:EXACT(foo bar) OR content:EXACT(foo bar))hello AND category:tech["title","content"](title:hello OR content:hello) AND category:techUse case examples:
best_fieldswhen searching product name and description - prefer products where query terms appear togethercross_fieldswhen searching first_name and last_name - "John Smith" should match documents withfirst_name:Johnandlast_name:SmithRelease note
fieldsparameter)typeparameter withbest_fields(default) andcross_fieldsmodesbest_fields: All terms must match within the same field (default, matches Elasticsearch behavior)cross_fields: Terms can match across different fieldsCheck List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)