feat: add passive CPE and WordPress detection #2366

dogancanbakir · 2026-01-06T10:24:28Z

Summary

Adds support for passive detection of CPE (Common Platform Enumeration) identifiers and WordPress plugins/themes using the awesome-search-queries database.

CPE Detection (`-cpe` flag)

- Matches response title, body, and favicon hash against patterns extracted from Shodan, FOFA, and Google dork queries
- Extracts product and vendor information
- Generates CPE 2.3 identifiers (e.g., `cpe:2.3:a:vendor:product:*:*:*:*:*:*:*:*`)
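
As a rough illustration of the last step, a CPE 2.3 string for an application can be assembled from a vendor and a product name. The helper names `normalize` and `buildCPE`, and the exact normalization (trim, lowercase, spaces to underscores), are assumptions made for this sketch and may differ from what the PR does.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize trims, lowercases, and replaces spaces with underscores.
// This normalization is an assumption made for the sketch.
func normalize(s string) string {
	return strings.ReplaceAll(strings.ToLower(strings.TrimSpace(s)), " ", "_")
}

// buildCPE (hypothetical helper) formats an application ("a") CPE 2.3 string
// with the version and all later attributes left as wildcards.
func buildCPE(vendor, product string) string {
	return fmt.Sprintf("cpe:2.3:a:%s:%s:*:*:*:*:*:*:*:*", normalize(vendor), normalize(product))
}

func main() {
	fmt.Println(buildCPE("webp", "WebP Server Go"))
}
```

The eight trailing wildcards correspond to version, update, edition, language, software edition, target software, target hardware, and other.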

WordPress Detection (`-wp` flag)

- Detects plugins via `/wp-content/plugins/[name]/` patterns in response body
- Detects themes via `/wp-content/themes/[name]/` patterns in response body
- Validates against known plugins/themes list from awesome-search-queries
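
The plugin half of this flow can be sketched as follows. The regular expression's character class, the function name `detectPlugins`, and the in-memory `known` set are assumptions for illustration. The real detector loads its lists from awesome-search-queries.

```go
package main

import (
	"fmt"
	"regexp"
)

// pluginPath captures the directory name after /wp-content/plugins/.
// The character class is an assumption made for this sketch.
var pluginPath = regexp.MustCompile(`/wp-content/plugins/([a-zA-Z0-9_-]+)/`)

// detectPlugins (hypothetical) returns known plugin slugs referenced in body,
// in order of first appearance and without duplicates.
func detectPlugins(body string, known map[string]struct{}) []string {
	var found []string
	seen := make(map[string]struct{})
	for _, m := range pluginPath.FindAllStringSubmatch(body, -1) {
		name := m[1]
		if _, ok := known[name]; !ok {
			continue // not in the known list: skip to limit false positives
		}
		if _, dup := seen[name]; dup {
			continue // already reported
		}
		seen[name] = struct{}{}
		found = append(found, name)
	}
	return found
}

func main() {
	known := map[string]struct{}{"gutenberg": {}}
	body := `<link href="/wp-content/plugins/gutenberg/style.css">
<script src="/wp-content/plugins/gutenberg/index.js"></script>
<script src="/wp-content/plugins/not-a-real-plugin/x.js"></script>`
	fmt.Println(detectPlugins(body, known))
}
```

Theme detection would follow the same shape with a `/wp-content/themes/` pattern and a separate known set.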

New CLI Flags (PROBES group)

| Flag | Description |
| --- | --- |
| `-cpe` | Display CPE (Common Platform Enumeration) based on awesome-search-queries |
| `-wp, -wordpress` | Display WordPress plugins and themes |

Both are automatically included in JSON/CSV output.

Testing

# Test CPE detection
echo "https://jira.atlassian.com" | go run . -cpe -silent
# Output: https://jira.atlassian.com [cpe:2.3:a:stagil:stagil_navigation:*:*:*:*:*:*:*:*]

# Test WordPress detection
echo "https://wordpress.org" | go run . -wp -silent
# Output: https://wordpress.org [wp-plugins:gutenberg]

# Test both flags together
echo "https://wordpress.org" | go run . -cpe -wp -silent
# Output: https://wordpress.org [cpe:2.3:a:webp:webp_server_go:*:*:*:*:*:*:*:*] [wp-plugins:gutenberg]

# Test JSON output (CPE and WordPress included automatically)
echo "https://wordpress.org" | go run . -j -silent | jq '{cpe,wordpress}'
# Output:
# {
#   "cpe": [{"product": "webp_server_go", "vendor": "webp", "cpe": "cpe:2.3:a:webp:webp_server_go:*:*:*:*:*:*:*:*"}],
#   "wordpress": {"plugins": ["gutenberg"]}
# }

# Test with tech-detect for comparison
echo "https://wordpress.org" | go run . -td -cpe -wp -silent

Test plan

- Test CPE detection on various sites (Jenkins, Jira, GitLab)
- Test WordPress plugin detection
- Test WordPress theme detection
- Test JSON output includes CPE and WordPress data
- Test CSV output includes CPE and WordPress data
- Verify no false positives with validation against known lists

Closes #1975

Summary by CodeRabbit

Release Notes

New Features
- Added CPE (Common Platform Enumeration) detection to identify and display the software products and vendors discovered during scans as CPE 2.3 identifiers.
- Added WordPress detection to identify and display installed WordPress plugins and themes.
Documentation
- Updated README with new command-line flags for CPE detection (-cpe) and WordPress detection (-wordpress).



coderabbitai · 2026-01-06T10:24:39Z

Walkthrough

The PR adds passive CPE and WordPress detection capabilities to httpx by integrating the awesome-search-queries library. Two new detector modules are introduced: CPEDetector for identifying products via pattern matching against title, body, and favicon hashes, and WordPressDetector for extracting WordPress plugin and theme names from HTML responses. Both detectors are initialized conditionally in the runner, and their results are attached to the output Result structure via new fields.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation `README.md` | Added `-cpe` and `-wordpress` flag documentation to the PROBES section; reformatted alignment and expanded flag descriptions. |
| Dependency Management `go.mod` | Added indirect dependency on `github.com/projectdiscovery/awesome-search-queries` for query and plugin/theme data. |
| CLI Options `runner/options.go` | Introduced `CPEDetect` and `WordPress` boolean fields to `ScanOptions` and `Options` structs; added corresponding `-cpe` and `-wordpress` command-line flags. |
| CPE Detection `runner/cpe.go` | New module implementing `CPEDetector` with pattern-based matching against title, body, and favicon hashes; includes vendor parsing, CPE string generation, and deduplication utilities. |
| WordPress Detection `runner/wordpress.go` | New module implementing `WordPressDetector` that loads plugin and theme lists from awesome-search-queries and extracts matches from HTML via regex with deduplication. |
| Output Structure `runner/types.go` | Extended `Result` struct with `CPE` (slice of CPEInfo) and `WordPress` (pointer to WordPressInfo) fields, including JSON/CSV/mapstructure tags. |
| Runner Integration `runner/runner.go` | Added `cpeDetector` and `wpDetector` fields to `Runner`; conditional initialization during runner creation; detector invocations integrated into scan result processing to populate new Result fields. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Two detectors hopping through the code,
CPE and WordPress on the road,
Patterns matched, plugins found with glee,
Results enriched for all to see! 🎉

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main feature addition: passive CPE and WordPress detection, matching the changeset's primary focus across multiple files (cpe.go, wordpress.go, options.go, runner.go). |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #1975 objectives: implements passive CPE detection with product/vendor extraction and CPE 2.3 identifier generation [cpe.go], implements WordPress plugin/theme detection with validation [wordpress.go], exposes CLI flags -cpe and -wordpress [options.go], and includes results in structured outputs [runner.go, types.go]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue #1975 objectives: CPE detection, WordPress detection, CLI flags, and output integration. The go.mod dependency addition (awesome-search-queries) is required for core functionality. |


coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI Agents

In @go.mod:
- Line 131: The dependency github.com/projectdiscovery/awesome-search-queries is
marked as indirect in go.mod but is directly imported by runner/cpe.go and
runner/wordpress.go; run `go mod tidy` (or manually remove the `// indirect`
comment and ensure the require line matches
`github.com/projectdiscovery/awesome-search-queries
v0.0.0-20260104120501-961ef30f7193`) so the module is recorded as a direct
dependency, then re-run `go build`/tests to confirm imports in runner/cpe.go and
runner/wordpress.go resolve correctly.

In @runner/cpe.go:
- Around line 106-116: The code in the loop over titlePrefixes redundantly
strips the prefix twice: first with strings.TrimPrefix(query, prefix) then again
with strings.TrimPrefix(..., prefix[:len(prefix)-1]); update the logic in the
block handling titlePrefixes (the loop using titlePrefixes, extractQuotedValue,
and writing into d.titlePatterns via appendUnique) so you only strip the
intended prefix once—either remove the second TrimPrefix call entirely, or
replace the first TrimPrefix with logic that conditionally trims the variant
without its last character when the prefix form includes a trailing quote (e.g.,
handle prefixes like `title='`/`title="`), then normalize to lowercase and
proceed to set d.titlePatterns[pattern] as before.

🧹 Nitpick comments (1)

runner/runner.go (1)
2348-2371: Potential nil pointer dereference in WordPress detection output.

Line 2351 calls wpInfo.HasData() which is safe because HasData() has a nil receiver check. However, line 2350 uses r.wpDetector.Detect(...) which can return nil, and the current flow relies on HasData() handling nil correctly. The code is technically safe due to the HasData() implementation, but the pattern is fragile.
🔎 Suggested defensive pattern

For consistency with cpeMatches (which uses a nil-safe slice), consider:
 	var wpInfo *WordPressInfo
 	if r.wpDetector != nil {
 		wpInfo = r.wpDetector.Detect(string(resp.Data))
-		if wpInfo.HasData() && r.options.WordPress {
+		if wpInfo != nil && wpInfo.HasData() && r.options.WordPress {
This makes the nil check explicit and doesn't rely on HasData() implementation details.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc2c7a2 and 81461d3.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (7)

README.md
go.mod
runner/cpe.go
runner/options.go
runner/runner.go
runner/types.go
runner/wordpress.go

🧰 Additional context used

🧬 Code graph analysis (2)

runner/types.go (2)
- runner/cpe.go (1): CPEInfo (11-15)
- runner/wordpress.go (1): WordPressInfo (12-15)

runner/runner.go (2)
- runner/cpe.go (3): CPEDetector (17-21), NewCPEDetector (35-68), CPEInfo (11-15)
- runner/wordpress.go (3): WordPressDetector (17-22), NewWordPressDetector (24-59), WordPressInfo (12-15)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Functional Test (macOS-latest)
GitHub Check: Functional Test (windows-latest)
GitHub Check: Functional Test (ubuntu-latest)
GitHub Check: Analyze (go)
GitHub Check: release-test

🔇 Additional comments (20)

runner/options.go (4)

88-89: New fields properly added to ScanOptions.

The CPEDetect and WordPress boolean fields are correctly added to the ScanOptions struct, aligning with the new detection capabilities.

153-154: Clone method correctly updated.

The new fields are properly propagated in the Clone() method, ensuring cloned instances retain the CPE and WordPress detection settings.

263-264: Options struct correctly extended.

The CPEDetect and WordPress fields are properly added to the main Options struct, matching the ScanOptions additions.

396-397: CLI flags correctly defined.

The new -cpe and -wordpress/-wp flags are properly wired with clear descriptions. They are appropriately placed in the PROBES group alongside related detection features like -td (tech-detect).

runner/cpe.go (5)

11-21: Well-structured data types.

The CPEInfo and CPEDetector structs are cleanly designed with appropriate JSON tags and encapsulation of pattern maps for title, body, and favicon matching.

35-68: Constructor properly initializes detector from query data.

The NewCPEDetector function correctly loads and parses queries from the awesome-search-queries library, building pattern maps for each query type. Error handling is appropriate with wrapped errors for context.

70-91: Vendor parsing and CPE generation are correct.

The parseVendor function handles both string and array vendor formats gracefully. The generateCPE function produces valid CPE 2.3 identifiers with proper escaping of spaces to underscores.
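
To make the "string or array" handling concrete, here is a minimal sketch of accepting both shapes from decoded query data. The name `firstVendor` is hypothetical, and taking the first non-empty entry of a list is an assumption; this thread does not show how the PR chooses among multiple vendors.

```go
package main

import "fmt"

// firstVendor (hypothetical) accepts a decoded vendor value that may be a
// single string or a list, and returns the first non-empty name.
// Any other type yields an empty string.
func firstVendor(v interface{}) string {
	switch t := v.(type) {
	case string:
		return t
	case []string:
		for _, s := range t {
			if s != "" {
				return s
			}
		}
	case []interface{}:
		// Generic decoders typically produce []interface{} for arrays.
		for _, item := range t {
			if s, ok := item.(string); ok && s != "" {
				return s
			}
		}
	}
	return ""
}

func main() {
	fmt.Println(firstVendor("atlassian"))
	fmt.Println(firstVendor([]interface{}{"webp", "other"}))
}
```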

153-170: Quote extraction logic is correct.

The extractQuotedValue function properly handles quoted strings and truncates at logical OR operators, which is appropriate for parsing search query syntax.

181-225: Detection logic correctly matches patterns and deduplicates results.

The Detect method efficiently checks title, body, and favicon patterns with case-insensitive matching and proper deduplication using a seen map. The approach of using strings.Contains for title/body and exact match for favicon hash is appropriate.
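
The matching strategy described here can be sketched roughly as below. The type and field names (`matcher`, `product`, the three pattern maps) are stand-ins chosen for this example and are not the PR's actual definitions.

```go
package main

import (
	"fmt"
	"strings"
)

// product is a comparable value, so it can be used directly as a map key.
type product struct{ Vendor, Product string }

// matcher holds hypothetical pattern tables. Title and body keys are
// lowercase substrings; favicon keys are hash strings.
type matcher struct {
	titlePatterns   map[string][]product
	bodyPatterns    map[string][]product
	faviconPatterns map[string][]product
}

// detect returns each matching product once, using a seen set to deduplicate.
func (m *matcher) detect(title, body, faviconHash string) []product {
	var out []product
	seen := make(map[product]struct{})
	add := func(ps []product) {
		for _, p := range ps {
			if _, ok := seen[p]; !ok {
				seen[p] = struct{}{}
				out = append(out, p)
			}
		}
	}
	title, body = strings.ToLower(title), strings.ToLower(body)
	for pat, ps := range m.titlePatterns {
		if strings.Contains(title, pat) { // substring, case-insensitive
			add(ps)
		}
	}
	for pat, ps := range m.bodyPatterns {
		if strings.Contains(body, pat) {
			add(ps)
		}
	}
	add(m.faviconPatterns[faviconHash]) // exact match on the hash
	return out
}

func main() {
	m := &matcher{
		titlePatterns: map[string][]product{"jenkins": {{"jenkins", "jenkins"}}},
		bodyPatterns:  map[string][]product{"dashboard [jenkins]": {{"jenkins", "jenkins"}}},
	}
	fmt.Println(len(m.detect("Dashboard [Jenkins]", "<title>Dashboard [Jenkins]</title>", "")))
}
```

Because Go map iteration order is not fixed, the order of results in a sketch like this can vary between runs when several products match.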

README.md (1)

113-117: Documentation correctly updated for new flags.

The README properly documents the new -cpe and -wordpress/-wp flags with clear descriptions that align with the implementation in runner/options.go.

runner/types.go (1)

105-106: Result struct correctly extended with new detection fields.

The CPE slice and WordPress pointer fields are properly added with consistent tags (json, csv, mapstructure) matching the existing field conventions. Using a pointer for WordPressInfo enables proper omitempty behavior for nil values.
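
The effect of the pointer plus `omitempty` can be seen in a small standalone example. The struct and tag names below are simplified stand-ins modeled on the JSON shown in the PR description, not the actual definitions in runner/types.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the PR's types; field tags are assumptions.
type cpeInfo struct {
	Product string `json:"product,omitempty"`
	Vendor  string `json:"vendor,omitempty"`
	CPE     string `json:"cpe,omitempty"`
}

type wordPressInfo struct {
	Plugins []string `json:"plugins,omitempty"`
	Themes  []string `json:"themes,omitempty"`
}

type result struct {
	URL       string         `json:"url"`
	CPE       []cpeInfo      `json:"cpe,omitempty"`
	WordPress *wordPressInfo `json:"wordpress,omitempty"`
}

// encode marshals a result and returns an empty string on error.
func encode(r result) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	// Nil pointer and nil slice: both keys are omitted.
	fmt.Println(encode(result{URL: "https://example.com"}))
	// Non-nil pointer: the nested object is emitted.
	fmt.Println(encode(result{
		URL:       "https://wordpress.org",
		WordPress: &wordPressInfo{Plugins: []string{"gutenberg"}},
	}))
}
```

With a non-pointer struct field, `encoding/json` would always emit the key, since `omitempty` does not treat struct values as empty.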

runner/runner.go (4)

84-85: Runner struct correctly extended with detector fields.

The cpeDetector and wpDetector fields are properly added to the Runner struct to hold the initialized detectors.

138-150: Graceful initialization with appropriate error handling.

The detectors are initialized conditionally based on flags or JSON/CSV output requirements. Using warning logs instead of fatal errors on initialization failure is a good practice, allowing the scan to proceed without these optional features.
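
A minimal sketch of this "optional feature, warn and continue" pattern follows. Everything here is a stand-in: the `options` fields, `newDetector`, and `initDetector` are hypothetical, and the standard `log` package is used in place of whatever logger the project uses.

```go
package main

import (
	"errors"
	"log"
)

// detector is a placeholder for an optional detection component.
type detector struct{}

// newDetector is a stand-in constructor; fail simulates a data-loading error.
func newDetector(fail bool) (*detector, error) {
	if fail {
		return nil, errors.New("load patterns: data unavailable")
	}
	return &detector{}, nil
}

// options mirrors only the switches relevant to this sketch.
type options struct{ CPEDetect, JSONOutput, CSVOutput bool }

// initDetector returns nil when the detector is not needed or cannot be
// created. A failure is logged as a warning so the scan can still proceed.
func initDetector(opts options, fail bool) *detector {
	if !(opts.CPEDetect || opts.JSONOutput || opts.CSVOutput) {
		return nil
	}
	d, err := newDetector(fail)
	if err != nil {
		log.Printf("warning: could not create cpe detector: %v", err)
		return nil
	}
	return d
}

func main() {
	d := initDetector(options{JSONOutput: true}, true)
	log.Printf("detector available: %v", d != nil)
}
```

Callers then need to guard every use with a nil check on the detector, which is the trade-off of treating the feature as optional.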

316-317: Scan options correctly propagate detection flags.

The CPEDetect and WordPress flags are properly set in scanopts, enabling detection when explicitly requested or when structured output (JSON/CSV) is enabled.

2436-2437: Result population is correct.

The CPE and WordPress fields are properly assigned to the result struct, integrating the detection data into the output pipeline.

runner/wordpress.go (5)

12-22: Clean data structure design.

The WordPressInfo and WordPressDetector structs are well-designed. Using map[string]struct{} for known plugins/themes provides O(1) lookups, and the compiled regex patterns are efficient for repeated matching.

24-59: Constructor properly initializes detector with external data.

The NewWordPressDetector function correctly:
- Compiles regex patterns for plugin/theme path extraction
- Loads known plugins/themes from the external library
- Returns errors appropriately on any failure

61-70: List loading helper is correct.

The loadList function properly parses newline-delimited data, trims whitespace, and populates the target map. Returning scanner.Err() is correct for propagating any scanning errors.
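
A helper of this kind might look like the sketch below. The name `loadNames` is hypothetical and skipping blank lines is an assumption; only the overall shape (scan lines, trim, insert into a set, return `scanner.Err()`) comes from the description above.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// loadNames (hypothetical) reads newline-delimited names from data into
// target, trimming surrounding whitespace and skipping blank lines.
func loadNames(data []byte, target map[string]struct{}) error {
	scanner := bufio.NewScanner(bytes.NewReader(data))
	for scanner.Scan() {
		name := strings.TrimSpace(scanner.Text())
		if name == "" {
			continue
		}
		target[name] = struct{}{}
	}
	// Err reports any error hit while scanning, other than io.EOF.
	return scanner.Err()
}

func main() {
	known := make(map[string]struct{})
	if err := loadNames([]byte("gutenberg\n  akismet \n\n"), known); err != nil {
		fmt.Println("load failed:", err)
		return
	}
	_, ok := known["akismet"]
	fmt.Println(len(known), ok)
}
```

Using `map[string]struct{}` keeps the set compact, since the empty struct value occupies no storage.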

72-114: Detection logic is correct with proper deduplication and validation.

The Detect method:
- Returns early on empty body
- Uses regex to extract plugin/theme names from paths
- Validates against known lists to reduce false positives
- Deduplicates matches using seen maps
- Returns nil when no matches are found

116-118: Nil-safe HasData method is a good pattern.

The nil receiver check (w != nil) ensures safe usage even when Detect() returns nil, which is the pattern used in runner/runner.go.
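
For readers less familiar with this Go idiom, a method with a pointer receiver can be called on a nil pointer as long as it checks the receiver before dereferencing it. The type below is a simplified stand-in, with field names assumed for the example.

```go
package main

import "fmt"

// wordPressInfo is a simplified stand-in for the PR's WordPressInfo type.
type wordPressInfo struct {
	Plugins []string
	Themes  []string
}

// HasData is safe to call on a nil *wordPressInfo because the receiver is
// checked before any field access.
func (w *wordPressInfo) HasData() bool {
	return w != nil && (len(w.Plugins) > 0 || len(w.Themes) > 0)
}

func main() {
	var none *wordPressInfo // what a detector might return on no match
	fmt.Println(none.HasData())

	some := &wordPressInfo{Plugins: []string{"gutenberg"}}
	fmt.Println(some.HasData())
}
```

An explicit `wpInfo != nil` check at the call site, as suggested in the nitpick, is redundant with this implementation but keeps the caller correct if `HasData` ever changes.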

coderabbitai · 2026-01-06T10:28:31Z

go.mod

 	github.com/pierrec/lz4/v4 v4.1.23 // indirect
 	github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
 	github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
+	github.com/projectdiscovery/awesome-search-queries v0.0.0-20260104120501-961ef30f7193 // indirect


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find the runner directory and check what files exist
find . -type f -name "cpe.go" -o -name "wordpress.go" | head -20

Repository: projectdiscovery/httpx

Length of output: 103

🏁 Script executed:

# Also check if go.mod exists and verify line 131
if [ -f go.mod ]; then
  sed -n '130,132p' go.mod
fi

Repository: projectdiscovery/httpx

Length of output: 303

🏁 Script executed:

# Search for imports of awesome-search-queries in the codebase
rg "awesome-search-queries" --type go

Repository: projectdiscovery/httpx

Length of output: 397

Dependency should be marked as direct, not indirect.

The awesome-search-queries dependency is directly imported in runner/cpe.go and runner/wordpress.go, so it should not be marked as // indirect in go.mod. Run go mod tidy to correct this automatically.


coderabbitai · 2026-01-06T10:28:31Z

runner/cpe.go

+	for _, prefix := range titlePrefixes {
+		if strings.HasPrefix(strings.ToLower(query), strings.ToLower(prefix)) {
+			pattern := extractQuotedValue(strings.TrimPrefix(query, prefix))
+			pattern = strings.TrimPrefix(pattern, prefix[:len(prefix)-1])
+			if pattern != "" {
+				pattern = strings.ToLower(pattern)
+				d.titlePatterns[pattern] = appendUnique(d.titlePatterns[pattern], info)
+			}
+			return
+		}
+	}


⚠️ Potential issue | 🟡 Minor

Suspicious redundant prefix stripping.

Line 109 attempts to strip prefix[:len(prefix)-1] after already trimming the full prefix on line 108. This appears to be dead code or a logic error — if the prefix was already removed, stripping a shorter version won't have any effect.
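
The behavior in question can be checked in isolation. In this sketch the prefix value `title:` and the helper names are assumptions, and the call to extractQuotedValue is left out for brevity. It shows that the second trim is a no-op for typical input, but changes the result when the title text itself begins with the shortened prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTwice mirrors the shape of the reviewed lines: trim the full prefix,
// then trim the prefix minus its last character.
func stripTwice(query, prefix string) string {
	p := strings.TrimPrefix(query, prefix)
	return strings.TrimPrefix(p, prefix[:len(prefix)-1])
}

// stripOnce trims only the full prefix, as the suggested fix does.
func stripOnce(query, prefix string) string {
	return strings.TrimPrefix(query, prefix)
}

func main() {
	const prefix = "title:" // assumed example prefix

	// Typical input: both variants agree.
	fmt.Println(stripTwice("title:Jenkins", prefix), stripOnce("title:Jenkins", prefix))

	// Title text that starts with "title": the second trim removes part of it.
	fmt.Println(stripTwice("title:titlepage", prefix), stripOnce("title:titlepage", prefix))
}
```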

🔎 Suggested fix

If the intent was to handle cases where the prefix includes a trailing quote character (like title=' or title="), line 108 should handle that. Consider removing line 109 if it's redundant:

 		if strings.HasPrefix(strings.ToLower(query), strings.ToLower(prefix)) {
 			pattern := extractQuotedValue(strings.TrimPrefix(query, prefix))
-			pattern = strings.TrimPrefix(pattern, prefix[:len(prefix)-1])
 			if pattern != "" {
 				pattern = strings.ToLower(pattern)
 				d.titlePatterns[pattern] = appendUnique(d.titlePatterns[pattern], info)
 			}
 			return
 		}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-	for _, prefix := range titlePrefixes {
-		if strings.HasPrefix(strings.ToLower(query), strings.ToLower(prefix)) {
-			pattern := extractQuotedValue(strings.TrimPrefix(query, prefix))
-			pattern = strings.TrimPrefix(pattern, prefix[:len(prefix)-1])
-			if pattern != "" {
-				pattern = strings.ToLower(pattern)
-				d.titlePatterns[pattern] = appendUnique(d.titlePatterns[pattern], info)
-			}
-			return
-		}
-	}
+	for _, prefix := range titlePrefixes {
+		if strings.HasPrefix(strings.ToLower(query), strings.ToLower(prefix)) {
+			pattern := extractQuotedValue(strings.TrimPrefix(query, prefix))
+			if pattern != "" {
+				pattern = strings.ToLower(pattern)
+				d.titlePatterns[pattern] = appendUnique(d.titlePatterns[pattern], info)
+			}
+			return
+		}
+	}


Mzack9999

lgtm! Anyway I think that wordpress fingerprinting would be better suited within a nuclei template.

auto-assign bot requested a review from dwisiswant0 January 6, 2026 10:24

dogancanbakir requested review from Mzack9999, ehsandeep and knakul853 January 6, 2026 10:24

coderabbitai bot reviewed Jan 6, 2026


Mzack9999 approved these changes Jan 8, 2026


dogancanbakir merged commit 834bbd7 into dev Jan 8, 2026
15 checks passed

dogancanbakir deleted the feature/cpe-wordpress-detection branch January 8, 2026 12:56
