feat: add passive CPE and WordPress detection #2366
Add support for passive detection of CPE (Common Platform Enumeration) identifiers and WordPress plugins/themes using awesome-search-queries.
CPE Detection (-cpe flag):
- Matches response title, body, and favicon hash against patterns
- Extracts product, vendor, and generates CPE 2.3 identifiers
- Uses patterns from Shodan, FOFA, Google dorks
WordPress Detection (-wp flag):
- Detects plugins via /wp-content/plugins/[name]/ patterns
- Detects themes via /wp-content/themes/[name]/ patterns
- Validates against known plugins/themes list
New CLI flags in PROBES group:
- -cpe: display CPE based on awesome-search-queries
- -wp, -wordpress: display WordPress plugins and themes
Both are automatically included in JSON/CSV output.
Closes #1975
Walkthrough
The PR adds passive CPE and WordPress detection capabilities to httpx by integrating the awesome-search-queries library. Two new detector modules are introduced: `CPEDetector`, which identifies products via pattern matching against title, body, and favicon hashes, and `WordPressDetector`, which extracts WordPress plugin and theme names from HTML responses. Both detectors are initialized conditionally in the runner, and their results are attached to the output `Result` structure via new fields.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🤖 Fix all issues with AI Agents
In @go.mod:
- Line 131: The dependency github.com/projectdiscovery/awesome-search-queries is
marked as indirect in go.mod but is directly imported by runner/cpe.go and
runner/wordpress.go; run `go mod tidy` (or manually remove the `// indirect`
comment and ensure the require line matches
`github.com/projectdiscovery/awesome-search-queries
v0.0.0-20260104120501-961ef30f7193`) so the module is recorded as a direct
dependency, then re-run `go build`/tests to confirm imports in runner/cpe.go and
runner/wordpress.go resolve correctly.
In @runner/cpe.go:
- Around line 106-116: The code in the loop over titlePrefixes redundantly
strips the prefix twice: first with strings.TrimPrefix(query, prefix) then again
with strings.TrimPrefix(..., prefix[:len(prefix)-1]); update the logic in the
block handling titlePrefixes (the loop using titlePrefixes, extractQuotedValue,
and writing into d.titlePatterns via appendUnique) so you only strip the
intended prefix once—either remove the second TrimPrefix call entirely, or
replace the first TrimPrefix with logic that conditionally trims the variant
without its last character when the prefix form includes a trailing quote (e.g.,
handle prefixes like `title='`/`title="`), then normalize to lowercase and
proceed to set d.titlePatterns[pattern] as before.
🧹 Nitpick comments (1)
runner/runner.go (1)
2348-2371: Potential nil pointer dereference in WordPress detection output.
Line 2351 calls `wpInfo.HasData()`, which is safe because `HasData()` has a nil receiver check. However, line 2350 uses `r.wpDetector.Detect(...)`, which can return `nil`, and the current flow relies on `HasData()` handling nil correctly. The code is technically safe due to the `HasData()` implementation, but the pattern is fragile.
🔎 Suggested defensive pattern
For consistency with `cpeMatches` (which uses a nil-safe slice), consider:
     var wpInfo *WordPressInfo
     if r.wpDetector != nil {
         wpInfo = r.wpDetector.Detect(string(resp.Data))
    -    if wpInfo.HasData() && r.options.WordPress {
    +    if wpInfo != nil && wpInfo.HasData() && r.options.WordPress {
This makes the nil check explicit and doesn't rely on `HasData()` implementation details.
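The difference between the implicit and explicit nil handling discussed above can be seen in isolation. The following is a minimal sketch, not the PR's code: the `WordPressInfo` fields and the `detect` helper are assumptions made up for illustration.

```go
package main

import "fmt"

// WordPressInfo mirrors the type named in the review; the field names
// are assumptions for this sketch.
type WordPressInfo struct {
	Plugins []string
	Themes  []string
}

// HasData is safe to call on a nil *WordPressInfo because the receiver
// is checked before any field is read.
func (w *WordPressInfo) HasData() bool {
	return w != nil && (len(w.Plugins) > 0 || len(w.Themes) > 0)
}

// detect is a hypothetical stand-in for a detector that returns nil
// when nothing matched.
func detect(found bool) *WordPressInfo {
	if !found {
		return nil
	}
	return &WordPressInfo{Plugins: []string{"akismet"}}
}

func main() {
	wpInfo := detect(false)

	// Implicit: works only because HasData tolerates a nil receiver.
	fmt.Println(wpInfo.HasData())

	// Explicit: the nil check is visible at the call site and keeps
	// working even if HasData later reads a field unconditionally.
	fmt.Println(wpInfo != nil && wpInfo.HasData())

	fmt.Println(detect(true).HasData())
}
```

Both forms behave the same today; the explicit check only protects against a future change to `HasData()`.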
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (7)
- README.md
- go.mod
- runner/cpe.go
- runner/options.go
- runner/runner.go
- runner/types.go
- runner/wordpress.go
🧰 Additional context used
🧬 Code graph analysis (2)
runner/types.go (2)
- runner/cpe.go (1): `CPEInfo` (11-15)
- runner/wordpress.go (1): `WordPressInfo` (12-15)
runner/runner.go (2)
- runner/cpe.go (3): `CPEDetector` (17-21), `NewCPEDetector` (35-68), `CPEInfo` (11-15)
- runner/wordpress.go (3): `WordPressDetector` (17-22), `NewWordPressDetector` (24-59), `WordPressInfo` (12-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Functional Test (macOS-latest)
- GitHub Check: Functional Test (windows-latest)
- GitHub Check: Functional Test (ubuntu-latest)
- GitHub Check: Analyze (go)
- GitHub Check: release-test
🔇 Additional comments (20)
runner/options.go (4)
88-89: New fields properly added to ScanOptions.
The `CPEDetect` and `WordPress` boolean fields are correctly added to the `ScanOptions` struct, aligning with the new detection capabilities.
153-154: Clone method correctly updated.
The new fields are properly propagated in the `Clone()` method, ensuring cloned instances retain the CPE and WordPress detection settings.
263-264: Options struct correctly extended.
The `CPEDetect` and `WordPress` fields are properly added to the main `Options` struct, matching the `ScanOptions` additions.
396-397: CLI flags correctly defined.
The new `-cpe` and `-wordpress`/`-wp` flags are properly wired with clear descriptions. They are appropriately placed in the PROBES group alongside related detection features like `-td` (tech-detect).
runner/cpe.go (5)
11-21: Well-structured data types.
The `CPEInfo` and `CPEDetector` structs are cleanly designed with appropriate JSON tags and encapsulation of pattern maps for title, body, and favicon matching.
35-68: Constructor properly initializes detector from query data.
The `NewCPEDetector` function correctly loads and parses queries from the `awesome-search-queries` library, building pattern maps for each query type. Error handling is appropriate with wrapped errors for context.
70-91: Vendor parsing and CPE generation are correct.
The `parseVendor` function handles both string and array vendor formats gracefully. The `generateCPE` function produces valid CPE 2.3 identifiers with proper escaping of spaces to underscores.
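To make the vendor handling and identifier format concrete, here is a minimal sketch under stated assumptions. The function names match those in the review, but the signatures and bodies are guesses rather than the PR's implementation. The only behaviour taken from the source is the string-or-array vendor input, the `cpe:2.3:a:vendor:product:*:*:*:*:*:*:*:*` shape, and the replacement of spaces with underscores.

```go
package main

import (
	"fmt"
	"strings"
)

// parseVendor accepts either a plain string or a decoded JSON array and
// returns the first non-empty vendor name. The `any` input type is an
// assumption about how the query data is decoded.
func parseVendor(v any) string {
	switch t := v.(type) {
	case string:
		return t
	case []any:
		for _, item := range t {
			if s, ok := item.(string); ok && s != "" {
				return s
			}
		}
	}
	return ""
}

// generateCPE builds a CPE 2.3 formatted string for an application
// ("a") with all remaining attributes left as wildcards. Spaces are
// replaced with underscores; no other escaping is attempted here.
func generateCPE(vendor, product string) string {
	norm := func(s string) string {
		return strings.ReplaceAll(strings.TrimSpace(s), " ", "_")
	}
	return fmt.Sprintf("cpe:2.3:a:%s:%s:*:*:*:*:*:*:*:*", norm(vendor), norm(product))
}

func main() {
	vendor := parseVendor([]any{"acme corp", "acme"})
	fmt.Println(generateCPE(vendor, "web server"))
}
```

A complete implementation would likely also need to handle other characters that are special in CPE strings, such as `:` and `*`; this sketch leaves that out.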
153-170: Quote extraction logic is correct.
The `extractQuotedValue` function properly handles quoted strings and truncates at logical OR operators, which is appropriate for parsing search query syntax.
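As an illustration of the behaviour described above, the sketch below returns the text inside a leading pair of quotes and otherwise cuts the input at the first logical OR. This is an assumed reimplementation: the PR's actual `extractQuotedValue` is not shown on this page, and the set of OR operators recognised here (` || ` and ` OR `) is a guess.

```go
package main

import (
	"fmt"
	"strings"
)

// extractQuotedValue is a hypothetical reimplementation for
// illustration only.
func extractQuotedValue(s string) string {
	s = strings.TrimSpace(s)
	if s == "" {
		return ""
	}
	// Quoted form: return the text between the opening quote and the
	// next matching quote.
	if q := s[0]; q == '"' || q == '\'' {
		rest := s[1:]
		if end := strings.IndexByte(rest, q); end >= 0 {
			return rest[:end]
		}
		return rest // unterminated quote: keep what is there
	}
	// Unquoted form: stop at the first logical OR operator.
	for _, op := range []string{" || ", " OR "} {
		if i := strings.Index(s, op); i >= 0 {
			s = s[:i]
		}
	}
	return strings.TrimSpace(s)
}

func main() {
	fmt.Println(extractQuotedValue(`"Grafana" || title="Kibana"`))
	fmt.Println(extractQuotedValue(`Grafana || Kibana`))
}
```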
181-225: Detection logic correctly matches patterns and deduplicates results.
The `Detect` method efficiently checks title, body, and favicon patterns with case-insensitive matching and proper deduplication using a seen map. The approach of using `strings.Contains` for title/body and exact match for favicon hash is appropriate.
README.md (1)
113-117: Documentation correctly updated for new flags.
The README properly documents the new `-cpe` and `-wordpress`/`-wp` flags with clear descriptions that align with the implementation in `runner/options.go`.
runner/types.go (1)
105-106: Result struct correctly extended with new detection fields.
The `CPE` slice and `WordPress` pointer fields are properly added with consistent tags (`json`, `csv`, `mapstructure`) matching the existing field conventions. Using a pointer for `WordPressInfo` enables proper `omitempty` behavior for nil values.
runner/runner.go (4)
84-85: Runner struct correctly extended with detector fields.
The `cpeDetector` and `wpDetector` fields are properly added to the `Runner` struct to hold the initialized detectors.
138-150: Graceful initialization with appropriate error handling.
The detectors are initialized conditionally based on flags or JSON/CSV output requirements. Using warning logs instead of fatal errors on initialization failure is a good practice, allowing the scan to proceed without these optional features.
316-317: Scan options correctly propagate detection flags.
The `CPEDetect` and `WordPress` flags are properly set in `scanopts`, enabling detection when explicitly requested or when structured output (JSON/CSV) is enabled.
2436-2437: Result population is correct.
The `CPE` and `WordPress` fields are properly assigned to the result struct, integrating the detection data into the output pipeline.
runner/wordpress.go (5)
12-22: Clean data structure design.
The `WordPressInfo` and `WordPressDetector` structs are well-designed. Using `map[string]struct{}` for known plugins/themes provides O(1) lookups, and the compiled regex patterns are efficient for repeated matching.
24-59: Constructor properly initializes detector with external data.
The
`NewWordPressDetector` function correctly:
- Compiles regex patterns for plugin/theme path extraction
- Loads known plugins/themes from the external library
- Returns errors appropriately on any failure
61-70: List loading helper is correct.
The `loadList` function properly parses newline-delimited data, trims whitespace, and populates the target map. Returning `scanner.Err()` is correct for propagating any scanning errors.
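A helper of the kind described above might look like the following sketch. The signature is an assumption, since the page does not show the real one. What it illustrates is the pattern the review names: scan newline-delimited data, trim each line, fill a `map[string]struct{}` set, and return `scanner.Err()`.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// loadList reads one name per line from data into target. The signature
// is hypothetical; skipping blank lines is also an assumption.
func loadList(data []byte, target map[string]struct{}) error {
	scanner := bufio.NewScanner(bytes.NewReader(data))
	for scanner.Scan() {
		name := strings.TrimSpace(scanner.Text())
		if name == "" {
			continue
		}
		target[name] = struct{}{}
	}
	// Scan returns false on both EOF and failure, so Err is the only
	// way to tell the two apart.
	return scanner.Err()
}

func main() {
	known := make(map[string]struct{})
	if err := loadList([]byte("akismet\n  jetpack \n\n"), known); err != nil {
		fmt.Println("load failed:", err)
		return
	}
	_, ok := known["jetpack"]
	fmt.Println(len(known), ok)
}
```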
72-114: Detection logic is correct with proper deduplication and validation.
The `Detect` method:
- Returns early on empty body
- Uses regex to extract plugin/theme names from paths
- Validates against known lists to reduce false positives
- Deduplicates matches using seen maps
- Returns nil when no matches are found
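The steps listed above can be sketched as follows. This is an illustration under assumptions, not the PR's code: the regular expressions, the `extract` and `detect` helpers, and the lowercasing of names are all made up here. Only the overall flow (early return, regex extraction, validation against known lists, deduplication, nil on no match) comes from the review.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Assumed patterns; the PR's actual expressions are not shown.
var (
	pluginRe = regexp.MustCompile(`/wp-content/plugins/([a-zA-Z0-9_-]+)/`)
	themeRe  = regexp.MustCompile(`/wp-content/themes/([a-zA-Z0-9_-]+)/`)
)

type WordPressInfo struct {
	Plugins []string
	Themes  []string
}

// extract returns the unique, known names captured by re in body.
func extract(re *regexp.Regexp, body string, known map[string]struct{}) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, m := range re.FindAllStringSubmatch(body, -1) {
		name := strings.ToLower(m[1])
		if _, ok := known[name]; !ok {
			continue // not in the known list: likely a false positive
		}
		if _, dup := seen[name]; dup {
			continue
		}
		seen[name] = struct{}{}
		out = append(out, name)
	}
	return out
}

// detect returns nil for an empty body or when nothing matched.
func detect(body string, knownPlugins, knownThemes map[string]struct{}) *WordPressInfo {
	if body == "" {
		return nil
	}
	info := &WordPressInfo{
		Plugins: extract(pluginRe, body, knownPlugins),
		Themes:  extract(themeRe, body, knownThemes),
	}
	if len(info.Plugins) == 0 && len(info.Themes) == 0 {
		return nil
	}
	return info
}

func main() {
	plugins := map[string]struct{}{"akismet": {}}
	themes := map[string]struct{}{"astra": {}}
	body := `<link href="/wp-content/plugins/akismet/a.css">
<script src="/wp-content/plugins/akismet/a.js"></script>
<script src="/wp-content/plugins/unknown-thing/x.js"></script>
<link href="/wp-content/themes/astra/style.css">`
	fmt.Printf("%+v\n", detect(body, plugins, themes))
}
```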
116-118: Nil-safe HasData method is a good pattern.
The nil receiver check (`w != nil`) ensures safe usage even when `Detect()` returns nil, which is the pattern used in `runner/runner.go`.
    github.com/pierrec/lz4/v4 v4.1.23 // indirect
    github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
    github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
    github.com/projectdiscovery/awesome-search-queries v0.0.0-20260104120501-961ef30f7193 // indirect
🧩 Analysis chain
🏁 Script executed:
    # First, find the runner directory and check what files exist
    find . -type f -name "cpe.go" -o -name "wordpress.go" | head -20
Repository: projectdiscovery/httpx
Length of output: 103
🏁 Script executed:
    # Also check if go.mod exists and verify line 131
    if [ -f go.mod ]; then
      sed -n '130,132p' go.mod
    fi
Repository: projectdiscovery/httpx
Length of output: 303
🏁 Script executed:
    # Search for imports of awesome-search-queries in the codebase
    rg "awesome-search-queries" --type go
Repository: projectdiscovery/httpx
Length of output: 397
Dependency should be marked as direct, not indirect.
The `awesome-search-queries` dependency is directly imported in `runner/cpe.go` and `runner/wordpress.go`, so it should not be marked as `// indirect` in `go.mod`. Run `go mod tidy` to correct this automatically.
    for _, prefix := range titlePrefixes {
        if strings.HasPrefix(strings.ToLower(query), strings.ToLower(prefix)) {
            pattern := extractQuotedValue(strings.TrimPrefix(query, prefix))
            pattern = strings.TrimPrefix(pattern, prefix[:len(prefix)-1])
            if pattern != "" {
                pattern = strings.ToLower(pattern)
                d.titlePatterns[pattern] = appendUnique(d.titlePatterns[pattern], info)
            }
            return
        }
    }
Suspicious redundant prefix stripping.
Line 109 attempts to strip prefix[:len(prefix)-1] after already trimming the full prefix on line 108. This appears to be dead code or a logic error — if the prefix was already removed, stripping a shorter version won't have any effect.
🔎 Suggested fix
If the intent was to handle cases where the prefix includes a trailing quote character (like title=' or title="), line 108 should handle that. Consider removing line 109 if it's redundant:
     if strings.HasPrefix(strings.ToLower(query), strings.ToLower(prefix)) {
         pattern := extractQuotedValue(strings.TrimPrefix(query, prefix))
    -    pattern = strings.TrimPrefix(pattern, prefix[:len(prefix)-1])
         if pattern != "" {
             pattern = strings.ToLower(pattern)
             d.titlePatterns[pattern] = appendUnique(d.titlePatterns[pattern], info)
         }
         return
     }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    for _, prefix := range titlePrefixes {
        if strings.HasPrefix(strings.ToLower(query), strings.ToLower(prefix)) {
            pattern := extractQuotedValue(strings.TrimPrefix(query, prefix))
            if pattern != "" {
                pattern = strings.ToLower(pattern)
                d.titlePatterns[pattern] = appendUnique(d.titlePatterns[pattern], info)
            }
            return
        }
    }
Mzack9999 left a comment
LGTM! That said, I think WordPress fingerprinting would be better suited to a nuclei template.
Summary
Adds support for passive detection of CPE (Common Platform Enumeration) identifiers and WordPress plugins/themes using the awesome-search-queries database.
CPE Detection (`-cpe` flag)
- Generates CPE 2.3 identifiers (`cpe:2.3:a:vendor:product:*:*:*:*:*:*:*:*`)
WordPress Detection (`-wp` flag)
- Detects plugins via `/wp-content/plugins/[name]/` patterns in response body
- Detects themes via `/wp-content/themes/[name]/` patterns in response body
New CLI Flags (PROBES group)
- `-cpe`
- `-wp`, `-wordpress`
Both are automatically included in JSON/CSV output.
Closes #1975