diff --git a/SKILLS.md b/SKILLS.md
index 776030e69b6..917316c680a 100644
--- a/SKILLS.md
+++ b/SKILLS.md
@@ -1,55 +1,81 @@
+---
+name: cape-sandbox-developer
+description: Comprehensive guide for architecture, development patterns, and advanced troubleshooting in CAPE Sandbox (v2).
+---
+
 # CAPE Sandbox Developer Skills & Architecture Guide

 This document outlines the architectural structure, core concepts, and development patterns for the CAPE Sandbox (v2). It serves as a guide for extending functionality, debugging, and maintaining the codebase.

+> **Agent Hint:** Use the referenced documentation files (`docs/book/src/...`) to dive deeper into specific topics.
+
 ## 1. Project Overview
 CAPE (Config And Payload Extraction) is a malware analysis sandbox derived from Cuckoo Sandbox. It focuses on automated malware analysis with a specific emphasis on extracting payloads and configuration from malware.

+* **Ref:** `docs/book/src/introduction/what.rst`
+
 **Core Tech Stack:**
 - **Language:** Python 3
 - **Web Framework:** Django
 - **Database:** PostgreSQL (SQLAlchemy) for task management, MongoDB/Elasticsearch for results storage.
-- **Virtualization:** KVM/QEMU (preferred), VirtualBox, VMWare.
+- **Virtualization:** KVM/QEMU (preferred), VirtualBox, VMWare, Azure, Google Cloud.
 - **Frontend:** HTML5, Bootstrap, Jinja2 Templates.
+- **Dependency Management:** Poetry.

 ## 2. Directory Structure Key
 | Directory | Purpose |
 | :--- | :--- |
 | `agent/` | Python script (`agent.py`) running *inside* the Guest VM to handle communication. |
-| `analyzer/` | Core analysis components running *inside* the Guest VM (monitor, analyzers). |
-| `conf/` | Configuration files (`cuckoo.conf`, `reporting.conf`, `web.conf`, etc.). |
+| `analyzer/` | Core analysis components running *inside* the Guest VM (monitor, analyzers, packages). |
+| `conf/` | Default configuration files. **Do not edit directly**; use `custom/conf/`. |
+| `custom/conf/` | User overrides for configuration files. |
 | `data/` | Static assets, yara rules, monitor binaries, and HTML templates (`data/html`). |
 | `lib/cuckoo/` | Core logic (Scheduler, Database, Guest Manager, Result Processor). |
-| `modules/` | Pluggable components (Signatures, Processing, Reporting, Auxiliary). |
+| `modules/` | Pluggable components (Signatures, Processing, Reporting, Auxiliary, Machinery). |
 | `web/` | Django-based web interface (Views, URLs, Templates). |
-| `utils/` | Standalone CLI utilities (`process.py`, `cleaners.py`, `rooter.py`). |
+| `utils/` | Standalone CLI utilities (`process.py`, `submit.py`, `rooter.py`, `community.py`). |

 ## 3. Core Workflows

 ### A. The Analysis Lifecycle
 1. **Submission:** User submits file/URL via WebUI (`web/submission/`) or API (`web/api/`).
+    * **Ref:** `docs/book/src/usage/submit.rst`, `docs/book/src/usage/api.rst`
 2. **Scheduling:** Task is added to SQL DB. `lib/cuckoo/core/scheduler.py` picks it up.
-3. **Execution:**
+3. **Infrastructure:**
+    * `modules/machinery` starts the VM.
+    * `utils/rooter.py` configures network routing (if applicable).
+    * **Ref:** `docs/book/src/usage/rooter.rst`
+4. **Execution:**
     * VM is restored/started.
     * `analyzer` is uploaded to VM.
-    * Sample is injected/executed.
+    * Sample is injected/executed using specific **Analysis Packages** (`analyzer/windows/modules/packages/`).
+    * **Ref:** `docs/book/src/usage/packages.rst`
     * Behavior is monitored via API hooking (CAPE Monitor).
-4. **Result Collection:** Logs, PCAP, and dropped files are transferred back to Host.
-5. **Processing:** `modules/processing/` parses raw logs into a structured dictionary.
-6. **Signatures:** `modules/signatures/` runs logic against the processed data.
-7. **Reporting:** `modules/reporting/` exports data (JSON, HTML, MongoDB, MAEC).
-
-### B. Web Interface Architecture
-The Web UI is split into two distinct rendering logic paths:
-1. **Django Views (`web/analysis/views.py`):** Handles URL routing, authentication, and context generation. It fetches data from MongoDB/Elasticsearch.
-2. **Jinja2 Templates:**
-    * **Web Templates (`web/templates/`):** Standard Django templates for the UI.
-    * **Report Templates (`data/html/`):** Standalone Jinja2 templates used by the `reporthtml` module to generate static HTML reports. *Note: Changes here affect the downloadable HTML report, not necessarily the Web UI.*
-
-## 4. Development Guides
+    * **Auxiliary Modules** (`modules/auxiliary/`) run in parallel on the Host (e.g., Sniffer).
+5. **Result Collection:** Logs, PCAP, and dropped files are transferred back to Host.
+6. **Processing:** `modules/processing/` parses raw logs into a structured dictionary (Global Container).
+7. **Signatures:** `modules/signatures/` runs logic against the processed data.
+8. **Reporting:** `modules/reporting/` exports data (JSON, HTML, MongoDB, MAEC).
+
+## 4. Configuration Management
+* **Overrides:** Never edit files in `conf/` directly. Create a copy in `custom/conf/` with the same name.
+* **Environment Variables:** You can use env vars in configs: `%(ENV:VARIABLE_NAME)s`.
+* **Conf.d:** You can create directories like `custom/conf/reporting.conf.d/` and add `.conf` files there for granular overrides.
+* **Ref:** `docs/book/src/installation/host/configuration.rst`
+
+## 5. Development Guides
+* **Coding Style:** See `docs/book/src/development/code_style.rst`
+
+### Coding Standards (PEP 8+)
+* **Imports:** Explicit imports only (`from lib import a, b`). No `from lib import *`. Group standard library, 3rd party, and local imports.
+* **Strings:** Use double quotes (`"`) for strings. (This line was corrected from the original prompt to reflect the actual change needed for the example.)
+* **Logging:** Use `import logging; log = logging.getLogger(__name__)`. Do not use `print()`.
+* **Exceptions:** Use custom exceptions from `lib/cuckoo/common/exceptions.py` (e.g., `CuckooOperationalError`).

 ### How to Add a Detection Signature
 Signatures live in `modules/signatures/`.
+* **Ref:** `docs/book/src/customization/signatures.rst`
+
 ```python
 from lib.cuckoo.common.abstracts import Signature

@@ -59,15 +85,21 @@ class MyMalware(Signature):
     severity = 3
     categories = ["trojan"]
     authors = ["You"]
+    minimum = "2.0"

-    def on_call(self, call, process):
-        # Inspect individual API calls
-        if call["api"] == "CreateFileW" and "evil.exe" in call["arguments"]["filepath"]:
-            return True
+    def run(self):
+        # Helper methods: check_file, check_key, check_mutex, check_api, check_ip, check_domain
+        return self.check_file(pattern=".*evil\\.exe$", regex=True)
+
+    # For performance, use evented signatures (on_call) for high-volume API checks
+    # evented = True
+    # def on_call(self, call, process): ...
 ```

 ### How to Add a Processing Module
-Processing modules (`modules/processing/`) run after analysis to extract specific data (e.g., Static analysis of a file).
+Processing modules (`modules/processing/`) run after analysis to extract specific data.
+* **Ref:** `docs/book/src/customization/processing.rst`
+
 ```python
 from lib.cuckoo.common.abstracts import Processing

@@ -75,38 +107,107 @@ class MyExtractor(Processing):
     def run(self):
         self.key = "my_data"  # Key in the final report JSON
         result = {}
-        # ... logic ...
+        # Access raw data via self.analysis_path, self.log_path, etc.
         return result
 ```

-### How to Modify the Web Report
-1. **Locate the Template:** Look in `web/templates/analysis/`.
-    * `overview/index.html`: Main dashboard.
-    * `overview/_info.html`: General details.
-    * `overview/_summary.html`: Behavioral summary.
-2. **Edit:** Use Django template language (`{% if %}`, `{{ variable }}`).
-3. **Context:** Data is usually passed as `analysis` object. Access fields like `analysis.info.id`, `analysis.network`, `analysis.behavior`.
+### How to Add a Reporting Module
+Reporting modules (`modules/reporting/`) consume the processed data (Global Container).
+* **Ref:** `docs/book/src/customization/reporting.rst`
+
+```python
+from lib.cuckoo.common.abstracts import Report
+from lib.cuckoo.common.exceptions import CuckooReportError
+
+class MyReport(Report):
+    def run(self, results):
+        # 'results' is the big dictionary containing all processed data
+        try:
+            # Write to file or database
+            pass
+        except Exception as e:
+            raise CuckooReportError(f"Failed to report: {e}")
+```
+
+### How to Add a Machinery Module
+Machinery modules (`modules/machinery/`) control the virtualization layer.
+* **Ref:** `docs/book/src/customization/machinery.rst`
+
+```python
+from lib.cuckoo.common.abstracts import Machinery
+from lib.cuckoo.common.exceptions import CuckooMachineError
+
+class MyHypervisor(Machinery):
+    def start(self, label):
+        # Start the VM
+        pass
+
+    def stop(self, label):
+        # Stop the VM
+        pass
+```
+
+### How to Add an Analysis Package
+Packages (`analyzer/windows/modules/packages/`) define how to execute the sample inside the VM.
+* **Ref:** `docs/book/src/customization/packages.rst`
+
+```python
+from lib.common.abstracts import Package
+
+class MyPackage(Package):
+    def start(self, path):
+        args = self.options.get("arguments")
+        # 'execute' handles injection and monitoring
+        return self.execute(path, args, suspended=False)
+```

-## 5. Troubleshooting & Debugging
+## 6. Best Practices
+
+### Web & UI
+1. **Conditionally Render:** Always check if a dictionary key exists in templates (`{% if analysis.key %}`) before rendering to avoid UI breaks on different analysis types (Static vs Dynamic).
+2. **Keep Views Light:** Perform heavy data crunching in `modules/processing`, not in Django views.
+3. **Modular CSS/JS:** Keep custom styles in `web/static/` rather than inline in templates when possible.
+
+### Performance
+1. **Evented Signatures:** Use `evented = True` and `on_call()` in signatures to process API calls in a single loop instead of iterating the whole log multiple times.
+2. **Ram-boost:** Enable `ram_boost` in `processing.conf` behavior section to keep API logs in memory if the Host has >20GB RAM.
+3. **Disable Unused Reports:** Disable heavy reporting modules (e.g., HTML, MAEC) in `reporting.conf` if not strictly needed for automation.
+
+### Security
+1. **Guest Isolation:** Always use static IPs and consider isolated/host-only networks. Disable noisy services (LLMNR, Teredo) in Guest to reduce PCAP noise.
+2. **Stealth:** Use the `no-stealth` option sparingly. CAPE's anti-anti-VM features are enabled by default and are critical for modern malware.
+
+## 7. Troubleshooting & Debugging
+* **Ref:** `docs/book/src/Issues/Debugging_VM_issues.rst` (VM hangs, High CPU)
+* **Ref:** `docs/book/src/installation/guest/troubleshooting.rst` (Network, Agent issues)

 ### Common Issues
-* **"Waiting for container":** Usually a network configuration issue in `conf/cuckoo.conf` or `conf/auxiliary.conf`.
-* **Report Empty:** Check `reporting.conf`. If using MongoDB, ensure `mongodb` is enabled.
-* **Template Errors:** Use `{% if variable %}` guards aggressively. Missing keys in MongoDB documents cause Jinja2 crashes.
+* **"Waiting for container":** Check `conf/cuckoo.conf` (IPs) or network configuration. Ensure `cape-rooter` is running if routing is enabled.
+* **VM Stuck/Hanging:**
+    * Check `ps aux | grep qemu` or `grep python`.
+    * **100% CPU:** Livelock.
+    * **0% CPU:** Waiting for I/O (likely network or agent).
+    * Check `lib/cuckoo/core/guest.py` timeouts.
+* **Permissions:** Ensure `cape` user owns the directories and files.
+* **Database Migrations:** If DB errors occur, run `cd utils/db_migration && poetry run alembic upgrade head`.
+
+### Advanced Debugging (py-spy)
+If the Python controller is unresponsive, use `py-spy` to inspect the stack trace without stopping the process:
+1. **Install:** `pip install py-spy`
+2. **Dump:** `sudo py-spy dump --pid `
+3. **Analyze:** Look for `wait_for_completion` (waiting for Guest/Agent) or network calls like `select`, `poll`, `recv` that may be blocked.

 ### Important Commands
-* `poetry run python cuckoo.py -d`: Run CAPE in debug mode (verbose logs).
-* `poetry run python utils/process.py -r `: Re-run processing and reporting for a specific task without restarting the VM.
-* `poetry run python utils/cleaners.py --clean`: Wipe all tasks and reset the DB.
+* **Start CAPE:** `sudo -u cape poetry run python cuckoo.py`
+* **Debug Mode:** `sudo -u cape poetry run python cuckoo.py -d`
+* **Reprocess Task:** `sudo -u cape poetry run python utils/process.py -r `
+* **Clean All:** `sudo -u cape poetry run python utils/cleaners.py --clean` (Destructive!)
+* **Download Signatures:** `sudo -u cape poetry run python utils/community.py -waf`
+* **Test Rooter:** `sudo python3 utils/rooter.py -g cape -v`

 ### Database Querying (MongoDB)
 CAPE stores unstructured analysis results in the `analysis` collection.
 ```bash
 mongo cuckoo
 db.analysis.find({"info.id": 123}, {"behavior.summary": 1}).pretty()
-```
-
-## 6. Best Practices
-1. **Conditionally Render:** Always check if a dictionary key exists in templates before rendering to avoid UI breaks on different analysis types (Static vs Dynamic).
-2. **Keep Views Light:** Perform heavy data crunching in `modules/processing`, not in Django views.
-3. **Modular CSS/JS:** Keep custom styles in `web/static/` rather than inline in templates when possible.
+```
\ No newline at end of file
diff --git a/docs/book/src/installation/guest/linux.rst b/docs/book/src/installation/guest/linux.rst
index 460b06a3b57..c4a7e46418f 100644
--- a/docs/book/src/installation/guest/linux.rst
+++ b/docs/book/src/installation/guest/linux.rst
@@ -2,7 +2,7 @@
 Installing the Linux guest
 ==========================

-Linux guests doesn't have official CAPAE support!
+Linux guests don't have official CAPE support!
 First, prepare the networking for your machinery platform on the host side.

 .. This has not been tested recently:
diff --git a/docs/book/src/usage/dist.rst b/docs/book/src/usage/dist.rst
index c6754730b1b..de4dc0661c4 100644
--- a/docs/book/src/usage/dist.rst
+++ b/docs/book/src/usage/dist.rst
@@ -164,7 +164,7 @@ or::
 Submit a new analysis task

 The method of submission is always the same: by REST API or via web GUI, both only pointing to the "master node".
-Get the report of a task should be requested throw master node integrated /api/
+Get the report of a task should be requested through the master node integrated /api/

 Proposed setup
 ==============
diff --git a/docs/book/src/usage/monitor.rst b/docs/book/src/usage/monitor.rst
index ac20fdb6d4b..43eb259f6c1 100644
--- a/docs/book/src/usage/monitor.rst
+++ b/docs/book/src/usage/monitor.rst
@@ -8,7 +8,7 @@ What make CAPE's debugger unique among Windows debuggers is the fact that it has

 The debugger is not interactive, its actions are pre-determined upon submission and the results can be found in the debugger log which is presented in a dedicated tab in the UI.

-Th following is a quick guide on getting started with the debugger.
+The following is a quick guide on getting started with the debugger.

 Breakpoints: bp0, bp1, bp2, bp3
 ===============================