193 changes: 147 additions & 46 deletions SKILLS.md
@@ -1,55 +1,81 @@
---
name: cape-sandbox-developer
description: Comprehensive guide for architecture, development patterns, and advanced troubleshooting in CAPE Sandbox (v2).
---

# CAPE Sandbox Developer Skills & Architecture Guide

This document outlines the architectural structure, core concepts, and development patterns for the CAPE Sandbox (v2). It serves as a guide for extending functionality, debugging, and maintaining the codebase.

> **Agent Hint:** Use the referenced documentation files (`docs/book/src/...`) to dive deeper into specific topics.

## 1. Project Overview
CAPE (Config And Payload Extraction) is a malware analysis sandbox derived from Cuckoo Sandbox. It focuses on automated malware analysis with a specific emphasis on extracting payloads and configuration from malware.

* **Ref:** `docs/book/src/introduction/what.rst`

**Core Tech Stack:**
- **Language:** Python 3
- **Web Framework:** Django
- **Database:** PostgreSQL (SQLAlchemy) for task management, MongoDB/Elasticsearch for results storage.
- **Virtualization:** KVM/QEMU (preferred), VirtualBox, VMware, Azure, Google Cloud.
- **Frontend:** HTML5, Bootstrap, Jinja2 Templates.
- **Dependency Management:** Poetry.

## 2. Directory Structure Key
| Directory | Purpose |
| :--- | :--- |
| `agent/` | Python script (`agent.py`) running *inside* the Guest VM to handle communication. |
| `analyzer/` | Core analysis components running *inside* the Guest VM (monitor, analyzers, packages). |
| `conf/` | Default configuration files. **Do not edit directly**; use `custom/conf/`. |
| `custom/conf/` | User overrides for configuration files. |
| `data/` | Static assets, yara rules, monitor binaries, and HTML templates (`data/html`). |
| `lib/cuckoo/` | Core logic (Scheduler, Database, Guest Manager, Result Processor). |
| `modules/` | Pluggable components (Signatures, Processing, Reporting, Auxiliary, Machinery). |
| `web/` | Django-based web interface (Views, URLs, Templates). |
| `utils/` | Standalone CLI utilities (`process.py`, `submit.py`, `rooter.py`, `community.py`). |

## 3. Core Workflows

### A. The Analysis Lifecycle
1. **Submission:** User submits file/URL via WebUI (`web/submission/`) or API (`web/api/`).
* **Ref:** `docs/book/src/usage/submit.rst`, `docs/book/src/usage/api.rst`
2. **Scheduling:** Task is added to SQL DB. `lib/cuckoo/core/scheduler.py` picks it up.
3. **Infrastructure:**
* `modules/machinery` starts the VM.
* `utils/rooter.py` configures network routing (if applicable).
* **Ref:** `docs/book/src/usage/rooter.rst`
4. **Execution:**
* VM is restored/started.
* `analyzer` is uploaded to VM.
* Sample is injected/executed using specific **Analysis Packages** (`analyzer/windows/modules/packages/`).
* **Ref:** `docs/book/src/usage/packages.rst`
* Behavior is monitored via API hooking (CAPE Monitor).
* **Auxiliary Modules** (`modules/auxiliary/`) run in parallel on the Host (e.g., Sniffer).
5. **Result Collection:** Logs, PCAP, and dropped files are transferred back to the Host.
6. **Processing:** `modules/processing/` parses raw logs into a structured dictionary (the Global Container).
7. **Signatures:** `modules/signatures/` runs logic against the processed data.
8. **Reporting:** `modules/reporting/` exports data (JSON, HTML, MongoDB, MAEC).

### B. Web Interface Architecture
The Web UI is split into two distinct rendering paths:
1. **Django Views (`web/analysis/views.py`):** Handle URL routing, authentication, and context generation, fetching data from MongoDB/Elasticsearch.
2. **Templates:**
* **Web Templates (`web/templates/`):** Standard Django templates for the UI.
* **Report Templates (`data/html/`):** Standalone Jinja2 templates used by the `reporthtml` module to generate static HTML reports. *Note: Changes here affect the downloadable HTML report, not necessarily the Web UI.*
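The hand-off between the Processing, Signatures, and Reporting stages of the analysis lifecycle can be modeled in a few lines. This is a self-contained sketch, not CAPE code. `ProcessingModule`, `run_processing`, `run_signatures`, and `run_reporting` are hypothetical names, and the sample data is illustrative only. The sketch mirrors three points from the lifecycle description:

- Each processing module contributes one key to a shared dictionary (the Global Container).
- Signatures evaluate that dictionary.
- Every reporting module receives the same finished dictionary.

```python
from typing import Any, Callable, Dict, List, Tuple

Results = Dict[str, Any]


class ProcessingModule:
    """Hypothetical stand-in: a module returns data stored under its `key`."""

    key = ""

    def run(self) -> Any:
        raise NotImplementedError


class StaticInfo(ProcessingModule):
    key = "static"

    def run(self):
        return {"file_type": "PE32 executable"}  # illustrative value only


class BehaviorInfo(ProcessingModule):
    key = "behavior"

    def run(self):
        return {"summary": {"files": ["C:\\Users\\victim\\evil.exe"]}}


def run_processing(modules: List[ProcessingModule]) -> Results:
    """Processing stage: merge every module's output into one shared dictionary."""
    results: Results = {}
    for module in modules:
        results[module.key] = module.run()
    return results


def run_signatures(results: Results, signatures: List[Tuple[str, Callable[[Results], bool]]]) -> Results:
    """Signatures stage: evaluate each check against the merged data, record matches."""
    results["signatures"] = [{"name": name} for name, check in signatures if check(results)]
    return results


def run_reporting(results: Results, reporters: List[Callable[[Results], None]]) -> None:
    """Reporting stage: every reporter receives the same, fully populated dictionary."""
    for report in reporters:
        report(results)
```

Because signatures only read the merged dictionary, a new processing module becomes visible to every signature and reporter without any extra wiring.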

## 4. Configuration Management
* **Overrides:** Never edit files in `conf/` directly. Create a copy in `custom/conf/` with the same name.
* **Environment Variables:** You can use env vars in configs: `%(ENV:VARIABLE_NAME)s`.
* **Conf.d:** You can create directories like `custom/conf/reporting.conf.d/` and add `.conf` files there for granular overrides.
* **Ref:** `docs/book/src/installation/host/configuration.rst`
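The override order can be approximated with the standard library's `configparser`. This is a sketch only: `load_conf` is a hypothetical helper, not CAPE's configuration loader. It assumes the following:

- Files are read in this order: `conf/<name>`, then `custom/conf/<name>`, then the `.conf` files in the matching `.conf.d/` directory (alphabetically).
- Later files win.
- Environment variables are exposed as `ENV:<NAME>` defaults so that `%(ENV:NAME)s` interpolates.

```python
import configparser
import os
from pathlib import Path


def load_conf(name, root, environ=None):
    """Hypothetical loader: conf/<name> < custom/conf/<name> < custom/conf/<name>.d/*.conf."""
    environ = os.environ if environ is None else environ
    parser = configparser.ConfigParser()
    # Expose environment variables as DEFAULT options named "env:<name>" so that
    # %(ENV:NAME)s resolves (option names are case-insensitive). A literal '%' must
    # be escaped as '%%'. Side effect: these also appear in every section's items().
    env_defaults = {f"env:{key.lower()}": value.replace("%", "%%") for key, value in environ.items()}
    parser.read_dict({parser.default_section: env_defaults})

    base = Path(root)
    layers = [base / "conf" / name, base / "custom" / "conf" / name]
    layers += sorted((base / "custom" / "conf" / f"{name}.d").glob("*.conf"))
    # read() silently skips missing files; values from later files override earlier ones.
    parser.read([str(path) for path in layers])
    return parser
```

For example, if `conf/reporting.conf` sets `[mongodb] enabled = no`, then a one-line `custom/conf/reporting.conf` with `enabled = yes` is enough to turn it on. Everything else is still inherited from the default file.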

## 5. Development Guides
* **Coding Style:** See `docs/book/src/development/code_style.rst`

### Coding Standards (PEP 8+)
* **Imports:** Explicit imports only (`from lib import a, b`). No `from lib import *`. Group standard library, 3rd party, and local imports.
* **Strings:** Use double quotes (`"`) for strings.
* **Logging:** Use `import logging; log = logging.getLogger(__name__)`. Do not use `print()`.
* **Exceptions:** Use custom exceptions from `lib/cuckoo/common/exceptions.py` (e.g., `CuckooOperationalError`).
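The logging and exception conventions fit together as shown below. `CuckooOperationalError` is a local stand-in so the snippet runs on its own; in CAPE you would import it from `lib/cuckoo/common/exceptions.py`. `read_report` is a hypothetical helper. Chaining with `raise ... from exc` keeps the original `OSError` available for debugging.

```python
# Standard library imports come first, then third-party, then local (lib.*) imports.
import logging

log = logging.getLogger(__name__)


class CuckooOperationalError(Exception):
    """Local stand-in for lib.cuckoo.common.exceptions.CuckooOperationalError."""


def read_report(path):
    """Hypothetical helper: return a report file's text or raise a CAPE-style error."""
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except OSError as exc:
        # Lazy %-formatting: the message is only built if the record is emitted.
        log.error("Could not read report %s: %s", path, exc)
        raise CuckooOperationalError(f"Unable to read report {path}") from exc
```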

### How to Add a Detection Signature
Signatures live in `modules/signatures/`.
* **Ref:** `docs/book/src/customization/signatures.rst`

```python
from lib.cuckoo.common.abstracts import Signature


class MyMalware(Signature):
    # ... (attribute lines collapsed in the diff view) ...
    severity = 3
    categories = ["trojan"]
    authors = ["You"]
    minimum = "2.0"

    def run(self):
        # Helper methods: check_file, check_key, check_mutex, check_api, check_ip, check_domain
        return self.check_file(pattern=".*evil\\.exe$", regex=True)

    # For performance, use evented signatures (on_call) for high-volume API checks
    # evented = True
    # def on_call(self, call, process): ...
```
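The evented approach mentioned in the signature comments can be modeled as a single pass over the API log that fans each call out to interested signatures. This is a hedged, self-contained sketch with these assumptions:

- `EventedSignature`, `run_evented`, and `get_argument` are local stand-ins.
- The `filter_apinames` and `on_complete` names are modeled on CAPE's evented signatures; check them against `lib/cuckoo/common/abstracts.py`.
- The call layout (`arguments` as a list of name/value pairs, the `FileName` argument) is an assumption.

```python
from typing import Any, Dict, List, Optional


def get_argument(call: Dict[str, Any], name: str) -> Optional[Any]:
    """Assumed call shape: {"api": str, "arguments": [{"name": ..., "value": ...}]}."""
    for arg in call.get("arguments", []):
        if arg.get("name") == name:
            return arg.get("value")
    return None


class EventedSignature:
    """Hypothetical stand-in for an evented Signature."""

    name = ""
    evented = True
    filter_apinames = frozenset()  # empty means "receive every call"

    def __init__(self):
        self.matched = False

    def on_call(self, call, process):
        pass

    def on_complete(self):
        return self.matched


class CreatesEvilExe(EventedSignature):
    name = "creates_evil_exe"
    filter_apinames = frozenset({"CreateFileW", "NtCreateFile"})

    def on_call(self, call, process):
        path = get_argument(call, "FileName") or ""
        if path.lower().endswith("evil.exe"):
            self.matched = True


def run_evented(signatures: List[EventedSignature], processes: List[Dict[str, Any]]) -> List[str]:
    """Walk every API call once and dispatch it to each interested signature."""
    for process in processes:
        for call in process.get("calls", []):
            for sig in signatures:
                if sig.filter_apinames and call["api"] not in sig.filter_apinames:
                    continue
                sig.on_call(call, process)
    return [sig.name for sig in signatures if sig.on_complete()]
```

The log is traversed once regardless of how many signatures are loaded. The API-name filter keeps each signature's `on_call` from running on calls it ignores.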

### How to Add a Processing Module
Processing modules (`modules/processing/`) run after analysis to extract specific data (e.g., static analysis of a file).
* **Ref:** `docs/book/src/customization/processing.rst`

```python
from lib.cuckoo.common.abstracts import Processing


class MyExtractor(Processing):
    def run(self):
        self.key = "my_data"  # Key in the final report JSON
        result = {}
        # ... logic ...
        # Access raw data via self.analysis_path, self.log_path, etc.
        return result
```
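Here is a slightly more concrete processing sketch. It hashes every file in a `files/` subdirectory of the analysis folder and returns the result under its own report key. The base class is a hypothetical stand-in that only provides `analysis_path`. The `files/` layout is an assumption to verify against your storage directory.

```python
import hashlib
from pathlib import Path


class ProcessingStub:
    """Hypothetical stand-in for lib.cuckoo.common.abstracts.Processing."""

    def __init__(self, analysis_path):
        self.analysis_path = str(analysis_path)
        self.key = ""


class DroppedHashes(ProcessingStub):
    def run(self):
        self.key = "dropped_hashes"  # key under which the result lands in the report
        files_dir = Path(self.analysis_path) / "files"  # assumed storage layout
        result = {}
        if not files_dir.is_dir():
            return result  # nothing dropped: return an empty section, not an error
        for path in sorted(files_dir.iterdir()):
            if path.is_file():
                result[path.name] = hashlib.sha256(path.read_bytes()).hexdigest()
        return result
```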

### How to Modify the Web Report
1. **Locate the Template:** Look in `web/templates/analysis/`.
* `overview/index.html`: Main dashboard.
* `overview/_info.html`: General details.
* `overview/_summary.html`: Behavioral summary.
2. **Edit:** Use Django template language (`{% if %}`, `{{ variable }}`).
3. **Context:** Data is usually passed as the `analysis` object. Access fields like `analysis.info.id`, `analysis.network`, `analysis.behavior`.

### How to Add a Reporting Module
Reporting modules (`modules/reporting/`) consume the processed data (Global Container).
* **Ref:** `docs/book/src/customization/reporting.rst`

```python
from lib.cuckoo.common.abstracts import Report
from lib.cuckoo.common.exceptions import CuckooReportError


class MyReport(Report):
    def run(self, results):
        # 'results' is the big dictionary containing all processed data
        try:
            # Write to file or database
            pass
        except Exception as e:
            raise CuckooReportError(f"Failed to report: {e}")
```
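Along the same lines, here is a reporting sketch that writes a small JSON summary. `ReportStub`, `reports_path`, and `SummaryJson` are hypothetical stand-ins, as is the local `CuckooReportError`. The `info.id` and `signatures` lookups assume the result layout used elsewhere in this guide. Catching `OSError` and `TypeError` instead of a bare `Exception` keeps unrelated bugs in the module from being reported as write failures.

```python
import json
import os


class CuckooReportError(Exception):
    """Local stand-in for lib.cuckoo.common.exceptions.CuckooReportError."""


class ReportStub:
    """Hypothetical stand-in for lib.cuckoo.common.abstracts.Report."""

    def __init__(self, reports_path):
        self.reports_path = reports_path


class SummaryJson(ReportStub):
    def run(self, results):
        # Use .get() so static-only or partial analyses do not crash the reporter.
        summary = {
            "task_id": results.get("info", {}).get("id"),
            "signatures": [sig.get("name") for sig in results.get("signatures", [])],
        }
        out_path = os.path.join(self.reports_path, "summary.json")
        try:
            os.makedirs(self.reports_path, exist_ok=True)
            with open(out_path, "w", encoding="utf-8") as fh:
                json.dump(summary, fh, indent=2)
        except (OSError, TypeError) as e:  # I/O failure or non-serializable data
            raise CuckooReportError(f"Failed to report: {e}") from e
        return out_path
```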

### How to Add a Machinery Module
Machinery modules (`modules/machinery/`) control the virtualization layer.
* **Ref:** `docs/book/src/customization/machinery.rst`

```python
from lib.cuckoo.common.abstracts import Machinery
from lib.cuckoo.common.exceptions import CuckooMachineError


class MyHypervisor(Machinery):
    def start(self, label):
        # Start the VM; raise CuckooMachineError on failure
        pass

    def stop(self, label):
        # Stop the VM; raise CuckooMachineError on failure
        pass
```
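To illustrate the machinery contract (`start(label)` / `stop(label)`, raising `CuckooMachineError` on failure), here is a hypothetical backend that shells out to `virsh`. It is not CAPE's bundled KVM module. The class name, the injectable `runner`, and the use of `virsh destroy` (a hard power-off) for `stop` are assumptions for this sketch. A real backend would usually also handle snapshot restore. Injecting the runner lets the logic be exercised without a hypervisor.

```python
import subprocess


class CuckooMachineError(Exception):
    """Local stand-in for lib.cuckoo.common.exceptions.CuckooMachineError."""


class VirshHypervisor:
    """Hypothetical machinery backend that shells out to virsh (not CAPE's KVM module)."""

    def __init__(self, runner=subprocess.run):
        # The runner is injectable so the logic can be tested without libvirt.
        self._run = runner

    def _virsh(self, *args):
        cmd = ["virsh", *args]
        try:
            proc = self._run(cmd, capture_output=True, text=True, check=False)
        except OSError as exc:  # e.g. virsh is not installed
            raise CuckooMachineError(f"Cannot execute {cmd[0]}: {exc}") from exc
        if proc.returncode != 0:
            stderr = (proc.stderr or "").strip()
            raise CuckooMachineError(f"{' '.join(cmd)} failed: {stderr}")
        return proc.stdout

    def start(self, label):
        """Boot the guest whose libvirt domain name is `label`."""
        self._virsh("start", label)

    def stop(self, label):
        """Hard power-off the guest ('virsh destroy' does not delete the domain)."""
        self._virsh("destroy", label)
```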

### How to Add an Analysis Package
Packages (`analyzer/windows/modules/packages/`) define how to execute the sample inside the VM.
* **Ref:** `docs/book/src/customization/packages.rst`

```python
from lib.common.abstracts import Package


class MyPackage(Package):
    def start(self, path):
        args = self.options.get("arguments")
        # 'execute' handles injection and monitoring
        return self.execute(path, args, suspended=False)
```
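A package mostly decides what command line to hand to `execute()`. The sketch below builds a `rundll32.exe` invocation from submission options. `PackageStub` is a local stand-in whose `execute` only records the call. The `function` option name and its `#1` (first ordinal) default are assumptions, not taken from CAPE's bundled DLL package.

```python
class PackageStub:
    """Hypothetical stand-in for lib.common.abstracts.Package (analyzer side)."""

    def __init__(self, options=None):
        self.options = options or {}
        self.executed = []

    def execute(self, path, args):
        # The real base class launches, injects, and monitors the process;
        # this stub only records what would have been launched.
        self.executed.append((path, args))
        return len(self.executed)  # stands in for a process ID


class Rundll32Package(PackageStub):
    RUNDLL32 = "C:\\Windows\\System32\\rundll32.exe"

    def start(self, path):
        function = self.options.get("function", "#1")  # assumed option name and default
        arguments = self.options.get("arguments")
        args = f'"{path}",{function}'  # quote the DLL path in case it contains spaces
        if arguments:
            args += f" {arguments}"
        return self.execute(self.RUNDLL32, args)
```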

## 6. Best Practices

### Web & UI
1. **Conditionally Render:** Always check if a dictionary key exists in templates (`{% if analysis.key %}`) before rendering to avoid UI breaks on different analysis types (Static vs Dynamic).
2. **Keep Views Light:** Perform heavy data crunching in `modules/processing`, not in Django views.
3. **Modular CSS/JS:** Keep custom styles in `web/static/` rather than inline in templates when possible.

### Performance
1. **Evented Signatures:** Use `evented = True` and `on_call()` in signatures to process API calls in a single loop instead of iterating the whole log multiple times.
2. **RAM Boost:** Enable `ram_boost` in the behavior section of `processing.conf` to keep API logs in memory if the Host has more than 20 GB of RAM.
3. **Disable Unused Reports:** Disable heavy reporting modules (e.g., HTML, MAEC) in `reporting.conf` if not strictly needed for automation.

### Security
1. **Guest Isolation:** Always use static IPs and consider isolated/host-only networks. Disable noisy services (LLMNR, Teredo) in Guest to reduce PCAP noise.
2. **Stealth:** Use the `no-stealth` option sparingly. CAPE's anti-anti-VM features are enabled by default and are critical for modern malware.

## 7. Troubleshooting & Debugging
* **Ref:** `docs/book/src/Issues/Debugging_VM_issues.rst` (VM hangs, High CPU)
* **Ref:** `docs/book/src/installation/guest/troubleshooting.rst` (Network, Agent issues)

### Common Issues
* **Report Empty:** Check `reporting.conf`. If using MongoDB, ensure `mongodb` is enabled.
* **Template Errors:** Use `{% if variable %}` guards aggressively. Missing keys in MongoDB documents cause Jinja2 crashes.
* **"Waiting for container":** Check `conf/cuckoo.conf` (IPs) or network configuration. Ensure `cape-rooter` is running if routing is enabled.
* **VM Stuck/Hanging:**
* Check `ps aux | grep qemu` (and `ps aux | grep python` for the CAPE processes).
* **100% CPU:** Likely a livelock.
* **0% CPU:** Waiting for I/O (likely network or agent).
* Check `lib/cuckoo/core/guest.py` timeouts.
* **Permissions:** Ensure `cape` user owns the directories and files.
* **Database Migrations:** If DB errors occur, run `cd utils/db_migration && poetry run alembic upgrade head`.
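The CPU heuristic for stuck VMs can be scripted. This hypothetical helper parses `ps -eo pid,pcpu,comm` output and labels matching processes; the 95% and 1% thresholds are arbitrary choices. Note that `ps` reports CPU usage averaged over the process lifetime, so a VM that hung only recently may not stand out. `top` or `pidstat` show current usage.

```python
def classify_hung(ps_output, name_filter=("qemu",), busy=95.0, idle=1.0):
    """Hypothetical helper: apply the CPU heuristic to `ps -eo pid,pcpu,comm` output."""
    findings = []
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # first line is the PID/%CPU/COMMAND header
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        pid, pcpu, comm = parts
        if not any(name in comm for name in name_filter):
            continue
        cpu = float(pcpu)
        if cpu >= busy:
            verdict = "busy: possible livelock"
        elif cpu <= idle:
            verdict = "idle: likely waiting on I/O (network or agent)"
        else:
            verdict = "normal"
        findings.append((int(pid), cpu, verdict))
    return findings
```

Typical usage: `classify_hung(subprocess.run(["ps", "-eo", "pid,pcpu,comm"], capture_output=True, text=True).stdout)`.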

### Advanced Debugging (py-spy)
If the Python controller is unresponsive, use `py-spy` to inspect the stack trace without stopping the process:
1. **Install:** `pip install py-spy`
2. **Dump:** `sudo py-spy dump --pid <PYTHON_PID>`
3. **Analyze:** Look for `wait_for_completion` (waiting for Guest/Agent) or network calls like `select`, `poll`, `recv` that may be blocked.
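The Analyze step can be partly automated. This sketch runs `py-spy dump` and returns frames whose function name matches the markers listed above. It assumes py-spy prints one frame per line as `function (file:line)`. `dump_stack` and `blocking_frames` are hypothetical helper names. Run it with the same privileges you would use for `sudo py-spy dump`.

```python
import subprocess

# Function names from the checklist above that usually mean "blocked, waiting".
BLOCKING_MARKERS = ("wait_for_completion", "select", "poll", "recv")


def dump_stack(pid):
    """Return the text of `py-spy dump --pid <pid>` (needs root, like the manual step)."""
    result = subprocess.run(
        ["py-spy", "dump", "--pid", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def blocking_frames(dump_text, markers=BLOCKING_MARKERS):
    """Return frame lines whose function name is one of `markers`.

    Assumes py-spy prints one frame per line as `function (file:line)`.
    """
    hits = []
    for line in dump_text.splitlines():
        frame = line.strip()
        function = frame.split(" ", 1)[0]
        if function in markers:
            hits.append(frame)
    return hits
```

Usage: `print("\n".join(blocking_frames(dump_stack(<PYTHON_PID>))))`, replacing the placeholder with the controller's PID.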

### Important Commands
* **Start CAPE:** `sudo -u cape poetry run python cuckoo.py`
* **Debug Mode:** `sudo -u cape poetry run python cuckoo.py -d` (verbose logs)
* **Reprocess Task:** `sudo -u cape poetry run python utils/process.py -r <task_id>` (re-runs processing and reporting for a task without restarting the VM)
* **Clean All:** `sudo -u cape poetry run python utils/cleaners.py --clean` (Destructive! Wipes all tasks and resets the DB.)
* **Download Signatures:** `sudo -u cape poetry run python utils/community.py -waf`
* **Test Rooter:** `sudo python3 utils/rooter.py -g cape -v`

### Database Querying (MongoDB)
CAPE stores unstructured analysis results in the `analysis` collection.
```bash
mongosh cuckoo  # older MongoDB releases ship the legacy shell as `mongo`
db.analysis.find({"info.id": 123}, {"behavior.summary": 1}).pretty()
```
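The same query can be run from Python. This sketch assumes the `pymongo` driver is installed and MongoDB listens on the default `localhost:27017`. `task_query` builds the filter and projection used in the shell example, and `fetch_summary` (a hypothetical helper) runs it against the `cuckoo` database's `analysis` collection. The driver is imported lazily so the query builder can be used without it.

```python
def task_query(task_id, fields=("behavior.summary",)):
    """Filter and projection equivalent to the mongo shell example."""
    return {"info.id": task_id}, {field: 1 for field in fields}


def fetch_summary(task_id, uri="mongodb://localhost:27017", db_name="cuckoo"):
    """Hypothetical helper: fetch one task's projected fields from `analysis`."""
    from pymongo import MongoClient  # lazy import: only needed when querying

    client = MongoClient(uri)
    try:
        query, projection = task_query(task_id)
        return client[db_name]["analysis"].find_one(query, projection)
    finally:
        client.close()
```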

2 changes: 1 addition & 1 deletion docs/book/src/installation/guest/linux.rst
@@ -2,7 +2,7 @@
Installing the Linux guest
==========================

Linux guests don't have official CAPE support!
First, prepare the networking for your machinery platform on the host side.

.. This has not been tested recently:
2 changes: 1 addition & 1 deletion docs/book/src/usage/dist.rst
@@ -164,7 +164,7 @@ or::
Submit a new analysis task
The method of submission is always the same: by REST API or via web GUI, both only pointing to the "master node".

Task reports should be requested through the master node's integrated /api/.

Proposed setup
==============
2 changes: 1 addition & 1 deletion docs/book/src/usage/monitor.rst
@@ -8,7 +8,7 @@ What make CAPE's debugger unique among Windows debuggers is the fact that it has

The debugger is not interactive: its actions are predetermined upon submission, and the results can be found in the debugger log, which is presented in a dedicated tab in the UI.

The following is a quick guide on getting started with the debugger.

Breakpoints: bp0, bp1, bp2, bp3
===============================