Veritensor is an end-to-end security platform for the entire AI Life Cycle. It replaces traditional black-box scanning with deep semantic analysis, data supply chain protection, and cryptographic trust verification.
Unlike standard security tools, Veritensor provides a unified defense layer for every asset in your AI stack:
- Models: Deep AST and Bytecode analysis of Pickle, PyTorch, Keras, Safetensors, and GGUF to block RCE, backdoors, and weight-tampering.
- Datasets: High-speed streaming protection for Parquet, CSV, and JSONL to detect Data Poisoning, Malicious URLs, and PII.
- Notebooks: Hardening of Jupyter (.ipynb) files by scanning code execution, markdown phishing, and identifying secrets leaked in cell outputs.
- RAG Knowledge Base: Zero-trust extraction for PDF, DOCX, and PPTX to neutralize prompt injections before they reach your Vector Database.
- Supply Chain Trust: Cryptographic signing of models and containers via Sigstore, ensuring that only verified assets reach your production environment.
- Deep Static Analysis: Decompiles Pickle bytecode and Keras Lambda layers to find obfuscated attacks (e.g.,
STACK_GLOBALexploits). Now supports deep scanning of Zip archives (PyTorch) and Python Wheels. - Identity Verification: Automatically verifies model hashes against the official Hugging Face registry to detect Man-in-the-Middle attacks.
- Dataset Poisoning Guard: Scans massive datasets (100GB+) using Streaming Analysis. Detects "Ignore previous instructions" patterns and malicious URLs in Parquet, CSV, TSV, and JSONL.
- Notebook Inspector: Scans Jupyter
.ipynbfiles for execution threats, malicious markdown (XSS/Phishing), and leaked secrets in output cells. - RAG Document Security: Protects your knowledge base by scanning PDF, DOCX, and PPTX for prompt injections and PII before vectorization.
- License Firewall: Blocks models with restrictive licenses (e.g., Non-Commercial, AGPL). Veritensor performs a hybrid check: it inspects embedded file metadata first, and automatically falls back to the Hugging Face API if metadata is missing (requires
--repo). - Supply Chain Security: Integrates with Sigstore Cosign to sign Docker containers. Includes timestamps to prevent replay attacks.
- CI/CD Native: Supports direct scanning from Amazon S3 and integrates with GitHub Actions, GitLab, and Pre-commit pipelines.
Veritensor is modular. Install only what you need to keep your environment lightweight:
| Option | Command | Use Case |
|---|---|---|
| Core | pip install veritensor |
Base model scanning (Pickle, Keras, Safetensors) |
| Data | pip install veritensor[data] |
Datasets (Parquet, TSV, Pandas support) |
| RAG | pip install veritensor[rag] |
Documents (PDF, DOCX, PPTX) |
| PII | pip install veritensor[pii] |
PII detection |
| AWS | pip install veritensor[aws] |
Direct scanning from S3 buckets |
| All | pip install veritensor[all] |
Full suite for enterprise security |
docker pull arseniibrazhnyk/veritensor:latestCheck a file or directory for malware:
veritensor scan ./models/bert-base.ptExample Output:
╭────────────────────────────────╮
│ 🛡️ Veritensor Security Scanner │
╰────────────────────────────────╯
Scan Results
┏━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ File ┃ Status ┃ Threats / Details ┃ SHA256 (Short) ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ model.pt │ FAIL │ CRITICAL: os.system (RCE Detected) │ a1b2c3d4... │
└──────────────┴────────┴──────────────────────────────────────┴────────────────┘
❌ BLOCKING DEPLOYMENT
Ensure the file on your disk matches the official version from the registry (detects tampering):
veritensor scan ./pytorch_model.bin --repo meta-llama/Llama-2-7bVeritensor automatically reads metadata from safetensors and GGUF files. If a model has a Non-Commercial license (e.g., cc-by-nc-4.0), it will raise a HIGH severity alert.
To override this (Break-glass mode), use:
veritensor scan ./model.safetensors --forceVeritensor uses streaming to handle huge files. It samples 10k rows by default for speed.
veritensor scan ./data/train.parquet --full-scanCheck code cells, markdown, and saved outputs for threats:
veritensor scan ./research/experiment.ipynbScan remote assets without manual downloading:
veritensor scan s3://my-ml-bucket/models/llama-3.pklVeritensor supports industry-standard formats for integration with security dashboards and audit tools.
Generate a report compatible with GitHub Code Scanning:
veritensor scan ./models --sarif > veritensor-report.sarifGenerate a CycloneDX v1.5 SBOM to inventory your AI assets:
veritensor scan ./models --sbom > sbom.jsonFor custom parsers and SOAR automation:
veritensor scan ./models --jsonVeritensor integrates with Sigstore Cosign to cryptographically sign your Docker images only if they pass the security scan.
Generate a key pair for signing:
veritensor keygen
# Output: veritensor.key (Private) and veritensor.pub (Public)Pass the --image flag and the path to your private key (via env var).
# Set path to your private key
export VERITENSOR_PRIVATE_KEY_PATH=veritensor.key
# If scan passes -> Sign the image
veritensor scan ./models/my_model.pkl --image my-org/my-app:v1.0.0Before deploying, verify the signature to ensure the model was scanned:
cosign verify --key veritensor.pub my-org/my-app:v1.0.0Add this to your .github/workflows/security.yml to block malicious models in Pull Requests:
name: AI Security Scan
on: [pull_request]
jobs:
veritensor-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4 # Recommended to use latest checkout
- name: Veritensor AI Security Platform
uses: ArseniiBrazhnyk/Veritensor@v1.4.1
with:
path: '.' # Scans everything: Models, Notebooks, Datasets, and Lock-files
force: 'false'Prevent committing malicious models to your repository. Add this to .pre-commit-config.yaml:
repos:
- repo: https://github.com/ArseniiBrazhnyk/Veritensor
rev: v1.4.1
hooks:
- id: veritensor-scan| Format | Extension | Analysis Method |
|---|---|---|
| Models | .pt, .pth, .bin, .pkl, .joblib, .h5, .keras, .safetensors, .gguf, .whl |
AST Analysis, Pickle VM Emulation, Metadata Validation |
| Datasets | .parquet, .csv, .tsv, .jsonl, .ndjson, .ldjson |
Streaming Regex Scan (URLs, Injections, PII) |
| Notebooks | .ipynb |
JSON Structure Analysis + Code AST + Markdown Phishing |
| RAG Docs | .pdf, .docx, .pptx, .txt, .md |
Document Object Model (DOM) Text Extraction |
You can customize security policies by creating a veritensor.yaml file in your project root.
Pro Tip: You can use regex: prefix for flexible matching.
# veritensor.yaml
# 1. Security Threshold
# Fail the build if threats of this severity (or higher) are found.
# Options: CRITICAL, HIGH, MEDIUM, LOW.
fail_on_severity: CRITICAL
# 2. Dataset Scanning
# Sampling limit for quick scans (default: 10000)
dataset_sampling_limit: 10000
# 3. License Firewall Policy
# If true, blocks models that have no license metadata.
fail_on_missing_license: false
# List of license keywords to block (case-insensitive).
custom_restricted_licenses:
- "cc-by-nc" # Non-Commercial
- "agpl" # Viral licenses
- "research-only"
# 4. Static Analysis Exceptions (Pickle)
# Allow specific Python modules that are usually blocked by the strict scanner.
allowed_modules:
- "my_company.internal_layer"
- "sklearn.tree"
# 5. Model Whitelist (License Bypass)
# List of Repo IDs that are trusted. Veritensor will SKIP license checks for these.
# Supports Regex!
allowed_models:
- "meta-llama/Meta-Llama-3-70B-Instruct" # Exact match
- "regex:^google-bert/.*" # Allow all BERT models from Google
- "internal/my-private-model"To generate a default configuration file, run: veritensor init
Veritensor uses a decoupled signature database (signatures.yaml) to detect malicious patterns. This ensures that detection logic is separated from the core engine.
- Automatic Updates: To get the latest threat definitions, simply upgrade the package:
pip install --upgrade veritensor
- Transparent Rules: You can inspect the default signatures in
src/veritensor/engines/static/signatures.yaml. - Custom Policies: If the default rules are too strict for your use case (false positives), use
veritensor.yamlto whitelist specific modules or models.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.