diff --git a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/README.md b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/README.md new file mode 100644 index 00000000..a90e0549 --- /dev/null +++ b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/README.md @@ -0,0 +1,141 @@ +# Argo Workflows + +These manifests install a minimal Argo Workflows control plane into the shared `credreg-staging` namespace. The controller and server components rely on a shared PostgreSQL database (for example, the RDS modules under `terraform/environments/eks`) for workflow persistence. + +## Components +- `externalsecret.yaml` – syncs the AWS Secrets Manager entry `credreg-argo-workflows-staging` into a Kubernetes Secret named `argo-postgres`. +- `configmap.yaml` – controller configuration that enables Postgres-based persistence; set the host/database here, while credentials come from the synced secret. +- `rbac.yaml` – service accounts plus the RBAC needed by the workflow controller and Argo server. +- `workflow-controller-deployment.yaml` – runs `workflow-controller` with the standard `argoexec` image. +- `argo-server.yaml` – exposes the Argo UI/API inside the cluster on port `2746`. +- `argo-basic-auth-externalsecret.yaml` – syncs the AWS Secrets Manager entry `credreg-argo-basic-auth` (or similar) into the `argo-basic-auth` Secret, whose `auth` key must hold an htpasswd-formatted `user:hashed-password` line for ingress basic auth. +- `argo-server-ingress.yaml` – optional HTTPS ingress + certificate (via cert-manager + Let's Encrypt) and basic auth for external access to the Argo UI. + +## Before applying +1. **Provision or reference a PostgreSQL instance.** Ensure the desired environment has a reachable database endpoint. +2. **Create the Secrets Manager entry.** Create `credreg-argo-workflows-staging` (or adjust the `remoteRef.key` value) with JSON keys `host`, `port`, `database`, `username`, `password`, `sslmode`, and `argo_token`. 
The External Secrets Operator syncs the entry into the cluster, and the controller and server read its values via environment variables. +3. **Update `configmap.yaml`.** Set `persistence.postgresql.host` (and database/table names if they differ) for the target environment. Even though credentials are secret-backed, Argo still requires the host in this config. +4. **Install Argo CRDs.** Apply the upstream CRDs from https://github.com/argoproj/argo-workflows/releases (required only once per cluster) before rolling out these manifests. +5. **Configure DNS if using the ingress.** Update `argo-server-ingress.yaml` with the desired hostname(s) and point the DNS record at the ingress controller's load balancer. + +## Apply order +```bash +kubectl apply -f terraform/environments/eks/k8s-manifests-staging/argo-workflow/externalsecret.yaml +kubectl apply -f terraform/environments/eks/k8s-manifests-staging/argo-workflow/rbac.yaml +kubectl apply -f terraform/environments/eks/k8s-manifests-staging/argo-workflow/configmap.yaml +kubectl apply -f terraform/environments/eks/k8s-manifests-staging/argo-workflow/workflow-controller-deployment.yaml +kubectl apply -f terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-server.yaml +# Optional ingress / certificate +kubectl apply -f terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-basic-auth-externalsecret.yaml +kubectl apply -f terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-server-ingress.yaml +``` + +Once the `argo-postgres` secret is synced and the controller connects to Postgres successfully, `kubectl get wf -n credreg-staging` should show persisted workflows even after pod restarts. + +## Workflow Templates + +### index-s3-to-es + +Indexes all JSON-LD graphs from S3 directly to Elasticsearch. S3 is treated as the source of truth. + +**Architecture:** +``` +Argo Workflow (curl container) + │ + ├──1. POST to Keycloak /token (client credentials grant) + │ → Obtain fresh JWT + │ + └──2. 
POST /workflows/index-all-s3-to-es + │ + ▼ + Registry API + │ + ├──▶ List S3 bucket objects + │ + └──▶ For each .json file: + └──▶ Index to Elasticsearch +``` + +**Prerequisites - Keycloak Service Account:** + +1. Create a Keycloak client in the `CE-Test` realm: + - **Client ID**: e.g., `argo-workflows` + - **Client authentication**: ON (confidential client) + - **Service accounts roles**: ON + - **Authentication flow**: Only "Service accounts roles" enabled + +2. Assign the admin role to the service account: + - Go to the client → Service Account Roles + - Assign `ROLE_ADMINISTRATOR` from the `RegistryAPI` client + +3. Get the client secret: + - Go to the client → Credentials + - Update the Client Secret + + +**Required configuration:** + +1. **Keycloak Credentials Secret** (`argo-keycloak-credentials`): + - `client_id` – Keycloak client ID + - `client_secret` – Keycloak client secret + +2. **Registry API environment variables** (already in app-configmap): + - `ENVELOPE_GRAPHS_BUCKET` – S3 bucket containing JSON-LD graphs + - `ELASTICSEARCH_ADDRESS` – Elasticsearch endpoint + - `AWS_REGION` – AWS region for S3 access + +**Trigger the workflow:** + +Via Argo CLI: +```bash +argo submit --from workflowtemplate/index-s3-to-es -n credreg-staging +``` + +Via Argo REST API: +```bash +kubectl port-forward -n credreg-staging svc/argo-server 2746:2746 +BEARER=$(kubectl create token argo-server -n credreg-staging) + +curl -sk https://localhost:2746/api/v1/workflows/credreg-staging \ + -H "Authorization: Bearer $BEARER" \ + -H 'Content-Type: application/json' \ + -d '{ + "workflow": { + "metadata": { "generateName": "index-s3-to-es-" }, + "spec": { "workflowTemplateRef": { "name": "index-s3-to-es" } } + } + }' +``` + +Via Argo UI: +1. Navigate to the Argo UI +2. Go to Workflow Templates +3. Select `index-s3-to-es` +4. 
Click "Submit" + +**Monitor workflow:** +```bash +# List workflows +kubectl get wf -n credreg-staging + +# Watch workflow status +argo watch <workflow-name> -n credreg-staging + +# View logs +argo logs <workflow-name> -n credreg-staging +``` + +**Workflow parameters:** + +| Parameter | Default | Description | +|-----------|---------|-------------| +| `api-base-url` | `http://main-app.credreg-staging.svc.cluster.local:9292` | Registry API base URL | +| `keycloak-url` | `https://test-ce-kc-002.credentialengine.org/realms/CE-Test/protocol/openid-connect/token` | Keycloak token endpoint | + +Override parameters when submitting: +```bash +argo submit --from workflowtemplate/index-s3-to-es \ + -p api-base-url=http://custom-api:9292 \ + -p keycloak-url=https://other-keycloak/realms/X/protocol/openid-connect/token \ + -n credreg-staging +``` diff --git a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/TRIGGER-DUMMY-WORKFLOW.md b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/TRIGGER-DUMMY-WORKFLOW.md new file mode 100644 index 00000000..c2f32a6f --- /dev/null +++ b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/TRIGGER-DUMMY-WORKFLOW.md @@ -0,0 +1,79 @@ +# Triggering Argo Workflows via port-forward + curl + +Use this guide when you need to submit a workflow from your workstation without going through the ingress (no basic auth). The flow is: + +1. **Port-forward the Argo server service** + ```bash + kubectl port-forward -n credreg-staging svc/argo-server 2746:2746 + ``` + Leave this running in a separate terminal; it exposes `https://localhost:2746`. + +2. **Mint a service-account token** + ```bash + BEARER=$(kubectl create token argo-server -n credreg-staging) + ``` + Any SA with workflow submit/list permissions works (`argo-server` or `argo-workflow-controller`). + +3. 
**Create the workflow payload** + ```bash + cat > wf.json <<'EOF' + { + "workflow": { + "apiVersion": "argoproj.io/v1alpha1", + "kind": "Workflow", + "metadata": { "generateName": "rest-test-" }, + "spec": { + "serviceAccountName": "argo-workflow-controller", + "entrypoint": "hello", + "templates": [ + { + "name": "hello", + "container": { + "image": "public.ecr.aws/docker/library/debian:stable-slim", + "command": ["bash", "-c"], + "args": [ + "apt-get update >/dev/null && DEBIAN_FRONTEND=noninteractive apt-get install -y cowsay >/dev/null && /usr/games/cowsay \"hello from REST\"" + ] + } + } + ] + } + } + } + EOF + ``` + +4. **Submit the workflow (cURL)** + ```bash + curl -sk https://localhost:2746/api/v1/workflows/credreg-staging \ + -H "Authorization: Bearer $BEARER" \ + -H 'Content-Type: application/json' \ + -d @wf.json + ``` + A successful response echoes the workflow metadata (UID, status, etc.). + +## Trigger via Postman + +1. Keep the port-forward running: `kubectl port-forward -n credreg-staging svc/argo-server 2746:2746`. +2. Generate a Bearer token: `kubectl create token argo-server -n credreg-staging` (copy the value). +3. In Postman: + - **Method:** `POST` + - **URL:** `https://localhost:2746/api/v1/workflows/credreg-staging` + - **Headers:** + - `Authorization: Bearer <token>` + - `Content-Type: application/json` + - **Body:** raw JSON from `wf.json` (same payload as above). +4. Disable SSL verification in Postman (Settings → General → “SSL certificate verification” off) or import the Argo server cert so the self-signed TLS passes. +5. Send the request; you should see the workflow metadata returned. Use the same token for subsequent requests until it expires. + +5. **Verify status** + ```bash + kubectl get wf -n credreg-staging + kubectl logs -n credreg-staging wf/<workflow-name> + ``` + +6. **Clean up** + - `kubectl delete wf <workflow-name> -n credreg-staging` (optional) + - Stop the `kubectl port-forward` process. + +> Tip: For ad-hoc tests, this approach avoids ingress auth entirely. 
When you’re ready to call the public endpoint, add the ingress basic-auth header and keep using the Bearer token in parallel. diff --git a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-basic-auth-externalsecret.yaml b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-basic-auth-externalsecret.yaml new file mode 100644 index 00000000..bd415a38 --- /dev/null +++ b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-basic-auth-externalsecret.yaml @@ -0,0 +1,18 @@ +apiVersion: external-secrets.io/v1 +kind: ExternalSecret +metadata: + name: argo-basic-auth + namespace: credreg-staging +spec: + refreshInterval: 1h + secretStoreRef: + name: aws-secret-manager + kind: ClusterSecretStore + target: + name: argo-basic-auth + creationPolicy: Owner + data: + - secretKey: auth + remoteRef: + key: credreg-argo-basic-auth + property: auth diff --git a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-keycloak-credentials-secret.yaml b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-keycloak-credentials-secret.yaml new file mode 100644 index 00000000..3933afb5 --- /dev/null +++ b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-keycloak-credentials-secret.yaml @@ -0,0 +1,25 @@ +# Keycloak service account credentials for Argo workflows +# Used to obtain JWT tokens via Client Credentials Grant +# +# Prerequisites: +# 1. Create a Keycloak client with: +# - Client authentication: ON (confidential) +# - Service accounts roles: ON +# - Assign ROLE_ADMINISTRATOR to the service account +# +# 2. Update the values below with your client credentials +# 3. 
Apply: kubectl apply -f argo-keycloak-credentials-secret.yaml +# +# Alternatively, use External Secrets Operator to sync from AWS Secrets Manager +apiVersion: v1 +kind: Secret +metadata: + name: argo-keycloak-credentials + namespace: credreg-staging + labels: + app: credential-registry + component: argo-workflow +type: Opaque +stringData: + client_id: "[KEYCLOAK_CLIENT_ID]" + client_secret: "[KEYCLOAK_CLIENT_SECRET]" diff --git a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-server-ingress.yaml b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-server-ingress.yaml new file mode 100644 index 00000000..895c8e6f --- /dev/null +++ b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-server-ingress.yaml @@ -0,0 +1,46 @@ +apiVersion: cert-manager.io/v1 +kind: Certificate +metadata: + name: argo-server-cert + namespace: credreg-staging +spec: + secretName: argo-server-tls + issuerRef: + kind: ClusterIssuer + name: letsencrypt-prod + dnsNames: + - argo-staging.credentialengineregistry.org +--- +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: argo-server + namespace: credreg-staging + annotations: + cert-manager.io/cluster-issuer: letsencrypt-prod + nginx.ingress.kubernetes.io/backend-protocol: "HTTPS" + nginx.ingress.kubernetes.io/ssl-redirect: "true" + nginx.ingress.kubernetes.io/auth-type: "basic" + nginx.ingress.kubernetes.io/auth-secret: "argo-basic-auth" + nginx.ingress.kubernetes.io/auth-realm: "Authentication Required" + nginx.ingress.kubernetes.io/proxy-body-size: "10m" + nginx.ingress.kubernetes.io/proxy-read-timeout: "300" + nginx.ingress.kubernetes.io/proxy-send-timeout: "300" + nginx.ingress.kubernetes.io/whitelist-source-range: "71.212.64.155/32,129.224.215.205/32,148.222.194.113/32" +spec: + ingressClassName: nginx + tls: + - hosts: + - argo-staging.credentialengineregistry.org + secretName: argo-server-tls + rules: + - host: argo-staging.credentialengineregistry.org + http: + paths: 
+ - path: / + pathType: Prefix + backend: + service: + name: argo-server + port: + number: 2746 diff --git a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-server.yaml b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-server.yaml new file mode 100644 index 00000000..e1ff3150 --- /dev/null +++ b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/argo-server.yaml @@ -0,0 +1,83 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + name: argo-server + namespace: credreg-staging + labels: + app.kubernetes.io/name: argo-server + app.kubernetes.io/part-of: argo-workflows +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: argo-server + template: + metadata: + labels: + app.kubernetes.io/name: argo-server + app.kubernetes.io/part-of: argo-workflows + spec: + serviceAccountName: argo-server + containers: + - name: argo-server + image: quay.io/argoproj/argocli:v3.7.7 + imagePullPolicy: IfNotPresent + args: + - server + - --auth-mode + - server + - --namespaced + - --namespace + - credreg-staging + - --configmap + - workflow-controller-configmap + envFrom: + - secretRef: + name: argo-postgres + ports: + - containerPort: 2746 + name: web + livenessProbe: + httpGet: + scheme: HTTPS + path: /healthz + port: web + httpHeaders: + - name: Host + value: localhost + initialDelaySeconds: 10 + periodSeconds: 30 + readinessProbe: + httpGet: + scheme: HTTPS + path: /healthz + port: web + httpHeaders: + - name: Host + value: localhost + initialDelaySeconds: 10 + periodSeconds: 15 + resources: + requests: + cpu: 100m + memory: 256Mi + limits: + cpu: 500m + memory: 512Mi +--- +apiVersion: v1 +kind: Service +metadata: + name: argo-server + namespace: credreg-staging + labels: + app.kubernetes.io/name: argo-server + app.kubernetes.io/part-of: argo-workflows +spec: + selector: + app.kubernetes.io/name: argo-server + type: ClusterIP + ports: + - name: web + port: 2746 + targetPort: web diff --git 
a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/bundle-ce-registry-workflow-template.yaml b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/bundle-ce-registry-workflow-template.yaml new file mode 100644 index 00000000..49dee0ef --- /dev/null +++ b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/bundle-ce-registry-workflow-template.yaml @@ -0,0 +1,217 @@ +apiVersion: argoproj.io/v1alpha1 +kind: WorkflowTemplate +metadata: + name: bundle-ce-registry-to-zip + namespace: credreg-staging + labels: + app: credential-registry +spec: + serviceAccountName: main-app-service-account + entrypoint: bundle-ce-registry-to-zip + arguments: + parameters: + - name: dest-bucket + value: "cer-envelope-downloads" + - name: slack-webhook + value: "" + templates: + - name: bundle-ce-registry-to-zip + inputs: + parameters: + - name: dest-bucket + - name: slack-webhook + metadata: + labels: + app: credential-registry + workflow: bundle-ce-registry-to-zip + container: + image: python:3.11-slim + command: + - /bin/sh + - -c + - | + set -e + echo "=== Bundle CE Registry JSON files to ZIP ===" + echo "Started at: $(date -u)" + echo "" + + echo "[$(date -u +%H:%M:%S)] Upgrading pip..." + pip install --quiet --upgrade pip --root-user-action=ignore + echo "[$(date -u +%H:%M:%S)] Installing boto3..." + pip install --quiet boto3 --root-user-action=ignore + echo "[$(date -u +%H:%M:%S)] Dependencies ready." 
+ echo "" + + python3 - << 'PYEOF' + import boto3 + import zipfile + import zlib + import os + import json + import queue + import secrets + import threading + import time + import urllib.request + from concurrent.futures import ThreadPoolExecutor, as_completed + from datetime import datetime, timezone + + SOURCE_BUCKET = "cer-envelope-graphs" + SOURCE_PREFIX = "ce_registry/" + DEST_BUCKET = os.environ["DEST_BUCKET"] + SLACK_WEBHOOK = os.environ["SLACK_WEBHOOK"] + COMMUNITY_NAME = "ce_registry" + WORKERS = 64 + ZIP_PATH = "/tmp/ce_registry.zip" + + zip_key = f"{COMMUNITY_NAME}_{int(time.time())}_{secrets.token_hex(16)}.zip" + + def notify_slack(text): + if not SLACK_WEBHOOK: + return + try: + payload = json.dumps({"text": text}).encode() + req = urllib.request.Request( + SLACK_WEBHOOK, data=payload, + headers={"Content-Type": "application/json"} + ) + urllib.request.urlopen(req, timeout=10) + except Exception as e: + print(f"Warning: Slack notification failed: {e}") + + s3 = boto3.client("s3") + print(f"Source: s3://{SOURCE_BUCKET}/{SOURCE_PREFIX}") + print(f"Destination: s3://{DEST_BUCKET}/{zip_key}") + print() + + job_start = datetime.now(timezone.utc) + zip_size_mb = None + error_msg = None + + try: + # List all *.json objects + paginator = s3.get_paginator("list_objects_v2") + json_keys = [] + for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=SOURCE_PREFIX): + for obj in page.get("Contents", []): + if obj["Key"].endswith(".json"): + json_keys.append(obj["Key"]) + + if not json_keys: + raise RuntimeError("No .json files found at source location") + + total = len(json_keys) + print(f"[{datetime.now(timezone.utc).strftime('%H:%M:%S')}] Found {total} JSON file(s). 
Downloading+compressing with {WORKERS} workers...") + start_time = datetime.now(timezone.utc) + + result_queue = queue.Queue(maxsize=WORKERS * 2) + counter = {"done": 0} + lock = threading.Lock() + + def download_and_compress(key): + """Download from S3 and compress with raw DEFLATE in the worker thread.""" + obj = s3.get_object(Bucket=SOURCE_BUCKET, Key=key) + data = obj["Body"].read() + crc = zlib.crc32(data) & 0xFFFFFFFF + compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15) + compressed = compressor.compress(data) + compressor.flush() + return os.path.basename(key), compressed, crc, len(data) + + def write_precompressed(zf, filename, compressed_data, crc32, original_size): + """Write pre-compressed raw DEFLATE data directly into the ZIP (no re-compression).""" + zinfo = zipfile.ZipInfo(filename=filename) + zinfo.file_size = original_size + zinfo.compress_size = len(compressed_data) + zinfo.CRC = crc32 + zinfo.compress_type = zipfile.ZIP_DEFLATED + zinfo.flag_bits = 0 + with zf._lock: + zinfo.header_offset = zf.fp.tell() + zf.fp.write(zinfo.FileHeader()) + zf.fp.write(compressed_data) + zf.filelist.append(zinfo) + zf.NameToInfo[filename] = zinfo + zf.start_dir = zf.fp.tell() # close() seeks here to write the central directory; mode "w" already sets _didModify + + def producer(): + with ThreadPoolExecutor(max_workers=WORKERS) as pool: + futures = {pool.submit(download_and_compress, k): k for k in sorted(json_keys)} + for future in as_completed(futures): + result_queue.put(future.result()) + result_queue.put(None) # sentinel + + producer_thread = threading.Thread(target=producer, daemon=True) + producer_thread.start() + + # Main thread writes pre-compressed entries to ZIP (pure I/O, no CPU compression) + with zipfile.ZipFile(ZIP_PATH, "w") as zf: + while True: + item = result_queue.get() + if item is None: + break + filename, compressed_data, crc32, original_size = item + write_precompressed(zf, filename, compressed_data, crc32, original_size) + with lock: + counter["done"] += 1 + done = counter["done"] + if done % 2000 == 0: + 
elapsed = (datetime.now(timezone.utc) - start_time).seconds + rate = done / elapsed if elapsed > 0 else 0 + eta = int((total - done) / rate) if rate > 0 else 0 + print(f" [{datetime.now(timezone.utc).strftime('%H:%M:%S')}] {done}/{total} files ({done*100//total}%) — {rate:.0f} files/s — ETA: {eta//60}m{eta%60:02d}s") + + producer_thread.join() + + zip_size_mb = os.path.getsize(ZIP_PATH) / 1024 / 1024 + print(f"\nZIP size: {zip_size_mb:.2f} MB") + + print(f"Uploading to s3://{DEST_BUCKET}/{zip_key} ...") + s3.upload_file(ZIP_PATH, DEST_BUCKET, zip_key, + ExtraArgs={"ContentType": "application/zip"}) + + print(f"\n=== Done! Uploaded s3://{DEST_BUCKET}/{zip_key} ===") + + except Exception as e: + error_msg = str(e) + print(f"\nERROR: {error_msg}") + raise + + finally: + duration = int((datetime.now(timezone.utc) - job_start).total_seconds()) + dur_str = f"{duration // 60}m{duration % 60:02d}s" + if error_msg: + msg = ( + f":x: *CE Registry ZIP bundle failed* (staging)\n" + f">*Duration:* {dur_str}\n" + f">*Error:* {error_msg}" + ) + else: + msg = ( + f":white_check_mark: *CE Registry ZIP bundle succeeded* (staging)\n" + f">*Files:* {total:,}\n" + f">*ZIP size:* {zip_size_mb:.2f} MB\n" + f">*Uploaded:* `s3://{DEST_BUCKET}/{zip_key}`\n" + f">*Duration:* {dur_str}" + ) + notify_slack(msg) + PYEOF + env: + - name: DEST_BUCKET + value: "{{inputs.parameters.dest-bucket}}" + - name: SLACK_WEBHOOK + value: "{{inputs.parameters.slack-webhook}}" + resources: + requests: + cpu: "1000m" + memory: "2Gi" + limits: + cpu: "2000m" + memory: "4Gi" + activeDeadlineSeconds: 10800 + retryStrategy: + limit: 2 + retryPolicy: OnFailure + backoff: + duration: "60s" + factor: 2 + maxDuration: "3h" diff --git a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/configmap.yaml b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/configmap.yaml new file mode 100644 index 00000000..4878162c --- /dev/null +++ 
b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/configmap.yaml @@ -0,0 +1,29 @@ +apiVersion: v1 +kind: ConfigMap +metadata: + name: workflow-controller-configmap + namespace: credreg-staging +data: + config: | + metricsConfig: + enabled: false + secure: true + telemetryConfig: + enabled: false + secure: true + namespace: credreg-staging + persistence: + archive: true + nodeStatusOffload: true + postgresql: + host: argo-workflows-staging.cwdkv5tua6nq.us-east-1.rds.amazonaws.com + port: 5432 + database: argo_workflows + tableName: argo_workflows + sslMode: require + userNameSecret: + name: argo-postgres + key: username + passwordSecret: + name: argo-postgres + key: password diff --git a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/externalsecret.yaml b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/externalsecret.yaml new file mode 100644 index 00000000..4c274281 --- /dev/null +++ b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/externalsecret.yaml @@ -0,0 +1,42 @@ +apiVersion: external-secrets.io/v1 +kind: ExternalSecret +metadata: + name: argo-postgres + namespace: credreg-staging +spec: + refreshInterval: 1h + secretStoreRef: + name: aws-secret-manager + kind: ClusterSecretStore + target: + name: argo-postgres + creationPolicy: Owner + data: + - secretKey: host + remoteRef: + key: credreg-argo-workflows-staging + property: host + - secretKey: port + remoteRef: + key: credreg-argo-workflows-staging + property: port + - secretKey: database + remoteRef: + key: credreg-argo-workflows-staging + property: database + - secretKey: username + remoteRef: + key: credreg-argo-workflows-staging + property: username + - secretKey: password + remoteRef: + key: credreg-argo-workflows-staging + property: password + - secretKey: sslmode + remoteRef: + key: credreg-argo-workflows-staging + property: sslmode + - secretKey: argo_token + remoteRef: + key: credreg-argo-workflows-staging + property: argo_token diff --git 
a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/rbac.yaml b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/rbac.yaml new file mode 100644 index 00000000..f6d64339 --- /dev/null +++ b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/rbac.yaml @@ -0,0 +1,111 @@ +apiVersion: v1 +kind: ServiceAccount +metadata: + name: argo-workflow-controller + namespace: credreg-staging + annotations: + eks.amazonaws.com/role-arn: arn:aws:iam::996810415034:role/ce-registry-eks-application-irsa-role + labels: + app.kubernetes.io/component: controller + app.kubernetes.io/part-of: argo-workflows +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: argo-server + namespace: credreg-staging + labels: + app.kubernetes.io/component: server + app.kubernetes.io/part-of: argo-workflows +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRole +metadata: + name: argo-workflow-controller + labels: + app.kubernetes.io/part-of: argo-workflows +rules: + - apiGroups: ["argoproj.io"] + resources: ["workflowtasksets", "workflowtasksets/status", "workflowartifactgctasks", "workflows", "workflows/finalizers", "workflows/status", "workflowtemplates", "cronworkflows", "clusterworkflowtemplates", "clusterworkflowtemplates/finalizers", "workflowtaskresults"] + verbs: ["*"] + - apiGroups: [""] + resources: ["configmaps", "persistentvolumeclaims", "pods", "pods/log", "pods/exec", "secrets", "serviceaccounts", "services", "events"] + verbs: ["*"] + - apiGroups: ["apps"] + resources: ["deployments", "replicasets", "statefulsets"] + verbs: ["get", "list", "watch"] + - apiGroups: ["coordination.k8s.io"] + resources: ["leases"] + verbs: ["get", "create", "update", "patch"] + - apiGroups: ["batch"] + resources: ["jobs"] + verbs: ["create", "delete", "get", "list", "watch"] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: argo-workflow-controller + labels: + app.kubernetes.io/part-of: argo-workflows +roleRef: + 
apiGroup: rbac.authorization.k8s.io + kind: ClusterRole + name: argo-workflow-controller +subjects: + - kind: ServiceAccount + name: argo-workflow-controller + namespace: credreg-staging +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: argo-server + namespace: credreg-staging +rules: + - apiGroups: ["argoproj.io"] + resources: ["workflows", "workflowtemplates", "cronworkflows"] + verbs: ["*"] + - apiGroups: [""] + resources: ["configmaps", "secrets", "pods", "pods/log", "services"] + verbs: ["get", "list", "watch"] + - apiGroups: [""] + resources: ["events"] + verbs: ["create", "patch", "update"] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: argo-server + namespace: credreg-staging +subjects: + - kind: ServiceAccount + name: argo-server + namespace: credreg-staging +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: Role + name: argo-server +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: Role +metadata: + name: argo-workflow-executor + namespace: credreg-staging +rules: + - apiGroups: ["argoproj.io"] + resources: ["workflowtaskresults"] + verbs: ["create", "patch"] +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: RoleBinding +metadata: + name: argo-workflow-executor-main-app + namespace: credreg-staging +subjects: + - kind: ServiceAccount + name: main-app-service-account + namespace: credreg-staging +roleRef: + apiGroup: rbac.authorization.k8s.io + kind: Role + name: argo-workflow-executor diff --git a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/validate-graph-resources-workflow-template.yaml b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/validate-graph-resources-workflow-template.yaml new file mode 100644 index 00000000..c1286d89 --- /dev/null +++ b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/validate-graph-resources-workflow-template.yaml @@ -0,0 +1,176 @@ +apiVersion: argoproj.io/v1alpha1 +kind: WorkflowTemplate +metadata: + 
name: validate-graph-resources + namespace: credreg-staging + labels: + app: credential-registry +spec: + serviceAccountName: main-app-service-account + entrypoint: validate-graph-resources + arguments: + parameters: + - name: graph-s3-path + - name: dest-bucket + value: "cer-resources-prod" + - name: slack-webhook + value: "" + templates: + - name: validate-graph-resources + inputs: + parameters: + - name: graph-s3-path + - name: dest-bucket + - name: slack-webhook + metadata: + labels: + app: credential-registry + workflow: validate-graph-resources + container: + image: python:3.11-slim + command: + - /bin/sh + - -c + - | + set -e + echo "=== Validate Graph Resources ===" + echo "Started at: $(date -u)" + echo "" + + echo "[$(date -u +%H:%M:%S)] Upgrading pip..." + pip install --quiet --upgrade pip --root-user-action=ignore + echo "[$(date -u +%H:%M:%S)] Installing boto3..." + pip install --quiet boto3 --root-user-action=ignore + echo "[$(date -u +%H:%M:%S)] Dependencies ready." + echo "" + + python3 - << 'PYEOF' + import boto3 + import json + import os + import urllib.request + from concurrent.futures import ThreadPoolExecutor, as_completed + from datetime import datetime, timezone + from urllib.parse import urlparse + + GRAPH_S3_PATH = os.environ["GRAPH_S3_PATH"] + DEST_BUCKET = os.environ["DEST_BUCKET"] + SLACK_WEBHOOK = os.environ["SLACK_WEBHOOK"] + WORKERS = 32 + + def notify_slack(text): + if not SLACK_WEBHOOK: + return + try: + payload = json.dumps({"text": text}).encode() + req = urllib.request.Request( + SLACK_WEBHOOK, data=payload, + headers={"Content-Type": "application/json"} + ) + urllib.request.urlopen(req, timeout=10) + except Exception as e: + print(f"Warning: Slack notification failed: {e}") + + def parse_s3_path(s3_path): + parsed = urlparse(s3_path) + return parsed.netloc, parsed.path.lstrip("/") + + s3 = boto3.client("s3") + source_bucket, source_key = parse_s3_path(GRAPH_S3_PATH) + + print(f"Graph: {GRAPH_S3_PATH}") + print(f"Destination: 
s3://{DEST_BUCKET}/{{ctid}}.json") + print() + + job_start = datetime.now(timezone.utc) + uploaded_count = 0 + error_msg = None + + try: + print(f"[{datetime.now(timezone.utc).strftime('%H:%M:%S')}] Downloading graph...") + obj = s3.get_object(Bucket=source_bucket, Key=source_key) + graph_json = obj["Body"].read().decode("utf-8") + + data = json.loads(graph_json) + if "@graph" not in data or not isinstance(data["@graph"], list): + raise ValueError("Graph JSON does not contain a valid '@graph' array.") + + resources = [] + for element in data["@graph"]: + ctid = element.get("ceterms:ctid") + if not ctid: + # Skip blank nodes and elements without a ctid + continue + resources.append((ctid, json.dumps(element))) + + total = len(resources) + print(f"[{datetime.now(timezone.utc).strftime('%H:%M:%S')}] Validated {total} resource(s). Uploading with {WORKERS} workers...") + start_time = datetime.now(timezone.utc) + + def upload(args): + ctid, resource_json = args + s3.put_object( + Bucket=DEST_BUCKET, + Key=f"{ctid}.json", + Body=resource_json.encode("utf-8"), + ContentType="application/json" + ) + return ctid + + with ThreadPoolExecutor(max_workers=WORKERS) as pool: + futures = {pool.submit(upload, r): r[0] for r in resources} + for i, future in enumerate(as_completed(futures), 1): + ctid = future.result() + print(f" [{i}/{total}] Uploaded {ctid}.json") + + uploaded_count = total + duration = int((datetime.now(timezone.utc) - start_time).total_seconds()) + print(f"\n=== Done! 
{total} resource(s) uploaded to s3://{DEST_BUCKET}/ in {duration}s ===")
+
+            except Exception as e:
+                error_msg = str(e)
+                print(f"\nERROR: {error_msg}")
+                raise
+
+            finally:
+                duration = int((datetime.now(timezone.utc) - job_start).total_seconds())
+                dur_str = f"{duration // 60}m{duration % 60:02d}s"
+                if error_msg:
+                    msg = (
+                        f":x: *Validate Graph Resources failed* (staging)\n"
+                        f">*Graph:* `{GRAPH_S3_PATH}`\n"
+                        f">*Duration:* {dur_str}\n"
+                        f">*Error:* {error_msg}"
+                    )
+                else:
+                    msg = (
+                        f":white_check_mark: *Validate Graph Resources succeeded* (staging)\n"
+                        f">*Graph:* `{GRAPH_S3_PATH}`\n"
+                        f">*Resources uploaded:* {uploaded_count:,}\n"
+                        f">*Destination:* `s3://{DEST_BUCKET}/`\n"
+                        f">*Duration:* {dur_str}"
+                    )
+                notify_slack(msg)
+            PYEOF
+        env:
+          - name: GRAPH_S3_PATH
+            value: "{{inputs.parameters.graph-s3-path}}"
+          - name: DEST_BUCKET
+            value: "{{inputs.parameters.dest-bucket}}"
+          - name: SLACK_WEBHOOK
+            value: "{{inputs.parameters.slack-webhook}}"
+        resources:
+          requests:
+            cpu: "200m"
+            memory: "256Mi"
+          limits:
+            cpu: "500m"
+            memory: "512Mi"
+      activeDeadlineSeconds: 3600
+      retryStrategy:
+        limit: 2
+        retryPolicy: OnFailure
+        backoff:
+          duration: "30s"
+          factor: 2
+          maxDuration: "1h"
diff --git a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/wf.json b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/wf.json
new file mode 100644
index 00000000..11fc609d
--- /dev/null
+++ b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/wf.json
@@ -0,0 +1,28 @@
+{
+  "workflow": {
+    "apiVersion": "argoproj.io/v1alpha1",
+    "kind": "Workflow",
+    "metadata": {
+      "generateName": "rest-test-"
+    },
+    "spec": {
+      "serviceAccountName": "argo-workflow-controller",
+      "entrypoint": "hello",
+      "templates": [
+        {
+          "name": "hello",
+          "container": {
+            "image": "public.ecr.aws/docker/library/debian:stable-slim",
+            "command": [
+              "bash",
+              "-c"
+            ],
+            "args": [
+              "apt-get update >/dev/null && DEBIAN_FRONTEND=noninteractive apt-get install -y cowsay >/dev/null && /usr/games/cowsay \"hello from REST\""
+            ]
+          }
+        }
+      ]
+    }
+  }
+}
diff --git a/terraform/environments/eks/k8s-manifests-staging/argo-workflow/workflow-controller-deployment.yaml b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/workflow-controller-deployment.yaml
new file mode 100644
index 00000000..127fb2e3
--- /dev/null
+++ b/terraform/environments/eks/k8s-manifests-staging/argo-workflow/workflow-controller-deployment.yaml
@@ -0,0 +1,90 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: workflow-controller
+  namespace: credreg-staging
+  labels:
+    app.kubernetes.io/name: workflow-controller
+    app.kubernetes.io/part-of: argo-workflows
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: workflow-controller
+  template:
+    metadata:
+      labels:
+        app.kubernetes.io/name: workflow-controller
+        app.kubernetes.io/part-of: argo-workflows
+    spec:
+      serviceAccountName: argo-workflow-controller
+      containers:
+        - name: workflow-controller
+          image: quay.io/argoproj/workflow-controller:v3.7.7
+          imagePullPolicy: IfNotPresent
+          args:
+            - --configmap
+            - workflow-controller-configmap
+            - --executor-image
+            - quay.io/argoproj/argoexec:v3.7.7
+          env:
+            - name: LEADER_ELECTION_IDENTITY
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.name
+            - name: ARGO_POSTGRES_HOST
+              valueFrom:
+                secretKeyRef:
+                  name: argo-postgres
+                  key: host
+            - name: ARGO_POSTGRES_PORT
+              valueFrom:
+                secretKeyRef:
+                  name: argo-postgres
+                  key: port
+            - name: ARGO_POSTGRES_DB
+              valueFrom:
+                secretKeyRef:
+                  name: argo-postgres
+                  key: database
+            - name: ARGO_POSTGRES_USERNAME
+              valueFrom:
+                secretKeyRef:
+                  name: argo-postgres
+                  key: username
+            - name: ARGO_POSTGRES_PASSWORD
+              valueFrom:
+                secretKeyRef:
+                  name: argo-postgres
+                  key: password
+            - name: ARGO_POSTGRES_SSLMODE
+              valueFrom:
+                secretKeyRef:
+                  name: argo-postgres
+                  key: sslmode
+          ports:
+            - containerPort: 9090
+              name: metrics
+          livenessProbe:
+            httpGet:
+              port: 6060
+              path: /healthz
+            failureThreshold: 3
+            initialDelaySeconds: 90
+            periodSeconds: 60
+            timeoutSeconds: 30
+          readinessProbe:
+            httpGet:
+              port: 6060
+              path: /healthz
+            failureThreshold: 3
+            initialDelaySeconds: 90
+            periodSeconds: 60
+            timeoutSeconds: 30
+          resources:
+            requests:
+              cpu: 100m
+              memory: 256Mi
+            limits:
+              cpu: 500m
+              memory: 512Mi
diff --git a/terraform/modules/eks/irsa-iam-policy-and-role.tf b/terraform/modules/eks/irsa-iam-policy-and-role.tf
index df33dcba..4a33ff57 100644
--- a/terraform/modules/eks/irsa-iam-policy-and-role.tf
+++ b/terraform/modules/eks/irsa-iam-policy-and-role.tf
@@ -119,11 +119,13 @@ resource "aws_iam_policy" "application_policy" {
           "s3:DeleteObject"
         ],
         "Resource" : [
+          "arn:aws:s3:::cer-envelope-graphs/*",
           "arn:aws:s3:::cer-envelope-graphs-staging/*",
           "arn:aws:s3:::cer-envelope-graphs-sandbox/*",
           "arn:aws:s3:::cer-envelope-graphs-sandb/*",
           "arn:aws:s3:::cer-envelope-graphs-prod/*",
-          "arn:aws:s3:::cer-envelope-downloads/*"
+          "arn:aws:s3:::cer-envelope-downloads/*",
+          "arn:aws:s3:::ocn-exports/*"
         ]
       },
       {
@@ -135,11 +137,13 @@ resource "aws_iam_policy" "application_policy" {
           "s3:GetBucketLocation"
         ],
         "Resource" : [
+          "arn:aws:s3:::cer-envelope-graphs",
           "arn:aws:s3:::cer-envelope-graphs-staging",
           "arn:aws:s3:::cer-envelope-graphs-sandbox",
           "arn:aws:s3:::cer-envelope-graphs-sandb",
           "arn:aws:s3:::cer-envelope-graphs-prod",
-          "arn:aws:s3:::cer-envelope-downloads"
+          "arn:aws:s3:::cer-envelope-downloads",
+          "arn:aws:s3:::ocn-exports"
         ]
       }
     ]