15 changes: 1 addition & 14 deletions CLAUDE.md
@@ -115,7 +115,7 @@ make shellcheck
- `roles/dev-scripts/install-dev/files/pull-secret.json`: OpenShift pull secret

#### Kcli Method
- `vars/kcli-install.yml`: Variable override file for persistent configuration
- `vars/kcli.yml`: Variable override file for persistent configuration
- `roles/kcli/kcli-install/files/pull-secret.json`: OpenShift pull secret
- SSH key automatically read from `~/.ssh/id_ed25519.pub` on ansible controller

@@ -150,18 +150,13 @@ The repository includes comprehensive README files in `deploy/openshift-clusters

## Development Guidelines and Standards

### Critical Repository Structure Rules

**IMPORTANT**: The `kcli/` directory is included for reference only and should NEVER be modified. It contains the upstream kcli tool that we integrate with, but all development work happens in the `deploy/` and `docs/` directories.

### File Organization

**Development Areas:**
- **`deploy/`**: All deployment automation and infrastructure code
- `deploy/aws-hypervisor/`: AWS hypervisor setup scripts
- `deploy/openshift-clusters/`: OpenShift cluster deployment with Ansible
- **`docs/`**: Project documentation for different topologies
- **`kcli/`**: **READ-ONLY** - Reference copy of upstream kcli tool (DO NOT MODIFY)

### Coding Standards

@@ -213,7 +208,6 @@ The repository includes comprehensive README files in `deploy/openshift-clusters
### Development Workflow Rules

#### When Making Changes
- **NEVER modify anything in the `kcli/` directory** - it's reference material only
- Focus changes on `deploy/` scripts and `docs/` documentation
- Consider impact on multiple virtualization providers when updating deployment scripts
- Test deployment scenarios end-to-end
@@ -222,13 +216,6 @@ The repository includes comprehensive README files in `deploy/openshift-clusters
- Check for credential exposure in logs or output
- Validate Ansible playbooks and shell scripts before committing

#### Working with kcli Integration
- Use `kcli/` directory as reference for understanding kcli capabilities
- Study `kcli/kvirt/providers/` to understand provider implementations
- Reference `kcli/kvirt/cluster/openshift/` for OpenShift deployment patterns
- Check `kcli/samples/` for configuration examples
- **Remember**: Read from kcli for understanding, implement in `deploy/` for our use

### Dependencies and Configuration

#### Dependencies
2 changes: 1 addition & 1 deletion deploy/openshift-clusters/README-external-host.md
@@ -57,7 +57,7 @@ See [hands-off deployment](../aws-hypervisor/README.md#automated-rhsm-registrati

#### Option B: Local Variable File
```bash
cp vars/init-host.yml.sample vars/init-host.yml.local
cp vars/init-host.yml vars/init-host.yml.local
# Edit vars/init-host.yml.local with your credentials
```

10 changes: 5 additions & 5 deletions deploy/openshift-clusters/README-kcli.md
@@ -91,7 +91,7 @@ You can configure the deployment using any combination of these methods (in prec
1. **Command line variables** (highest precedence)
2. **Playbook vars section**
3. **vars/kcli.yml** (user configuration file)
4. **Role defaults** (lowest precedence) (`roles/kcli/kcli-install/defaults/main.yml`)
4. **Role defaults** (lowest precedence) (`vars/kcli.yml.template`)

For simple overrides, the command line is recommended. For setting your preferred permanent config, copy [kcli.yml.template](vars/kcli.yml.template) to [kcli.yml](vars/kcli.yml) and update the values to your preference. This file is not tracked by Git and will persist between TNT updates.
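
For example, a minimal sketch of both approaches (assuming you run from `deploy/openshift-clusters/`, where `kcli-install.yml` and `vars/` live):

```bash
# Sketch only: persist preferred settings in the untracked vars file,
# then edit the copied file to taste.
cp vars/kcli.yml.template vars/kcli.yml
"${EDITOR:-vi}" vars/kcli.yml

# Or override a single value for one run (highest precedence);
# vm_memory is one of the parameters listed in the table below.
ansible-playbook kcli-install.yml -e vm_memory=65536
```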

@@ -126,8 +126,8 @@ ansible-playbook kcli-install.yml \
| `vm_memory` | `32768` | Memory per node (MB) |
| `vm_numcpus` | `16` | CPU cores per node |
| `vm_disk_size` | `120` | Disk size per node (GB) |
| `ocp_version` | `"stable"` | OpenShift version channel |
| `ocp_tag` | `"4.19"` | Specific version tag |
| `ocp_version` | `"candidate"` | OpenShift version channel |
| `ocp_tag` | `"4.20"` | Specific version tag |
| `network_name` | `"default"` | kcli network name |
| `bmc_user` | `"admin"` | BMC username (fencing) |
| `bmc_password` | `"admin123"` | BMC password (fencing) |
@@ -141,7 +141,7 @@ topology: "fencing"
bmc_user: "admin"
bmc_password: "admin123"
bmc_driver: "redfish"
ksushy_port: 8000
ksushy_port: 9000
```

## 5. Deployment
@@ -305,7 +305,7 @@ The playbook uses reasonable defaults that work for typical kcli deployments:
| `ksushy_ip` | `192.168.122.1` | Standard libvirt network gateway |
| `bmc_user` | `admin` | From kcli-install defaults |
| `bmc_password` | `admin123` | From kcli-install defaults |
| `ksushy_port` | `8000` | From kcli-install defaults |
| `ksushy_port` | `9000` | From kcli-install defaults |

These defaults work for standard kcli deployments where VMs use the default libvirt network (`192.168.122.x/24`).

@@ -79,8 +79,12 @@
shell: |
# Get the SSH key Ansible is using (check in order of preference)
if [ -n "$SSH_AUTH_SOCK" ]; then
# If using ssh-agent, get the first key
ssh-add -L 2>/dev/null | head -n1 && exit 0
# If using ssh-agent, get the first key (filter out error messages)
KEY=$(ssh-add -L 2>/dev/null | grep -v "^The agent has no identities" | head -n1)
if [ -n "$KEY" ]; then
echo "$KEY"
exit 0
fi
fi

# Check common key locations
@@ -102,20 +106,44 @@
local_ssh_pubkey_content: "{{ detected_ssh_key.stdout | trim }}"
when: detected_ssh_key.rc == 0

- name: Validate SSH public key format
delegate_to: localhost
set_fact:
ssh_key_valid: "{{ local_ssh_pubkey_content is defined and local_ssh_pubkey_content is match('^(ssh-rsa|ssh-ed25519|ecdsa-sha2-nistp256|ecdsa-sha2-nistp384|ecdsa-sha2-nistp521|ssh-dss) ') }}"

- name: Warn if no valid SSH key found
debug:
msg: |
WARNING: No valid SSH public key detected on the local machine.
ProxyJump SSH access to cluster VMs will not work.
Ensure you have an SSH key pair in ~/.ssh/ (id_ed25519, id_rsa, etc.)
when: not (ssh_key_valid | default(false))

- name: Add local user's SSH key to cluster VMs
loop: "{{ parsed_vm_entries }}"
when:
- local_ssh_pubkey_content is defined
- local_ssh_pubkey_content | length > 0
- ssh_key_valid | default(false)
shell: |
set -e
VM_IP="{{ item.ip }}"

# ssh-keyscan can use bare IP for both IPv4 and IPv6
ssh-keyscan -H "$VM_IP" >> ~/.ssh/known_hosts 2>/dev/null || true

# SSH to bare IP (works for both IPv4 and IPv6)
ssh -o StrictHostKeyChecking=no core@"$VM_IP" "echo '{{ local_ssh_pubkey_content }}' >> ~/.ssh/authorized_keys && sort -u ~/.ssh/authorized_keys -o ~/.ssh/authorized_keys" || true
ssh -o StrictHostKeyChecking=no -o ConnectTimeout=10 core@"$VM_IP" \
"echo '{{ local_ssh_pubkey_content }}' >> ~/.ssh/authorized_keys && sort -u ~/.ssh/authorized_keys -o ~/.ssh/authorized_keys"
register: ssh_key_propagation
changed_when: false
retries: 3
delay: 5
until: ssh_key_propagation.rc == 0

- name: Display SSH key propagation results
debug:
msg: "SSH key added to {{ item.item.name }} ({{ item.item.ip }})"
loop: "{{ ssh_key_propagation.results }}"
when: ssh_key_propagation is defined

- name: Update inventory file with cluster VMs
delegate_to: localhost
@@ -24,7 +24,7 @@ The install-dev role handles the complete setup of OpenShift bare metal developm
- `dev_scripts_path`: Path to dev-scripts directory (default: "openshift-metal3/dev-scripts")
- `dev_scripts_branch`: Git branch to use (default: "master")
- `test_cluster_name`: OpenShift cluster name (default: "ostest")
- `method`: Deployment method (default: "ipi")
- `method`: Deployment method (set by calling playbook, e.g., "ipi")
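
A hedged usage sketch, assuming these role variables can be overridden as standard Ansible extra-vars when running the `setup.yml` playbook referenced under Usage (values are illustrative only):

```bash
# Sketch only: override selected install-dev role defaults for a single run.
ansible-playbook setup.yml \
  -e dev_scripts_branch=master \
  -e test_cluster_name=mytest
```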

### Computed Variables (vars/main.yml)

@@ -42,10 +42,11 @@ ansible-playbook setup.yml

## Task Structure

- `dev-scripts.yml`: Dev-scripts environment setup
- `create.yml`: OpenShift cluster creation (conditional)
- `proxy.yml`: Proxy configuration setup
- `main.yml`: Orchestrates all tasks and configures aliases
- `bounce.yml`: Cluster bounce/restart operations
- `check_vars.yml`: Variable validation
- `config.yml`: Configuration setup
- `teardown.yml`: Cluster teardown operations

## Notes

@@ -78,7 +78,7 @@ This role follows the same authentication file conventions as the dev-scripts ro
- `vm_disk_size`: Disk size per node in GB (default: 120)

### OpenShift Version
See [defaults](../kcli-install/defaults/main.yml.template) for default values
See [defaults](../../../vars/kcli.yml.template) for default values

If you're installing a specific OpenShift release image, you will need to set the proper channel in `ocp_version`:
- `ocp_version`: OpenShift version channel
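
A hedged sketch of pinning a specific release in `vars/kcli.yml` (variable names taken from the parameters table in README-kcli.md):

```bash
# Sketch: persist a specific OpenShift channel/tag in the untracked vars file.
# ocp_version selects the channel, ocp_tag the release stream.
cat >> vars/kcli.yml <<'EOF'
ocp_version: "candidate"
ocp_tag: "4.20"
EOF
```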
2 changes: 1 addition & 1 deletion deploy/openshift-clusters/roles/proxy-setup/README.md
@@ -31,7 +31,7 @@ This role enables easy access to OpenShift clusters deployed in restricted netwo
### Optional Variables

- `proxy_port`: Port for proxy service (default: 8213)
- `proxy_user`: Default user for squid configuration (default: ec2-user)
- `proxy_user`: User for squid configuration (auto-detected from system)

## Usage

63 changes: 62 additions & 1 deletion deploy/openshift-clusters/scripts/startup-cluster.sh
@@ -26,6 +26,14 @@ fi
INSTANCE_ID=$(cat "${SHARED_DIR}/aws-instance-id")
echo "Starting up OpenShift cluster VMs on instance ${INSTANCE_ID}..."

# Check cluster topology from state file
CLUSTER_STATE_FILE="${SHARED_DIR}/cluster-vm-state.json"
CLUSTER_TOPOLOGY=""
if [[ -f "${CLUSTER_STATE_FILE}" ]]; then
CLUSTER_TOPOLOGY=$(grep -o '"topology":[[:space:]]*"[^"]*"' "${CLUSTER_STATE_FILE}" | cut -d'"' -f4 2>/dev/null || echo "")
echo "Detected cluster topology: ${CLUSTER_TOPOLOGY:-unknown}"
fi

# Check current instance state
INSTANCE_STATE=$(aws --region "${REGION}" ec2 describe-instances --instance-ids "${INSTANCE_ID}" --query 'Reservations[0].Instances[0].State.Name' --output text --no-cli-pager)

@@ -152,11 +160,64 @@ ssh "$(cat "${SHARED_DIR}/ssh_user")@${HOST_PUBLIC_IP}" << 'EOF'
echo ""
echo "You can check the cluster status as usual, depending on your setup."
echo "It might take a few minutes for the cluster to be fully ready."

# Clean up the cluster VMs list
rm -f ~/cluster-vms.txt
EOF

# Start sushy-tools BMC simulator for fencing topology
if [[ "${CLUSTER_TOPOLOGY}" == "fencing" ]]; then
echo ""
echo "Fencing topology detected. Ensuring sushy-tools BMC simulator is running..."

ssh "$(cat "${SHARED_DIR}/ssh_user")@${HOST_PUBLIC_IP}" << 'EOF'
# Check if sushy-tools container exists (dev-scripts deployment)
if sudo podman container exists sushy-tools 2>/dev/null; then
CONTAINER_STATUS=$(sudo podman inspect sushy-tools --format '{{.State.Status}}' 2>/dev/null || echo "unknown")
echo "sushy-tools container status: ${CONTAINER_STATUS}"

if [[ "${CONTAINER_STATUS}" == "running" ]]; then
echo "sushy-tools BMC simulator is already running"
else
echo "Starting sushy-tools container..."
sudo podman start sushy-tools

# Wait and verify
sleep 2
CONTAINER_STATUS=$(sudo podman inspect sushy-tools --format '{{.State.Status}}' 2>/dev/null || echo "unknown")
if [[ "${CONTAINER_STATUS}" == "running" ]]; then
echo "sushy-tools container started successfully"
else
echo "Warning: Failed to start sushy-tools container"
echo "STONITH fencing may not work properly"
echo "You can try manually: sudo podman start sushy-tools"
fi
fi
# Fallback: check for ksushy user service (kcli deployment)
elif systemctl --user list-unit-files ksushy.service &>/dev/null; then
KSUSHY_STATUS=$(systemctl --user is-active ksushy.service 2>/dev/null || echo "inactive")

if [[ "${KSUSHY_STATUS}" == "active" ]]; then
echo "ksushy BMC simulator is already running"
else
echo "Starting ksushy BMC simulator..."
systemctl --user start ksushy.service

sleep 2
if systemctl --user is-active ksushy.service &>/dev/null; then
echo "ksushy BMC simulator started successfully"
else
echo "Warning: Failed to start ksushy service"
echo "STONITH fencing may not work properly"
fi
fi
else
echo "Warning: No BMC simulator found (sushy-tools container or ksushy service)"
echo "STONITH fencing may not work properly"
fi
EOF
fi

echo ""
echo "OpenShift cluster startup completed successfully!"
echo "If you need to redeploy the cluster, use: make redeploy-cluster"
1 change: 1 addition & 0 deletions helpers/README.md
@@ -103,6 +103,7 @@ The `build-and-patch-resource-agents.yml` playbook automates the entire workflow
# From the deploy/ directory
# Simplest, no customization. Uses resource-agents repo, main branch, auto sets next version
make patch-nodes
```

#### Using Ansible Directly
