142 changes: 142 additions & 0 deletions .claude/CLAUDE.md
@@ -0,0 +1,142 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

This repository contains code snippets demonstrating Senzing SDK V4 usage in Python, Java, and C#. These are educational examples for entity resolution tasks like loading, searching, deleting, and redo processing.

**Warning**: Only run snippets against a test Senzing database. They add and delete data, and some purge the entire repository.

## Build and Run Commands

### Python

Python snippets are standalone scripts. Run them directly after setting up the environment:

```bash
source <project_path>/setupEnv
export SENZING_ENGINE_CONFIGURATION_JSON='{"PIPELINE": {...}, "SQL": {...}}'
python python/loading/add_records.py
```

**Linting and formatting:**

```bash
black --line-length 120 python/
isort --profile black python/
pylint python/
flake8 python/
mypy python/
bandit python/
```

### Java

Build with Maven (requires the `SENZING_PATH` environment variable):

```bash
cd java
mvn package
```

Run individual snippets:

```bash
java -cp target/sz-sdk-snippets.jar loading.LoadRecords
```

Run via SnippetRunner (creates temp repository):

```bash
java -jar target/sz-sdk-snippets.jar all # Run all
java -jar target/sz-sdk-snippets.jar loading # Run group
java -jar target/sz-sdk-snippets.jar loading.LoadViaLoop # Run specific
```

Checkstyle and SpotBugs are available via Maven profiles:

```bash
mvn -P checkstyle validate
mvn -P spotbugs validate
```

### C-sharp

Build and run from the `csharp/snippets` directory:

```bash
cd csharp/snippets
dotnet run --project loading/LoadRecords
```

Run via SnippetRunner:

```bash
cd csharp/runner
dotnet run --project SnippetRunner all
dotnet run --project SnippetRunner loading
```

## Environment Setup

All languages require the `SENZING_ENGINE_CONFIGURATION_JSON` environment variable, which holds the engine's paths and connection details:

```json
{
  "PIPELINE": {
    "SUPPORTPATH": "/path/to/data",
    "CONFIGPATH": "/path/to/etc",
    "RESOURCEPATH": "/path/to/resources"
  },
  "SQL": {
    "CONNECTION": "postgresql://user:password@host:5432:g2"
  }
}
```
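
As a rough illustration, a script could check the variable before handing it to the engine. This is a minimal sketch, not part of the snippets: `load_engine_settings` and `REQUIRED_SECTIONS` are hypothetical names, and the check assumes only the two top-level sections shown in the example above.

```python
import json
import os

# Top-level sections taken from the example settings above (an assumption,
# not an exhaustive list of what the engine accepts).
REQUIRED_SECTIONS = ("PIPELINE", "SQL")


def load_engine_settings(env=os.environ):
    """Parse SENZING_ENGINE_CONFIGURATION_JSON, raising ValueError on problems."""
    raw = env.get("SENZING_ENGINE_CONFIGURATION_JSON")
    if not raw:
        raise ValueError("SENZING_ENGINE_CONFIGURATION_JSON is not set")
    try:
        settings = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"settings are not valid JSON: {err}") from err
    if not isinstance(settings, dict):
        raise ValueError("settings must be a JSON object")
    missing = [name for name in REQUIRED_SECTIONS if name not in settings]
    if missing:
        raise ValueError(f"settings are missing sections: {', '.join(missing)}")
    return settings
```

Failing early with a readable message tends to be easier to diagnose than an initialization error raised from inside the engine.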

**Platform-specific library paths:**

- Linux: `export LD_LIBRARY_PATH=$SENZING_PATH/er/lib:$LD_LIBRARY_PATH`
- macOS: `export DYLD_LIBRARY_PATH=$SENZING_PATH/er/lib:$SENZING_PATH/er/lib/macos:$DYLD_LIBRARY_PATH`
- Windows: `set Path=%SENZING_PATH%\er\lib;%Path%`

## Code Architecture

### Snippet Categories (same structure across all languages)

| Category | Purpose |
| ----------------- | ------------------------------------------------ |
| `initialization/` | Engine setup, factory creation, priming, purging |
| `configuration/` | Data source registration, config management |
| `loading/` | Record ingestion (loop, queue, futures patterns) |
| `deleting/` | Record removal with various concurrency patterns |
| `searching/` | Entity search operations |
| `redo/` | Redo record processing (continuous, with-info) |
| `stewardship/` | Force resolve/unresolve operations |
| `information/` | License, version, stats, repository info |

### Concurrency Patterns

Snippets demonstrate three main patterns:

- **Loop**: Simple sequential processing
- **Queue**: Producer-consumer with thread pool
- **Futures**: Async execution with concurrent.futures (Python) / CompletableFuture (Java) / Tasks (C#)
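
The three patterns above can be sketched in Python roughly as follows. This is an illustration only: `process_record` is a hypothetical stand-in for an engine call such as `add_record()`, and the function names and worker counts are assumptions rather than code from the snippets.

```python
import concurrent.futures
import queue
import threading


def process_record(line):
    """Hypothetical stand-in for an engine call such as add_record()."""
    return line.strip().upper()


def run_loop(lines):
    # Loop: one record at a time on the calling thread.
    return [process_record(line) for line in lines]


def run_queue(lines, workers=4):
    # Queue: the main thread produces work; worker threads consume it.
    work = queue.Queue()
    results = []
    lock = threading.Lock()

    def consume():
        while True:
            line = work.get()
            if line is None:  # sentinel: no more work for this worker
                break
            result = process_record(line)
            with lock:
                results.append(result)

    threads = [threading.Thread(target=consume) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for line in lines:
        work.put(line)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results


def run_futures(lines, workers=4):
    # Futures: submit every record to a pool, then collect as each one finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_record, line) for line in lines]
        return [f.result() for f in concurrent.futures.as_completed(futures)]
```

Only the loop variant preserves input order; the queue and futures variants return results in completion order.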

### Data Files

Test data lives in `resources/data/`:

- `load-500.jsonl` - Default load file (fits default 500-record license)
- `load-{5K,10K,25K,50K,100K}.json[l]` - Larger datasets (require license)
- `del-{500,1K,5K,10K}.jsonl` - Delete test data
- `search-{50,5K}.jsonl` - Search test data
- `*-with-errors.jsonl` - Files with intentional errors for testing

## Key Conventions

- Passing the `SZ_WITH_INFO` flag to `add_record()`/`delete_record()` returns the affected entity details for downstream processing
- Always randomize input data when loading with multiple threads to avoid entity contention
- Purge repository between load tests for accurate performance measurements
- Python uses `senzing` and `senzing_abstract` packages from the Senzing SDK
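
The randomization convention above could be applied with a small helper before records are handed to worker threads. This is a minimal sketch: `shuffled_records` is a hypothetical name, and it assumes the input file is small enough to hold in memory.

```python
import random


def shuffled_records(lines, seed=None):
    """Return the non-blank lines in random order; the input is left unchanged."""
    records = [line for line in lines if line.strip()]
    # A dedicated Random instance keeps the global random state untouched;
    # passing a seed makes a test run repeatable.
    random.Random(seed).shuffle(records)
    return records
```

With a JSONL file, this might be used as `shuffled_records(open(path, encoding="utf-8"))`, where `path` points at one of the data files listed earlier.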
3 changes: 0 additions & 3 deletions .claude/commands/senzing-code-review.md

This file was deleted.

3 changes: 3 additions & 0 deletions .claude/commands/senzing.md
@@ -0,0 +1,3 @@
# Senzing

- Perform the steps specified by <https://raw.githubusercontent.com/senzing-factory/claude/refs/tags/v1/commands/senzing.md>
File renamed without changes.
49 changes: 31 additions & 18 deletions .github/dependabot.yml
@@ -3,26 +3,34 @@

version: 2
updates:
- package-ecosystem: "github-actions"
directory: "/"
- package-ecosystem: github-actions
cooldown:
default-days: 21
directory: /
schedule:
interval: "daily"
- package-ecosystem: "docker"
directory: "/"
interval: daily
- package-ecosystem: docker
cooldown:
default-days: 21
directory: /
schedule:
interval: "daily"
- package-ecosystem: "pip"
directory: "/"
interval: daily
- package-ecosystem: pip
cooldown:
default-days: 21
directory: /
schedule:
interval: "daily"
- package-ecosystem: "maven"
directory: "/java/"
interval: daily
- package-ecosystem: maven
cooldown:
default-days: 21
directory: /java/
schedule:
interval: "daily"
- package-ecosystem: "nuget"
interval: daily
- package-ecosystem: nuget
cooldown:
default-days: 90
include:
include:
- "Senzing.Sdk"
directories:
- /csharp/snippets/configuration/InitDefaultConfig/
@@ -55,12 +63,17 @@ updates:
- /csharp/runner/SnippetRunner/
ignore:
- dependency-name: "Senzing.Sdk"
update-types: ["version-update:semver-major","version-update:semver-minor","version-update:semver-patch"]
update-types:
[
"version-update:semver-major",
"version-update:semver-minor",
"version-update:semver-patch",
]
groups:
all:
patterns:
patterns:
- "*"
exclude-patterns:
exclude-patterns:
- "Senzing.Sdk"
schedule:
interval: "daily"
interval: daily
2 changes: 2 additions & 0 deletions .github/linters/zizmor.yaml
@@ -3,3 +3,5 @@ rules:
config:
policies:
"*": ref-pin
use-trusted-publishing:
disable: true
2 changes: 1 addition & 1 deletion .github/workflows/add-labels-standardized.yaml
@@ -1,4 +1,4 @@
name: add labels standardized
name: Add labels standardized

on:
issues:
2 changes: 1 addition & 1 deletion .github/workflows/add-to-project-senzing-dependabot.yaml
@@ -1,4 +1,4 @@
name: add to project senzing github organization dependabot
name: Add to project senzing github organization dependabot

on:
pull_request:
2 changes: 1 addition & 1 deletion .github/workflows/add-to-project-senzing.yaml
@@ -1,4 +1,4 @@
name: add to project senzing github organization
name: Add to project senzing github organization

on:
issues:
11 changes: 7 additions & 4 deletions .github/workflows/bandit.yaml
@@ -1,11 +1,13 @@
name: bandit
name: Bandit

on:
push:
branches-ignore: [main]
pull_request:
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
@@ -16,7 +18,8 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.12"]
python-version: ["3.13"]
timeout-minutes: 10

steps:
- name: Run Bandit Scan
14 changes: 9 additions & 5 deletions .github/workflows/bearer.yaml
@@ -1,25 +1,29 @@
name: bearer
name: Bearer

on:
push:
branches-ignore: [main]
pull_request:
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
rule_check:
permissions:
contents: read
runs-on: ubuntu-latest
timeout-minutes: 10

steps:
- uses: actions/checkout@v6
- name: Checkout repository
uses: actions/checkout@v6
with:
persist-credentials: false

- name: Bearer
uses: bearer/bearer-action@v2
with:
skip-rule: "java_lang_information_leakage"
skip-rule: "java_lang_information_leakage,java_lang_path_using_user_input,python_lang_path_traversal"
13 changes: 8 additions & 5 deletions .github/workflows/black.yaml
@@ -1,23 +1,26 @@
name: black
name: Black

on:
push:
branches-ignore: [main]
pull_request:
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
black:
name: black Python ${{ matrix.python-version }}
name: Black Python ${{ matrix.python-version }}
permissions:
contents: read
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]
python-version: ["3.10", "3.11", "3.12", "3.13"]
timeout-minutes: 10

steps:
- name: Checkout repository
8 changes: 4 additions & 4 deletions .github/workflows/claude-pr-review.yaml
@@ -1,13 +1,13 @@
name: Claude PR Review

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

on:
pull_request:
types: [opened, synchronize]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
4 changes: 4 additions & 0 deletions .github/workflows/csharp-darwin-snippets.yaml
@@ -6,6 +6,10 @@ on:
- cron: "15 7 * * *"
workflow_dispatch:

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
4 changes: 4 additions & 0 deletions .github/workflows/csharp-linux-snippets.yaml
@@ -8,6 +8,10 @@ on:
schedule:
- cron: "15 7 * * *"

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs:
4 changes: 4 additions & 0 deletions .github/workflows/csharp-windows-snippets.yaml
@@ -6,6 +6,10 @@ on:
- cron: "15 7 * * *"
workflow_dispatch:

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref || github.ref_name }}
cancel-in-progress: true

permissions: {}

jobs: