@@ -0,0 +1,131 @@
# How to Organize Goldsets in an Azure Storage Container Using Virtual Folders

## Overview

This guide explains how to organize existing “goldset” data in an Azure Storage container into an archive (or “old”) folder using Azure’s virtual folder structure. It clarifies the limitation of the Azure Portal user interface (UI) and describes the programmatic approach needed to create and manage folder-like structures.

## Prerequisites

- Access to the Azure subscription and resource group containing the storage account:
  - Subscription: `5ec99459-f478-4662-b5ca-e10141b9940c`
  - Resource group: `distillery-dev-rg`
  - Storage account: `distillerydev`
  - Container: `tower-demo-data`
- Permission to read, write, and delete blobs in the `tower-demo-data` container (the move described below deletes the original blobs).
- A tool or SDK that can interact with Azure Blob Storage programmatically, for example:
  - Azure CLI or Azure PowerShell,
  - an SDK (e.g., Python, .NET, JavaScript), or
  - a storage client such as Azure Storage Explorer.

> Note: The Slack discussion implies a programmatic approach is required but does not specify a particular tool or language. You will need to choose one based on your environment.

## Explanation: Virtual Folders in Azure Blob Storage

Azure Blob Storage containers do not support true folders in the same way as a traditional file system. Instead, they use **virtual folders**, which are represented by blob names that contain the `/` character.

For example:

- Blob without folder: `goldset1.parquet`
- Blob in a virtual folder called `archive`: `archive/goldset1.parquet`
- Blob in a nested virtual folder: `archive/2023/goldset1.parquet`
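
To make the naming rule concrete, here is a minimal Python sketch that splits a blob name into its virtual folder and file name. It involves no Azure calls; the helper name `split_virtual_path` is hypothetical and only illustrates that the "folder" is nothing more than the part of the name before the last `/`.

```python
def split_virtual_path(blob_name: str) -> tuple[str, str]:
    """Split a blob name into (virtual folder, file name).

    The folder is simply everything before the last "/"; it is empty
    for blobs that sit at the top level of the container.
    """
    folder, _, filename = blob_name.rpartition("/")
    return folder, filename


for name in (
    "goldset1.parquet",
    "archive/goldset1.parquet",
    "archive/2023/goldset1.parquet",
):
    print(split_virtual_path(name))
# ('', 'goldset1.parquet')
# ('archive', 'goldset1.parquet')
# ('archive/2023', 'goldset1.parquet')
```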

The Azure Portal UI does not allow you to create folders directly. When you upload a blob, or copy an existing one, to a name that contains `/` (via code or tools), the portal displays the path segments before the final `/` as folders.

## Step-by-Step: Archiving Existing Goldsets

The Slack conversation raises the question: “How do we do this to our dozens of existing goldsets?” The general approach is:

1. **Decide on a folder structure**

For example:
- `archive/` for all old goldsets, or
- `archive/YYYY/` or `archive/YYYY-MM/` for time-based organization.

Example target path for an existing goldset:
- From: `goldset_2023_01.parquet`
- To: `archive/goldset_2023_01.parquet`
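
The mapping from an existing name to its archive path could be captured in a small helper, sketched below in Python. The function name `archive_name`, the `by_year` option, and the rule for picking a year out of the file name are assumptions made for illustration; the Slack thread does not prescribe a naming scheme.

```python
import re


def archive_name(blob_name: str, prefix: str = "archive", by_year: bool = False) -> str:
    """Return the destination name for a top-level goldset blob.

    With by_year=True, a standalone four-digit year found in the name
    (an assumed heuristic) is used as a sub-folder, e.g. archive/2023/.
    Names without a recognizable year fall back to the plain prefix.
    """
    prefix = prefix.strip("/")
    if by_year:
        match = re.search(r"(?<!\d)(?:19|20)\d{2}(?!\d)", blob_name)
        if match:
            return f"{prefix}/{match.group(0)}/{blob_name}"
    return f"{prefix}/{blob_name}"


print(archive_name("goldset_2023_01.parquet"))
# archive/goldset_2023_01.parquet
print(archive_name("goldset_2023_01.parquet", by_year=True))
# archive/2023/goldset_2023_01.parquet
```

Keeping this rule in one function means the folder layout can change later without touching the code that lists or moves blobs.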

2. **List existing goldsets**

Use your chosen tool to list blobs in the `tower-demo-data` container.
Conceptually:
- Container: `tower-demo-data`
- Path: root (all blobs that represent goldsets)
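
As one possible way to do the listing, the sketch below uses the Python SDK (`azure-storage-blob` with `azure-identity`). Several things here are assumptions rather than facts from the thread: that Python is the chosen tool, that the account uses the public-cloud endpoint `https://distillerydev.blob.core.windows.net`, that `DefaultAzureCredential` can authenticate in your environment, and that goldsets are the top-level blobs ending in `.parquet`. The helper names are hypothetical.

```python
def make_container_client():
    """Build a client for the container named in this guide.

    Assumes azure-storage-blob and azure-identity are installed and that
    DefaultAzureCredential works in your environment (an assumption).
    """
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import ContainerClient

    return ContainerClient(
        account_url="https://distillerydev.blob.core.windows.net",  # assumed endpoint
        container_name="tower-demo-data",
        credential=DefaultAzureCredential(),
    )


def top_level_goldsets(container_client, suffix: str = ".parquet") -> list[str]:
    """Return the names of blobs at the container root that end with `suffix`.

    A blob is treated as top-level when its name contains no "/".
    """
    names = []
    for blob in container_client.list_blobs():
        if "/" not in blob.name and blob.name.endswith(suffix):
            names.append(blob.name)
    return sorted(names)
```

Calling `top_level_goldsets(make_container_client())` would then return the candidate names to archive. Reviewing that list by hand before moving anything is a cheap safeguard.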

3. **Copy or move blobs into the archive “folder”**

Since folders are virtual, “moving” a blob takes two operations, in this order:
- Copy the blob to a new name that includes the folder path (e.g., `archive/...`).
- Delete the original blob, but only after the copy has succeeded.

Pseudocode-style steps (tool-agnostic):

- For each existing goldset blob (e.g., `goldset_X.parquet`):
  1. Copy the source `tower-demo-data/goldset_X.parquet` to the destination `tower-demo-data/archive/goldset_X.parquet`.
  2. Verify the copy succeeded.
  3. Delete the original `goldset_X.parquet` blob.

After this, the Azure Portal will show an `archive` folder under the `tower-demo-data` container, containing the moved goldsets.
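
The copy, verify, and delete sequence above might look like the following with the Python SDK. This is a sketch under assumptions, not a vetted migration script: it assumes `container_client` is an `azure.storage.blob.ContainerClient`, that the credential in use is allowed to read the source blob during a same-account copy (otherwise the source URL would need a SAS token), and the function name `move_blob` is hypothetical.

```python
import time


def move_blob(container_client, source_name: str, dest_name: str,
              poll_seconds: float = 1.0) -> None:
    """Copy a blob to dest_name, wait for the copy, then delete the source."""
    source = container_client.get_blob_client(source_name)
    dest = container_client.get_blob_client(dest_name)

    # Start a server-side copy; no data passes through this machine.
    dest.start_copy_from_url(source.url)

    # The copy may finish asynchronously, so poll the destination's status.
    props = dest.get_blob_properties()
    while props.copy.status == "pending":
        time.sleep(poll_seconds)
        props = dest.get_blob_properties()

    # Never delete the original unless the copy is known to have succeeded.
    if props.copy.status != "success":
        raise RuntimeError(
            f"Copy of {source_name!r} ended with status {props.copy.status!r}"
        )

    source.delete_blob()
```

For a single goldset this would be called as `move_blob(client, "goldset_X.parquet", "archive/goldset_X.parquet")`. Raising before the delete keeps the original in place whenever the copy is aborted or fails.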

4. **Verify in the Azure Portal**

- Open the Azure Portal.
- Navigate to:
  - Storage account: `distillerydev`
  - Container: `tower-demo-data`
- Confirm that:
  - The `archive` (or chosen folder name) appears as a folder.
  - The expected goldsets are visible under that folder.
  - The original top-level goldsets have been removed (if you chose to delete them).
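
The same checks can also be scripted as a complement to looking in the portal. The sketch below assumes a `ContainerClient` from the Python SDK and a list of the original top-level names recorded before the move; `check_archive` is a hypothetical helper name.

```python
def check_archive(container_client, originals, prefix: str = "archive"):
    """Compare the container contents against the expected archive layout.

    Returns (missing, leftover):
      missing  - originals with no copy under "<prefix>/"
      leftover - originals that still exist at the top level
    """
    present = {blob.name for blob in container_client.list_blobs()}
    missing = sorted(n for n in originals if f"{prefix}/{n}" not in present)
    leftover = sorted(n for n in originals if n in present)
    return missing, leftover
```

Two empty lists indicate that every goldset was archived and no originals remain. A non-empty `leftover` is expected if you chose to keep the originals.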

## Important Notes and Caveats

- **No folder creation in the UI**
The Azure Portal UI does not allow you to explicitly create folders. Folders appear automatically when blob names contain `/`.

- **Programmatic requirement**
  Creating and organizing folders must be done programmatically or via tools that support specifying blob paths with `/`. This is what was meant in the Slack answers by:
  - “Yeah only through programmatic by adding `/` right”
  - “but not through the UI”

- **One-time reorganization**
The Slack discussion notes that reorganizing existing goldsets “would only need to be done once,” but it can still be a non-trivial batch operation if there are many blobs.

- **Naming consistency**
Decide on a consistent naming convention for archive folders (e.g., `archive/`, `archive/2024/`) before moving blobs, to avoid future rework.

- **Potential costs and time**
Copying and deleting many blobs can incur data operation costs and may take time depending on the volume and size of your goldsets.
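
To get a rough sense of the volume before running the move, a dry run can count the candidate blobs and sum their sizes. The sketch below again assumes a Python SDK `ContainerClient` and that goldsets are top-level blobs ending in `.parquet`; `summarize_move` is a hypothetical helper, and it reports volume only, not a price.

```python
def summarize_move(container_client, suffix: str = ".parquet"):
    """Return (blob count, total bytes) for top-level blobs ending with `suffix`.

    Nothing is copied or deleted; this only reads the listing.
    """
    count = 0
    total_bytes = 0
    for blob in container_client.list_blobs():
        if "/" not in blob.name and blob.name.endswith(suffix):
            count += 1
            total_bytes += blob.size
    return count, total_bytes
```

Each blob that is moved involves at least one copy request and one delete request, so the count gives a lower bound on the number of write operations.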

## Troubleshooting Tips

- **Folders not visible in the Portal**
  - Ensure the blob names actually contain `/` (for example, `archive/goldset_1.parquet`).
  - Refresh the container view in the Azure Portal.

- **Blobs appear duplicated**
  - If you see both `goldset_X.parquet` and `archive/goldset_X.parquet`, the original may not have been deleted after the copy.
  - Confirm your process includes a delete step after a successful copy.

- **Unclear how to run the move operation**
  - The Slack thread does not specify which language, CLI, or tool your team uses. You will need:
    - A chosen tool (e.g., Azure CLI, Azure Storage Explorer, or an SDK).
    - Example commands or code snippets for that tool to:
      - List blobs
      - Copy blobs to a new path
      - Delete the original blobs

Additional information needed:
- Preferred automation method (CLI, PowerShell, Python, etc.).
- Any existing scripts or pipelines that already interact with `tower-demo-data`.
- Whether you want a one-time script or a recurring process to archive older goldsets automatically.

Once those details are known, this guide can be extended with concrete commands or code samples tailored to your environment.

---
*Source: [Original Slack thread](https://distylai.slack.com/archives/impl-tower-infobot/p1734557414016369)*