From 9d6a6d5a9178af16e4d8c03189c830216a6a2bf7 Mon Sep 17 00:00:00 2001
From: yenchiafeng
Date: Thu, 11 Dec 2025 09:39:31 -0800
Subject: [PATCH] docs: Add How to Organize Goldsets in an Azure Storage Container Using Virtual Folders

---
 ...ets-in-an-azure-storage-container-using.md | 131 ++++++++++++++++++
 1 file changed, 131 insertions(+)
 create mode 100644 docs/access/how-to-organize-goldsets-in-an-azure-storage-container-using.md

diff --git a/docs/access/how-to-organize-goldsets-in-an-azure-storage-container-using.md b/docs/access/how-to-organize-goldsets-in-an-azure-storage-container-using.md
new file mode 100644
index 0000000..3e3b7bf
--- /dev/null
+++ b/docs/access/how-to-organize-goldsets-in-an-azure-storage-container-using.md
@@ -0,0 +1,131 @@
+# How to Organize Goldsets in an Azure Storage Container Using Virtual Folders
+
+## Overview
+
+This guide explains how to move existing “goldset” data in an Azure Storage container into an archive (or “old”) folder using Azure’s virtual folder structure. It covers a limitation of the Azure Portal user interface (UI): the UI cannot create folders directly. It also describes the programmatic approach needed to create and manage folder-like structures.
+
+## Prerequisites
+
+- Access to the Azure subscription and resource group containing the storage account:
+  - Subscription: `5ec99459-f478-4662-b5ca-e10141b9940c`
+  - Resource group: `distillery-dev-rg`
+  - Storage account: `distillerydev`
+  - Container: `tower-demo-data`
+- Permission to read, write, and delete blobs in the `tower-demo-data` container.
+- A tool or SDK that can list, copy, and delete blobs by name. Examples include:
+  - Azure CLI or Azure PowerShell,
+  - An Azure Storage SDK (e.g., for Python, .NET, or JavaScript), or
+  - A storage client such as Azure Storage Explorer.
+
+> Note: The Slack discussion implies that a programmatic approach is required but does not name a particular tool or language. Choose one that fits your environment.
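+
+If you choose Python, a minimal sketch for connecting to the container is shown below. It rests on several assumptions that the Slack thread does not confirm:
+- the `azure-storage-blob` and `azure-identity` packages are installed;
+- `DefaultAzureCredential` can find an identity (for example, after `az login`);
+- that identity has a data-plane role such as Storage Blob Data Contributor on `distillerydev`.
+
+The helper names are hypothetical.
+
+```python
+"""Connect to the tower-demo-data container (sketch; assumes the Python SDK)."""
+
+ACCOUNT_NAME = "distillerydev"
+CONTAINER_NAME = "tower-demo-data"
+
+
+def account_url(account_name: str) -> str:
+    """Return the public-cloud Blob Storage endpoint for a storage account."""
+    return f"https://{account_name}.blob.core.windows.net"
+
+
+def get_container_client(account_name: str = ACCOUNT_NAME,
+                         container_name: str = CONTAINER_NAME):
+    """Build a ContainerClient authenticated with DefaultAzureCredential.
+
+    The Azure imports live inside the function so the module can be imported
+    (and account_url tested) without the Azure packages installed.
+    """
+    from azure.identity import DefaultAzureCredential
+    from azure.storage.blob import BlobServiceClient
+
+    service = BlobServiceClient(account_url=account_url(account_name),
+                                credential=DefaultAzureCredential())
+    return service.get_container_client(container_name)
+```
+
+To check access, call `get_container_client()` and print a few names from its `list_blobs()` iterator. An authorization error at this point often means the data-plane role is missing.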
+
+## Explanation: Virtual Folders in Azure Blob Storage
+
+Azure Blob Storage containers do not support true folders the way a traditional file system does. The exception is a storage account with hierarchical namespace (Azure Data Lake Storage Gen2) enabled. Otherwise, containers use **virtual folders**: a "folder" is just a prefix of a blob name, with segments separated by the `/` character.
+
+For example:
+- Blob without a folder: `goldset1.parquet`
+- Blob in a virtual folder called `archive`: `archive/goldset1.parquet`
+- Blob in a nested virtual folder: `archive/2023/goldset1.parquet`
+
+The Azure Portal UI does not let you create folders directly. However, when you upload or copy blobs whose names contain `/` (via code or tools), the portal displays those name segments as folders.
+
+## Step-by-Step: Archiving Existing Goldsets
+
+The Slack conversation raises the question: “How do we do this to our dozens of existing goldsets?” The general approach is:
+
+1. **Decide on a folder structure**
+
+   For example:
+   - `archive/` for all old goldsets, or
+   - `archive/YYYY/` or `archive/YYYY-MM/` for time-based organization.
+
+   Example target path for an existing goldset:
+   - From: `goldset_2023_01.parquet`
+   - To: `archive/goldset_2023_01.parquet`
+
+2. **List existing goldsets**
+
+   Use your chosen tool to list the blobs in the `tower-demo-data` container. The goldsets to archive are the blobs at the container root, meaning blobs whose names contain no `/`.
+
+3. **Copy or move blobs into the archive “folder”**
+
+   Folders are virtual, and a flat-namespace container has no rename operation. “Moving” a blob therefore means:
+   - Copy the blob to a new name that includes the folder path (e.g., `archive/...`).
+   - Delete the original blob.
+
+   Tool-agnostic steps, for each existing goldset blob (e.g., `goldset_X.parquet`):
+
+   1. Copy `tower-demo-data/goldset_X.parquet` to `tower-demo-data/archive/goldset_X.parquet`.
+   2. Verify that the copy succeeded.
+   3. Delete the original `goldset_X.parquet` blob.
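+
+The copy, verify, and delete loop from step 3 could be sketched in Python as follows. This is an illustration, not a tested production script. It makes these assumptions:
+- the `azure-storage-blob` SDK is used;
+- a `ContainerClient` for `tower-demo-data` is authorized with Microsoft Entra ID (Azure AD) credentials;
+- the goldsets are the root-level `.parquet` blobs, as inferred from the examples above;
+- `archive/` is the chosen folder.
+
+All function names are hypothetical. Source and destination are in the same storage account, so the copy runs server-side. In that case the destination request's credentials are generally enough to authorize reading the source.
+
+```python
+"""Archive root-level goldset blobs by copy, verify, delete (sketch)."""
+import time
+
+ARCHIVE_PREFIX = "archive/"  # assumption: the folder chosen in step 1
+
+
+def archive_name(blob_name: str, prefix: str = ARCHIVE_PREFIX) -> str:
+    """Map 'goldset_X.parquet' to 'archive/goldset_X.parquet'."""
+    return prefix + blob_name
+
+
+def select_root_goldsets(names, suffix: str = ".parquet") -> list:
+    """Keep blobs at the container root (no '/') whose names end with suffix.
+
+    The .parquet filter is an assumption; adjust it to your goldset names.
+    """
+    return [n for n in names if "/" not in n and n.endswith(suffix)]
+
+
+def move_blob(container, src_name: str, dst_name: str,
+              poll_seconds: float = 1.0) -> None:
+    """Copy src_name to dst_name server-side, verify, then delete src_name."""
+    src = container.get_blob_client(src_name)
+    dst = container.get_blob_client(dst_name)
+    dst.start_copy_from_url(src.url)
+    # Same-account copies usually finish at once; poll in case one is pending.
+    status = dst.get_blob_properties().copy.status
+    while status == "pending":
+        time.sleep(poll_seconds)
+        status = dst.get_blob_properties().copy.status
+    if status != "success":
+        raise RuntimeError(f"copy of {src_name} ended with status {status!r}")
+    # Delete the original only after the copy has been verified.
+    src.delete_blob()
+
+
+def archive_goldsets(container, prefix: str = ARCHIVE_PREFIX,
+                     dry_run: bool = True) -> list:
+    """Move every root-level goldset under prefix; return the (src, dst) plan."""
+    # List everything up front so the container is not modified mid-listing.
+    names = [blob.name for blob in container.list_blobs()]
+    plan = [(n, archive_name(n, prefix)) for n in select_root_goldsets(names)]
+    for src, dst in plan:
+        if dry_run:
+            print(f"would move {src} -> {dst}")
+        else:
+            move_blob(container, src, dst)
+    return plan
+```
+
+Calling `archive_goldsets(container)` only prints the plan. Review it, then call `archive_goldsets(container, dry_run=False)` to perform the moves. For the time-based layout from step 1, pass a prefix such as `"archive/2023/"`.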
+
+   Once every goldset has been moved, the Azure Portal shows an `archive` folder under the `tower-demo-data` container that holds the moved goldsets.
+
+4. **Verify in the Azure Portal**
+
+   - Open the Azure Portal.
+   - Navigate to:
+     - Storage account: `distillerydev`
+     - Container: `tower-demo-data`
+   - Confirm that:
+     - The `archive` folder (or the folder name you chose) appears.
+     - The expected goldsets are listed under that folder.
+     - The original top-level goldsets have been removed (if you chose to delete them).
+
+## Important Notes and Caveats
+
+- **No folder creation in the UI**  
+  The Azure Portal UI does not let you explicitly create folders. Folders appear automatically when blob names contain `/`.
+
+- **Programmatic requirement**  
+  Folders must be created and organized programmatically, or with tools that let you specify blob names containing `/`. This is what the Slack replies meant by:
+  - “Yeah only through programmatic by adding `/` right”
+  - “but not through the UI”
+
+- **One-time reorganization**  
+  The Slack discussion notes that reorganizing existing goldsets “would only need to be done once.” With many blobs, it is still a non-trivial batch operation.
+
+- **Naming consistency**  
+  Settle on a naming convention for archive folders (e.g., `archive/`, `archive/2024/`) before moving blobs, to avoid rework later.
+
+- **Potential costs and time**  
+  Copying and deleting many blobs can incur storage transaction costs. It can also take time, depending on the number and size of your goldsets.
+
+## Troubleshooting Tips
+
+- **Folders not visible in the Portal**
+  - Make sure the blob names actually contain `/` (for example, `archive/goldset_1.parquet`).
+  - Refresh the container view in the Azure Portal.
+
+- **Blobs appear duplicated**
+  - If you see both `goldset_X.parquet` and `archive/goldset_X.parquet`, the original was probably not deleted after the copy.
+  - Confirm that your process deletes the original after each successful copy.
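+
+To find leftover originals in bulk, a sketch along these lines could help. It makes these assumptions:
+- it uses the Python `azure-storage-blob` SDK with a `ContainerClient` for `tower-demo-data`;
+- `archive/` is the folder prefix;
+- a matching blob size counts as evidence that the copy completed. This is a cheap heuristic, not proof that the contents are identical.
+
+The function names are hypothetical.
+
+```python
+"""Find and remove originals that already have an archived copy (sketch)."""
+
+
+def find_leftover_originals(sizes: dict, prefix: str = "archive/") -> list:
+    """Return root-level names whose prefix + name copy exists with equal size.
+
+    sizes maps blob name -> size in bytes, e.g. built from list_blobs().
+    """
+    return sorted(
+        name for name, size in sizes.items()
+        if "/" not in name and sizes.get(prefix + name) == size
+    )
+
+
+def delete_leftovers(container, prefix: str = "archive/",
+                     dry_run: bool = True) -> list:
+    """Delete root-level originals that were copied but never removed."""
+    sizes = {blob.name: blob.size for blob in container.list_blobs()}
+    leftovers = find_leftover_originals(sizes, prefix)
+    for name in leftovers:
+        if dry_run:
+            print(f"would delete {name}")
+        else:
+            container.delete_blob(name)
+    return leftovers
+```
+
+As with any bulk delete, run it with the default `dry_run=True` first and check the printed names before passing `dry_run=False`.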
+
+- **Unclear how to run the move operation**
+  - The Slack thread does not specify which language, CLI, or tool your team uses. You will need:
+    - A chosen tool (e.g., Azure CLI, Azure Storage Explorer, or an SDK).
+    - Example commands or code snippets for that tool to:
+      - List blobs
+      - Copy blobs to a new path
+      - Delete the original blobs
+
+  Additional information needed:
+  - Preferred automation method (CLI, PowerShell, Python, etc.).
+  - Any existing scripts or pipelines that already interact with `tower-demo-data`.
+  - Whether you want a one-time script or a recurring process to archive older goldsets automatically.
+
+Once those details are known, this guide can be extended with concrete commands or code samples tailored to your environment.
+
+---
+*Source: [Original Slack thread](https://distylai.slack.com/archives/impl-tower-infobot/p1734557414016369)*