# How to Access the Latest Promotions Data from MongoDB for Local Debugging

## Overview

This guide explains how to obtain the latest promotions data from MongoDB so you can download it and debug or analyze it locally in a notebook environment (for example, Jupyter or similar).

## Prerequisites

- Access to the MongoDB instance or data export process used by your team.
- Permission to read the promotions data, through either:
  - Direct access to the MongoDB collection containing promotions, or
  - Access to the latest raw file export of the promotions data.
- A local environment set up for analysis (for example, Python with Jupyter Notebook, and relevant libraries such as `pymongo` or `pandas`).

> Note: The original Slack exchange implies that data is often shared as a “promotions file” or “latest raw file,” but does not specify the exact MongoDB database name, collection name, or export location. These details will need to be confirmed within your team.

## Instructions

### Option 1: Use the Promotions Table (Preferred)

1. **Identify the promotions data source**
   - Confirm with your data or engineering team:
     - The database name that contains promotions (for example, `marketing` or `promotions_db`).
     - The collection or table name (for example, `promotions`).
   - Determine whether an existing export process regularly writes a “promotions file” (for example, a `.json`, `.csv`, or `.parquet` file) to a shared location such as Amazon S3, Google Cloud Storage, or a network drive.

2. **Request or locate the latest promotions file**
   - If your team already exports this data:
     - Ask for the path or URL of the latest promotions file (for example, `s3://your-bucket/path/to/latest_promotions.json`).
   - If the file is provided manually:
     - Coordinate with the responsible teammate (for example, a data engineer) to receive the latest promotions file.

3. **Download the file locally**
   - Use the method that matches where the file is stored:
     - Cloud storage (example):
       ```bash
       aws s3 cp s3://your-bucket/path/to/latest_promotions.json ./latest_promotions.json
       ```
     - Shared drive: copy the file via your file browser or mapped network drive.

4. **Load the file into your notebook**
   - Example in Python for a JSON file:
     ```python
     import pandas as pd

     # lines=True reads JSON Lines (one object per line); omit it for a single JSON array
     df = pd.read_json("latest_promotions.json", lines=True)
     df.head()
     ```
   - Example in Python for a CSV file:
     ```python
     import pandas as pd

     df = pd.read_csv("latest_promotions.csv")
     df.head()
     ```

5. **Run transformations locally**
   - Apply the same transformation logic locally that is used in production (for example, the same Python scripts, SQL queries, or transformation framework).
   - Validate your transformations by comparing local results to expected outputs or production tables.
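
If exports follow a date-stamped naming convention, picking “the latest promotions file” in step 2 can be automated. The sketch below assumes the `promotions_YYYYMMDD.<ext>` pattern given as an example under Troubleshooting; the pattern and the function name `latest_promotions_file` are hypothetical, so adapt the regular expression to your team's actual convention.

```python
import re
from datetime import datetime

# Hypothetical convention: promotions_YYYYMMDD.<ext>, e.g. promotions_20251210.json
PATTERN = re.compile(r"^promotions_(\d{8})\.(json|jsonl|csv|parquet)$")


def latest_promotions_file(names):
    """Return the file name with the newest date stamp, or None if nothing matches."""
    dated = []
    for name in names:
        match = PATTERN.match(name)
        if not match:
            continue  # ignore files that do not follow the convention
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d")
        except ValueError:
            continue  # eight digits that are not a real date, e.g. 20251399
        dated.append((stamp, name))
    return max(dated)[1] if dated else None
```

The list of names can come from `os.listdir()` on a shared drive or from an object-store listing.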
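
From inside a notebook, the download in step 3 can also be done in Python. This is a minimal sketch assuming the `boto3` S3 client, whose `download_file(Bucket, Key, Filename)` method performs the copy; the helper names `parse_s3_uri` and `download_from_s3` are hypothetical, and the client is passed in so an already-authenticated session can be reused.

```python
from urllib.parse import urlsplit


def parse_s3_uri(uri):
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc or not parts.path.strip("/"):
        raise ValueError(f"Not an s3://bucket/key URI: {uri!r}")
    return parts.netloc, parts.path.lstrip("/")


def download_from_s3(s3_client, uri, dest):
    """Download one S3 object to a local path using a boto3-style S3 client."""
    bucket, key = parse_s3_uri(uri)
    # boto3's S3 client exposes download_file(Bucket, Key, Filename)
    s3_client.download_file(bucket, key, dest)
    return dest
```

For example, `download_from_s3(boto3.client("s3"), "s3://your-bucket/path/to/latest_promotions.json", "latest_promotions.json")` mirrors the `aws s3 cp` command, using the same placeholder path.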
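
Because the promotions file may arrive as `.json`, `.csv`, or `.parquet` (step 1), a small dispatcher can pick the right pandas reader for step 4. This sketch assumes the file extension reflects the format and treats any JSON file that does not start with `[` as JSON Lines; `load_promotions` is a hypothetical name, and `pd.read_parquet` additionally needs `pyarrow` or `fastparquet` installed.

```python
from pathlib import Path

import pandas as pd


def load_promotions(path):
    """Load a promotions export into a DataFrame based on its file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".parquet":
        return pd.read_parquet(path)  # requires pyarrow or fastparquet
    if suffix in (".json", ".jsonl"):
        # Peek at the first non-whitespace character: "[" means one JSON array,
        # anything else is treated as JSON Lines (mongoexport's default layout).
        with path.open(encoding="utf-8") as handle:
            first = handle.read(1024).lstrip("\ufeff \t\r\n")[:1]
        return pd.read_json(path, lines=(first != "["))
    raise ValueError(f"Unsupported file type: {suffix}")
```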
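
For the validation in step 5, one simple approach is to compare local and expected rows by a business key. The sketch below assumes both sides can be turned into lists of dicts (for a DataFrame, `df.to_dict("records")`) and that a unique key column exists; `compare_by_key` and the `promotion_id` key in the example are hypothetical.

```python
def compare_by_key(local_rows, expected_rows, key):
    """Compare two lists of dict rows keyed by `key` and summarize the differences."""
    local = {row[key]: row for row in local_rows}
    expected = {row[key]: row for row in expected_rows}
    return {
        # keys the production output has but the local run did not produce
        "missing": sorted(expected.keys() - local.keys()),
        # keys the local run produced that production does not have
        "unexpected": sorted(local.keys() - expected.keys()),
        # keys present on both sides whose row contents differ
        "mismatched": sorted(
            k for k in local.keys() & expected.keys() if local[k] != expected[k]
        ),
    }
```

`NaN` values may not compare equal to each other, so fill or drop missing values before comparing; for whole-table checks, `pandas.testing.assert_frame_equal` is an alternative.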

### Option 2: Export the Latest Raw File Directly from MongoDB

If you do not have a pre-exported promotions file, you can export the latest raw data from MongoDB and then run transformations locally.

1. **Confirm the collection and the “latest” criteria**
   - Ask your team for:
     - The MongoDB database name (for example, `marketing_db`).
     - The collection name that stores promotions (for example, `promotions_raw`).
     - The field used to determine “latest” (for example, `created_at`, `updated_at`, or `ingestion_timestamp`).

2. **Connect to MongoDB**
   - Obtain the MongoDB connection string (for example, `mongodb+srv://user:password@cluster-url/dbname`).
   - Use the MongoDB command-line tools or a driver (for example, `pymongo` in Python).

3. **Export the latest promotions data**
   - Example using `mongoexport` (command-line tool). The database is taken from the connection string, so include it there or pass `--db`:
     ```bash
     mongoexport \
       --uri="YOUR_MONGODB_URI" \
       --collection=promotions_raw \
       --out=latest_promotions.json
     ```
   - To export only recent records, add a `--query` filter (this depends on your schema). For example, to export records ingested on or after a given timestamp, assuming `ingestion_timestamp` is stored as a date:
     ```bash
     mongoexport \
       --uri="YOUR_MONGODB_URI" \
       --collection=promotions_raw \
       --query='{"ingestion_timestamp": {"$gte": {"$date": "2025-12-10T00:00:00Z"}}}' \
       --out=latest_promotions.json
     ```
   - Adjust the query to match your actual “latest” definition and field names. By default, `mongoexport` writes one JSON document per line (JSON Lines).

4. **Load and debug in a notebook**
   - Use the same loading examples as in Option 1 to bring `latest_promotions.json` into your notebook.
   - Run and debug your transformations locally.
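
Rather than hard-coding a timestamp in the `--query` example, the “latest” window from step 1 can be computed. This sketch assumes the “latest” field is stored as a BSON date (not a string); `latest_filter` and its `days` parameter are hypothetical. It returns both a filter dict for `pymongo` and an Extended JSON string for `mongoexport --query`.

```python
import json
from datetime import datetime, timedelta, timezone


def latest_filter(field, days=1, now=None):
    """Build a 'newer than N days' filter for pymongo and for mongoexport --query."""
    now = now or datetime.now(timezone.utc)
    since = now - timedelta(days=days)
    # pymongo accepts a datetime directly and encodes it as a BSON date
    driver_filter = {field: {"$gte": since}}
    # mongoexport's --query takes Extended JSON, where dates are {"$date": "<ISO-8601>"}
    iso = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    cli_query = json.dumps({field: {"$gte": {"$date": iso}}})
    return driver_filter, cli_query
```

For example, `_, query = latest_filter("ingestion_timestamp", days=1)` yields a string that can be passed as the `--query` value.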
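
For the connection in step 2, it helps to keep the connection string out of the notebook and to mask the password whenever it is printed. A sketch assuming a hypothetical `MONGODB_URI` environment variable; `ping` is a standard MongoDB admin command, and `client.admin.command("ping")` is how `pymongo` sends it.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def mongo_uri_from_env(var="MONGODB_URI"):
    """Read the connection string from an environment variable instead of hard-coding it."""
    uri = os.environ.get(var)
    if not uri:
        raise RuntimeError(f"Set {var} to your MongoDB connection string")
    return uri


def redact_uri(uri):
    """Mask the password so the URI is safe to print or log."""
    parts = urlsplit(uri)
    userinfo, at, hosts = parts.netloc.rpartition("@")
    if not at:
        return uri  # no credentials embedded in the URI
    user, colon, _password = userinfo.partition(":")
    masked = f"{user}:***@{hosts}" if colon else f"{user}@{hosts}"
    return urlunsplit(parts._replace(netloc=masked))


def can_connect(client):
    """Return True when the server answers 'ping'.

    pymongo raises (e.g. ServerSelectionTimeoutError) if the server is unreachable.
    """
    return client.admin.command("ping").get("ok") == 1
```

In a notebook: `client = pymongo.MongoClient(mongo_uri_from_env())`, then check `can_connect(client)` before exporting.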
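
If the MongoDB command-line tools are not installed, step 3 can be done with `pymongo` instead. A sketch assuming `collection` is a `pymongo` `Collection` (for example, `client["marketing_db"]["promotions_raw"]`, using the placeholder names from step 1) and that the sort field exists on the documents; `export_latest` is hypothetical. It writes JSON Lines, so the Option 1 loaders apply unchanged.

```python
import json


def export_latest(collection, query, sort_field, out_path, limit=0):
    """Write documents matching `query`, newest first, to a JSON Lines file.

    `collection` is assumed to be a pymongo Collection; limit=0 means no limit.
    """
    cursor = collection.find(query).sort(sort_field, -1)  # -1 == pymongo.DESCENDING
    if limit:
        cursor = cursor.limit(limit)
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for doc in cursor:
            # default=str turns ObjectId and datetime values into plain strings
            out.write(json.dumps(doc, default=str) + "\n")
            count += 1
    return count
```

`default=str` flattens types to strings; `bson.json_util.dumps` is an alternative that keeps them as Extended JSON.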
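
Files written by `mongoexport` use MongoDB Extended JSON, so pandas shows `_id` and date fields as nested dicts such as `{"$oid": ...}` and `{"$date": ...}`. This stdlib-only sketch unwraps the most common wrappers before step 4; it handles only `$oid`, `$date`, `$numberLong`, and `$numberDouble`, and other types pass through unchanged. If `pymongo` is installed, `bson.json_util.loads` is a more complete parser.

```python
import json
from datetime import datetime, timezone


def _plain(value):
    """Recursively unwrap common Extended JSON wrappers into plain Python values."""
    if isinstance(value, list):
        return [_plain(item) for item in value]
    if not isinstance(value, dict):
        return value
    keys = set(value)
    if keys == {"$oid"}:
        return value["$oid"]  # ObjectId -> hex string
    if keys == {"$date"}:
        date = value["$date"]
        if isinstance(date, dict):  # canonical form: {"$numberLong": "<ms since epoch>"}
            return datetime.fromtimestamp(int(date["$numberLong"]) / 1000, tz=timezone.utc)
        return datetime.fromisoformat(date.replace("Z", "+00:00"))  # relaxed ISO-8601 form
    if keys == {"$numberLong"}:
        return int(value["$numberLong"])
    if keys == {"$numberDouble"}:
        return float(value["$numberDouble"])
    return {key: _plain(item) for key, item in value.items()}


def read_mongoexport_lines(path):
    """Read a mongoexport JSON Lines file into a list of plain dicts."""
    with open(path, encoding="utf-8") as handle:
        return [_plain(json.loads(line)) for line in handle if line.strip()]
```

The result converts directly to a DataFrame: `df = pd.DataFrame(read_mongoexport_lines("latest_promotions.json"))`.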

## Important Notes and Caveats

- **Missing details**:
  The Slack conversation does not specify:
  - The exact MongoDB database or collection names.
  - The field used to define “latest.”
  - The storage location or format of the “promotions file.”

  These must be clarified with your team before you can implement the steps above.

- **Data volume**:
Exporting the entire promotions collection may produce a large file. Consider filtering by date or other criteria if you only need recent data.

- **Data sensitivity**:
Promotions data may contain sensitive or proprietary information. Ensure you follow your organization’s data handling and security policies when downloading and storing files locally.

- **Transformation parity**:
To accurately debug, use the same code, configuration, and dependencies locally that are used in your production or staging environment.
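
To act on the data-volume note, you can count matching documents before exporting. A sketch assuming a `pymongo` `Collection`: `estimated_document_count()` reads collection metadata, and `count_documents(filter)` counts exactly. The 100,000-document threshold and the `plan_export` name are arbitrary placeholders.

```python
def plan_export(collection, query, max_docs=100_000):
    """Count documents before exporting and flag exports that look too large.

    `collection` is assumed to be a pymongo Collection; max_docs is an arbitrary threshold.
    """
    total = collection.estimated_document_count()  # fast estimate from collection metadata
    matching = collection.count_documents(query)  # exact count for this filter
    return {
        "total_estimate": total,
        "matching": matching,
        "needs_filter": matching > max_docs,
    }
```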

## Troubleshooting

- **You cannot find the promotions table or collection**
  - Confirm the exact database and collection names with your data engineering or analytics team.
  - Ask whether promotions are stored in a “raw” collection and then transformed into a “promotions” table downstream.

- **You do not know where the latest promotions file is stored**
  - Ask the teammate who usually shares the file (in the Slack thread, someone offered: “I can get the latest promotions file to you in a min”) to document:
    - The storage location (S3 bucket, shared drive, etc.).
    - The naming convention (for example, `promotions_YYYYMMDD.json`).
    - The update schedule.

- **File format issues when loading into a notebook**
  - If `pd.read_json` or `pd.read_csv` fails:
    - For JSON, check whether the file is JSON Lines (one JSON object per line) and, if so, set `lines=True`.
    - Confirm the file encoding (for example, UTF-8).
    - Open a small sample of the file in a text editor to inspect its structure.

- **Performance problems with large files**
  - Load only the first chunk instead of the whole file:
    ```python
    import pandas as pd

    # chunksize returns an iterator of DataFrames instead of one large DataFrame
    df_iter = pd.read_json("latest_promotions.json", lines=True, chunksize=10000)
    df_sample = next(df_iter)  # first 10,000 records
    ```
  - Filter the export in MongoDB to include only the subset you need for debugging (for example, a specific date range or campaign).
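
When the promotions collection cannot be found, listing collection names that mention “promo” can narrow the search before you ask the team. A sketch assuming a `pymongo` `MongoClient` whose user is allowed to list databases; the keyword and the function name are hypothetical.

```python
def find_candidate_collections(client, keyword="promo"):
    """List (database, collection) pairs whose collection name contains `keyword`."""
    matches = []
    for db_name in client.list_database_names():
        if db_name in ("admin", "config", "local"):
            continue  # MongoDB's internal databases
        for coll_name in client[db_name].list_collection_names():
            if keyword.lower() in coll_name.lower():
                matches.append((db_name, coll_name))
    return sorted(matches)
```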
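
To diagnose the file format issues above without loading the whole file, you can inspect its first bytes. This stdlib-only sketch reports whether the head decodes as UTF-8 and whether it looks like a JSON array, JSON Lines, or something else; the layout check is a heuristic (a pretty-printed single JSON object also starts with `{`), and `inspect_file` is a hypothetical name.

```python
import codecs


def inspect_file(path, n_lines=3, probe_bytes=64 * 1024):
    """Peek at the start of an export to guess its encoding and layout."""
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    try:
        # final=False tolerates a multi-byte character cut off at the probe boundary
        text = codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
        encoding = "utf-8"
    except UnicodeDecodeError:
        text = head.decode("latin-1")  # latin-1 maps every byte, so the sample stays readable
        encoding = "not utf-8"
    start = text.lstrip("\ufeff \t\r\n")[:1]  # skip a byte-order mark and leading whitespace
    if start == "[":
        layout = "json array (use lines=False)"
    elif start == "{":
        layout = "json lines, probably (use lines=True)"
    else:
        layout = "not json (csv or other)"
    return {"encoding": encoding, "layout": layout, "sample": text.splitlines()[:n_lines]}
```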
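
Building on the chunked `read_json` example above, the same reader can filter a large JSON Lines file without holding all of it in memory at once. A sketch assuming pandas 1.2 or later (where the chunked reader works as a context manager); `filter_in_chunks` and the `campaign` column in the usage line are hypothetical.

```python
import pandas as pd


def filter_in_chunks(path, keep, chunksize=10_000):
    """Stream a JSON Lines file in chunks, keeping rows where keep(chunk) is True."""
    parts = []
    with pd.read_json(path, lines=True, chunksize=chunksize) as reader:
        for chunk in reader:
            parts.append(chunk[keep(chunk)])  # keep() returns a boolean Series
    return pd.concat(parts, ignore_index=True) if parts else pd.DataFrame()
```

For example: `spring = filter_in_chunks("latest_promotions.json", lambda c: c["campaign"] == "spring")`.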

If any of the required details (database name, collection name, export location, or “latest” definition) are unclear, coordinate with your data engineering or analytics team to fill in those gaps before proceeding.

---
*Source: [Original Slack thread](https://distylai.slack.com/archives/impl-tower-infobot/p1737576658573569)*