Using Data Job Properties vs Secrets

dakodakov edited this page Aug 3, 2023 · 12 revisions

Properties vs Secrets

This article outlines when and how to use Data Job Properties and Secrets. Although the two mechanisms can be used somewhat interchangeably, there are some differences you should be aware of:

  • Properties are used to store state and non-sensitive data, and are generally faster to access and modify. If you need to overwrite a value often, sometimes several times during a single data job execution, properties are the way to go. They are stored in plain text in the VDK Control Service database.
  • Secrets are generally fast to access (though somewhat slower than Properties) but slow to modify, because they are encrypted and decrypted during storage and retrieval. They are best suited for storing sensitive data: passwords, credentials, tokens, API keys, etc. They are stored in an encrypted state in a secure storage, for example a HashiCorp Vault instance.
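The split described above can be sketched in a single job step: frequently updated, non-sensitive state goes through the properties methods, while credentials are only ever read through the secrets methods. The `InMemoryJobInput` class below is a hypothetical stand-in (not part of VDK) that mimics just the three JobInput methods used in this article, so the sketch runs without a Control Service. The `run_count` property is likewise an assumption made for illustration.

```python
class InMemoryJobInput:
    """Hypothetical stand-in for VDK's JobInput; keeps everything in dicts."""

    def __init__(self, properties=None, secrets=None):
        self._properties = dict(properties or {})
        self._secrets = dict(secrets or {})

    def get_all_properties(self):
        return dict(self._properties)

    def set_all_properties(self, properties):
        self._properties = dict(properties)

    def get_secret(self, key):
        return self._secrets.get(key)


def run(job_input):
    # Non-sensitive, frequently updated state: read and write it as properties.
    properties = job_input.get_all_properties()
    properties["run_count"] = int(properties.get("run_count", 0)) + 1
    job_input.set_all_properties(properties)

    # Sensitive value: read it from secrets and never copy it into properties.
    api_key = job_input.get_secret("api_key")
    return properties["run_count"], api_key is not None


job_input = InMemoryJobInput(secrets={"api_key": "dummy-value"})
first = run(job_input)
second = run(job_input)
```

In a real data job the `job_input` argument is supplied by VDK, so only the body of `run` would carry over.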

Properties Examples

You can use the "vdk properties" command to store and retrieve properties from the command line. To set a property, use the "--set" option:

vdk properties -n my-job -t my-team --set "key1" "value1"

You can get the value of a single property with the "--get" option:

vdk properties -n my-job -t my-team --get "key1"

Or get all the properties via the "--list" option:

vdk properties -n my-job -t my-team --list

Finally, you can delete a property using the "--delete" option:

vdk properties -n my-job -t my-team --delete "key1"

Using Properties in a data job

In a data job, you can access Job Properties via the JobInput's properties methods. In the following example we get all the properties, modify some of them and store them back, to save the last date when we processed data:

import logging
from datetime import date


def run(job_input):
    # get the properties
    properties = job_input.get_all_properties()

    current_date = str(date.today())

    logging.info("Current date is %s", current_date)
    if 'last_ingested_timestamp' in properties:
        logging.info("Last ingested timestamp is %s", properties['last_ingested_timestamp'])
    if ('last_ingested_timestamp' not in properties) or current_date != properties['last_ingested_timestamp']:

        logging.info("Getting data from Influx")
        # some very complex processing goes here...

        # update the property value and store it
        properties['last_ingested_timestamp'] = current_date
        job_input.set_all_properties(properties)
    else:
        logging.info("Skipped ingestion")
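The "ingest at most once per day" decision in the job above can also be pulled out into plain functions that operate on the properties dictionary, which makes it easy to check without running a data job. This is only a sketch of that idea: `should_ingest` and `mark_ingested` are hypothetical helper names, and the dates are arbitrary sample values.

```python
from datetime import date


def should_ingest(properties, today):
    """Return True when no ingestion has been recorded for `today` yet."""
    return properties.get("last_ingested_timestamp") != str(today)


def mark_ingested(properties, today):
    """Return a copy of the properties with today's date recorded."""
    updated = dict(properties)
    updated["last_ingested_timestamp"] = str(today)
    return updated


today = date(2023, 8, 3)
props = {}

first_run = should_ingest(props, today)      # nothing stored yet
props = mark_ingested(props, today)
second_run = should_ingest(props, today)     # same day, already ingested
next_day = should_ingest(props, date(2023, 8, 4))
```

Inside a job, the dictionary returned by `mark_ingested` would then be passed to `job_input.set_all_properties` as in the example above.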

Secrets Example

You can use the "vdk secrets" command to store and retrieve secrets from the command line. If you are using the VDK CLI on a private/secure console, you can set a secret directly with the following command:

vdk secrets -n my-job -t my-team --set "api_key" "<your API Key goes here>"

Alternatively, you can pass just the key for your secret to the command. You will then be prompted to enter the value, and it won't be kept in your console's history:

vdk secrets -n my-job -t my-team --set "api_key"

You can get the value of a single secret with the "--get" option:

vdk secrets -n my-job -t my-team --get "key1"

Or get all the secrets via the "--list" option:

vdk secrets -n my-job -t my-team --list

Finally, you can delete a secret using the "--delete" option:

vdk secrets -n my-job -t my-team --delete "key1"

Using Secrets in a data job

In a data job, you can access Job Secrets via the JobInput's secrets methods. In the following example we'll get the value of a single secret and use it to make an authenticated REST call:

import requests
from datetime import date, timedelta
from vdk.api.job_input import IJobInput


def run(job_input: IJobInput):
    # Get the API Key from the Job Secrets
    api_key = job_input.get_secret('api_key')
    # Get yesterday's date
    yesterday_date = date.today() - timedelta(days=1)

    # Get the data
    url = "https://newsapi.org/v2/everything"
    params = {
        "q": "Taylor Swift",
        "from": yesterday_date.strftime("%Y-%m-%d"),
        "sortBy": "popularity",
        "language": "en",
        "apiKey": api_key,
    }
    response = requests.get(url, params=params)
    response.raise_for_status()
    data = response.json()

    # Process the data...
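Because the example above sends the secret as a query parameter, the key also ends up in the full request URL, so it is worth masking it before anything is written to the job logs. The following is a minimal sketch of one way to do that. `redact_params` and its `sensitive_keys` argument are hypothetical helpers introduced here for illustration and are not part of VDK or requests.

```python
import logging

log = logging.getLogger(__name__)


def redact_params(params, sensitive_keys=("apiKey",)):
    """Return a copy of params with sensitive values replaced by a mask."""
    return {
        key: "***" if key in sensitive_keys else value
        for key, value in params.items()
    }


# Sample values only; in a job the key would come from job_input.get_secret().
params = {"q": "Taylor Swift", "language": "en", "apiKey": "dummy-value"}
safe_params = redact_params(params)

# Log the masked copy; the original dict is still used for the actual request.
log.info("Requesting news with params %s", safe_params)
```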

Conclusion

Congratulations, you've reached the end of this tutorial!
