# Built-in Modules

Most CloudWright modules are available as templates, which are only useful once configured with details about a specific user's resources — for example, the MySQL module is only useful when instantiated with a hostname, username, and password.

However, CloudWright ships with several built-in modules which any Application can use without custom configuration. This section outlines the functionality of these built-in modules.

# Persistent State Module

The CloudWright state module allows applications to easily store and retrieve key-value pairs between runs. The key-value pairs stored in the State Module are scoped per-application — stored records are fully encrypted and cannot be shared between Applications.

One common use of the Persistent State Module is to track the timestamp of the last run of an application (for example, to query for only new database records). The get and set methods on the state module provide a simple API to store and query this state:

from datetime import datetime

state = CloudWright.get_module("state")

# Prints the timestamp of the previous run
print(state.get("last_run")) 

# Update the time that the script was last run
state.set("last_run", datetime.now().timestamp())

Values are stored as JSON, so any value which is compatible with Python's JSON serialization is supported:

state = CloudWright.get_module("state")

# All of the following are valid state values
state.set("my-boolean-value", True)

state.set("my-numeric-value", 10)

state.set("my-dict-value", {"dict-key" : "dict-value"})
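
Values which are not directly JSON-serializable, such as datetime objects, need to be converted to a serializable form before being stored. A minimal sketch of one approach, storing an ISO-8601 string and parsing it back on read (the key name here is illustrative):

from datetime import datetime

state = CloudWright.get_module("state")

# datetime objects are not JSON-serializable, so store an ISO-8601 string instead
state.set("last_run_iso", datetime.now().isoformat())

# Parse the stored string back into a datetime when it is read
previous_run = datetime.fromisoformat(state.get("last_run_iso"))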

Values stored via the state module will be retained for 30 days, but may be expired after that window. Writing, modifying, or re-writing the value associated with a key resets the 30-day window.
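
Because a key can expire after 30 days without a write, scripts that depend on a stored value should handle the case where it is absent. A minimal sketch, assuming get returns None for a missing or expired key:

from datetime import datetime

state = CloudWright.get_module("state")

last_run = state.get("last_run")
if last_run is None:
  # First run, or the key expired after 30 days without a write; fall back to a default
  last_run = 0.0

# ... query for records newer than `last_run`, then record this run's timestamp ...
state.set("last_run", datetime.now().timestamp())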

# CloudFlow (long-running applications)

CloudWright applications are deployed as serverless functions — Lambdas on AWS, or Cloud Functions on GCP. Cloud providers limit serverless function execution times to 15 minutes on AWS and 9 minutes on GCP.

While most applications finish well within this execution window, some applications require more processing time — for example, an application driving a PySpark job may need to launch the job, wait 30 minutes for it to complete, and then read the output.

The CloudFlow module allows applications to orchestrate long-running processes or sequences of steps by using repeated, quick, serverless function invocations.

The CloudFlow module runs a linear sequence of steps (implemented as plain Python functions). Each step should execute quickly and tell CloudWright to either RERUN the step, CONTINUE to the next step, or FAIL the workflow. Each step receives a context object, which is a Persistent State Module whose keys and values are scoped to the current CloudFlow run. The context object can be used to pass information between steps.

For example, a CloudWright application interacting with a (generic) big-data engine may need to implement a three-step workflow:

  • Launch a job against a remote service (here, data_engine)
  • Wait for the job to complete
  • Email a particular user on completion

This application can be implemented as the three-step CloudFlow workflow shown below:

# Any code outside of a method will execute every time CloudFlow checks for step 
# completion; we can set up modules and get inputs here.
cloudflow = CloudWright.get_module("cloudflow")
data_engine = CloudWright.get_module("data_engine")
email = CloudWright.get_module("gmail")
notify_email = CloudWright.inputs.get("notification_email")

# This step launches a job against the remote 'data_engine' service, and 
# marks the step as complete
def launch_job(context):
  # `custom_inputs` is a placeholder for whatever inputs the job requires
  job_id = data_engine.start_job(custom_inputs)
    
  # Store the `job_id` in our context object so we can check the status 
  # in later steps
  context.set("job_id", job_id)
  return cloudflow.CONTINUE

# Retry this step until the remote job has either completed or failed 
def wait_completion(context):
  job_id = context.get("job_id")
  status = data_engine.get_job(job_id).status
  if status == "complete":
    return cloudflow.CONTINUE
  elif status == "failed":
    return cloudflow.FAIL
  else:
    return cloudflow.RERUN

# Once the job is complete, email the user to let them know
def email_user(context):
  job_id = context.get("job_id")
  email.send_email(f"Job Complete: {job_id}", "Job complete", notify_email)
  return cloudflow.CONTINUE

# `cloudflow.run` takes a list of steps and executes them in order
cloudflow.run(launch_job, wait_completion, email_user)

CloudFlow will check for step completion approximately once per minute until the workflow is complete.
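
Because a step that keeps returning RERUN is retried roughly once per minute, the context object can also be used to bound how long a step waits before failing the workflow. A minimal sketch of that pattern, assuming the context's get returns None for an unset key; the "attempts" key and the retry cap are illustrative, not part of the CloudFlow API:

cloudflow = CloudWright.get_module("cloudflow")
data_engine = CloudWright.get_module("data_engine")

# Give up after roughly an hour of polling (about 60 one-minute retries)
MAX_ATTEMPTS = 60

def wait_completion(context):
  # Count how many times this step has run, using the per-run context object
  attempts = (context.get("attempts") or 0) + 1
  context.set("attempts", attempts)

  status = data_engine.get_job(context.get("job_id")).status
  if status == "complete":
    return cloudflow.CONTINUE
  elif status == "failed" or attempts >= MAX_ATTEMPTS:
    return cloudflow.FAIL
  else:
    return cloudflow.RERUN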