Workflow Basics

Learn the fundamentals of writing Dagu workflows.

Your First Workflow

Create hello.yaml:

yaml
steps:
  - command: echo "Hello from Dagu!"

Run it:

bash
dagu start hello.yaml

Workflow Structure

A complete workflow contains:

yaml
# Metadata
name: data-pipeline
description: Process daily data
tags: [etl, production]

# Configuration  
schedule: "0 2 * * *"
params:
  - DATE: ${DATE:-today}

# Steps
steps:
  - name: process
    command: python process.py ${DATE}

# Handlers
handlerOn:
  failure:
    command: notify-error.sh

Steps

A step is the basic unit of execution in a workflow.

Step Names

Step names are optional. When omitted, Dagu automatically generates names based on the step type:

yaml
steps:
  - command: echo "First step"              # Auto-named: cmd_1
  - script: |                      # Auto-named: script_2
      echo "Multi-line"
      echo "Script"
  - name: explicit-name            # Explicit name
    command: echo "Third step"
  - type: http                     # Auto-named: http_4
    config:
      url: https://api.example.com
  - call: child-workflow            # Auto-named: dag_5

Auto-generated names follow the pattern {type}_{number}:

  • cmd_N - Command steps
  • script_N - Script steps
  • http_N - HTTP steps
  • dag_N - DAG steps
  • container_N - Docker/container steps
  • ssh_N - SSH steps
  • mail_N - Mail steps
  • jq_N - JQ steps

For parallel steps (see below), the pattern is parallel_{group}_{type}_{index}.

Shorthand Command Syntax

For simple sequential workflows, you can list bare commands and let Dagu generate both the step names and the dependencies:

yaml
steps:
  - command: echo "Hello World"
  - command: ls -la
  - command: python script.py

This is equivalent to:

yaml
type: graph
steps:
  - name: step 1
    command: echo "Hello World"
  - name: step 2
    command: ls -la
    depends: step 1
  - name: step 3
    command: python script.py
    depends: step 2

Multiple Commands

Multiple commands share the same step configuration:

yaml
steps:
  - name: build-and-test
    command:
      - npm install
      - npm run build
      - npm test
    env:
      - NODE_ENV: production
    workingDir: /app
    retryPolicy:
      limit: 3

Instead of duplicating env, workingDir, retryPolicy, preconditions, container, etc. across multiple steps, combine commands into one step.

Commands run in order and stop on first failure. Retries restart from the first command.

Trade-off: You lose the ability to retry or resume from the middle of the command list. If you need granular control over individual command retries, use separate steps.
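
For example, if only one command in the list is flaky, give it its own step and retry policy (a minimal sketch reworking the build-and-test example above):

yaml
steps:
  - name: install
    command: npm install

  - name: build
    command: npm run build

  - name: test
    command: npm test
    retryPolicy:
      limit: 3      # Only the test step retries; install and build run once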

Supported step types: shell, command, docker, container, ssh

Not supported: jq, http, archive, mail, github_action, dag (these only accept single commands)

Multi-line Scripts

yaml
steps:
  - script: |
      #!/bin/bash
      set -e

      echo "Processing..."
      python analyze.py data.csv
      echo "Complete"

If you omit shell, Dagu uses the interpreter declared in the script's shebang (#!) when present.
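
For example, a script whose shebang points at Python runs under that interpreter with no shell setting at all (a minimal sketch):

yaml
steps:
  - script: |
      #!/usr/bin/env python3
      print("Interpreter selected by the shebang line")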

Shell Selection

Set a default shell for every step at the DAG level, and override it per step when needed:

yaml
shell: ["/bin/bash", "-e", "-u"]  # Default shell + args for the whole workflow
steps:
  - name: bash-task
    command: echo "Runs with bash -e -u"

  - name: zsh-override
    shell: /bin/zsh                # Step-level override
    command: echo "Uses zsh instead"

The shell value accepts either a string ("bash -e") or an array (["bash", "-e"]). Arrays avoid quoting issues when you need multiple flags.

When you omit a step-level shell, Dagu runs the step through the DAG-level shell (or the system default) and automatically adds -e on Unix-like shells so scripts stop at the first error. If you explicitly set shell on a step, include -e yourself if you want the same errexit behavior.
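
For example, to keep errexit behavior while overriding the shell on a single step:

yaml
steps:
  - name: strict-override
    shell: ["/bin/bash", "-e"]   # Add -e back explicitly
    script: |
      false                      # Fails here, so the next line never runs
      echo "not reached"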

Any interpreter can act as the step shell:

yaml
steps:
  - shell: python3
    script: |
      import pandas as pd
      df = pd.read_csv('data.csv')
      print(df.head())

Dependencies

Steps run sequentially by default, each step starting after the previous one finishes. Use depends when you need parallel execution or explicit ordering. These three steps run in order with no depends required:

yaml
steps:
  - name: download
    command: wget data.csv
    
  - name: process
    command: python process.py
    
  - name: upload
    command: aws s3 cp output.csv s3://bucket/

Parallel Execution

You can run steps in parallel using explicit dependencies:

yaml
type: graph
steps:
  - name: setup
    command: echo "Setup"

  - name: task1
    command: echo "Task 1"
    depends: setup

  - name: task2
    command: echo "Task 2"
    depends: setup

  - name: finish
    command: echo "All tasks complete"
    depends: [task1, task2]

Working Directory

Set where commands execute:

yaml
steps:
  - name: in-project
    workingDir: /home/user/project
    command: python main.py
    
  - name: in-data
    workingDir: /data/input
    command: ls -la

Environment Variables

System Environment Security

Dagu filters environment variables passed to step processes for security:

  • System environment variables can be used for expansion anywhere in the DAG YAML: ${VAR}
  • Only a filtered set of variables is passed to the step execution environment
  • Whitelisted variables: PATH, HOME, LANG, TZ, SHELL
  • Allowed prefixes: DAGU_*, LC_*, DAG_*

To make other variables available in step processes, define them in env.

Global Environment

yaml
env:
  - API_KEY: secret123
  - ENV: production
  - AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}  # Explicit reference required

steps:
  - name: use-env
    command: echo "Running in $ENV"

Step-Level Environment

Steps can have their own environment variables that override DAG-level ones:

yaml
env:
  - ENV: production

steps:
  - name: dev-test
    command: echo "Running in $ENV"
    env:
      - ENV: development  # Overrides DAG-level
      - TEST_FLAG: true
    # Output: Running in development

Load from .env Files

yaml
dotenv:
  - .env
  - .env.production

steps:
  - name: use-dotenv
    command: echo $DATABASE_URL
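
Each file holds plain KEY=VALUE pairs; for example, .env might contain (the URL below is only a placeholder):

bash
# .env
DATABASE_URL=postgres://localhost:5432/mydb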

Capturing Output

Store command output in variables:

yaml
steps:
  - name: get-version
    command: git rev-parse --short HEAD
    output: VERSION
    
  - name: build
    command: docker build -t app:${VERSION} .

Basic Error Handling

Continue on Failure

yaml
steps:
  - name: optional-step
    command: maybe-fails.sh
    continueOn:
      failure: true
      
  - name: always-runs
    command: cleanup.sh

Simple Retry

yaml
steps:
  - name: flaky-api
    command: curl https://unstable-api.com
    retryPolicy:
      limit: 3
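
To space attempts out instead of retrying immediately, add an interval between attempts (a minimal sketch using the intervalSec field):

yaml
steps:
  - name: flaky-api
    command: curl https://unstable-api.com
    retryPolicy:
      limit: 3
      intervalSec: 10   # Wait 10 seconds between attempts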

Timeouts

Prevent steps from running forever:

yaml
steps:
  - name: long-task
    command: echo "Processing data"
    timeoutSec: 300  # 5 minutes

Step Descriptions

Document your steps:

yaml
steps:
  - name: etl-process
    description: |
      Extract data from API, transform to CSV,
      and load into data warehouse
    command: python etl.py

Tags and Organization

Group related workflows:

yaml
name: customer-report
tags: 
  - reports
  - customer
  - daily

group: Analytics  # UI grouping
