Using the CLI

The Thoa CLI gives you full control over workloads, compute resources, and environments, which makes it well suited to automation and reproducible runs.
With the CLI, you can submit jobs, manage datasets, explore available tools, and monitor job history — all from your terminal.

Tip:
You can always add --help to any command (like thoa run --help) to see a full list of options and detailed usage examples right in your terminal.

Here's a quick overview of available commands:

thoa run – Submit and run jobs in the cloud using custom code and compute specs
thoa dataset – Manage, list, and query your datasets
thoa tools – Search for available tools and environments
thoa jobs – Monitor your past jobs and query job metadata
thoa envs – View and manage your compute environments

How to Use the API Key

Before running any CLI commands, you need to authenticate with your Thoa API key. The CLI uses this key to connect securely to your workspace and submit jobs.

1. Create an API Key

You can create an API key here:
Create an API Key

Important: Anyone with access to this key can run jobs from your account and view, modify, or delete all of your datasets and results. Store it securely and never commit it to GitHub or share it.

2. Save the API Key

To use your API key with the CLI, it needs to be set as an environment variable called THOA_API_KEY.

To save it permanently:

macOS / Linux (bash / zsh)

You have two options:

Option 1: Manually edit your shell config file

Open your shell config file:

nano ~/.bashrc

(or ~/.zshrc if you are using zsh)

Add this line at the bottom:

export THOA_API_KEY="your-api-key-here"

Then reload your config:

source ~/.bashrc

Option 2: Append it automatically from the command line (replace ~/.bashrc with ~/.zshrc if you are using zsh)

echo 'export THOA_API_KEY="your-api-key-here"' >> ~/.bashrc
source ~/.bashrc
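
Running the append command in Option 2 more than once adds a duplicate export line each time. The following is a minimal sketch of a guarded version; the function name save_thoa_key is hypothetical and not part of the Thoa CLI, and it assumes the key is stored as a line beginning with export THOA_API_KEY=.

```shell
# Hypothetical helper (not part of the Thoa CLI): append the export line
# to a shell config file only if that file does not already set THOA_API_KEY.
save_thoa_key() {
  local key rc_file
  key="$1"
  rc_file="${2:-$HOME/.bashrc}"   # pass ~/.zshrc as the second argument for zsh

  if grep -q '^export THOA_API_KEY=' "$rc_file" 2>/dev/null; then
    echo "THOA_API_KEY is already set in $rc_file; edit that line instead." >&2
    return 1
  fi

  printf 'export THOA_API_KEY="%s"\n' "$key" >> "$rc_file"
}
```

After defining the function, call it as `save_thoa_key "your-api-key-here"` and then reload the config file with source as described above.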

Windows (PowerShell)

Use setx to store the variable permanently:

setx THOA_API_KEY "your-api-key-here"

Important: You must restart PowerShell (or open a new terminal window) after using setx before the variable becomes available.

To set it just for the current session:

The following commands set the API key only until you close the terminal:

macOS / Linux

export THOA_API_KEY="your-api-key-here"

Windows (PowerShell)

$env:THOA_API_KEY = "your-api-key-here"

These will only apply to the current terminal session. Once you close the terminal, you'll need to run the command again.
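
Because a session-only key disappears when the terminal closes, scripts that call the CLI may want to confirm the variable is present before doing anything else. A minimal sketch for bash or zsh follows; the function name require_thoa_key is hypothetical, and the check only tests that the variable is non-empty, not that the key is valid.

```shell
# Hypothetical pre-flight check (not part of the Thoa CLI): fail early with a
# clear message when THOA_API_KEY is missing from the current session.
require_thoa_key() {
  if [ -z "${THOA_API_KEY:-}" ]; then
    echo "THOA_API_KEY is not set; export it before running thoa commands." >&2
    return 1
  fi
  # Report only the length so the key itself never reaches the terminal or logs.
  echo "THOA_API_KEY is set (${#THOA_API_KEY} characters)."
}
```

A script can then start with `require_thoa_key || exit 1` before its first thoa command.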

thoa run

The thoa run command submits jobs to Thoa's compute infrastructure. Each job runs your code in an isolated, reproducible environment at whatever scale you request, with no infrastructure for you to manage.

Every thoa run invocation requires:

A command (--cmd)
Exactly one environment spec (--tools, --env-source, or --env-id)

Input data (--input or --input-dataset) is optional.

Specifying Input Data

There are three ways to handle input data for a job: upload local files, reuse an existing dataset, or run without any input.

Upload local files with `--input`

Pass one or more local paths. Files and directories are both supported.

# Single directory
thoa run --input ./reads/ --cmd "..."

# Multiple specific files
thoa run --input sample1.fastq.gz --input sample2.fastq.gz --cmd "..."

# A mix of files and directories
thoa run --input ./config.yml --input ./data/ --cmd "..."

Files are uploaded before the job starts and mounted inside the container at the same relative path they have on your machine (relative to your current working directory). For example, if you run from /home/user/project/ and pass --input ./reads/sample.fastq.gz, the file will be available at reads/sample.fastq.gz inside the job.

Limit: Up to 1,000 files per job. For larger datasets, consider using --input-dataset with a dataset you have already uploaded.
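
To see in advance whether a set of paths stays under the limit, you can count the files locally before submitting. This is a rough sketch only: check_input_count is a hypothetical helper, and it assumes the limit applies to regular files counted recursively inside any directories you pass, which the text above does not spell out.

```shell
# Hypothetical helper (not part of the Thoa CLI): count the regular files under
# the given paths and compare the total with the documented 1,000-file limit.
# Assumption: files inside directories are counted recursively.
check_input_count() {
  local total
  total="$(find "$@" -type f | wc -l | tr -d ' ')"
  if [ "$total" -gt 1000 ]; then
    echo "$total files: over the 1,000-file limit, consider --input-dataset" >&2
    return 1
  fi
  echo "$total files: within the 1,000-file limit"
}
```

For example, `check_input_count ./config.yml ./data/` reports the combined count for the same paths you would pass to --input.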

Reuse an existing dataset with `--input-dataset`

If you have already uploaded a dataset, or a previous job has created one, you can reuse it without re-uploading. Pass the dataset UUID (visible in the UI or via thoa dataset list):

thoa run \
  --input-dataset 157d2823-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --cmd "python analyse.py" \
  --env-source environment.yml

This skips the upload step entirely, saving time and bandwidth. The files will be staged inside the container at the same paths they had in the original dataset.

--input and --input-dataset are mutually exclusive. Use one or the other.

No input files

If your job doesn't need input data (e.g. a data generation script), simply omit both flags:

thoa run --cmd "python generate_data.py" --tools python

Specifying an Environment

Every job needs exactly one environment source. The three options are mutually exclusive.
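
When thoa run is called from a wrapper script that assembles its arguments dynamically, it can help to check this rule before submitting. The sketch below uses hypothetical helper names and assumes each flag is passed as its own word (for example --tools python rather than --tools=python).

```shell
# Hypothetical pre-flight check (not part of the Thoa CLI): count how many of
# the three environment flags appear in an argument list.
# Assumption: flags are separate words, not written as --flag=value.
count_env_flags() {
  local n=0 arg
  for arg in "$@"; do
    case "$arg" in
      --tools|--env-source|--env-id) n=$((n + 1)) ;;
    esac
  done
  echo "$n"
}

# Succeeds only when exactly one environment flag is present.
has_one_env_flag() {
  [ "$(count_env_flags "$@")" -eq 1 ]
}
```

A wrapper could run `has_one_env_flag "$@" || exit 1` before handing the same arguments to thoa run.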

Option 1 — `--tools` (quick tool list)

The simplest option. Pass a comma-separated list of tool names from Bioconda or conda-forge. Thoa builds a conda environment with those tools before running your job.

thoa run --tools "bwa,samtools=1.9,python" --cmd "..."

Pin specific versions with =:

thoa run --tools "fastqc=0.12.1,multiqc,trimmomatic=0.39" --cmd "..."

--tools is best for quick, ad-hoc jobs where you just need a few packages. For reproducibility and complex dependency trees, prefer --env-source.
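
If a script keeps its tool list as separate words, the comma-separated string for --tools can be built rather than typed by hand. A small sketch, with join_tools as a hypothetical helper name:

```shell
# Hypothetical helper (not part of the Thoa CLI): join package specs into the
# comma-separated form that --tools expects.
join_tools() {
  local IFS=,
  # "$*" joins all arguments using the first character of IFS (a comma here).
  printf '%s\n' "$*"
}
```

Calling `join_tools bwa samtools=1.9 python` prints bwa,samtools=1.9,python, so the result can be passed as `--tools "$(join_tools bwa samtools=1.9 python)"`.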

Option 2 — `--env-source` (conda YAML file)

Point to a local conda environment YAML file (.yml or .yaml). The full conda spec — channels, packages, pinned versions — is sent to Thoa and built before the job runs.

thoa run --env-source environment.yml --cmd "bash run_pipeline.sh"

A typical environment.yml looks like this:

name: my-analysis
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - bwa=0.7.17
  - samtools=1.18
  - fastqc=0.12.1
  - multiqc
  - pandas
  - numpy

Only .yml and .yaml files are accepted. The name: field in the YAML is ignored by Thoa — the environment is identified by its UUID.
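
A script can check the extension rule locally before submitting. The following sketch uses a hypothetical helper, is_env_yaml, which tests only the file name and existence; it does not validate the YAML content.

```shell
# Hypothetical helper (not part of the Thoa CLI): succeed only when the path
# ends in .yml or .yaml and points to an existing regular file.
is_env_yaml() {
  case "$1" in
    *.yml|*.yaml) [ -f "$1" ] ;;
    *) return 1 ;;
  esac
}
```

For example, `is_env_yaml environment.yml || echo "not a usable environment file"` prints the message for a missing file or a file with another extension.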

Once built, the environment is stored and can be reused in future jobs with --env-id (see below), which avoids the rebuild step.

Option 3 — `--env-id` (reuse a built environment)

If you have a previously validated environment, pass its UUID directly. The job skips the environment build step and starts immediately.

thoa run \
  --env-id b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --input ./reads/ \
  --cmd "bash run_pipeline.sh"

To find your environment UUIDs, use:

thoa envs list

Only environments with status validated are usable. If thoa envs list shows validation_failed for an environment, inspect it with thoa envs show <uuid> -v to see the build logs.

Examples

Running a Python script

thoa run \
  --cmd "python script.py" \
  --input ./inputdata \
  --tools "python" \
  --n-cores 16 \
  --ram 64 \
  --storage 10

Running an R script

thoa run \
  --cmd "Rscript analysis.R" \
  --input ./inputdata \
  --tools "r-base" \
  --n-cores 16 \
  --ram 64 \
  --storage 10

Running a Bash script

thoa run \
  --cmd "bash pipeline.sh" \
  --input ./inputdata \
  --tools "bash,coreutils" \
  --n-cores 8 \
  --ram 32 \
  --storage 20

Using a full conda YAML environment

For pipelines with complex or pinned dependencies, define an environment.yml and pass it with --env-source:

# environment.yml
name: wgs-pipeline
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - bwa=0.7.17
  - samtools=1.18
  - picard=3.1.1
  - gatk4=4.4.0.0
  - fastqc=0.12.1
  - multiqc

thoa run \
  --cmd "bash wgs_pipeline.sh" \
  --input ./fastq_files/ \
  --env-source environment.yml \
  --n-cores 32 \
  --ram 128 \
  --storage 500

Reusing an existing environment

After running a job once, the environment is saved in Thoa. Reuse it by ID to skip the build step:

# Find the UUID of your validated environment
thoa envs list

# Submit a new job reusing it
thoa run \
  --cmd "python analyse.py" \
  --input ./new_data/ \
  --env-id b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --n-cores 16 \
  --ram 64 \
  --storage 100
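
Reusing an environment by ID is particularly convenient when the same analysis runs over several input directories. The loop below is a sketch under assumptions: submit_per_dir is a hypothetical wrapper, the job names are illustrative, and resource flags are omitted so the documented defaults apply.

```shell
# Hypothetical wrapper (not part of the Thoa CLI): submit one job per input
# directory, all reusing the same validated environment.
submit_per_dir() {
  local env_id="$1" dir
  shift
  for dir in "$@"; do
    # Name each job after its directory so it is easy to find in thoa jobs list.
    thoa run \
      --cmd "python analyse.py" \
      --input "$dir" \
      --env-id "$env_id" \
      --job-name "analyse-$(basename "$dir")" || return 1
  done
}
```

It would be called as `submit_per_dir b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx ./batch1 ./batch2`, and it stops at the first submission that fails.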

Running a Nextflow pipeline

You can run Nextflow workflows as a single Thoa job today:

thoa run \
  --cmd "nextflow run pipeline.nf -profile standard" \
  --input ./pipeline/ \
  --tools "nextflow,openjdk" \
  --n-cores 32 \
  --ram 128 \
  --storage 500

Current status: Nextflow pipelines run as a single Thoa job — all steps execute sequentially within that job's allocated resources. We are actively working on native Nextflow integration that will split individual workflow steps into separate Thoa jobs, with each step visible, linkable, and independently re-runnable in the Thoa interface.

Running a Snakemake workflow

thoa run \
  --cmd "snakemake --cores 16 --snakefile Snakefile" \
  --input ./workflow/ \
  --tools "snakemake,python" \
  --n-cores 16 \
  --ram 64 \
  --storage 200

Current status: Snakemake workflows run as a single Thoa job. All rules execute within the resources allocated to that job. Native Thoa–Snakemake integration — where each rule becomes an individually tracked Thoa job — is in development.
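
Since all rules share one job's resources, the value given to snakemake --cores should normally match --n-cores. One way to keep the two in step is to set the number once, as in this sketch; submit_snakemake is a hypothetical wrapper and the remaining values are copied from the example above.

```shell
# Hypothetical wrapper (not part of the Thoa CLI): use a single core count for
# both the Thoa allocation (--n-cores) and Snakemake's own --cores option.
submit_snakemake() {
  local cores="$1"
  thoa run \
    --cmd "snakemake --cores ${cores} --snakefile Snakefile" \
    --input ./workflow/ \
    --tools "snakemake,python" \
    --n-cores "${cores}" \
    --ram 64 \
    --storage 200
}
```

Calling `submit_snakemake 16` reproduces the example above, and changing the argument changes both settings together.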

Arguments

Flag	Description
`--cmd` (required)	The shell command to run inside the compute environment.
`--input`	Local path(s) to upload. Use multiple flags for multiple paths. Supports directories.
`--input-dataset`	UUID of an existing dataset to reuse (skips upload). Mutually exclusive with `--input`.
`--output`	Path inside the container where output files will be found. Defaults to `./`.
`--tools`	Comma-separated tool names from Bioconda / conda-forge (e.g. `"bwa,samtools=1.9"`).
`--env-source`	Path to a conda `.yml` / `.yaml` file defining the environment.
`--env-id`	UUID of an existing validated environment to reuse.
`--n-cores`	Number of CPU cores to allocate. Default: 16.
`--ram`	GB of RAM to allocate. Default: 64.
`--storage`	GB of free disk space for outputs (after inputs are mounted). Default: 200.
`--download-dir`	Local directory where output files are downloaded after the job finishes.
`--run-async`	Stream job logs to the terminal in real time. Default: false.
`--job-name`	Optional custom name for the job.
`--job-description`	Optional description for tracking purposes.
`--dry-run`	Validate inputs and print a cost estimate without submitting the job.
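
The --dry-run flag pairs naturally with a confirmation step: print the estimate, then submit only if you approve. The sketch below is illustrative; confirm_and_run is a hypothetical wrapper, and it assumes that thoa run exits with a non-zero status when a dry run finds invalid inputs, which is not stated in the table above.

```shell
# Hypothetical wrapper (not part of the Thoa CLI): run a dry run first, then
# ask for confirmation before submitting the same arguments for real.
# Assumption: a failed dry run returns a non-zero exit status.
confirm_and_run() {
  local answer
  thoa run "$@" --dry-run || return 1
  printf 'Submit this job? [y/N] '
  read -r answer
  if [ "$answer" = "y" ]; then
    thoa run "$@"
  else
    echo "Not submitted."
  fi
}
```

For example, `confirm_and_run --cmd "python script.py" --input ./inputdata --tools "python"` shows the estimate and waits for y before submitting.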

thoa dataset

The thoa dataset command group lets you manage and interact with datasets stored in Thoa.
Datasets are collections of uploaded input files used when launching jobs.
Each dataset is uniquely versioned and stored in the cloud, so you can reuse it across multiple jobs without re-uploading the same data.

Examples

List your top 10 datasets by size

thoa dataset list --sort-by size --number 10

Download specific files from a dataset

thoa dataset download <DATASET_UUID> ./localdir --include "reads/*.fastq.gz"

Find the largest dataset by number of files

thoa dataset list --sort-by files --number 1

Commands

`thoa dataset list`

Lists all datasets available to your workspace.

thoa dataset list

Flag	Description
`--number, -n`	Number of datasets to display
`--sort-by, -s`	Sort field: `created`, `size`, or `files` (default: `created`, descending)
`--desc`	Sort in descending order (default: true)

`thoa dataset download <DATASET_UUID> <DESTINATION_PATH>`

Downloads a dataset (or selected files from it) to a local folder.

thoa dataset download <DATASET_UUID> ./outputdir

Flag	Description
`--include, -i`	Only download files matching these public IDs or globs
`--exclude, -e`	Exclude files matching these public IDs or globs

`thoa dataset ls <DATASET_UUID>`

Lists all files in a dataset by its UUID.

thoa dataset ls 157d2823-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Flag	Description
`--level, -l`	How many levels of the file hierarchy to display (int)

thoa tools

The thoa tools command displays information about the tools currently supported by Thoa job environments.
At the moment, Thoa supports tools from Bioconda and conda-forge, with additional repositories planned.

thoa tools

Supported Repositories

Repository	Contents	Package index
Bioconda	Bioinformatics-focused tools and pipelines	bioconda.github.io
conda-forge	Scientific, data-processing, and general-purpose tools	conda-forge.org/packages

thoa jobs

The thoa jobs command group lets you view and inspect jobs you have previously run. Each job record includes execution status, timestamps, compute specs, and links to input and output datasets.

Examples

List your most recent jobs

thoa jobs list

Show the 5 newest jobs

thoa jobs list --number 5

Sort jobs alphabetically by status

thoa jobs list --sort-by status --asc

Commands

`thoa jobs list`

Displays a table of your jobs with the following columns:

Column	Description
Name	Job name (or auto-generated ID if no name was specified)
ID	Full job UUID
Started	When the job began executing
Status	`created`, `running`, `completed`, `failed`, …
Input Dataset	Name of the input dataset used
Output Dataset	Name of the output dataset produced

thoa jobs list

Flag	Description
`--number, -n`	Limit how many jobs to display
`--sort-by, -s`	Sort field: `started` or `status` (default: `started`)
`--asc`	Sort in ascending order (default: descending)

thoa envs

The thoa envs command group lets you view and inspect the compute environments associated with your jobs.

When you submit a job with --tools or --env-source, Thoa builds and validates a conda environment. This environment is saved with a UUID and can be reused in future jobs using --env-id, avoiding the rebuild step.

Environment Statuses

Status	Meaning
`created`	Environment record created; build not yet started
`validating`	Environment is currently being built and validated
`validated`	Build succeeded; environment is ready to use with `--env-id`
`validation_failed`	Build failed; check build logs with `thoa envs show <uuid> -v`
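
A script that wants to reuse an environment as soon as it is ready could poll its status. The following is a sketch under a stated assumption: it relies on the output of thoa envs show containing the status word exactly as listed in the table above, and the exact output format is not documented here. The function name wait_for_env and the polling interval are hypothetical.

```shell
# Hypothetical helper (not part of the Thoa CLI): poll an environment until it
# reaches a final status.
# Assumption: the output of "thoa envs show <uuid>" contains the status word
# (validated or validation_failed) as plain text.
wait_for_env() {
  local uuid="$1" interval="${2:-30}" details
  while :; do
    details="$(thoa envs show "$uuid")" || return 1
    case "$details" in
      *validation_failed*)
        echo "Build failed; inspect logs with: thoa envs show $uuid -v" >&2
        return 1 ;;
      *validated*)
        return 0 ;;
    esac
    # Still created or validating: wait before checking again.
    sleep "$interval"
  done
}
```

For example, `wait_for_env b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx && echo "ready for --env-id"` checks every 30 seconds by default.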

Examples

List all your environments

thoa envs list

Show the 5 most recent environments

thoa envs list --number 5

Inspect a specific environment

thoa envs show b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Inspect with full build logs (useful for debugging failures)

thoa envs show b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx --verbose

Commands

`thoa envs list`

Displays all your environments in a table:

Column	Description
ID	Full environment UUID (use this with `--env-id`)
Type	Environment type (e.g. `conda`)
Status	Color-coded: green = validated, red = failed, blue = validating, yellow = created
Created	When the environment was created
Tools	First few packages from the environment spec

thoa envs list

Flag	Description
`--number, -n`	Limit how many environments to display

`thoa envs show <UUID>`

Prints full details of a single environment: ID, type, status, created/updated timestamps, and the full conda spec YAML.

thoa envs show b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Add --verbose (or -v) to also print the build logs. This is especially useful for diagnosing why an environment failed to validate:

thoa envs show b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx -v

Flag	Description
`--verbose, -v`	Also print environment build logs

Summary

Use the CLI to:

Upload input data or reuse existing datasets across multiple jobs
Define reproducible environments with --tools, --env-source, or --env-id
Launch jobs at any scale without managing infrastructure
Inspect past jobs and environments from the terminal
Automate large-scale workflows with Nextflow or Snakemake
