Using the CLI

The THOA CLI gives you full control over workloads, compute resources, and environments, which makes it well suited to automation and reproducible workflows.
With the CLI, you can submit jobs, manage datasets, explore available tools, and monitor job history — all from your terminal.

Tip:
You can always add --help to any command (like thoa run --help) to see a full list of options and detailed usage examples right in your terminal.

Here's a quick overview of available commands:

  • thoa run – Submit and run jobs in the cloud using custom code and compute specs
  • thoa dataset – Manage, list, and query your datasets
  • thoa tools – Search for available tools and environments
  • thoa jobs – Monitor your past jobs and query job metadata
  • thoa envs – View and manage your compute environments

How to Use the API Key

Before running any CLI commands, you need to authenticate using your THOA API key. This key allows the CLI to connect securely to your workspace and submit jobs.

1. Create an API Key

You can create an API key here:
Create an API Key

Important: Anyone with access to this key can run jobs from your account and view, modify, or delete all of your datasets and results. Store it securely and never commit it to GitHub or share it.


2. Save the API Key

To use your API key with the CLI, it needs to be set as an environment variable called THOA_API_KEY.

To save it permanently:

macOS / Linux (bash / zsh)

You have two options:

Option 1: Manually edit your shell config file

Open your shell config file:

nano ~/.bashrc

(or ~/.zshrc if you are using zsh)

Add this line at the bottom:

export THOA_API_KEY="your-api-key-here"

Then reload your config:

source ~/.bashrc

Option 2: Append it automatically from the command line (use ~/.zshrc in place of ~/.bashrc if you are using zsh)

echo 'export THOA_API_KEY="your-api-key-here"' >> ~/.bashrc
source ~/.bashrc

Windows (PowerShell)

Use setx to store the variable permanently:

setx THOA_API_KEY "your-api-key-here"

Important: You must restart PowerShell (or open a new terminal window) after using setx before the variable becomes available.


To set it just for the current session:

To set the API key only until you close the terminal, use one of the following:

macOS / Linux (bash / zsh)

export THOA_API_KEY="your-api-key-here"

Windows (PowerShell)

$env:THOA_API_KEY = "your-api-key-here"

These settings apply only to the current terminal session. In a new terminal, you'll need to run the command again.
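Before running other commands, it can help to confirm that the variable is actually visible in the current shell. The following is a minimal sketch; check_thoa_key is a hypothetical helper name, not part of the THOA CLI, and it only checks that the variable is non-empty, not that the key is valid.

```shell
# Return non-zero (with a message on stderr) when THOA_API_KEY is unset or empty.
# check_thoa_key is a hypothetical helper, not a thoa subcommand.
check_thoa_key() {
  if [ -z "${THOA_API_KEY:-}" ]; then
    echo "THOA_API_KEY is not set; export it before running thoa commands" >&2
    return 1
  fi
  # Deliberately do not print the key itself, to keep it out of logs.
  echo "THOA_API_KEY is set"
}

# Usage:
#   check_thoa_key && thoa jobs list
```

The helper never echoes the key, so its output is safe to leave in terminal history or CI logs.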


thoa run

The thoa run command submits jobs to Thoa's compute infrastructure. Jobs run arbitrary code in isolated, reproducible environments, at any scale, with no infrastructure for you to manage.

Every thoa run invocation requires:

  • A command (--cmd)
  • Exactly one environment spec (--tools, --env-source, or --env-id)
  • Optionally, input data (--input or --input-dataset)

Specifying Input Data

There are three input modes: upload local files, reuse an existing dataset, or run with no input data.

Upload local files with --input

Pass one or more local paths. Files and directories are both supported.

# Single directory
thoa run --input ./reads/ --cmd "..."

# Multiple specific files
thoa run --input sample1.fastq.gz --input sample2.fastq.gz --cmd "..."

# A mix of files and directories
thoa run --input ./config.yml --input ./data/ --cmd "..."

Files are uploaded before the job starts and mounted inside the container at the same relative path they have on your machine (relative to your current working directory). For example, if you run from /home/user/project/ and pass --input ./reads/sample.fastq.gz, the file will be available at reads/sample.fastq.gz inside the job.

Limit: Up to 1,000 files per job. For larger datasets, consider using --input-dataset with a dataset you have already uploaded.
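To see whether a set of paths fits under the limit before submitting, you can count the files locally. This is a rough sketch: count_input_files and within_file_limit are hypothetical helper names, and it assumes Thoa counts regular files the same way find does (symlinks are not counted here, and file names containing newlines would be miscounted).

```shell
# Count regular files under the given paths (a file counts as 1, directories are walked).
# count_input_files is a hypothetical helper, not a thoa subcommand.
count_input_files() {
  find "$@" -type f | wc -l | tr -d ' '
}

# Succeed only when the total is within the documented 1,000-file limit.
within_file_limit() {
  [ "$(count_input_files "$@")" -le 1000 ]
}

# Usage:
#   within_file_limit ./reads/ ./config.yml \
#     && thoa run --input ./reads/ --input ./config.yml --cmd "..."
```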

Reuse an existing dataset with --input-dataset

If you have previously run a job or uploaded a dataset, you can reuse it without re-uploading. Pass the dataset UUID (visible in the UI or via thoa dataset list):

thoa run \
  --input-dataset 157d2823-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --cmd "python analyse.py" \
  --env-source environment.yml

This skips the upload step entirely, saving time and bandwidth. The files will be staged inside the container at the same paths they had in the original dataset.

--input and --input-dataset are mutually exclusive. Use one or the other.

No input files

If your job doesn't need input data (e.g. a data generation script), simply omit both flags:

thoa run --cmd "python generate_data.py" --tools python

Specifying an Environment

Every job needs exactly one environment source. The three options are mutually exclusive.


Option 1 — --tools (quick tool list)

The simplest option. Pass a comma-separated list of tool names from Bioconda or conda-forge. Thoa builds a conda environment with those tools before running your job.

thoa run --tools "bwa,samtools=1.9,python" --cmd "..."

Pin specific versions with =:

thoa run --tools "fastqc=0.12.1,multiqc,trimmomatic=0.39" --cmd "..."

--tools is best for quick, ad-hoc jobs where you just need a few packages. For reproducibility and complex dependency trees, prefer --env-source.
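When the tool list is assembled in a script, a small helper can build the comma-separated value for you. This is only a sketch; join_tools is a hypothetical name, and it does not check that the packages exist in Bioconda or conda-forge.

```shell
# Join all arguments with commas to form a value for --tools.
# join_tools is a hypothetical helper, not part of the THOA CLI.
join_tools() {
  # "$*" joins the arguments using the first character of IFS;
  # the subshell keeps the IFS change from leaking out.
  ( IFS=,; printf '%s\n' "$*" )
}

# Usage:
#   thoa run --tools "$(join_tools bwa samtools=1.9 python)" --cmd "..."
```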


Option 2 — --env-source (conda YAML file)

Point to a local conda environment YAML file (.yml or .yaml). The full conda spec — channels, packages, pinned versions — is sent to Thoa and built before the job runs.

thoa run --env-source environment.yml --cmd "bash run_pipeline.sh"

A typical environment.yml looks like this:

name: my-analysis
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - bwa=0.7.17
  - samtools=1.18
  - fastqc=0.12.1
  - multiqc
  - pandas
  - numpy

Only .yml and .yaml files are accepted. The name: field in the YAML is ignored by Thoa — the environment is identified by its UUID.

Once built, the environment is stored and can be reused in future jobs with --env-id (see below), which avoids the rebuild step.
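Because only .yml and .yaml files are accepted, a script can check the file name locally before submitting. The sketch below uses a hypothetical helper, is_conda_yaml; it checks the extension and that the file exists, and makes no attempt to validate the YAML itself.

```shell
# Succeed only if the path ends in .yml or .yaml and is an existing regular file.
# is_conda_yaml is a hypothetical helper, not a thoa subcommand.
is_conda_yaml() {
  case "$1" in
    *.yml|*.yaml) [ -f "$1" ] ;;
    *) return 1 ;;
  esac
}

# Usage:
#   is_conda_yaml environment.yml \
#     && thoa run --env-source environment.yml --cmd "bash run_pipeline.sh"
```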


Option 3 — --env-id (reuse a built environment)

If you have a previously validated environment, pass its UUID directly. The job skips the environment build step and starts immediately.

thoa run \
  --env-id b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --input ./reads/ \
  --cmd "bash run_pipeline.sh"

To find your environment UUIDs, use:

thoa envs list

Only environments with status validated are usable. If thoa envs list shows validation_failed for an environment, inspect it with thoa envs show <uuid> -v to see the build logs.


Examples

Running a Python script

thoa run \
  --cmd "python script.py" \
  --input ./inputdata \
  --tools "python" \
  --n-cores 16 \
  --ram 64 \
  --storage 10

Running an R script

thoa run \
  --cmd "Rscript analysis.R" \
  --input ./inputdata \
  --tools "r-base" \
  --n-cores 16 \
  --ram 64 \
  --storage 10

Running a Bash script

thoa run \
  --cmd "bash pipeline.sh" \
  --input ./inputdata \
  --tools "bash,coreutils" \
  --n-cores 8 \
  --ram 32 \
  --storage 20

Using a full conda YAML environment

For pipelines with complex or pinned dependencies, define an environment.yml and pass it with --env-source:

# environment.yml
name: wgs-pipeline
channels:
  - bioconda
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - bwa=0.7.17
  - samtools=1.18
  - picard=3.1.1
  - gatk4=4.4.0.0
  - fastqc=0.12.1
  - multiqc

thoa run \
  --cmd "bash wgs_pipeline.sh" \
  --input ./fastq_files/ \
  --env-source environment.yml \
  --n-cores 32 \
  --ram 128 \
  --storage 500

Reusing an existing environment

After running a job once, the environment is saved in Thoa. Reuse it by ID to skip the build step:

# Find the UUID of your validated environment
thoa envs list

# Submit a new job reusing it
thoa run \
  --cmd "python analyse.py" \
  --input ./new_data/ \
  --env-id b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
  --n-cores 16 \
  --ram 64 \
  --storage 100
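Reusing one environment pays off most when the same analysis runs over several input directories. The loop below is a sketch under some assumptions: submit_batches and the THOA_BIN variable are hypothetical (THOA_BIN is not read by thoa itself; it only lets you substitute echo to preview the commands), and the job names are purely illustrative.

```shell
# THOA_BIN is an assumption of this sketch: set it to `echo` to preview commands.
THOA_BIN="${THOA_BIN:-thoa}"

# submit_batches ENV_ID DIR...
# Submits one `thoa run` per directory, all reusing the same environment UUID.
submit_batches() {
  env_id=$1
  shift
  for dir in "$@"; do
    "$THOA_BIN" run \
      --cmd "python analyse.py" \
      --input "$dir" \
      --env-id "$env_id" \
      --job-name "analyse-$(basename "$dir")" || return 1
  done
}

# Usage:
#   submit_batches b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx ./batch1/ ./batch2/
```

Each directory is mounted at its own relative path inside the job, so the script being run needs to look for its inputs there.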

Running a Nextflow pipeline

You can run Nextflow workflows as a single Thoa job today:

thoa run \
  --cmd "nextflow run pipeline.nf -profile standard" \
  --input ./pipeline/ \
  --tools "nextflow,openjdk" \
  --n-cores 32 \
  --ram 128 \
  --storage 500

Current status: Nextflow pipelines run as a single Thoa job — all steps execute sequentially within that job's allocated resources. We are actively working on native Nextflow integration that will split individual workflow steps into separate Thoa jobs, with each step visible, linkable, and independently re-runnable in the Thoa interface.

Running a Snakemake workflow

thoa run \
  --cmd "snakemake --cores 16 --snakefile Snakefile" \
  --input ./workflow/ \
  --tools "snakemake,python" \
  --n-cores 16 \
  --ram 64 \
  --storage 200

Current status: Snakemake workflows run as a single Thoa job. All rules execute within the resources allocated to that job. Native Thoa–Snakemake integration — where each rule becomes an individually tracked Thoa job — is in development.
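Since all rules share the resources of one job, the value passed to snakemake --cores should normally match --n-cores. One way to keep the two in step is to set the core count once, as in this sketch. run_snakemake and THOA_BIN are hypothetical names introduced here for illustration; the other values are copied from the example above.

```shell
# THOA_BIN is an assumption of this sketch: set it to `echo` to preview the command.
THOA_BIN="${THOA_BIN:-thoa}"

# run_snakemake CORES
# Uses the same core count for Snakemake and for the Thoa job allocation.
run_snakemake() {
  cores=$1
  "$THOA_BIN" run \
    --cmd "snakemake --cores ${cores} --snakefile Snakefile" \
    --input ./workflow/ \
    --tools "snakemake,python" \
    --n-cores "$cores" \
    --ram 64 \
    --storage 200
}

# Usage:
#   run_snakemake 16
```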


Arguments

Flag               Description
--cmd (required)   The shell command to run inside the compute environment.
--input            Local path(s) to upload. Use multiple flags for multiple paths. Supports directories.
--input-dataset    UUID of an existing dataset to reuse (skips upload). Mutually exclusive with --input.
--output           Path inside the container where output files will be found. Defaults to ./.
--tools            Comma-separated tool names from Bioconda / conda-forge (e.g. "bwa,samtools=1.9").
--env-source       Path to a conda .yml / .yaml file defining the environment.
--env-id           UUID of an existing validated environment to reuse.
--n-cores          Number of CPU cores to allocate. Default: 16.
--ram              GB of RAM to allocate. Default: 64.
--storage          GB of free disk space for outputs (after inputs are mounted). Default: 200.
--download-dir     Local directory where output files are downloaded after the job finishes.
--run-async        Stream job logs to the terminal in real time. Default: false.
--job-name         Optional custom name for the job.
--job-description  Optional description for tracking purposes.
--dry-run          Validate inputs and print a cost estimate without submitting the job.
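A common pattern is to run a job with --dry-run first, read the cost estimate, and submit only after confirming. The wrapper below is a sketch, not part of the CLI: run_with_dry_run and THOA_BIN are hypothetical names, and it assumes that thoa run --dry-run exits with a non-zero status when validation fails.

```shell
# THOA_BIN is an assumption of this sketch: set it to `echo` to preview commands.
THOA_BIN="${THOA_BIN:-thoa}"

# run_with_dry_run ARGS...
# Runs `thoa run --dry-run ARGS`, asks for confirmation, then runs `thoa run ARGS`.
run_with_dry_run() {
  # Stop here if the dry run reports a problem (assumed to be a non-zero exit).
  "$THOA_BIN" run --dry-run "$@" || return 1
  printf 'Submit this job? [y/N] ' >&2
  read -r answer || return 1
  case "$answer" in
    y|Y) "$THOA_BIN" run "$@" ;;
    *) echo "Not submitted." >&2; return 1 ;;
  esac
}

# Usage:
#   run_with_dry_run --cmd "python script.py" --input ./inputdata --tools "python"
```

Passing the arguments through "$@" means the dry run and the real submission see exactly the same flags.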

thoa dataset

The thoa dataset command group lets you manage and interact with datasets stored in Thoa.
Datasets are collections of uploaded input files used when launching jobs.
Each dataset is uniquely versioned and stored in the cloud, so you can reuse them across multiple jobs without re-uploading the same data.


Examples

List your top 10 datasets by size

thoa dataset list --sort-by size --number 10

Download specific files from a dataset

thoa dataset download <DATASET_UUID> ./localdir --include "reads/*.fastq.gz"

Find the largest dataset by number of files

thoa dataset list --sort-by files --number 1

Commands

thoa dataset list

Lists all datasets available to your workspace.

thoa dataset list

Flag           Description
--number, -n   Number of datasets to display
--sort-by, -s  Sort field: created, size, or files (default: created, descending)
--desc         Sort in descending order (default: true)

thoa dataset download <DATASET_UUID> <DESTINATION_PATH>

Downloads a dataset (or selected files from it) to a local folder.

thoa dataset download <DATASET_UUID> ./outputdir

Flag           Description
--include, -i  Only download files matching these public IDs or globs
--exclude, -e  Exclude files matching these public IDs or globs
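When passing a glob to --include or --exclude, quote it so that your local shell does not expand it against files in the current directory before thoa sees it. The sketch below illustrates the difference with a hypothetical helper, show_args, which prints each argument it receives on its own line; it does not call thoa.

```shell
# Print each received argument in brackets, one per line.
# show_args is a hypothetical helper used only to illustrate shell quoting.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# In a directory that contains reads/a.fastq.gz and reads/b.fastq.gz:
#   show_args --include reads/*.fastq.gz     # unquoted: the shell expands the glob into local paths
#   show_args --include "reads/*.fastq.gz"   # quoted: the pattern is passed through as one argument
```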

thoa dataset ls <DATASET_UUID>

Lists all files in a dataset by its UUID.

thoa dataset ls 157d2823-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Flag         Description
--level, -l  How many levels of the file hierarchy to display (int)

thoa tools

The thoa tools command displays information about the tools currently supported by Thoa job environments.
At the moment, Thoa supports tools from Bioconda and conda-forge, with additional repositories planned.

thoa tools

Supported Repositories

Repository   Contents                                                Package index
Bioconda     Bioinformatics-focused tools and pipelines              bioconda.github.io
conda-forge  Scientific, data-processing, and general-purpose tools  conda-forge.org/packages

thoa jobs

The thoa jobs command group lets you view and inspect jobs you have previously run. Each job record includes execution status, timestamps, compute specs, and links to input and output datasets.


Examples

List your most recent jobs

thoa jobs list

Show the 5 newest jobs

thoa jobs list --number 5

Sort jobs alphabetically by status

thoa jobs list --sort-by status --asc

Commands

thoa jobs list

Displays a table of your jobs with the following columns:

Column          Description
Name            Job name (or auto-generated ID if no name was specified)
ID              Full job UUID
Started         When the job began executing
Status          created, running, completed, failed, …
Input Dataset   Name of the input dataset used
Output Dataset  Name of the output dataset produced

thoa jobs list

Flag           Description
--number, -n   Limit how many jobs to display
--sort-by, -s  Sort field: started or status (default: started)
--asc          Sort in ascending order (default: descending)

thoa envs

The thoa envs command group lets you view and inspect the compute environments associated with your jobs.

When you submit a job with --tools or --env-source, Thoa builds and validates a conda environment. This environment is saved with a UUID and can be reused in future jobs using --env-id, avoiding the rebuild step.


Environment Statuses

Status             Meaning
created            Environment record created; build not yet started
validating         Environment is currently being built and validated
validated          Build succeeded; environment is ready to use with --env-id
validation_failed  Build failed; check build logs with thoa envs show <uuid> -v

Examples

List all your environments

thoa envs list

Show the 5 most recent environments

thoa envs list --number 5

Inspect a specific environment

thoa envs show b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Inspect with full build logs (useful for debugging failures)

thoa envs show b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx --verbose

Commands

thoa envs list

Displays all your environments in a table:

Column   Description
ID       Full environment UUID (use this with --env-id)
Type     Environment type (e.g. conda)
Status   Color-coded: green = validated, red = failed, blue = validating, yellow = created
Created  When the environment was created
Tools    First few packages from the environment spec

thoa envs list

Flag          Description
--number, -n  Limit how many environments to display

thoa envs show <UUID>

Prints full details of a single environment: ID, type, status, created/updated timestamps, and the full conda spec YAML.

thoa envs show b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx

Add --verbose (or -v) to also print the build logs. This is especially useful for diagnosing why an environment failed to validate:

thoa envs show b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx -v

Flag           Description
--verbose, -v  Also print environment build logs

Summary

Use the CLI to:

  • Upload input data or reuse existing datasets across multiple jobs
  • Define reproducible environments with --tools, --env-source, or --env-id
  • Launch jobs at any scale without managing infrastructure
  • Inspect past jobs and environments from the terminal
  • Automate large-scale workflows with Nextflow or Snakemake