Using the CLI
The THOA CLI gives you full control over workloads, compute resources, and environments, which makes it well suited to automation and reproducible runs.
With the CLI, you can submit jobs, manage datasets, explore available tools, and monitor job history — all from your terminal.
Tip:
You can always add --help to any command (like thoa run --help) to see a full list of options and detailed usage examples right in your terminal.
Here's a quick overview of available commands:
- thoa run – Submit and run jobs in the cloud using custom code and compute specs
- thoa dataset – Manage, list, and query your datasets
- thoa tools – Search for available tools and environments
- thoa jobs – Monitor your past jobs and query job metadata
- thoa envs – View and manage your compute environments
How to Use the API Key
Before running any CLI commands, you need to authenticate using your THOA API key. This key allows the CLI to connect securely to your workspace and submit jobs.
1. Create an API Key
You can create an API key here:
Create an API Key
Important: Anyone with access to this key can run jobs from your account and view, modify, or delete all of your datasets and results. Store it securely and never commit it to GitHub or share it.
2. Save the API Key
To use your API key with the CLI, it needs to be set as an environment variable called THOA_API_KEY.
To save it permanently:
macOS / Linux (bash / zsh)
You have two options:
Option 1: Manually edit your shell config file
Open your shell config file:
nano ~/.bashrc
(or ~/.zshrc if you are using zsh)
Add this line at the bottom:
export THOA_API_KEY="your-api-key-here"
Then reload your config:
source ~/.bashrc
Option 2: Append it automatically from the command line
echo 'export THOA_API_KEY="your-api-key-here"' >> ~/.bashrc
source ~/.bashrc
Windows (PowerShell)
Use setx to store the variable permanently:
setx THOA_API_KEY "your-api-key-here"
Important: You must restart PowerShell (or open a new terminal window) after using setx before the variable becomes available.
To set it just for the current session:
If you only want to set the API key temporarily (until you close the terminal), use the following:
macOS / Linux
export THOA_API_KEY="your-api-key-here"
Windows (PowerShell)
$env:THOA_API_KEY = "your-api-key-here"
These will only apply to the current terminal session. Once you close the terminal, you'll need to run the command again.
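Because every command depends on THOA_API_KEY, a script can check for the variable before calling the CLI. The following is a minimal sketch of such a check; the function name require_thoa_api_key is hypothetical and not part of the Thoa CLI.

```shell
# Hypothetical helper (not part of the Thoa CLI): fail early when the
# THOA_API_KEY environment variable is missing or empty.
require_thoa_api_key() {
  if [ -z "${THOA_API_KEY:-}" ]; then
    echo "THOA_API_KEY is not set; export it before running thoa commands." >&2
    return 1
  fi
}
```

In a script, calling `require_thoa_api_key || exit 1` before the first thoa command stops the run with a clear message instead of an authentication error later on.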
thoa run
The thoa run command submits jobs to Thoa's compute infrastructure.
Jobs run arbitrary code in isolated, reproducible environments at any scale, with no infrastructure for you to manage.
Every thoa run invocation requires:
- A command (--cmd)
- Exactly one environment spec (--tools, --env-source, or --env-id)
- Optionally, input data (--input or --input-dataset)
Specifying Input Data
A job's input data can be handled in one of three ways: upload local files, reuse an existing dataset, or supply no input at all.
Upload local files with --input
Pass one or more local paths. Files and directories are both supported.
# Single directory
thoa run --input ./reads/ --cmd "..."
# Multiple specific files
thoa run --input sample1.fastq.gz --input sample2.fastq.gz --cmd "..."
# A mix of files and directories
thoa run --input ./config.yml --input ./data/ --cmd "..."
Files are uploaded before the job starts and mounted inside the container at the same relative path they have on your machine (relative to your current working directory). For example, if you run from /home/user/project/ and pass --input ./reads/sample.fastq.gz, the file will be available at reads/sample.fastq.gz inside the job.
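The path mapping described above can be previewed locally. This sketch assumes the path inside the job is simply the local path with the current working directory removed, as in the example; job_path is a hypothetical helper, and inputs that live outside the working directory are not covered.

```shell
# Hypothetical helper: print the path a local input is expected to have inside
# the job, assuming it equals the path relative to the current directory.
job_path() {
  p=$1
  # Drop a leading "./" so "./reads/x" and "reads/x" give the same result.
  p=${p#./}
  # Strip the working-directory prefix from absolute paths under $PWD.
  p=${p#"$PWD"/}
  printf '%s\n' "$p"
}

job_path ./reads/sample.fastq.gz   # prints: reads/sample.fastq.gz
```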
Limit: Up to 1,000 files per job. For larger datasets, consider using --input-dataset with a dataset you have already uploaded.
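To see whether a set of inputs fits under the limit before submitting, the files can be counted locally. This is a rough sketch: the helper names are hypothetical, and it assumes the limit counts regular files found recursively under each path, which the text above does not spell out.

```shell
# Hypothetical pre-flight check: count regular files under the given paths.
# File names containing newlines would be miscounted by this simple approach.
count_input_files() {
  find "$@" -type f | wc -l | tr -d ' '
}

# Succeeds when the total stays within the 1,000-file limit stated above.
within_file_limit() {
  [ "$(count_input_files "$@")" -le 1000 ]
}
```

For example, `within_file_limit ./reads/ ./config.yml || echo "Too many files; consider --input-dataset"` gives a warning before any upload starts.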
Reuse an existing dataset with --input-dataset
If you have previously run a job or uploaded a dataset, you can reuse that dataset without re-uploading it. Pass the dataset UUID (visible in the UI or via thoa dataset list):
thoa run \
--input-dataset 157d2823-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
--cmd "python analyse.py" \
--env-source environment.yml
This skips the upload step entirely, saving time and bandwidth. The files will be staged inside the container at the same paths they had in the original dataset.
--input and --input-dataset are mutually exclusive. Use one or the other.
No input files
If your job doesn't need input data (e.g. a data generation script), simply omit both flags:
thoa run --cmd "python generate_data.py" --tools python
Specifying an Environment
Every job needs exactly one environment source. The three options are mutually exclusive.
Option 1 — --tools (quick tool list)
The simplest option. Pass a comma-separated list of tool names from Bioconda or conda-forge. Thoa builds a conda environment with those tools before running your job.
thoa run --tools "bwa,samtools=1.9,python" --cmd "..."
Pin specific versions with =:
thoa run --tools "fastqc=0.12.1,multiqc,trimmomatic=0.39" --cmd "..."
--tools is best for quick, ad-hoc jobs where you just need a few packages. For reproducibility and complex dependency trees, prefer --env-source.
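When the tool list is assembled in a script, the comma-separated string can be built from separate arguments. A small sketch follows; join_tools is a hypothetical helper, not a CLI feature.

```shell
# Hypothetical helper: join package specs into the comma-separated form
# that --tools expects. The subshell keeps the IFS change local.
join_tools() {
  (
    IFS=,
    printf '%s\n' "$*"
  )
}

join_tools fastqc=0.12.1 multiqc trimmomatic=0.39
# prints: fastqc=0.12.1,multiqc,trimmomatic=0.39
```

The result can then be passed along as `--tools "$(join_tools bwa samtools=1.9 python)"`.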
Option 2 — --env-source (conda YAML file)
Point to a local conda environment YAML file (.yml or .yaml). The full conda spec — channels, packages, pinned versions — is sent to Thoa and built before the job runs.
thoa run --env-source environment.yml --cmd "bash run_pipeline.sh"
A typical environment.yml looks like this:
name: my-analysis
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- python=3.11
- bwa=0.7.17
- samtools=1.18
- fastqc=0.12.1
- multiqc
- pandas
- numpy
Only .yml and .yaml files are accepted. The name: field in the YAML is ignored by Thoa — the environment is identified by its UUID.
Once built, the environment is stored and can be reused in future jobs with --env-id (see below), which avoids the rebuild step.
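A script can apply the same file-extension rule locally before submitting. The check below is only a sketch; is_env_source is a hypothetical name, and it does not validate the YAML content itself.

```shell
# Hypothetical pre-flight check: accept only an existing file whose name ends
# in .yml or .yaml, mirroring the rule stated above.
is_env_source() {
  case "$1" in
    *.yml | *.yaml) [ -f "$1" ] ;;
    *) return 1 ;;
  esac
}
```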
Option 3 — --env-id (reuse a built environment)
If you have a previously validated environment, pass its UUID directly. The job skips the environment build step and starts immediately.
thoa run \
--env-id b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
--input ./reads/ \
--cmd "bash run_pipeline.sh"
To find your environment UUIDs, use:
thoa envs list
Only environments with status validated are usable. If thoa envs list shows validation_failed for an environment, inspect it with thoa envs show <uuid> -v to see the build logs.
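Since the three environment options are mutually exclusive, a wrapper script may want to confirm that exactly one was supplied before calling thoa run. A minimal sketch, using a hypothetical helper that takes the three candidate values (empty strings for the ones not used):

```shell
# Hypothetical helper: succeed only if exactly one of the three arguments
# (tools list, env source file, env id) is non-empty.
exactly_one_env_spec() {
  n=0
  for v in "$1" "$2" "$3"; do
    if [ -n "$v" ]; then
      n=$((n + 1))
    fi
  done
  [ "$n" -eq 1 ]
}
```

For instance, `exactly_one_env_spec "$TOOLS" "$ENV_SOURCE" "$ENV_ID" || echo "Pass exactly one of --tools, --env-source, --env-id"`, where the three variables are whatever your script collected from its own options.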
Examples
Running a Python script
thoa run \
--cmd "python script.py" \
--input ./inputdata \
--tools "python" \
--n-cores 16 \
--ram 64 \
--storage 10
Running an R script
thoa run \
--cmd "Rscript analysis.R" \
--input ./inputdata \
--tools "r-base" \
--n-cores 16 \
--ram 64 \
--storage 10
Running a Bash script
thoa run \
--cmd "bash pipeline.sh" \
--input ./inputdata \
--tools "bash,coreutils" \
--n-cores 8 \
--ram 32 \
--storage 20
Using a full conda YAML environment
For pipelines with complex or pinned dependencies, define an environment.yml and pass it with --env-source:
# environment.yml
name: wgs-pipeline
channels:
- bioconda
- conda-forge
- defaults
dependencies:
- python=3.11
- bwa=0.7.17
- samtools=1.18
- picard=3.1.1
- gatk4=4.4.0.0
- fastqc=0.12.1
- multiqc
thoa run \
--cmd "bash wgs_pipeline.sh" \
--input ./fastq_files/ \
--env-source environment.yml \
--n-cores 32 \
--ram 128 \
--storage 500
Reusing an existing environment
After a job has run once, its environment is saved in Thoa. Reuse it by ID to skip the build step:
# Find the UUID of your validated environment
thoa envs list
# Submit a new job reusing it
thoa run \
--cmd "python analyse.py" \
--input ./new_data/ \
--env-id b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx \
--n-cores 16 \
--ram 64 \
--storage 100
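Reusing one environment across many inputs lends itself to a loop. The sketch below submits one job per input directory with the same --env-id, using only flags documented on this page. The function name submit_each and the THOA_BIN override are hypothetical and exist only in this example.

```shell
# Hypothetical helper: submit one job per input directory, all reusing the
# same validated environment. Set THOA_BIN=echo to preview the commands
# instead of submitting them (THOA_BIN is an assumption of this sketch).
submit_each() {
  env_id=$1
  shift
  for dir in "$@"; do
    "${THOA_BIN:-thoa}" run \
      --cmd "python analyse.py" \
      --input "$dir" \
      --env-id "$env_id" \
      --job-name "$(basename "$dir")"
  done
}
```

Running `THOA_BIN=echo submit_each b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx ./batch1/ ./batch2/` prints the two commands without contacting Thoa; dropping the THOA_BIN prefix submits them.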
Running a Nextflow pipeline
You can run Nextflow workflows as a single Thoa job today:
thoa run \
--cmd "nextflow run pipeline.nf -profile standard" \
--input ./pipeline/ \
--tools "nextflow,openjdk" \
--n-cores 32 \
--ram 128 \
--storage 500
Current status: Nextflow pipelines run as a single Thoa job — all steps execute sequentially within that job's allocated resources. We are actively working on native Nextflow integration that will split individual workflow steps into separate Thoa jobs, with each step visible, linkable, and independently re-runnable in the Thoa interface.
Running a Snakemake workflow
thoa run \
--cmd "snakemake --cores 16 --snakefile Snakefile" \
--input ./workflow/ \
--tools "snakemake,python" \
--n-cores 16 \
--ram 64 \
--storage 200
Current status: Snakemake workflows run as a single Thoa job. All rules execute within the resources allocated to that job. Native Thoa–Snakemake integration — where each rule becomes an individually tracked Thoa job — is in development.
Arguments
| Flag | Description |
|---|---|
| --cmd (required) | The shell command to run inside the compute environment. |
| --input | Local path(s) to upload. Use multiple flags for multiple paths. Supports directories. |
| --input-dataset | UUID of an existing dataset to reuse (skips upload). Mutually exclusive with --input. |
| --output | Path inside the container where output files will be found. Defaults to ./. |
| --tools | Comma-separated tool names from Bioconda / conda-forge (e.g. "bwa,samtools=1.9"). |
| --env-source | Path to a conda .yml / .yaml file defining the environment. |
| --env-id | UUID of an existing validated environment to reuse. |
| --n-cores | Number of CPU cores to allocate. Default: 16. |
| --ram | GB of RAM to allocate. Default: 64. |
| --storage | GB of free disk space for outputs (after inputs are mounted). Default: 200. |
| --download-dir | Local directory where output files are downloaded after the job finishes. |
| --run-async | Stream job logs to the terminal in real time. Default: false. |
| --job-name | Optional custom name for the job. |
| --job-description | Optional description for tracking purposes. |
| --dry-run | Validate inputs and print a cost estimate without submitting the job. |
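One way to use --dry-run in automation is to validate a job first and submit it only if validation passes. The wrapper below is a sketch under a stated assumption: that thoa run --dry-run exits with a non-zero status when validation fails, which this page does not confirm. The names run_checked and THOA_BIN are hypothetical.

```shell
# Hypothetical wrapper: validate with --dry-run, then submit the same job.
# Assumes `thoa run --dry-run` exits non-zero when validation fails; this is
# an assumption of the sketch, not documented behaviour. Set THOA_BIN=echo
# to preview both commands instead of running them.
run_checked() {
  "${THOA_BIN:-thoa}" run "$@" --dry-run || return 1
  "${THOA_BIN:-thoa}" run "$@"
}
```

A call such as `run_checked --cmd "python script.py" --input ./inputdata --tools "python"` would print the cost estimate first and then submit the identical job.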
thoa dataset
The thoa dataset command group lets you manage and interact with datasets stored in Thoa.
Datasets are collections of uploaded input files used when launching jobs.
Each dataset is uniquely versioned and stored in the cloud, so you can reuse it across multiple jobs without re-uploading the same data.
Examples
List your top 10 datasets by size
thoa dataset list --sort-by size --number 10
Download specific files from a dataset
thoa dataset download <DATASET_UUID> ./localdir --include "reads/*.fastq.gz"
Find the largest dataset by number of files
thoa dataset list --sort-by files --number 1
Commands
thoa dataset list
Lists all datasets available to your workspace.
thoa dataset list
| Flag | Description |
|---|---|
| --number, -n | Number of datasets to display |
| --sort-by, -s | Sort field: created, size, or files (default: created, descending) |
| --desc | Sort in descending order (default: true) |
thoa dataset download <DATASET_UUID> <DESTINATION_PATH>
Downloads a dataset (or selected files from it) to a local folder.
thoa dataset download <DATASET_UUID> ./outputdir
| Flag | Description |
|---|---|
| --include, -i | Only download files matching these public IDs or globs |
| --exclude, -e | Exclude files matching these public IDs or globs |
thoa dataset ls <DATASET_UUID>
Lists all files in a dataset by its UUID.
thoa dataset ls 157d2823-xxxx-xxxx-xxxx-xxxxxxxxxxxx
| Flag | Description |
|---|---|
| --level, -l | How many levels of the file hierarchy to display (int) |
thoa tools
The thoa tools command displays information about the tools currently supported by Thoa job environments.
At the moment, Thoa supports tools from Bioconda and conda-forge, with additional repositories planned.
thoa tools
Supported Repositories
| Repository | Contents | Package index |
|---|---|---|
| Bioconda | Bioinformatics-focused tools and pipelines | bioconda.github.io |
| conda-forge | Scientific, data-processing, and general-purpose tools | conda-forge.org/packages |
thoa jobs
The thoa jobs command group lets you view and inspect jobs you have previously run.
Each job record includes execution status, timestamps, compute specs, and links to input and output datasets.
Examples
List your most recent jobs
thoa jobs list
Show the 5 newest jobs
thoa jobs list --number 5
Sort jobs alphabetically by status
thoa jobs list --sort-by status --asc
Commands
thoa jobs list
Displays a table of your jobs with the following columns:
| Column | Description |
|---|---|
| Name | Job name (or auto-generated ID if no name was specified) |
| ID | Full job UUID |
| Started | When the job began executing |
| Status | created, running, completed, failed, … |
| Input Dataset | Name of the input dataset used |
| Output Dataset | Name of the output dataset produced |
thoa jobs list
| Flag | Description |
|---|---|
| --number, -n | Limit how many jobs to display |
| --sort-by, -s | Sort field: started or status (default: started) |
| --asc | Sort in ascending order (default: descending) |
thoa envs
The thoa envs command group lets you view and inspect the compute environments associated with your jobs.
When you submit a job with --tools or --env-source, Thoa builds and validates a conda environment. This environment is saved with a UUID and can be reused in future jobs using --env-id, avoiding the rebuild step.
Environment Statuses
| Status | Meaning |
|---|---|
| created | Environment record created; build not yet started |
| validating | Environment is currently being built and validated |
| validated | Build succeeded; environment is ready to use with --env-id |
| validation_failed | Build failed; check build logs with thoa envs show <uuid> -v |
Examples
List all your environments
thoa envs list
Show the 5 most recent environments
thoa envs list --number 5
Inspect a specific environment
thoa envs show b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Inspect with full build logs (useful for debugging failures)
thoa envs show b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx --verbose
Commands
thoa envs list
Displays all your environments in a table:
| Column | Description |
|---|---|
| ID | Full environment UUID (use this with --env-id) |
| Type | Environment type (e.g. conda) |
| Status | Color-coded: green = validated, red = failed, blue = validating, yellow = created |
| Created | When the environment was created |
| Tools | First few packages from the environment spec |
thoa envs list
| Flag | Description |
|---|---|
| --number, -n | Limit how many environments to display |
thoa envs show <UUID>
Prints full details of a single environment: ID, type, status, created/updated timestamps, and the full conda spec YAML.
thoa envs show b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Add --verbose (or -v) to also print the build logs. This is especially useful for diagnosing why an environment failed to validate:
thoa envs show b0f9fefe-xxxx-xxxx-xxxx-xxxxxxxxxxxx -v
| Flag | Description |
|---|---|
| --verbose, -v | Also print environment build logs |
Summary
Use the CLI to:
- Upload input data or reuse existing datasets across multiple jobs
- Define reproducible environments with --tools, --env-source, or --env-id
- Launch jobs at any scale without managing infrastructure
- Inspect past jobs and environments from the terminal
- Automate large-scale workflows with Nextflow or Snakemake