Using the CLI
The THOA CLI gives you full control over workloads, compute resources, and environments, which makes it well suited to automation and reproducibility.
With the CLI, you can submit jobs, manage datasets, explore available tools, and monitor job history, all from your terminal.
Tip:
You can always add --help to any command (like thoa run --help) to see a full list of options and detailed usage examples right in your terminal.
Here’s a quick overview of available commands:
- thoa run – Submit and run jobs in the cloud using custom code and compute specs
- thoa dataset – Manage, list, and query your datasets
- thoa tools – Search for available tools and environments
- thoa jobs – Monitor your past jobs and query job metadata
How to Use the API Key
Before running any CLI commands, you need to authenticate using your THOA API key. This key allows the CLI to connect securely to your workspace and submit jobs.
1. Create an API Key
You can create an API key here:
Create an API Key
Important: Anyone with access to this key can run jobs from your account and view, modify, or delete all of your datasets and results. Store it securely and never commit it to GitHub or share it.
2. Save the API Key
To use your API key with the CLI, it needs to be set as an environment variable called THOA_API_KEY.
To save it permanently:
macOS / Linux (bash / zsh)
You have two options:
Option 1: Manually edit your shell config file
Open your shell config file:
nano ~/.bashrc
(or ~/.zshrc if you are using zsh)
Add this line at the bottom:
export THOA_API_KEY="your-api-key-here"
Then reload your config:
source ~/.bashrc
Option 2: Append it automatically from the command line
echo 'export THOA_API_KEY="your-api-key-here"' >> ~/.bashrc
source ~/.bashrc
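If you would rather not keep the key directly in your shell config (for example, because your dotfiles are under version control), one option is to store it in a separate file that only you can read and source that file from ~/.bashrc. The following is a minimal sketch: the helper name save_thoa_key and the file path ~/.thoa_env are illustrative assumptions, not part of the THOA CLI, and the key is assumed to contain no double quotes, dollar signs, backticks, or backslashes.

```shell
# Hypothetical helper (not part of the THOA CLI): write the key to a
# private file so it stays out of version-controlled dotfiles.
# Usage: save_thoa_key "<api-key>" [file]   (file defaults to ~/.thoa_env)
save_thoa_key() {
  key_file="${2:-$HOME/.thoa_env}"
  # The subshell keeps the restrictive umask from leaking into the caller.
  ( umask 077 && printf 'export THOA_API_KEY="%s"\n' "$1" > "$key_file" )
  # Tighten permissions even if the file already existed.
  chmod 600 "$key_file"
}
```

After running save_thoa_key once, a line such as `[ -f ~/.thoa_env ] && . ~/.thoa_env` in ~/.bashrc (or ~/.zshrc) would load the key in every new session.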
Windows (PowerShell)
Use setx to store the variable permanently:
setx THOA_API_KEY "your-api-key-here"
Important: You must restart PowerShell (or open a new terminal window) after using setx before the variable becomes available.
To set it just for the current session:
To set the API key temporarily (until you close the terminal), run one of the following:
macOS / Linux
export THOA_API_KEY="your-api-key-here"
Windows (PowerShell)
$env:THOA_API_KEY = "your-api-key-here"
These will only apply to the current terminal session. Once you close the terminal, you’ll need to run the command again.
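Because a missing key is easy to overlook in scripts, it can help to check for it before calling the CLI. The sketch below assumes only that the variable is named THOA_API_KEY, as described above; the function name require_thoa_key is hypothetical.

```shell
# Hypothetical helper (not part of the THOA CLI): fail early when
# THOA_API_KEY is unset or empty in the current shell.
require_thoa_key() {
  if [ -z "${THOA_API_KEY:-}" ]; then
    echo "THOA_API_KEY is not set; export it before running thoa commands." >&2
    return 1
  fi
  return 0
}
```

In a script, this could be used as `require_thoa_key && thoa jobs list`, so that the CLI is only called when a key is present.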
thoa run
Description
The thoa run command is used to submit jobs to Thoa’s compute infrastructure.
Jobs let you run arbitrary code in isolated, reproducible environments of nearly any size, all without managing infrastructure.
Examples
Running a Python script
thoa run --cmd "python script.py" \
--input ./inputdata \
--output ./ \
--tools python \
--n-cores 16 \
--ram 64 \
--storage 10
Running an R script
thoa run --cmd "Rscript script.R" \
--input ./inputdata \
--output ./ \
--tools r-base \
--n-cores 16 \
--ram 64 \
--storage 10
Running a Bash script
thoa run --cmd "bash script.sh" \
--input ./inputdata \
--output ./ \
--tools bash \
--n-cores 16 \
--ram 64 \
--storage 10
Running raw Bash
thoa run --cmd 'echo "Hello from THOA!" && touch file.txt' \
--input ./inputdata \
--output ./outputdata \
--tools bash \
--n-cores 16 \
--ram 64 \
--storage 10
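Since each example above is a single command, the same pattern can be wrapped in a loop to automate several submissions. The following is a rough sketch that submits one job per sample directory; the function name submit_samples, the one-subdirectory-per-sample layout, and the choice of script.py are illustrative assumptions. It uses only flags listed in the Arguments table, and assumes --job-name takes the name as its value.

```shell
# Hypothetical helper: submit one job per sample directory.
# Usage: submit_samples <parent_dir>
submit_samples() {
  for sample_dir in "$1"/*/; do
    # Skip the unexpanded pattern when the parent has no subdirectories.
    [ -d "$sample_dir" ] || continue
    sample_name="$(basename "$sample_dir")"
    thoa run --cmd "python script.py" \
      --input "$sample_dir" \
      --output ./ \
      --tools python \
      --n-cores 16 \
      --ram 64 \
      --storage 10 \
      --job-name "$sample_name"
  done
}
```

Naming each job after its directory should make the runs easier to tell apart later in thoa jobs list.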
🧪 Running a Nextflow workflow (Coming soon!)
thoa run --cmd 'nextflow run pipeline.nf' \
--input ./inputdata \
--output ./outputdata \
--tools nextflow \
--n-cores 16 \
--ram 64 \
--storage 50
🧪 Running a Snakemake workflow (Coming soon!)
Check back later for more details on running Snakemake pipelines in Thoa.
Arguments
| Flag | Description |
|---|---|
| --input | Path(s) to input files or folders. Defaults to the current directory. |
| --input-dataset | Use a pre-uploaded dataset by ID instead of re-uploading files. |
| --output (required) | Folder path inside the job container where output files will be found. |
| --cmd | The shell command to run inside the compute environment. |
| --tools | Comma-separated tools to load (e.g. fastqc, r-base, python). |
| --env-source | Define the environment using a Conda YAML, Docker image, or saved env. |
| --download-dir | Local directory where output files will be downloaded after the job finishes. |
| --n-cores | Number of CPU cores to allocate. |
| --ram | GB of RAM to allocate. |
| --storage | GB of free space available for outputs after inputs are uploaded. |
| --run-async | If true, streams logs and outputs in real time. Defaults to false. |
| --job-name | (Optional) Give your job a custom name. |
| --job-description | (Optional) Text description to help you track the job. |
| --dry-run | Runs validation only, without submitting the job. Useful for testing. |
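One way to use --dry-run in automation is to validate a job first and submit it only if validation passes. The sketch below assumes that --dry-run is a plain switch with no value and that a failed validation exits with a non-zero status; neither detail is confirmed here, so check thoa run --help before relying on it. The wrapper name checked_run is hypothetical.

```shell
# Hypothetical wrapper: validate with --dry-run, then submit only on success.
# Assumes --dry-run is a plain switch and that failed validation exits non-zero.
# Usage: checked_run --cmd "python script.py" --input ./inputdata --output ./ ...
checked_run() {
  if thoa run "$@" --dry-run; then
    thoa run "$@"
  else
    echo "Validation failed; job not submitted." >&2
    return 1
  fi
}
```

Passing the arguments through "$@" keeps quoted values such as the --cmd string intact in both calls.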
thoa dataset
Description
The thoa dataset command group lets you manage and interact with datasets stored in Thoa.
Datasets are collections of uploaded input files used when launching jobs.
Each dataset is uniquely versioned and stored in the cloud, so you can reuse them across multiple jobs without re-uploading the same data.
Examples
List your top 10 datasets by size
thoa dataset list --sort-by size --number 10
Download specific files from a dataset
thoa dataset download <DATASET_UUID> ./localdir --include "reads/*.fastq.gz"
Find the largest dataset by number of files
thoa dataset list --sort-by files --number 1
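To illustrate reusing a dataset across jobs, the sketch below passes a dataset ID to thoa run through --input-dataset instead of uploading files with --input. It assumes that the ID expected by --input-dataset is the dataset UUID shown by thoa dataset list; the function name run_on_dataset and the resource values are illustrative.

```shell
# Hypothetical helper: run a command against an already-uploaded dataset.
# Assumes --input-dataset accepts the dataset UUID.
# Usage: run_on_dataset <DATASET_UUID> <command>
run_on_dataset() {
  thoa run --cmd "$2" \
    --input-dataset "$1" \
    --output ./ \
    --tools python \
    --n-cores 16 \
    --ram 64 \
    --storage 10
}
```

Calling the helper twice with the same UUID and different commands would launch two jobs against the same uploaded data.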
Arguments
thoa dataset list
Lists all datasets available to your workspace.
thoa dataset list
Optional flags:
| Flag | Description |
|---|---|
| --number, -n | Number of datasets to display |
| --sort-by, -s | Sort field: created, size, or files (default: created, descending) |
| --desc | Sort in descending order (default: true) |
thoa dataset download <DATASET_UUID> <DESTINATION_PATH>
Downloads a dataset (or selected files from it) to a local folder.
thoa dataset download <DATASET_UUID> ./outputdir
Optional flags:
| Flag | Description |
|---|---|
| --include, -i | Only download files matching these public IDs or globs |
| --exclude, -e | Exclude files matching these public IDs or globs |
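When passing globs to --include or --exclude, quote them so that your local shell does not expand them against files in the current directory before the CLI sees them. The sketch below combines both flags in one call, which is an assumption about the CLI rather than documented behavior; the function name download_reads and the reads/tmp/* pattern are purely illustrative.

```shell
# Hypothetical helper: download compressed FASTQ reads but skip a temp subtree.
# Assumes --include and --exclude may be combined in a single call.
# Usage: download_reads <DATASET_UUID> <DESTINATION_PATH>
download_reads() {
  # Quoted patterns reach the CLI literally instead of being expanded locally.
  thoa dataset download "$1" "$2" \
    --include "reads/*.fastq.gz" \
    --exclude "reads/tmp/*"
}
```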
thoa dataset ls <DATASET_UUID>
Lists all files in a dataset by its UUID.
thoa dataset ls 157d2823-xxxx-xxxx-xxxx-xxxxxxxxxxxx
Optional flags:
| Flag | Description |
|---|---|
| --level, -l | How many levels of the file hierarchy to display (int) |
thoa tools
Description
The thoa tools command displays information about the tools currently supported by Thoa job environments.
At the moment, Thoa supports tools from Bioconda and conda-forge, with additional repositories planned soon.
Running thoa tools provides direct links to the complete lists of available tools.
Examples
Show all supported tools
thoa tools
Supported Tools
At the moment, Thoa supports every package available from the following repositories:
- Bioconda – Bioinformatics-focused tools and pipelines. https://bioconda.github.io/conda-package_index.html
- conda-forge – A wide range of scientific, data-processing, and general-purpose tools. https://conda-forge.org/packages/
thoa jobs
Description
The thoa jobs command group lets you view jobs you have previously run with Thoa.
Each job represents a single execution and includes details such as execution status, timestamps, and associated datasets.
Use thoa jobs to track job progress and explore your job history.
Examples
List your most recent jobs
thoa jobs list
Show the 5 newest jobs
thoa jobs list --number 5
Sort jobs alphabetically by status
thoa jobs list --sort-by status --asc
Commands
thoa jobs list
Displays jobs in a table with columns:
- Name – Job name
- ID – Full job ID
- Started – When the job began
- Status – Job state (created, running, completed, …)
- Input Dataset – Input dataset name
- Output Dataset – Output dataset name
thoa jobs list
Options
| Flag | Description |
|---|---|
| --number, -n | Limit how many jobs to display |
| --sort-by, -s | Sort field: started or status (default: started) |
| --asc | Sort in ascending order (default: descending) |
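For tracking job progress from a terminal, one simple approach is to re-run thoa jobs list at a fixed interval. The following sketch uses only the flags listed above; the function name watch_jobs and its default interval and iteration count are hypothetical choices.

```shell
# Hypothetical helper: re-print the five newest jobs at a fixed interval.
# Usage: watch_jobs [interval_seconds] [iterations]
watch_jobs() {
  interval="${1:-30}"
  iterations="${2:-10}"
  i=0
  while [ "$i" -lt "$iterations" ]; do
    thoa jobs list --number 5 --sort-by started
    i=$((i + 1))
    # Pause between polls, but not after the final one.
    if [ "$i" -lt "$iterations" ]; then
      sleep "$interval"
    fi
  done
  return 0
}
```

For example, `watch_jobs 60 5` would print the list five times, one minute apart.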
Summary
Use the CLI to:
- Define compute resources explicitly
- Launch reproducible containerized jobs
- Manage your environments and datasets
- Automate large-scale workflows easily