
Nextflow using GA4GH TES

by info.odysseyx@gmail.com


The GA4GH (Global Alliance for Genomics and Health) Task Execution Service (TES) API is a standardized schema and API for describing and executing batch execution tasks.

We are pleased to announce that Nextflow, a powerful workflow management system for data-driven computational pipelines, now fully supports the Global Alliance for Genomics and Health (GA4GH) Task Execution Service (TES). This integration provides seamless scalability, improved efficiency, and robust performance for data processing tasks across cloud and local computing environments.

What is TES?

The Task Execution Service (TES) API is a standardized schema and API for describing and executing batch execution tasks. It provides a common way to submit and manage tasks across a variety of computing environments, including on-premises high-performance computing and high-throughput computing (HPC/HTC) systems, cloud computing platforms, and hybrid environments. The TES API is designed to be flexible and extensible, enabling it to be applied to a wide range of use cases, such as “applying compute to data” solutions for federated and distributed data analytics, or load balancing across multi-cloud infrastructures.
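As a concrete illustration, a TES task is described as a small JSON document submitted to the service's `/tasks` endpoint. The sketch below follows the TES v1 schema; the task name, container image, command, and resource values are illustrative only.

```json
{
  "name": "hello-tes",
  "resources": {
    "cpu_cores": 1,
    "ram_gb": 2.0,
    "disk_gb": 10.0
  },
  "executors": [
    {
      "image": "ubuntu:22.04",
      "command": ["echo", "Hello, TES!"]
    }
  ]
}
```

The `resources` block states minimum requirements, and each entry in `executors` names a container image and the command to run inside it; the service is free to place the task on any backend that satisfies those requirements.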

Why Nextflow and TES?

Nextflow with TES is an ideal choice for managing computational workflows because it abstracts and simplifies the composition of complex data processing tasks. The standardized TES API provides a unified approach to task execution, ensuring compatibility across a variety of computational environments. This integration not only improves portability and scalability, but also significantly reduces the number of configuration files required to set up and manage workflows in a variety of cloud and local computing environments. This streamlined approach allows researchers and developers to focus more on their core scientific goals rather than the complexities of infrastructure management.

How does it work?

Previously, to run Nextflow pipelines on Azure, you needed to create a configuration file specifying the compute resources to use. By default, Nextflow uses a single compute configuration for all processes. An example is shown below.

process {
    executor = 'azurebatch'
    queue = 'Standard_E2d_v4'
    withLabel:process_low         { queue = 'Standard_E2d_v4' }
    withLabel:process_medium      { queue = 'Standard_E8d_v4' }
    withLabel:process_high        { queue = 'Standard_E16d_v4' }
    withLabel:process_high_memory { queue = 'Standard_E32d_v4' }
}

azure {
    storage {
        accountName = ""
        sasToken = ""
    }
    batch {
        location = ""
        accountName = ""
        accountKey = ""
        autoPoolMode = false
        allowPoolCreation = true
        pools {
            Standard_E2d_v4 {
                autoScale = true
                vmType = 'Standard_E2d_v4'
                vmCount = 2
                maxVmCount = 20
            }
            Standard_E8d_v4 {
                autoScale = true
                vmType = 'Standard_E8d_v4'
                vmCount = 2
                maxVmCount = 20
            }
            Standard_E16d_v4 {
                autoScale = true
                vmType = 'Standard_E16d_v4'
                vmCount = 2
                maxVmCount = 20
            }
            Standard_E32d_v4 {
                autoScale = true
                vmType = 'Standard_E32d_v4'
                vmCount = 2
                maxVmCount = 10
            }
        }
    }
}

Integrating Nextflow with TES simplifies this configuration: you state your minimum machine requirements through standard Nextflow resource directives (e.g. `cpus`, `memory`, `disk`). TES compares the available batch quota against those minimum requirements and selects the lowest-cost available compute that meets them for each process.
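In practice, this means resource needs live in the pipeline itself rather than in per-pool queue mappings. A minimal sketch of such a process (the process name, script, and resource values are illustrative):

```groovy
process ALIGN_READS {
    // Minimum requirements; TES matches these against available compute
    cpus 4
    memory '16 GB'
    disk '50 GB'

    input:
    path reads

    script:
    """
    echo "aligning ${reads}"
    """
}
```

With directives like these in place, the executor-side configuration shrinks to the short block below.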

plugins {
    id 'nf-ga4gh'
}

process {
    executor = 'tes'
}

azure {
    storage {
        accountName = ""
        accountKey = ""
    }
}

tes.endpoint = ""
tes.basicUsername = ""
tes.basicPassword = ""

How to get started

To help you get started quickly, we introduce the `nf-hello-gatk` project, an example Nextflow pipeline designed to showcase these capabilities. It demonstrates how to use Nextflow to analyze genomic data with the Genome Analysis Toolkit (GATK) and how to scale compute resources efficiently by leveraging Azure Batch.

  1. Deploy TES on Azure: Follow the guide to deploy TES on Azure.
  2. Install Nextflow: Follow the Nextflow installation guide to set up Nextflow on your local machine or in a cloud environment.
  3. Create the TES configuration: Populate the following configuration with your TES and Azure credentials and save it as `tes.config`.
process {
    executor = 'tes'
}

azure {
    storage {
        accountName = ""
        accountKey = ""
    }
}

tes.endpoint = ""
tes.basicUsername = ""
tes.basicPassword = ""
  4. Run the pipeline:

 ./nextflow run seqeralabs/nf-hello-gatk -c tes.config -w 'az://work' --outdir 'az://outputs' -r main

After completion, all results can be found under the blob container prefix specified by `--outdir`.

Improved workflow management

This integration makes it easier to manage and run Nextflow workflows on Azure Batch. This includes:

  • Auto Scaling: Dynamically scale computing resources based on the needs of your workflow.
  • Cost Effectiveness: Optimize your cloud spend with Azure Batch’s cost-effective pricing model.
  • Seamless integration: Easily interact with Azure Batch using the TES API.

We believe this integration will significantly enhance your data processing capabilities, making it easier to handle large-scale workflows with greater efficiency and cost-effectiveness. Expect more updates and community contributions as we continue to enhance support for Nextflow on Azure Batch using TES.

Acknowledgements:

We would like to acknowledge Liam Beckman of Oregon Health & Science University Computational Biology and Ben Sherman, Software Engineer at Seqera, who contributed the foundational support for Nextflow and TES.




