Performs end-to-end WGS VCF analysis in ≈1 minute on a standard Ryzen 7 / 32GB MiniPC. Optimized for speed, accuracy, and compatibility with a wide range of VCF inputs.
Upload any VCF, optionally specify a target phenotype, and let Genetase handle the rest. Our ultra-fast engine delivers a prioritized report with an interactive genome browser in under one minute. Future updates will include expanded AI-assisted interpretation capabilities.
Specify a target phenotype and Genetase runs a parallel analysis to:
Your report summarizes variant-level annotations and scoring, including:
Explore and analyze variants using flexible filters, annotations, and scoring tracks:
Before annotation, every VCF undergoes a 4-stage preprocessing pipeline. This ensures that complex variants are interpreted correctly, regardless of the caller or reference genome used, and prepares the data for the sub-minute WGS analysis that follows without compromising accuracy.
Accepts any VCF 4.x format from any caller: GATK, DeepVariant, FreeBayes, DRAGEN, and more. Handles bgzip/gzip compression and multi-sample VCFs.
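As an illustration of compression-agnostic input handling, here is a minimal Python sketch. It is not Genetase's actual reader; the function names `open_vcf` and `sample_names` are hypothetical. It relies on the fact that bgzip (BGZF) output is a series of concatenated gzip members, which Python's standard `gzip` module reads as one stream.

```python
import gzip
from typing import List, TextIO

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip or bgzip (BGZF) file


def open_vcf(path: str) -> TextIO:
    """Open a VCF as text, whether it is plain, gzip-, or bgzip-compressed."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        # BGZF blocks are valid gzip members, so gzip.open handles both formats.
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def sample_names(path: str) -> List[str]:
    """Return sample IDs from the #CHROM header line (empty for sites-only VCFs)."""
    with open_vcf(path) as fh:
        for line in fh:
            if line.startswith("#CHROM"):
                # Columns 1-8 are fixed, column 9 is FORMAT, samples follow.
                return line.rstrip("\n").split("\t")[9:]
    return []
```

A multi-sample VCF simply yields more than one name from `sample_names`; a sites-only file yields an empty list.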
Auto-detects the reference build (hg19 or hg38/GRCh38) and automatically lifts over to the build used by our annotation databases. No manual intervention required.
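One common way to detect the build is to read the chromosome 1 length from the `##contig` header lines. The sketch below assumes that approach; how Genetase actually detects the build is not documented here, and `detect_build` is a hypothetical name. The two lengths are the published chromosome 1 sizes for GRCh37 (hg19) and GRCh38 (hg38).

```python
import re
from typing import Iterable, Optional

# Chromosome 1 length per assembly (hg19 corresponds to GRCh37, hg38 to GRCh38).
CHR1_LENGTH_TO_BUILD = {
    249_250_621: "GRCh37",
    248_956_422: "GRCh38",
}

# Matches e.g. ##contig=<ID=chr1,length=248956422> or ##contig=<ID=1,length=...>
# Assumes ID precedes length, which is the usual attribute order.
CONTIG_CHR1_RE = re.compile(r"^##contig=<.*?ID=(?:chr)?1,.*?length=(\d+)")


def detect_build(header_lines: Iterable[str]) -> Optional[str]:
    """Return 'GRCh37', 'GRCh38', or None if the header is not informative."""
    for line in header_lines:
        match = CONTIG_CHR1_RE.match(line)
        if match:
            return CHR1_LENGTH_TO_BUILD.get(int(match.group(1)))
    return None
```

Headers without `##contig` lines return `None`; a production detector would need a fallback, such as checking reference bases at known build-specific positions.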
Left-aligns variants and splits multi-allelic sites into biallelic records. Ensures consistent representation across all downstream analyses.
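The multi-allelic split can be sketched as follows. This is a simplified illustration, not the Genetase implementation: `split_multiallelic` is a hypothetical helper, left-alignment is omitted because it needs the reference sequence, and the choice to mark other alternate alleles as missing (`.`) is an assumption that differs between tools.

```python
from typing import List, Tuple

BiallelicRecord = Tuple[int, str, str, str]  # (pos, ref, alt, genotype)


def split_multiallelic(pos: int, ref: str, alts: List[str], gt: str) -> List[BiallelicRecord]:
    """Split one multi-allelic site into one biallelic record per ALT allele."""
    sep = "|" if "|" in gt else "/"  # keep the phased/unphased separator
    calls = gt.split(sep)
    records = []
    for alt_index, alt in enumerate(alts, start=1):
        new_calls = []
        for call in calls:
            if call in ("0", "."):
                new_calls.append(call)   # reference or missing stays as-is
            elif int(call) == alt_index:
                new_calls.append("1")    # this record's ALT becomes allele 1
            else:
                new_calls.append(".")    # another ALT: unknown in this record
        records.append((pos, ref, alt, sep.join(new_calls)))
    return records
```

For example, a site at position 100 with REF `A`, ALT `C,T`, and genotype `1/2` yields two records: `A>C` with genotype `1/.` and `A>T` with genotype `./1`.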
Splits MNVs and complex indels while preserving haplotype context via CMPLX_id tracking. Enables accurate annotation of every complex variant.
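For the equal-length (MNV) case, decomposition reduces to a position-by-position comparison of REF and ALT. The sketch below assumes that simple case only; `decompose_mnv` is a hypothetical name, and unequal-length complex indels require alignment logic that is not shown. The `CMPLX_id` string follows the `chrom:pos-REF>ALT` pattern used in this section's example records.

```python
from typing import List, Tuple

AtomicVariant = Tuple[str, int, str, str, str]  # (chrom, pos, ref, alt, cmplx_id)


def decompose_mnv(chrom: str, pos: int, ref: str, alt: str) -> List[AtomicVariant]:
    """Split an equal-length MNV into SNVs that share the parent's CMPLX_id."""
    if len(ref) != len(alt):
        raise ValueError("only equal-length substitutions are handled in this sketch")
    cmplx_id = f"{chrom}:{pos}-{ref}>{alt}"
    return [
        (chrom, pos + offset, ref_base, alt_base, cmplx_id)
        for offset, (ref_base, alt_base) in enumerate(zip(ref, alt))
        if ref_base != alt_base  # matching bases carry no variant
    ]


children = decompose_mnv("9", 133255928, "CCCCCCAG", "GCCCCCAT")
```

Applied to the haplotype block `9:133255928 CCCCCCAG>GCCCCCAT`, this yields `C>G` at 133255928 and `G>T` at 133255935, both tagged with the parent `CMPLX_id`.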
# Original complex record (haplotype block) detected and annotated
9 133255928 . CCCCCCAG GCCCCCAT . PASS CMPLX_id=9:133255928-CCCCCCAG>GCCCCCAT;CHILD=9:133255928-C>G,9:133255935-G>T GT 1/1
# Decomposed into atomic variants
9 133255928 . C G . PASS CMPLX_id=9:133255928-CCCCCCAG>GCCCCCAT GT 1/1
9 133255935 . G T . PASS CMPLX_id=9:133255928-CCCCCCAG>GCCCCCAT GT 1/1
# Original complex record (insertion / unequal-length indel) detected and annotated
9 135927964 . GAGCACACACG CAGAGCACACACGCA . PASS CMPLX_id=9:135927964-GAGCACACACG>CAGAGCACACACGCA;CHILD=9:135927963-A>ACA,9:135927974-G>GCA GT 0/1
# Decomposed into atomic variants
9 135927963 . A ACA . PASS CMPLX_id=9:135927964-GAGCACACACG>CAGAGCACACACGCA GT 0/1
9 135927974 . G GCA . PASS CMPLX_id=9:135927964-GAGCACACACG>CAGAGCACACACGCA GT 0/1
CMPLX_id links each atomic variant to its parent record, supporting pedigree, phasing, and haplotype-consistency checks.
Sub-minute WGS analysis is achieved through a combination of algorithmic complexity reduction, a stratified filtering model, and parallel I/O. The following outlines the core principles that enable scalable, rapid variant interpretation.
Many traditional variant pipelines rely on full-table scans or row-wise processing, which can become a bottleneck for large whole-genome datasets. Our system achieves sub-minute rare variant analysis by combining:
Instead of a monolithic annotation pass, the pipeline applies a cascade of increasingly expensive operations.
This stratified approach reduces the number of variants requiring full, computationally expensive annotation by >99%, directly decreasing overall runtime while maintaining high sensitivity for variants of potential biological interest.
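The stratified idea can be sketched as a chain of filters ordered from cheapest to most expensive, so that the costly annotation step runs only on survivors. The stages, the 1% frequency threshold, and the `Variant` fields below are assumptions for illustration, not the actual Genetase filters.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    allele_freq: float      # hypothetical: population frequency from a cheap lookup
    in_target_region: bool  # hypothetical: overlaps a region of interest


Stage = Callable[[Variant], bool]


def cascade(variants: Iterable[Variant], stages: List[Stage]) -> Iterator[Variant]:
    """Apply filters in order, cheapest first; each stage sees only prior survivors."""
    stream: Iterable[Variant] = variants
    for stage in stages:
        stream = filter(stage, stream)  # lazy: nothing runs until iteration
    return iter(stream)


def is_rare(v: Variant) -> bool:
    return v.allele_freq < 0.01  # assumed rarity threshold


def in_region(v: Variant) -> bool:
    return v.in_target_region


def expensive_annotate(v: Variant) -> dict:
    """Placeholder for the costly multi-database annotation step."""
    return {"variant": v, "annotations": "..."}


variants = [
    Variant("1", 1000, 0.35, True),
    Variant("1", 2000, 0.0004, True),
    Variant("2", 3000, 0.0001, False),
    Variant("9", 4000, 0.20, False),
]
annotated = [expensive_annotate(v) for v in cascade(variants, [is_rare, in_region])]
```

In this toy input only one of four variants reaches `expensive_annotate`; on real data the surviving fraction depends entirely on the filters chosen.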
Hardware-level concurrency is leveraged to minimize idle time:
N = total number of variants, D = number of annotation databases. All variants are passed to the annotation stage.
Nf ≪ N (typically <0.1%), so only a small subset is fully annotated.
* Empirical performance: The annotation stage completes in ~1 minute for a whole-genome sample (~4.5M variants, 30× coverage) on consumer-grade hardware. Preprocessing time (e.g., parsing and normalization) depends on input characteristics and is not included.
By applying early filtering and parallelization, the pipeline shifts from O(N * D) to O(N + Nf * D), where Nf ≪ N. This minimizes costly full annotations, enabling rapid, scalable whole-genome analysis without sacrificing the depth of variant interpretation.
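To make the complexity comparison concrete, the following sketch counts database lookups under both models. N = 4.5M comes from the empirical figure above; D = 10 databases and Nf = 0.1% of N are assumed values chosen for illustration only.

```python
def full_scan_cost(n: int, d: int) -> int:
    """O(N * D): every variant is looked up in every annotation database."""
    return n * d


def staged_cost(n: int, n_f: int, d: int) -> int:
    """O(N + Nf * D): one cheap pass over all variants, full annotation for survivors."""
    return n + n_f * d


N = 4_500_000   # variants in a whole-genome sample
D = 10          # assumed number of annotation databases
N_F = N // 1000 # assumed 0.1% of variants survive early filtering

naive = full_scan_cost(N, D)      # 45,000,000 operations
staged = staged_cost(N, N_F, D)   # 4,545,000 operations
```

Under these assumptions the staged model needs roughly a tenth of the operations, and the gap widens as D grows because the N term no longer multiplies with D.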
Genetase uses a machine learning model to automatically rank variants by impact and potential biological relevance. Model performance is continuously improved through updates to underlying datasets, annotations, and training signals.
A gradient-boosted model (XGBoost) learns feature weights from curated variant annotation datasets. Each variant receives a Genetase Priority Score based on:
Thresholds adapt automatically based on dataset composition and annotation context.
Allele frequency strategy: Weighted combination of gnomAD (primary) and dbSNP. When both are available, gnomAD frequencies are used for higher accuracy.
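The precedence rule described above can be sketched as follows. The exact weighting Genetase applies is not specified here, so this example models only the stated behavior: gnomAD wins when both sources report a frequency, and dbSNP serves as the fallback. The function name `pick_allele_frequency` is hypothetical.

```python
from typing import Optional, Tuple


def pick_allele_frequency(
    gnomad_af: Optional[float], dbsnp_af: Optional[float]
) -> Tuple[Optional[float], Optional[str]]:
    """Return (frequency, source), preferring gnomAD over dbSNP."""
    if gnomad_af is not None:
        return gnomad_af, "gnomAD"   # primary source
    if dbsnp_af is not None:
        return dbsnp_af, "dbSNP"     # fallback when gnomAD has no record
    return None, None                # variant absent from both sources
```

Returning the source alongside the value keeps it visible in the report which database a frequency came from.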
Each variant is assigned a priority score reflecting predicted impact:
Try our interactive demo to explore variant scoring and annotations. Genetase is currently provided for research purposes only. Learn how it can support genomic studies and share your feedback or interest in future access.