Baseline System

Table of Contents

  1. Status
  2. Motivation
  3. Emphasis
  4. Approach
  5. Baseline System Tasks
  6. Model Components

Status

The baseline system is under development. Further details will be shared shortly.

Motivation

  1. Reduced barrier for entry and participation
  2. Simple research starting point for potential participants

Emphasis

  1. Use of well-adopted open-source components
  2. Simplicity of the baseline system itself

Approach

  1. Leverage the End-to-End Speech Processing Toolkit (ESPnet)
    • Widely adopted in the community
    • Support for data augmentation: noise and reverberation
    • Trainer and multi-GPU support
    • Learning rate schedulers
    • Support for neural codecs
    • Reduces the development effort and the amount of new code that has to be written and open-sourced
  2. Separate baselines for Track 1 and Track 2
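
The noise and reverberation augmentations listed above are provided by ESPnet itself; the NumPy sketch below only illustrates what the two operations do and is not ESPnet's code or API. The function names add_noise and add_reverb, the SNR-based noise scaling, and the unit-energy normalization of the room impulse response (RIR) are illustrative assumptions.

```python
import numpy as np

def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise into speech at a target signal-to-noise ratio in dB (hypothetical helper)."""
    # Tile the noise if it is shorter than the speech, then trim to the same length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12  # guard against all-zero noise
    # Choose the gain so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise

def add_reverb(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve speech with an RIR and keep the original length (hypothetical helper)."""
    # Assumption: normalize the RIR to unit energy so reverb does not change overall level much.
    rir = rir / (np.linalg.norm(rir) + 1e-12)
    return np.convolve(speech, rir)[: len(speech)]
```

Applying add_reverb first and add_noise second mimics a reverberant room with additive background noise; the order and the SNR range to sample from are design choices the baseline may make differently.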

Baseline System Tasks

  1. Training data generation (offline or online; TBD): download the datasets, apply the challenge-provided curation lists, and run augmentations
  2. Run model training
  3. Run objective-metric evaluation with the VERSA toolkit
  4. Provide a script to validate compliance with the challenge rules on compute and latency
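
The challenge's exact compute and latency definitions are not given here, so the sketch below shows two common proxies a validation script might report; the function names, the "one full hop plus lookahead" latency convention, and the use of a real-time factor as the compute measure are all assumptions. For a causal strided encoder, one full hop (the product of the strides) must be buffered before a latent frame can be emitted.

```python
import math
import time

def algorithmic_latency_ms(strides, sample_rate_hz, lookahead_samples=0):
    """Hypothetical check: latency of a causal strided codec in milliseconds.

    Assumes the encoder must buffer one full hop (product of strides) before
    emitting a frame, plus any explicit lookahead the model uses.
    """
    hop = math.prod(strides)  # input samples per latent frame
    return 1000.0 * (hop + lookahead_samples) / sample_rate_hz

def real_time_factor(process, audio, sample_rate_hz, repeats=5):
    """Hypothetical check: best wall-clock time divided by audio duration.

    A value below 1.0 means `process` runs faster than real time.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        process(audio)
        best = min(best, time.perf_counter() - start)
    return best / (len(audio) / sample_rate_hz)
```

For example, strides (2, 4, 5, 8) give a 320-sample hop, so algorithmic_latency_ms([2, 4, 5, 8], 24_000) is about 13.3 ms. Taking the best of several runs reduces timing noise from other processes.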

Model Components

  1. Encoder
    • Strided Convolutions + Residual Units (causal)
  2. Quantizer
    • Hard residual VQ (RVQ) with straight-through estimator (STE) gradients, or an FSQ-based method
    • Support for multi-bitrate training (1 kbps and 6 kbps)
  3. Decoder
    • Transposed Convolutions + Residual Units (causal)
  4. Discriminator
    • Borrow the discriminator design from EnCodec and trim down the channel counts
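
The residual structure of the quantizer, and the bitrate arithmetic behind multi-bitrate training, can be sketched in NumPy. This shows the forward pass only: the STE needs an autograd framework, so it appears as a comment. The FSQ alternative is not shown. The 50 Hz latent frame rate, the 1024-entry codebooks, and reaching lower bitrates by truncating the number of RVQ stages are assumptions for illustration, not confirmed baseline settings; all function names are hypothetical.

```python
import math
import numpy as np

def rvq_encode(x: np.ndarray, codebooks: list) -> tuple:
    """Residual VQ: each stage quantizes what the previous stages left over.

    x: (frames, dim) latent vectors; codebooks: list of (K, dim) arrays.
    Returns integer codes of shape (frames, n_stages) and the quantized latents.
    """
    residual = x.copy()
    quantized = np.zeros_like(x)
    codes = []
    for cb in codebooks:
        # Squared Euclidean distance from every residual vector to every codeword.
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)  # hard (nearest-codeword) assignment
        chosen = cb[idx]
        quantized += chosen
        residual -= chosen
        codes.append(idx)
    # In training, the STE treats quantization as identity in the backward pass:
    # x_q = x + stop_gradient(quantized - x)
    return np.stack(codes, axis=1), quantized

def rvq_decode(codes: np.ndarray, codebooks: list) -> np.ndarray:
    """Sum the selected codewords of every stage that is present in `codes`."""
    return sum(cb[codes[:, i]] for i, cb in enumerate(codebooks))

def bitrate_bps(frame_rate_hz: float, n_stages: int, codebook_size: int) -> float:
    """Bits per second = frames/s * stages * bits per code index."""
    return frame_rate_hz * n_stages * math.log2(codebook_size)
```

Under these assumptions each 10-bit stage costs 50 * 10 = 500 bps, so 1 kbps uses the first 2 stages and 6 kbps uses 12. Decoding with codes[:, :2] and codebooks[:2] yields the low-bitrate reconstruction; training with a randomly sampled number of active stages (SoundStream-style quantizer dropout) is one common way to obtain a single model that serves both rates.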