MEGConformer for Phoneme Classification

Conformer-based MEG decoder for 39-class phoneme classification over the ARPAbet phoneme set, trained with 5 different random seeds.

Model Performance

Seed      Val F1-Macro  Checkpoint
7 (best)  63.92%        seed-7/pytorch_model.ckpt
18        63.86%        seed-18/pytorch_model.ckpt
17        58.74%        seed-17/pytorch_model.ckpt
1         58.64%        seed-1/pytorch_model.ckpt
2         58.10%        seed-2/pytorch_model.ckpt

Note: Individual seeds were not evaluated on the holdout set. The ensemble of all 5 seeds achieved 65.8% F1-macro on the competition holdout.
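
For readers unfamiliar with the metric, the following is a minimal sketch of macro-averaged F1, assuming the scores above follow the standard definition (unweighted mean of per-class F1). The function name `macro_f1`, the toy labels, and the choice to score a class with no true or predicted examples as 0 are illustrative assumptions, not taken from the training code.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumption: a class that never appears and is never predicted scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


# Toy example with 3 classes (the model itself has 39).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred, num_classes=3)
print(f"Macro F1: {score:.2%}")
```

Because every class contributes equally, rare phonemes weigh as much as frequent ones, which is why macro F1 can differ noticeably from plain accuracy on imbalanced phoneme data.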

Quick Start

Single Model Inference

import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Download best checkpoint (seed-7)
checkpoint_path = hf_hub_download(
    repo_id="zuazo/megconformer-phoneme-classification",
    filename="seed-7/pytorch_model.ckpt",
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model
model = ClassificationModule.load_from_checkpoint(checkpoint_path, map_location=device)
model.eval()

# Inference
meg_signal = torch.randn(1, 306, 125, device=device)  # (batch, channels, time)

with torch.no_grad():
    logits = model(meg_signal)
    probabilities = torch.softmax(logits, dim=1)
    prediction = torch.argmax(logits, dim=1)

print(f"Predicted phoneme class: {prediction.item()}")
print(f"Confidence: {probabilities[0, prediction].item():.2%}")
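
The snippet above reports only the single most probable class. When inspecting predictions it can help to look at the few most probable classes instead. Below is a minimal, framework-free sketch of the same softmax step followed by a top-k ranking; `softmax`, `top_k`, and the logits are made up for illustration and are not model output. With tensors, `torch.topk(probabilities, k)` should give the same ranking. The mapping from class index to ARPAbet label is not given here, so the sketch prints indices only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a flat list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=3):
    """Return the k (class_index, probability) pairs with the highest probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]


# Hypothetical logits for 5 classes (the real model outputs 39).
toy_logits = [0.1, 2.0, -1.0, 1.5, 0.3]
toy_probs = softmax(toy_logits)

for class_index, p in top_k(toy_probs, k=3):
    print(f"class {class_index}: {p:.2%}")
```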

Ensemble Inference (Recommended)

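The ensemble combines the predictions of all 5 seeds and achieves the best performance (65.8% F1-macro on the competition holdout). The example below uses majority voting over the classes predicted by each model: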

import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load all available seeds (as in the paper)
seeds = [7, 18, 17, 1, 2]
models = []

for seed in seeds:
    checkpoint_path = hf_hub_download(
        repo_id="zuazo/megconformer-phoneme-classification",
        filename=f"seed-{seed}/pytorch_model.ckpt",
    )
    model = ClassificationModule.load_from_checkpoint(
        checkpoint_path, map_location=device
    )
    model.eval().to(device)
    models.append(model)

# Example MEG input: (batch=1, channels=306, time=125)
meg_signal = torch.randn(1, 306, 125, device=device)

with torch.no_grad():
    probs_list = []
    preds_list = []

    for model in models:
        logits = model(meg_signal)  # (1, C)
        probs = torch.softmax(logits, dim=1)  # (1, C)
        probs_list.append(probs)
        preds_list.append(probs.argmax(dim=1))  # (1,)

    # Stack predictions from all models: shape (num_models, batch_size)
    preds = torch.stack(preds_list, dim=0)  # (M, 1)

    # We have a single example in the batch, so index 0
    per_model_preds = preds[:, 0]  # (M,)

    num_classes = probs_list[0].size(1)
    # Count votes per class
    votes = torch.bincount(per_model_preds, minlength=num_classes).float()

    # Majority-vote class (ties resolved by smallest index)
    majority_class = int(votes.argmax().item())

    # "Confidence" = fraction of models voting for the chosen class
    confidence = (votes[majority_class] / votes.sum()).item()

print(f"Ensemble (majority vote) predicted phoneme class: {majority_class}")
print(f"Vote share for that class: {confidence:.2%}")
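
The example above combines the models by majority vote. Averaging the per-model probabilities (soft voting) is a common alternative, and the two can disagree when a minority of models is very confident. The sketch below contrasts them on plain Python lists with made-up probabilities; `soft_vote` and `majority_vote` are names introduced here, and this is an illustration only, not necessarily the procedure behind the reported holdout score. With the tensors from the example above, the soft-voting mean would correspond to `torch.stack(probs_list).mean(dim=0)`.

```python
def soft_vote(probs_per_model):
    """Average class probabilities across models, then pick the argmax."""
    num_models = len(probs_per_model)
    num_classes = len(probs_per_model[0])
    mean_probs = [
        sum(p[c] for p in probs_per_model) / num_models for c in range(num_classes)
    ]
    best = max(range(num_classes), key=lambda c: mean_probs[c])
    return best, mean_probs


def majority_vote(probs_per_model):
    """Each model votes for its argmax class; ties go to the smallest index."""
    num_classes = len(probs_per_model[0])
    votes = [0] * num_classes
    for p in probs_per_model:
        votes[max(range(num_classes), key=lambda c: p[c])] += 1
    best = max(range(num_classes), key=lambda c: votes[c])
    return best, votes


# Hypothetical probabilities from 5 models over 3 classes.
toy_ensemble = [
    [0.50, 0.45, 0.05],
    [0.50, 0.45, 0.05],
    [0.50, 0.45, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.90, 0.05],
]

hard_class, votes = majority_vote(toy_ensemble)
soft_class, mean_probs = soft_vote(toy_ensemble)
print(f"Majority vote: class {hard_class} (votes per class: {votes})")
print(f"Soft vote: class {soft_class} (mean probability {mean_probs[soft_class]:.2%})")
```

Majority voting ignores how confident each model is, while soft voting lets two confident models outweigh three marginal ones, as in this toy case.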

Model Details

  • Architecture: Conformer (custom size)
    • Hidden size: 256
    • FFN dim: 2048
    • Layers: 7
    • Attention heads: 12
    • Depthwise conv kernel: 31
  • Input: 306-channel MEG signals
  • Window size: 0.5 seconds (125 samples at 250 Hz)
  • Output: 39-class phoneme classification (ARPAbet phoneme set)
  • Training: LibriBrain 2025 Standard track
  • Grouping: 100 single-trial examples averaged per training sample
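
Two of the figures above can be made concrete with a short sketch: the window length in samples, and the averaging of 100 single trials into one training sample. The details of how trials are selected for a group are not stated here; the sketch assumes all trials in a group share the same phoneme label, uses synthetic sine-plus-noise data, and shrinks the channel count to keep the pure-Python loop fast. `average_trials`, `mean_abs_error`, and `TOY_CHANNELS` are hypothetical names.

```python
import math
import random

SAMPLING_RATE_HZ = 250
WINDOW_SECONDS = 0.5
GROUP_SIZE = 100   # single trials averaged per training sample
TOY_CHANNELS = 4   # toy value; the model expects 306 channels

# 0.5 s at 250 Hz gives the 125-sample window used as model input.
window_samples = round(WINDOW_SECONDS * SAMPLING_RATE_HZ)


def average_trials(trials):
    """Element-wise mean over equally shaped (channels x time) trials."""
    n = len(trials)
    return [
        [sum(trial[ch][s] for trial in trials) / n for s in range(len(trials[0][0]))]
        for ch in range(len(trials[0]))
    ]


def mean_abs_error(a, b):
    """Mean absolute difference between two (channels x time) arrays."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


# Synthetic data: one clean response plus independent unit-variance noise per trial.
rng = random.Random(0)
clean = [
    [math.sin(2 * math.pi * s / window_samples) for s in range(window_samples)]
    for _ in range(TOY_CHANNELS)
]
trials = [
    [[v + rng.gauss(0.0, 1.0) for v in row] for row in clean]
    for _ in range(GROUP_SIZE)
]

averaged = average_trials(trials)
single_err = mean_abs_error(trials[0], clean)
avg_err = mean_abs_error(averaged, clean)
print(f"Window: {window_samples} samples")
print(f"Error vs clean signal: single trial {single_err:.3f}, averaged {avg_err:.3f}")
```

For independent noise, averaging N trials reduces the noise standard deviation by roughly a factor of sqrt(N), so a group of 100 gives about a tenfold reduction in this idealized setting.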

Reproducibility

All 5 random seeds are provided. For best results on new data, we recommend using the ensemble approach, which achieved 65.8% F1-macro on the competition holdout set.

Citation

@misc{dezuazo2025megconformerconformerbasedmegdecoder,
      title={MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification}, 
      author={Xabier de Zuazo and Ibon Saratxaga and Eva Navas},
      year={2025},
      eprint={2512.01443},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.01443}, 
}

License

The 3-Clause BSD License

Evaluation results

  • F1-macro on LibriBrain 2025 PNPL (Standard track, phoneme task): 0.658 (self-reported)