DA-2 WebGPU Port

This repository contains a port of the DA-2 (Depth Anything in Any Direction) model that runs entirely in the browser using WebGPU and ONNX Runtime Web.

The original work was developed by EnVision-Research. This port enables real-time, client-side depth estimation from panoramic images without requiring a backend server for inference.

🔗 Original Work

DA2: Depth Anything in Any Direction

Please cite the original paper if you use this work:

@article{li2025da2,
  title={DA2: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}

🚀 WebGPU Demo

This project includes a web-based demo that runs the model directly in your browser.

Prerequisites

  • Python 3.10+ (for model export)
  • A web browser with WebGPU support (Chrome 113+, Edge 113+, or Firefox Nightly)

Installation

  1. Clone the repository:

    git clone <your-repo-url>
    cd DA-2-Web
    
  2. Set up Python environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
    

Model Preparation

To run the demo, you first need to convert the PyTorch model to ONNX format.

  1. Download the model weights: Download model.safetensors from the Hugging Face repository and place it in the root directory of this project.

  2. Export to ONNX: Run the export script. It converts the model to FP16 and applies the fixes needed for WebGPU compatibility (e.g., replacing clamp with max/min); a sketch of the export call appears after this list.

    python export_onnx.py
    

    This will generate da2_model.onnx.

  3. Merge ONNX files: The export may place the weights in external data files alongside the graph. Use the merge script to fold everything into a single .onnx file for easier web loading (see the merge sketch after this list).

    python merge_onnx.py
    

    This will generate da2_model_single.onnx.
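
For orientation, step 2 boils down to a standard torch.onnx.export call on the FP16 model. The snippet below is a minimal sketch rather than the actual export_onnx.py; the model constructor, input resolution, and tensor names are assumptions.

    # Sketch only: build_da2_model is a hypothetical helper, and the
    # 1024x2048 panorama size is an assumed input resolution.
    import torch
    from safetensors.torch import load_file

    model = build_da2_model()                      # hypothetical; see export_onnx.py
    model.load_state_dict(load_file("model.safetensors"))

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.eval().half().to(device)         # FP16 weights
    dummy = torch.zeros(1, 3, 1024, 2048, dtype=torch.float16, device=device)

    torch.onnx.export(
        model, dummy, "da2_model.onnx",
        opset_version=17,                          # opset used by this port
        input_names=["image"], output_names=["distance"],
        do_constant_folding=True,
    )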

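Step 3 can be pictured with the onnx Python API: loading pulls any external data into memory, and saving without external data writes one self-contained file. This is only a sketch of the idea; merge_onnx.py is the authoritative script.

    # Sketch of the merge idea; merge_onnx.py is the authoritative script.
    import onnx

    # onnx.load also reads external data files referenced by the graph
    # (they must sit next to da2_model.onnx).
    model = onnx.load("da2_model.onnx")

    # Saving without external data embeds all initializers in the .onnx file.
    # This works because the FP16 model stays under the 2 GB protobuf limit.
    onnx.save_model(model, "da2_model_single.onnx")

    onnx.checker.check_model("da2_model_single.onnx")  # optional sanity check
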
Running the Demo

  1. Start a local web server: The files must be served over HTTP(S); browsers will not load the model from file:// URLs, and WebGPU is only available in a secure context (localhost counts as secure).

    python3 -m http.server 8000
    
  2. Open in Browser: Navigate to http://localhost:8000/web/ in your WebGPU-compatible browser.

  3. Usage:

    • Click "Choose File" to upload a panoramic image.
    • Click "Run Inference" to generate the depth map.
    • The process runs entirely locally on your GPU.

🛠️ Technical Details of the Port

  • Precision: The model was converted to FP16 (Half Precision) to reduce file size (~1.4GB -> ~700MB) and improve performance on consumer GPUs.
  • Opset: Exported using ONNX Opset 17.
  • Modifications:
    • The SphereViT and ViT_w_Esphere modules were modified to ensure strict FP16 compatibility.
    • torch.clamp operations were replaced with torch.max and torch.min combinations to avoid Clip operator issues in onnxruntime-web when handling mixed scalar/tensor inputs (a sketch of the pattern follows this list).
    • Sphere embeddings are pre-calculated and cast to FP16 within the model graph.
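
The clamp workaround mentioned above can be illustrated as follows; this is a sketch of the pattern, not the exact patch applied to the DA2 modules.

    import torch

    def fp16_safe_clamp(x: torch.Tensor, lo: float, hi: float) -> torch.Tensor:
        # torch.clamp exports to an ONNX Clip node; onnxruntime-web has issues
        # when Clip receives mixed scalar/tensor inputs, so a max/min
        # combination with same-dtype tensors is used instead.
        lo_t = torch.tensor(lo, dtype=x.dtype, device=x.device)
        hi_t = torch.tensor(hi, dtype=x.dtype, device=x.device)
        return torch.min(torch.max(x, lo_t), hi_t)

    # e.g. instead of torch.clamp(x, 0.0, 1.0):
    # x = fp16_safe_clamp(x, 0.0, 1.0)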

📄 License

This project follows the license of the original DA-2 repository. Please refer to the original repository for license details.
