# DA-2 WebGPU Port
This repository contains a port of the DA-2 (Depth Anything in Any Direction) model that runs entirely in the browser using WebGPU and ONNX Runtime.
The original work was developed by EnVision-Research. This port enables real-time, client-side depth estimation from panoramic images without requiring a backend server for inference.
## Original Work
**DA2: Depth Anything in Any Direction**
- Repository: EnVision-Research/DA-2
- Paper: arXiv:2509.26618
- Project Page: depth-any-in-any-dir.github.io
Please cite the original paper if you use this work:
```bibtex
@article{li2025da2,
  title={DA2: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```
## WebGPU Demo
This project includes a web-based demo that runs the model directly in your browser.
### Prerequisites
- Python 3.10+ (for model export)
- A web browser with WebGPU support (Chrome 113+, Edge 113+, or Firefox Nightly)
### Installation
1. **Clone the repository:**

   ```bash
   git clone <your-repo-url>
   cd DA-2-Web
   ```

2. **Set up the Python environment:**

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt
   ```
### Model Preparation
To run the demo, you first need to convert the PyTorch model to ONNX format.
1. **Download the model weights:** Download `model.safetensors` from the HuggingFace repository and place it in the root directory of this project.

2. **Export to ONNX:** Run the export script. It converts the model to FP16 and applies the fixes needed for WebGPU compatibility (e.g., replacing `clamp` with `max`/`min`):

   ```bash
   python export_onnx.py
   ```

   This generates `da2_model.onnx`.

3. **Merge ONNX files:** The export may produce external data files alongside the graph. Use the merge script to combine everything into a single `.onnx` file for easier web loading (a sketch of what this step amounts to follows this list):

   ```bash
   python merge_onnx.py
   ```

   This generates `da2_model_single.onnx`.
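For reference, the merge step essentially folds the external initializer data back into a single protobuf. A minimal sketch using the `onnx` package (the actual `merge_onnx.py` may differ in details):

```python
import onnx

# onnx.load resolves any external data files referenced by the graph.
model = onnx.load("da2_model.onnx")

# Saving without external data (the default) inlines all initializers,
# producing one self-contained file. This works here because the FP16
# model (~700 MB) fits under protobuf's 2 GB serialization limit.
onnx.save(model, "da2_model_single.onnx")
```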
### Running the Demo
1. **Start a local web server:** The files must be served over HTTP(S) for the browser to load the model and create a WebGPU context:

   ```bash
   python3 -m http.server 8000
   ```

2. **Open in browser:** Navigate to `http://localhost:8000/web/` in your WebGPU-compatible browser.

3. **Usage:**
   - Click "Choose File" to upload a panoramic image.
   - Click "Run Inference" to generate the depth map.
   - The process runs entirely locally on your GPU.
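If the demo fails to load or run the model, it can help to sanity-check the merged ONNX file locally with the Python `onnxruntime` package first. A hedged sketch (the input name, shape, and dtype are read from the model rather than assumed):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("da2_model_single.onnx")

# Inspect the model's actual input signature instead of guessing.
inp = session.get_inputs()[0]
print(inp.name, inp.shape, inp.type)

# Build a dummy input; dynamic dimensions are filled with 1 as a
# placeholder, which may need adjusting for this model.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dtype = np.float16 if "float16" in inp.type else np.float32
dummy = np.random.rand(*shape).astype(dtype)

outputs = session.run(None, {inp.name: dummy})
print("output shape:", outputs[0].shape)
```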
## Technical Details of the Port
- **Precision:** The model was converted to FP16 (half precision) to reduce the file size (~1.4 GB -> ~700 MB) and improve performance on consumer GPUs.
- **Opset:** Exported using ONNX opset 17.
- **Modifications:**
  - The `SphereViT` and `ViT_w_Esphere` modules were modified to ensure strict FP16 compatibility.
  - `torch.clamp` operations were replaced with `torch.max`/`torch.min` combinations to avoid `Clip` operator issues in onnxruntime-web when handling mixed scalar/tensor inputs.
  - Sphere embeddings are pre-calculated and cast to FP16 within the model graph.
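For illustration, the `clamp` replacement follows this pattern. A minimal sketch (the function name is hypothetical; the actual change lives inside the modified model modules):

```python
import torch

def clamp_fp16_safe(x: torch.Tensor, lo: float, hi: float) -> torch.Tensor:
    """Behaves like torch.clamp(x, lo, hi) but exports to ONNX as
    Max/Min nodes rather than a single Clip node, avoiding the mixed
    scalar/tensor input issue observed in onnxruntime-web."""
    lo_t = torch.tensor(lo, dtype=x.dtype, device=x.device)
    hi_t = torch.tensor(hi, dtype=x.dtype, device=x.device)
    return torch.min(torch.max(x, lo_t), hi_t)

# Example: bound values to the FP16-representable range.
x = torch.randn(4, 4, dtype=torch.float16)
y = clamp_fp16_safe(x, -65504.0, 65504.0)
```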
## License
This project follows the license of the original DA-2 repository. Please refer to the original repository for license details.