# DA-2 WebGPU Port
This repository contains a port of the DA-2 (Depth Anything in Any Direction) model that runs entirely in the browser using WebGPU and ONNX Runtime.
The original work was developed by EnVision-Research. This port enables real-time, client-side depth estimation from panoramic images without requiring a backend server for inference.
## Original Work
**DA2: Depth Anything in Any Direction**
- Repository: EnVision-Research/DA-2
- Paper: arXiv:2509.26618
- Project Page: depth-any-in-any-dir.github.io
Please cite the original paper if you use this work:
```bibtex
@article{li2025da2,
  title={DA2: Depth Anything in Any Direction},
  author={Li, Haodong and Zheng, Wangguangdong and He, Jing and Liu, Yuhao and Lin, Xin and Yang, Xin and Chen, Ying-Cong and Guo, Chunchao},
  journal={arXiv preprint arXiv:2509.26618},
  year={2025}
}
```
## WebGPU Demo
This project includes a web-based demo that runs the model directly in your browser.
### Prerequisites
- Python 3.10+ (for model export)
- A web browser with WebGPU support (Chrome 113+, Edge 113+, or Firefox Nightly)
### Installation
1. **Clone the repository:**

   ```bash
   git clone <your-repo-url>
   cd DA-2-Web
   ```

2. **Set up the Python environment:**

   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt
   ```
### Model Preparation
To run the demo, you first need to convert the PyTorch model to ONNX format.
1. **Download the model weights:** Download `model.safetensors` from the HuggingFace repository and place it in the root directory of this project.

2. **Export to ONNX:** Run the export script. It converts the model to FP16 and applies the fixes needed for WebGPU compatibility (e.g., replacing `clamp` with `max`/`min`):

   ```bash
   python export_onnx.py
   ```

   This generates `da2_model.onnx`.

3. **Merge ONNX files:** The export may produce external data files alongside the graph. Use the merge script to combine everything into a single `.onnx` file for easier web loading (a sketch of what this step amounts to follows this list):

   ```bash
   python merge_onnx.py
   ```

   This generates `da2_model_single.onnx`.
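For reference, the merge step essentially folds the external initializer data back into a single protobuf. A minimal sketch using the `onnx` package (the actual `merge_onnx.py` may differ in details):

```python
import onnx

# onnx.load resolves any external data files referenced by the graph.
model = onnx.load("da2_model.onnx")

# Saving without external data (the default) inlines all initializers,
# producing one self-contained file. This works here because the FP16
# model (~700 MB) fits under protobuf's 2 GB serialization limit.
onnx.save(model, "da2_model_single.onnx")
```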
### Running the Demo
1. **Start a local web server:** The files must be served over HTTP(S) for the browser to load the model and create a WebGPU context:

   ```bash
   python3 -m http.server 8000
   ```

2. **Open in browser:** Navigate to `http://localhost:8000/web/` in your WebGPU-compatible browser.

3. **Usage:**
   - Click "Choose File" to upload a panoramic image.
   - Click "Run Inference" to generate the depth map.
   - The process runs entirely locally on your GPU.
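If the demo fails to load or run the model, it can help to sanity-check the merged ONNX file locally with the Python `onnxruntime` package first. A hedged sketch (the input name, shape, and dtype are read from the model rather than assumed):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("da2_model_single.onnx")

# Inspect the model's actual input signature instead of guessing.
inp = session.get_inputs()[0]
print(inp.name, inp.shape, inp.type)

# Build a dummy input; dynamic dimensions are filled with 1 as a
# placeholder, which may need adjusting for this model.
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dtype = np.float16 if "float16" in inp.type else np.float32
dummy = np.random.rand(*shape).astype(dtype)

outputs = session.run(None, {inp.name: dummy})
print("output shape:", outputs[0].shape)
```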
## Technical Details of the Port
- **Precision:** The model was converted to FP16 (half precision) to reduce the file size (~1.4 GB -> ~700 MB) and improve performance on consumer GPUs.
- **Opset:** Exported using ONNX opset 17.
- **Modifications:**
  - The `SphereViT` and `ViT_w_Esphere` modules were modified to ensure strict FP16 compatibility.
  - `torch.clamp` operations were replaced with `torch.max`/`torch.min` combinations to avoid `Clip` operator issues in onnxruntime-web when handling mixed scalar/tensor inputs.
  - Sphere embeddings are pre-calculated and cast to FP16 within the model graph.
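For illustration, the `clamp` replacement follows this pattern. A minimal sketch (the function name is hypothetical; the actual change lives inside the modified model modules):

```python
import torch

def clamp_fp16_safe(x: torch.Tensor, lo: float, hi: float) -> torch.Tensor:
    """Behaves like torch.clamp(x, lo, hi) but exports to ONNX as
    Max/Min nodes rather than a single Clip node, avoiding the mixed
    scalar/tensor input issue observed in onnxruntime-web."""
    lo_t = torch.tensor(lo, dtype=x.dtype, device=x.device)
    hi_t = torch.tensor(hi, dtype=x.dtype, device=x.device)
    return torch.min(torch.max(x, lo_t), hi_t)

# Example: bound values to the FP16-representable range.
x = torch.randn(4, 4, dtype=torch.float16)
y = clamp_fp16_safe(x, -65504.0, 65504.0)
```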
## License
This project follows the license of the original DA-2 repository. Please refer to the original repository for license details.