rewardfm/jesse-alldata-rfm-qwen-4gpu-bs16-pref-prog-sim-succ

preference_comparisons


Model Details

Base Model: Qwen/Qwen3-VL-4B-Instruct
Model Type: qwen3_vl

Training Run

Wandb Run: jesse_alldata_rfm_qwen_4gpu_bs16_pref_prog_sim_succ
Wandb ID: ng2lt4sm
Project: rfm

Citation

If you use this model, please cite:

Format: Safetensors
Model size: 4B params
Tensor type: BF16
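The card lists roughly 4B parameters stored as BF16. A rough estimate of the raw weight footprint follows from multiplying the parameter count by 2 bytes per BF16 value. This is a minimal sketch that assumes exactly 4 × 10^9 parameters; the "4B" figure is rounded, so the true size will differ somewhat.

```python
# Rough weight-storage estimate for a BF16 checkpoint.
# Assumption: exactly 4e9 parameters (the card's "4B" is a rounded figure).
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit (2-byte) format


def weight_bytes(num_params: float, bytes_per_param: int = BYTES_PER_BF16) -> float:
    """Return the raw size of the weights in bytes (weights only)."""
    return num_params * bytes_per_param


size = weight_bytes(4e9)
print(f"{size / 1e9:.1f} GB ({size / 2**30:.2f} GiB)")
```

This counts weights only; optimizer state, activations, and any KV cache during inference come on top of it.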

Model tree for rewardfm/jesse-alldata-rfm-qwen-4gpu-bs16-pref-prog-sim-succ

Base model: Qwen/Qwen3-VL-4B-Instruct
Finetuned: this model (listed among 123 fine-tunes of the base model)