DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition Paper • 2303.14953 • Published Mar 27, 2023
WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition Paper • 2103.04098 • Published Mar 6, 2021
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting Paper • 2112.01518 • Published Dec 2, 2021
BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving Paper • 2205.09743 • Published May 19, 2022
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond Paper • 2405.03520 • Published May 6, 2024 • 1
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models Paper • 2407.11213 • Published Jul 15, 2024 • 3
DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation Paper • 2303.05021 • Published Mar 9, 2023
OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline Paper • 2312.00343 • Published Dec 1, 2023
RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer Paper • 2505.23171 • Published May 29, 2025 • 3
WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration Paper • 2506.20590 • Published Jun 25, 2025
VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction Paper • 2509.19297 • Published Sep 23, 2025 • 24
VLA-R1: Enhancing Reasoning in Vision-Language-Action Models Paper • 2510.01623 • Published Oct 2, 2025 • 10
Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization Paper • 2509.10140 • Published Sep 12, 2025 • 2
R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation Paper • 2510.08547 • Published Oct 9, 2025 • 4
DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion Paper • 2510.15264 • Published Oct 17, 2025 • 1
GigaBrain-0: A World Model-Powered Vision-Language-Action Model Paper • 2510.19430 • Published Oct 22, 2025 • 48
ResearchGPT: Benchmarking and Training LLMs for End-to-End Computer Science Research Workflows Paper • 2510.20279 • Published Oct 23, 2025
EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer Paper • 2509.22407 • Published Sep 26, 2025
BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View Paper • 2112.11790 • Published Dec 22, 2021
Distractor-aware Siamese Networks for Visual Object Tracking Paper • 1808.06048 • Published Aug 18, 2018