Zheng Zhu

ZhengZhu

AI & ML interests

None yet

Organizations

None yet

authored 20 papers 8 days ago

DyGait: Exploiting Dynamic Representations for High-performance Gait Recognition

Paper • 2303.14953 • Published Mar 27, 2023

WebFace260M: A Benchmark Unveiling the Power of Million-Scale Deep Face Recognition

Paper • 2103.04098 • Published Mar 6, 2021

DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting

Paper • 2112.01518 • Published Dec 2, 2021

BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving

Paper • 2205.09743 • Published May 19, 2022

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

Paper • 2405.03520 • Published May 6, 2024 • 1

OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models

Paper • 2407.11213 • Published Jul 15, 2024 • 3

DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation

Paper • 2303.05021 • Published Mar 9, 2023

OpenStereo: A Comprehensive Benchmark for Stereo Matching and Strong Baseline

Paper • 2312.00343 • Published Dec 1, 2023

RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer

Paper • 2505.23171 • Published May 29, 2025 • 3

WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration

Paper • 2506.20590 • Published Jun 25, 2025

VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction

Paper • 2509.19297 • Published Sep 23, 2025 • 24

VLA-R1: Enhancing Reasoning in Vision-Language-Action Models

Paper • 2510.01623 • Published Oct 2, 2025 • 10

Scalable Training for Vector-Quantized Networks with 100% Codebook Utilization

Paper • 2509.10140 • Published Sep 12, 2025 • 2

R2RGEN: Real-to-Real 3D Data Generation for Spatially Generalized Manipulation

Paper • 2510.08547 • Published Oct 9, 2025 • 4

DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion

Paper • 2510.15264 • Published Oct 17, 2025 • 1

GigaBrain-0: A World Model-Powered Vision-Language-Action Model

Paper • 2510.19430 • Published Oct 22, 2025 • 48

ResearchGPT: Benchmarking and Training LLMs for End-to-End Computer Science Research Workflows

Paper • 2510.20279 • Published Oct 23, 2025

EMMA: Generalizing Real-World Robot Manipulation via Generative Visual Transfer

Paper • 2509.22407 • Published Sep 26, 2025

BEVDet: High-performance Multi-camera 3D Object Detection in Bird-Eye-View

Paper • 2112.11790 • Published Dec 22, 2021

Distractor-aware Siamese Networks for Visual Object Tracking

Paper • 1808.06048 • Published Aug 18, 2018