Academic-DavidZ
Deep Learning
Pos3R: 6D Pose Estimation for Unseen Objects Made Easy
We present Pos3R, a training-free method for estimating the 6D pose of any object from a single RGB image by leveraging a 3D foundation model, eliminating the need for pose supervision or task-specific training.
Weijian Deng, Dylan Campbell, Chunyi Sun, Jiahao Zhang, Shubham Kanitkar, Matthew E. Shaffer, Stephen Gould
Manual-PA
Official implementation of Manual-PA: Learning 3D Part Assembly from Instruction Diagrams.
Code
TDGV
Official implementation of the WACV 2025 paper "Temporally Grounding Instructional Diagrams in Unconstrained Videos".
Code
Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
We introduce Manual-PA, a transformer-based framework that leverages diagrammatic assembly manuals to guide both the selection and 6D pose estimation of furniture parts, enabling efficient and realistic 3D assembly by aligning parts with instructional illustrations.
Jiahao Zhang, Anoop Cherian, Cristian Rodriguez, Weijian Deng, Stephen Gould
PDF, Code, ArXiv
Temporally Grounding Instructional Diagrams in Unconstrained Videos
We introduce a new approach to simultaneously localize a sequence of instructional diagrams in videos by modeling their mutual relationships and temporal order, rather than handling each step independently.
Jiahao Zhang, Frederic Zhang, Cristian Rodriguez, Yizhak Ben-Shabat, Anoop Cherian, Stephen Gould
PDF, Code, Dataset, Poster, Slides, DOI, ArXiv
Assembly Video Manual Alignment
Official implementation of the CVPR 2023 paper "Aligning Step-by-Step Instructional Diagrams to Video Demonstrations".
Code
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
We introduce a new framework and dataset (IAW) for aligning assembly diagrams from instruction manuals with real-world video segments, enabling cross-modal retrieval and step-level alignment between illustrated instructions and assembly actions in videos.
Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, Stephen Gould
PDF, Code, Dataset, Poster, Slides, Video, DOI, ArXiv, Supplementary
GoferBot: A Visual Guided Human-Robot Collaborative Assembly System
GoferBot is a novel assembly system that seamlessly integrates all sub-modules by utilising implicit semantic information purely from visual perception.
Zheyu Zhuang, Yizhak Ben-Shabat, Jiahao Zhang, Stephen Gould, Robert Mahony
PDF, Video, DOI, ArXiv
Image Caption Generator
An encoder-decoder image captioning model with a ResNet-152 encoder and an LSTM decoder.
Code