Academic-DavidZ
Deep Learning
Manual-PA
Official implementation of Manual-PA: Learning 3D Part Assembly from Instruction Diagrams.
TDGV
Official implementation of the WACV 2025 paper "Temporally Grounding Instructional Diagrams in Unconstrained Videos".
Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
This paper presents a transformer-based framework that leverages instructional diagrams to guide 3D part assembly, addressing challenges in sequencing and pose estimation. Using contrastive learning and cross-modal attention, it aligns 2D manual steps with 3D parts, predicts assembly order, and refines poses, achieving state-of-the-art performance on PartNet and IKEA-Manual datasets. The method demonstrates strong generalization to real-world scenarios, significantly improving accuracy and robustness in automated assembly tasks. (Generated by ChatGPT4o).
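The contrastive alignment mentioned above can be illustrated with a generic InfoNCE-style loss. This is only a minimal sketch in plain Python: the function names, the temperature value, and the assumption that step `i` matches part `i` are illustrative and are not taken from the Manual-PA code, whose actual loss may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(step_embs, part_embs, temperature=0.1):
    """Mean cross-entropy of picking part i for step i among all parts.

    Assumes (hypothetically) that step_embs[i] and part_embs[i] are a
    matching pair and that every other part is a negative.
    """
    loss = 0.0
    for i, step in enumerate(step_embs):
        logits = [cosine(step, part) / temperature for part in part_embs]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(step_embs)


# Toy embeddings: two manual steps and two parts in a shared 2-D space.
steps = [[1.0, 0.0], [0.0, 1.0]]
aligned_parts = [[1.0, 0.0], [0.0, 1.0]]
shuffled_parts = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(steps, aligned_parts), info_nce(steps, shuffled_parts))
```

The loss is small when each step embedding lies closest to its own part and large when the pairing is shuffled, which is the signal a contrastive objective uses to pull matching pairs together.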
Jiahao Zhang, Anoop Cherian, Cristian Rodriguez, Weijian Deng, Stephen Gould
Temporally Grounding Instructional Diagrams in Unconstrained Videos
This paper introduces a method for simultaneously localizing multiple instructional diagram queries in videos, addressing the limitations of current approaches that handle queries individually. The proposed method uses composite queries combining visual features and positional embeddings, reducing overlaps and correcting temporal misalignment. Tested on the IAW and YouCook2 datasets, this approach significantly improves grounding accuracy by leveraging self-attention and cross-attention mechanisms, outperforming existing methods while maintaining the temporal structure of instructional steps. (Generated by ChatGPT4o).
Jiahao Zhang, Frederic Zhang, Cristian Rodriguez, Yizhak Ben-Shabat, Anoop Cherian, Stephen Gould
Assembly Video Manual Alignment
Official implementation of the CVPR 2023 paper "Aligning Step-by-Step Instructional Diagrams to Video Demonstrations".
Aligning Step-by-Step Instructional Diagrams to Video Demonstrations
This paper introduces a supervised contrastive learning approach that aligns videos with the subtle details of assembly diagrams, guided by a set of novel losses. To study this problem and evaluate the method, the authors introduce a new dataset, IAW (Ikea assembly in the wild), consisting of 183 hours of videos from diverse furniture assembly collections and nearly 8,300 illustrations from the associated instruction manuals, annotated with ground-truth alignments. They define two tasks on this dataset: first, nearest neighbor retrieval between video segments and illustrations; second, alignment of instruction steps with the segments of each video. Extensive experiments on IAW show that the approach outperforms the alternatives. (Generated by New Bing).
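The first task, nearest neighbor retrieval between video segments and illustrations, can be sketched as follows. This is a minimal illustration in plain Python that assumes both modalities have already been embedded in a shared space; the helper names and the toy vectors are hypothetical and do not come from the IAW code or dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrieve(query, candidates):
    """Index of the candidate embedding most similar to the query."""
    scores = [cosine(query, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


def recall_at_1(segment_embs, illustration_embs, ground_truth):
    """Fraction of segments whose top-ranked illustration is the labelled one."""
    hits = sum(
        retrieve(seg, illustration_embs) == gt
        for seg, gt in zip(segment_embs, ground_truth)
    )
    return hits / len(segment_embs)


# Toy data: three illustration embeddings and three video segment embeddings.
illustrations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
segments = [[0.9, 0.1], [0.1, 0.8], [0.7, 0.6]]
labels = [0, 1, 2]  # hypothetical ground-truth illustration index per segment
print(recall_at_1(segments, illustrations, labels))
```

Swapping the roles of `segments` and `illustrations` gives retrieval in the other direction, from an illustration to its best-matching video segment.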
Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, Stephen Gould
GoferBot: A Visual Guided Human-Robot Collaborative Assembly System
GoferBot is a novel assembly system that seamlessly integrates all sub-modules by utilising implicit semantic information purely from visual perception.
Zheyu Zhuang, Yizhak Ben-Shabat, Jiahao Zhang, Stephen Gould, Robert Mahony
Image Caption Generator
An encoder-decoder implementation of an image captioning model, with a ResNet-152 encoder and an LSTM decoder.