
ICLR 2022 | Visual manipulation trajectory learning for manipulating 3D articulated objects

Author: Jiangmen Ventures

This article is an interpretation of the ICLR 2022 accepted paper VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects. The work was completed by Hao Dong's research group at the Center on Frontiers of Computing Studies, Peking University, in collaboration with Stanford University and Tencent AI Lab.

This paper proposes a new representation of object functional actionability, and designs a framework that learns this representation through interactive perception to complete manipulation tasks on a variety of objects.


Paper link: https://arxiv.org/pdf/2106.14440.pdf

Project homepage: https://hyperplane-lab.github.io/vat-mart/


1. Research Background

Future home-assistant robots will need to perceive and manipulate the large-scale, diverse 3D objects found in human environments. Among 3D objects, articulated objects contain articulated parts with important functional and semantic information (e.g., cabinet doors and drawers), and humans and home-assistant robots interact with them frequently, so they deserve our attention. However, articulated objects have more degrees of freedom than ordinary rigid objects, which have only 6 degrees of freedom (DoF), making them harder for robots to understand and interact with.

Most previous work understands and manipulates 3D articulated objects by estimating their joint parameters, part poses, kinematic models, and so on. In this paper, we instead propose a new visual actionability representation: for the articulated part of a target object, we predict an actionability score at every point, together with diverse trajectories for completing the target task at that point (Figure 1). Such a visual actionability representation generalizes to objects of different shapes and is independent of the specific robot that manipulates the object. To obtain this actionable visual prior, we design VAT-Mart, a framework that learns perception through interaction.


Figure 1. Given a 3D articulated object as input, our method outputs an actionability score for every point, as well as diverse manipulation trajectories.
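To make the representation concrete, below is a minimal sketch (not the authors' implementation) of the interface such an actionable visual prior could expose: given an object point cloud and a task specification, a network predicts a per-point actionability score and decodes several latent samples into diverse trajectory proposals. The shared MLP stands in for a point-cloud backbone, and the trajectory length, waypoint dimension, and latent size are assumed values for illustration.

# Minimal sketch of the actionable-visual-prior interface described above.
# NOT the authors' implementation: the shared MLP replaces a real point-cloud
# backbone, and N_WAYPOINTS / WAYPOINT_DIM / LATENT_DIM are assumed values.
import torch
import torch.nn as nn

N_WAYPOINTS = 5      # assumed number of waypoints per open-loop trajectory
WAYPOINT_DIM = 6     # assumed 6-DoF gripper motion per waypoint
LATENT_DIM = 16      # assumed latent size used to sample diverse proposals


class ActionableVisualPrior(nn.Module):
    def __init__(self, feat_dim=128, task_dim=1):
        super().__init__()
        # Per-point feature extractor (placeholder for a real point-cloud encoder).
        self.point_encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, feat_dim), nn.ReLU())
        # Actionability head: likelihood that manipulating at this point completes the task.
        self.actionability_head = nn.Sequential(
            nn.Linear(feat_dim + task_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid())
        # Trajectory proposal head: decodes a latent sample into a waypoint sequence.
        self.traj_decoder = nn.Sequential(
            nn.Linear(feat_dim + task_dim + LATENT_DIM, 128), nn.ReLU(),
            nn.Linear(128, N_WAYPOINTS * WAYPOINT_DIM))

    def forward(self, points, task, n_proposals=4):
        # points: (N, 3) object point cloud; task: (task_dim,) target part-pose change.
        feats = self.point_encoder(points)                              # (N, feat_dim)
        task_rep = task.unsqueeze(0).expand(points.shape[0], -1)        # (N, task_dim)
        scores = self.actionability_head(
            torch.cat([feats, task_rep], dim=-1)).squeeze(-1)           # (N,)
        # Sample several latents at the highest-scoring point for diverse proposals.
        best = scores.argmax()
        ctx = torch.cat([feats[best], task], dim=-1).expand(n_proposals, -1)
        z = torch.randn(n_proposals, LATENT_DIM)
        trajs = self.traj_decoder(torch.cat([ctx, z], dim=-1))
        return scores, trajs.view(n_proposals, N_WAYPOINTS, WAYPOINT_DIM)


pts = torch.rand(2048, 3)       # dummy scan of an articulated part
task = torch.tensor([0.5])      # e.g. a normalized target joint-angle change
scores, proposals = ActionableVisualPrior()(pts, task)
print(scores.shape, proposals.shape)   # torch.Size([2048]) torch.Size([4, 5, 6])

The key point is only the output contract: a dense actionability map over the part plus several candidate trajectories per point, which any downstream robot can consume regardless of its own kinematics.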

2. Method

Our proposed VAT-Mart framework (Figure 2) consists of two modules: an interactive manipulation-trajectory exploration module based on reinforcement learning, and a visual actionability perception module. The trajectory exploration module supplies actionability and diverse manipulation-trajectory data to the perception module; the perception module distills the actionability and trajectory information from this data and, through a curiosity mechanism, guides the trajectory exploration module toward greater trajectory diversity.

Specifically, the interactive manipulation-trajectory exploration module uses reinforcement learning on the state of the target object to generate trajectories that complete different tasks on different objects and different articulated parts, along with the actionability of the interaction points. To collect diverse trajectories, the reinforcement-learning reward consists of two parts: an extrinsic reward for whether the trajectory completes the task, and an intrinsic reward, provided by the perception module, for how novel and diverse the current trajectory is. The perception module consists of three sub-modules: the actionability prediction module, the trajectory proposal module, and the trajectory scoring module, which respectively predict the actionability of each point, propose diverse trajectories that can complete the specified task, and predict whether a given trajectory can complete the specified task. The output of the trajectory scoring module also serves as the intrinsic reward that motivates the trajectory exploration module to explore diverse trajectories.
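As a rough illustration of how these two reward terms could combine (an assumed formulation for exposition, not the paper's exact reward), the per-trajectory reward seen by the exploration policy might look like the following, where the intrinsic term is large exactly when the trajectory scoring sub-module still rates the trajectory as unlikely to succeed, i.e., the trajectory is novel to the perception module:

# Assumed reward formulation for illustration only; the threshold and weights are invented.
def exploration_reward(achieved_delta, target_delta, predicted_score,
                       tolerance=0.05, w_ext=1.0, w_int=0.5):
    """achieved_delta / target_delta: actual vs. desired change of the joint state
    (door angle or drawer displacement); predicted_score in [0, 1] comes from the
    perception module's trajectory scoring sub-module."""
    # Extrinsic reward: did the executed trajectory move the part close enough to the target?
    r_ext = 1.0 if abs(achieved_delta - target_delta) < tolerance else 0.0
    # Intrinsic curiosity reward: trajectories the scoring module cannot yet explain score high.
    r_int = 1.0 - predicted_score
    return w_ext * r_ext + w_int * r_int

# A successful trajectory that the scoring module still rates low earns both the
# task reward and a curiosity bonus: 1.0 + 0.5 * (1 - 0.2) = 1.4
print(exploration_reward(achieved_delta=0.52, target_delta=0.50, predicted_score=0.2))

As the scoring module learns to explain a family of trajectories, their intrinsic reward shrinks, so the policy keeps drifting toward trajectories the perception module has not yet mastered.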


Figure 2. Framework architecture.
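The feedback loop in Figure 2 can be summarized in pseudocode. The outline below is an assumed sketch rather than the authors' training code; rl_policy, perception, and simulator are hypothetical stand-ins for the exploration policy, the three perception sub-modules, and the SAPIEN environment.

# Assumed outline of the alternating training described above (not the authors' code).
def train_vat_mart(rl_policy, perception, simulator, tasks, n_rounds=100):
    buffer = []  # (observation, contact point, task, trajectory, success) tuples
    for _ in range(n_rounds):
        # 1) Exploration: roll out the RL policy in simulation for sampled tasks.
        for task in tasks:
            obs, contact, traj, success = simulator.rollout(rl_policy, task)
            # Curiosity signal: a low predicted score marks the trajectory as novel.
            bonus = 1.0 - perception.score(obs, contact, task, traj)
            rl_policy.update(reward=float(success) + 0.5 * bonus)
            buffer.append((obs, contact, task, traj, success))
        # 2) Perception: fit the actionability, proposal, and scoring heads on the buffer.
        perception.fit(buffer)
    return perception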

3. Experiments

We ran experiments on the large-scale PartNet-Mobility dataset using the SAPIEN simulator. We selected two common joint types, doors (revolute) and drawers (prismatic), defined four task types (opening and closing a door or a drawer), and used seven object categories. For each task, we split the objects into training categories (train-cat) and novel categories that never appear during training (test-cat). For each task, our framework predicts the actionability score of every point on the object and proposes diverse manipulation trajectories (Figure 3).


Figure 3. Per-point actionability scores and diverse manipulation trajectories across different tasks and different objects.

Further, we experimented on 3D objects scanned from the real world (Google Scan, RBO, and our own scans; left half of Figure 4) and carried out real-robot experiments using a Franka Panda robotic arm (right half of Figure 4).

On the large-scale dataset, on real-world data, and on the real robot, our method efficiently predicts actionability and proposes trajectories, and it generalizes well to new environments and novel object categories.


Figure 4. Results on real-world data (left) and real-robot experiments (right).

4. Summary

In this paper, to perceive and manipulate 3D articulated objects, we propose a novel and generalizable visual actionability representation, and we design the VAT-Mart framework to predict the actionability of each point on the articulated part of a target object and to propose diverse manipulation trajectories that can complete the target task. Experiments on the large-scale PartNet-Mobility dataset, on real-world data, and with a real robotic arm demonstrate the effectiveness of the proposed framework.

This article is from the WeChat official account of the Center on Frontiers of Computing Studies, Peking University.

Author: PKU Hyperplane

Illustration by Tatyana Krasutskaya from icons8

