PairGS

Relation-Centric Open-Vocabulary 3D Gaussian Segmentation

Seoul National University, Republic of Korea
*Equal contribution Corresponding author
PairGS overview figure

PairGS reframes 3D Gaussian segmentation as relation estimation between Gaussians, enabling clean instance separation, multi-granular queries, and efficient training-free processing.

Abstract

Open-vocabulary 3D Gaussian segmentation is challenging because it requires both language understanding for diverse queries and accurate separation of Gaussians along object boundaries. Prior approaches either embed language knowledge into individual Gaussians to improve query responsiveness or optimize per-Gaussian instance features to encode object identity. However, these strategies can produce noisy Gaussian segmentations or depend on costly per-scene optimization.

We propose PairGS, a framework that reframes Gaussian segmentation as modeling pairwise relations between Gaussians. 3D Gaussian representations provide rich signals for relation estimation, such as view contribution weights and multi-view mask evidence. By leveraging these cues, PairGS explicitly constructs a relation graph for segmentation without a heavy optimization process.

PairGS first proposes sparse edge candidates using low-dimensional descriptors, then computes precise pairwise affinities only on those candidates, and finally builds a hierarchical cluster tree for multi-granular querying. It achieves state-of-the-art results on open-vocabulary 3D Gaussian segmentation benchmarks, and its fast variant runs 50x faster than optimization-based instance-feature approaches.

Method

PairGS method pipeline

01. Lift Semantics

PairGS starts from a pretrained 3DGS scene and initializes Gaussian features by lifting SAM-masked CLIP features from multi-view images.
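As a rough illustration of the lifting step, the sketch below averages per-mask 2D features onto each Gaussian. It assumes that each Gaussian's observations are available as hypothetical `(view_id, mask_id, weight)` tuples and that the lifted feature is a contribution-weighted mean followed by L2 normalization. The page does not state the exact aggregation rule, so the weighting, the function name `lift_features`, and the data layout are all assumptions made for illustration.

```python
import math

def lift_features(observations, mask_features):
    """Average 2D mask features onto each Gaussian, weighted by contribution.

    observations[g] : list of (view_id, mask_id, weight) tuples for Gaussian g
    mask_features   : dict mapping (view_id, mask_id) -> feature vector (list)
    Returns one L2-normalized feature per Gaussian; unobserved Gaussians get zeros.
    NOTE: the weighted mean is an assumed aggregation rule, not the paper's.
    """
    dim = len(next(iter(mask_features.values())))
    lifted = []
    for obs in observations:
        acc = [0.0] * dim
        for view_id, mask_id, weight in obs:
            feat = mask_features[(view_id, mask_id)]
            for d in range(dim):
                acc[d] += weight * feat[d]
        norm = math.sqrt(sum(a * a for a in acc))
        lifted.append([a / norm for a in acc] if norm > 0 else acc)
    return lifted

# Toy example: two views, with 2-D vectors standing in for CLIP features.
mask_features = {(0, 7): [1.0, 0.0], (1, 3): [0.0, 1.0]}
observations = [
    [(0, 7, 3.0), (1, 3, 1.0)],  # Gaussian 0: dominated by mask 7 of view 0
    [],                          # Gaussian 1: never observed
]
features = lift_features(observations, mask_features)
```

Plain Python lists keep the sketch dependency-free. A practical version would likely run the same accumulation on GPU tensors during rasterization.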

02. Propose Sparse Edges

Lightweight semantic descriptors and scaled Gaussian positions form node descriptors for k-NN edge candidate proposal.
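The edge proposal step can be pictured with a small brute-force sketch. It assumes the node descriptor is a simple concatenation of the semantic descriptor with the position multiplied by a hypothetical scalar `pos_scale`, and that neighbors are found under Euclidean distance. The page specifies neither the combination rule nor the distance, and a real system would use an approximate or GPU k-NN search rather than this quadratic loop.

```python
import math

def node_descriptor(semantic, position, pos_scale):
    """Concatenate a semantic descriptor with a scaled 3D position (assumed layout)."""
    return list(semantic) + [pos_scale * p for p in position]

def knn_edges(descriptors, k):
    """Return undirected candidate edges (i, j), i < j, from each node's k nearest neighbors."""
    n = len(descriptors)
    edges = set()
    for i in range(n):
        # Brute-force distances to every other node, sorted ascending.
        dists = sorted(
            (math.dist(descriptors[i], descriptors[j]), j)
            for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

# Toy example: two semantic groups that are also spatially separated.
semantics = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
positions = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 0.0, 0.0), (5.1, 0.0, 0.0)]
descs = [node_descriptor(s, p, pos_scale=1.0) for s, p in zip(semantics, positions)]
candidate_edges = knn_edges(descs, k=1)
```

Storing each edge once as a sorted pair keeps the candidate graph undirected, so the later affinity computation runs once per pair.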

03. Estimate Pairwise Affinity

Multi-view mask evidence and Gaussian view contribution weights produce precise relation scores on the sparse graph.
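One way to picture how mask evidence and contribution weights could combine into a relation score is the weighted agreement ratio sketched below. The formula is an assumption made for illustration, not the one used by PairGS: for each view in which both Gaussians are observed, the pair contributes the smaller of its two weights, and the affinity is the fraction of that weight coming from views where both Gaussians fall in the same mask. The per-Gaussian dictionary layout is likewise hypothetical.

```python
def pairwise_affinity(obs_i, obs_j):
    """Weighted fraction of shared views in which two Gaussians share a mask.

    obs_i, obs_j : dict mapping view_id -> (mask_id, contribution_weight)
    Returns a score in [0, 1], or None when the pair shares no view.
    NOTE: this agreement ratio is an illustrative assumption.
    """
    agree = 0.0
    total = 0.0
    for view_id, (mask_i, w_i) in obs_i.items():
        if view_id not in obs_j:
            continue  # only views observing both Gaussians count as evidence
        mask_j, w_j = obs_j[view_id]
        w = min(w_i, w_j)  # a view is only as reliable as its weaker observation
        total += w
        if mask_i == mask_j:
            agree += w
    return agree / total if total > 0 else None

# Toy example: the pair shares views 0 and 1, agreeing only in view 0.
obs_a = {0: (1, 0.8), 1: (2, 0.5), 2: (3, 0.2)}
obs_b = {0: (1, 0.4), 1: (5, 0.1), 3: (3, 0.9)}
score = pairwise_affinity(obs_a, obs_b)
```

Because the score is evaluated only on the proposed candidate edges, its cost scales with the number of edges rather than with all Gaussian pairs.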

04. Build a Cluster Tree

TreeDBSCAN constructs a parent-child-aware hierarchy, pruning redundant and spurious clusters for multi-granular selection.
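The page does not describe the internals of TreeDBSCAN, so the sketch below is not that algorithm. It is a simplified stand-in that shows what a parent-child hierarchy with pruning can look like: connected components of the affinity graph are extracted at increasing thresholds, clusters identical to an existing node are skipped as redundant, and clusters below a hypothetical `min_size` are dropped as spurious. All names and parameters here are illustrative assumptions.

```python
def components(n, edges, threshold):
    """Connected components over edges (i, j, affinity) with affinity >= threshold."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, affinity in edges:
        if affinity >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
    clusters = {}
    for x in range(n):
        clusters.setdefault(find(x), set()).add(x)
    return [frozenset(c) for c in clusters.values()]

def build_cluster_tree(n, edges, thresholds, min_size=1):
    """Map each kept cluster to its parent cluster (None for roots).

    Thresholds are visited from loose to strict, so later clusters
    are always subsets of clusters that were found earlier.
    """
    tree = {}
    for t in sorted(thresholds):
        for cluster in components(n, edges, t):
            if len(cluster) < min_size or cluster in tree:
                continue  # prune spurious (too small) or redundant (duplicate) nodes
            supersets = [p for p in tree if cluster < p]
            tree[cluster] = min(supersets, key=len) if supersets else None
    return tree

# Toy example: five Gaussians with weighted candidate edges.
edges = [(0, 1, 0.9), (1, 2, 0.5), (3, 4, 0.9), (2, 3, 0.1)]
tree = build_cluster_tree(5, edges, thresholds=[0.3, 0.7], min_size=2)
```

In this toy graph the coarse level yields {0, 1, 2} and {3, 4}, and the finer level adds {0, 1} as a child of {0, 1, 2}. A query can then select a node at whichever granularity matches it best.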

Indoor Dataset Results

Open-vocabulary 3D segmentation results on indoor scenes.

Datasets: LERF, Mip-NeRF 360

KITTI Dataset Result

Open-vocabulary segmentation result on an outdoor driving scene.

Application

Language-Driven Robot Manipulation

Language-selected 3D Gaussians are converted into a point cloud for downstream grasp generation.
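A minimal sketch of the conversion, assuming the point cloud is formed from the centers of the selected Gaussians and that nearly transparent Gaussians are discarded via a hypothetical `min_opacity` threshold. The page does not say how the conversion is performed, so both choices are illustrative assumptions.

```python
def gaussians_to_point_cloud(means, opacities, selected, min_opacity=0.1):
    """Return the centers of selected, sufficiently opaque Gaussians as (x, y, z) tuples.

    means     : list of 3D Gaussian centers
    opacities : list of per-Gaussian opacities in [0, 1]
    selected  : list of booleans produced by the language query
    NOTE: using centers and an opacity cutoff is an assumed conversion rule.
    """
    return [
        tuple(mean)
        for mean, opacity, keep in zip(means, opacities, selected)
        if keep and opacity >= min_opacity
    ]

# Toy example: three Gaussians, two selected, one of those nearly transparent.
means = [(0.0, 0.0, 1.0), (0.5, 0.0, 1.0), (2.0, 2.0, 2.0)]
opacities = [0.9, 0.05, 0.8]
selected = [True, True, False]
points = gaussians_to_point_cloud(means, opacities, selected)
```

The resulting list of points can be handed to whichever grasp generator the downstream pipeline uses.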

BibTeX

TBD