Abstract:
Recent advances in self-supervised learning in the point cloud domain have demonstrated significant potential. However, these methods often suffer from drawbacks such as lengthy pre-training times, the need for reconstruction in the input space, or reliance on additional modalities. To address these issues, we introduce Point-JEPA, a joint embedding predictive architecture designed specifically for the point cloud domain. Central to our approach is a sequencer that orders point cloud tokens so that their spatial proximity can be computed and exploited directly from their indices. This shared computation of token proximity enables the efficient selection of spatially contiguous context and target blocks. Experimentally, our method achieves results competitive with state-of-the-art methods while avoiding reconstruction in the input space and additional modalities. In particular, it outperforms other self-supervised learning methods on linear evaluation and few-shot classification on ModelNet40, demonstrating the robustness of the learned representations. These results show that Point-JEPA is an efficient alternative to existing pre-training methods in the point cloud domain.
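The sequencing idea can be illustrated with a minimal sketch. The greedy nearest-neighbor ordering below, along with the helper names sequence_tokens and sample_block, is an illustrative assumption rather than the paper's exact implementation; it shows how, once tokens are ordered, selecting a spatially contiguous block reduces to taking a contiguous index range.

```python
# Hedged sketch: greedy nearest-neighbor ordering of token centers.
# The function names and the greedy strategy are illustrative assumptions,
# not the authors' confirmed implementation.
import numpy as np

def sequence_tokens(centers: np.ndarray) -> np.ndarray:
    """Order token centers so that consecutive indices are spatially close."""
    n = centers.shape[0]
    visited = np.zeros(n, dtype=bool)
    order = [0]                  # start from an arbitrary token
    visited[0] = True
    for _ in range(n - 1):
        dists = np.linalg.norm(centers - centers[order[-1]], axis=1)
        dists[visited] = np.inf  # skip tokens already placed
        nxt = int(np.argmin(dists))
        order.append(nxt)
        visited[nxt] = True
    return np.asarray(order)

def sample_block(order: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """With sequenced tokens, a spatially contiguous block is simply a
    contiguous slice of indices, with no per-block distance computation."""
    start = int(rng.integers(0, len(order) - size + 1))
    return order[start:start + size]

centers = np.random.default_rng(0).random((64, 3))  # 64 token centers in 3D
order = sequence_tokens(centers)
context_block = sample_block(order, 16, np.random.default_rng(1))
target_block = sample_block(order, 8, np.random.default_rng(2))
```

Under these assumptions, the pairwise-proximity computation is amortized over the single ordering pass, so context and target blocks for each training step can be drawn as index slices.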