Because 3D object localization from image input alone is inherently an ill-posed problem, based on […]
Bootstrap Masked Visual Modeling via Hard Patch Mining (TPAMI 2025)
Research overview: Typical masked visual modeling methods are limited to having the model predict the masked tokens' […]
Reconstructive Visual Instruction Tuning (ICLR 2025)
This work proposes a reconstructive visual instruction tuning framework (ROSS). The framework […]
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness (ICCV 2025)
Research overview: As large multimodal models for 2D image and video processing (LM […]
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-scale Scenes (ICLR 2025)
Research overview: 3D Gaussian Splatting […]
MCOP: Multi-UAV Collaborative Occupancy Prediction (ICCV 2025)
Research overview: To address the diverse tasks in UAV swarm systems and what they require of efficient collaborative perception […]
Enhancing End-to-End Autonomous Driving with Latent World Model (ICLR 2025)
End-to-end planning methods have recently attracted wide attention in autonomous driving because, compared with […]
FreeSim: Toward Free-viewpoint Camera Simulation in Driving Scenes (CVPR 2025)
Research overview: We propose FreeSim, an autonomous-driving-oriented camera […]
FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering (CVPR 2025)
Research overview: With 3D Gaussian Splatting, driving scene reconstruction and rendering have made […]
UIPro: Unleashing Superior Interaction Capability For GUI Agents (ICCV 2025)
Research overview: We propose UIPro, trained on large-scale unified data […]
