SLAM is regarded as a fundamental task for mobile robots and AR, performing localization and mapping in unknown environments. However, with only RGB images as input, monocular SLAM systems suffer from scale ambiguity and tracking difficulties in dynamic scenes. Moreover, high-level semantic information, which resembles the cues exploited by human vision, can further benefit the SLAM process.
To address these problems, we propose a monocular object-level SLAM system enhanced by real-time joint depth estimation and semantic segmentation. The multi-task network, called JSDNet, is designed to predict depth and semantic segmentation simultaneously, with four key contributions: depth discretization, feature fusion, a weight-learned loss function, and semantic consistency optimization. Specifically, feature fusion facilitates the sharing of features between the two tasks, while semantic consistency optimization aims to guarantee consistent segmentation and depth across different views.
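The abstract does not specify the exact form of the weight-learned loss. One common way to learn per-task weights in a multi-task network is homoscedastic-uncertainty weighting, where a log-variance per task is optimized jointly with the network; the PyTorch sketch below illustrates that idea under this assumption (the class name and the precise balance of terms are illustrative, not the paper's implementation).

```python
import torch
import torch.nn as nn

class WeightLearnedLoss(nn.Module):
    """Combine depth and segmentation losses with learnable task weights.

    A hypothetical sketch of uncertainty-based weighting: each task gets a
    learnable log-variance s, and its loss is weighted by exp(-s) with s
    added as a regularizer, so the balance is learned during training.
    """

    def __init__(self):
        super().__init__()
        # One learnable log-variance per task, initialized to 0 (weight 1).
        self.log_var_depth = nn.Parameter(torch.zeros(1))
        self.log_var_seg = nn.Parameter(torch.zeros(1))

    def forward(self, depth_loss, seg_loss):
        # L = exp(-s_d) * L_depth + s_d + exp(-s_s) * L_seg + s_s
        weighted_depth = torch.exp(-self.log_var_depth) * depth_loss + self.log_var_depth
        weighted_seg = torch.exp(-self.log_var_seg) * seg_loss + self.log_var_seg
        return (weighted_depth + weighted_seg).squeeze()

# Usage: total = WeightLearnedLoss()(l1_depth_loss, cross_entropy_seg_loss)
```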
Based on the outputs of JSDNet, we design an object-level system that combines both pixel-level and object-level semantics with traditional tracking, mapping, and optimization processes. In addition, a scale recovery process is integrated into the system to estimate the true metric scale. Experimental results on NYU Depth V2 demonstrate state-of-the-art depth estimation and competitive segmentation accuracy at real-time speed, while trajectory evaluation on TUM RGB-D shows lower errors than other SLAM systems.
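The abstract does not detail how scale recovery is performed. A frequently used strategy for monocular SLAM is to align the up-to-scale depths of tracked map points with the network's metric depth predictions via a robust median ratio; the function below is a minimal sketch under that assumption, and its name and inputs are hypothetical rather than taken from the paper.

```python
import numpy as np

def recover_scale(slam_depths, predicted_depths, eps=1e-6):
    """Estimate a global scale factor for a monocular SLAM map.

    slam_depths:      depths of tracked map points in the (up-to-scale) SLAM frame
    predicted_depths: metric depths predicted by the network at the same pixels
    Returns the median ratio, which is robust to outlier correspondences.
    """
    slam_depths = np.asarray(slam_depths, dtype=np.float64)
    predicted_depths = np.asarray(predicted_depths, dtype=np.float64)
    valid = (slam_depths > eps) & (predicted_depths > eps)
    ratios = predicted_depths[valid] / slam_depths[valid]
    return float(np.median(ratios))
```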