CNNs vs. Transformers for Urban Semantic Segmentation
Abstract. Urban semantic segmentation is a key perception task for autonomous driving and mobile robotics, where models must operate with limited labels, imperfect visual streams, and strict latency and memory budgets. Reported CNN and Transformer results are often hard to compare because training recipes, preprocessing, and measurement practices differ across studies. This paper presents a controlled Cityscapes evaluation that fixes the official split, label mapping, input resizing to 512×1024, normalization, and metric code, then compares a DeepLabV3 CNN baseline against SegFormer under identical evaluation rules. We analyze four deployment-facing axes: clean validation accuracy, label efficiency, robustness to common corruptions (severity-3 blur, noise, JPEG compression, brightness/contrast), and measured computational cost (latency, peak VRAM, parameter count). Results show a regime-dependent trade-off: the CNN achieves the best clean score under full supervision (0.7217 mIoU), while SegFormer is stronger in the low-label regime (0.6476 mIoU with 10% of the labels). Robustness diverges sharply under Gaussian noise and JPEG compression, where SegFormer degrades far less than the CNN. SegFormer also runs with a much smaller footprint (3.72M parameters, 0.64 GB peak VRAM) and slightly lower latency, motivating closer analysis of these operating points.
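To make the shared evaluation rules concrete, the sketch below shows one way to implement the fixed preprocessing (512×1024 resize plus normalization) and the computational-cost measurement (per-image latency, peak VRAM, parameter count), assuming PyTorch and torchvision. The function names, the ImageNet normalization statistics, and the warmup/timing-loop details are illustrative assumptions, not the paper's actual code.

```python
import time
import torch
import torchvision.transforms.functional as TF

def preprocess(img):
    """Shared, deterministic preprocessing applied to both models."""
    x = TF.resize(img, [512, 1024])          # fixed 512x1024 input size (from the abstract)
    x = TF.to_tensor(x)
    x = TF.normalize(x,                      # assumed ImageNet statistics
                     mean=[0.485, 0.456, 0.406],
                     std=[0.229, 0.224, 0.225])
    return x.unsqueeze(0)                    # add batch dimension

@torch.no_grad()
def measure_cost(model, x, warmup=10, runs=50):
    """Measure latency, peak VRAM, and parameter count under identical conditions."""
    model.eval().cuda()
    x = x.cuda()
    for _ in range(warmup):                  # warm up kernels before timing
        model(x)
    torch.cuda.synchronize()
    torch.cuda.reset_peak_memory_stats()
    t0 = time.perf_counter()
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()                 # wait for GPU work before stopping the clock
    latency_ms = (time.perf_counter() - t0) / runs * 1e3
    peak_vram_gb = torch.cuda.max_memory_allocated() / 1e9
    params_m = sum(p.numel() for p in model.parameters()) / 1e6
    return latency_ms, peak_vram_gb, params_m
```

Running `measure_cost` on both models with the same preprocessed input is what makes the reported latency and VRAM numbers directly comparable: any difference comes from the architectures, not from differing input sizes or measurement code.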