We show that pretraining specifically for remote sensing applications does not always improve over general-purpose pretraining of visual models.
We show that Qwen-based visual-language models do not understand simple maps. We fine-tune them to teach that specific skill and improve navigation based on maps and visuals.
We describe a ViT-based solution for predicting indoor radio maps, as part of the ICASSP 2025 SPGC Challenge.
We design a benchmark for evaluating remote sensing foundation models in cross-satellite generalization settings. We also design a strong self-supervised baseline.
Data specialization means selecting a subset of a dataset such that training on it improves model performance in a compute-controlled setting. We show an example of how it works on a dataset of satellite images from Maxar.
We show how incorrect maps from online mapping services can be improved on the ground by leveraging radio signal parameters across antennas and devices.
A U-Net-based method for predicting indoor radio maps when only a few sparse measurements are available. Our solution for the MLSP 2025 Challenge.
This work provides a systematic study of synthetic-to-real transfer in RF localization for wireless communication and highlights the value of simulation-aware pretraining for generalizing DL models to real-world scenarios.
This work introduces an automated pipeline for generating large-scale IoT network datasets by bringing together the Contiki-NG firmware, parameterized topology generation, and Slurm-based orchestration of Cooja simulations. The system supports a variety of network structures, scalable node counts, randomized battery allocations, and routing protocols to reproduce diverse failure modes.
This paper addresses the CityNav aerial navigation benchmark by fine-tuning a small, open-source Vision-Language Model, Qwen2.5-VL-3B.
Spurious correlations cause serious problems in all ML algorithms. In this paper, we investigate the challenges they pose for image classification within the in-context learning paradigm with transformers.
In this paper we investigate the differences between various self-supervised algorithms for learning visual encoders (e.g. MAE, DINO). We highlight critical issues with MAE-like methods.
We show that having perfect information about radio signals received at a device from even a single base antenna can be enough to localize the device.
We show that pretrained image representations, including DINOv2, contain spurious correlations that can harm classification accuracy. We propose a method to remove such correlations from the representations.
In many practical applications, training and test data come from slightly different distributions. This paper provides a comprehensive analysis of how ML models fail in such scenarios.
A comprehensive analysis of semi-supervised learning methods for computer vision.