AI for Robotics

We develop models that help robots perceive and understand their surroundings, enabling them to follow natural-language instructions and navigate autonomously. On the perception side, we focus on multispectral imagery (mostly satellite data) and radio-frequency (RF) signals for wireless sensing. On the action side, we study drone navigation in complex environments. We have built a strong vision encoder for satellite imagery with robust cross-satellite generalization. We have also developed neural networks for processing RF data to perform device localization and environment reconstruction. Finally, we fine-tune Qwen and Isaac vision-language models for navigation in simulated urban environments.

2025

Do Satellite Tasks Need Special Pretraining?

We show that pretraining specifically for remote sensing applications does not always improve over general-purpose pretraining of visual models.

2025

Teaching Visual Language Models to Navigate using Maps

We show that Qwen-based vision-language models fail to understand simple maps. We fine-tune them to teach this specific skill, which improves navigation based on maps and visual input.

2025

Vision Transformers for Efficient Indoor Pathloss Radio Map Prediction

We describe a ViT-based solution for predicting indoor radio maps, developed for the ICASSP 2025 Signal Processing Grand Challenge (SPGC).

2025

GeoCrossBench: Cross-Band Generalization for Remote Sensing

We design a benchmark for evaluating remote sensing foundation models on cross-satellite generalization and propose a strong self-supervised baseline.

2025

Less is More? Data Specialization for Self-Supervised Remote Sensing Models

Data specialization means training on a selected subset of the dataset instead of the full dataset, which can improve model performance when compute is held fixed. We demonstrate this on a dataset of Maxar satellite images.

2025

Fusion of Pervasive RF Data with Spatial Images via Vision Transformers for Enhanced Mapping in Smart Cities

We show how inaccurate maps from online mapping services can be corrected by leveraging radio signal parameters collected on the ground across antennas and devices.

2025

U-Net for Indoor Pathloss Prediction from Sparse Measurements with Physics-Informed Features

A U-Net-based method for predicting indoor radio maps when only a few sparse measurements are available. This is our solution for the MLSP 2025 Challenge.

2025

Bridging the Sim-to-Real Gap in RF Localization with Large-Scale Synthetic Pretraining

This work presents a systematic study of synthetic-to-real transfer for RF localization in wireless communication and highlights the value of simulation-aware pretraining for helping deep learning models generalize to real-world scenarios.

2025

Scalable Generation of Synthetic IoT Network Datasets: A Case Study with Cooja

This work introduces an automated pipeline for generating large-scale IoT network datasets by bringing together the Contiki-NG firmware, parameterized topology generation, and Slurm-based orchestration of Cooja simulations. The system supports a variety of network structures, scalable node counts, randomized battery allocations, and routing protocols to reproduce diverse failure modes.

2025

Towards Fine-tuning a Small Vision-Language Model for Aerial Navigation

This paper addresses the CityNav aerial navigation benchmark by fine-tuning a small, open-source Vision-Language Model, Qwen2.5-VL-3B.

2024

In-context learning in presence of spurious correlations

Spurious correlations cause serious problems for many ML algorithms. In this paper, we investigate the challenges they pose for image classification within the in-context learning paradigm with transformers.

2024

Analyzing Local Representations of Self-supervised Vision Transformers

In this paper, we investigate the differences between various self-supervised algorithms for learning visual encoders (e.g., MAE, DINO). We highlight critical issues with MAE-like methods.

2024

Deep learning with synthetic data for wireless NLOS positioning with a single base station

We show that perfect information about the radio signals a device receives from even a single base station can be enough to localize that device.

2023

Identifying and disentangling spurious features in pretrained image representations

We show that pretrained image representations, including those from DINOv2, contain spurious correlations that can harm classification accuracy. We propose a method to remove such correlations from the representations.

2021

Failure Modes of Domain Generalization Algorithms

In many practical applications, training and test data come from slightly different distributions. This paper provides a comprehensive analysis of how ML models fail in such scenarios.

2021

Deep Semi-Supervised Image Classification Algorithms: a Survey

A comprehensive analysis of semi-supervised learning methods for computer vision.

2020

Robust classification under class-dependent domain shift