Building the future of instantly searchable geospatial data with vector embeddings

Imagine a world where the vast expanse of geospatial data, from sprawling urban landscapes captured by satellites to the intricate details of street-level surveys, becomes instantly searchable. Not just by keywords or metadata, but by the very essence of the images themselves. This isn't a futuristic fantasy; it's the reality Sensat is building, powered by the transformative potential of vector embeddings.
In December 2024, I posed a simple yet profound question: "What if you could query the world as easily as you search the internet?" This sparked an ambitious experiment, a dual exploration into the capabilities of vector embeddings for geospatial data. One strand focused on searching through the sheer scale of remote sensing data; the other rested on the ingenuity of a recent graduate named Josh.

Josh, fresh from UCL with a passion for computer vision and geospatial technology, approached us with an infectious eagerness to help validate our hypotheses. He wasn't just looking for a job; he was looking for a challenge. And we had one ready for him: to build a vector embedding-powered image search engine in just three weeks.
The data deluge: a need for smarter search
Sensat operates in a world of data abundance. We capture and process terabytes of geospatial information, from high-resolution UAV imagery to detailed mobile mapping surveys. This data holds immense value, but its sheer volume presents a significant challenge. Traditional methods of searching and analysing this data (manual inspection, keyword searches that rely on often-incomplete metadata, and cumbersome, compute-heavy analysis) are slow, inefficient, and often miss crucial insights.
Vector embeddings offer a radical departure. They transform complex data, like images, into numerical representations that capture their underlying semantics. This allows us to perform lightning-fast searches based on visual content, uncovering patterns and relationships that were previously hidden.
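To make that concrete, here is a minimal sketch (not Sensat's production pipeline) of the core idea using the open-source CLIP model via Hugging Face transformers: a handful of survey images are embedded and ranked against a free-text query by cosine similarity. The file paths and query string are placeholders.

```python
# Minimal sketch: embed images with CLIP and rank them against a text query.
# Paths and the query are placeholders; this is illustrative, not Sensat's pipeline.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image_paths = ["survey_001.jpg", "survey_002.jpg", "survey_003.jpg"]  # placeholders
images = [Image.open(p) for p in image_paths]

with torch.no_grad():
    image_inputs = processor(images=images, return_tensors="pt")
    image_embeds = model.get_image_features(**image_inputs)
    text_inputs = processor(text=["a bridge over a road"], return_tensors="pt", padding=True)
    text_embeds = model.get_text_features(**text_inputs)

# Normalise so the dot product equals cosine similarity.
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)

scores = (image_embeds @ text_embeds.T).squeeze(-1)
for path, score in sorted(zip(image_paths, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {path}")
```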
Josh's three-week sprint: from theory to reality
Josh’s mission was ambitious: to develop a system capable of indexing and comparing diverse geospatial datasets using vector embeddings. To manage the complexity, we structured his work into three stages, beginning with terrestrial MMS imagery as a controlled starting point before scaling to larger and more varied datasets:
- Stage 1: Street-level insights: automating the extraction and categorisation of bridges from a massive MMS image dataset.
- Stage 2: Scaling to the skies: extending the vector embedding approach to UAV orthomosaics, identifying features like roads, buildings, and vegetation.
- Stage 3: A geospatial microscope: building a search engine that could seamlessly navigate orthomosaics at varying levels of detail.
Street-level insights: the MMS experiment
Josh's journey began with the challenge of identifying unique bridges within a dataset of over 2,000 spherical MMS images. He leveraged foundation models such as OpenAI’s CLIP, fine-tuned on street-level imagery, to transform raw image data into semantically rich vector embeddings. This allowed him to cluster images of individual bridges, even under challenging conditions. Finally, he developed a similarity search system, allowing him to quickly compare embeddings and identify images containing unique bridges.
While the initial results were promising, Josh encountered challenges in distinguishing between closely located bridges. This highlighted the nuances of working with real-world geospatial data, where context and proximity play a crucial role. His search capability could also locate and group images containing traffic lights, street signs, and other street furniture.
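As a rough illustration of the grouping step (not Josh's actual implementation), the sketch below clusters normalised image embeddings with DBSCAN so that each dense cluster becomes a candidate bridge; the embedding file and the eps/min_samples values are hypothetical and would need tuning on real MMS data.

```python
# Sketch: group MMS image embeddings into clusters that may correspond to individual bridges.
# The .npy file stands in for pre-computed CLIP vectors; eps/min_samples need tuning.
import numpy as np
from sklearn.cluster import DBSCAN

embeddings = np.load("mms_bridge_embeddings.npy")  # shape (n_images, dim), placeholder file
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

# With unit-norm vectors, cosine distance = 1 - cosine similarity.
clustering = DBSCAN(eps=0.15, min_samples=3, metric="cosine").fit(embeddings)

labels = clustering.labels_  # -1 marks images that did not join any cluster
for cluster_id in sorted(set(labels) - {-1}):
    members = np.where(labels == cluster_id)[0]
    print(f"candidate bridge {cluster_id}: {len(members)} images")
```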


Scaling to the skies: the orthomosaic challenge
Next, Josh tackled the more complex task of working with UAV orthomosaics, using the SensatUrban dataset as a benchmark. SensatUrban covers 6 km² across two UK cities, featuring meticulously labelled urban elements like roads, buildings, parks, trees, rivers, car parks, lakes, and bridges for precise evaluation. He incorporated OpenStreetMap (OSM) data to enrich the semantic information, navigating the inconsistencies and complexities of this open-source resource.
He experimented with various vision-language models (VLMs), eventually finding success with GeoRSCLIP, a model specifically designed for remote sensing imagery. It proved significantly more accurate than similar VLMs such as SatCLIP and RemoteCLIP. These models are designed to understand the relationship between images and text, making them useful for generating semantically rich embeddings.
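The zero-shot pattern these models enable looks roughly like the sketch below, shown here with a generic OpenCLIP checkpoint; substituting a remote-sensing variant such as GeoRSCLIP follows the same encode-and-compare structure once its weights are loaded. The class prompts and tile path are illustrative.

```python
# Sketch: zero-shot labelling of orthomosaic tiles against a fixed set of land-cover classes.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

classes = ["a road", "a building", "a park", "a river", "a car park", "trees"]
text_tokens = tokenizer([f"an aerial photo of {c}" for c in classes])

tile = preprocess(Image.open("tile_z20_x523_y348.png")).unsqueeze(0)  # placeholder tile

with torch.no_grad():
    image_features = model.encode_image(tile)
    text_features = model.encode_text(text_tokens)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for c, p in zip(classes, probs.squeeze(0).tolist()):
    print(f"{c}: {p:.2%}")
```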
Crucially, Josh discovered how tile size affects accuracy. He found that smaller tiles yielded greater detail, though they increased processing time; larger tiles offered speed, but at the cost of accuracy. This paved the way for the final stage of his project: building a multi-resolution search engine that could unlock the full potential of Sensat's geospatial data.
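For context, tiling an orthomosaic for embedding can be as simple as the sketch below, which slices a GeoTIFF into fixed-size windows with rasterio; the file path is a placeholder, and TILE_SIZE is the knob that trades detail against processing time.

```python
# Sketch: slice an orthomosaic GeoTIFF into square tiles of a chosen pixel size.
# Smaller tiles give the embedding model more detail per tile but multiply the
# number of embeddings to compute and store.
import rasterio
from rasterio.windows import Window, bounds

TILE_SIZE = 256  # try 128 / 256 / 512 to trade detail against processing time

with rasterio.open("sensaturban_ortho.tif") as src:  # placeholder path
    tiles = []
    for row_off in range(0, src.height, TILE_SIZE):
        for col_off in range(0, src.width, TILE_SIZE):
            window = Window(col_off, row_off,
                            min(TILE_SIZE, src.width - col_off),
                            min(TILE_SIZE, src.height - row_off))
            tile = src.read(window=window)              # (bands, h, w) array
            extent = bounds(window, src.transform)      # georeferenced extent of this tile
            tiles.append((extent, tile))

print(f"{len(tiles)} tiles of up to {TILE_SIZE}x{TILE_SIZE} px")
```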

A geospatial microscope: multi-resolution search
The final stage of Josh's project involved building a multi-resolution search engine, allowing users to seamlessly explore orthomosaics at varying zoom levels. This presented the challenge of maintaining accuracy and efficiency across different levels of detail.
Josh experimented with slippy tiles and various zoom levels, finding that zoom level 20, with its finer detail, yielded the best results. He also explored aggregation techniques, but found that weighted averaging of predictions from different zoom levels failed to capture the complex hierarchical relationships between features at different scales. This highlighted the need for more sophisticated, hierarchical indexing methods.
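For readers unfamiliar with slippy tiles, the sketch below shows the standard tile-coordinate maths, alongside the kind of naive weighted averaging across zoom levels that proved too blunt; the weights and scores are purely illustrative.

```python
# Sketch: slippy-map tile maths plus a naive cross-zoom aggregation
# (a weighted average of per-zoom scores), which struggles with hierarchical features.
import math

def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Convert WGS84 lon/lat to slippy-map tile indices at a given zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

def aggregate_scores(scores_by_zoom: dict[int, float], weights: dict[int, float]) -> float:
    """Naive weighted average of similarity scores from different zoom levels."""
    total_weight = sum(weights[z] for z in scores_by_zoom)
    return sum(scores_by_zoom[z] * weights[z] for z in scores_by_zoom) / total_weight

print(lonlat_to_tile(-0.1276, 51.5072, 20))                                   # central London, zoom 20
print(aggregate_scores({18: 0.42, 19: 0.55, 20: 0.71}, {18: 0.2, 19: 0.3, 20: 0.5}))
```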

The Sensat advantage: real-world impact
Josh's three-week sprint has demonstrated the transformative potential of vector embeddings for geospatial search. This technology offers:
- Unprecedented speed: Searching terabytes of data in seconds.
- Pinpoint accuracy: Identifying features through semantic understanding of their visual content.
- Hidden insights: Uncovering patterns and relationships that were previously invisible.
This ability to rapidly explore geospatial data at scale unlocks a new era of data-driven decision-making across countless sectors, from urban planning and infrastructure management to environmental monitoring and beyond. And as we expand beyond images to incorporate multimodal vector embeddings, the potential grows exponentially. The aim is to seamlessly search across not just images, but also textual reports, GIS vector data, and even point clouds, all indexed and searchable within a unified embedding space!
The Sensat platform, powered by vector embeddings, provides a fundamentally different experience with geospatial data. For our internal teams, the time saved in tasks like generating training data for semantic segmentation is priceless. Clients often request bespoke feature identification, which traditionally requires manually sifting through tens of thousands of aerial and satellite images to create training datasets. Vector embeddings can automate this process, instantly identifying and clustering relevant image patches, drastically reducing manual effort and accelerating algorithm development.
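A minimal sketch of that retrieval step, assuming tile embeddings have already been computed and saved, might use a FAISS index to pull the top-k visually similar patches for human review; the file names and k are placeholders.

```python
# Sketch: retrieve candidate training patches for a bespoke feature by nearest-neighbour
# search over pre-computed tile embeddings. The .npy files are placeholders.
import faiss
import numpy as np

tile_embeddings = np.load("tile_embeddings.npy").astype("float32")   # (n_tiles, dim)
query_embedding = np.load("example_feature.npy").astype("float32")   # (dim,)

faiss.normalize_L2(tile_embeddings)
query = query_embedding.reshape(1, -1)
faiss.normalize_L2(query)

index = faiss.IndexFlatIP(tile_embeddings.shape[1])  # inner product == cosine on unit vectors
index.add(tile_embeddings)

scores, ids = index.search(query, k=50)  # top-50 candidate patches for human review
print(list(zip(ids[0].tolist(), scores[0].round(3).tolist())))
```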
And for our clients, particularly in sectors like energy and water, the Sensat platform becomes an even more powerful optioneering tool. With the power of vector embeddings, they can quickly search for and identify relevant natural features, perhaps finding locations where existing vegetation complements planned green infrastructure, or conversely, identifying areas where natural features might hinder development, allowing for more informed and sustainable planning decisions.
This translates to faster project turnaround times, reduced operational costs, and enhanced decision-making for our clients.
Lessons learned and future frontiers: unveiling the nuances of geospatial AI
Josh's three-week journey at Sensat wasn't merely a demonstration of technological potential; it was an expedition into the practicalities of applying cutting-edge AI to the intricate world of geospatial data. The lessons we gleaned will be pivotal in shaping our future trajectory, guiding our development and deployment of vector embedding-powered technologies.
1. Specialised AI models deliver precision
A key revelation was the undeniable superiority of specialised vision-language models (VLMs) like GeoRSCLIP. While general-purpose VLMs provide a broad understanding, they often lack the precision needed for the unique characteristics of geospatial data. GeoRSCLIP's performance highlighted the necessity of models trained on domain-specific datasets, such as satellite imagery, aerial orthomosaics, and street-level surveys. In the future, we will explore fine-tuning our own models, leveraging our proprietary data to achieve even greater accuracy and relevance. We also know that fine-tuning should be done with specific use cases in mind.
2. The delicate balance of granularity
The exploration of tile size and zoom levels illuminated the delicate interplay between detail and efficiency. Zoom level 20, with its finer granularity, consistently yielded the most accurate results, but at the cost of increased processing time. This highlighted the need for adaptive algorithms that can dynamically adjust tile size and zoom levels based on the specific search task and data characteristics. For instance, a broad overview search might benefit from larger tiles and lower zoom levels when identifying vegetation patterns across a city, while a detailed inspection of cracks on a bridge would require the opposite. We will also investigate the use of dynamic tile creation, such as generating tiles on demand for real-time construction site monitoring, as opposed to static tiles, which are more suited for fixed datasets like historical land use maps.
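One simple heuristic along these lines (an illustration, not our production logic) is to pick the coarsest zoom level whose ground resolution still resolves the smallest feature the search cares about, as sketched below; the thresholds are assumptions.

```python
# Sketch of an adaptive heuristic: choose the coarsest zoom level that still resolves
# the smallest feature of interest. Thresholds and ranges are illustrative.
import math

def ground_resolution_m(zoom: int, latitude_deg: float) -> float:
    """Approximate metres per pixel for 256-px web-mercator tiles."""
    return 156543.03 * math.cos(math.radians(latitude_deg)) / (2 ** zoom)

def choose_zoom(min_feature_m: float, latitude_deg: float, pixels_per_feature: int = 10) -> int:
    """Coarsest zoom at which the smallest feature spans at least `pixels_per_feature` pixels."""
    for zoom in range(10, 23):
        if ground_resolution_m(zoom, latitude_deg) * pixels_per_feature <= min_feature_m:
            return zoom
    return 22  # fall back to the finest level considered

print(choose_zoom(min_feature_m=50.0, latitude_deg=52.0))   # city-scale vegetation patterns
print(choose_zoom(min_feature_m=0.05, latitude_deg=52.0))   # cracks on a bridge deck
```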
3. The complexity of multi-resolution search
The limitations of simple aggregation techniques, such as weighted average predictions from different zoom levels, underscored the inherent complexity of multi-resolution search. Geospatial data is inherently hierarchical, with features revealing different details at varying zoom levels in images. A dense forest, for example, may appear as a uniform green canopy from a distance, but as you zoom in, individual trees, branches, and even leaf textures become visible.
This necessitates a more sophisticated approach than simple aggregation. We must develop algorithms that can understand and navigate these hierarchical relationships, perhaps through hierarchical indexing, adaptive search strategies, or graph-based representations.
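One possible building block, sketched below, is a quadkey-style index that makes the parent/child relationship between tiles at adjacent zoom levels explicit, so a search can descend from coarse matches into their finer-grained children rather than averaging across zooms; this is a direction we are exploring, not a finished design.

```python
# Sketch: quadkeys encode a tile's ancestry, giving a natural hierarchical index
# over slippy tiles across zoom levels.
def tile_to_quadkey(x: int, y: int, zoom: int) -> str:
    """Bing-style quadkey for a slippy tile; each character encodes one zoom level."""
    key = []
    for z in range(zoom, 0, -1):
        digit = 0
        mask = 1 << (z - 1)
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        key.append(str(digit))
    return "".join(key)

def children(quadkey: str) -> list[str]:
    """The four tiles one zoom level deeper that subdivide this tile."""
    return [quadkey + d for d in "0123"]

parent = tile_to_quadkey(523, 348, 10)
print(parent, children(parent))  # a coarse match and the tiles to inspect next
```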
4. Data quality as the bedrock
Working with OpenStreetMap (OSM) data underscored the critical importance of data quality and consistency. Inconsistencies and inaccuracies in open-source data can significantly impact the reliability of our search engine. This lesson extends beyond OSM to all geospatial data sources. We must prioritise robust data cleaning and validation processes, implementing automated quality checks and integrating data from multiple sources to ensure that our search engine is built on a foundation of trust.
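In practice, even simple automated checks catch a surprising amount; the sketch below validates and repairs OSM-derived polygons with GeoPandas before they are used as labels. The file name and tag column are placeholders.

```python
# Sketch: basic automated quality checks on OSM-derived polygons before they are used
# as semantic labels. buffer(0) is a common (imperfect) fix for self-intersections.
import geopandas as gpd

gdf = gpd.read_file("osm_buildings.geojson")  # placeholder export of OSM features

invalid = ~gdf.geometry.is_valid
empty = gdf.geometry.is_empty
print(f"{invalid.sum()} invalid and {empty.sum()} empty geometries out of {len(gdf)}")

# Repair simple self-intersections and drop anything still unusable.
gdf.loc[invalid, "geometry"] = gdf.loc[invalid, "geometry"].buffer(0)
gdf = gdf[gdf.geometry.is_valid & ~gdf.geometry.is_empty]

# Flag features missing the tag the search relies on.
if "building" in gdf.columns:
    print(f"{gdf['building'].isna().sum()} features missing a 'building' tag")
```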
5. The importance of contextual awareness
While vector embeddings excel at capturing visual features, they often lack contextual understanding. Factors like proximity, spatial relationships, and temporal changes can significantly impact the interpretation of geospatial data. A patch of green, for example, might be a park in one context and a field in another. We must develop algorithms that can incorporate contextual information, leveraging spatial reasoning and temporal analysis to enhance our ability to understand the complex relationships between features.
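One lightweight way to start, sketched below, is to append normalised location features to the visual embedding before indexing, so that two visually similar patches in very different places can still separate in the index; the weighting is an illustrative assumption, and a real system would go further with neighbourhood aggregation and temporal features.

```python
# Sketch: augment a unit-norm visual embedding with normalised coordinates so that
# spatial context influences nearest-neighbour search. The weight is illustrative.
import numpy as np

def with_context(embedding: np.ndarray, easting: float, northing: float,
                 extent: tuple[float, float, float, float], weight: float = 0.1) -> np.ndarray:
    """Append scaled, normalised coordinates to a visual embedding and re-normalise."""
    min_e, min_n, max_e, max_n = extent
    loc = np.array([(easting - min_e) / (max_e - min_e),
                    (northing - min_n) / (max_n - min_n)])
    vec = np.concatenate([embedding, weight * loc])
    return vec / np.linalg.norm(vec)
```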
6. User-centric design
Josh's project highlighted the need for intuitive user interfaces that make geospatial search accessible to a wide range of users. Geospatial professionals, city planners, and environmental scientists all have unique needs and workflows. We must design user-friendly interfaces that allow users to easily explore and interact with geospatial data, regardless of their technical expertise. This includes features like intelligent search suggestions, seamless layer toggling, adaptive filtering, and responsive visualisations that adjust to different levels of detail. An important aspect of this is scene-level understanding, where the system can interpret spatial context, such as distinguishing between urban and natural environments or identifying key features within a landscape, to provide more meaningful insights.
The power of innovation and trust in young minds
Josh’s journey isn’t just about vector embeddings; it’s a shining example of what happens when you give bright, eager individuals the freedom to run with an idea. His tenacity, his sharp thinking, and that palpable enthusiasm were crucial to the project's rapid progress and its ultimate impact. It underscores a fundamental truth: true innovation often blossoms when you empower fresh talent, providing them with challenging problems and the autonomy to find creative solutions.
As we continue to explore the frontiers of geospatial AI, we remain committed to fostering a culture of innovation, where bright minds are given the ‘License to Innovate Boldly’.
Seeking your input
How can Sensat capitalise on the success of this project, and what next big undertaking in this domain should we focus on? Do you have any impactful use cases that we can apply this technology to? We are very keen to hear your thoughts.