Streaming millions of TESSERA tiles over HTTP with Zarr v3

How we restructured TESSERA's geospatial embeddings from millions of individual numpy files into sharded Zarr v3 stores for efficient HTTP streaming, enabling everything from single-pixel mobile lookups to regional-scale analysis with just a couple of range requests.

7 April 2026 at 08:20 am

Demand for efficient geospatial data management has grown with advances in machine learning and the need for timely spatial insights. TESSERA, a project focused on geospatial embeddings, stored its data as millions of individual numpy files, which made both storage and retrieval hard to manage. To address this, the team consolidated these files into sharded stores using Zarr v3, a storage format for chunked, compressed N-dimensional arrays. The shift reduced the number of objects to manage and made it possible to read small pieces of the data directly over HTTP, supporting applications that range from single-pixel lookups on mobile devices to regional-scale analysis.

Originally, TESSERA's geospatial embeddings were stored as individual numpy files, each representing a small tile of spatial data. While this approach was straightforward, it became increasingly difficult to manage as the dataset grew. Retrieving data for specific locations or regions required accessing numerous individual files, which was inefficient and slow. Moreover, scaling this system to handle larger datasets or higher traffic became a significant challenge.

The decision to migrate to Zarr v3 was driven by its ability to handle large array data efficiently. Zarr v3 is a storage format rather than a storage system: it divides an array into fixed-size chunks and stores them under predictable keys. Its sharding codec packs many chunks into a single shard object, together with an index that records the byte offset and length of each chunk inside the shard. A reader can therefore fetch only the chunks it needs from within a shard, which keeps the number of stored objects small without forcing clients to download whole shards.
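To make the chunk and shard relationship concrete, here is a minimal sketch of how a pixel coordinate maps to a shard object and to a chunk inside that shard. The chunk size (256 x 256) and the number of chunks per shard (16 x 16) are illustrative assumptions, not TESSERA's actual settings, and `locate_pixel` is a hypothetical helper. The key format follows the Zarr v3 default chunk key encoding (`c/<i>/<j>`).

```python
from typing import NamedTuple


class ChunkAddress(NamedTuple):
    shard: tuple        # index of the shard object in the shard grid
    inner_chunk: tuple  # index of the chunk inside that shard
    offset: tuple       # pixel position inside that chunk


def locate_pixel(row, col, chunk=(256, 256), chunks_per_shard=(16, 16)):
    """Map an array coordinate to its shard, inner chunk, and in-chunk offset.

    The chunk and shard sizes are assumed values for illustration only.
    """
    shard_rows = chunk[0] * chunks_per_shard[0]
    shard_cols = chunk[1] * chunks_per_shard[1]
    shard = (row // shard_rows, col // shard_cols)
    in_shard = (row % shard_rows, col % shard_cols)
    inner = (in_shard[0] // chunk[0], in_shard[1] // chunk[1])
    offset = (in_shard[0] % chunk[0], in_shard[1] % chunk[1])
    return ChunkAddress(shard, inner, offset)


addr = locate_pixel(5000, 70000)
# With sharding, the stored object is the shard, so the key uses shard indices.
shard_key = "c/" + "/".join(str(i) for i in addr.shard)
print(shard_key, addr.inner_chunk, addr.offset)
```

Only integer arithmetic is needed, so a client can work out which object to ask for without listing the store.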

The restructuring process involved several key steps. First, the team needed to understand the layout and structure of the existing data. Each numpy file represented a specific geographic tile, and the goal was to map these files into a Zarr v3 store. This involved determining the appropriate chunk size and sharding strategy to ensure that data could be accessed efficiently.
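The trade-off behind the chunk size and sharding strategy can be sketched with simple arithmetic. The array shape and sizes below are made-up numbers chosen for illustration, not figures from TESSERA. The point is only that sharding decouples the unit of access (the chunk) from the number of stored objects (the shards).

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def object_counts(shape, chunk, chunks_per_shard):
    """Return (number of chunks, number of shard objects) for an array.

    All three arguments are per-dimension tuples of the same length.
    """
    n_chunks = 1
    n_shards = 1
    for size, c, per in zip(shape, chunk, chunks_per_shard):
        n_chunks *= ceil_div(size, c)
        n_shards *= ceil_div(size, c * per)
    return n_chunks, n_shards


# Hypothetical 400,000 x 800,000 pixel grid, 256-pixel chunks, 16 x 16 chunks per shard.
chunks, shards = object_counts((400_000, 800_000), (256, 256), (16, 16))
print(chunks, shards)  # 4884375 chunks, 19208 shard objects
```

Under these assumed sizes, storing one object per chunk would mean nearly five million objects, while sharding brings that down to about nineteen thousand and keeps the small chunk as the unit a client reads.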

Once the data layout was established, the team began the migration process. Each numpy file was converted into a Zarr chunk, and these chunks were then written to object storage. The choice of storage backend was crucial, as it needed to support high read throughput and scale horizontally. Amazon S3 was selected for this purpose, offering a robust and scalable storage solution.

With the data migrated, the next step was to make it efficient to stream over HTTP. Zarr v3 does not define a REST API of its own. A store is a set of objects addressed by key, so any HTTP server or object store that honours range requests can serve it as static files. To read one chunk from a shard, a client first fetches the shard's index and then requests the byte range of the chunk it needs. This is what allows a single pixel or an entire region to be retrieved with a small number of range requests and no server-side application logic.
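The two-request read pattern can be sketched without any network access. The example below builds a tiny fake shard in memory and then reads one chunk the way an HTTP client would: a suffix range request for the index, followed by a range request for the chunk bytes. It follows the Zarr v3 sharding layout in which the index sits at the end of the shard and holds a little-endian `(offset, nbytes)` pair of unsigned 64-bit integers per chunk. As a simplifying assumption, it omits the 4-byte crc32c checksum that real stores commonly append to the index. `fetch` is a hypothetical stand-in for a server, not a real library call.

```python
import struct

EMPTY = 2**64 - 1  # both offset and nbytes hold this value when a chunk is absent


def index_range_header(n_chunks: int) -> str:
    """Suffix range that fetches only the trailing index (16 bytes per chunk)."""
    return f"bytes=-{16 * n_chunks}"


def chunk_range_header(index_bytes, chunk_idx, chunks_per_shard):
    """Return the Range header for one inner chunk, or None if it is empty."""
    _, cols = chunks_per_shard
    flat = chunk_idx[0] * cols + chunk_idx[1]  # index entries are in C order
    offset, nbytes = struct.unpack_from("<QQ", index_bytes, 16 * flat)
    if offset == EMPTY and nbytes == EMPTY:
        return None
    return f"bytes={offset}-{offset + nbytes - 1}"


def fetch(blob: bytes, range_header: str) -> bytes:
    """Stand-in for an HTTP server honouring a single Range header."""
    spec = range_header.removeprefix("bytes=")
    if spec.startswith("-"):
        return blob[-int(spec[1:]):]
    start, end = (int(x) for x in spec.split("-"))
    return blob[start:end + 1]


# Build a fake 2 x 2 shard: three encoded chunks and one missing chunk.
payloads = [b"aaaa", b"bb", None, b"cccccc"]
body, entries = b"", []
for p in payloads:
    if p is None:
        entries.append((EMPTY, EMPTY))
    else:
        entries.append((len(body), len(p)))
        body += p
shard = body + b"".join(struct.pack("<QQ", o, n) for o, n in entries)

# Request 1: the index. Request 2: the bytes of chunk (1, 1).
index_bytes = fetch(shard, index_range_header(4))
header = chunk_range_header(index_bytes, (1, 1), (2, 2))
print(header, fetch(shard, header))
```

A client that caches the index of a shard it has already visited needs only the second request for later reads from the same shard.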

The benefits of this restructuring are significant. For mobile applications requiring single-pixel lookups, the client can now retrieve a value with a couple of range requests instead of downloading a whole tile file, which improves user experience and makes interactive spatial queries practical. Similarly, for regional-scale analysis, a region maps onto a small number of shards, which reduces the time and the number of requests needed compared to the old one-file-per-tile layout.
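For a regional read, the relevant quantity is how many shard objects and chunks a pixel window touches. The sketch below counts both for a half-open window, again using assumed sizes (256-pixel chunks, 16 x 16 chunks per shard) and hypothetical helper names.

```python
def span(start: int, stop: int, size: int) -> int:
    """Number of blocks of `size` touched by the half-open range [start, stop)."""
    return (stop - 1) // size - start // size + 1


def window_cost(rows, cols, chunk=(256, 256), chunks_per_shard=(16, 16)):
    """Return (chunks touched, shards touched) for a 2-D pixel window.

    `rows` and `cols` are (start, stop) pairs; sizes are illustrative assumptions.
    """
    shard = (chunk[0] * chunks_per_shard[0], chunk[1] * chunks_per_shard[1])
    n_chunks = span(*rows, chunk[0]) * span(*cols, chunk[1])
    n_shards = span(*rows, shard[0]) * span(*cols, shard[1])
    return n_chunks, n_shards


print(window_cost((1000, 3000), (5000, 9000)))  # (153, 2)
```

In this example a 2000 x 4000 pixel window covers 153 chunks but only 2 shard objects. How many range requests that becomes in practice depends on whether the client coalesces adjacent chunk byte ranges within a shard.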

Moreover, the use of Zarr v3 has made the TESSERA system more scalable. As the dataset grows, new shards can be written to the store without rewriting existing ones, so current readers are not disrupted. This ensures that the system can continue to meet the growing demands of users and applications.

In conclusion, TESSERA's migration from individual numpy files to sharded Zarr v3 stores has transformed how its geospatial embeddings are managed and accessed. By combining sharded storage with HTTP range requests, the project has improved data access speed and system scalability. The restructuring improves the performance of existing applications and opens up new possibilities for geospatial analysis, from mobile devices to large-scale regional studies.

Source: OCaml Planet