Data locality refers to the principle of keeping data as close as possible to where it’s being processed to minimize data movement and access latency. This concept is crucial for optimizing performance in distributed systems and big data processing.
Types of Data Locality:
- Temporal Locality: When data accessed recently is likely to be accessed again soon (like caching frequently used data)
- Spatial Locality: When data physically stored close together tends to be accessed together (like sequential reads in an array)
- Processing Locality: When computation is moved closer to where data resides rather than moving data to the computation
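Temporal locality can be illustrated with a small cache. This is a minimal sketch, not a production pattern: `slow_fetch` is a hypothetical stand-in for a remote or on-disk lookup, and the access pattern is made up to show reuse.

```python
from functools import lru_cache

fetch_count = 0  # how many times the slow backing store is actually hit


def slow_fetch(key: str) -> str:
    """Hypothetical stand-in for a remote or on-disk lookup."""
    global fetch_count
    fetch_count += 1
    return key.upper()


@lru_cache(maxsize=2)
def cached_fetch(key: str) -> str:
    """Keep recently used results close to the caller."""
    return slow_fetch(key)


# Access pattern with strong temporal locality: the same keys recur quickly.
for key in ["a", "a", "b", "a", "a", "b"]:
    cached_fetch(key)

print(fetch_count)                      # backing-store hits
print(cached_fetch.cache_info().hits)   # requests served from the cache
```

Only the first access to each key reaches the backing store; repeated accesses are served from the cache.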
If all the data that needs to be processed is co-located, there is no need to fetch it from elsewhere, which speeds up processing. However, data locality is often contrived.
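The effect of co-location can be sketched by counting how many values cross the network in two strategies. The node names and partition contents below are hypothetical, and "moved" is just a count of values shipped, not a real network measurement.

```python
# Hypothetical cluster: each node holds one partition of the data.
partitions = {
    "node-1": [3, 5, 8],
    "node-2": [1, 4],
    "node-3": [7, 2, 6, 9],
}


def sum_by_moving_data(parts):
    """Ship every record to one place, then compute there."""
    moved = [x for rows in parts.values() for x in rows]
    return sum(moved), len(moved)


def sum_by_moving_compute(parts):
    """Compute on each node, then ship only one partial result per node."""
    partials = [sum(rows) for rows in parts.values()]
    return sum(partials), len(partials)


total_a, moved_a = sum_by_moving_data(partitions)
total_b, moved_b = sum_by_moving_compute(partitions)

print(total_a, moved_a)  # total and number of values shipped
print(total_b, moved_b)
```

Both strategies produce the same total, but computing where the data lives ships one value per node instead of every record.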
Origin: RW Design Patterns Every Data Engineer Should Know
Created 2025-02-16
