SKID offers a multi-level identifier scheme for distributed systems, combining sortability, security, and zero-lookup verification.

In distributed systems, an identifier is more than a database key. It is used for indexing, passed through APIs, and serves as a correlation token, all at the same time. The problem arises when these roles impose conflicting requirements: B-tree indexes favor ordered keys, security calls for hiding the identifier's internal structure, and integrations need to verify an ID without extra queries. Existing approaches (UUID v4/v7, Snowflake, ULID) each satisfy some of these requirements, but none satisfies all of them at once. As a result, teams often fall back on a dual-identifier scheme, storing one ID in the database and exposing another externally, which increases storage overhead and complicates indexing.

SKID (Source Known Identifiers) proposes a three-level model in which the same identifier is projected across different trust boundaries. At the database level, it uses a 64-bit SKID containing a timestamp with 250 ms precision, application and instance identifiers, and a sequence counter. This yields a compact 8-byte primary key that sorts naturally in a B-tree. Within the trusted environment, the same ID is transformed into a 128-bit SKEID, a UUID-compatible value that adds an entity type, an epoch, and a BLAKE3 MAC. This format makes it possible to check an ID's provenance and integrity without accessing the database (zero-lookup verification). External clients receive a Secure SKEID: the same SKEID encrypted in full with AES-256, so that no metadata leaks.
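To make the bit-packing concrete, here is a minimal Python sketch of a 64-bit SKID-style identifier. The article does not give the field widths, so the layout below (a 36-bit timestamp in 250 ms ticks, an 8-bit application ID, an 8-bit instance ID, an 11-bit sequence counter, and a top bit left at zero) and the 2020-01-01 custom epoch are assumptions for illustration, not the paper's format. The names `pack_skid`, `unpack_skid`, and `skid_bytes` are hypothetical.

```python
import time

# Hypothetical layout, NOT taken from the paper:
#   bit 63      : unused (kept 0 so the value fits a signed BIGINT)
#   bits 62..27 : timestamp in 250 ms ticks since CUSTOM_EPOCH_MS (36 bits)
#   bits 26..19 : application ID (8 bits)
#   bits 18..11 : instance ID (8 bits)
#   bits 10..0  : sequence counter (11 bits)
TS_BITS, APP_BITS, INST_BITS, SEQ_BITS = 36, 8, 8, 11
CUSTOM_EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z (assumed epoch)
TICK_MS = 250  # 250 ms timestamp precision, as described in the article


def _check(value: int, bits: int, name: str) -> None:
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def pack_skid(unix_ms: int, app_id: int, instance_id: int, seq: int) -> int:
    """Bit-pack the fields into a non-negative 64-bit integer (no cryptography)."""
    ticks = (unix_ms - CUSTOM_EPOCH_MS) // TICK_MS
    _check(ticks, TS_BITS, "timestamp")
    _check(app_id, APP_BITS, "app_id")
    _check(instance_id, INST_BITS, "instance_id")
    _check(seq, SEQ_BITS, "seq")
    skid = ticks
    skid = (skid << APP_BITS) | app_id
    skid = (skid << INST_BITS) | instance_id
    skid = (skid << SEQ_BITS) | seq
    return skid


def unpack_skid(skid: int) -> dict:
    """Reverse pack_skid; the timestamp comes back rounded down to its 250 ms tick."""
    seq = skid & ((1 << SEQ_BITS) - 1)
    skid >>= SEQ_BITS
    instance_id = skid & ((1 << INST_BITS) - 1)
    skid >>= INST_BITS
    app_id = skid & ((1 << APP_BITS) - 1)
    skid >>= APP_BITS
    return {"unix_ms": CUSTOM_EPOCH_MS + skid * TICK_MS, "app_id": app_id,
            "instance_id": instance_id, "seq": seq}


def skid_bytes(skid: int) -> bytes:
    # Big-endian, fixed-width bytes compare in the same order as the integers.
    return skid.to_bytes(8, "big")


if __name__ == "__main__":
    now_ms = int(time.time() * 1000)
    first = pack_skid(now_ms, app_id=3, instance_id=7, seq=0)
    later = pack_skid(now_ms + TICK_MS, app_id=3, instance_id=7, seq=0)
    print(first < later, skid_bytes(first) < skid_bytes(later))  # True True
```

Placing the timestamp in the high bits is what makes the key B-tree friendly: IDs from a later tick always compare greater. Within a single 250 ms tick, this layout orders by application, then instance, then sequence, so ordering across instances is only approximately chronological. Sequence overflow within a tick and clock skew are not handled in this sketch.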

A key engineering point is that the transformations between levels are deterministic. Only the SKID is stored in the database; the SKEID and Secure SKEID are computed on the fly without I/O, so no additional columns or indexes are needed. In a typical baseline that stores an auto-increment ID, a UUID, and a created_at timestamp, each record needs 32 bytes and three indexes. SKID reduces this to a single 8-byte field with one index, a saving of up to 75% at the storage level. Sort order is preserved, and the timestamp is already embedded in the identifier.
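The 75% figure follows from simple arithmetic. The article gives only the 32-byte total; the split below assumes an 8-byte BIGINT auto-increment key, a 16-byte UUID, and an 8-byte created_at timestamp, which are the standard sizes of those types in PostgreSQL. It counts column storage only, not per-row or per-index overhead.

```latex
\underbrace{8}_{\texttt{BIGINT id}} + \underbrace{16}_{\texttt{UUID}} + \underbrace{8}_{\texttt{created\_at}}
  = 32~\text{bytes per record},
\qquad
\text{savings} = 1 - \frac{8}{32} = 0.75 = 75\%.
```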

In terms of performance, the three levels fall into three cost tiers. Generating a SKID takes about 35 ns, since it involves only bit-packing and no cryptography. A SKEID takes about 230 ns despite the added MAC, which is still faster than UUID v7 (~377 ns). A Secure SKEID takes about 544 ns, roughly 1.4 times slower than UUID v7 because of the AES-256 step. The trade-off is explicit: stronger protection in exchange for higher generation latency. Even the heaviest option, however, stays in the sub-microsecond range, which is acceptable for high-throughput systems.

Zero-lookup verification is a particularly interesting aspect. In conventional systems, verifying an ID requires a round trip to the database or a cache. Here, the SKEID carries a BLAKE3 MAC, so an ID can be validated locally. This reduces the load on storage and lowers latency in distributed interactions. However, the MAC is truncated to 32 bits, which on its own does not provide strong cryptographic resistance to forgery. The design compensates with defense in depth: besides the MAC, it relies on AES-256 encryption, marker bytes, entity type checks, and even the low probability that a forged ID corresponds to an existing record. An attacker would have to get past every one of these layers, which, the authors argue, makes an attack nearly impossible.
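The following sketch shows how zero-lookup verification can work in principle: a 64-bit SKID is wrapped into a 16-byte, UUID-shaped value whose last 4 bytes are a truncated MAC, and a verifier holding the key checks it with no database access. Several parts are assumptions: the byte layout and marker value are hypothetical, the sketch does not reserve UUID version or variant bits, and keyed BLAKE2b from Python's standard library stands in for BLAKE3, which `hashlib` does not provide. With a 32-bit tag, a randomly guessed value passes the MAC check with probability 2^-32 (about 1 in 4.3 billion) per attempt, which is why the article treats the MAC as one layer among several.

```python
import hashlib
import hmac
import uuid
from typing import Optional

# Hypothetical 16-byte layout, NOT the paper's exact format:
# [0:8] SKID (big-endian) | [8] entity type | [9] epoch | [10:12] marker | [12:16] MAC
MARKER = b"\x5a\xa5"  # assumed fixed marker bytes
MAC_LEN = 4  # MAC truncated to 32 bits, as the article describes


def _mac(key: bytes, body: bytes) -> bytes:
    # Stand-in: keyed BLAKE2b from the standard library instead of BLAKE3.
    return hashlib.blake2b(body, key=key, digest_size=MAC_LEN).digest()


def make_skeid(skid: int, entity_type: int, epoch: int, key: bytes) -> uuid.UUID:
    """Wrap a 64-bit SKID into a 128-bit, UUID-shaped value carrying a MAC."""
    body = skid.to_bytes(8, "big") + bytes([entity_type, epoch]) + MARKER
    return uuid.UUID(bytes=body + _mac(key, body))


def verify_skeid(value: uuid.UUID, expected_type: int, key: bytes) -> Optional[int]:
    """Return the embedded SKID if the value checks out, else None (no DB lookup)."""
    raw = value.bytes
    body, tag = raw[:12], raw[12:]
    # Cheap structural checks first: marker bytes and entity type.
    if body[10:12] != MARKER or body[8] != expected_type:
        return None
    # Constant-time comparison of the 32-bit tag.
    if not hmac.compare_digest(tag, _mac(key, body)):
        return None
    return int.from_bytes(body[:8], "big")


if __name__ == "__main__":
    demo_key = b"\x01" * 32  # demo key only; use a randomly generated secret in practice
    token = make_skeid(123456789, entity_type=2, epoch=1, key=demo_key)
    print(verify_skeid(token, expected_type=2, key=demo_key))  # 123456789
    print(verify_skeid(token, expected_type=3, key=demo_key))  # None
```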

Particular attention is paid to the conflict between sortability and confidentiality. Timestamp-based schemes inherently reveal generation patterns. SKID addresses this through level separation: within the system, the ID remains ordered, while an encrypted form is presented externally. Thus, the same entity has different representations tailored to the context of use, without data duplication.
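For the external form, what matters is a keyed, reversible permutation of the 128-bit value that hides any ordering. A SKEID is 128 bits, the same size as one AES block. The article specifies AES-256; since Python's standard library has no AES, the sketch below substitutes a four-round Feistel network with HMAC-SHA256 round functions (a Luby-Rackoff construction) purely to show the property. It is a toy for illustration, and production code should use AES-256 from a vetted cryptography library. The function names are hypothetical.

```python
import hashlib
import hmac

ROUNDS = 4  # four Feistel rounds; a toy stand-in for AES-256, NOT the paper's cipher
BLOCK = 16  # 128 bits, the size of a SKEID (and of one AES block)


def _f(key: bytes, i: int, half: bytes) -> bytes:
    # Round function: HMAC-SHA256 over (round index, half-block), truncated to 8 bytes.
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:8]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_block(key: bytes, block: bytes) -> bytes:
    """Internal 128-bit value -> external opaque value (keyed, reversible)."""
    if len(block) != BLOCK:
        raise ValueError("expected a 16-byte block")
    left, right = block[:8], block[8:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _f(key, i, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    """External opaque value -> internal 128-bit value (undoes encrypt_block)."""
    if len(block) != BLOCK:
        raise ValueError("expected a 16-byte block")
    left, right = block[:8], block[8:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _f(key, i, left)), left
    return left + right


if __name__ == "__main__":
    demo_key = b"\x02" * 32  # demo key only
    for n in (1000, 1001, 1002):  # consecutive, ordered internal values
        internal = n.to_bytes(BLOCK, "big")
        external = encrypt_block(demo_key, internal)
        assert decrypt_block(demo_key, external) == internal
        print(internal.hex(), "->", external.hex())
```

Consecutive internal values map to unrelated-looking external values that decrypt back exactly, so the service keeps its ordered key internally while clients cannot read time, instance, or sequence information from what they see.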

For practitioners, this is a pragmatic alternative to the dual-identifier pattern. The approach is especially useful in high-load systems, where indexing efficiency and reduced I/O are crucial. It also applies to API design, where identifiers must be published securely without leaking metadata. The main limitations are the complexity of the cryptographic layer and the need for key management (key rotation, a key ring), which adds operational overhead, especially for SRE and DevOps teams.
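Key rotation is the main operational cost here. A common pattern is a key ring: each key gets an epoch number, new IDs are signed with the newest key, and verification picks the key by the epoch stored in the ID, so rotation does not invalidate IDs already issued. The article lists an epoch field in SKEID and mentions a key ring, but tying the two together this way is an assumption of this sketch. `KeyRing` is a hypothetical class, and keyed BLAKE2b truncated to 32 bits stands in for BLAKE3.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional

TAG_LEN = 4  # 32-bit truncated tag, matching the article's MAC size


class KeyRing:
    """Hypothetical key ring: one MAC key per epoch; the newest epoch signs new IDs."""

    def __init__(self) -> None:
        self._keys: Dict[int, bytes] = {}
        self.current_epoch = -1  # no key yet; call rotate() first

    def rotate(self) -> int:
        """Add a fresh random key under the next epoch; older keys stay for verification."""
        if self.current_epoch >= 255:
            raise OverflowError("epoch no longer fits in one byte")
        self.current_epoch += 1
        self._keys[self.current_epoch] = secrets.token_bytes(32)
        return self.current_epoch

    def retire(self, epoch: int) -> None:
        """Drop an old key; IDs signed under it will no longer verify."""
        self._keys.pop(epoch, None)

    def sign(self, body: bytes) -> bytes:
        """Prefix the current epoch and append a truncated MAC."""
        if self.current_epoch < 0:
            raise RuntimeError("call rotate() before signing")
        head = bytes([self.current_epoch]) + body
        return head + self._tag(self._keys[self.current_epoch], head)

    def verify(self, token: bytes) -> Optional[bytes]:
        """Select the key by the embedded epoch; return the body if the tag matches."""
        if len(token) < 1 + TAG_LEN:
            return None
        head, tag = token[:-TAG_LEN], token[-TAG_LEN:]
        key = self._keys.get(head[0])
        if key is None or not hmac.compare_digest(tag, self._tag(key, head)):
            return None
        return head[1:]

    @staticmethod
    def _tag(key: bytes, data: bytes) -> bytes:
        # Stand-in: keyed BLAKE2b truncated to 32 bits (the article uses BLAKE3).
        return hashlib.blake2b(data, key=key, digest_size=TAG_LEN).digest()


if __name__ == "__main__":
    ring = KeyRing()
    ring.rotate()
    old = ring.sign(b"skid-0001")
    ring.rotate()  # rotation: new IDs now use epoch 1
    print(ring.verify(old))  # b'skid-0001' (epoch 0 key is still on the ring)
    ring.retire(0)
    print(ring.verify(old))  # None
```

Retiring an epoch's key makes every ID signed under it fail verification, so old keys would need to stay on the ring for as long as IDs issued under them must remain valid.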

Overall, SKID shows how decomposing an architecture along trust boundaries can reconcile otherwise conflicting requirements. Instead of searching for a universal ID, the authors propose a system in which a single identifier changes form depending on the context while keeping a strict correspondence between its representations.

Information source

arXiv is the largest open preprint repository (founded in 1991, now operated under the auspices of Cornell University), where researchers quickly post working versions of their papers. The materials are publicly accessible but do not undergo full peer review, so the results should be considered preliminary and, where possible, checked against updated versions or peer-reviewed journal publications. arxiv.org
