title: IPIP-0445: Option to Skip Raw Blocks in Gateway Responses
date: 2023-10-09
ipip: open
editors:
  • Hugo Valtier (GitHub: Jorropo), Protocol Labs
  • Marcin Rataj (GitHub: lidel), Protocol Labs
relatedIssues:
order: 445
tags: ipips

Summary

Introduce a skip-raw-blocks flag for the :cite[trustless-gateway].

Motivation

Allow clients to read a stream that contains only proofs in a bottom-heavy graph that uses the raw codec for its leaves.

Useful for UnixFS features like webseeds (ipfs/specs#444), where metadata about a DAG is fetched from a trustless gateway, but the actual raw data can be fetched from any source that supports either the trustless gateway specification or plain HTTP Range Requests. This allows trustless and verifiable data retrieval from plain HTTP (non-IPFS) data sources.

Detailed design

The skip-raw-blocks URL query parameter on :cite[trustless-gateway] allows clients to download an entity without the blocks whose CID uses the raw (0x55) multicodec.

  • When set to y, the parameter instructs the gateway not to transmit blocks referenced with a CID with the raw multicodec.
  • If set to n, or left unspecified, there is no special handling of raw multicodec blocks (the existing default behavior remains the same).

Importantly, unless explicitly specified as y, the default operational mode of the gateway MUST assume the value of skip-raw-blocks to be n.
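
The parameter semantics above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the `gateway.example` host, and the `(codec, label)` block representation are assumptions made for the example. Treating any value other than `y` as `n` follows the default described above.

```python
from urllib.parse import parse_qs, urlsplit

RAW_CODEC = 0x55  # multicodec code for raw


def skip_raw_blocks_enabled(url: str) -> bool:
    """Return True only when the request explicitly sets skip-raw-blocks=y."""
    query = parse_qs(urlsplit(url).query)
    # A missing parameter, or any value other than "y", falls back to "n".
    return query.get("skip-raw-blocks", ["n"])[0] == "y"


def blocks_to_send(blocks, skip_raw: bool):
    """Filter (codec, label) pairs, dropping raw blocks when skip_raw is set.

    `blocks` is a hypothetical stand-in for the blocks of a traversal,
    listed in the order they would appear in the response.
    """
    return [label for codec, label in blocks
            if not (skip_raw and codec == RAW_CODEC)]


if __name__ == "__main__":
    dag = [(0x70, "root"), (RAW_CODEC, "leaf-1"), (RAW_CODEC, "leaf-2")]
    url = "https://gateway.example/ipfs/cid?skip-raw-blocks=y"
    print(blocks_to_send(dag, skip_raw_blocks_enabled(url)))
```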

Design rationale

User Benefit

Implementing the skip-raw-blocks parameter offers several benefits to users:

  1. Verification Flexibility: Clients can verify files received out-of-band (OOB) in their deserialized form without requiring the gateway to transmit raw blocks.

  2. Incremental Download: Clients can incrementally download files in deserialized form from non-IPFS servers, which allows applications to share one distribution channel between IPFS and non-IPFS clients.

  3. Efficient Block Discovery: With the skip-raw-blocks option enabled, clients can quickly discover numerous candidate blocks without being bottlenecked by the gateway's transmission of raw blocks.

  4. Non-IPFS HTTP Mirrors Become Useful: Legacy data that is already exposed over HTTP in deserialized form can now act as a source for specific block byte ranges, without having to support any IPFS-specific APIs. Plain HTTP Range Requests can be used for fetching the remaining raw block data, and the metadata read via skip-raw-blocks=y is enough for a client to verify that the raw block byte ranges fetched from a non-IPFS system match the expected CIDs.
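
The verification step described in the last item can be sketched as below. This is an illustration under stated assumptions: leaves are base32 CIDv1 strings using the raw codec and sha2-256, the `(cid, size)` list stands in for what a client would extract from the proof-only response, and the file bytes are held in memory instead of being fetched with HTTP Range Requests. All function names are hypothetical.

```python
import base64
import hashlib

RAW_CODEC = 0x55  # multicodec code for raw
SHA2_256 = 0x12   # multihash code for sha2-256


def _varint(data: bytes, offset: int):
    """Decode an unsigned varint, returning (value, next_offset)."""
    value, shift = 0, 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, offset
        shift += 7


def decode_cid_v1(cid: str):
    """Return (codec, hash_code, digest) for a base32 CIDv1 string."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 CIDv1 strings")
    body = cid[1:].upper()
    binary = base64.b32decode(body + "=" * (-len(body) % 8))
    version, off = _varint(binary, 0)
    codec, off = _varint(binary, off)
    hash_code, off = _varint(binary, off)
    length, off = _varint(binary, off)
    digest = binary[off:off + length]
    if version != 1 or len(digest) != length:
        raise ValueError("malformed CIDv1")
    return codec, hash_code, digest


def raw_cid_for(data: bytes) -> str:
    """Build the base32 CIDv1 (raw, sha2-256) of `data`, used for the demo."""
    binary = bytes([1, RAW_CODEC, SHA2_256, 32]) + hashlib.sha256(data).digest()
    return "b" + base64.b32encode(binary).decode().lower().rstrip("=")


def verify_leaves(data: bytes, leaves) -> bool:
    """Check consecutive byte ranges of `data` against (cid, size) leaves."""
    offset = 0
    for cid, size in leaves:
        codec, hash_code, digest = decode_cid_v1(cid)
        if codec != RAW_CODEC or hash_code != SHA2_256:
            raise ValueError("this sketch only handles raw + sha2-256 leaves")
        chunk = data[offset:offset + size]
        if len(chunk) != size or hashlib.sha256(chunk).digest() != digest:
            return False
        offset += size
    return True


if __name__ == "__main__":
    payload = b"hello " * 10 + b"world"
    chunks = [payload[:32], payload[32:]]
    leaves = [(raw_cid_for(c), len(c)) for c in chunks]
    print(verify_leaves(payload, leaves))
```

In a real client each chunk would arrive from a separate Range Request, so a failed check only invalidates that one byte range.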

Compatibility

Setting the default value of the skip-raw-blocks parameter to n ensures backward compatibility with existing clients and systems that are unaware of this new flag.

Alternatives

An alternative approach would be to request blocks individually. However, this adds extra round trips and more per-request HTTP overhead, and is thus undesirable.

Why not dag-scope=skip-raw-blocks?

The existing dag-scope parameter determines the overall range of blocks to retrieve, while skip-raw-blocks selectively filters specific blocks across all scopes and ranges. Combining them under one parameter would restrict their combined utility.

For example:

  • When a client is streaming a video from a webseed and the user seeks through the video, the client sends dag-scope=entity&entity-bytes=42:1337 with skip-raw-blocks=y to download the proofs for the required section of the video, and then fetches the remaining raw data byte ranges from a faster CDN.
  • When a client is verifying a directory transferred OOB in deserialized form, dag-scope=all with skip-raw-blocks=y makes sense.
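
The two requests in the examples above could be built as in the following sketch. The gateway host `gateway.example`, the placeholder content paths, and the `proof_request` helper are assumptions made for illustration; only the query parameters and the CAR content type come from this document.

```python
from urllib.parse import urlencode

CAR_ACCEPT = "application/vnd.ipld.car"


def proof_request(gateway: str, content_path: str, dag_scope: str,
                  entity_bytes=None):
    """Return (url, headers) for a proof-only CAR request.

    `entity_bytes` is an optional (from, to) pair; skip-raw-blocks is
    always set to "y" because the caller wants proofs without raw leaves.
    """
    params = {"dag-scope": dag_scope}
    if entity_bytes is not None:
        params["entity-bytes"] = f"{entity_bytes[0]}:{entity_bytes[1]}"
    params["skip-raw-blocks"] = "y"
    # Keep ":" unescaped so the range reads as it does in the text above.
    url = f"{gateway}{content_path}?{urlencode(params, safe=':')}"
    return url, {"Accept": CAR_ACCEPT}


if __name__ == "__main__":
    # Seeking within a video: proofs for one byte range only.
    print(proof_request("https://gateway.example",
                        "/ipfs/bafyExampleCid/video.mp4",
                        "entity", (42, 1337))[0])
    # Verifying a whole directory received out-of-band.
    print(proof_request("https://gateway.example",
                        "/ipfs/bafyExampleCid", "all")[0])
```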

Why not a CAR content type parameter?

The CAR content type's (application/vnd.ipld.car) optional parameters, such as order and dups, affect how data is represented when returned as a CAR stream, but do not modify the scope of the data itself: they neither add data to nor subtract data from the response.

The scope of the data is controlled by the URL content path and the optional dag-scope and entity-bytes URL parameters. This is where skip-raw-blocks belongs.

This is not just a matter of aesthetics: the URL path and query parameters allow different subsets of a DAG to be cached in a way that is interoperable with existing HTTP tools and clients, and minimize the risk of caching an incomplete DAG response due to HTTP cache misconfiguration. Because skip-raw-blocks is in the URL query, CAR responses without raw blocks are cached under a different key than full responses (just like the existing dag-scope and entity-bytes).

Why not generic skip-leaves that skips all leaves, not just raw blocks?

There are two reasons: preventing amplification attacks and keeping server operation efficient.

Because the raw (0x55) codec is part of the CID itself, a server can trivially determine whether to fetch or skip a block from the link alone, without having to fetch the block to learn anything about it.

If we framed this feature around skipping all leaf nodes, the server would have to fetch the leaves to learn whether they have any child nodes. This would force the server to fetch data that is never returned to the client.

Although skip-raw-blocks is more limited and cannot handle UnixFS files chunked without the --raw-leaves option, it allows both the client and the server to trivially determine that a block must not be fetched. This prevents amplification, where a server might need to fetch orders of magnitude more data than it returns to the client when executing the request.
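
The argument above can be illustrated with a toy traversal that counts how many blocks the server fetches. This is a sketch only: the `Cid` tuple, the `CountingStore`, and the `walk` function are hypothetical, the DAG lives in memory, and a request whose root CID is itself raw is not handled here.

```python
from typing import NamedTuple

RAW_CODEC = 0x55  # multicodec code for raw
DAG_PB = 0x70     # multicodec code for dag-pb


class Cid(NamedTuple):
    codec: int
    name: str  # stands in for the multihash in this toy model


class CountingStore:
    """In-memory block store that records every block it has to fetch."""

    def __init__(self, blocks):
        self.blocks = blocks  # maps Cid -> list of child Cids
        self.fetched = []

    def get(self, cid: Cid):
        self.fetched.append(cid)
        return self.blocks[cid]


def walk(store: CountingStore, root: Cid, skip_raw: bool):
    """Depth-first traversal returning the CIDs of blocks to send."""
    sent, stack = [], [root]
    while stack:
        cid = stack.pop()
        if skip_raw and cid.codec == RAW_CODEC:
            continue  # decided from the CID alone; the block is never fetched
        links = store.get(cid)
        sent.append(cid)
        stack.extend(reversed(links))
    return sent


if __name__ == "__main__":
    leaves = [Cid(RAW_CODEC, f"leaf-{i}") for i in range(3)]
    root = Cid(DAG_PB, "root")
    store = CountingStore({root: leaves, **{leaf: [] for leaf in leaves}})
    walk(store, root, skip_raw=True)
    print(len(store.fetched))
```

A generic skip-leaves rule could not take the early `continue` branch, because whether a block is a leaf is only known after fetching it.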

Security

This IPIP does not impact the security model of the trustless gateway.

Test fixtures

:::issue

TODO: update below section with CIDs or CARs from conformance tests

Scenarios we should check:

  • request for /ipfs/cid where CID has raw codec MUST return HTTP 400 (Bad Request)
  • reuse existing UnixFS DAG that has raw-leaves, request it with skip-raw-blocks=n, confirm the response includes expected raw leaves' CIDs
  • create a new CAR fixture that only has non-raw blocks. Request it with skip-raw-blocks=y, confirm the response includes the expected CIDs and does not include raw blocks referenced by parents.
    • the important part is creating the CAR fixture by hand and ensuring the raw blocks are NEVER announced anywhere: generate a fixture with random data, add it to IPFS with the raw-leaves option, then export the DAG without raw blocks (use go-car's filter or similar)
      • Why? This goes the extra mile, but ensures every conformant gateway implementation is not doing the useless work of fetching raw blocks which are not required for fulfilling skip-raw-blocks=y requests. We did a similar thing for entity-bytes, and it was the only way we could show bugs in the Saturn project's cache implementation at the time.

:::

Copyright

Copyright and related rights waived via CC0.