title: IPIP-0445: Option to Skip Raw Blocks in Gateway Responses
date: 2023-10-09
ipip: open
editors:
- Hugo Valtier (GitHub: Jorropo, https://jorropo.net/), Protocol Labs (https://protocol.ai/)
- Marcin Rataj (GitHub: lidel, https://lidel.org/), Protocol Labs (https://protocol.ai/)
relatedIssues: #444
order: 445

Summary

Introduce a skip-raw-blocks flag for the :cite[trustless-gateway].

Motivation

Allow clients to read a stream that contains only the proofs for a bottom-heavy graph that uses the raw codec for its leaves.

This is useful for UnixFS features such as webseeds (ipfs/specs#444). Metadata about a DAG is fetched from a trustless gateway, while the actual raw data can be fetched from any source that supports either the trustless gateway specification or plain HTTP Range Requests. This allows trustless and verifiable data retrieval from plain HTTP (non-IPFS) data sources.

Detailed design

The skip-raw-blocks URL query parameter on :cite[trustless-gateway] allows clients to download an entity without the blocks that use the raw multicodec (0x55).

When set to y, the parameter instructs the gateway not to transmit blocks referenced by a CID with the raw multicodec.
If set to n or left unspecified, raw multicodec blocks receive no special handling (the existing default behavior is unchanged).

Importantly, unless skip-raw-blocks is explicitly set to y, the gateway MUST treat its value as n.
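The following minimal sketch shows how a gateway might read the parameter. It uses Python's standard urllib.parse. The function name skip_raw_blocks_enabled, the BadRequest type and the gateway.example host are hypothetical. Rejecting values other than y and n is also an assumption, since this IPIP does not say how to handle them.

```python
from urllib.parse import parse_qs, urlsplit


class BadRequest(Exception):
    """Hypothetical error type; a real gateway would answer HTTP 400."""


def skip_raw_blocks_enabled(url: str) -> bool:
    """Return True only when the URL query carries skip-raw-blocks=y."""
    # parse_qs drops blank values, so "?skip-raw-blocks=" counts as unspecified.
    values = parse_qs(urlsplit(url).query).get("skip-raw-blocks", [])
    if not values:
        return False  # unspecified: MUST be treated as n (default behavior)
    value = values[0]  # assumption: the first occurrence wins
    if value == "y":
        return True
    if value == "n":
        return False
    # Assumption: other values are rejected rather than silently ignored.
    raise BadRequest(f"unsupported skip-raw-blocks value: {value!r}")
```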

Design rationale

User Benefit

Implementing the skip-raw-blocks parameter offers several benefits to users:

- Verification Flexibility: Clients can verify files received out-of-band (OOB) in their deserialized form, without the gateway having to transmit raw blocks.
- Incremental Download: Clients can incrementally download files in deserialized form from non-IPFS servers. This allows applications to use the same distribution for IPFS and non-IPFS clients.
- Efficient Block Discovery: With skip-raw-blocks enabled, clients can quickly discover numerous candidate blocks without being bottlenecked by the gateway's transmission of raw blocks.
- Non-IPFS HTTP Mirrors Become Useful: Legacy data that is already exposed over HTTP in deserialized form can act as a source for specific block byte ranges, without supporting any IPFS-specific APIs. Plain HTTP Range Requests can be used to fetch the remaining raw block data. The metadata read via skip-raw-blocks=y is enough for a client to verify that the raw block byte ranges fetched from a non-IPFS system match the expected CIDs.
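The last item can be illustrated with a minimal client-side sketch. It rests on three assumptions:

- The client already holds, from a skip-raw-blocks=y response, the sha2-256 digests and byte lengths of a file's raw leaves. For UnixFS these would come from the parent nodes, and parsing dag-pb is out of scope here.
- The leaves tile the file contiguously and in order.
- The mirror is simulated with a local byte string instead of real HTTP requests.

RawLeaf, range_headers and verify_leaf are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class RawLeaf:
    """A raw leaf as referenced by its parent: sha2-256 digest and byte length."""
    digest: bytes
    size: int


def range_headers(leaves: list[RawLeaf]) -> list[str]:
    """HTTP Range header values (inclusive end) for each leaf, in file order."""
    headers, offset = [], 0
    for leaf in leaves:
        headers.append(f"bytes={offset}-{offset + leaf.size - 1}")
        offset += leaf.size
    return headers


def verify_leaf(leaf: RawLeaf, body: bytes) -> bool:
    """Check bytes fetched from an untrusted mirror against the expected digest."""
    return len(body) == leaf.size and hashlib.sha256(body).digest() == leaf.digest
```

Because the digest check needs only the fetched bytes and the metadata from the gateway, the mirror itself never has to be trusted.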

Compatibility

Setting the default value of the skip-raw-blocks parameter to n ensures backward compatibility with existing clients and systems that are unaware of this new flag.

Alternatives

An alternative approach would be to request blocks individually. However, this adds extra round trips and per-request HTTP overhead, which makes it undesirable.

Why not `dag-scope=skip-raw-blocks`?

The existing dag-scope parameter determines the overall range of blocks to retrieve, while skip-raw-blocks selectively filters specific blocks across all scopes and ranges. Merging them into one parameter would prevent clients from combining a scope with the raw-block filter.

For example:

- A client is streaming a video from a webseed and the user seeks through the video. The client sends dag-scope=entity&entity-bytes=42:1337 with skip-raw-blocks=y to download the proofs for the required section of the video, then fetches the remaining raw data byte ranges from a faster CDN.
- A client is verifying an OOB-transferred directory in deserialized form. Here, dag-scope=all with skip-raw-blocks=y makes sense.
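The two scenarios above could translate into requests like the ones in this sketch, which uses Python's urllib.parse. The helper name car_request, the gateway.example host and the placeholder CID are hypothetical. The Accept value is the CAR media type named in this IPIP.

```python
from urllib.parse import urlencode

# Hypothetical gateway origin; substitute a real trustless gateway.
GATEWAY = "https://gateway.example"


def car_request(cid: str, params: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Build (url, headers) for a trustless gateway CAR request (hypothetical helper)."""
    url = f"{GATEWAY}/ipfs/{cid}"
    if params:
        # safe=":" keeps entity-bytes ranges such as 42:1337 unescaped.
        url += "?" + urlencode(params, safe=":")
    return url, {"Accept": "application/vnd.ipld.car"}


CID = "bafyplaceholder"  # placeholder, not a real CID

# Video seek: proofs for bytes 42..1337 only; raw data comes from a CDN.
seek_url, seek_headers = car_request(
    CID, {"dag-scope": "entity", "entity-bytes": "42:1337", "skip-raw-blocks": "y"}
)

# OOB directory verification: the whole DAG, minus raw leaves.
verify_url, _ = car_request(CID, {"dag-scope": "all", "skip-raw-blocks": "y"})
```

The flag is part of the URL, so the seek request with and without skip-raw-blocks=y yields two different URLs. By default, HTTP caches key on the URL, so the two responses are cached separately.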

Why not a CAR content type parameter?

The optional parameters of the CAR content type (application/vnd.ipld.car), such as order and dups, affect how data is represented when returned as a CAR stream. They do not modify the scope of the data itself: they neither add nor remove data from the response.

The scope of the data is controlled by the URL content path and the optional dag-scope and entity-bytes URL parameters. This is where skip-raw-blocks belongs.

This is not just a matter of aesthetics. The URL path and query parameters allow different subsets of a DAG to be cached in a way that:

- is interoperable with existing HTTP tools and clients, and
- minimizes the risk of caching an incomplete DAG response due to HTTP cache misconfiguration.

Because skip-raw-blocks is in the URL query, CAR responses without raw blocks are cached under a different key than full responses, just like with the existing dag-scope and entity-bytes.

Why not generic `skip-leaves` that skips all leaves, not just `raw` blocks?

Prevention of amplification attacks and efficient server operation.

By relying on the raw (0x55) codec, servers can trivially decide from the CID alone whether to fetch or skip a block. They do not have to fetch the block to learn anything new.
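A minimal sketch of that decision follows. It assumes CIDs arrive as strings, either as CIDv1 in base32 (the b multibase prefix seen in bafy... CIDs) or as CIDv0. In a CIDv1 the codec is the second unsigned varint, so it is known before any block is fetched. The function names are hypothetical. Other multibases are not handled, and recognizing CIDv0 by its Qm prefix and 46-character length is a simplification.

```python
import base64

RAW = 0x55
DAG_PB = 0x70


def read_uvarint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint (multiformats style); return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def cid_codec(cid: str) -> int:
    """Return the multicodec of a CID string without fetching the block."""
    if cid.startswith("Qm") and len(cid) == 46:
        return DAG_PB  # simplification: CIDv0 is always dag-pb
    if not cid.startswith("b"):
        raise ValueError("only base32 CIDv1 is handled in this sketch")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore RFC 4648 padding omitted by multibase
    raw = base64.b32decode(body)
    version, pos = read_uvarint(raw, 0)
    if version != 1:
        raise ValueError(f"unexpected CID version {version}")
    codec, _ = read_uvarint(raw, pos)
    return codec


def should_skip(cid: str, skip_raw_blocks: bool) -> bool:
    """Skip (and never fetch) a block when the flag is set and its codec is raw."""
    return skip_raw_blocks and cid_codec(cid) == RAW
```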

If this feature instead skipped all leaf nodes, the server would have to fetch each block to learn whether it has any child nodes. This would force the server to fetch data that is never returned to the client.

skip-raw-blocks is more limited: it cannot handle UnixFS files chunked without the --raw-leaves option. In exchange, it allows both the client and the server to trivially determine that a block must not be fetched. This prevents amplification issues, where executing a request could require the server to fetch orders of magnitude more data than it returns to the client.
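The amplification argument can be made concrete with a toy model. An in-memory blockstore counts fetches and is traversed once with skip-raw-blocks and once with a hypothetical generic skip-leaves rule. Cid, CountingBlockstore and both traversal functions are simplified stand-ins, not a real IPFS library.

```python
from dataclasses import dataclass, field

RAW = 0x55


@dataclass(frozen=True)
class Cid:
    """Simplified CID stand-in: the codec is readable without fetching the block."""
    codec: int
    name: str


@dataclass
class CountingBlockstore:
    """Hypothetical in-memory blockstore that records every block fetch."""
    links: dict[Cid, list[Cid]]
    fetched: list[Cid] = field(default_factory=list)

    def get_links(self, cid: Cid) -> list[Cid]:
        self.fetched.append(cid)  # fetching is the expensive step being counted
        return self.links.get(cid, [])


def traverse(store: CountingBlockstore, root: Cid, skip_raw_blocks: bool) -> list[Cid]:
    """Return the blocks a gateway would send; raw ones are skipped by CID alone."""
    out: list[Cid] = []

    def visit(cid: Cid) -> None:
        if skip_raw_blocks and cid.codec == RAW:
            return  # decided without fetching
        out.append(cid)
        for child in store.get_links(cid):
            visit(child)

    visit(root)
    return out


def traverse_skip_leaves(store: CountingBlockstore, root: Cid) -> list[Cid]:
    """Hypothetical skip-leaves: a block must be fetched to learn it is a leaf."""
    out: list[Cid] = []

    def visit(cid: Cid) -> None:
        children = store.get_links(cid)
        if not children:
            return  # fetched, then discarded: wasted server work
        out.append(cid)
        for child in children:
            visit(child)

    visit(root)
    return out
```

Both variants return the same two non-leaf blocks for a DAG whose leaves are all raw. The skip-leaves variant, however, also fetches every leaf just to throw it away. Its server-side cost therefore grows with the file size, even though the response does not.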

Security

This IPIP does not impact the security model of the trustless gateway.

Test fixtures

:::issue

TODO: update below section with CIDs or CARs from conformance tests

Scenarios we should check:

- A request for /ipfs/cid with skip-raw-blocks=y, where the CID has the raw codec, MUST return HTTP 400 (Bad Request).
- Reuse an existing UnixFS DAG that has raw leaves, request it with skip-raw-blocks=n, and confirm the response includes the expected raw leaves' CIDs.
- Create a new CAR fixture that only has non-raw blocks. Request it with skip-raw-blocks=y, and confirm the response includes the expected CIDs and does not include the raw blocks referenced by parents.
  - The important part is creating the CAR fixture by hand and ensuring the raw blocks are NEVER announced anywhere. Generate the fixture from random data, add it to IPFS with the raw-leaves option, then export the DAG without raw blocks (using go-car's filter or similar).
    - Why? This goes the extra mile, but it ensures that no conformant gateway implementation does the useless work of fetching raw blocks that are not required to fulfill skip-raw-blocks=y requests. We did a similar thing for entity-bytes, and it was the only way we could show bugs in the Saturn project's cache implementation at the time.

:::
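The scenarios above could be checked with a small helper like the one in this sketch. It assumes the test harness has already parsed the CAR response into (CID, codec) pairs. The function names, the plain-string CIDs and the expectation of HTTP 200 for a successful CAR response are illustrative assumptions, not part of an existing conformance suite.

```python
RAW = 0x55


def check_skip_raw_response(
    status: int,
    returned: list[tuple[str, int]],
    expected_non_raw: set[str],
) -> list[str]:
    """Return problems found in a skip-raw-blocks=y CAR response (empty list = pass)."""
    problems = []
    if status != 200:
        problems.append(f"unexpected HTTP status {status}")
    for cid, codec in returned:
        if codec == RAW:
            problems.append(f"raw block {cid} must not be sent")
    missing = expected_non_raw - {cid for cid, _ in returned}
    if missing:
        problems.append(f"missing expected blocks: {sorted(missing)}")
    return problems


def check_raw_root_rejected(status: int) -> list[str]:
    """A raw-codec root requested with skip-raw-blocks=y is expected to yield HTTP 400."""
    return [] if status == 400 else [f"expected HTTP 400, got {status}"]
```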
