| title | IPIP-0445: Option to Skip Raw Blocks in Gateway Responses |
|---|---|
| date | 2023-10-09 |
| ipip | open |
| editors | |
| relatedIssues | |
| order | 445 |
| tags | |

Introduce a skip-raw-blocks flag for the :cite[trustless-gateway].
Allow clients to read a stream which only contains proofs in a bottom-heavy
graph using the raw codec for its leaves.
Useful for UnixFS features like webseeds (ipfs/specs#444), where metadata about a DAG is fetched from a trustless gateway, but the actual raw data can be fetched from any source that supports either the trustless gateway specification or plain HTTP Range Requests, allowing for trustless and verifiable data retrieval from plain HTTP (non-IPFS) data sources.
The skip-raw-blocks URL query parameter on :cite[trustless-gateway]
allows clients to download an entity without the blocks whose CID uses the
raw (0x55) multicodec.
- When set to `y`, the parameter instructs the gateway not to transmit blocks referenced with a CID with the `raw` multicodec.
- If set to `n`, or left unspecified, there is no special handling of `raw` multicodec blocks (the existing default behavior remains the same).
Importantly, unless explicitly specified as y, the default operational
mode of the gateway MUST assume the value of skip-raw-blocks to be n.
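
The parameter semantics above can be sketched as a small server-side filter. This is a minimal illustration, not a reference implementation: the function names and the `(codec, cid)` tuple representation of blocks are assumptions made for the example, and treating any value other than an explicit `y` as `n` is one reading of the default rule.

```python
from urllib.parse import parse_qs, urlsplit

RAW_CODEC = 0x55  # multicodec code for raw blocks


def skip_raw_blocks_enabled(url: str) -> bool:
    """Return True only when the query explicitly sets skip-raw-blocks=y."""
    values = parse_qs(urlsplit(url).query).get("skip-raw-blocks", [])
    # Absent, empty, "n", or any other value falls back to the default "n".
    return bool(values) and values[-1] == "y"


def blocks_to_send(blocks, url):
    """Filter hypothetical (codec, cid) tuples according to the request URL."""
    skip = skip_raw_blocks_enabled(url)
    return [block for block in blocks if not (skip and block[0] == RAW_CODEC)]


if __name__ == "__main__":
    # Placeholder CIDs; 0x70 is the dag-pb codec used by UnixFS nodes.
    dag = [(0x70, "root"), (0x55, "leaf-1"), (0x55, "leaf-2")]
    print(blocks_to_send(dag, "/ipfs/cid?skip-raw-blocks=y"))
    print(blocks_to_send(dag, "/ipfs/cid"))
```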
Implementing the skip-raw-blocks parameter offers several benefits to users:
- Verification Flexibility: Clients can verify out-of-band (OOB) received files in their deserialized form without necessitating the transmission of raw blocks from the gateway.
- Incremental Download: Clients can incrementally download files in deserialized form from non-IPFS servers. This allows applications to share one distribution channel between IPFS and non-IPFS clients.
- Efficient Block Discovery: With the `skip-raw-blocks` option enabled, clients can quickly discover numerous candidate blocks without being bottlenecked by the gateway's transmission of raw blocks.
- Non-IPFS HTTP Mirrors Become Useful: Legacy data that is already exposed over HTTP in deserialized form can now act as a source for specific block byte ranges, without having to support any IPFS-specific APIs. Plain HTTP Range Requests can be used for fetching the remaining raw block data, and the metadata read via `skip-raw-blocks=y` is enough for a client to verify that the raw block byte ranges fetched from a non-IPFS system match the expected CIDs.
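
The verification step described in the last item can be sketched as follows. The sketch assumes the leaf CIDs are CIDv1 with the raw codec and a sha2-256 multihash, encoded as base32; real DAGs may use other hash functions, which this example does not handle. The function names are hypothetical, and `body` stands in for bytes that a client would obtain from a plain HTTP server, for instance through a Range Request.

```python
import base64
import hashlib


def raw_cid_v1(data: bytes) -> str:
    """Compute the base32 CIDv1 (raw codec, sha2-256) of a block's bytes."""
    digest = hashlib.sha256(data).digest()
    # Layout: <version 0x01><codec raw 0x55><multihash: sha2-256 0x12, length 0x20, digest>
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    # Multibase base32: "b" prefix, lowercase alphabet, no padding.
    return "b" + base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")


def verify_range(body: bytes, offset: int, length: int, expected_cid: str) -> bool:
    """Check that body[offset:offset+length] hashes to the expected raw CID."""
    chunk = body[offset:offset + length]
    return len(chunk) == length and raw_cid_v1(chunk) == expected_cid


if __name__ == "__main__":
    file_bytes = b"bytes served by a plain HTTP mirror"
    leaf_cid = raw_cid_v1(file_bytes[6:12])  # in practice this comes from the proofs
    print(verify_range(file_bytes, 6, 6, leaf_cid))
```

The offsets and lengths of each leaf would come from the non-raw blocks returned with skip-raw-blocks=y; the sketch takes them as given.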
Setting the default value of the skip-raw-blocks parameter to n ensures
backward compatibility with existing clients and systems that are unaware
of this new flag.
An alternative approach would be to request blocks individually. However, this adds extra round trips and more per-request HTTP overhead, and is thus undesirable.
The existing dag-scope parameter determines the overall range of blocks to retrieve,
while skip-raw-blocks selectively filters specific blocks across all scopes and ranges.
Combining them under one parameter would restrict their combined utility.
For example:
- A client is streaming a video from a webseed and the user seeks through the video. The client would then send `dag-scope=entity&entity-bytes=42:1337` with `skip-raw-blocks=y` to download the proofs for the required section of the video, and then fetch the remaining raw data byte ranges from a faster CDN.
- A client is verifying an OOB transferred directory in deserialized form. In that case `dag-scope=all` with `skip-raw-blocks=y` makes sense.
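
As an illustration of the first example, a client could assemble the proof request like this. The gateway host, the placeholder CID, and the helper name are assumptions made for the sketch; only the query parameters and the CAR content type come from the text.

```python
from urllib.parse import urlencode

# Content type for CAR responses, sent in the Accept header of the request.
CAR_ACCEPT = "application/vnd.ipld.car"


def proof_request_url(gateway: str, cid: str, first_byte: int, last_byte: int) -> str:
    """Build a URL asking for the proofs of a byte range, without raw blocks."""
    query = urlencode(
        {
            "dag-scope": "entity",
            "entity-bytes": f"{first_byte}:{last_byte}",
            "skip-raw-blocks": "y",
        },
        safe=":",  # keep the entity-bytes separator readable
    )
    return f"{gateway}/ipfs/{cid}?{query}"


if __name__ == "__main__":
    # "gateway.example" and "bafy-placeholder" are illustrative values only.
    print(proof_request_url("https://gateway.example", "bafy-placeholder", 42, 1337))
```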
The optional parameters of the CAR content type
(application/vnd.ipld.car),
like order and dups, impact the way data is represented
when returned as a CAR stream, but do not modify the scope of the data itself:
they neither add nor subtract data from the response.
The scope of the data is controlled by URL content path and optional
dag-scope, entity-bytes URL parameters. This is where skip-raw-blocks
belongs.
This is not just a matter of aesthetics: the URL path and query parameters
allow for caching of different subsets of a DAG in a way that is interoperable
with existing HTTP tools and clients, and minimize the risk of caching an
incomplete DAG response due to HTTP cache misconfiguration. Because
skip-raw-blocks is in the URL query, CAR responses without raw blocks will be
cached under a different key than full responses (just like the already
existing dag-scope and entity-bytes).
Prevention of amplification attacks and efficient server operation.
By utilizing the raw (0x55) codec servers can trivially determine whether
to fetch or skip a block without having to fetch it to learn any new
information.
If we framed this feature around skipping all leaf nodes, that would require the server to fetch the leaves to learn whether they have any child nodes. This would force the server to fetch data that is never returned to the client.
Although skip-raw-blocks is more limited and not able to handle UnixFS files
chunked without the --raw-leaves option, it allows both the client and the
server to trivially verify that a block must not be fetched. This prevents
amplification issues, where a server may need to fetch orders of magnitude
more data than it returns to the client when executing the request.
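
The check that makes this cheap can be sketched as reading the codec straight out of the CID, with no block fetch involved. The sketch below is an assumption-laden illustration: it only handles base32 CIDv1 strings, and the function names are hypothetical.

```python
import base64

RAW_CODEC = 0x55  # multicodec code for raw blocks


def read_varint(buf: bytes, pos: int):
    """Decode an unsigned varint at pos, returning (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def cid_codec(cid: str) -> int:
    """Return the multicodec of a base32 CIDv1 string without fetching its block."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles base32 (multibase 'b') CIDs")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that multibase omits
    raw = base64.b32decode(body)
    version, pos = read_varint(raw, 0)
    if version != 1:
        raise ValueError("this sketch only handles CIDv1")
    codec, _ = read_varint(raw, pos)
    return codec


def must_skip(cid: str, skip_raw_blocks: bool) -> bool:
    """Decide from the link alone whether a block is left out of the response."""
    return skip_raw_blocks and cid_codec(cid) == RAW_CODEC
```

Because the decision depends only on the link found in the parent block, a server following this approach never needs to retrieve a raw block that it will not send.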
This IPIP does not impact the security model of the trustless gateway.
:::issue
TODO: update below section with CIDs or CARs from conformance tests
Scenarios we should check:
- request for `/ipfs/cid` where CID has `raw` codec MUST return HTTP 400 (Bad Request)
- reuse existing UnixFS DAG that has raw-leaves, request it with `skip-raw-blocks=n`, confirm the response includes expected raw leaves' CIDs
- create a new CAR fixture that only has non-raw blocks. Request it with `skip-raw-blocks=y`, confirm the response includes expected CIDs and does not include raw blocks referenced by parents.
  - important part is creating the CAR fixture by hand, and ensuring the raw blocks are NEVER announced anywhere (generate fixture with random data, add to ipfs with raw-leaves option, then export DAG without `raw` blocks (use go-car's `filter` or similar))
    - Why? This goes the extra mile, but ensures every conformant gateway implementation is not doing useless work of fetching raw blocks which are not required for fulfilling `skip-raw-blocks=y` requests. We did a similar thing for `entity-bytes` and it was the only way we could show bugs in Saturn project's cache implementation at the time.
:::
Copyright and related rights waived via CC0.