4 changes: 2 additions & 2 deletions src/http-gateways/path-gateway.md
@@ -35,7 +35,7 @@ editors:
name: Protocol Labs
url: https://protocol.ai/
xref:
- url
- rfc3986
- trustless-gateway
- subdomain-gateway
- dnslink-gateway
@@ -511,7 +511,7 @@ When deserialized responses are enabled,
and no explicit response format is provided with the request, and the
requested data itself has no built-in content type metadata, implementations
SHOULD perform content type sniffing based on file name
(from :ref[url] path, or optional [`filename`](#filename-request-query-parameter) parameter)
(from URI path, or optional [`filename`](#filename-request-query-parameter) parameter)
and magic bytes to improve the utility of produced responses.

For example:
345 changes: 345 additions & 0 deletions src/ipips/ipip-0518.md
@@ -0,0 +1,345 @@
---
title: "IPIP-0518: URIs in Routing V1 API via Generic Schema"
date: 2026-02-11
ipip: proposal
editors:
  - name: Marcin Rataj
    github: lidel
    url: https://lidel.org/
    affiliation:
      name: Shipyard
      url: https://ipshipyard.com
thanks:
  - name: Adin Schmahmann
    github: aschmahmann
    affiliation:
      name: Shipyard
      url: https://ipshipyard.com
relatedIssues:
  - https://github.com/ipfs/specs/issues/192
  - https://github.com/ipfs/specs/issues/496
  - https://github.com/multiformats/multiaddr/issues/63
  - https://github.com/multiformats/multiaddr/issues/87
  - https://github.com/ipshipyard/roadmaps/issues/15
  - https://github.com/ipfs/specs/pull/518
order: 518
tags: ['ipips']
xref:
  - rfc3986
---

## Summary

Introduce a `generic` record schema for the Delegated Routing V1 HTTP API that supports URIs alongside multiaddrs in the `Addrs` field. Unlike the `peer` schema, which is tied to libp2p PeerIDs and multiaddrs, `generic` supports arbitrary identifiers and address formats including HTTP(S) URLs and other URI schemes. This enables HTTP-only providers, WebSeeds, and other non-libp2p use cases without breaking existing clients.

## Motivation

The Delegated Routing V1 HTTP API currently requires all provider records to use the `peer` schema, which mandates a libp2p PeerID as the identifier and multiaddrs as the address format.

Many IPFS services are primarily accessible via HTTP(S) and do not use libp2p:

- IPFS Gateways (path and subdomain)
- HTTP-based content providers and pinning services
- WebSeed providers

Converting HTTP(S) URLs to multiaddrs is lossy and error-prone:

- HTTP URLs must be encoded as `/dns4/example.com/tcp/80/http` or `/dns4/example.com/tcp/443/https`
- URL-to-multiaddr round-trips are not lossless (see [multiaddr#63](https://github.com/multiformats/multiaddr/issues/63))
- Multiple implementations handle edge cases differently (default ports, paths, fragments, HTTP basic-auth)
- A single `https://example.com` URL supports HTTP/1.1, HTTP/2, and HTTP/3, but multiaddr requires separate entries per transport
- Requiring multiaddr libraries raises the barrier for lightweight HTTP-only clients

A new schema decouples provider records from libp2p, allowing the ecosystem to experiment with HTTP-only providers, WebSeeds, alternative protocols, and other novel concepts without vendor lock-in: there is no need for explicit entries in [multicodec table.csv](https://github.com/multiformats/multicodec/blob/master/table.csv), and no dependency on ecosystem-wide adoption of a new addressing scheme. Existing clients remain unaffected.

## Detailed design

### Generic Schema

A new `generic` schema is added to the [Known Schemas](https://specs.ipfs.tech/routing/http-routing-v1/#known-schemas) section of the Routing V1 spec.

```json
{
  "Schema": "generic",
  "ID": "did:key:z6Mkm1...",
  "Addrs": ["https://trustless-gateway.example.com", "/ip4/1.2.3.4/tcp/5000"],
  "Protocols": ["transport-ipfs-gateway-http"]
}
```

Fields:

- `ID`: a string identifier for the provider. Unlike the `peer` schema, this is not restricted to libp2p PeerIDs. Implementations SHOULD use identifiers that are self-authenticating (e.g. `did:key`), sufficiently unique, and less than 100 bytes.
- `Addrs`: an optional list of addresses as strings. Addresses are duck-typed by prefix:
  - If a string starts with `/`, it is parsed as a [multiaddr](https://github.com/multiformats/multiaddr).
  - Otherwise, it is parsed as a URI per :cite[rfc3986].
  - Clients MUST skip addresses they cannot parse or do not support, and continue with the remaining entries. This includes URIs with unrecognized schemes, unsupported multiaddrs, or all multiaddrs if the client only supports URIs.
- `Protocols`: an optional list of transfer protocol names associated with this record. Protocol names are opaque strings with a max length of 63 characters, established by rough consensus across compatible implementations per the [robustness principle](https://specs.ipfs.tech/architecture/principles/#robustness). This is a deliberate departure from the `peer` schema, which suggested that protocol names require registration in [multicodec table.csv](https://github.com/multiformats/multicodec/blob/master/table.csv), creating an IANA-like chokepoint for adopting new protocols. The `generic` schema removes this gatekeeping: anyone can return novel addresses and protocol names without external approval, and clients that do not recognize them simply skip them without breaking.
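A minimal Python sketch of the duck-typed `Addrs` handling described above, assuming a hypothetical HTTP-only client that supports only `https` URIs and, by default, no multiaddrs. The names `classify_addr`, `usable_addrs`, and `DEFAULT_URI_SCHEMES` are illustrative and not part of any existing library. It shows only the prefix dispatch and the skip-and-continue rule; actual multiaddr parsing is left to a multiaddr library.

```python
from urllib.parse import urlsplit

# Hypothetical capabilities of a lightweight HTTP-only client.
DEFAULT_URI_SCHEMES = frozenset({"https"})


def classify_addr(addr, uri_schemes=DEFAULT_URI_SCHEMES, multiaddrs=False):
    """Duck-type one `Addrs` entry by prefix.

    Returns ("multiaddr", addr) or ("uri", addr), or None if the entry
    must be skipped because it is unparseable or unsupported.
    """
    if not isinstance(addr, str) or not addr:
        return None
    if addr.startswith("/"):
        # Multiaddr. A real client would parse it with a multiaddr library
        # and also skip multiaddrs it cannot dial.
        return ("multiaddr", addr) if multiaddrs else None
    try:
        scheme = urlsplit(addr).scheme.lower()
    except ValueError:
        return None  # unparseable URI: skip this entry only
    if scheme not in uri_schemes:
        return None  # missing, unknown, or unsupported scheme: skip
    return ("uri", addr)


def usable_addrs(addrs, **caps):
    """Classify every entry, skipping failures instead of rejecting the record."""
    result = []
    for addr in addrs or []:  # `Addrs` is optional
        classified = classify_addr(addr, **caps)
        if classified is not None:
            result.append(classified)
    return result
```

With the `Addrs` from the example record above, an HTTP-only client keeps just the `https://` entry, while a client called with `multiaddrs=True` keeps both.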

> **Review comment (contributor):** Should we add a note here about allowing people to define known protocols, similar to how we have known schemas? (Either way, we'll need a place to specify the names, metadata, and meaning associated with different protocol names.)


Servers and caching proxies MUST act as pass-through and return `Addrs` and `Protocols` as-is, unless explicitly filtered by the client via `?filter-addrs` or `?filter-protocols` query parameters.

To allow for protocol-specific fields and future-proofing, the parser MUST allow unknown fields, and clients MUST ignore fields they do not recognize.

The total serialized size of a single `generic` record MUST be less than 10 KiB.
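The record-level rules above (ignore unknown fields, keep the record under 10 KiB) and the 63-character limit on protocol names can be sketched as a client-side parser. This is an illustration under assumptions: `parse_generic_record` is a hypothetical helper, the size is measured on a compact UTF-8 re-serialization rather than on the bytes actually received, and over-long protocol names are silently dropped, a policy this IPIP does not prescribe.

```python
import json

MAX_RECORD_BYTES = 10 * 1024  # a record MUST be less than 10 KiB
MAX_PROTOCOL_NAME_LEN = 63
KNOWN_FIELDS = {"Schema", "ID", "Addrs", "Protocols"}


def parse_generic_record(raw):
    """Return a normalized view of a `generic` record, or None to skip it."""
    if not isinstance(raw, dict) or raw.get("Schema") != "generic":
        return None
    # Assumption: approximate the serialized size with compact JSON.
    encoded = json.dumps(raw, separators=(",", ":"), ensure_ascii=False)
    if len(encoded.encode("utf-8")) >= MAX_RECORD_BYTES:
        return None
    rid = raw.get("ID")
    if not isinstance(rid, str) or not rid:
        return None
    addrs = [a for a in raw.get("Addrs") or [] if isinstance(a, str)]
    protocols = [
        p for p in raw.get("Protocols") or []
        if isinstance(p, str) and 0 < len(p) <= MAX_PROTOCOL_NAME_LEN
    ]
    # Unknown fields are ignored, not rejected; they are kept aside so
    # protocol-specific code can inspect them if it wants to.
    extra = {k: v for k, v in raw.items() if k not in KNOWN_FIELDS}
    return {"ID": rid, "Addrs": addrs, "Protocols": protocols, "extra": extra}
```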

### Supported URI Schemes

Initially, `https://` SHOULD be supported as the primary URI scheme.

Other URI schemes (e.g. `magnet:`, `foo://`, or any future scheme) MAY appear in `Addrs`. Clients MUST skip URIs with schemes they do not support. This ensures new URI schemes can be introduced over time without breaking existing clients or requiring central coordination.

### URI Requirements

URIs in the `Addrs` field:

- MUST be absolute URIs (not relative references)
- MUST include the scheme (e.g. `https://`, `magnet:`)
- MAY include paths, query parameters, or fragments, but clients MUST handle their presence defensively
- SHOULD point to endpoints that support protocols listed in the `Protocols` field
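A sketch of these URI requirements, assuming each non-`/` entry arrives as a string. `check_uri_requirements` accepts only strings that begin with a scheme in the RFC 3986 syntax and, for `http`/`https`, additionally requires a host. `join_path` shows one possible defensive policy (hypothetical, not mandated by this IPIP): append a request path to whatever path the URI already has, keep any query, and drop the fragment, which is never sent to the server anyway.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def check_uri_requirements(addr):
    """True if `addr` looks like an absolute URI with an explicit scheme."""
    scheme, sep, _ = addr.partition(":")
    if not sep or not SCHEME_RE.fullmatch(scheme):
        return False  # relative reference, or missing/invalid scheme
    if scheme.lower() in ("http", "https"):
        # Web URIs also need an authority with a host.
        try:
            return bool(urlsplit(addr).hostname)
        except ValueError:
            return False
    return True  # other schemes: structure is scheme-specific


def join_path(base, path):
    """Append `path` to the base URI's path without clobbering its query."""
    parts = urlsplit(base)
    new_path = parts.path.rstrip("/") + "/" + path.lstrip("/")
    # Fragment dropped: it is client-side only and never sent over HTTP.
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, ""))
```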

### Interaction with `filter-addrs`

The `filter-addrs` query parameter from [IPIP-0484](https://specs.ipfs.tech/ipips/ipip-0484/) applies to `generic` records the same way it applies to `peer` records:

- Multiaddr addresses (strings starting with `/`) are filtered by multiaddr protocol name.
- URI addresses (strings not starting with `/`) are filtered by URI scheme name. For example, `?filter-addrs=https` matches `https://example.com`.
- This is naturally consistent: `https` is both a multiaddr protocol name (matching `/dns/example.com/tcp/443/https`) and a URI scheme (matching `https://example.com`).

> **Review comment (contributor):** Might be a good idea to be explicit here about how filters that previously applied to multiaddr components would apply to other address types (e.g., does `tls` filtering need to match to permit `https://`? How about `tcp` and `https://`, given that HTTP could run over TCP or QUIC, even though it's standard to try TCP before upgrading to QUIC?). I see the `tcp` example below answers this, but it might help to be explicit in the definition here.

- `?filter-addrs=unknown` includes `generic` records with no known addresses.
- If no addresses remain after filtering, the `generic` record is omitted from the response.
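The filtering rules above can be sketched as follows. This assumes one reading of IPIP-0484: addresses matching any `!`-prefixed filter are dropped, and when positive filters are present an address must match at least one of them. `filter_generic_record` and the `VALUELESS` table are hypothetical; the table is an incomplete list of multiaddr protocols that take no value, and a real implementation would decompose multiaddrs with a multiaddr library instead.

```python
from urllib.parse import urlsplit

# Incomplete, illustrative list of value-less multiaddr protocols; every
# other protocol is assumed to be followed by exactly one value.
VALUELESS = {"http", "https", "tls", "ws", "wss", "quic", "quic-v1", "noise",
             "webtransport", "webrtc", "webrtc-direct", "p2p-circuit"}


def multiaddr_protocols(ma):
    parts = ma.strip("/").split("/")
    names, i = set(), 0
    while i < len(parts):
        names.add(parts[i])
        i += 1 if parts[i] in VALUELESS else 2  # skip the value, if any
    return names


def addr_names(addr):
    """Multiaddrs match by protocol name, URIs by scheme."""
    if addr.startswith("/"):
        return multiaddr_protocols(addr)
    try:
        return {urlsplit(addr).scheme.lower()}
    except ValueError:
        return set()


def filter_generic_record(record, filters):
    """Apply `filter-addrs`; returns None when the record is omitted."""
    if not filters:
        return record  # no filter-addrs parameter: pass through
    addrs = record.get("Addrs") or []
    if not addrs:
        return record if "unknown" in filters else None
    positive = {f for f in filters if not f.startswith("!") and f != "unknown"}
    negative = {f[1:] for f in filters if f.startswith("!")}
    kept = []
    for addr in addrs:
        names = addr_names(addr)
        if names & negative:
            continue
        if positive and not names & positive:
            continue
        kept.append(addr)
    return {**record, "Addrs": kept} if kept else None
```

Applied to the `filter-addrs` test fixture at the end of this document, `https` keeps both addresses, `tcp` keeps only the multiaddr, and `!https` omits the record.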

### Relationship to Peer Schema

The `peer` schema remains unchanged. It represents a libp2p node identified by PeerID with multiaddr addresses. The `generic` schema is complementary:

| | `peer` schema | `generic` schema |
|---|---|---|
| `ID` | libp2p PeerID | any string (e.g. `did:key`) |
| `Addrs` | multiaddrs only | multiaddrs and/or URIs |
| use case | libp2p-native providers | HTTP-only, WebSeed, custom protocols |

> **Review comment (contributor):** Might be worth calling out that in the `peer` schema the multiaddrs all had an implied `/p2p/<the peerID>` appended, whereas that's not true in the `generic` schema. If you use the `generic` schema for libp2p multiaddrs, the `/p2p/<the peerID>` needs to be explicitly defined.

Routing servers MAY emit both schema types for the same provider:

```json
{
  "Providers": [
    {
      "Schema": "peer",
      "ID": "12D3KooW...",
      "Addrs": ["/ip4/192.168.1.1/tcp/4001"],
      "Protocols": ["transport-bitswap"]
    },
    {
      "Schema": "generic",
      "ID": "did:key:z6Mkm1...",
      "Addrs": ["https://trustless-gateway.example.com"],
      "Protocols": ["transport-ipfs-gateway-http"]
    }
  ]
}
```

## Design rationale

### Why a new schema instead of modifying Peer

The `peer` schema has a hard dependency on libp2p: `ID` is a PeerID and `Addrs` are multiaddrs. Existing clients parse every entry in `Addrs` as a multiaddr. Introducing URIs into the `Addrs` field of the `peer` schema would cause parse errors in all third-party clients that have not been updated, breaking backward compatibility.

Previous rollouts of new multiaddr protocols (`/quic-v1`, `/webtransport`, `/webrtc-direct`) did not break clients because those strings still parsed as valid multiaddrs, even when the client could not dial them. URIs are not multiaddrs and will fail multiaddr parsing.

By introducing a new schema, we leverage the existing requirement that clients MUST skip records with unknown schemas:

- Existing clients continue to work, only seeing `peer` records they already understand
- Updated clients opt in to `generic` records at their own pace
- No flag day or coordinated upgrade required
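Because clients already drop records with unknown schemas, opting in reduces to which schemas a client lists as supported. A minimal sketch, where `usable_providers` is a hypothetical helper and `body` is assumed to be a Routing V1 JSON response body:

```python
import json


def usable_providers(body, supported_schemas):
    """Return provider records whose `Schema` this client understands.

    Records with any other schema are skipped, never treated as errors.
    """
    providers = json.loads(body).get("Providers") or []
    return [
        p for p in providers
        if isinstance(p, dict) and p.get("Schema") in supported_schemas
    ]
```

A client that passes `{"peer"}` sees only the `peer` record from the mixed response in the previous section; passing `{"peer", "generic"}` opts in to both.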

### Incremental migration

Libp2p-native peers continue using the `peer` schema as-is. The migration only impacts providers that are not actual libp2p peers, such as HTTP-only Trustless Gateways that today must be shoehorned into the `peer` schema with a synthetic PeerID. During the transition period, routing servers can return both `peer` and `generic` records for the same provider. Clients that understand `generic` use the richer address information; others fall back to `peer` records with the synthetic PeerID.

### Decoupling from libp2p

The `generic` schema removes the hard requirement on libp2p PeerIDs and multiaddrs. This lowers the barrier for building lightweight IPFS clients that only speak HTTP, and enables experimentation with new provider types (WebSeeds, S3-backed storage) without requiring changes to the libp2p specification or multiaddr registry.

## User benefit

### For developers

- HTTP-only providers and HTTP-only stacks can be built without multiaddr encoding/decoding libraries. Lower cognitive overhead: everyone familiar with `https://` URIs knows how to work with them.
- Alternative URI schemes are also easier to integrate than new multiaddr protocols
- Lightweight HTTP-only IPFS clients become feasible without re-implementing libp2p concepts

### For service providers

- HTTP(S) endpoints advertised directly as URLs
- Custom address formats supported without multiaddr registry changes
- Protocol-specific metadata via extra fields

### For end users

- Lower barrier for new client implementations increases ecosystem diversity
- HTTP-only providers improve compatibility with web-based IPFS implementations

## Compatibility

### Backward compatibility

Fully backward compatible. Existing clients already skip records with unknown schemas, so they ignore `generic` records. The `peer` schema is unchanged.

### Forward compatibility

Unknown fields MUST be ignored by clients. New address formats and protocol-specific fields can be added without breaking existing implementations.

URIs in `Addrs` are not limited to a specific scheme. Clients parsing a `generic` record MUST skip addresses with unrecognized URI schemes, which allows the ecosystem to introduce addressing beyond `https://` without requiring coordination or simultaneous upgrades.

### Migration path

1. Routing servers emit `generic` records alongside existing `peer` records
2. Clients add support for the `generic` schema at their own pace
3. HTTP-only providers that previously required multiaddr conversion can switch to `generic` with native URI addresses

> **Review comment (contributor):**
>
> 1. Is the expectation that libp2p multiaddr peers are still frequently returned as `peer` records in the longer term? Could see it either way, but trying to understand the recommendation for routing-v1 server implementers.
> 2. For records that are duplicated between `peer` and `generic` responses, do clients need any metadata hint noting which ones are duplicates, or is that sufficiently obvious for clients to figure out?

> **Review comment (contributor, on lines +197 to +198):** Note: the providers don't necessarily have to switch anything; the routing-v1 servers would need to switch something. The providers might only need to switch if the routing-v1 server, or the routing system(s) behind it, care.

## Security

### URI validation

Implementations SHOULD validate URIs:

- Verify the URI scheme is supported (e.g. `https://`)
- Enforce a maximum URI length (practical limits range from 2048 to 8192 characters)
- Apply scheme-specific rate limits where appropriate (e.g. rate-limiting HTTP requests to URIs returning non-success responses)
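One way to apply the rate-limiting bullet to HTTPS endpoints is per-origin exponential backoff after non-success responses. The sketch below is an assumption, not a requirement of this IPIP: `FailureBackoff`, keying by origin (scheme and host), the 1-second base delay, and the 300-second cap are all illustrative choices. Time is passed in explicitly to keep the logic deterministic.

```python
from urllib.parse import urlsplit


class FailureBackoff:
    """Per-origin exponential backoff after non-success responses."""

    def __init__(self, base=1.0, cap=300.0):
        self.base, self.cap = base, cap
        self._state = {}  # origin -> (consecutive_failures, retry_at)

    @staticmethod
    def _origin(uri):
        parts = urlsplit(uri)
        return f"{parts.scheme.lower()}://{parts.netloc.lower()}"

    def allowed(self, uri, now):
        """True if a request to this URI's origin may be sent at `now`."""
        _, retry_at = self._state.get(self._origin(uri), (0, 0.0))
        return now >= retry_at

    def record_failure(self, uri, now):
        """Double the wait after each consecutive failure, up to `cap`."""
        origin = self._origin(uri)
        failures, _ = self._state.get(origin, (0, 0.0))
        failures += 1
        delay = min(self.cap, self.base * 2 ** (failures - 1))
        self._state[origin] = (failures, now + delay)

    def record_success(self, uri):
        """A success clears the backoff for the whole origin."""
        self._state.pop(self._origin(uri), None)
```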

### HTTPS preference

For HTTP-based URIs, implementations SHOULD prefer `https://`. The `http://` scheme SHOULD only be allowed for testing and private LAN deployments, gated behind an explicit opt-in flag.
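Combining the URI validation checks with the HTTPS preference above, a client-side check might look like the following. `validate_uri`, `MAX_URI_LENGTH`, and the `allow_http` flag are hypothetical names; 2048 is simply the low end of the practical range mentioned in the URI validation list.

```python
from urllib.parse import urlsplit

MAX_URI_LENGTH = 2048  # assumption: low end of the 2048-8192 range


def validate_uri(addr, allow_http=False, max_len=MAX_URI_LENGTH):
    """Accept https URIs with a host; accept http only when opted in."""
    if len(addr) > max_len:
        return False
    try:
        parts = urlsplit(addr)
    except ValueError:
        return False
    scheme = parts.scheme.lower()
    if scheme == "https":
        return bool(parts.hostname)
    if scheme == "http" and allow_http:
        # Explicit opt-in: intended for testing and private LAN deployments.
        return bool(parts.hostname)
    return False  # any other scheme is unsupported by this sketch
```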

### DNS considerations

HTTP(S) URIs rely on DNS resolution. The same security considerations that apply to `/dns`, `/dns4`, and `/dns6` multiaddrs apply here:

- DNS responses can be spoofed without DNSSEC
- Clients SHOULD use secure DNS transports where available
- Certificate validation MUST be performed for HTTPS URIs on the public internet

### ID trust

The `generic` schema `ID` field is self-reported. Clients SHOULD use self-authenticating identifiers (e.g. `did:key`) and verify signatures where applicable. Reputation and resource allocation decisions SHOULD be tied to `ID`.

> **Review comment (contributor):** Might be worth clarifying what's even being trusted here. IIUC, from the perspective of routing-v1 clients, no trust has changed at all.
>
> - Peer Schema:
>   - When contacting a given libp2p endpoint, it will only be successful if the other party has access to the private key corresponding to the peerID. There's no proof related to that peerID advertising the record, etc. in the routing-v1 response. The ID is the routing-v1 server's reporting of who announced / is responsible for this peer record.
>   - When contacting an HTTPS endpoint (e.g. for the trustless-gateway records returned today), there's no proof related to the peerID at all.
> - Generic Schema:
>   - When contacting a given libp2p endpoint, it will only be successful if the other party has access to the private key corresponding to the peerID. There's no proof related to that peerID advertising the record, etc. in the routing-v1 response. Unlike the Peer Schema, the peerID is in the address instead of the ID. The ID is the routing-v1 server's reporting of who announced / is responsible for this peer record.
>   - When contacting an HTTPS endpoint (e.g. for the trustless-gateway records returned today), there's no proof related to the ID at all.


## Alternatives

### URIs in Peer Schema Addrs field

Adding URIs directly to the `Addrs` field of the existing `peer` schema was considered. The `peer` schema was introduced in [IPIP-0337](https://specs.ipfs.tech/ipips/ipip-0337/) and has been used in production by multiple independent implementations for years. Changing the semantics of `Addrs` from multiaddr-only to a mixed format would break all third-party clients that parse entries as multiaddrs. Unlike new multiaddr protocols which still parse as valid multiaddrs, URIs are a fundamentally different format and cause parse errors. A new schema avoids this by leveraging the existing unknown-schema-skipping behavior.

### URI-to-multiaddr conversion

The status quo requires converting HTTP URLs to multiaddrs like `/dns4/example.com/tcp/443/https`. This conversion is lossy: URI paths, fragments, query parameters, and HTTP/3 transport information are lost. Multiple implementations handle edge cases differently, leading to interoperability issues (see [multiaddr#63](https://github.com/multiformats/multiaddr/issues/63)). It also means libp2p-specific address libraries and parsers have to be implemented by every new client, increasing complexity and raising the barrier for new implementations.

### Custom multiaddr keyword arguments

Adding keyword arguments to multiaddr protocols was proposed in [multiaddr#87](https://github.com/multiformats/multiaddr/issues/87). This would increase complexity for all multiaddr implementers without addressing the fundamental desire to use standard URIs.

### Separate URI field in Peer Schema

Adding a separate `URIs` field to the `peer` schema would complicate the schema and create ambiguity about which field to check for addresses. A new schema is a cleaner separation: `peer` stays focused on libp2p peers, `generic` handles everything else.

## Test fixtures

### HTTPS-only provider

```json
{
  "Providers": [
    {
      "Schema": "generic",
      "ID": "did:key:z6Mkm1...",
      "Addrs": ["https://trustless-gateway.example.com"],
      "Protocols": ["transport-ipfs-gateway-http"]
    }
  ]
}
```

### Provider with protocol-specific metadata and custom URI scheme

```json
{
  "Providers": [
    {
      "Schema": "generic",
      "ID": "did:key:z6Mkm1...",
      "Addrs": ["foo://custom-storage.example.com/bucket"],
      "Protocols": ["example-future-protocol"],
      "example-future-protocol": {"version": 2, "features": ["foo"]}
    }
  ]
}
```

Clients that do not recognize the `foo://` URI scheme MUST skip that address.
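In the fixture above, protocol-specific metadata sits in an extra top-level field named after the protocol. Assuming that convention (this IPIP only requires clients to ignore unknown fields; it does not fix how such fields are keyed), a client could read the metadata only for protocols the record actually advertises. `protocol_metadata` is a hypothetical helper.

```python
def protocol_metadata(record, protocol):
    """Return the extra field keyed by `protocol`, or None.

    Metadata is only trusted when the protocol is listed in `Protocols`
    and the field is a JSON object; anything else is ignored.
    """
    if protocol not in (record.get("Protocols") or []):
        return None
    meta = record.get(protocol)
    return meta if isinstance(meta, dict) else None
```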

### Provider with opaque identifier

The `ID` field is not restricted to `did:key`. Any string identifier can be used:

```json
{
  "Providers": [
    {
      "Schema": "generic",
      "ID": "550e8400-e29b-41d4-a716-446655440000",
      "Addrs": ["https://cdn.example.com"],
      "Protocols": ["transport-ipfs-gateway-http", "example-future-protocol"]
    }
  ]
}
```

### Mixed response with both schemas

```json
{
  "Providers": [
    {
      "Schema": "peer",
      "ID": "12D3KooW...",
      "Addrs": [
        "/ip4/192.168.1.1/tcp/4001",
        "/ip4/192.168.1.1/udp/4001/quic-v1"
      ],
      "Protocols": ["transport-bitswap"]
    },
    {
      "Schema": "generic",
      "ID": "did:key:z6Mkm1...",
      "Addrs": ["https://trustless-gateway.example.com"],
      "Protocols": ["transport-ipfs-gateway-http"]
    }
  ]
}
```

### Filtering with `filter-addrs`

Given a response containing:

```json
{
  "Providers": [
    {
      "Schema": "generic",
      "ID": "did:key:z6Mkm1...",
      "Addrs": ["https://provider.example.com", "/ip4/1.2.3.4/tcp/443/https"],
      "Protocols": ["transport-ipfs-gateway-http"]
    }
  ]
}
```

A request with `?filter-addrs=https` returns both addresses, because `https` matches the URI `https://provider.example.com` by URI scheme and the multiaddr `/ip4/1.2.3.4/tcp/443/https` by multiaddr protocol name.

A request with `?filter-addrs=tcp` returns only the multiaddr `/ip4/1.2.3.4/tcp/443/https`, because `tcp` does not match the URI scheme `https`.

A request with `?filter-addrs=!https` omits the record entirely, because all addresses are removed by the negative filter.

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).