Persisted queries#2354
Conversation
| fs.writeFileSync(queryMapFilePath, JSON.stringify(queryMapJson)); | ||
| this._reporter.reportMessage( | ||
| `Written complete queryMap file to ${queryMapFilePath}`, | ||
| ); |
There was a problem hiding this comment.
I haven’t looked into how other files are written to disk by Relay, but is it normal to use synchronous access here instead of async?
There was a problem hiding this comment.
Other areas in Relay are using sync as well for example RelayCompilerCache line 55 and CodegenDirectory line 213. I think it's possible to change this to async, although I think it won't really make a significant impact. Generally I tend to use sync for file operations than async just to avoid file locking and concurrency issues. Imho.
| @@ -0,0 +1,7 @@ | |||
| import md5 from '../util/md5'; | |||
There was a problem hiding this comment.
This file probably needs a FB header like all other files.
| @@ -0,0 +1,11 @@ | |||
| //@flow | |||
There was a problem hiding this comment.
This file probably needs a FB header like all other files.
| writeCompleteQueryMap( | ||
| allOutputDirectories: Map<string, CodegenDirectory>, | ||
| ): void { | ||
| const queryMapFilePath = `${this._config.baseDir}/queryMap.json`; |
There was a problem hiding this comment.
I’m wondering if it would make sense to take an optional path value for the --persist option to change this location?
There was a problem hiding this comment.
Would be nice, in our app we have multiple compilations going on... so I guess without that, it would overwrite existing compilation.
There was a problem hiding this comment.
@edvinerikson each one of your compilation should have a --src specified, which means a separate queryMap.json file will be created for each src. Is that your use case? If not can you please provide more details on what you mean by multiple compilations?
There was a problem hiding this comment.
I am executing them like this
"compile-gql-documents:desktop": "relay-compiler --schema ./node_modules/@leogears/leo-gateway-schema/schema.json --src ../../ --include src/common/** --include src/web/common/** --include src/web/desktop/**",
"compile-gql-documents:mobile": "relay-compiler --schema ./node_modules/@leogears/leo-gateway-schema/schema.json --src ../../ --include src/common/** --include src/web/common/** --include src/web/mobile/**",
There was a problem hiding this comment.
Gotcha. You are right in your case the query map files will override each other. I will add an option to specify an output location. Thanks for the feedback!
There was a problem hiding this comment.
I’m wondering if it would make sense to take an optional path value for the --persist option to change this location?
Done. @edvinerikson @alloy
| getWriter: getRelayFileWriter(srcDir, options.persist), | ||
| isGeneratedFile: (filePath: string) => | ||
| filePath.endsWith('.js') && filePath.includes('__generated__'), | ||
| (filePath.endsWith('.js') || filePath.endsWith('.queryMap.json')) && |
There was a problem hiding this comment.
Wondering if it would make sense to follow the *.graphql.extname filename pattern used by the other artifacts? Because as shown in the patch in this comment it wouldn’t be unthinkable to simplify artefact detection somewhat by not matching a specific extension.
There was a problem hiding this comment.
This is an interesting one. I'm not sure the query map file qualifies as a graphql file. It's a plain json file and naming it graphql could be misleading.
There was a problem hiding this comment.
Agreed that it’s a tricky question. The JS artefacts also aren’t strictly GraphQL files but also derivatives, so figured that applies to these as well.
(Just to be clear, I meant something like Foo.graphql.json)
There was a problem hiding this comment.
That's a good point actually I never thought about that. I'll go ahead with the rename. Thanks for the response!
There was a problem hiding this comment.
Wondering if it would make sense to follow the *.graphql.extname filename pattern used by the other artifacts? Because as shown in the patch in this comment it wouldn’t be unthinkable to simplify artefact detection somewhat by not matching a specific extension.
Done
There was a problem hiding this comment.
@yusinto Hrmm, I’ve been trying to integrate this PR into the build we use at Artsy and I’m finding that in our case the (single query map) json file gets loaded at runtime before the source artefact file (a .tsx file in our case). Have you been able to use this exact code in production?
Changing these filenames back to *.queryMap.json makes it work well.
|
FYI This supersedes PR #1846. |
|
The downside with this approach over #1846 is that the query ids are not configurable. Your GraphQL server has to be able to lookup the query by the MD5 hash. This may not be ideal depending on the storage mechanism you are using to persist the queries server side. |
|
@robrichard With the language plugin support in #2293 and the proposed pre-processing plugin support here #2293 (comment), it’s not a stretch imo to imagine being able to do the same for query IDs and have MD5 hashing being a sane builtin default. |
|
@alloy Yes, so it seems there is precedence now for Relay compiler allowing you to run a custom JS module. You can add another to get custom IDs in the #1846 seemed a lot cleaner because you just had to write one function to persist the query and return the ID. |
|
On my phone now, so will look more closely to code again later, but I don’t see why you would be able to do the same in this situation. Once you have the query to e.g. hash it and return an ID, can’t you also take that moment to persist it? |
|
…or did you mean when using the default builtin md5 one? |
Yes you can, but then the logic to write Just passing a persist function from a user supplied module directly to |
|
@robrichard in this pr, @devknoll specified that |
|
|
||
| This will create a matching `./__generated__/MyComponent.queryMap.json` containing the query id and the operation text of the query in the same directory. | ||
| The Relay Compiler aggregates all the generated `*.queryMap.json` into a single complete query map file at `./src/queryMap.json`. You can then use this complete | ||
| json file in your server side to map query ids to operation text. |
There was a problem hiding this comment.
Could you maybe add some details about where this would typically sit in one’s workflow?
For instance, I’m naively thinking that you’d only use the --persist option once you decide to cut a release/perform a deploy, seeing as the server needs to know about these queries and you probably don’t want to continuously do that, is that correct?
It would also be nice to have an example of what you do with the resulting query map file, i.e. describe uploading to server and serving requests that use query IDs. (This could maybe even be added to the canonical example app?)
There was a problem hiding this comment.
My use case is explained in my blog post here. My app is universal and I have hot reload setup on both the client and server. The express server imports the complete queryMap.json file and uses a middleware to match queries from graphql requests.
During development, I change my graphql queries a lot (in relay graphql tags). Each time this happens the operation text changes and hence the query id changes because it is an md5 hash of the operation text. The graphql.js and queryMap.json files get recompiled, and my express server reloads the queryMap.json and is able to continue mapping queries correctly without having to stop/start.
The blog contains details about the express middleware and a complete working example. Hopefully the example can shed more light into this.
There was a problem hiding this comment.
Gotcha.
I understand that in such a situation you basically have the JSON file in the same project and so it’s easy to continuously write the persisted queries, but in setups where the server is a different project altogether it doesn’t seem necessary to do this continuously. What’s more, in the situation I’m describing there isn’t even a good way to persist the queries on the server during development and running relay-compiler --watch.
As such, I think it would be great if persisted queries gets its own doc page where you could both include the example from your blog post for a universal app that includes the server and has a more agnostic example of a separate server that anybody can adjust to their situation.
There was a problem hiding this comment.
Have to agree with @alloy here. We have three client repos that each have their own queries, and then a 4th repo for our gql server. We'd likely set up a 5th repo to store persisted queries in (which would also be reflected in a DB).
There was a problem hiding this comment.
I understand that in such a situation you basically have the JSON file in the same project and so it’s easy to continuously write the persisted queries, but in setups where the server is a different project altogether it doesn’t seem necessary to do this continuously. What’s more, in the situation I’m describing there isn’t even a good way to persist the queries on the server during development and running relay-compiler --watch.
Right I got your point. So there are 2 ways to use --persist:
-
As a single run with
--persistwhich will produce a master queryMap.json file. This does a one time compilation and stops. It doesn't do any continuous watching or compiling. -
In combination with
--watch. If you specify--persistand--watch, then queryMap.json will continuously update with the changes you make to your queries.
If you don't need continuous updates to queryMap.json, you can just do a single run like in step 1. The watch mode is useful if like me you have the luxury of having a universal app where the server and client are in one project.
In the case where the server is a separate project and you run --persist either as a single run or in watch mode in the client, you'll still need to somehow deploy the query map file to your server. In the simplest case, this can be a git push to your server repo to update the server query map file. In more complex scenarios this can be a build step which does database updates.
As such, I think it would be great if persisted queries gets its own doc page where you could both include the example from your blog post for a universal app that includes the server and has a more agnostic example of a separate server that anybody can adjust to their situation.
That's a great suggestion. I'll document what I've said here and in my blog post in a standalone doc page specifically for persisted queries.
There was a problem hiding this comment.
As such, I think it would be great if persisted queries gets its own doc page where you could both include the example from your blog post for a universal app that includes the server and has a more agnostic example of a separate server that anybody can adjust to their situation.
Done.
Agreed. Apologies for going off on a tangent regarding plugins.
Can you expand a bit on what you mean by not changing it “much”? Do you still make customisations and, if so, what are they?
This makes sense to me, as in my naive thought you’d only run this on release time. Maybe some more information about how one would use this in practice is what’s missing? (Left a comment asking for that here.) |
@alloy apologies by being ambiguous. I should have said my persist function does not change at all. I just use the md5 hash. |
| The Relay Compiler is responsible for generating code as part of a build step which, at runtime, can be used statically. By building the query ahead of time, the client's JS runtime is not responsible for generating a query string, and fields that are duplicated in the query can be merged during the build step, to improve parsing efficiency. If you have the ability to persist queries to your server, the compiler's code generation process provides a convenient time to convert a query or mutation's text into a unique identifier, which can greatly reduce the upload bytes required in some applications. | ||
| The Relay Compiler is responsible for generating code as part of a build step which, at runtime, can be used statically. By building the query ahead of time, the client's JS runtime is not responsible for generating a query string, and fields that are duplicated in the query can be merged during the build step, to improve parsing efficiency. | ||
|
|
||
| ### Persisting queries |
There was a problem hiding this comment.
It would be nice to add a paragraph maybe about why someone would want to persist queries. There is both the performance benefits, and, security benefits if you use the persisted queries as a query whitelist.
|
|
||
| fs.writeFileSync(queryMapFilePath, JSON.stringify(queryMapJson, null, 2)); | ||
| this._reporter.reportMessage( | ||
| `Written complete queryMap file to ${queryMapFilePath}`, |
There was a problem hiding this comment.
Grammar nit: "Wrote"
Wrote complete queryMap file to ${queryMapFilePath}
Or, word order change:
Complete queryMap written to file ${queryMapFilePath}
|
@edvinerikson @alloy @MatthewHerbst I think I've done all outstanding items in this PR. Are you guys able to take another look and confirm please? Will be good to have this merged at some point soon, so the rest of the community can finally use it. It's been a long time coming. Thank you. |
| }, | ||
| body: JSON.stringify({ | ||
| queryId: operation.id, // NOTE: pass md5 hash to the server | ||
| // query: operation.text, // this is now obsolete because text is null |
There was a problem hiding this comment.
It should be noted there is a 3rd - but much more complex, way to get the queries to the server. If all you care about is performance, then you can optimistically send the ID to the server. If the server responds that it doesn't know the ID, then you can send the whole query text, which the server can then cache. The next time you send the query ID to the server, it will have it. This is how apollo-link-persisted-queries works. I don't think we need to write an implementation of this, but I think we should note in the docs here that it is possible.
There was a problem hiding this comment.
That's an interesting approach. It seems to conflict with whitelisting though? You can't optimistically send queries if you are also whitelisting queries. Perhaps I should include this point in the docs as well. Will update.
There was a problem hiding this comment.
Yeah, that's true, but it is an option if all you're looking to do is optimization. I lot of people won't be able to whitelist because they allow 3rd party queries, but this option allows them (or people who don't want to/can't update a central repository) to still cache queries.
There was a problem hiding this comment.
Gotcha. Will include in the docs. Thanks for the explanation!
| const persistOutputDir = path.dirname(persistOutput); | ||
| if (!fs.existsSync(persistOutputDir)) { | ||
| throw new Error( | ||
| `--persist-output path does not exist: ${persistOutputDir}.`, |
There was a problem hiding this comment.
It would be nice to mention this option in the docs with/near --persist
There was a problem hiding this comment.
Good point. Will update the docs. Thank you.
MatthewHerbst
left a comment
There was a problem hiding this comment.
LGTM. Just my two comments on docs. If you wanted slightly more test coverage it would be nice to see tests that tested that the new errors are thrown when you expected them to be.
| describe: | ||
| 'The json filepath where the complete query map file will be written to', | ||
| type: 'string', | ||
| default: null, |
There was a problem hiding this comment.
Is it not possible with yargs to use a single option (—persist) both with and without value?
There was a problem hiding this comment.
Is it not possible with yargs to use a single option (—persist) both with and without value?
@alloy I spent some time looking into this. If we use just a single --persist option then the param type will have to change to a string with a default of do-not-persist. Like so:
persist: {
describe: 'Use an md5 hash as query id to replace operation text',
type: 'string',
default: 'do-not-persist',
},Then the logic in the run method to detect persistence will change to this:
const persist = options.persist;
if (persist === 'do-not-persist') {
console.log('Not persisting');
} else if (!persist.trim()) {
console.log(`Persisting to default output location: ${srcDir}/queryMap.graphql.json`);
} else {
console.log(`Persisting to custom output location ${persist}`);
}The --help documentation on the console will display:
Options:
--persist Use an md5 hash as query id to replace operation text
[string] [default: "do-not-persist"]
// other params...
So it is possible to just use a single --persist flag, however I feel that it's doing a bit too much and is not as clean as separating the two logical actions as separate flags. The --help documentation in particular is confusing, because a default string of do-not-persist doesn't really make sense to users unless they analyse the relay-compiler code. We can change the default string to something else, but it's not really going to help because this default string is only used in the code to detect if the --persist flag is completely absent from the command line. Hopefully that makes sense?
There was a problem hiding this comment.
I find it unfortunate that the tooling doesn’t nicely allow for the best UI, but your explanation makes sense and I agree with your conclusion 👍
There was a problem hiding this comment.
You can remove the type option and then yargs will treat it as either a boolean if used alone or a string if passed with a value. However I agree that they should still be separate options to not overload the meaning of --persist.
MatthewHerbst
left a comment
There was a problem hiding this comment.
Looking forward to using this!!!
Done @MatthewHerbst. Just waiting for @alloy's response now re using a single |
| ``` | ||
|
|
||
| This will create a matching `./__generated__/MyComponent.graphql.json` containing the query id and the operation text of the query in the same directory. | ||
| The Relay Compiler aggregates all the generated `*.graphql.json` into a single complete query map file at `./src/queryMap.graphql.json`. You can then use this complete |
There was a problem hiding this comment.
all the generated
*.graphql.jsonfiles
| The Relay Compiler aggregates all the generated `*.graphql.json` into a single complete query map file at `./src/queryMap.graphql.json`. You can then use this complete | ||
| json file in your server side to map query ids to operation text. | ||
|
|
||
| Fore more details, refer to the [Persisted Queries section](./persisted-queries.html). |
There was a problem hiding this comment.
For more details
(drop the e after For)
|
|
||
| ### Persisting queries | ||
| The Relay Compiler can convert a query or mutation's text into a unique identifier during compilation. This can greatly reduce the upload bytes required in some applications. | ||
| Using a unique identifier in place of your query or mutation also allows you to whitelist operations that are allowed on your server which improves security. |
There was a problem hiding this comment.
Should this expand a little more on how it may improve security so people can decide if that applies to them? (Possibly by linking to a good article on query depth as a possible DOS vector.) Otherwise it may just leave the reader with uncertainty.
There was a problem hiding this comment.
There's two parts to that:
- Security against internal DOS
- Security against full schema introspection (which may or may not be undesirable)
There was a problem hiding this comment.
Good point and done
|
|
||
| * `src/Queries/__generated__/DictionaryQuery.graphql.json` | ||
|
|
||
| Only one query map json is generated in this instance because only concrete queries can be persisted. Fragments are not persisted. |
There was a problem hiding this comment.
query map json file
Not sure if ‘a json’ is used as equivalent of a JSON file nowadays, just reads a little nicer to me, but I’m also a non-native reader/writer so 🤷♂️
| expect(process.exit).toBeCalled(); | ||
| expect(process.exit.mock.calls[0][0]).toEqual(1); | ||
| }); | ||
| }); No newline at end of file |
There was a problem hiding this comment.
Only just noticed that you added the first bin tests, awesome! 👏
Nice one @alloy. Re renaming queryId to Document id, sure I'll make the change. There are also conflicts which I need to resolve because the PR has been sitting for a while. It might take a while for me to fix it. |
|
@yusinto have you had some time to look into this? Any update? Eagerly waiting for this. :) |
|
Hi Im off in South America for the next two weeks. Ill get to this when I
return.
…On Tue, 5 Jun 2018 at 11:35 pm, Soney Mathew ***@***.***> wrote:
@yusinto <https://github.com/yusinto> have you had some time to look into
this? Any update? Eagerly waiting for this. :)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2354 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/ABhO9TAwl4wd5-6XWOiahT5TxsRHCOp_ks5t51wfgaJpZM4SWq42>
.
|
|
A lot of good work went into this, but I think it's unlikely to be merged as-is. It adds too many assumptions about how persisting queries should work. For example, Facebook wouldn't be able to use a queryMap file because they have far too many persisted queries. Even more importantly, they need to support every query that has ever been persisted since some people never update their apps. Persisting queries is already supported in Relay Modern in a pluggable way, but it requires a lot of manual effort and basically writing your own RelayCompilerBin. I think a better approach here would be to leverage WriterConfig/ReaderConfig and allow users to easily use their own writerConfigs and readerConfigs when invoking relay. Then you could provide a custom writerConfig that does its work from the outside instead of adding new assumptions to core files like |
Correct me if I’m wrong, but doesn’t Facebook already use its own RelayCompilerBin? That renders the former point moot imo. The point of this PR is mostly to offer a simple solution that will work for many out-of-the-box or just starting. Users that have more demands can still do so, like they do today. |
|
Agreed. Relay needs to start shipping features with default options that work for most users. It doesn't need to support every use case out of the box, and advanced users should always be able to replace modular functionality (such as persisted queries) with their own solutions. Without sensible defaults that are easy to use (as in don't require a lot of complicated setup and config) more and more users, including myself, are going to feel the draw of switching to Apollo simply due to lack of feature parity. |
|
|
||
| ```js | ||
| import Express from 'express'; | ||
| import expressGraphl from 'express-graphql'; |
There was a problem hiding this comment.
nit: typo! correct to expressGraph"Q"L :)
|
@kassens any update on this? |
|
@johntran @yusinto Btw, this request for change is still outstanding #2354 (comment). If/when you do that, could you also maybe squash all the changes into a single commit? That’s going to happen anyways when FB merges it and in the meantime it makes rebasing/merging it into fork branches a lot easier 🙏 |
|
I've created a PR with @alloy's suggestions to @yusinto's Relay repo. This merges the latest master (Relay 1.7.0.rc-1) and renamed all instances of queryId with documentId (only in files where the persisted query PR added). I am not confident in my rebasing skills, but I definitely want to learn. I would want to rebase all the changes in the persisted query branch and squash them; how would I do that? Any guides you can show me? I just finished a product launch, so I'm able to dedicate some time every week towards updating this PR. |
|
@jstejada @alunyov @kassens I think we are ready for another review Thanks @soneymathew for the detailed rebase & squash instructions :). I think @yusinto is the only one that can affect the PR branch with the rebase & squash right? |
|
Apologies for the delay @soneymathew @alloy @johntran I have added you all as collaborators to this PR because I might not be able to spend time on this quickly. |
Refactored md5 to its own util file. Fixed md5 usage and exports. Fixed tests. Ran flow and prettier. Relocate persistQuery test file to the tests folder. Renamed with '-test' suffix to be consistent. Fixed bug where persistQuery is not imported correctly. Renamed _persistQuery param to persistQuery in writeRelayGeneratedFile. Fixed bug where subsequent updates override the entire queryMap.json file. Restructured queryMap.json to include node name. Fixed queryMap to update on watch. Moved persist logic to RelayFileWriter. Fixed tests. Minor code cleanup. Changed console log to use reporter. Added cli help text. Added tests. Fixed bug where reportError does not specify module name. Updated doco Documentation improvements. Added fb headers. Fix json stringify so query maps are human readable. Change *.queryMap.json to *.graphql.json. Updated docs re advantages of using persisted queries. Minor grammatical improvements on logging. Added a new section under Guides for persisted queries. WIP Added Persisted Queries standalone section Updates to doco Added persisted queries section link Added more documentation. Added tests for RelayCompilerBin Improve docs on server usage. More doco improvements. [persisted queries] Revert file extension to `.queryMap.json` Having `foo.graphql.json` and `foo.graphql.other-ext` next to each other can lead to undefined results depending on packager and presumably the order supported file extensions have been defined. For instance, using TypeScript and the React Native packager (Metro), the `.json` file would get loaded at runtime by Relay rather than the expected `.ts` file. Fix merge conflicts and failing tests Remove package-lock.json Spelling correction Fixed lint warning. Remove relayRuntimeModule from Persist Query tests. Rename config variable to use writerConfig. Replace all instances of `queryId` with `documentId` Refactored md5 to its own util file. Fixed md5 usage and exports. Fixed tests. Ran flow and prettier. Relocate persistQuery test file to the tests folder. Renamed with '-test' suffix to be consistent. Fixed bug where persistQuery is not imported correctly. Renamed _persistQuery param to persistQuery in writeRelayGeneratedFile. Fixed bug where subsequent updates override the entire queryMap.json file. Restructured queryMap.json to include node name. Fixed queryMap to update on watch. Moved persist logic to RelayFileWriter. Fixed tests. Minor code cleanup. Changed console log to use reporter. Added cli help text. Added tests. Fixed bug where reportError does not specify module name. Updated doco Documentation improvements. Added fb headers. Fix json stringify so query maps are human readable. Change *.queryMap.json to *.graphql.json. Updated docs re advantages of using persisted queries. Minor grammatical improvements on logging. Added a new section under Guides for persisted queries. WIP Added Persisted Queries standalone section Added persisted queries section link Added tests for RelayCompilerBin [persisted queries] Revert file extension to `.queryMap.json` Having `foo.graphql.json` and `foo.graphql.other-ext` next to each other can lead to undefined results depending on packager and presumably the order supported file extensions have been defined. For instance, using TypeScript and the React Native packager (Metro), the `.json` file would get loaded at runtime by Relay rather than the expected `.ts` file. Remove relayRuntimeModule from Persist Query tests. Rename config variable to use writerConfig. Replace all instances of `queryId` with `documentId`
659b9e0 to
6aa4277
Compare
|
@yusinto done! Please review. Change summary
|
|
Hello, can you please share with us what is blocking this PR? Thank you. :) |
|
We think the changes to Relay compiler already provides a mechanism to persistQuery relay/packages/relay-compiler/codegen/writeRelayGeneratedFile.js Lines 92 to 94 in 47e74eb when passed in as a WriterConfig When time permits I have agreed to explore an approach to externalize this change as a standalone package that implements, mostly the current direction from this PR, but can be passed in via RelayCompilerBin as part of the writerConfig In a nutshell the goal is to provide a default implementation of In the near term we aim for this PR to be scoped to just emit single files per persisted query as we work on #2518 in parallel to integrate via that approach |
|
cc @alunyov Thoughts on this? Seems like it would be ideal to avoid creating query "map" files adjacent to every generated file only to later accumulate that data into a single map at the top level - and even then, only write that file if the user opts into creating it. |
facebook-github-bot
left a comment
There was a problem hiding this comment.
@josephsavona has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
@josephsavona What are your thoughts on @soneymathew's last comment? |
|
@alloy I’ve imported this but am changing the implementation a bit to emit only a single aggregated query file with all persisted queries. It should be easy to later extract the logic if folks prefer that, but I want to get something landed to at least unblock folks and give a point to build from. |
|
Sounds great, thanks @josephsavona 👍 |
Hi guys, here's my attempt at implementing persisted queries for relay modern. Please review and let me know if it's ok.
Thanks!