Skip to content

[WIP] DB backend hooks#37

Closed
vqhuy wants to merge 18 commits into
masterfrom
db
Closed

[WIP] DB backend hooks#37
vqhuy wants to merge 18 commits into
masterfrom
db

Conversation

@vqhuy
Copy link
Copy Markdown
Member

@vqhuy vqhuy commented Jul 27, 2016

  • Resolve Add hooks for storage backend #4
  • This pull uses a key-value db interface to store PAD, STR, Policies, MerkleTree. The storing data is formatted as the comments of StoreToKV methods.
  • All source files related to storage backend have postfix kv
  • The library provides APIs to reconstruct the PAD from the latest STR from the db (e.g., when the key server restarts), and to reconstruct a part of tree when looking through the db for specific epoch in lookup function.

@vqhuy
Copy link
Copy Markdown
Member Author

vqhuy commented Jul 27, 2016

There're some issues with this db hooks implementation:

  • We allow the developers to supply their own policy which is implemented Policies interface (Allow the developer to supply her own policy #6). Thus, the developers have to use exactly the policies type that they have been used before to reconstruct the PAD or STR, for example. If the developers misused in this case, the reconstructed PAD or STR would break the hash chain.
  • When looking through the db for specific STR, if we want to construct the tree partially based on the lookup index, then we have to somehow reconstruct the policies first (to compute the lookup index). Thus, the developers also need to provide exactly the policies type which he is used in that epoch. It may lead to another issue of policies management?

I would like to have your advices @masomel @arlolra @liamsi !
Thanks!

@masomel
Copy link
Copy Markdown
Member

masomel commented Jul 27, 2016

Looking through the ac93761 code, I don't quite understand why the policies need to be passed into str.LoadFromKV() for the reconstruction. Maybe I'm missing something, but why can't policies.LoadFromKV() not just find the appropriate policy entry in the DB with policies := new(Policies) as is done with str? It seems that the policy.serializeKvKey() only needs the policy identifier and the epoch to find the policy entry in the DB. I think we can also add more mandatory fields to the policy interface, e.g. the LibVersion and the hashID to leave less room for developers to do weird things when they're customizing the policy. My idea for a developer-supplied policy was really more about allowing them to set the policy fields we give them to their own values, and less about letting them implement new policy types.

But coming back to your second point, could we give the developer the option to store the policy type itself in the DB?

Looking at the code, I'm also wondering if all of the Load() and Store() methods need to be interface methods, or if it would make more sense to have methods such as LoadStrFromKV(), LoadPolicyFromKV() etc and make these part of the kv package instead. Looking through the code, it seemed odd to me to have a str.LoadFromKV() call when the str isn't really anything yet.

@vqhuy
Copy link
Copy Markdown
Member Author

vqhuy commented Jul 28, 2016

My idea for a developer-supplied policy was really more about allowing them to set the policy fields we give them to their own values, and less about letting them implement new policy types.

It's much more clear. Thanks!

Maybe I'm missing something, but why can't policies.LoadFromKV() not just find the appropriate policy entry in the DB with policies := new(Policies) as is done with str?

Because the Policies is just an interface.

I think we can also add more mandatory fields to the policy interface, e.g. the LibVersion and the hashID to leave less room for developers to do weird things when they're customizing the policy.

Yes. I already did it in af4ea71051d25191b555d96820b3fe9d964fd219, but it doesn't belong to this branch. Hmm, I guess we should review and merge #35 first.

Looking at the code, I'm also wondering if all of the Load() and Store() methods need to be interface methods, or if it would make more sense to have methods such as LoadStrFromKV(), LoadPolicyFromKV() etc and make these part of the kv package instead.

Yes. I'm just waiting for you to say that. I would like to do it, too.

@masomel
Copy link
Copy Markdown
Member

masomel commented Jul 28, 2016

Maybe I'm missing something, but why can't policies.LoadFromKV() not just find the appropriate policy entry in the DB with policies := new(Policies) as is done with str?

Because the Policies is just an interface.

Ok, so how the Policies are set from the DB is determined by the specific Load() function of the interface. Got it.

Yes. I already did it in af4ea71, but it doesn't belong to this branch. Hmm, I guess we should review and merge #35 first.

Ok, I'll go take a look. Agreed, let's get that done first.

Yes. I'm just waiting for you to say that. I would like to do it, too.

Cool! Feel free to just suggest it next time if you would like to make changes like these :)

# Conflicts:
#	merkletree/policy.go
@masomel masomel force-pushed the master branch 3 times, most recently from 25a6925 to 2f704cf Compare July 29, 2016 20:03
@masomel
Copy link
Copy Markdown
Member

masomel commented Aug 4, 2016

Looking at the code, I'm also wondering if all of the Load() and Store() methods need to be interface methods, or if it would make more sense to have methods such as LoadStrFromKV(), LoadPolicyFromKV() etc and make these part of the kv package instead.

I've been thinking more about this, and maybe we should move all *kv.go modules to the kv package.

@vqhuy
Copy link
Copy Markdown
Member Author

vqhuy commented Aug 4, 2016

I've been thinking more about this, and maybe we should move all *kv.go modules to the kv package.

Yup, totally agree. I'm doing it :)

@vqhuy vqhuy changed the title DB backend hooks [WIP] DB backend hooks Aug 5, 2016
@vqhuy
Copy link
Copy Markdown
Member Author

vqhuy commented Aug 8, 2016

I've been thinking more about this, and maybe we should move all *kv.go modules to the kv package.

I spent last few days to work on this. But it seems not to be feasible. If we want to go this way, we have to export fields that we really don't want to do. (e.g., see https://github.com/coniks-sys/coniks-go/blob/cc8b74db472f53a621ac89c05d41ba4121b891a1/merkletree/nodekv.go). What do you think?

@masomel
Copy link
Copy Markdown
Member

masomel commented Aug 8, 2016

I spent last few days to work on this. But it seems not to be feasible. If we want to go this way, we have to export fields that we really don't want to do.

Hmm, this is a good point. At least for node it seems like the main issue comes from the serialization in storeToKV() that needs access to those fields. Would it be possible to export Serialize() and Deserialize() functions that could be used in the KV module instead?

@vqhuy
Copy link
Copy Markdown
Member Author

vqhuy commented Aug 9, 2016

Would it be possible to export Serialize() and Deserialize() functions that could be used in the KV module instead?

But storeToKV uses a specific scheme of Serialize() and Deserialize(), thus we cannot have a general Serialize() and Deserialize() at all.

@masomel masomel added this to the 0.1.0 milestone Sep 1, 2016
@masomel
Copy link
Copy Markdown
Member

masomel commented Sep 19, 2016

@c633 Is there anything I can help with here to get this moving again?

@vqhuy
Copy link
Copy Markdown
Member Author

vqhuy commented Sep 20, 2016

@masomel I'm working on this. I'm trying to implement it in several ways and do benchmark to compare them. Code will be pushed today.

@masomel
Copy link
Copy Markdown
Member

masomel commented Sep 20, 2016

I saw your email, I didn't realize you were benchmarking the various implementations. I don't want to rush you. But I have some time this week to help with coding, so if there was any task that you haven't gotten to yet and would want help on, I'd be available work on it.

@vqhuy
Copy link
Copy Markdown
Member Author

vqhuy commented Sep 20, 2016

But I have some time this week to help with coding, so if there was any task that you haven't gotten to yet and would want help on, I'd be available work on it.

Thanks! Maybe we can start reviewing this pull.

I tried to implement the encoding/decoding using encoding/gob (see branch kvdb). Here are the output of the Benchmarks of 2 implementations:

# db branch
⇒  go test -bench=BenchmarkStorePAD100K -benchmem -cpuprofile cpu.out -memprofile mem.out
BenchmarkStorePAD100K-4            1    2528018579 ns/op    1535363120 B/op  2461414 allocs/op
PASS
ok      github.com/coniks-sys/coniks-go/merkletree  93.763s

⇒  go test -bench=BenchmarkLoadPAD100K -benchmem -cpuprofile cpu.out -memprofile mem.out
BenchmarkLoadPAD100K-4             1    2135054242 ns/op    376952096 B/op   5977039 allocs/op
PASS
ok      github.com/coniks-sys/coniks-go/merkletree  96.130s
# kvdb branch (using gob)
⇒  go test -bench=BenchmarkStorePAD100K -benchmem -cpuprofile cpu.out -memprofile mem.out
BenchmarkStorePAD100K-4            1    6552359956 ns/op    3667490784 B/op 11409518 allocs/op
PASS
ok      github.com/coniks-sys/coniks-go/merkletree  98.109s

⇒  go test -bench=BenchmarkLoadPAD100K -benchmem -cpuprofile cpu.out -memprofile mem.out
BenchmarkLoadPAD100K-4             1    15619991406 ns/op   2927011456 B/op 69655546 allocs/op
PASS
ok      github.com/coniks-sys/coniks-go/merkletree  114.171s

So I think I will stick with this implementation and start re-organizing the source code (I still have no idea how to put *kv.go into a separate package).

Comment thread README.md
- ``client``: Client-side protocol cgo hooks
- ``crypto``: Cryptographic algorithms and operations
- ``merkletree``: Merkle prefix tree and related data structures
- ``storage``: DB hooks for storage backend (currently the library supports key-value db only)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this is duplicate (see L27)

return m, nil
}

func (m *MerkleTree) reconstructTree(db kv.DB, parent MerkleNode, epoch uint64, prefixBits []bool) (MerkleNode, error) {
Copy link
Copy Markdown
Member

@masomel masomel Sep 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why reconstructTree needs to be a method of MerkleTree? None of m's fields are used in this function.

} else {
parent.(*interiorNode).leftChild = n
}
switch n.(type) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this switch needs a default case to throw an error in case we encounter some malformed data.


wb := db.NewBatch()
m1.StoreToKV(1, wb)
err = db.Write(wb)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why db.Write(wb) isn't called from within m.StoreToKV()?


wb := db.NewBatch()
m1.StoreToKV(1, wb)
err = db.Write(wb)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as L34.

}

ap := m2.Get(index1)
if ap.Leaf.Value() == nil {
Copy link
Copy Markdown
Member

@masomel masomel Sep 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only condition that indicates that key1 cannot be found in the tree? I'm wondering if we should also be checking the leaf node type and the leaf's index here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same goes for L58, L105, and L120.

Comment thread merkletree/node.go
}

func (n *userLeafNode) isEmpty() bool {
func (n *interiorNode) isEmpty() bool {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason the node types were flipped between here and L156?

Comment thread merkletree/nodekv.go

func deserializeNode(buf []byte) MerkleNode {
switch buf[0] {
case InteriorNodeIdentifier:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a modularity comment: Might be worth moving each of these cases into a helper function to clean this function up.

Comment thread merkletree/nodekv_test.go
wb := db.NewBatch()
enWant.storeToKV(1, []bool{false}, wb)
inWant.storeToKV(1, []bool{true}, wb)
lnWant.storeToKV(1, util.ToBits(lnWant.index)[:lnWant.level], wb)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is util.ToBits(lnWant.index)[:lnWant.level] how the developer is going to have to call ln.storeToKV()? If so, I think the interface should abstract this detail away.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same goes for loadNode() below in L94.

Copy link
Copy Markdown
Member

@masomel masomel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good, though some of my comments do require changes. Should we also add db hooks for the TB?

Comment thread merkletree/pad.go
// return nil
// }
// return str
return nil // util we have a better way to construct the tree partially based on the lookup index.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still relevant? It seems like str.LoadFromKV() doesn't do anything tree-related.

@masomel
Copy link
Copy Markdown
Member

masomel commented Sep 20, 2016

I think I will stick with this implementation

The differences between the two benchmarks aren't huge, but I agree that the current implementation works fine.

I still have no idea how to put *kv.go into a separate package

I'm thinking it might make sense to structure the db hooks in a similar fashion to how we've structured our library: server calls protocol calls merkletree. So one possible solution would be to move all of the *kv.go files to a storage/kv/merkletree which imports merkletree, and create a higher-level storage/kv/protocol package that abstracts away the merkletree hooks. I think the only place where the developer should be aware of the DB is at the server level since it's not really part of the protocol or the data structure.

I also think that the hooks should only be for deserializing/loading and serializing/storing. If we don't make the serialization functions etc methods of the corresponding types, we also shouldn't have to add new interface methods (e.g. in policy), right? If this is possible, this should allow us to de-tangle the DB hooks from the actual data structure. Addditionally, if we remove the hooks in functions like GetSTR(), this will make the design a bit simpler, and a higher-level wrapper (say, at the protocol level) can decide to load from the db if GetSTR returns nil (as opposed to GetSTR needing to figure out that we've reached the limit of loaded STRs and load from the DB). Does this make sense?

@vqhuy
Copy link
Copy Markdown
Member Author

vqhuy commented Sep 21, 2016

The differences between the two benchmarks aren't huge, but I agree that the current implementation works fine.

I'm trying to optimize the gob-encoding implementation. Hopefully it can improve the benchmarks.

I'm thinking it might make sense to structure the db hooks in a similar fashion to how we've structured our library: server calls protocol calls merkletree. So one possible solution would be to move all of the *kv.go files to a storage/kv/merkletree which imports merkletree, and create a higher-level storage/kv/protocol package that abstracts away the merkletree hooks.

Maybe I got your idea. Thanks for your suggestions! It totally makes sense!

I think the only place where the developer should be aware of the DB is at the server level since it's not really part of the protocol or the data structure.

Yeah, it's right! It makes things clearer. Let's see if I can make this work.

@vqhuy
Copy link
Copy Markdown
Member Author

vqhuy commented Sep 22, 2016

Close and move to #88.

@vqhuy vqhuy closed this Sep 22, 2016
@masomel masomel deleted the db branch October 9, 2016 23:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add hooks for storage backend

2 participants