Skip to content

Added support for DeepseekV2 model #382

Open
aditya-29 wants to merge 2 commits into
arcee-ai:mainfrom
aditya-29:main
Open

Added support for DeepseekV2 model #382
aditya-29 wants to merge 2 commits into
arcee-ai:mainfrom
aditya-29:main

Conversation

@aditya-29

Copy link
Copy Markdown

This pull request introduces the capability to merge DeepSeekV2 Mixture-of-Experts (MoE) models using MergeKit. To facilitate this, a deepseekv2.json configuration file has been added to the architecture directory. Additionally, a custom class analogous to Mixtral has been implemented to enable model merging based on the JSON configuration.

@metric-space metric-space self-requested a review August 16, 2024 23:23

@metric-space metric-space left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the wait

I think this is a good first stab but will need a bit more work to push it over the finish line.

Please see the comments and once you test it on your end it would be beneficial for both parties if you could mention how you tested it. It could be as simple as a yaml file for linear merging where you use the DeepseekV2

You've probably noticed the failing formatting check, so just be explicit please do address those

"layer_templates": {
"weights": [
{
"name" : "model.layers.${layer_index}.self_attn.q_proj.weight"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm mistaken I don't think this weight exists in the model. Please see here: https://huggingface.co/deepseek-ai/DeepSeek-V2/raw/main/model.safetensors.index.json

    "model.layers.0.self_attn.q_a_proj.weight": "model-00001-of-000055.safetensors",
    "model.layers.0.self_attn.q_a_layernorm.weight": "model-00001-of-000055.safetensors",
    "model.layers.0.self_attn.q_b_proj.weight": "model-00001-of-000055.safetensors",

Comment thread mergekit/architecture.py
def layer_weights(
self, index: int, config: PretrainedConfig
) -> Optional[List[WeightInfo]]:
num_experts = self.num_local_experts

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Any reason for local aliasing? I see it is also present in the Mixtral code, but any reason you left it in?

Comment thread mergekit/architecture.py
Comment thread mergekit/architecture.py
DEEPSEEKV2_INFO = _load_json_arch("deepseekv2.json")


def get_architecture_info(config: PretrainedConfig) -> ArchitectureInfo:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am mistaken which I could be, there needs to be a clause within the get_architecture_info function similar to Mixtral's otherwise the code https://github.com/arcee-ai/mergekit/blob/main/mergekit/merge.py#L51 will just pull in just the info associated with the template

@shamanez

Copy link
Copy Markdown
Contributor

@aditya-29 can you please respond :) sorry for the late reply.

@aditya-29

Copy link
Copy Markdown
Author

Thanks @metric-space and @shamanez. I didn't get a chance to go over the comments earlier. I will work on the suggested changes and reach out to you for any clarification

@shamanez

shamanez commented Aug 28, 2024 via email

Copy link
Copy Markdown
Contributor

@ehartford

Copy link
Copy Markdown

@cg123 is there plan to support DeepseekV2 and DeepseekV3?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants