New modules: llamacpp-python/run and huggingface/download to allow running simple text workloads with local LLMs #11053

Open
toniher wants to merge 35 commits into nf-core:master from biocorecrg:llamacpp
Conversation

@toniher
Member

@toniher toniher commented Mar 26, 2026

This pull request, contributed jointly with @lucacozzuto , provides a simple workload for running text inference tasks using llamacpp-python against local LLMs.
This effort was worked on during the nf-core Hackathon in March 2026.

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the module conventions in the contribution docs?
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

@toniher toniher added the new module Adding a new module label Mar 26, 2026
@toniher toniher requested a review from JoseEspinosa March 26, 2026 14:08
@toniher toniher moved this to Ready for review in Hackathon March 2026 Mar 26, 2026
Contributor

@famosab famosab left a comment


Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.

We usually recommend having one module per PR. That makes the review process easier, and it's more likely that someone will review your PR. You can keep that in mind for future PRs.

@toniher
Member Author

toniher commented Apr 4, 2026

> Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.
>
> We usually recommend having one module per PR. That makes the review process easier, and it's more likely that someone will review your PR. You can keep that in mind for future PRs.

Hi @famosab. Thanks for the feedback; I will go through your comments! I was aware of the one-module-per-PR guideline, but since the output of one process is needed by the other, I thought submitting them together would help potential users once the modules are eventually accepted. It certainly means more work for everyone, though. Sorry about this; I will avoid it in future PRs.

@famosab
Contributor

famosab commented Apr 7, 2026

You can always propose modules one after the other :) Approval does not depend on that; it's basically just a matter of whether you adhere to our guidelines :)

@lucacozzuto
Contributor

Something to keep in mind: the new models, like Gemma 4, are not supported by the llama-cpp-python version currently in the container.

Contributor

@famosab famosab left a comment


A few more comments (some things I only commented on once, but they should be applied to all occurrences) :)

toniher and others added 7 commits April 8, 2026 17:09

output to ${prefix} (three commits)

Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>
@famosab
Contributor

famosab commented Apr 13, 2026

I think it looks good from my side, but I have now spent a lot of time going back and forth with you, and this module is a bit unusual :) so I would like another set of eyes on this PR! I will ask for another review :)

Member

@pinin4fjords pinin4fjords left a comment


I really think we should avoid using module binaries until we're satisfied they don't limit module portability. Currently Wave is required for Cloud scenarios without a shared file system.

Templates are the more portable, if less pretty, approach.

Member

@pinin4fjords pinin4fjords left a comment


I think there are a few things to resolve here, since we're colouring outside the lines a bit.

```nextflow
label 'process_gpu'

conda "${moduleDir}/environment.yml"
container "${task.accelerator ? 'quay.io/nf-core/llama-cpp-python:0.1.9' : 'community.wave.seqera.io/library/llama-cpp-python:0.3.16--b351398cd0ea7fc5'}"
```
Member


Are you sure you need this? A little bit of AI chat suggests that:

> llama-cpp-python compiled with CUDA support does runtime GPU detection and falls back to CPU when no GPU is present

... which, if correct, would mean you could just supply the same container in either case and avoid the complexity.

You might be able to do a multi-stage thing like this to bring the container size down:

```dockerfile
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04 AS builder

RUN apt-get update && apt-get install -y python3 python3-pip python3-dev
RUN pip3 install --prefix=/install llama-cpp-python==0.3.16 \
    --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124

FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

COPY --from=builder /install /usr/local
```

... so it's less overhead for the non-GPU case (untested).

Member Author


I just tried again, running Docker on my laptop (without a GPU) using quay.io/nf-core/llama-cpp-python:0.1.9, and I cannot make it work.

```
RuntimeError: Failed to load shared library '/usr/local/lib/python3.10/dist-packages/llama_cpp/lib/libllama.so': libcuda.so.1: cannot open shared object file: No such file or directory
```

So it looks like, once the Docker image is built with GPU support, you cannot run it without a GPU device. It's particularly annoying because there are different GPU types and CUDA versions. :(
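For what it's worth, here is a minimal, stdlib-only sketch (illustrative, not from this PR) of how one might probe for the NVIDIA driver before deciding on GPU offload. Note that it only helps when the package itself imports cleanly; in the CUDA-built image above, the failure happens at import time, so a runtime check inside that same container cannot rescue it:

```python
import ctypes.util

def cuda_driver_present():
    # libcuda.so.1 ships with the NVIDIA *driver*, not with the CUDA
    # toolkit; when it is absent, a CUDA-linked libllama.so fails to
    # load, exactly as in the error above.
    return ctypes.util.find_library("cuda") is not None

# n_gpu_layers is a real llama-cpp-python parameter: -1 offloads all
# layers to the GPU, 0 keeps everything on the CPU.
n_gpu_layers = -1 if cuda_driver_present() else 0
print(n_gpu_layers)
```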

```nextflow
label 'process_gpu'

conda "${moduleDir}/environment.yml"
container "${task.accelerator ? 'quay.io/nf-core/llama-cpp-python:0.1.9' : 'community.wave.seqera.io/library/llama-cpp-python:0.3.16--b351398cd0ea7fc5'}"
```
Member


No singularity?
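For reference, a hedged sketch of the dual-engine container selection many nf-core modules use (the Singularity URL below is a placeholder, not a published image for this tool):

```nextflow
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
    'https://depot.galaxyproject.org/singularity/<image-placeholder>' :
    'community.wave.seqera.io/library/llama-cpp-python:0.3.16--b351398cd0ea7fc5' }"
```

Seqera Containers can also produce Singularity images from the same conda spec, which is one way to fill in the first branch.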

Member Author


Related discussed here: #11053 (comment)

Contributor


Since this tool is on conda-forge, you don't need to add a Dockerfile; just use Seqera Containers: https://nf-co.re/docs/developing/containers/seqera-containers

Member Author


I am actually using Seqera Containers for non-GPU situations.

https://seqera.io/containers/?packages=conda-forge::llama-cpp-python=0.3.16

I understand I could approach it similarly to here:

Until there is a better solution, I could use quay.io/nf-core/llama-cpp-python:0.1.9 when containers, an accelerator (and amd64?) are all available at the same time, and do as in the multiqc module for all other situations.
Otherwise, I could simply remove the GPU container until there is a better idea. It must be said that the speed gain with a GPU is huge...
Any feedback is appreciated!

@toniher
Member Author

toniher commented Apr 13, 2026

> I really think we should avoid using module binaries until we're satisfied they don't limit module portability. Currently Wave is required for Cloud scenarios without a shared file system.
>
> Templates are the more portable, if less pretty, approach.

Any module example you could suggest as a model suitable for this case?

@pinin4fjords
Copy link
Copy Markdown
Member

> I really think we should avoid using module binaries until we're satisfied they don't limit module portability. Currently Wave is required for Cloud scenarios without a shared file system.
> Templates are the more portable, if less pretty, approach.
>
> Any module example you could suggest as a model suitable for this case?

Just look for the template keyword in the modules repo. But here are a couple of examples:

Note that these are not quite script files: you have to escape $ etc. with \.
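To illustrate that escaping rule, a hypothetical template file (the name and contents are made up, not taken from this PR): Nextflow interpolates Groovy placeholders like ${prefix} before the script runs, so any dollar sign meant for the shell must be written as \$.

```bash
#!/usr/bin/env bash
# templates/run_llm.sh (hypothetical name)
# ${prefix} is substituted by Nextflow at task creation time;
# \$HOSTNAME is escaped so the shell, not Groovy, resolves it.
echo "running on \$HOSTNAME" > ${prefix}.log
```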
