New modules: Llamacpp-python/run and huggingface/download for allowing to run simple text workloads with local LLMs #11053
toniher wants to merge 35 commits into nf-core:master
Conversation
famosab
left a comment
Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.
We usually recommend having one module per PR. That makes the review process easier, and it is more likely that someone will review your PR. Please keep that in mind for future PRs.
Hi @famosab. Thanks for the feedback; I will go through your comments! I was told about the one-module-per-PR recommendation, but since the output of one of the processes is needed by the other, I thought submitting them together would help potential users once the PR is eventually accepted. But it is certainly more work for everyone. Sorry about this; I will avoid it in future PRs.
Not so many assertions for stub test
Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>
You can always propose modules one after the other :) The approval does not depend on that; it is basically just a matter of whether you adhere to our guidelines :)
Something to keep in mind: newer models, such as Gemma 4, are not supported by the llama-cpp-python version currently in the container.
famosab
left a comment
A few more comments (some things I only commented once, but they should be followed for all occurrences) :)
output to ${prefix}
Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>
output to ${prefix}
Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>
output to ${prefix}
Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>
I think it looks good from my side, but I have now spent so much time going back and forth with you, and this module is a bit unusual :) so I would like another set of eyes on this PR! I will ask for another review :)
pinin4fjords
left a comment
I really think we should avoid using module binaries until we are satisfied they don't limit module portability. Currently, Wave is required for cloud scenarios without a shared file system.
Templates are the more portable, if less pretty, approach.
pinin4fjords
left a comment
I think there are a few things to resolve here, since we are colouring outside the lines a bit.
label 'process_gpu'

conda "${moduleDir}/environment.yml"
container "${task.accelerator ? 'quay.io/nf-core/llama-cpp-python:0.1.9' : 'community.wave.seqera.io/library/llama-cpp-python:0.3.16--b351398cd0ea7fc5'}"
Are you sure you need this? A little bit of AI chat suggests that:
llama-cpp-python compiled with CUDA support does runtime GPU detection and falls back to CPU when no GPU is present
... which, if correct, would mean you could just supply the same container in either case and avoid the complexity.
You might be able to do a multi-stage thing like this to bring the container size down:
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04 AS builder
RUN apt-get update && apt-get install -y python3 python3-pip python3-dev
RUN pip3 install --prefix=/install llama-cpp-python==0.3.16 \
--extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip \
&& rm -rf /var/lib/apt/lists/*
COPY --from=builder /install /usr/local
... so it's less overhead for the non-GPU case (untested).
I just tried again running Docker on my laptop (without a GPU) using quay.io/nf-core/llama-cpp-python:0.1.9, and I cannot make it work:
RuntimeError: Failed to load shared library '/usr/local/lib/python3.10/dist-packages/llama_cpp/lib/libllama.so': libcuda.so.1: cannot open shared object file: No such file or directory
So it looks like, after building the Docker image with GPU support, you cannot run it without a GPU device. It is particularly annoying because there are different GPU types and CUDA versions. :(
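The failure above is a hard link-time dependency on libcuda.so.1 rather than a graceful runtime check. As a point of comparison, a minimal sketch of what a true runtime fallback looks like (the library name comes from the traceback; the helper function itself is hypothetical, not part of llama-cpp-python):

```python
import ctypes


def cuda_driver_available() -> bool:
    """Return True if the NVIDIA driver library can be loaded at runtime."""
    try:
        ctypes.CDLL("libcuda.so.1")
        return True
    except OSError:
        # No GPU driver present on this machine; a CPU code path
        # would be taken here instead of crashing at import time.
        return False


# A wrapper could then pick GPU offload settings based on this check, e.g.:
# n_gpu_layers = -1 if cuda_driver_available() else 0
```

The traceback shows the CUDA build of libllama.so does not do this: the dependency is resolved by the dynamic linker at load time, so the import fails outright on a machine without the driver.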
label 'process_gpu'

conda "${moduleDir}/environment.yml"
container "${task.accelerator ? 'quay.io/nf-core/llama-cpp-python:0.1.9' : 'community.wave.seqera.io/library/llama-cpp-python:0.3.16--b351398cd0ea7fc5'}"
Since this tool is on conda-forge, you don't need to add a Dockerfile; just use Seqera Containers: https://nf-co.re/docs/developing/containers/seqera-containers
I am actually using Seqera Containers for non-GPU situations:
https://seqera.io/containers/?packages=conda-forge::llama-cpp-python=0.3.16
I understand I could approach it similarly to what is done here: modules/nf-core/multiqc/meta.yml (line 110, commit dd6396b).
Until there is a better solution, I could use quay.io/nf-core/llama-cpp-python:0.1.9 only when containers, an accelerator (and amd64?) are all in play, and in all other situations do as the multiqc module does.
Otherwise, I could simply remove the GPU container until there is a better idea. It must be said that the speed gain with a GPU is huge...
Any feedback is appreciated!
Is there any existing module you could suggest as a model for this case?
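To make the option being discussed concrete, a hedged sketch of the conditional container selection (the image tags are the ones quoted in this thread; the exact condition is an assumption, not the final module code):

```groovy
process LLAMACPP_RUN {
    label 'process_gpu'

    conda "${moduleDir}/environment.yml"
    // Assumption: use the CUDA-enabled image only when an accelerator is
    // requested; otherwise fall back to the CPU build from Seqera Containers,
    // similar to how the multiqc module switches between alternative images.
    container "${task.accelerator ? 'quay.io/nf-core/llama-cpp-python:0.1.9' : 'community.wave.seqera.io/library/llama-cpp-python:0.3.16--b351398cd0ea7fc5'}"

    // ...
}
```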
Just look for the
Note that these are not quite script files, and you have to escape
This pull request, contributed jointly with @lucacozzuto, provides a simple workload for running text inference tasks using llama-cpp-python against local LLMs.
This effort was worked on during the nf-core Hackathon in March 2026.
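For readers unfamiliar with the library, a minimal hedged sketch of a llama-cpp-python text inference call (the model path, prompt, and helper function are hypothetical illustrations, not the module's actual script):

```python
def build_llama_kwargs(model_path: str, n_gpu_layers: int = 0, n_ctx: int = 2048) -> dict:
    """Collect Llama constructor arguments.

    n_gpu_layers=-1 offloads all layers when the wheel was built with CUDA
    support; 0 keeps everything on the CPU.
    """
    return {"model_path": model_path, "n_gpu_layers": n_gpu_layers, "n_ctx": n_ctx}


if __name__ == "__main__":
    # Requires `pip install llama-cpp-python` and a local GGUF model file.
    from llama_cpp import Llama

    llm = Llama(**build_llama_kwargs("model.gguf"))
    result = llm("Q: What does this module do? A:", max_tokens=64)
    print(result["choices"][0]["text"])
```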
PR checklist
Closes #XXX
- topic: versions - see the version_topics label
- nf-core modules test <MODULE> --profile docker
- nf-core modules test <MODULE> --profile singularity
- nf-core modules test <MODULE> --profile conda
- nf-core subworkflows test <SUBWORKFLOW> --profile docker
- nf-core subworkflows test <SUBWORKFLOW> --profile singularity
- nf-core subworkflows test <SUBWORKFLOW> --profile conda