zimscraperlib

Collection of Python code to reuse across Python-based scrapers

Usage

This library is meant to be installed via PyPI (zimscraperlib).
Make sure to pin it to a version range, as the API is subject to frequent changes.
The API should remain stable only within the same minor version.

Example usage:

zimscraperlib>=1.1,<1.2

See documentation at Read the Docs for details.
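
To illustrate what the `>=1.1,<1.2` range means in practice, here is a minimal sketch that checks whether a version string falls inside a given minor series. The helper names (`parse_release`, `within_minor`, `installed_within_minor`) are hypothetical and not part of zimscraperlib. The sketch only looks at the numeric release components and ignores pre-release suffixes, so it is a simplification of what pip actually evaluates.

```python
from importlib import metadata


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric components, e.g. '1.1.2' -> (1, 1, 2).

    Simplification: parsing stops at the first non-numeric piece, and
    suffixes such as 'rc1' are ignored (pip treats them differently).
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break  # stop after a piece carrying a suffix
    return tuple(parts)


def within_minor(version: str, major: int, minor: int) -> bool:
    """True when version satisfies >=major.minor,<major.(minor + 1)."""
    head = parse_release(version)[:2]
    return (major, minor) <= head < (major, minor + 1)


def installed_within_minor(dist: str = "zimscraperlib", major: int = 1, minor: int = 1) -> bool:
    """Check the installed distribution, returning False when it is absent."""
    try:
        version = metadata.version(dist)
    except metadata.PackageNotFoundError:
        return False
    return within_minor(version, major, minor)
```

A scraper could call `installed_within_minor()` at startup to fail early when the environment drifted outside the minor series it was written against.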

Warning

While this library supports downloading videos with yt-dlp, recent changes at YouTube have forced the yt-dlp team to require new dependencies for YouTube videos (see yt-dlp/yt-dlp#15012). These dependencies are large and are not needed by any other backend supported by yt-dlp (only YouTube needs them). They are therefore not yet included in this library's dependencies (see #268); you have to install them yourself if you intend to download videos from YouTube.

Dependencies

Most dependencies are installed automatically by pip (from PyPI by default). The following system packages may be required depending on which features you use:

libmagic — required for file type detection (used in most scrapers)
wget — required only for zimscraperlib.download functions
FFmpeg — required only for video processing functions
gifsicle (>=1.92) — required only for GIF optimization
libcairo — required only for SVG-to-PNG conversion
libzim — auto-installed via PyPI, not available on Windows
Pillow — auto-installed via PyPI; pre-built wheels are used by default and no system image libraries are needed. Only if you need to build Pillow from source should you install additional system libraries — see Pillow's build documentation for details.

Note: To run the full test suite, all system dependencies listed above must be installed.
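
The system packages listed above can be probed with a short script before running the tests. The following is a sketch under assumptions, not something shipped with zimscraperlib: the function name `missing_system_dependencies` is hypothetical, executables are looked up on `PATH` with `shutil.which`, and shared libraries with `ctypes.util.find_library`. The latter may report false negatives on some platforms, and the sketch does not verify the gifsicle version (>=1.92).

```python
import shutil
from ctypes.util import find_library

# Executables looked up on PATH, mapped to the feature that needs them.
EXECUTABLES = {
    "wget": "zimscraperlib.download functions",
    "ffmpeg": "video processing",
    "gifsicle": "GIF optimization",
}

# Shared libraries looked up by short name (libmagic -> "magic").
LIBRARIES = {
    "magic": "file type detection",
    "cairo": "SVG-to-PNG conversion",
}


def missing_system_dependencies(executables=EXECUTABLES, libraries=LIBRARIES):
    """Return a list of human-readable entries for dependencies not found."""
    missing = []
    for name, feature in executables.items():
        if shutil.which(name) is None:
            missing.append(f"{name} ({feature})")
    for name, feature in libraries.items():
        if find_library(name) is None:
            missing.append(f"lib{name} ({feature})")
    return missing


if __name__ == "__main__":
    for entry in missing_system_dependencies():
        print(f"missing: {entry}")
```

An empty result suggests the full test suite has what it needs; any entry names the feature that would be affected.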

macOS

brew install libmagic wget ffmpeg gifsicle cairo

Linux

sudo apt install libmagic1 wget ffmpeg gifsicle libcairo2

Alpine

apk add ffmpeg gifsicle libmagic wget cairo

Contribution

This project adheres to openZIM's Contribution Guidelines.

This project has implemented openZIM's Python bootstrap, conventions and policies v1.0.2.

All instructions below must be run from the root of your local clone of this repository.

If you do not already have it on your system, install hatch:

pip install hatch

Start a hatch shell — this will install all dependencies including dev in an isolated virtual environment:

hatch shell

Set up the pre-commit Git hook (runs linters automatically before each commit):

pre-commit install

Run tests with coverage:

invoke coverage

Users

Non-exhaustive list of scrapers using it (check status when updating API):
