Feat: Add crawl4ai as an option for local scrape #51
Open
relic664 wants to merge 5 commits into danny-avila:main from
Implements Crawl4AI scraper alongside existing Firecrawl and Serper scrapers for web content extraction in the search tool.

Changes:
- Add 'crawl4ai' to ScraperProvider type
- Add Crawl4AIScraperConfig and Crawl4AIScrapeResponse interfaces
- Create crawl4ai-scraper.ts implementing BaseScraper interface
- Update tool.ts to support Crawl4AI configuration and selection
- Support extraction and chunking strategies
- Use Bearer token authentication
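As a rough illustration of how the type changes listed above might fit together, here is a minimal sketch. Only `ScraperProvider`, `Crawl4AIScraperConfig`, the `'crawl4ai'` value, and the `CRAWL4AI_API_URL` env var come from this PR's description; the field names (`apiUrl`, `apiKey`, `timeout`), the `ResolvedCrawl4AIConfig` type, the `resolveCrawl4AIConfig` helper, and the 10-second default are hypothetical, so the actual interfaces in the diff may look different.

```typescript
// Provider union: Firecrawl and Serper already exist per the description;
// 'crawl4ai' is the value this PR adds.
type ScraperProvider = 'firecrawl' | 'serper' | 'crawl4ai';

// Hypothetical config shape; the real interface in the PR may differ.
interface Crawl4AIScraperConfig {
  apiUrl?: string; // base URL of the self-hosted Crawl4AI instance
  apiKey?: string; // optional token, sent as a Bearer credential
  timeout?: number; // request timeout in milliseconds
}

// Hypothetical resolved form with defaults applied.
interface ResolvedCrawl4AIConfig {
  provider: ScraperProvider;
  apiUrl: string;
  apiKey?: string;
  timeout: number;
}

// Hypothetical helper: explicit config wins, then the CRAWL4AI_API_URL env var.
function resolveCrawl4AIConfig(
  config: Crawl4AIScraperConfig,
  env: Record<string, string | undefined>,
): ResolvedCrawl4AIConfig {
  const apiUrl = config.apiUrl ?? env['CRAWL4AI_API_URL'];
  if (!apiUrl) {
    throw new Error('Crawl4AI API URL is not configured');
  }
  return {
    provider: 'crawl4ai',
    apiUrl,
    apiKey: config.apiKey,
    timeout: config.timeout ?? 10000, // assumed default, not taken from the PR
  };
}
```

Passing the environment in as a parameter, rather than reading `process.env` directly, keeps the helper easy to test; that is a design choice of this sketch, not something the PR states.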
This PR adds crawl4ai as an option for local scraping, to supplement the existing firecrawl option. There's going to be a complementary PR in the main LibreChat repo, which I've drafted here.
This PR builds on @lukolszewski's original work and adds `fit` as the default option for crawl4ai's `/md` endpoint, which uses adaptive filtering to filter the markdown. Alternatively, it can return just the raw markdown (`raw` for `fitStrategy`).

I've tested locally and it works fine. I have a docker image for testing if anybody is interested (ghcr.io/relic664/librechat:latest).
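To make the `fit` versus `raw` choice concrete, here is a minimal sketch of how a request to the `/md` endpoint could be assembled with Bearer token authentication. The `buildMdRequest` helper, the `FitStrategy` type name, and the JSON body fields (`url`, and `f` for the filter) are assumptions made for illustration; check them against the Crawl4AI server version you run and against the actual diff.

```typescript
// The two modes this PR exposes: filtered markdown or raw markdown.
type FitStrategy = 'fit' | 'raw';

interface MdRequest {
  url: string;
  init: {
    method: 'POST';
    headers: Record<string, string>;
    body: string;
  };
}

// Hypothetical helper: builds the request without sending it.
function buildMdRequest(
  apiUrl: string,
  targetUrl: string,
  fitStrategy: FitStrategy = 'fit', // 'fit' is the default, as described above
  apiKey?: string,
): MdRequest {
  const headers: Record<string, string> = { 'Content-Type': 'application/json' };
  if (apiKey) {
    // Bearer token authentication, only when a key is configured.
    headers['Authorization'] = `Bearer ${apiKey}`;
  }
  return {
    // Strip trailing slashes so the path does not become '//md'.
    url: `${apiUrl.replace(/\/+$/, '')}/md`,
    init: {
      method: 'POST',
      headers,
      // Assumed body shape: 'f' selects the markdown filter.
      body: JSON.stringify({ url: targetUrl, f: fitStrategy }),
    },
  };
}
```

The result can be handed to `fetch(req.url, req.init)`. Keeping request construction separate from the network call makes the strategy and auth handling checkable without a running Crawl4AI instance.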
It's worth noting that this is a very basic implementation, meant to provide a simple self-hosted scrape option. It doesn't provide options for the scrape beyond `fit` (filtered markdown) or `raw` (raw markdown). Given that there's only one self-hosted option for scrape, I thought it was prudent to go ahead and make an MVP PR for crawl4ai before a full-featured implementation with all the configuration knobs.

A simple quick start is to set the env var `CRAWL4AI_API_URL` to your instance, and in your `librechat.config`: