From 9cfdf98795f52ff1b08614758ff62519f1e83d12 Mon Sep 17 00:00:00 2001 From: imilev Date: Sun, 29 Jun 2025 01:42:35 +0300 Subject: [PATCH 1/4] Added high-level diagrams --- .codeboarding/Core_Infrastructure.md | 169 ++++++++++++ .codeboarding/Experiment_Data_Core.md | 223 ++++++++++++++++ .codeboarding/Expression_Matrix_Generation.md | 111 ++++++++ .codeboarding/Image_Processing_Management.md | 181 +++++++++++++ .codeboarding/Mask_Label_Management.md | 245 ++++++++++++++++++ .codeboarding/Spot_Intensity_Analysis.md | 215 +++++++++++++++ .codeboarding/on_boarding.md | 237 +++++++++++++++++ 7 files changed, 1381 insertions(+) create mode 100644 .codeboarding/Core_Infrastructure.md create mode 100644 .codeboarding/Experiment_Data_Core.md create mode 100644 .codeboarding/Expression_Matrix_Generation.md create mode 100644 .codeboarding/Image_Processing_Management.md create mode 100644 .codeboarding/Mask_Label_Management.md create mode 100644 .codeboarding/Spot_Intensity_Analysis.md create mode 100644 .codeboarding/on_boarding.md diff --git a/.codeboarding/Core_Infrastructure.md b/.codeboarding/Core_Infrastructure.md new file mode 100644 index 00000000..6121745c --- /dev/null +++ b/.codeboarding/Core_Infrastructure.md @@ -0,0 +1,169 @@ +```mermaid + +graph LR + + Versioning_Component["Versioning Component"] + + Data_Types_and_Structures_Component["Data Types and Structures Component"] + + Configuration_Management_Component["Configuration Management Component"] + + Logging_and_System_Info_Component["Logging and System Info Component"] + + Execution_Orchestration_Component["Execution Orchestration Component"] + + Image_Level_Adjustment_Component["Image Level Adjustment Component"] + + Versioning_Component -- "provides version to" --> Logging_and_System_Info_Component + + Data_Types_and_Structures_Component -- "defines" --> Configuration_Management_Component + + Data_Types_and_Structures_Component -- "consumed by" --> Execution_Orchestration_Component + + 
Configuration_Management_Component -- "configures" --> Execution_Orchestration_Component + + Configuration_Management_Component -- "provides settings to" --> Logging_and_System_Info_Component + + Logging_and_System_Info_Component -- "records operations of" --> Execution_Orchestration_Component + + Logging_and_System_Info_Component -- "reports on" --> Data_Types_and_Structures_Component + + Execution_Orchestration_Component -- "uses" --> Configuration_Management_Component + + Execution_Orchestration_Component -- "logs activities via" --> Logging_and_System_Info_Component + + Image_Level_Adjustment_Component -- "processes" --> Data_Types_and_Structures_Component + + Image_Level_Adjustment_Component -- "configured by" --> Configuration_Management_Component + +``` + + + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +This section provides an overview of the `Core Infrastructure` components within the `starfish` project. These components are fundamental because they address the essential, cross-cutting concerns of the library, providing foundational services, defining core data representations, and offering basic utilities that higher-level modules depend on. They are not specific image processing algorithms but rather the underlying framework that enables the entire system to function robustly and consistently. + + + +### Versioning Component + +This component is responsible for programmatically determining and managing the version of the `starfish` software. 
It interacts with version control systems (like Git) to extract version information and format it according to standard conventions (e.g., PEP 440). This ensures that the software version can be consistently identified and reported. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core._version` (0:0) + + + + + +### Data Types and Structures Component + +This component defines the fundamental data models, structures (e.g., `DecodedSpots`, `SpotAttributes`, `ValidatedTable`), and enumerations (e.g., `Axes`, `Levels`, `Coordinates`) used throughout the `starfish` library. It ensures consistent data representation and interoperability across different modules and algorithms. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.types` (0:0) + + + + + +### Configuration Management Component + +This component handles the loading, parsing, and centralized management of application-wide configurations. It provides a flexible and centralized way to define and access parameters that control the behavior of various `starfish` algorithms and processes, supporting nested configuration structures. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.config` (0:0) + +- `starfish.core.util.config` (0:0) + + + + + +### Logging and System Info Component + +This component provides comprehensive logging capabilities for the `starfish` application, enabling detailed tracking of execution flow, warnings, and errors. It also gathers system and dependency information, which is vital for diagnostics, error reporting, and understanding the execution environment. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.util.logging` (0:0) + + + + + +### Execution Orchestration Component + +This component is responsible for orchestrating the execution flow of different processing stages within the `starfish` pipeline. 
It manages the sequence of operations, ensuring proper order and dependencies, and can integrate utilities like timing for performance monitoring and optimization. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.util.exec` (0:0) + + + + + +### Image Level Adjustment Component + +This component offers a set of utility functions specifically designed for adjusting the intensity levels of images. This is a common and fundamental preprocessing step in image analysis pipelines, used to enhance contrast, normalize data, or prepare images for subsequent processing. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.util.levels` (45:119) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Experiment_Data_Core.md b/.codeboarding/Experiment_Data_Core.md new file mode 100644 index 00000000..628ead1a --- /dev/null +++ b/.codeboarding/Experiment_Data_Core.md @@ -0,0 +1,223 @@ +```mermaid + +graph LR + + Experiment["Experiment"] + + FieldOfView["FieldOfView"] + + ImageStack["ImageStack"] + + Codebook["Codebook"] + + StarfishConfig["StarfishConfig"] + + SpaceTx_Validator["SpaceTx Validator"] + + CropParameters["CropParameters"] + + TileCollectionData["TileCollectionData"] + + TileData["TileData"] + + Experiment -- "contains" --> FieldOfView + + Experiment -- "references" --> Codebook + + Experiment -- "uses" --> StarfishConfig + + Experiment -- "uses" --> SpaceTx_Validator + + FieldOfView -- "manages" --> ImageStack + + FieldOfView -- "uses" --> CropParameters + + ImageStack -- "uses" --> CropParameters + + ImageStack -- "composed of" --> TileCollectionData + + TileCollectionData -- "composed of" --> TileData + +``` + + + 
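The containment relationships in the diagram above can be sketched with plain data classes. This is a purely illustrative model: the class shapes, field names, and the `(rounds, channels, z, y, x)` dimension order are assumptions for the sketch, not starfish's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Illustrative stand-ins mirroring the diagram's containment edges:
# Experiment contains FieldOfViews and references a Codebook;
# each FieldOfView manages named ImageStacks.
@dataclass
class Codebook:
    targets: List[str]

@dataclass
class ImageStack:
    shape: Tuple[int, ...]  # assumed order: (rounds, channels, z, y, x)

@dataclass
class FieldOfView:
    name: str
    images: Dict[str, ImageStack] = field(default_factory=dict)

@dataclass
class Experiment:
    codebook: Codebook
    fovs: Dict[str, FieldOfView] = field(default_factory=dict)

exp = Experiment(
    codebook=Codebook(targets=["ACTB", "GAPDH"]),
    fovs={"fov_000": FieldOfView("fov_000", {"primary": ImageStack((4, 4, 1, 64, 64))})},
)
print(exp.fovs["fov_000"].images["primary"].shape)  # (4, 4, 1, 64, 64)
```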
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +The `Experiment Data Core` subsystem in Starfish is designed to encapsulate and manage all data associated with a spatial transcriptomics experiment. It provides a structured, hierarchical representation of experimental data, from the top-level experiment down to individual image tiles, along with essential metadata and validation mechanisms. The chosen components are fundamental because they directly represent the data, define its structure, enable its interpretation, and ensure its integrity. + + + +### Experiment + +The top-level container representing an entire spatial transcriptomics experiment. It orchestrates access to all associated data, including multiple fields of view (FOVs) and the experiment's codebook. It provides methods for loading experiment data from standardized formats (e.g., JSON) and offers iterable access to its constituent FOVs. + + + + + +**Related Classes/Methods**: + + + +- `Experiment` (212:453) + + + + + +### FieldOfView + +Represents a single field of view within an experiment, corresponding to a specific spatial region imaged. It acts as a direct interface to the raw imaging data for that region, providing methods to retrieve individual images or entire image stacks. + + + + + +**Related Classes/Methods**: + + + +- `FieldOfView` (1:1) + + + + + +### ImageStack + +A multi-dimensional array-like structure that holds the actual image data (raw or processed fluorescent images). 
It provides functionalities for accessing, manipulating, and iterating over image data across various dimensions (e.g., channels, imaging rounds, Z-planes). It's a core component for any image-based processing. + + + + + +**Related Classes/Methods**: + + + +- `ImageStack` (1:1) + + + + + +### Codebook + +Stores the mapping between fluorescent probes (or imaging channels) and the specific gene targets they represent. This information is critical for decoding the spatial transcriptomics data, translating raw intensity measurements into gene expression profiles. It can be loaded from a JSON format. + + + + + +**Related Classes/Methods**: + + + +- `Codebook` (28:804) + + + + + +### StarfishConfig + +A centralized configuration management component for the Starfish application. It provides a structured way to store and retrieve various settings that influence the behavior of different parts of the software, including data loading, processing, and analysis. + + + + + +**Related Classes/Methods**: + + + +- `StarfishConfig` (1:1) + + + + + +### SpaceTx Validator + +A utility component responsible for validating the structure and content of experiment data against the SpaceTx format specification. This ensures that the input data adheres to predefined standards, promoting interoperability and data quality. + + + + + +**Related Classes/Methods**: + + + +- `SpaceTx Validator` (1:1) + + + + + +### CropParameters + +A data structure that defines the parameters for cropping image data. It specifies the region of interest to be extracted from a larger image stack, enabling focused analysis on specific parts of the field of view. + + + + + +**Related Classes/Methods**: + + + +- `CropParameters` (1:1) + + + + + +### TileCollectionData + +An internal class that manages the underlying data storage and access for collections of individual image tiles. It serves as a foundational layer for `ImageStack`, handling the organization and retrieval of image data chunks. 
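As a hedged illustration of the tile-to-stack relationship described here, the sketch below assembles per-`(round, channel, z)` tiles into a single 5D array. The keys, shapes, and assembly are invented for the example and do not reproduce starfish internals.

```python
import numpy as np

# Hypothetical per-tile storage: one 2D image chunk per (round, channel, z) key.
tiles = {
    (r, c, z): np.full((8, 8), fill_value=r * 100 + c * 10 + z, dtype=np.float32)
    for r in range(2) for c in range(3) for z in range(1)
}

# Assemble the tile collection into a 5D (round, channel, z, y, x) stack.
stack = np.zeros((2, 3, 1, 8, 8), dtype=np.float32)
for (r, c, z), tile in tiles.items():
    stack[r, c, z] = tile

print(stack.shape, stack[1, 2, 0, 0, 0])  # (2, 3, 1, 8, 8) 120.0
```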
+ + + + + +**Related Classes/Methods**: + + + +- `TileCollectionData` (1:1) + + + + + +### TileData + +An internal class that manages the underlying data storage and access for individual image tiles. It represents a single chunk of image data and is a building block for `TileCollectionData`. + + + + + +**Related Classes/Methods**: + + + +- `TileData` (1:1) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Expression_Matrix_Generation.md b/.codeboarding/Expression_Matrix_Generation.md new file mode 100644 index 00000000..4a02e22f --- /dev/null +++ b/.codeboarding/Expression_Matrix_Generation.md @@ -0,0 +1,111 @@ +```mermaid + +graph LR + + ExpressionMatrix["ExpressionMatrix"] + + DecodedIntensityTable["DecodedIntensityTable"] + + ExpressionMatrixConcatenation["ExpressionMatrixConcatenation"] + + TryImportUtility["TryImportUtility"] + + DecodedIntensityTable -- "provides data to" --> ExpressionMatrix + + ExpressionMatrix -- "uses" --> TryImportUtility + + ExpressionMatrixConcatenation -- "operates on" --> ExpressionMatrix + +``` + + + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +This subsystem is responsible for the creation, management, and output of expression matrices, which quantify gene expression levels. These matrices serve as the final, standardized data output for subsequent biological analysis within the `starfish` framework. + + + +### ExpressionMatrix + +This is the central data structure for storing and manipulating quantitative gene expression data. 
It encapsulates the gene expression levels, along with associated metadata, and provides core functionalities for loading data from various sources and saving it into standardized formats (e.g., Loom, AnnData). It represents the final, processed output of the gene expression quantification pipeline. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.expression_matrix.expression_matrix:ExpressionMatrix` (6:93) + + + + + +### DecodedIntensityTable + +This component represents the gene expression intensities after the decoding process. It serves as a crucial intermediate data structure, holding the quantitative measurements of gene expression for identified spots or cells, which are then used to construct the final `ExpressionMatrix`. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.intensity_table.decoded_intensity_table:DecodedIntensityTable` (15:190) + + + + + +### ExpressionMatrixConcatenation + +This component provides the functionality to combine multiple `ExpressionMatrix` objects into a single, larger, unified expression matrix. This is essential for integrating gene expression data from different fields of view, experimental replicates, or samples, enabling a comprehensive analysis across a larger dataset. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.expression_matrix.concatenate:ExpressionMatrixConcatenation` (1:1) + + + + + +### TryImportUtility + +This is a utility module designed to safely attempt the import of Python modules. In the context of `ExpressionMatrix`, its primary role is to manage optional dependencies required for specific functionalities, such as saving the expression matrix to external file formats like Loom or AnnData. It ensures that these features can be used if the necessary libraries are installed, without causing errors if they are not. 
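The optional-dependency pattern described here can be sketched as follows. `optional_import` is a hypothetical helper written for this example; it does not reproduce the actual code in `starfish.core.util.try_import`.

```python
import importlib

def optional_import(module_name):
    """Return the module if it can be imported, else None (illustrative helper)."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

# A Loom/AnnData writer could check the result before attempting a save:
loompy = optional_import("loompy")   # None unless loompy happens to be installed
json = optional_import("json")       # stdlib, always present
print(json is not None)
```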
+ + + + + +**Related Classes/Methods**: + + + +- `starfish.core.util.try_import:TryImportUtility` (1:1) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Image_Processing_Management.md b/.codeboarding/Image_Processing_Management.md new file mode 100644 index 00000000..939d9011 --- /dev/null +++ b/.codeboarding/Image_Processing_Management.md @@ -0,0 +1,181 @@ +```mermaid + +graph LR + + ImageStack_Data_Structure["ImageStack Data Structure"] + + Image_Data_Parsers["Image Data Parsers"] + + Image_Cropping_Parameters["Image Cropping Parameters"] + + Image_Filtering_Algorithms["Image Filtering Algorithms"] + + Image_Segmentation_Algorithms["Image Segmentation Algorithms"] + + Image_Registration_Algorithms["Image Registration Algorithms"] + + Image_Data_Parsers -- "provides data to" --> ImageStack_Data_Structure + + Image_Data_Parsers -- "uses" --> Image_Cropping_Parameters + + Image_Filtering_Algorithms -- "processes" --> ImageStack_Data_Structure + + Image_Segmentation_Algorithms -- "processes" --> ImageStack_Data_Structure + + Image_Registration_Algorithms -- "processes" --> ImageStack_Data_Structure + + ImageStack_Data_Structure -- "is processed by" --> Image_Filtering_Algorithms + + ImageStack_Data_Structure -- "is processed by" --> Image_Segmentation_Algorithms + + ImageStack_Data_Structure -- "is processed by" --> Image_Registration_Algorithms + +``` + + + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +This subsystem is responsible for the core handling, 
manipulation, and processing of multi-dimensional image data within the `starfish` project. It provides the foundational data structures and algorithms necessary for various image analysis tasks, from loading raw data to applying advanced transformations. + + + +### ImageStack Data Structure + +The fundamental data structure representing a multi-dimensional image. It provides methods for accessing, slicing, and basic manipulation of image data, serving as the primary input and output for image processing operations. It is designed to hold raw or processed fluorescent images for an experiment or a FieldOfView. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.imagestack.imagestack.ImageStack` (67:1273) + + + + + +### Image Data Parsers + +A collection of modules and classes responsible for reading and converting raw image data from various external sources (e.g., numpy arrays, tile fetchers, tilesets) into the internal `TileData` and `TileCollectionData` structures. These intermediate structures are then used to construct `ImageStack` objects. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.imagestack.parser._tiledata` (1:1) + +- `starfish.core.imagestack.parser.crop` (1:1) + +- `starfish.core.imagestack.parser.numpy` (1:1) + +- `starfish.core.imagestack.parser.tilefetcher._parser` (1:1) + +- `starfish.core.imagestack.parser.tileset._parser` (1:1) + + + + + +### Image Cropping Parameters + +Defines the parameters and logic for cropping or slicing image data. This component is utilized by the `Image Parsers` to extract specific regions of interest from larger image stacks, optimizing memory usage and processing time by only loading and processing necessary data. 
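Cropping a multi-dimensional stack down to a region of interest can be illustrated with plain slicing. The slice names and the `(round, channel, z, y, x)` layout below are assumptions for the sketch, not the actual fields of starfish's `CropParameters`.

```python
import numpy as np

# A small (round, channel, z, y, x) stack standing in for a larger image stack.
stack = np.arange(2 * 2 * 1 * 10 * 10, dtype=np.float32).reshape(2, 2, 1, 10, 10)

# Hypothetical crop parameters: keep only a y/x window, all rounds/channels/z.
y_slice, x_slice = slice(2, 6), slice(3, 9)
cropped = stack[:, :, :, y_slice, x_slice]

print(cropped.shape)  # (2, 2, 1, 4, 6)
```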
+ + + + + +**Related Classes/Methods**: + + + +- `starfish.core.imagestack.parser.crop.CropParameters` (10:240) + + + + + +### Image Filtering Algorithms + +A collection of algorithms that apply various filtering techniques (e.g., bandpass, Gaussian, Laplace, deconvolution) to `ImageStack` objects to enhance or modify image data. These algorithms typically take an `ImageStack` as input and produce a new, filtered `ImageStack`. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.image.Filter._base.FilterAlgorithm` (7:12) + +- `starfish.core.image.Filter.bandpass` (1:1) + +- `starfish.core.image.Filter.gaussian_low_pass` (1:1) + +- `starfish.core.image.Filter.richardson_lucy_deconvolution` (1:1) + + + + + +### Image Segmentation Algorithms + +Provides algorithms for identifying and delineating distinct objects or regions within an image. A prominent example is the Watershed algorithm, which is used to separate touching objects. These algorithms typically take an `ImageStack` as input and produce a segmented output, often in the form of a mask or labeled image. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.image.Segment._base.SegmentAlgorithm` (7:17) + +- `starfish.core.image.Segment.watershed` (1:1) + + + + + +### Image Registration Algorithms + +Contains algorithms for aligning multiple images or image stacks to correct for spatial distortions, shifts, or rotations. This is crucial for integrating data from different acquisition rounds or fields of view. It includes algorithms for learning transformations (e.g., Translation) and applying them (e.g., Warp). 
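The learn/apply split described above can be illustrated with a toy translation estimator: a brute-force integer-shift search stands in for the "learn" step and `np.roll` for the "apply" step. This is a deliberately naive sketch, not starfish's registration algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
reference = rng.random((16, 16))
moving = np.roll(reference, shift=(2, -3), axis=(0, 1))  # known misalignment

def learn_translation(ref, mov, max_shift=4):
    """Exhaustively search integer shifts for the lowest squared error."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err = np.sum((np.roll(mov, (dy, dx), axis=(0, 1)) - ref) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

shift = learn_translation(reference, moving)          # "learn" step
registered = np.roll(moving, shift, axis=(0, 1))      # "apply" step
print(shift)  # (-2, 3): the inverse of the applied misalignment
```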
+ + + + + +**Related Classes/Methods**: + + + +- `starfish.core.image._registration._base` (1:1) + +- `starfish.core.image._registration.ApplyTransform` (1:1) + +- `starfish.core.image._registration.LearnTransform` (1:1) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Mask_Label_Management.md b/.codeboarding/Mask_Label_Management.md new file mode 100644 index 00000000..6a222971 --- /dev/null +++ b/.codeboarding/Mask_Label_Management.md @@ -0,0 +1,245 @@ +```mermaid + +graph LR + + BinaryMaskCollection["BinaryMaskCollection"] + + LabelImage["LabelImage"] + + SegmentationMaskCollection["SegmentationMaskCollection"] + + BinarizeAlgorithm["BinarizeAlgorithm"] + + ThresholdBinarize["ThresholdBinarize"] + + SegmentAlgorithm["SegmentAlgorithm"] + + WatershedSegment["WatershedSegment"] + + FilterAlgorithm["FilterAlgorithm"] + + AreaFilter["AreaFilter"] + + MergeAlgorithm["MergeAlgorithm"] + + SegmentationMaskCollection -- "specializes" --> BinaryMaskCollection + + ThresholdBinarize -- "implements" --> BinarizeAlgorithm + + WatershedSegment -- "implements" --> SegmentAlgorithm + + AreaFilter -- "implements" --> FilterAlgorithm + + BinarizeAlgorithm -- "produces" --> BinaryMaskCollection + + SegmentAlgorithm -- "produces" --> LabelImage + + LabelImage -- "can be converted to" --> BinaryMaskCollection + + FilterAlgorithm -- "operates on" --> BinaryMaskCollection + + FilterAlgorithm -- "operates on" --> LabelImage + + MergeAlgorithm -- "combines" --> BinaryMaskCollection + +``` + + + 
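The flow in the diagram — binarize, then label, then filter — can be illustrated end to end with a toy 4-connected flood fill. The three stages below are stand-ins for the components described later in this document, not starfish code, and the threshold and minimum area are arbitrary.

```python
import numpy as np

image = np.array([[0.1, 0.9, 0.9, 0.0],
                  [0.0, 0.9, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.8],
                  [0.0, 0.0, 0.0, 0.8]])

binary = image > 0.5  # ThresholdBinarize analogue

def label_4connected(mask):
    """Flood-fill labeling with 4-connectivity (a stand-in for LabelImage)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        current += 1
        todo = [seed]
        while todo:
            y, x = todo.pop()
            if not (0 <= y < mask.shape[0] and 0 <= x < mask.shape[1]):
                continue
            if not mask[y, x] or labels[y, x]:
                continue
            labels[y, x] = current
            todo += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return labels

labels = label_4connected(binary)
areas = {lab: int((labels == lab).sum()) for lab in range(1, labels.max() + 1)}
kept = [lab for lab, area in areas.items() if area >= 3]  # AreaFilter analogue
print(areas, kept)  # {1: 3, 2: 2} [1]
```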
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +The `Mask & Label Management` subsystem in `starfish` is responsible for the creation, manipulation, and processing of binary masks, labeled images, and segmentation masks. It provides a robust framework for various morphological operations, ensuring data consistency and extensibility through well-defined interfaces and specialized data structures. + + + +### BinaryMaskCollection + +This is the foundational data structure for storing and manipulating collections of binary masks. It allows for the creation of masks from various sources (arrays, label images, external files) and provides methods for accessing, cropping, and reducing individual masks. It also manages the normalization of pixel and physical coordinate systems, ensuring spatial consistency. + + + + + +**Related Classes/Methods**: + + + +- `BinaryMaskCollection` (0:0) + + + + + +### LabelImage + +Represents an image where each distinct object or region is assigned a unique integer label. It supports creation from arrays and coordinate ticks and can be converted into a `BinaryMaskCollection`. This component is crucial for representing segmented regions before they are converted into binary masks. + + + + + +**Related Classes/Methods**: + + + +- `LabelImage` (28:167) + + + + + +### SegmentationMaskCollection + +A specialized subclass of `BinaryMaskCollection` tailored specifically for handling segmentation masks. 
It inherits all functionalities from its parent and adds specific methods relevant to segmentation, such as loading from compressed archives, making it the primary data structure for segmentation results. + + + + + +**Related Classes/Methods**: + + + +- `SegmentationMaskCollection` (0:0) + + + + + +### BinarizeAlgorithm + +Defines the abstract interface for all binarization algorithms. These algorithms convert an input image into a binary mask, typically by applying a threshold. This abstraction allows for different binarization methods to be implemented and used interchangeably. + + + + + +**Related Classes/Methods**: + + + +- `BinarizeAlgorithm` (0:0) + + + + + +### ThresholdBinarize + +A concrete implementation of `BinarizeAlgorithm` that performs binarization by applying a threshold to an input image, resulting in a `BinaryMaskCollection`. This is a common and essential binarization technique. + + + + + +**Related Classes/Methods**: + + + +- `ThresholdBinarize` (0:0) + + + + + +### SegmentAlgorithm + +Defines the abstract interface for segmentation algorithms, which are responsible for identifying and delineating distinct objects or regions within an image. This provides a common contract for various segmentation approaches. + + + + + +**Related Classes/Methods**: + + + +- `SegmentAlgorithm` (0:0) + + + + + +### WatershedSegment + +A concrete implementation of `SegmentAlgorithm` that uses the watershed algorithm for image segmentation. It takes an image and produces a `LabelImage`, which can then be converted to a `BinaryMaskCollection`. This algorithm is a powerful tool for separating touching objects. + + + + + +**Related Classes/Methods**: + + + +- `WatershedSegment` (0:0) + + + + + +### FilterAlgorithm + +Defines the abstract interface for filtering algorithms that operate on `BinaryMaskCollection` or `LabelImage` objects to refine, select, or modify regions based on certain criteria (e.g., size, shape). This allows for post-processing of masks and labels. 
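A minimal sketch of the abstract-interface pattern this section describes: an abstract base class with one concrete filter. The `run` signature and the set-of-pixel-coordinates mask representation are assumptions made for the example, not the actual `FilterAlgorithm` API.

```python
from abc import ABC, abstractmethod
from typing import List, Set, Tuple

Mask = Set[Tuple[int, int]]  # a mask as a set of (y, x) pixels, for illustration

class FilterAlgorithm(ABC):
    """Hypothetical contract: take a collection of masks, return a refined one."""
    @abstractmethod
    def run(self, masks: List[Mask]) -> List[Mask]: ...

class MinAreaFilter(FilterAlgorithm):
    """Keeps only masks whose pixel count meets a minimum area."""
    def __init__(self, min_area: int):
        self.min_area = min_area

    def run(self, masks: List[Mask]) -> List[Mask]:
        return [m for m in masks if len(m) >= self.min_area]

masks = [{(0, 0), (0, 1), (1, 0)}, {(5, 5)}]
print(MinAreaFilter(min_area=2).run(masks))  # keeps only the 3-pixel mask
```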
+ + + + + +**Related Classes/Methods**: + + + +- `FilterAlgorithm` (0:0) + + + + + +### AreaFilter + +A concrete implementation of `FilterAlgorithm` that filters `BinaryMaskCollection` objects based on the area of the individual masks. This is a practical example of how filtering can be applied to refine segmentation results. + + + + + +**Related Classes/Methods**: + + + +- `AreaFilter` (0:0) + + + + + +### MergeAlgorithm + +Defines the abstract interface for algorithms that combine multiple `BinaryMaskCollection` objects into a single, unified collection. This is essential for integrating masks from different sources or processing steps. + + + + + +**Related Classes/Methods**: + + + +- `MergeAlgorithm` (0:0) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Spot_Intensity_Analysis.md b/.codeboarding/Spot_Intensity_Analysis.md new file mode 100644 index 00000000..e96b546a --- /dev/null +++ b/.codeboarding/Spot_Intensity_Analysis.md @@ -0,0 +1,215 @@ +```mermaid + +graph LR + + Raw_Image_Data["Raw Image Data"] + + Codebook["Codebook"] + + Spot_Detection_Algorithms["Spot Detection Algorithms"] + + Spot_Intensity_Metadata["Spot Intensity & Metadata"] + + Spot_Decoding_Algorithms["Spot Decoding Algorithms"] + + Pixel_Level_Detection_Decoding["Pixel-Level Detection & Decoding"] + + Segmentation_Masks["Segmentation Masks"] + + Spot_Assignment_Algorithms["Spot Assignment Algorithms"] + + Raw_Image_Data -- "is processed by" --> Spot_Detection_Algorithms + + Raw_Image_Data -- "is processed by" --> Pixel_Level_Detection_Decoding + + Spot_Detection_Algorithms -- "produces" --> Spot_Intensity_Metadata + + Codebook -- "is used by" --> Spot_Decoding_Algorithms + + Codebook -- "is used by" --> Pixel_Level_Detection_Decoding + + Spot_Intensity_Metadata -- "is consumed by" --> Spot_Decoding_Algorithms + + Spot_Decoding_Algorithms -- "produces decoded data for" 
--> Spot_Intensity_Metadata + + Pixel_Level_Detection_Decoding -- "produces" --> Spot_Intensity_Metadata + + Spot_Intensity_Metadata -- "is consumed by" --> Spot_Assignment_Algorithms + + Segmentation_Masks -- "is used by" --> Spot_Assignment_Algorithms + + Spot_Assignment_Algorithms -- "produces spatially assigned data for" --> Spot_Intensity_Metadata + +``` + + + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +This subsystem is responsible for the core computational steps of identifying, quantifying, and decoding individual RNA molecules (spots) within microscopy images, and subsequently assigning them to specific biological targets or regions. It transforms raw image data into quantitative expression profiles. + + + +### Raw Image Data + +The primary input data structure representing the multi-dimensional raw image data acquired from the experiment. It serves as the foundational source of pixel intensities upon which all spot analysis operations are performed. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.imagestack.imagestack` (0:0) + + + + + +### Codebook + +A critical reference data structure containing the expected intensity profiles (barcodes) for known biological targets (e.g., genes). It is used by decoding algorithms to assign identities to detected spots based on their measured intensity patterns. 
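Barcode matching against a codebook can be sketched as a nearest-neighbor lookup over expected intensity profiles. The gene names, the 2-round-by-2-channel shape, and the Euclidean metric below are invented for illustration and are not starfish defaults.

```python
import numpy as np

# Expected (round x channel) intensity profiles for two hypothetical targets.
codebook = {
    "GENE_A": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "GENE_B": np.array([[0.0, 1.0], [1.0, 0.0]]),
}
measured = np.array([[0.9, 0.1], [0.2, 0.8]])  # one spot's measured profile

def decode(profile, codebook):
    """Assign the target whose barcode is closest in Euclidean distance."""
    distances = {g: float(np.linalg.norm(profile - code)) for g, code in codebook.items()}
    return min(distances, key=distances.get)

print(decode(measured, codebook))  # GENE_A
```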
+ + + + + +**Related Classes/Methods**: + + + +- `starfish.core.codebook.codebook` (0:0) + + + + + +### Spot Detection Algorithms + +A collection of algorithms (e.g., BlobDetector, LocalMaxPeakFinder, TrackpyLocalMaxPeakFinder) responsible for identifying and localizing potential spots within the ImageStack. They output initial spot candidates and their raw intensity properties. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.spots.FindSpots` (0:0) + + + + + +### Spot Intensity & Metadata + +These are central data structures that store the quantitative and descriptive information about detected and decoded spots. IntensityTable holds raw intensity profiles, DecodedIntensityTable extends this with assigned biological identities, SpotAttributes stores spatial coordinates and other properties, and DecodedSpots aggregates the final decoded spot information. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.intensity_table.intensity_table` (0:0) + +- `starfish.core.intensity_table.decoded_intensity_table` (0:0) + +- `starfish.core.types._spot_attributes` (0:0) + +- `starfish.core.types._decoded_spots` (0:0) + + + + + +### Spot Decoding Algorithms + +A suite of algorithms (e.g., CheckAll, MetricDistance, PerRoundMaxChannel, SimpleLookupDecoder) that take the raw intensity profiles from IntensityTable and, using the Codebook, assign a biological identity (e.g., gene name) to each spot. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.spots.DecodeSpots` (0:0) + + + + + +### Pixel-Level Detection & Decoding + +Algorithms (e.g., PixelSpotDecoder) that perform spot detection and decoding directly at the pixel level. This can be an alternative or complementary approach to FindSpots and DecodeSpots, especially useful for dense or overlapping spots. 
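The pixel-level idea can be illustrated by treating every pixel's across-round trace as a candidate barcode: match each trace to the nearest codebook entry, then discard dim pixels. Shapes, names, and the 0.5 intensity cutoff are assumptions for the sketch, not how `PixelSpotDecoder` is implemented.

```python
import numpy as np

codes = {"GENE_A": np.array([1.0, 0.0]), "GENE_B": np.array([0.0, 1.0])}

# A 2-round trace for each pixel of a 2x2 image: axes are (y, x, round).
pixel_traces = np.array([[[0.9, 0.1], [0.0, 0.0]],
                         [[0.1, 0.8], [0.0, 0.1]]])

decoded = {}
for y in range(2):
    for x in range(2):
        trace = pixel_traces[y, x]
        if trace.max() < 0.5:  # intensity filter: skip background pixels
            continue
        gene = min(codes, key=lambda g: float(np.linalg.norm(trace - codes[g])))
        decoded[(y, x)] = gene

print(decoded)  # {(0, 0): 'GENE_A', (1, 0): 'GENE_B'}
```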
+ + + + + +**Related Classes/Methods**: + + + +- `starfish.core.spots.DetectPixels` (0:0) + + + + + +### Segmentation Masks + +A data structure representing segmented biological regions (e.g., cells, nuclei) as binary masks. These masks are used to associate detected and decoded spots with specific anatomical or cellular contexts. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.morphology.binary_mask.binary_mask` (0:0) + +- `starfish.core.segmentation_mask.segmentation_mask` (0:0) + + + + + +### Spot Assignment Algorithms + +Algorithms (e.g., Label) that assign decoded spots to specific biological targets or regions, often by overlaying them with segmented regions represented by Segmentation Masks. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.spots.AssignTargets` (0:0) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/on_boarding.md b/.codeboarding/on_boarding.md new file mode 100644 index 00000000..f1b79467 --- /dev/null +++ b/.codeboarding/on_boarding.md @@ -0,0 +1,237 @@ +```mermaid + +graph LR + + Experiment_Data_Core["Experiment Data Core"] + + Data_Ingestion_Validation["Data Ingestion & Validation"] + + Image_Processing_Management["Image Processing & Management"] + + Spot_Intensity_Analysis["Spot & Intensity Analysis"] + + Mask_Label_Management["Mask & Label Management"] + + Expression_Matrix_Generation["Expression Matrix Generation"] + + Core_Infrastructure["Core Infrastructure"] + + Data_Ingestion_Validation -- "provides data to" --> Experiment_Data_Core + + Experiment_Data_Core -- "provides experimental data to" --> Image_Processing_Management + + Data_Ingestion_Validation -- "loads and validates codebooks for" --> Spot_Intensity_Analysis + + Image_Processing_Management -- "provides image data to" --> Spot_Intensity_Analysis + + Image_Processing_Management -- "outputs masks to" --> Mask_Label_Management 
+ + Mask_Label_Management -- "provides masks to" --> Spot_Intensity_Analysis + + Spot_Intensity_Analysis -- "provides data to" --> Expression_Matrix_Generation + + click Experiment_Data_Core href "https://github.com/spacetx/starfish/blob/master/.codeboarding//Experiment_Data_Core.md" "Details" + + click Image_Processing_Management href "https://github.com/spacetx/starfish/blob/master/.codeboarding//Image_Processing_Management.md" "Details" + + click Spot_Intensity_Analysis href "https://github.com/spacetx/starfish/blob/master/.codeboarding//Spot_Intensity_Analysis.md" "Details" + + click Mask_Label_Management href "https://github.com/spacetx/starfish/blob/master/.codeboarding//Mask_Label_Management.md" "Details" + + click Expression_Matrix_Generation href "https://github.com/spacetx/starfish/blob/master/.codeboarding//Expression_Matrix_Generation.md" "Details" + + click Core_Infrastructure href "https://github.com/spacetx/starfish/blob/master/.codeboarding//Core_Infrastructure.md" "Details" + +``` + + + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + + + +## Details + + + +High-level architecture overview of the `starfish` project, detailing its main components, their responsibilities, associated source code, and inter-component data flow and relationships. + + + +### Experiment Data Core [[Expand]](./Experiment_Data_Core.md) + +The central data structure representing a spatial transcriptomics experiment. It encapsulates fields of view, image stacks, and associated metadata, serving as the primary container for all experimental data. 
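The container hierarchy described above (an `Experiment` holding named fields of view, each exposing image stacks) can be sketched with a small, purely illustrative Python model. The class and attribute names below are simplified assumptions for exposition, not the actual `starfish` API; the real classes are listed under Related Classes/Methods.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FieldOfView:
    """One imaged region: named image stacks plus metadata (illustrative only)."""
    name: str
    image_stacks: Dict[str, List[List[float]]] = field(default_factory=dict)

@dataclass
class Experiment:
    """Top-level container mapping FOV names to FieldOfView objects (illustrative only)."""
    fovs: Dict[str, FieldOfView] = field(default_factory=dict)

    def __getitem__(self, fov_name: str) -> FieldOfView:
        # Mapping-style access: experiment["fov_000"] returns one field of view.
        return self.fovs[fov_name]

# Build a toy experiment with one field of view holding a "primary" stack.
exp = Experiment(fovs={"fov_000": FieldOfView("fov_000", {"primary": [[0.1, 0.2]]})})
print(exp["fov_000"].image_stacks["primary"])
```

The point of the sketch is the shape of the data, not the implementation: downstream components only need to resolve a field of view by name and pull a named image stack from it.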
+ + + + + +**Related Classes/Methods**: + + + +- `starfish.core.experiment.experiment.Experiment` (212:453) + +- `starfish.core.experiment.experiment.FieldOfView` (32:193) + + + + + +### Data Ingestion & Validation + +Responsible for loading various spatial transcriptomics datasets into the Experiment Data Core and validating their structure and content against SpaceTx schemas. It also handles the loading and validation of codebooks. + + + + + +**Related Classes/Methods**: + + + +- `starfish.data.MERFISH` (3:26) + +- `starfish.data.ISS` (96:119) + +- `starfish.core.spacetx_format.util.SpaceTxValidator` (26:202) + +- `starfish.core.spacetx_format.validate_sptx.validate` (17:75) + +- `starfish.core.codebook.codebook.Codebook` (28:804) + + + + + +### Image Processing & Management [[Expand]](./Image_Processing_Management.md) + +Manages multi-dimensional image data, offering functionalities for loading, slicing, and basic transformations. It also applies various image processing techniques such as filtering, segmentation, and registration to image stacks. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.imagestack.imagestack.ImageStack` (67:1273) + +- `starfish.core.imagestack.parser` (0:0) + +- `starfish.core.image.Filter` (0:0) + +- `starfish.core.image.Segment` (0:0) + +- `starfish.core.image._registration` (0:0) + + + + + +### Spot & Intensity Analysis [[Expand]](./Spot_Intensity_Analysis.md) + +Contains algorithms for identifying potential spots (e.g., RNA molecules), decoding their intensity profiles using a codebook, detecting spots at the pixel level, and assigning decoded spots to specific biological targets or regions. It also manages the measured intensity values for detected spots. 
+ + + + + +**Related Classes/Methods**: + + + +- `starfish.core.spots.FindSpots` (0:0) + +- `starfish.core.spots.DecodeSpots` (0:0) + +- `starfish.core.spots.DetectPixels` (0:0) + +- `starfish.core.spots.AssignTargets` (0:0) + +- `starfish.core.intensity_table.intensity_table.IntensityTable` (26:455) + +- `starfish.core.intensity_table.decoded_intensity_table.DecodedIntensityTable` (15:190) + + + + + +### Mask & Label Management [[Expand]](./Mask_Label_Management.md) + +Manages and processes collections of binary masks, labeled images, and segmentation masks, which represent segmented regions or objects. It includes functionalities for binarization, filtering, merging, and general morphological operations. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.morphology.binary_mask.binary_mask.BinaryMaskCollection` (48:760) + +- `starfish.core.morphology.label_image.label_image.LabelImage` (28:167) + +- `starfish.core.segmentation_mask.segmentation_mask.SegmentationMaskCollection` (15:48) + +- `starfish.core.morphology.Binarize` (0:0) + +- `starfish.core.morphology.Filter` (0:0) + +- `starfish.core.morphology.Merge` (0:0) + +- `starfish.core.morphology.Segment` (0:0) + + + + + +### Expression Matrix Generation [[Expand]](./Expression_Matrix_Generation.md) + +Creates and manages expression matrices, which quantify gene expression levels within defined regions or cells, serving as the final output for downstream biological analysis. + + + + + +**Related Classes/Methods**: + + + +- `starfish.core.expression_matrix.expression_matrix.ExpressionMatrix` (6:93) + + + + + +### Core Infrastructure [[Expand]](./Core_Infrastructure.md) + +Provides foundational utility functions for configuration management, logging, versioning, and defines fundamental data structures and constants used throughout the starfish library, ensuring consistent data representation. 
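One way such a shared types module keeps representation consistent is by centralizing axis names as an enumeration that every module indexes by, instead of hard-coding strings. The sketch below is a minimal illustration of that pattern; the member values are assumptions, not a verbatim copy of `starfish.core.types`.

```python
from enum import Enum
from typing import List

class Axes(Enum):
    """Named dimensions shared across modules (illustrative sketch)."""
    ROUND = "r"
    CH = "c"
    ZPLANE = "z"
    Y = "y"
    X = "x"

def axis_order() -> List[str]:
    # Every consumer derives ordering from the enum, so a rename or
    # reordering happens in exactly one place.
    return [a.value for a in (Axes.ROUND, Axes.CH, Axes.ZPLANE, Axes.Y, Axes.X)]

print(axis_order())
```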
+ + + + + +**Related Classes/Methods**: + + + +- `starfish.core.config.StarfishConfig` (0:0) + +- `starfish.core.util` (46:50) + +- `starfish.core._version` (0:0) + +- `starfish.core.types` (0:0) + + + + + + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file From 4be9da3d4cdda98cdb778ca531c49f87bc2ff347 Mon Sep 17 00:00:00 2001 From: ivanmilevtues Date: Wed, 13 Aug 2025 11:44:42 +0200 Subject: [PATCH 2/4] Updated diagrams --- .../Agentic_Reasoning_External_Tools.json | 151 ++++++++ .../Agentic_Reasoning_External_Tools.md | 101 +++++ .codeboarding/Asynchronous_Task_Worker.json | 56 +++ .codeboarding/Asynchronous_Task_Worker.md | 45 +++ .codeboarding/Backend_Core.json | 112 ++++++ .codeboarding/Backend_Core.md | 71 ++++ .codeboarding/Core_Infrastructure.md | 169 --------- .codeboarding/Data_Ingestion_Storage.json | 116 ++++++ .codeboarding/Data_Ingestion_Storage.md | 79 ++++ .codeboarding/Experiment_Data_Core.md | 223 ----------- .codeboarding/Expression_Matrix_Generation.md | 111 ------ .codeboarding/Image_Processing_Management.md | 181 --------- .codeboarding/LLM_Integration_Layer.json | 57 +++ .codeboarding/LLM_Integration_Layer.md | 52 +++ .codeboarding/Mask_Label_Management.md | 245 ------------ .codeboarding/Retrieval_Module.json | 38 ++ .codeboarding/Retrieval_Module.md | 34 ++ .codeboarding/Spot_Intensity_Analysis.md | 215 ----------- .codeboarding/User_Interface_UI_.json | 112 ++++++ .codeboarding/User_Interface_UI_.md | 71 ++++ .../Vector_Database_Knowledge_Base.json | 196 ++++++++++ .../Vector_Database_Knowledge_Base.md | 110 ++++++ .codeboarding/analysis.json | 351 ++++++++++++++++++ .codeboarding/codeboarding_version.json | 4 + .codeboarding/on_boarding.md | 261 +++++-------- 25 files changed, 1841 insertions(+), 1320 deletions(-) create mode 100644 .codeboarding/Agentic_Reasoning_External_Tools.json create mode 100644 .codeboarding/Agentic_Reasoning_External_Tools.md create 
mode 100644 .codeboarding/Asynchronous_Task_Worker.json create mode 100644 .codeboarding/Asynchronous_Task_Worker.md create mode 100644 .codeboarding/Backend_Core.json create mode 100644 .codeboarding/Backend_Core.md delete mode 100644 .codeboarding/Core_Infrastructure.md create mode 100644 .codeboarding/Data_Ingestion_Storage.json create mode 100644 .codeboarding/Data_Ingestion_Storage.md delete mode 100644 .codeboarding/Experiment_Data_Core.md delete mode 100644 .codeboarding/Expression_Matrix_Generation.md delete mode 100644 .codeboarding/Image_Processing_Management.md create mode 100644 .codeboarding/LLM_Integration_Layer.json create mode 100644 .codeboarding/LLM_Integration_Layer.md delete mode 100644 .codeboarding/Mask_Label_Management.md create mode 100644 .codeboarding/Retrieval_Module.json create mode 100644 .codeboarding/Retrieval_Module.md delete mode 100644 .codeboarding/Spot_Intensity_Analysis.md create mode 100644 .codeboarding/User_Interface_UI_.json create mode 100644 .codeboarding/User_Interface_UI_.md create mode 100644 .codeboarding/Vector_Database_Knowledge_Base.json create mode 100644 .codeboarding/Vector_Database_Knowledge_Base.md create mode 100644 .codeboarding/analysis.json create mode 100644 .codeboarding/codeboarding_version.json diff --git a/.codeboarding/Agentic_Reasoning_External_Tools.json b/.codeboarding/Agentic_Reasoning_External_Tools.json new file mode 100644 index 00000000..50f96375 --- /dev/null +++ b/.codeboarding/Agentic_Reasoning_External_Tools.json @@ -0,0 +1,151 @@ +{ + "description": "The Agentic Subsystem in DocsGPT is designed to enable intelligent, multi-step interactions by leveraging Large Language Models (LLMs) and external tools. At its core, the `Stream Processor` orchestrates the execution flow, initiating the `ReActAgent`. The `ReActAgent`, inheriting from `BaseAgent`, implements the ReAct pattern to reason, decide on actions, and execute them. 
It interacts with the `LLM Tool Call Handler` to process LLM outputs, which in turn utilizes the `ToolActionParser` to interpret tool calls. The `ToolManager` is responsible for loading and providing `Individual Tools` (such as `Elevenlabs TTS`) that the `ReActAgent` can invoke to perform specific actions, thereby extending the LLM's capabilities. This modular design ensures a clear separation of concerns, facilitating robust and extensible agentic behavior.", + "components": [ + { + "name": "BaseAgent", + "description": "Defines the abstract interface and common functionalities for all agents. It establishes the blueprint for how agents should process queries, interact with tools, and generate responses. It's fundamental as the base contract for any agent.", + "referenced_source_code": [ + { + "qualified_name": "application.agents.base.BaseAgent", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/base.py", + "reference_start_line": 19, + "reference_end_line": 326 + } + ], + "can_expand": true + }, + { + "name": "ReActAgent", + "description": "Implements the ReAct (Reasoning and Acting) pattern, enabling the LLM to perform complex, multi-step tasks. It orchestrates the iterative process of generating thoughts, deciding on actions (tool calls), executing them, and formulating a final answer. This is the core of the agentic reasoning.", + "referenced_source_code": [ + { + "qualified_name": "application.agents.react_agent.ReActAgent", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/react_agent.py", + "reference_start_line": 26, + "reference_end_line": 229 + } + ], + "can_expand": true + }, + { + "name": "ToolManager", + "description": "Manages the lifecycle and availability of external tools. It's responsible for discovering, loading, and providing access to the various tools that agents can utilize. 
Essential for agents to access their capabilities.", + "referenced_source_code": [ + { + "qualified_name": "application.agents.tools.tool_manager.ToolManager", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/tools/tool_manager.py", + "reference_start_line": 9, + "reference_end_line": 42 + } + ], + "can_expand": true + }, + { + "name": "ToolActionParser", + "description": "Interprets the raw output from the LLM to identify and parse tool calls, extracting the tool name and its arguments. This component is critical for translating the LLM's textual output into executable actions.", + "referenced_source_code": [ + { + "qualified_name": "application.agents.tools.tool_action_parser.ToolActionParser", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/tools/tool_action_parser.py", + "reference_start_line": 7, + "reference_end_line": 37 + } + ], + "can_expand": false + }, + { + "name": "Individual Tools", + "description": "Represents the collection of specific external tools, each encapsulating the logic for interacting with an external service or performing a distinct action (e.g., web search, TTS). These are the actual capabilities the agent leverages.", + "referenced_source_code": [ + { + "qualified_name": "application.agents.tools", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/tools", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": true + }, + { + "name": "Elevenlabs TTS", + "description": "Provides Text-to-Speech functionality, allowing agents to generate spoken responses. 
This is a concrete example of an external tool, highlighting the subsystem's ability to integrate diverse external services.", + "referenced_source_code": [ + { + "qualified_name": "application.tts.elevenlabs.ElevenlabsTTS", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/tts/elevenlabs.py", + "reference_start_line": 9, + "reference_end_line": 66 + } + ], + "can_expand": false + }, + { + "name": "LLM Tool Call Handler", + "description": "Acts as the bridge between the raw LLM output and the agent's tool execution logic, specifically handling tool calls. It's crucial for the agent to correctly interpret and act upon the LLM's instructions for tool usage.", + "referenced_source_code": [ + { + "qualified_name": "application.llm.handlers.LLMToolCallHandler", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/handlers/base.py", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": true + }, + { + "name": "Stream Processor", + "description": "Initiates and orchestrates the agent's execution flow within the main application, particularly for streaming responses. 
It serves as the entry point for triggering the agent's reasoning and tool-use cycle.", + "referenced_source_code": [ + { + "qualified_name": "application.api.answer.services.stream_processor.StreamProcessor", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/api/answer/services/stream_processor.py", + "reference_start_line": 56, + "reference_end_line": 260 + } + ], + "can_expand": true + } + ], + "components_relations": [ + { + "relation": "inherits from", + "src_name": "ReActAgent", + "dst_name": "BaseAgent" + }, + { + "relation": "interacts with", + "src_name": "ReActAgent", + "dst_name": "LLM Tool Call Handler" + }, + { + "relation": "uses", + "src_name": "LLM Tool Call Handler", + "dst_name": "ToolActionParser" + }, + { + "relation": "invokes", + "src_name": "ReActAgent", + "dst_name": "Individual Tools" + }, + { + "relation": "invokes", + "src_name": "ReActAgent", + "dst_name": "Elevenlabs TTS" + }, + { + "relation": "provides tools to", + "src_name": "ToolManager", + "dst_name": "BaseAgent" + }, + { + "relation": "loads", + "src_name": "ToolManager", + "dst_name": "Individual Tools" + }, + { + "relation": "orchestrates", + "src_name": "Stream Processor", + "dst_name": "ReActAgent" + } + ] +} \ No newline at end of file diff --git a/.codeboarding/Agentic_Reasoning_External_Tools.md b/.codeboarding/Agentic_Reasoning_External_Tools.md new file mode 100644 index 00000000..3a2ec8bc --- /dev/null +++ b/.codeboarding/Agentic_Reasoning_External_Tools.md @@ -0,0 +1,101 @@ +```mermaid +graph LR + BaseAgent["BaseAgent"] + ReActAgent["ReActAgent"] + ToolManager["ToolManager"] + ToolActionParser["ToolActionParser"] + Individual_Tools["Individual Tools"] + Elevenlabs_TTS["Elevenlabs TTS"] + LLM_Tool_Call_Handler["LLM Tool Call Handler"] + Stream_Processor["Stream Processor"] + ReActAgent -- "inherits from" --> BaseAgent + ReActAgent -- "interacts with" --> LLM_Tool_Call_Handler + LLM_Tool_Call_Handler -- "uses" --> ToolActionParser + 
ReActAgent -- "invokes" --> Individual_Tools + ReActAgent -- "invokes" --> Elevenlabs_TTS + ToolManager -- "provides tools to" --> BaseAgent + ToolManager -- "loads" --> Individual_Tools + Stream_Processor -- "orchestrates" --> ReActAgent +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + +## Details + +The Agentic Subsystem in DocsGPT is designed to enable intelligent, multi-step interactions by leveraging Large Language Models (LLMs) and external tools. At its core, the `Stream Processor` orchestrates the execution flow, initiating the `ReActAgent`. The `ReActAgent`, inheriting from `BaseAgent`, implements the ReAct pattern to reason, decide on actions, and execute them. It interacts with the `LLM Tool Call Handler` to process LLM outputs, which in turn utilizes the `ToolActionParser` to interpret tool calls. The `ToolManager` is responsible for loading and providing `Individual Tools` (such as `Elevenlabs TTS`) that the `ReActAgent` can invoke to perform specific actions, thereby extending the LLM's capabilities. This modular design ensures a clear separation of concerns, facilitating robust and extensible agentic behavior. + +### BaseAgent +Defines the abstract interface and common functionalities for all agents. It establishes the blueprint for how agents should process queries, interact with tools, and generate responses. It's fundamental as the base contract for any agent. + + +**Related Classes/Methods**: + +- `application.agents.base.BaseAgent`:19-326 + + +### ReActAgent +Implements the ReAct (Reasoning and Acting) pattern, enabling the LLM to perform complex, multi-step tasks. 
It orchestrates the iterative process of generating thoughts, deciding on actions (tool calls), executing them, and formulating a final answer. This is the core of the agentic reasoning. + + +**Related Classes/Methods**: + +- `application.agents.react_agent.ReActAgent`:26-229 + + +### ToolManager +Manages the lifecycle and availability of external tools. It's responsible for discovering, loading, and providing access to the various tools that agents can utilize. Essential for agents to access their capabilities. + + +**Related Classes/Methods**: + +- `application.agents.tools.tool_manager.ToolManager`:9-42 + + +### ToolActionParser +Interprets the raw output from the LLM to identify and parse tool calls, extracting the tool name and its arguments. This component is critical for translating the LLM's textual output into executable actions. + + +**Related Classes/Methods**: + +- `application.agents.tools.tool_action_parser.ToolActionParser`:7-37 + + +### Individual Tools +Represents the collection of specific external tools, each encapsulating the logic for interacting with an external service or performing a distinct action (e.g., web search, TTS). These are the actual capabilities the agent leverages. + + +**Related Classes/Methods**: + +- `application.agents.tools` + + +### Elevenlabs TTS +Provides Text-to-Speech functionality, allowing agents to generate spoken responses. This is a concrete example of an external tool, highlighting the subsystem's ability to integrate diverse external services. + + +**Related Classes/Methods**: + +- `application.tts.elevenlabs.ElevenlabsTTS`:9-66 + + +### LLM Tool Call Handler +Acts as the bridge between the raw LLM output and the agent's tool execution logic, specifically handling tool calls. It's crucial for the agent to correctly interpret and act upon the LLM's instructions for tool usage. 
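The reason-act cycle these components implement can be sketched in a few lines of plain Python. The `fake_llm` stub, the tool registry, and the JSON action format below are illustrative assumptions standing in for the real LLM, `ToolManager`, and `ToolActionParser`; they are not DocsGPT's actual wire format.

```python
import json

# Hypothetical tool registry, playing the role of ToolManager.
TOOLS = {"search": lambda q: f"results for {q!r}"}

def fake_llm(history):
    """Stub LLM: requests one tool call, then produces a final answer."""
    if not any(m.startswith("observation:") for m in history):
        return json.dumps({"action": "search", "action_input": "docsgpt"})
    return "final answer: DocsGPT indexes your docs."

def parse_action(text):
    """Plays the role of ToolActionParser: extract (tool, argument) or None."""
    try:
        call = json.loads(text)
        return call["action"], call["action_input"]
    except (ValueError, KeyError):
        return None

def react(question, max_steps=5):
    """ReAct loop: think, parse a tool call, act, observe, repeat."""
    history = [f"question: {question}"]
    for _ in range(max_steps):
        output = fake_llm(history)
        call = parse_action(output)
        if call is None:          # no tool call means the model gave its answer
            return output
        tool, arg = call
        history.append(f"observation: {TOOLS[tool](arg)}")  # act, then loop
    return "gave up"

print(react("what is docsgpt?"))
```

The separation mirrors the components above: the loop itself is the agent, parsing is isolated in `parse_action`, and tools are looked up by name from a registry the agent never hard-codes.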
+ + +**Related Classes/Methods**: + +- `application.llm.handlers.LLMToolCallHandler` + + +### Stream Processor +Initiates and orchestrates the agent's execution flow within the main application, particularly for streaming responses. It serves as the entry point for triggering the agent's reasoning and tool-use cycle. + + +**Related Classes/Methods**: + +- `application.api.answer.services.stream_processor.StreamProcessor`:56-260 + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Asynchronous_Task_Worker.json b/.codeboarding/Asynchronous_Task_Worker.json new file mode 100644 index 00000000..ef092466 --- /dev/null +++ b/.codeboarding/Asynchronous_Task_Worker.json @@ -0,0 +1,56 @@ +{ + "description": "This subsystem is responsible for managing and executing long-running or computationally intensive tasks asynchronously, such as document ingestion, remote data synchronization, and agent webhooks. It prevents the main API from blocking, ensuring responsiveness and scalability for the RAG system.", + "components": [ + { + "name": "Task Orchestrator", + "description": "Defines and encapsulates the actual long-running, computationally intensive tasks essential for the RAG system's operation. These tasks offload heavy processing from the main application thread, ensuring responsiveness.", + "referenced_source_code": [ + { + "qualified_name": "application.worker", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/worker.py", + "reference_start_line": 1, + "reference_end_line": 9999 + } + ], + "can_expand": true + }, + { + "name": "Celery Application Initializer", + "description": "Responsible for bootstrapping and initializing the Celery application instance. 
It establishes the connection to the message broker and result backend, effectively setting up the runtime environment for asynchronous tasks.", + "referenced_source_code": [ + { + "qualified_name": "application.celery_init", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/celery_init.py", + "reference_start_line": 1, + "reference_end_line": 9999 + } + ], + "can_expand": false + }, + { + "name": "Celery Configuration Manager", + "description": "Centralizes and provides all necessary configuration parameters for the Celery application. It ensures that the asynchronous system operates correctly by defining settings such as broker URLs, backend URLs, and task queues.", + "referenced_source_code": [ + { + "qualified_name": "application.celeryconfig", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/celeryconfig.py", + "reference_start_line": 1, + "reference_end_line": 9999 + } + ], + "can_expand": false + } + ], + "components_relations": [ + { + "relation": "registers tasks with", + "src_name": "Task Orchestrator", + "dst_name": "Celery Application Initializer" + }, + { + "relation": "provides configuration to", + "src_name": "Celery Configuration Manager", + "dst_name": "Celery Application Initializer" + } + ] +} \ No newline at end of file diff --git a/.codeboarding/Asynchronous_Task_Worker.md b/.codeboarding/Asynchronous_Task_Worker.md new file mode 100644 index 00000000..fa98fced --- /dev/null +++ b/.codeboarding/Asynchronous_Task_Worker.md @@ -0,0 +1,45 @@ +```mermaid +graph LR + Task_Orchestrator["Task Orchestrator"] + Celery_Application_Initializer["Celery Application Initializer"] + Celery_Configuration_Manager["Celery Configuration Manager"] + Task_Orchestrator -- "registers tasks with" --> Celery_Application_Initializer + Celery_Configuration_Manager -- "provides configuration to" --> Celery_Application_Initializer +``` + 
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + +## Details + +This subsystem is responsible for managing and executing long-running or computationally intensive tasks asynchronously, such as document ingestion, remote data synchronization, and agent webhooks. It prevents the main API from blocking, ensuring responsiveness and scalability for the RAG system. + +### Task Orchestrator +Defines and encapsulates the actual long-running, computationally intensive tasks essential for the RAG system's operation. These tasks offload heavy processing from the main application thread, ensuring responsiveness. + + +**Related Classes/Methods**: + +- `application.worker`:1-9999 + + +### Celery Application Initializer +Responsible for bootstrapping and initializing the Celery application instance. It establishes the connection to the message broker and result backend, effectively setting up the runtime environment for asynchronous tasks. + + +**Related Classes/Methods**: + +- `application.celery_init`:1-9999 + + +### Celery Configuration Manager +Centralizes and provides all necessary configuration parameters for the Celery application. It ensures that the asynchronous system operates correctly by defining settings such as broker URLs, backend URLs, and task queues. 
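A centralized Celery configuration module of this kind is typically just a flat set of module-level settings that the initializer applies with `config_from_object`. The sketch below is a configuration fragment with placeholder values; the URLs and exact setting choices are assumptions, and a real deployment would read them from the environment.

```python
# celeryconfig.py -- illustrative values only.
broker_url = "redis://localhost:6379/0"      # where task messages are queued
result_backend = "redis://localhost:6379/1"  # where task results are stored
task_serializer = "json"
accept_content = ["json"]
task_track_started = True

# The initializer would then apply this module to the Celery instance, e.g.:
#   celery = Celery("docsgpt")
#   celery.config_from_object("application.celeryconfig")
```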
+ + +**Related Classes/Methods**: + +- `application.celeryconfig`:1-9999 + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Backend_Core.json b/.codeboarding/Backend_Core.json new file mode 100644 index 00000000..f611593d --- /dev/null +++ b/.codeboarding/Backend_Core.json @@ -0,0 +1,112 @@ +{ + "description": "The Backend Core acts as the central entry point and orchestrator for the DocsGPT application, handling request routing, core application logic, authentication, and configuration.", + "components": [ + { + "name": "Application Orchestrator", + "description": "The main Flask application instance, serving as the primary orchestrator and central entry point for all incoming HTTP requests. It initializes the application and registers all API routes.", + "referenced_source_code": [ + { + "qualified_name": "application.app", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/app.py", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + }, + { + "name": "API Answer Routes", + "description": "Defines and handles API endpoints specifically for processing user queries and initiating the AI-powered answer generation process, delegating to the RAG services.", + "referenced_source_code": [ + { + "qualified_name": "application.api.answer.routes", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/api/answer/routes", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + }, + { + "name": "API User Routes", + "description": "Defines and handles API endpoints for user-specific functionalities, including document management, API key operations, and usage tracking.", + "referenced_source_code": [ + { + "qualified_name": "application.api.user.routes", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/api/user/routes.py", 
+ "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + }, + { + "name": "Authentication & Authorization", + "description": "Manages user authentication and authorization, ensuring secure access to application functionalities and data.", + "referenced_source_code": [ + { + "qualified_name": "application.auth", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/auth.py", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": false + }, + { + "name": "Configuration Manager", + "description": "Centralized configuration management for application-wide parameters, environment variables, and secrets.", + "referenced_source_code": [ + { + "qualified_name": "application.core.settings", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/core/settings.py", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + } + ], + "components_relations": [ + { + "relation": "registers", + "src_name": "Application Orchestrator", + "dst_name": "API Answer Routes" + }, + { + "relation": "registers", + "src_name": "Application Orchestrator", + "dst_name": "API User Routes" + }, + { + "relation": "utilizes", + "src_name": "Application Orchestrator", + "dst_name": "Configuration Manager" + }, + { + "relation": "relies on", + "src_name": "API Answer Routes", + "dst_name": "Authentication & Authorization" + }, + { + "relation": "relies on", + "src_name": "API User Routes", + "dst_name": "Authentication & Authorization" + }, + { + "relation": "provides services to", + "src_name": "Authentication & Authorization", + "dst_name": "API Answer Routes" + }, + { + "relation": "provides services to", + "src_name": "Authentication & Authorization", + "dst_name": "API User Routes" + }, + { + "relation": "provides configuration to", + "src_name": "Configuration Manager", + "dst_name": "Application Orchestrator" + } + ] +} \ No newline at end of file diff --git 
a/.codeboarding/Backend_Core.md b/.codeboarding/Backend_Core.md new file mode 100644 index 00000000..834e0e54 --- /dev/null +++ b/.codeboarding/Backend_Core.md @@ -0,0 +1,71 @@ +```mermaid +graph LR + Application_Orchestrator["Application Orchestrator"] + API_Answer_Routes["API Answer Routes"] + API_User_Routes["API User Routes"] + Authentication_Authorization["Authentication & Authorization"] + Configuration_Manager["Configuration Manager"] + Application_Orchestrator -- "registers" --> API_Answer_Routes + Application_Orchestrator -- "registers" --> API_User_Routes + Application_Orchestrator -- "utilizes" --> Configuration_Manager + API_Answer_Routes -- "relies on" --> Authentication_Authorization + API_User_Routes -- "relies on" --> Authentication_Authorization + Authentication_Authorization -- "provides services to" --> API_Answer_Routes + Authentication_Authorization -- "provides services to" --> API_User_Routes + Configuration_Manager -- "provides configuration to" --> Application_Orchestrator +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + +## Details + +The Backend Core acts as the central entry point and orchestrator for the DocsGPT application, handling request routing, core application logic, authentication, and configuration. + +### Application Orchestrator +The main Flask application instance, serving as the primary orchestrator and central entry point for all incoming HTTP requests. It initializes the application and registers all API routes. 
+ + +**Related Classes/Methods**: + +- `application.app` + + +### API Answer Routes +Defines and handles API endpoints specifically for processing user queries and initiating the AI-powered answer generation process, delegating to the RAG services. + + +**Related Classes/Methods**: + +- `application.api.answer.routes` + + +### API User Routes +Defines and handles API endpoints for user-specific functionalities, including document management, API key operations, and usage tracking. + + +**Related Classes/Methods**: + +- `application.api.user.routes` + + +### Authentication & Authorization +Manages user authentication and authorization, ensuring secure access to application functionalities and data. + + +**Related Classes/Methods**: + +- `application.auth` + + +### Configuration Manager +Centralized configuration management for application-wide parameters, environment variables, and secrets. + + +**Related Classes/Methods**: + +- `application.core.settings` + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Core_Infrastructure.md b/.codeboarding/Core_Infrastructure.md deleted file mode 100644 index 6121745c..00000000 --- a/.codeboarding/Core_Infrastructure.md +++ /dev/null @@ -1,169 +0,0 @@ -```mermaid - -graph LR - - Versioning_Component["Versioning Component"] - - Data_Types_and_Structures_Component["Data Types and Structures Component"] - - Configuration_Management_Component["Configuration Management Component"] - - Logging_and_System_Info_Component["Logging and System Info Component"] - - Execution_Orchestration_Component["Execution Orchestration Component"] - - Image_Level_Adjustment_Component["Image Level Adjustment Component"] - - Versioning_Component -- "provides version to" --> Logging_and_System_Info_Component - - Data_Types_and_Structures_Component -- "defines" --> Configuration_Management_Component - - Data_Types_and_Structures_Component -- 
"consumed by" --> Execution_Orchestration_Component - - Configuration_Management_Component -- "configures" --> Execution_Orchestration_Component - - Configuration_Management_Component -- "provides settings to" --> Logging_and_System_Info_Component - - Logging_and_System_Info_Component -- "records operations of" --> Execution_Orchestration_Component - - Logging_and_System_Info_Component -- "reports on" --> Data_Types_and_Structures_Component - - Execution_Orchestration_Component -- "uses" --> Configuration_Management_Component - - Execution_Orchestration_Component -- "logs activities via" --> Logging_and_System_Info_Component - - Image_Level_Adjustment_Component -- "processes" --> Data_Types_and_Structures_Component - - Image_Level_Adjustment_Component -- "configured by" --> Configuration_Management_Component - -``` - - - -[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) - - - -## Details - - - -This section provides an overview of the `Core Infrastructure` components within the `starfish` project. These components are fundamental because they address the essential, cross-cutting concerns of the library, providing foundational services, defining core data representations, and offering basic utilities that higher-level modules depend on. They are not specific image processing algorithms but rather the underlying framework that enables the entire system to function robustly and consistently. - - - -### Versioning Component - -This component is responsible for programmatically determining and managing the version of the `starfish` software. 
It interacts with version control systems (like Git) to extract version information and format it according to standard conventions (e.g., PEP 440). This ensures that the software version can be consistently identified and reported. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core._version` (0:0) - - - - - -### Data Types and Structures Component - -This component defines the fundamental data models, structures (e.g., `DecodedSpots`, `SpotAttributes`, `ValidatedTable`), and enumerations (e.g., `Axes`, `Levels`, `Coordinates`) used throughout the `starfish` library. It ensures consistent data representation and interoperability across different modules and algorithms. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.types` (0:0) - - - - - -### Configuration Management Component - -This component handles the loading, parsing, and centralized management of application-wide configurations. It provides a flexible and centralized way to define and access parameters that control the behavior of various `starfish` algorithms and processes, supporting nested configuration structures. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.config` (0:0) - -- `starfish.core.util.config` (0:0) - - - - - -### Logging and System Info Component - -This component provides comprehensive logging capabilities for the `starfish` application, enabling detailed tracking of execution flow, warnings, and errors. It also gathers system and dependency information, which is vital for diagnostics, error reporting, and understanding the execution environment. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.util.logging` (0:0) - - - - - -### Execution Orchestration Component - -This component is responsible for orchestrating the execution flow of different processing stages within the `starfish` pipeline. 
It manages the sequence of operations, ensuring proper order and dependencies, and can integrate utilities like timing for performance monitoring and optimization. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.util.exec` (0:0) - - - - - -### Image Level Adjustment Component - -This component offers a set of utility functions specifically designed for adjusting the intensity levels of images. This is a common and fundamental preprocessing step in image analysis pipelines, used to enhance contrast, normalize data, or prepare images for subsequent processing. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.util.levels` (45:119) - - - - - - - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Data_Ingestion_Storage.json b/.codeboarding/Data_Ingestion_Storage.json new file mode 100644 index 00000000..690afbfc --- /dev/null +++ b/.codeboarding/Data_Ingestion_Storage.json @@ -0,0 +1,116 @@ +{ + "description": "The DocsGPT system is designed around a modular data processing pipeline. It begins with the `Data Source Ingestion` component, responsible for acquiring raw data from diverse origins. This data then proceeds to `Document Chunking`, where it is segmented into optimized units. The `Embedding Pipeline` subsequently transforms these chunks into vector embeddings, which are crucial for knowledge representation. For persistent storage of these embeddings, the `Embedding Pipeline` interacts exclusively with the `Storage Abstraction` layer. This abstraction layer intelligently delegates storage operations to specific backends, such as `Local Storage` for local persistence or `S3 Storage` for scalable cloud-based storage, thereby ensuring a flexible and decoupled storage mechanism.", + "components": [ + { + "name": "Data Source Ingestion", + "description": "This logical component encompasses the initial ingestion of raw data. 
`application.parser.file` handles local file system inputs, while `application.parser.remote` manages data from external sources like GitHub or sitemaps. They are the entry points for all data into the system.", + "referenced_source_code": [ + { + "qualified_name": "application.parser.file", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/file", + "reference_start_line": 1, + "reference_end_line": 1 + }, + { + "qualified_name": "application.parser.remote", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/remote", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + }, + { + "name": "Document Chunking", + "description": "Responsible for breaking down large documents received from ingestion components into smaller, manageable chunks. This is critical for optimizing the data for embedding models and fitting within LLM context windows.", + "referenced_source_code": [ + { + "qualified_name": "application.parser.chunking", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/chunking.py", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + }, + { + "name": "Embedding Pipeline", + "description": "Orchestrates the conversion of text chunks into vector embeddings. This component is central to the data preparation process, bridging the gap between raw text and vector-based knowledge representation. It interacts with the `Storage Abstraction` for persistence.", + "referenced_source_code": [ + { + "qualified_name": "application.parser.embedding_pipeline", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/embedding_pipeline.py", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + }, + { + "name": "Storage Abstraction", + "description": "Acts as a factory or manager for abstracting different storage backends. 
It provides a unified interface for the rest of the system to interact with persistent storage, whether it's local or cloud-based. This promotes flexibility and extensibility in storage solutions by delegating to concrete implementations.", + "referenced_source_code": [ + { + "qualified_name": "application.storage.storage_creator", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/storage/storage_creator.py", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": false + }, + { + "name": "Local Storage", + "description": "Implements the concrete logic for persistent storage and retrieval of files and data on the local file system. It's one of the specific storage backends supported by the system, managed by the `Storage Abstraction`.", + "referenced_source_code": [ + { + "qualified_name": "application.storage.local", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/storage/local.py", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + }, + { + "name": "S3 Storage", + "description": "Implements the concrete logic for persistent storage and retrieval using S3-compatible object storage services. 
This provides cloud-based, scalable storage capabilities, managed by the `Storage Abstraction`.", + "referenced_source_code": [ + { + "qualified_name": "application.storage.s3", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/storage/s3.py", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + } + ], + "components_relations": [ + { + "relation": "passes parsed documents to", + "src_name": "Data Source Ingestion", + "dst_name": "Document Chunking" + }, + { + "relation": "feeds text chunks to", + "src_name": "Document Chunking", + "dst_name": "Embedding Pipeline" + }, + { + "relation": "utilizes", + "src_name": "Embedding Pipeline", + "dst_name": "Storage Abstraction" + }, + { + "relation": "delegates to", + "src_name": "Storage Abstraction", + "dst_name": "Local Storage" + }, + { + "relation": "delegates to", + "src_name": "Storage Abstraction", + "dst_name": "S3 Storage" + } + ] +} \ No newline at end of file diff --git a/.codeboarding/Data_Ingestion_Storage.md b/.codeboarding/Data_Ingestion_Storage.md new file mode 100644 index 00000000..5933f982 --- /dev/null +++ b/.codeboarding/Data_Ingestion_Storage.md @@ -0,0 +1,79 @@ +```mermaid +graph LR + Data_Source_Ingestion["Data Source Ingestion"] + Document_Chunking["Document Chunking"] + Embedding_Pipeline["Embedding Pipeline"] + Storage_Abstraction["Storage Abstraction"] + Local_Storage["Local Storage"] + S3_Storage["S3 Storage"] + Data_Source_Ingestion -- "passes parsed documents to" --> Document_Chunking + Document_Chunking -- "feeds text chunks to" --> Embedding_Pipeline + Embedding_Pipeline -- "utilizes" --> Storage_Abstraction + Storage_Abstraction -- "delegates to" --> Local_Storage + Storage_Abstraction -- "delegates to" --> S3_Storage +``` + 
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + +## Details + +The DocsGPT system is designed around a modular data processing pipeline. It begins with the `Data Source Ingestion` component, responsible for acquiring raw data from diverse origins. This data then proceeds to `Document Chunking`, where it is segmented into optimized units. The `Embedding Pipeline` subsequently transforms these chunks into vector embeddings, which are crucial for knowledge representation. For persistent storage of these embeddings, the `Embedding Pipeline` interacts exclusively with the `Storage Abstraction` layer. This abstraction layer intelligently delegates storage operations to specific backends, such as `Local Storage` for local persistence or `S3 Storage` for scalable cloud-based storage, thereby ensuring a flexible and decoupled storage mechanism. + +### Data Source Ingestion +This logical component encompasses the initial ingestion of raw data. `application.parser.file` handles local file system inputs, while `application.parser.remote` manages data from external sources like GitHub or sitemaps. They are the entry points for all data into the system. + + +**Related Classes/Methods**: + +- `application.parser.file` +- `application.parser.remote` + + +### Document Chunking +Responsible for breaking down large documents received from ingestion components into smaller, manageable chunks. This is critical for optimizing the data for embedding models and fitting within LLM context windows. 
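A character-based sketch of this idea, assuming fixed-size windows with overlap; the real `application.parser.chunking` logic is likely token- and structure-aware, so this is only a simplified model:

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Split a document into overlapping fixed-size chunks.

    Overlap preserves context across boundaries, so content cut at a
    chunk edge still appears at the start of the next chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Keeping each chunk below the embedding model's input limit is what makes the downstream `Embedding Pipeline` and LLM context windows work reliably.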
+ + +**Related Classes/Methods**: + +- `application.parser.chunking` + + +### Embedding Pipeline +Orchestrates the conversion of text chunks into vector embeddings. This component is central to the data preparation process, bridging the gap between raw text and vector-based knowledge representation. It interacts with the `Storage Abstraction` for persistence. + + +**Related Classes/Methods**: + +- `application.parser.embedding_pipeline` + + +### Storage Abstraction +Acts as a factory or manager for abstracting different storage backends. It provides a unified interface for the rest of the system to interact with persistent storage, whether it's local or cloud-based. This promotes flexibility and extensibility in storage solutions by delegating to concrete implementations. + + +**Related Classes/Methods**: + +- `application.storage.storage_creator` + + +### Local Storage +Implements the concrete logic for persistent storage and retrieval of files and data on the local file system. It's one of the specific storage backends supported by the system, managed by the `Storage Abstraction`. + + +**Related Classes/Methods**: + +- `application.storage.local` + + +### S3 Storage +Implements the concrete logic for persistent storage and retrieval using S3-compatible object storage services. This provides cloud-based, scalable storage capabilities, managed by the `Storage Abstraction`. 
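The delegation pattern used by the `Storage Abstraction` can be sketched as follows; the class and method names here are hypothetical stand-ins for the actual `application.storage` API, and the S3 backend is stubbed rather than wired to a real client:

```python
from abc import ABC, abstractmethod


class BaseStorage(ABC):
    """Unified interface the rest of the system programs against."""

    @abstractmethod
    def save(self, path, data): ...

    @abstractmethod
    def load(self, path): ...


class LocalStorage(BaseStorage):
    def __init__(self):
        self._files = {}  # in-memory stand-in for the local file system

    def save(self, path, data):
        self._files[path] = data

    def load(self, path):
        return self._files[path]


class S3Storage(BaseStorage):
    def __init__(self, bucket):
        self.bucket = bucket  # a real implementation would hold an S3 client

    def save(self, path, data):
        raise NotImplementedError("requires an S3 client")

    def load(self, path):
        raise NotImplementedError("requires an S3 client")


def create_storage(kind, **kwargs):
    """Factory: select a concrete backend from configuration."""
    backends = {"local": LocalStorage, "s3": S3Storage}
    if kind not in backends:
        raise ValueError(f"unknown storage backend: {kind}")
    return backends[kind](**kwargs)
```

Because callers only depend on `BaseStorage`, adding a new backend means registering one more class in the factory, with no changes elsewhere.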
+ + +**Related Classes/Methods**: + +- `application.storage.s3` + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Experiment_Data_Core.md b/.codeboarding/Experiment_Data_Core.md deleted file mode 100644 index 628ead1a..00000000 --- a/.codeboarding/Experiment_Data_Core.md +++ /dev/null @@ -1,223 +0,0 @@ -```mermaid - -graph LR - - Experiment["Experiment"] - - FieldOfView["FieldOfView"] - - ImageStack["ImageStack"] - - Codebook["Codebook"] - - StarfishConfig["StarfishConfig"] - - SpaceTx_Validator["SpaceTx Validator"] - - CropParameters["CropParameters"] - - TileCollectionData["TileCollectionData"] - - TileData["TileData"] - - Experiment -- "contains" --> FieldOfView - - Experiment -- "references" --> Codebook - - Experiment -- "uses" --> StarfishConfig - - Experiment -- "uses" --> SpaceTx_Validator - - FieldOfView -- "manages" --> ImageStack - - FieldOfView -- "uses" --> CropParameters - - ImageStack -- "uses" --> CropParameters - - ImageStack -- "composed of" --> TileCollectionData - - TileCollectionData -- "composed of" --> TileData - -``` - - - -[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) - - - -## Details - - - -The `Experiment Data Core` subsystem in Starfish is designed to encapsulate and manage all data associated with a spatial transcriptomics experiment. It provides a structured, hierarchical representation of experimental data, from the top-level experiment down to individual image tiles, along with essential metadata and validation mechanisms. 
The chosen components are fundamental because they directly represent the data, define its structure, enable its interpretation, and ensure its integrity. - - - -### Experiment - -The top-level container representing an entire spatial transcriptomics experiment. It orchestrates access to all associated data, including multiple fields of view (FOVs) and the experiment's codebook. It provides methods for loading experiment data from standardized formats (e.g., JSON) and offers iterable access to its constituent FOVs. - - - - - -**Related Classes/Methods**: - - - -- `Experiment` (212:453) - - - - - -### FieldOfView - -Represents a single field of view within an experiment, corresponding to a specific spatial region imaged. It acts as a direct interface to the raw imaging data for that region, providing methods to retrieve individual images or entire image stacks. - - - - - -**Related Classes/Methods**: - - - -- `FieldOfView` (1:1) - - - - - -### ImageStack - -A multi-dimensional array-like structure that holds the actual image data (raw or processed fluorescent images). It provides functionalities for accessing, manipulating, and iterating over image data across various dimensions (e.g., channels, imaging rounds, Z-planes). It's a core component for any image-based processing. - - - - - -**Related Classes/Methods**: - - - -- `ImageStack` (1:1) - - - - - -### Codebook - -Stores the mapping between fluorescent probes (or imaging channels) and the specific gene targets they represent. This information is critical for decoding the spatial transcriptomics data, translating raw intensity measurements into gene expression profiles. It can be loaded from a JSON format. - - - - - -**Related Classes/Methods**: - - - -- `Codebook` (28:804) - - - - - -### StarfishConfig - -A centralized configuration management component for the Starfish application. 
It provides a structured way to store and retrieve various settings that influence the behavior of different parts of the software, including data loading, processing, and analysis. - - - - - -**Related Classes/Methods**: - - - -- `StarfishConfig` (1:1) - - - - - -### SpaceTx Validator - -A utility component responsible for validating the structure and content of experiment data against the SpaceTx format specification. This ensures that the input data adheres to predefined standards, promoting interoperability and data quality. - - - - - -**Related Classes/Methods**: - - - -- `SpaceTx Validator` (1:1) - - - - - -### CropParameters - -A data structure that defines the parameters for cropping image data. It specifies the region of interest to be extracted from a larger image stack, enabling focused analysis on specific parts of the field of view. - - - - - -**Related Classes/Methods**: - - - -- `CropParameters` (1:1) - - - - - -### TileCollectionData - -An internal class that manages the underlying data storage and access for collections of individual image tiles. It serves as a foundational layer for `ImageStack`, handling the organization and retrieval of image data chunks. - - - - - -**Related Classes/Methods**: - - - -- `TileCollectionData` (1:1) - - - - - -### TileData - -An internal class that manages the underlying data storage and access for individual image tiles. It represents a single chunk of image data and is a building block for `TileCollectionData`. 
- - - - - -**Related Classes/Methods**: - - - -- `TileData` (1:1) - - - - - - - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Expression_Matrix_Generation.md b/.codeboarding/Expression_Matrix_Generation.md deleted file mode 100644 index 4a02e22f..00000000 --- a/.codeboarding/Expression_Matrix_Generation.md +++ /dev/null @@ -1,111 +0,0 @@ -```mermaid - -graph LR - - ExpressionMatrix["ExpressionMatrix"] - - DecodedIntensityTable["DecodedIntensityTable"] - - ExpressionMatrixConcatenation["ExpressionMatrixConcatenation"] - - TryImportUtility["TryImportUtility"] - - DecodedIntensityTable -- "provides data to" --> ExpressionMatrix - - ExpressionMatrix -- "uses" --> TryImportUtility - - ExpressionMatrixConcatenation -- "operates on" --> ExpressionMatrix - -``` - - - -[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) - - - -## Details - - - -This subsystem is responsible for the creation, management, and output of expression matrices, which quantify gene expression levels. These matrices serve as the final, standardized data output for subsequent biological analysis within the `starfish` framework. - - - -### ExpressionMatrix - -This is the central data structure for storing and manipulating quantitative gene expression data. It encapsulates the gene expression levels, along with associated metadata, and provides core functionalities for loading data from various sources and saving it into standardized formats (e.g., Loom, AnnData). 
It represents the final, processed output of the gene expression quantification pipeline. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.expression_matrix.expression_matrix:ExpressionMatrix` (6:93) - - - - - -### DecodedIntensityTable - -This component represents the gene expression intensities after the decoding process. It serves as a crucial intermediate data structure, holding the quantitative measurements of gene expression for identified spots or cells, which are then used to construct the final `ExpressionMatrix`. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.intensity_table.decoded_intensity_table:DecodedIntensityTable` (15:190) - - - - - -### ExpressionMatrixConcatenation - -This component provides the functionality to combine multiple `ExpressionMatrix` objects into a single, larger, unified expression matrix. This is essential for integrating gene expression data from different fields of view, experimental replicates, or samples, enabling a comprehensive analysis across a larger dataset. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.expression_matrix.concatenate:ExpressionMatrixConcatenation` (1:1) - - - - - -### TryImportUtility - -This is a utility module designed to safely attempt the import of Python modules. In the context of `ExpressionMatrix`, its primary role is to manage optional dependencies required for specific functionalities, such as saving the expression matrix to external file formats like Loom or AnnData. It ensures that these features can be used if the necessary libraries are installed, without causing errors if they are not. 
- - - - - -**Related Classes/Methods**: - - - -- `starfish.core.util.try_import:TryImportUtility` (1:1) - - - - - - - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Image_Processing_Management.md b/.codeboarding/Image_Processing_Management.md deleted file mode 100644 index 939d9011..00000000 --- a/.codeboarding/Image_Processing_Management.md +++ /dev/null @@ -1,181 +0,0 @@ -```mermaid - -graph LR - - ImageStack_Data_Structure["ImageStack Data Structure"] - - Image_Data_Parsers["Image Data Parsers"] - - Image_Cropping_Parameters["Image Cropping Parameters"] - - Image_Filtering_Algorithms["Image Filtering Algorithms"] - - Image_Segmentation_Algorithms["Image Segmentation Algorithms"] - - Image_Registration_Algorithms["Image Registration Algorithms"] - - Image_Data_Parsers -- "provides data to" --> ImageStack_Data_Structure - - Image_Data_Parsers -- "uses" --> Image_Cropping_Parameters - - Image_Filtering_Algorithms -- "processes" --> ImageStack_Data_Structure - - Image_Segmentation_Algorithms -- "processes" --> ImageStack_Data_Structure - - Image_Registration_Algorithms -- "processes" --> ImageStack_Data_Structure - - ImageStack_Data_Structure -- "is processed by" --> Image_Filtering_Algorithms - - ImageStack_Data_Structure -- "is processed by" --> Image_Segmentation_Algorithms - - ImageStack_Data_Structure -- "is processed by" --> Image_Registration_Algorithms - -``` - - - -[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) - - - -## Details - - - -This subsystem is responsible for the core handling, 
manipulation, and processing of multi-dimensional image data within the `starfish` project. It provides the foundational data structures and algorithms necessary for various image analysis tasks, from loading raw data to applying advanced transformations. - - - -### ImageStack Data Structure - -The fundamental data structure representing a multi-dimensional image. It provides methods for accessing, slicing, and basic manipulation of image data, serving as the primary input and output for image processing operations. It is designed to hold raw or processed fluorescent images for an experiment or a FieldOfView. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.imagestack.imagestack.ImageStack` (67:1273) - - - - - -### Image Data Parsers - -A collection of modules and classes responsible for reading and converting raw image data from various external sources (e.g., numpy arrays, tile fetchers, tilesets) into the internal `TileData` and `TileCollectionData` structures. These intermediate structures are then used to construct `ImageStack` objects. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.imagestack.parser._tiledata` (1:1) - -- `starfish.core.imagestack.parser.crop` (1:1) - -- `starfish.core.imagestack.parser.numpy` (1:1) - -- `starfish.core.imagestack.parser.tilefetcher._parser` (1:1) - -- `starfish.core.imagestack.parser.tileset._parser` (1:1) - - - - - -### Image Cropping Parameters - -Defines the parameters and logic for cropping or slicing image data. This component is utilized by the `Image Parsers` to extract specific regions of interest from larger image stacks, optimizing memory usage and processing time by only loading and processing necessary data. 
- - - - - -**Related Classes/Methods**: - - - -- `starfish.core.imagestack.parser.crop.CropParameters` (10:240) - - - - - -### Image Filtering Algorithms - -A collection of algorithms that apply various filtering techniques (e.g., bandpass, Gaussian, Laplace, deconvolution) to `ImageStack` objects to enhance or modify image data. These algorithms typically take an `ImageStack` as input and produce a new, filtered `ImageStack`. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.image.Filter._base.FilterAlgorithm` (7:12) - -- `starfish.core.image.Filter.bandpass` (1:1) - -- `starfish.core.image.Filter.gaussian_low_pass` (1:1) - -- `starfish.core.image.Filter.richardson_lucy_deconvolution` (1:1) - - - - - -### Image Segmentation Algorithms - -Provides algorithms for identifying and delineating distinct objects or regions within an image. A prominent example is the Watershed algorithm, which is used to separate touching objects. These algorithms typically take an `ImageStack` as input and produce a segmented output, often in the form of a mask or labeled image. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.image.Segment._base.SegmentAlgorithm` (7:17) - -- `starfish.core.image.Segment.watershed` (1:1) - - - - - -### Image Registration Algorithms - -Contains algorithms for aligning multiple images or image stacks to correct for spatial distortions, shifts, or rotations. This is crucial for integrating data from different acquisition rounds or fields of view. It includes algorithms for learning transformations (e.g., Translation) and applying them (e.g., Warp). 
- - - - - -**Related Classes/Methods**: - - - -- `starfish.core.image._registration._base` (1:1) - -- `starfish.core.image._registration.ApplyTransform` (1:1) - -- `starfish.core.image._registration.LearnTransform` (1:1) - - - - - - - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/LLM_Integration_Layer.json b/.codeboarding/LLM_Integration_Layer.json new file mode 100644 index 00000000..06a02966 --- /dev/null +++ b/.codeboarding/LLM_Integration_Layer.json @@ -0,0 +1,57 @@ +{ + "description": "The LLM subsystem is designed around a core LLM Abstraction that provides a consistent interface for interacting with various LLM Provider Implementations. The LLM Creator is responsible for dynamically instantiating the appropriate LLM Provider Implementation based on configuration. Once an LLM interaction occurs, the LLM Response Handler orchestrates the subsequent message processing, including parsing responses and managing tool calls. The LLM Handler Creator facilitates the selection and instantiation of the correct LLM Response Handler. 
This architecture ensures modularity, extensibility, and LLM agnosticism by centralizing interactions through the LLM Abstraction and separating response processing logic.", + "components": [ + { + "name": "LLM Abstraction", + "description": "Serves as the central contract for all LLM interactions, abstracting away the specifics of different LLM Provider Implementations.", + "referenced_source_code": [], + "can_expand": true + }, + { + "name": "LLM Creator", + "description": "Acts as a factory, providing instances of concrete LLM Provider Implementations.", + "referenced_source_code": [], + "can_expand": true + }, + { + "name": "LLM Provider Implementations", + "description": "Concrete implementations that adhere to the LLM Abstraction.", + "referenced_source_code": [], + "can_expand": true + }, + { + "name": "LLM Response Handler", + "description": "Manages the entire post-response flow, including parsing, tool call execution, and message preparation.", + "referenced_source_code": [], + "can_expand": true + }, + { + "name": "LLM Handler Creator", + "description": "Dynamically provides the appropriate LLM Response Handler instance.", + "referenced_source_code": [], + "can_expand": true + } + ], + "components_relations": [ + { + "relation": "instantiates", + "src_name": "LLM Creator", + "dst_name": "LLM Provider Implementations" + }, + { + "relation": "implements", + "src_name": "LLM Provider Implementations", + "dst_name": "LLM Abstraction" + }, + { + "relation": "interacts with", + "src_name": "LLM Response Handler", + "dst_name": "LLM Abstraction" + }, + { + "relation": "instantiates", + "src_name": "LLM Handler Creator", + "dst_name": "LLM Response Handler" + } + ] +} \ No newline at end of file diff --git a/.codeboarding/LLM_Integration_Layer.md b/.codeboarding/LLM_Integration_Layer.md new file mode 100644 index 00000000..09dead0f --- /dev/null +++ b/.codeboarding/LLM_Integration_Layer.md @@ -0,0 +1,52 @@ +```mermaid +graph LR + LLM_Abstraction["LLM 
Abstraction"] + LLM_Creator["LLM Creator"] + LLM_Provider_Implementations["LLM Provider Implementations"] + LLM_Response_Handler["LLM Response Handler"] + LLM_Handler_Creator["LLM Handler Creator"] + LLM_Creator -- "instantiates" --> LLM_Provider_Implementations + LLM_Provider_Implementations -- "implements" --> LLM_Abstraction + LLM_Response_Handler -- "interacts with" --> LLM_Abstraction + LLM_Handler_Creator -- "instantiates" --> LLM_Response_Handler +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + +## Details + +The LLM subsystem is designed around a core LLM Abstraction that provides a consistent interface for interacting with various LLM Provider Implementations. The LLM Creator is responsible for dynamically instantiating the appropriate LLM Provider Implementation based on configuration. Once an LLM interaction occurs, the LLM Response Handler orchestrates the subsequent message processing, including parsing responses and managing tool calls. The LLM Handler Creator facilitates the selection and instantiation of the correct LLM Response Handler. This architecture ensures modularity, extensibility, and LLM agnosticism by centralizing interactions through the LLM Abstraction and separating response processing logic. + +### LLM Abstraction +Serves as the central contract for all LLM interactions, abstracting away the specifics of different LLM Provider Implementations. + + +**Related Classes/Methods**: _None_ + +### LLM Creator +Acts as a factory, providing instances of concrete LLM Provider Implementations. 
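A minimal sketch of this factory, assuming a registry that maps a configured provider name to a class implementing the LLM Abstraction; the names (`BaseLLM`, `LLMCreator`, the toy `EchoLLM` provider) are illustrative, not the actual DocsGPT API:

```python
from abc import ABC, abstractmethod


class BaseLLM(ABC):
    """Contract every provider implementation must satisfy."""

    @abstractmethod
    def gen(self, prompt): ...


class EchoLLM(BaseLLM):
    """Toy provider used here in place of a real OpenAI/Anthropic client."""

    def gen(self, prompt):
        return f"echo: {prompt}"


class LLMCreator:
    """Factory mapping a configured provider name to an implementation."""

    providers = {"echo": EchoLLM}

    @classmethod
    def create_llm(cls, name, **kwargs):
        if name not in cls.providers:
            raise ValueError(f"unsupported LLM provider: {name}")
        return cls.providers[name](**kwargs)
```

Callers hold only a `BaseLLM` reference, so swapping providers is a configuration change rather than a code change.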
+ + +**Related Classes/Methods**: _None_ + +### LLM Provider Implementations +Concrete implementations that adhere to the LLM Abstraction. + + +**Related Classes/Methods**: _None_ + +### LLM Response Handler +Manages the entire post-response flow, including parsing, tool call execution, and message preparation. + + +**Related Classes/Methods**: _None_ + +### LLM Handler Creator +Dynamically provides the appropriate LLM Response Handler instance. + + +**Related Classes/Methods**: _None_ + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Mask_Label_Management.md b/.codeboarding/Mask_Label_Management.md deleted file mode 100644 index 6a222971..00000000 --- a/.codeboarding/Mask_Label_Management.md +++ /dev/null @@ -1,245 +0,0 @@ -```mermaid - -graph LR - - BinaryMaskCollection["BinaryMaskCollection"] - - LabelImage["LabelImage"] - - SegmentationMaskCollection["SegmentationMaskCollection"] - - BinarizeAlgorithm["BinarizeAlgorithm"] - - ThresholdBinarize["ThresholdBinarize"] - - SegmentAlgorithm["SegmentAlgorithm"] - - WatershedSegment["WatershedSegment"] - - FilterAlgorithm["FilterAlgorithm"] - - AreaFilter["AreaFilter"] - - MergeAlgorithm["MergeAlgorithm"] - - SegmentationMaskCollection -- "specializes" --> BinaryMaskCollection - - ThresholdBinarize -- "implements" --> BinarizeAlgorithm - - WatershedSegment -- "implements" --> SegmentAlgorithm - - AreaFilter -- "implements" --> FilterAlgorithm - - BinarizeAlgorithm -- "produces" --> BinaryMaskCollection - - SegmentAlgorithm -- "produces" --> LabelImage - - LabelImage -- "can be converted to" --> BinaryMaskCollection - - FilterAlgorithm -- "operates on" --> BinaryMaskCollection - - FilterAlgorithm -- "operates on" --> LabelImage - - MergeAlgorithm -- "combines" --> BinaryMaskCollection - -``` - - - 
-[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) - - - -## Details - - - -The `Mask & Label Management` subsystem in `starfish` is responsible for the creation, manipulation, and processing of binary masks, labeled images, and segmentation masks. It provides a robust framework for various morphological operations, ensuring data consistency and extensibility through well-defined interfaces and specialized data structures. - - - -### BinaryMaskCollection - -This is the foundational data structure for storing and manipulating collections of binary masks. It allows for the creation of masks from various sources (arrays, label images, external files) and provides methods for accessing, cropping, and reducing individual masks. It also manages the normalization of pixel and physical coordinate systems, ensuring spatial consistency. - - - - - -**Related Classes/Methods**: - - - -- `BinaryMaskCollection` (0:0) - - - - - -### LabelImage - -Represents an image where each distinct object or region is assigned a unique integer label. It supports creation from arrays and coordinate ticks and can be converted into a `BinaryMaskCollection`. This component is crucial for representing segmented regions before they are converted into binary masks. - - - - - -**Related Classes/Methods**: - - - -- `LabelImage` (28:167) - - - - - -### SegmentationMaskCollection - -A specialized subclass of `BinaryMaskCollection` tailored specifically for handling segmentation masks. 
It inherits all functionalities from its parent and adds specific methods relevant to segmentation, such as loading from compressed archives, making it the primary data structure for segmentation results. - - - - - -**Related Classes/Methods**: - - - -- `SegmentationMaskCollection` (0:0) - - - - - -### BinarizeAlgorithm - -Defines the abstract interface for all binarization algorithms. These algorithms convert an input image into a binary mask, typically by applying a threshold. This abstraction allows for different binarization methods to be implemented and used interchangeably. - - - - - -**Related Classes/Methods**: - - - -- `BinarizeAlgorithm` (0:0) - - - - - -### ThresholdBinarize - -A concrete implementation of `BinarizeAlgorithm` that performs binarization by applying a threshold to an input image, resulting in a `BinaryMaskCollection`. This is a common and essential binarization technique. - - - - - -**Related Classes/Methods**: - - - -- `ThresholdBinarize` (0:0) - - - - - -### SegmentAlgorithm - -Defines the abstract interface for segmentation algorithms, which are responsible for identifying and delineating distinct objects or regions within an image. This provides a common contract for various segmentation approaches. - - - - - -**Related Classes/Methods**: - - - -- `SegmentAlgorithm` (0:0) - - - - - -### WatershedSegment - -A concrete implementation of `SegmentAlgorithm` that uses the watershed algorithm for image segmentation. It takes an image and produces a `LabelImage`, which can then be converted to a `BinaryMaskCollection`. This algorithm is a powerful tool for separating touching objects. - - - - - -**Related Classes/Methods**: - - - -- `WatershedSegment` (0:0) - - - - - -### FilterAlgorithm - -Defines the abstract interface for filtering algorithms that operate on `BinaryMaskCollection` or `LabelImage` objects to refine, select, or modify regions based on certain criteria (e.g., size, shape). This allows for post-processing of masks and labels. 
- - - - - -**Related Classes/Methods**: - - - -- `FilterAlgorithm` (0:0) - - - - - -### AreaFilter - -A concrete implementation of `FilterAlgorithm` that filters `BinaryMaskCollection` objects based on the area of the individual masks. This is a practical example of how filtering can be applied to refine segmentation results. - - - - - -**Related Classes/Methods**: - - - -- `AreaFilter` (0:0) - - - - - -### MergeAlgorithm - -Defines the abstract interface for algorithms that combine multiple `BinaryMaskCollection` objects into a single, unified collection. This is essential for integrating masks from different sources or processing steps. - - - - - -**Related Classes/Methods**: - - - -- `MergeAlgorithm` (0:0) - - - - - - - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Retrieval_Module.json b/.codeboarding/Retrieval_Module.json new file mode 100644 index 00000000..3ef9c695 --- /dev/null +++ b/.codeboarding/Retrieval_Module.json @@ -0,0 +1,38 @@ +{ + "description": "The Retrieval Module is a core subsystem responsible for efficiently fetching the most relevant document chunks from the Vector Database based on user queries. It acts as the bridge between the user's information need and the contextual data required by the Large Language Model (LLM) for generating responses. Its boundaries encompass the logic for query processing, interaction with the knowledge base, and preparation of retrieved content.", + "components": [ + { + "name": "Retriever Creator", + "description": "This component serves as a factory for instantiating various retriever implementations. It abstracts the creation process, allowing other parts of the system to obtain a retriever instance (e.g., `Classic RAG Retriever`) without needing to know the specific concrete class or its initialization details. 
This design promotes modularity, extensibility, and supports the project's \"LLM Agnosticism\" and \"Modularity\" architectural biases by enabling easy swapping or addition of different retrieval strategies.", + "referenced_source_code": [ + { + "qualified_name": "Retriever Creator", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/retriever/retriever_creator.py", + "reference_start_line": 1, + "reference_end_line": 9999 + } + ], + "can_expand": true + }, + { + "name": "Classic RAG Retriever", + "description": "This component embodies a concrete and fundamental retrieval strategy within the RAG pipeline. It encapsulates the core logic for transforming a user query into an effective search query, interacting with the Vector Database to retrieve the most relevant document chunks, and preparing these chunks as contextual information for the LLM. It represents the \"how\" of fetching information in a standard RAG flow.", + "referenced_source_code": [ + { + "qualified_name": "Classic RAG Retriever", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/retriever/classic_rag.py", + "reference_start_line": 1, + "reference_end_line": 9999 + } + ], + "can_expand": false + } + ], + "components_relations": [ + { + "relation": "instantiates", + "src_name": "Retriever Creator", + "dst_name": "Classic RAG Retriever" + } + ] +} \ No newline at end of file diff --git a/.codeboarding/Retrieval_Module.md b/.codeboarding/Retrieval_Module.md new file mode 100644 index 00000000..534c86d9 --- /dev/null +++ b/.codeboarding/Retrieval_Module.md @@ -0,0 +1,34 @@ +```mermaid +graph LR + Retriever_Creator["Retriever Creator"] + Classic_RAG_Retriever["Classic RAG Retriever"] + Retriever_Creator -- "instantiates" --> Classic_RAG_Retriever +``` + 
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + +## Details + +The Retrieval Module is a core subsystem responsible for efficiently fetching the most relevant document chunks from the Vector Database based on user queries. It acts as the bridge between the user's information need and the contextual data required by the Large Language Model (LLM) for generating responses. Its boundaries encompass the logic for query processing, interaction with the knowledge base, and preparation of retrieved content. + +### Retriever Creator +This component serves as a factory for instantiating various retriever implementations. It abstracts the creation process, allowing other parts of the system to obtain a retriever instance (e.g., `Classic RAG Retriever`) without needing to know the specific concrete class or its initialization details. This design promotes modularity, extensibility, and supports the project's "LLM Agnosticism" and "Modularity" architectural biases by enabling easy swapping or addition of different retrieval strategies. + + +**Related Classes/Methods**: + +- `Retriever Creator`:1-9999 + + +### Classic RAG Retriever +This component embodies a concrete and fundamental retrieval strategy within the RAG pipeline. It encapsulates the core logic for transforming a user query into an effective search query, interacting with the Vector Database to retrieve the most relevant document chunks, and preparing these chunks as contextual information for the LLM. It represents the "how" of fetching information in a standard RAG flow. 
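The query-to-context flow this component implements can be illustrated with a toy, self-contained sketch. The bag-of-words "embedding" and Jaccard similarity are stand-ins for a real embedding model and the Vector Database; only the shape of the flow (embed query, rank chunks, assemble context) mirrors the description above.

```python
# Toy classic-RAG retrieval step: embed the query, rank stored chunks by
# similarity, and assemble the context string handed to the LLM.
def embed(text: str) -> set:
    # Bag-of-words stand-in for a real embedding model.
    return set(text.lower().split())


def similarity(a: set, b: set) -> float:
    # Jaccard overlap as a stand-in for cosine similarity.
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve(query: str, chunks: list, k: int = 2) -> list:
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)
    return ranked[:k]


def build_context(query: str, chunks: list) -> str:
    # Join the top-k chunks into the contextual block for the LLM prompt.
    return "\n---\n".join(retrieve(query, chunks))


docs = [
    "DocsGPT answers questions about documentation",
    "Vector stores index document embeddings",
    "Bananas are yellow",
]
context = build_context("how are document embeddings indexed", docs)
```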
+ + +**Related Classes/Methods**: + +- `Classic RAG Retriever`:1-9999 + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Spot_Intensity_Analysis.md b/.codeboarding/Spot_Intensity_Analysis.md deleted file mode 100644 index e96b546a..00000000 --- a/.codeboarding/Spot_Intensity_Analysis.md +++ /dev/null @@ -1,215 +0,0 @@ -```mermaid - -graph LR - - Raw_Image_Data["Raw Image Data"] - - Codebook["Codebook"] - - Spot_Detection_Algorithms["Spot Detection Algorithms"] - - Spot_Intensity_Metadata["Spot Intensity & Metadata"] - - Spot_Decoding_Algorithms["Spot Decoding Algorithms"] - - Pixel_Level_Detection_Decoding["Pixel-Level Detection & Decoding"] - - Segmentation_Masks["Segmentation Masks"] - - Spot_Assignment_Algorithms["Spot Assignment Algorithms"] - - Raw_Image_Data -- "is processed by" --> Spot_Detection_Algorithms - - Raw_Image_Data -- "is processed by" --> Pixel_Level_Detection_Decoding - - Spot_Detection_Algorithms -- "produces" --> Spot_Intensity_Metadata - - Codebook -- "is used by" --> Spot_Decoding_Algorithms - - Codebook -- "is used by" --> Pixel_Level_Detection_Decoding - - Spot_Intensity_Metadata -- "is consumed by" --> Spot_Decoding_Algorithms - - Spot_Decoding_Algorithms -- "produces decoded data for" --> Spot_Intensity_Metadata - - Pixel_Level_Detection_Decoding -- "produces" --> Spot_Intensity_Metadata - - Spot_Intensity_Metadata -- "is consumed by" --> Spot_Assignment_Algorithms - - Segmentation_Masks -- "is used by" --> Spot_Assignment_Algorithms - - Spot_Assignment_Algorithms -- "produces spatially assigned data for" --> Spot_Intensity_Metadata - -``` - - - 
-[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) - - - -## Details - - - -This subsystem is responsible for the core computational steps of identifying, quantifying, and decoding individual RNA molecules (spots) within microscopy images, and subsequently assigning them to specific biological targets or regions. It transforms raw image data into quantitative expression profiles. - - - -### Raw Image Data - -The primary input data structure representing the multi-dimensional raw image data acquired from the experiment. It serves as the foundational source of pixel intensities upon which all spot analysis operations are performed. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.imagestack.imagestack` (0:0) - - - - - -### Codebook - -A critical reference data structure containing the expected intensity profiles (barcodes) for known biological targets (e.g., genes). It is used by decoding algorithms to assign identities to detected spots based on their measured intensity patterns. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.codebook.codebook` (0:0) - - - - - -### Spot Detection Algorithms - -A collection of algorithms (e.g., BlobDetector, LocalMaxPeakFinder, TrackpyLocalMaxPeakFinder) responsible for identifying and localizing potential spots within the ImageStack. They output initial spot candidates and their raw intensity properties. 
- - - - - -**Related Classes/Methods**: - - - -- `starfish.core.spots.FindSpots` (0:0) - - - - - -### Spot Intensity & Metadata - -These are central data structures that store the quantitative and descriptive information about detected and decoded spots. IntensityTable holds raw intensity profiles, DecodedIntensityTable extends this with assigned biological identities, SpotAttributes stores spatial coordinates and other properties, and DecodedSpots aggregates the final decoded spot information. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.intensity_table.intensity_table` (0:0) - -- `starfish.core.intensity_table.decoded_intensity_table` (0:0) - -- `starfish.core.types._spot_attributes` (0:0) - -- `starfish.core.types._decoded_spots` (0:0) - - - - - -### Spot Decoding Algorithms - -A suite of algorithms (e.g., CheckAll, MetricDistance, PerRoundMaxChannel, SimpleLookupDecoder) that take the raw intensity profiles from IntensityTable and, using the Codebook, assign a biological identity (e.g., gene name) to each spot. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.spots.DecodeSpots` (0:0) - - - - - -### Pixel-Level Detection & Decoding - -Algorithms (e.g., PixelSpotDecoder) that perform spot detection and decoding directly at the pixel level. This can be an alternative or complementary approach to FindSpots and DecodeSpots, especially useful for dense or overlapping spots. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.spots.DetectPixels` (0:0) - - - - - -### Segmentation Masks - -A data structure representing segmented biological regions (e.g., cells, nuclei) as binary masks. These masks are used to associate detected and decoded spots with specific anatomical or cellular contexts. 
- - - - - -**Related Classes/Methods**: - - - -- `starfish.core.morphology.binary_mask.binary_mask` (0:0) - -- `starfish.core.segmentation_mask.segmentation_mask` (0:0) - - - - - -### Spot Assignment Algorithms - -Algorithms (e.g., Label) that assign decoded spots to specific biological targets or regions, often by overlaying them with segmented regions represented by Segmentation Masks. - - - - - -**Related Classes/Methods**: - - - -- `starfish.core.spots.AssignTargets` (0:0) - - - - - - - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/User_Interface_UI_.json b/.codeboarding/User_Interface_UI_.json new file mode 100644 index 00000000..486a146a --- /dev/null +++ b/.codeboarding/User_Interface_UI_.json @@ -0,0 +1,112 @@ +{ + "description": "The DocsGPT application is structured around a core `App` component that orchestrates the user interface and manages the overall application flow. It renders key functional components such as `Conversation`, `Upload`, and `Settings`. The `Conversation` component provides the interactive chat interface, allowing users to query the RAG system and view responses. The `Upload` component handles the backend processing of document ingestion, integrating new knowledge into the system. The `Settings` component centralizes the application's configuration, providing critical parameters for LLM providers, API keys, and vector store settings, which are utilized by other components like `Upload`. Additionally, the `DocsGPTWidget` offers an embeddable version of the chat functionality, leveraging the core logic of the `Conversation` component for external website integration. This architecture ensures a modular and maintainable system, with clear responsibilities and interaction pathways between components.", + "components": [ + { + "name": "App", + "description": "Acts as the main application orchestrator and entry point. 
It manages global UI state (e.g., authentication status, theming), handles client-side routing, and defines the overall layout and navigation structure of the DocsGPT application. It ensures a cohesive user experience across different functionalities.", + "referenced_source_code": [ + { + "qualified_name": "App", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/extensions/chatwoot/app.py", + "reference_start_line": 48, + "reference_end_line": 48 + } + ], + "can_expand": true + }, + { + "name": "Conversation", + "description": "Manages the core interactive chat interface. This component is central to the RAG system's user interaction, displaying conversation history, allowing users to input queries, and presenting the generated responses from the backend. It also facilitates user feedback on responses, which is crucial for iterative improvement of the RAG model.", + "referenced_source_code": [ + { + "qualified_name": "Conversation", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/extensions/chatwoot/app.py", + "reference_start_line": 61, + "reference_end_line": 61 + } + ], + "can_expand": false + }, + { + "name": "Upload", + "description": "Handles the backend logic for ingesting new documents into the RAG system's knowledge base. It processes file uploads, manages different ingestor types, and interacts with the parsing and vector store components to store the document content.", + "referenced_source_code": [ + { + "qualified_name": "Upload", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/api/user/routes.py", + "reference_start_line": 164, + "reference_end_line": 184 + } + ], + "can_expand": true + }, + { + "name": "Settings", + "description": "Manages application-wide configurations and environment variables. 
This component is responsible for loading and providing access to various settings, including LLM provider details, API keys, vector store configurations, and other operational parameters that influence the RAG system's behavior.", + "referenced_source_code": [ + { + "qualified_name": "Settings", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/elasticsearch.py", + "reference_start_line": 121, + "reference_end_line": 121 + } + ], + "can_expand": true + }, + { + "name": "DocsGPTWidget", + "description": "Serves as an embeddable DocsGPT chat widget, designed for seamless integration into external websites. It encapsulates core chat functionality, providing a portable and lightweight experience of the DocsGPT RAG system.", + "referenced_source_code": [ + { + "qualified_name": "DocsGPTWidget", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/extensions/react-widget/src/components/DocsGPTWidget.tsx", + "reference_start_line": 554, + "reference_end_line": 586 + } + ], + "can_expand": false + } + ], + "components_relations": [ + { + "relation": "renders", + "src_name": "App", + "dst_name": "Conversation" + }, + { + "relation": "renders", + "src_name": "App", + "dst_name": "Upload" + }, + { + "relation": "renders", + "src_name": "App", + "dst_name": "Settings" + }, + { + "relation": "manages state for", + "src_name": "App", + "dst_name": "Conversation" + }, + { + "relation": "manages state for", + "src_name": "App", + "dst_name": "Upload" + }, + { + "relation": "manages state for", + "src_name": "App", + "dst_name": "Settings" + }, + { + "relation": "leverages core logic from", + "src_name": "DocsGPTWidget", + "dst_name": "Conversation" + }, + { + "relation": "configures via", + "src_name": "Upload", + "dst_name": "Settings" + } + ] +} \ No newline at end of file diff --git a/.codeboarding/User_Interface_UI_.md b/.codeboarding/User_Interface_UI_.md new file mode 100644 index 00000000..3e67ca70 --- /dev/null +++ 
b/.codeboarding/User_Interface_UI_.md @@ -0,0 +1,71 @@ +```mermaid +graph LR + App["App"] + Conversation["Conversation"] + Upload["Upload"] + Settings["Settings"] + DocsGPTWidget["DocsGPTWidget"] + App -- "renders" --> Conversation + App -- "renders" --> Upload + App -- "renders" --> Settings + App -- "manages state for" --> Conversation + App -- "manages state for" --> Upload + App -- "manages state for" --> Settings + DocsGPTWidget -- "leverages core logic from" --> Conversation + Upload -- "configures via" --> Settings +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + +## Details + +The DocsGPT application is structured around a core `App` component that orchestrates the user interface and manages the overall application flow. It renders key functional components such as `Conversation`, `Upload`, and `Settings`. The `Conversation` component provides the interactive chat interface, allowing users to query the RAG system and view responses. The `Upload` component handles the backend processing of document ingestion, integrating new knowledge into the system. The `Settings` component centralizes the application's configuration, providing critical parameters for LLM providers, API keys, and vector store settings, which are utilized by other components like `Upload`. Additionally, the `DocsGPTWidget` offers an embeddable version of the chat functionality, leveraging the core logic of the `Conversation` component for external website integration. This architecture ensures a modular and maintainable system, with clear responsibilities and interaction pathways between components. 
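The "Upload configures via Settings" relationship above can be sketched in plain Python. The field names and the ingestion logic here are hypothetical simplifications; DocsGPT's actual `Settings` object and ingestion pipeline differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical stand-in for the application-wide configuration."""

    vectorstore: str = "faiss"
    chunk_size: int = 512


class Upload:
    """Ingests documents, reading its operating parameters from Settings."""

    def __init__(self, settings: Settings):
        self.settings = settings

    def ingest(self, text: str) -> dict:
        size = self.settings.chunk_size
        chunks = [text[i:i + size] for i in range(0, len(text), size)]
        # In the real pipeline these chunks would be embedded and written
        # to the vector store backend named in the settings.
        return {"backend": self.settings.vectorstore, "chunks": len(chunks)}


result = Upload(Settings(chunk_size=4)).ingest("abcdefghij")
```

The point of the pattern is that `Upload` never hard-codes a backend or chunking parameter: swapping vector stores is a configuration change, not a code change.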
+ +### App +Acts as the main application orchestrator and entry point. It manages global UI state (e.g., authentication status, theming), handles client-side routing, and defines the overall layout and navigation structure of the DocsGPT application. It ensures a cohesive user experience across different functionalities. + + +**Related Classes/Methods**: + +- `App` + + +### Conversation +Manages the core interactive chat interface. This component is central to the RAG system's user interaction, displaying conversation history, allowing users to input queries, and presenting the generated responses from the backend. It also facilitates user feedback on responses, which is crucial for iterative improvement of the RAG model. + + +**Related Classes/Methods**: + +- `Conversation` + + +### Upload +Handles the backend logic for ingesting new documents into the RAG system's knowledge base. It processes file uploads, manages different ingestor types, and interacts with the parsing and vector store components to store the document content. + + +**Related Classes/Methods**: + +- `Upload`:164-184 + + +### Settings +Manages application-wide configurations and environment variables. This component is responsible for loading and providing access to various settings, including LLM provider details, API keys, vector store configurations, and other operational parameters that influence the RAG system's behavior. + + +**Related Classes/Methods**: + +- `Settings` + + +### DocsGPTWidget +Serves as an embeddable DocsGPT chat widget, designed for seamless integration into external websites. It encapsulates core chat functionality, providing a portable and lightweight experience of the DocsGPT RAG system. 
+ + +**Related Classes/Methods**: + +- `DocsGPTWidget`:554-586 + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Vector_Database_Knowledge_Base.json b/.codeboarding/Vector_Database_Knowledge_Base.json new file mode 100644 index 00000000..c1c3e749 --- /dev/null +++ b/.codeboarding/Vector_Database_Knowledge_Base.json @@ -0,0 +1,196 @@ +{ + "description": "The feedback correctly identified an issue with the VectorStoreCreator component's source file reference. The original analysis had FileRef: None, which has been corrected to FileRef: /home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/vector_creator.py based on the readFile tool's output. It appears there was a typo in the original QName and FileRef for VectorStoreCreator, as the correct file is vector_creator.py not vectorstore_creator.py. The VectorCreator (formerly VectorStoreCreator) acts as a factory, centralizing the creation of various vector store instances. This design pattern allows the application to dynamically switch between different vector store backends (e.g., FAISS, MongoDB, PGVector) at runtime, promoting modularity and decoupling.", + "components": [ + { + "name": "VectorStoreBase", + "description": "Defines the abstract interface for all vector store operations, including adding documents, performing similarity searches, and managing embedding model configurations. 
It establishes the contract for how any vector store backend should behave, ensuring extensibility and interchangeability.", + "referenced_source_code": [ + { + "qualified_name": "application.vectorstore.base.VectorStoreBase", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/base.py", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": false + }, + { + "name": "VectorCreator", + "description": "Centralizes the logic for creating instances of specific vector store implementations (e.g., FAISS, MongoDB, PGVector) based on system configuration. This factory pattern enables dynamic backend switching at runtime and promotes modularity by decoupling the client from concrete vector store classes.", + "referenced_source_code": [ + { + "qualified_name": "application.vectorstore.vector_creator.VectorCreator", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/vector_creator.py", + "reference_start_line": 9, + "reference_end_line": 24 + } + ], + "can_expand": false + }, + { + "name": "EmbeddingsWrapper", + "description": "Encapsulates the logic for interacting with various embedding models. Its primary function is to convert textual data into high-dimensional numerical vectors (embeddings) that can be stored in the vector database and used for similarity search. This component abstracts away the specifics of different embedding providers.", + "referenced_source_code": [ + { + "qualified_name": "application.vectorstore.base.EmbeddingsWrapper", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/base.py", + "reference_start_line": 7, + "reference_end_line": 24 + } + ], + "can_expand": true + }, + { + "name": "FAISSVectorStore", + "description": "Provides a concrete implementation of the VectorStoreBase interface, leveraging the FAISS library for efficient similarity search on locally stored vector indexes. 
It's suitable for smaller-scale deployments or local development.", + "referenced_source_code": [ + { + "qualified_name": "application.vectorstore.faiss.FAISSVectorStore", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/faiss.py", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": true + }, + { + "name": "MongoDBVectorStore", + "description": "Provides a concrete implementation of the VectorStoreBase interface, utilizing MongoDB as the backend for storing documents and their associated embeddings. This allows for scalable, document-oriented storage with vector search capabilities.", + "referenced_source_code": [ + { + "qualified_name": "application.vectorstore.mongodb.MongoDBVectorStore", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/mongodb.py", + "reference_start_line": 7, + "reference_end_line": 177 + } + ], + "can_expand": false + }, + { + "name": "PGVectorStore", + "description": "Provides a concrete implementation of the VectorStoreBase interface, integrating with PostgreSQL databases via the PGVector extension. This enables storing and querying embeddings directly within a robust relational database system.", + "referenced_source_code": [ + { + "qualified_name": "application.vectorstore.pgvector.PGVectorStore", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/pgvector.py", + "reference_start_line": 8, + "reference_end_line": 303 + } + ], + "can_expand": true + }, + { + "name": "ElasticsearchVectorStore", + "description": "Provides a concrete implementation of the VectorStoreBase interface, leveraging Elasticsearch for scalable, distributed storage and search of documents and their embeddings. 
It's well-suited for large datasets and complex query capabilities.", + "referenced_source_code": [ + { + "qualified_name": "application.vectorstore.elasticsearch.ElasticsearchVectorStore", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/elasticsearch.py", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": true + }, + { + "name": "LanceDBVectorStore", + "description": "Provides a concrete implementation of the VectorStoreBase interface, utilizing LanceDB for efficient, local, and serverless vector storage. It's designed for high-performance similarity search on embedded data.", + "referenced_source_code": [ + { + "qualified_name": "application.vectorstore.lancedb.LanceDBVectorStore", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/lancedb.py", + "reference_start_line": 6, + "reference_end_line": 119 + } + ], + "can_expand": false + } + ], + "components_relations": [ + { + "relation": "utilizes", + "src_name": "VectorStoreBase", + "dst_name": "EmbeddingsWrapper" + }, + { + "relation": "implements", + "src_name": "FAISSVectorStore", + "dst_name": "VectorStoreBase" + }, + { + "relation": "implements", + "src_name": "MongoDBVectorStore", + "dst_name": "VectorStoreBase" + }, + { + "relation": "implements", + "src_name": "PGVectorStore", + "dst_name": "VectorStoreBase" + }, + { + "relation": "implements", + "src_name": "ElasticsearchVectorStore", + "dst_name": "VectorStoreBase" + }, + { + "relation": "implements", + "src_name": "LanceDBVectorStore", + "dst_name": "VectorStoreBase" + }, + { + "relation": "creates", + "src_name": "VectorCreator", + "dst_name": "FAISSVectorStore" + }, + { + "relation": "creates", + "src_name": "VectorCreator", + "dst_name": "MongoDBVectorStore" + }, + { + "relation": "creates", + "src_name": "VectorCreator", + "dst_name": "PGVectorStore" + }, + { + "relation": "creates", + "src_name": "VectorCreator", + "dst_name": 
"ElasticsearchVectorStore" + }, + { + "relation": "creates", + "src_name": "VectorCreator", + "dst_name": "LanceDBVectorStore" + }, + { + "relation": "relies on", + "src_name": "VectorCreator", + "dst_name": "VectorStoreBase" + }, + { + "relation": "utilizes", + "src_name": "FAISSVectorStore", + "dst_name": "EmbeddingsWrapper" + }, + { + "relation": "utilizes", + "src_name": "MongoDBVectorStore", + "dst_name": "EmbeddingsWrapper" + }, + { + "relation": "utilizes", + "src_name": "PGVectorStore", + "dst_name": "EmbeddingsWrapper" + }, + { + "relation": "utilizes", + "src_name": "ElasticsearchVectorStore", + "dst_name": "EmbeddingsWrapper" + }, + { + "relation": "utilizes", + "src_name": "LanceDBVectorStore", + "dst_name": "EmbeddingsWrapper" + } + ] +} \ No newline at end of file diff --git a/.codeboarding/Vector_Database_Knowledge_Base.md b/.codeboarding/Vector_Database_Knowledge_Base.md new file mode 100644 index 00000000..1b5598a9 --- /dev/null +++ b/.codeboarding/Vector_Database_Knowledge_Base.md @@ -0,0 +1,110 @@ +```mermaid +graph LR + VectorStoreBase["VectorStoreBase"] + VectorCreator["VectorCreator"] + EmbeddingsWrapper["EmbeddingsWrapper"] + FAISSVectorStore["FAISSVectorStore"] + MongoDBVectorStore["MongoDBVectorStore"] + PGVectorStore["PGVectorStore"] + ElasticsearchVectorStore["ElasticsearchVectorStore"] + LanceDBVectorStore["LanceDBVectorStore"] + VectorStoreBase -- "utilizes" --> EmbeddingsWrapper + FAISSVectorStore -- "implements" --> VectorStoreBase + MongoDBVectorStore -- "implements" --> VectorStoreBase + PGVectorStore -- "implements" --> VectorStoreBase + ElasticsearchVectorStore -- "implements" --> VectorStoreBase + LanceDBVectorStore -- "implements" --> VectorStoreBase + VectorCreator -- "creates" --> FAISSVectorStore + VectorCreator -- "creates" --> MongoDBVectorStore + VectorCreator -- "creates" --> PGVectorStore + VectorCreator -- "creates" --> ElasticsearchVectorStore + VectorCreator -- "creates" --> LanceDBVectorStore + VectorCreator -- 
"relies on" --> VectorStoreBase + FAISSVectorStore -- "utilizes" --> EmbeddingsWrapper + MongoDBVectorStore -- "utilizes" --> EmbeddingsWrapper + PGVectorStore -- "utilizes" --> EmbeddingsWrapper + ElasticsearchVectorStore -- "utilizes" --> EmbeddingsWrapper + LanceDBVectorStore -- "utilizes" --> EmbeddingsWrapper +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + +## Details + +The feedback correctly identified an issue with the VectorStoreCreator component's source file reference. The original analysis had FileRef: None, which has been corrected to FileRef: /home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/vector_creator.py based on the readFile tool's output. It appears there was a typo in the original QName and FileRef for VectorStoreCreator, as the correct file is vector_creator.py not vectorstore_creator.py. The VectorCreator (formerly VectorStoreCreator) acts as a factory, centralizing the creation of various vector store instances. This design pattern allows the application to dynamically switch between different vector store backends (e.g., FAISS, MongoDB, PGVector) at runtime, promoting modularity and decoupling. + +### VectorStoreBase +Defines the abstract interface for all vector store operations, including adding documents, performing similarity searches, and managing embedding model configurations. It establishes the contract for how any vector store backend should behave, ensuring extensibility and interchangeability. 
+ + +**Related Classes/Methods**: + +- `application.vectorstore.base.VectorStoreBase` + + +### VectorCreator +Centralizes the logic for creating instances of specific vector store implementations (e.g., FAISS, MongoDB, PGVector) based on system configuration. This factory pattern enables dynamic backend switching at runtime and promotes modularity by decoupling the client from concrete vector store classes. + + +**Related Classes/Methods**: + +- `application.vectorstore.vector_creator.VectorCreator`:9-24 + + +### EmbeddingsWrapper +Encapsulates the logic for interacting with various embedding models. Its primary function is to convert textual data into high-dimensional numerical vectors (embeddings) that can be stored in the vector database and used for similarity search. This component abstracts away the specifics of different embedding providers. + + +**Related Classes/Methods**: + +- `application.vectorstore.base.EmbeddingsWrapper`:7-24 + + +### FAISSVectorStore +Provides a concrete implementation of the VectorStoreBase interface, leveraging the FAISS library for efficient similarity search on locally stored vector indexes. It's suitable for smaller-scale deployments or local development. + + +**Related Classes/Methods**: + +- `application.vectorstore.faiss.FAISSVectorStore` + + +### MongoDBVectorStore +Provides a concrete implementation of the VectorStoreBase interface, utilizing MongoDB as the backend for storing documents and their associated embeddings. This allows for scalable, document-oriented storage with vector search capabilities. + + +**Related Classes/Methods**: + +- `application.vectorstore.mongodb.MongoDBVectorStore`:7-177 + + +### PGVectorStore +Provides a concrete implementation of the VectorStoreBase interface, integrating with PostgreSQL databases via the PGVector extension. This enables storing and querying embeddings directly within a robust relational database system. 
+ + +**Related Classes/Methods**: + +- `application.vectorstore.pgvector.PGVectorStore`:8-303 + + +### ElasticsearchVectorStore +Provides a concrete implementation of the VectorStoreBase interface, leveraging Elasticsearch for scalable, distributed storage and search of documents and their embeddings. It's well-suited for large datasets and complex query capabilities. + + +**Related Classes/Methods**: + +- `application.vectorstore.elasticsearch.ElasticsearchVectorStore` + + +### LanceDBVectorStore +Provides a concrete implementation of the VectorStoreBase interface, utilizing LanceDB for efficient, local, and serverless vector storage. It's designed for high-performance similarity search on embedded data. + + +**Related Classes/Methods**: + +- `application.vectorstore.lancedb.LanceDBVectorStore`:6-119 + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/analysis.json b/.codeboarding/analysis.json new file mode 100644 index 00000000..3d591380 --- /dev/null +++ b/.codeboarding/analysis.json @@ -0,0 +1,351 @@ +{ + "description": "DocsGPT operates on a clear client-server architecture, with the User Interface (UI) serving as the primary interaction point. User requests are sent to the Backend Core, which acts as the central orchestrator. The Backend Core handles routing, authentication, and core application logic. For long-running operations like document ingestion, tasks are enqueued to the Asynchronous Task Worker.\n\nWhen a user query requires information retrieval, the Backend Core interacts with the Retrieval Module, which in turn queries the Vector Database / Knowledge Base to fetch relevant document chunks. The retrieved context, along with the user's query, is then forwarded to the LLM Integration Layer. This layer provides a unified interface for various Large Language Models. 
For complex tasks, the LLM Integration Layer can delegate to the Agentic Reasoning & External Tools component, which leverages external tools and APIs to fulfill the request.\n\nThe Data Ingestion & Storage component is responsible for processing and storing documents, including parsing, chunking, and embedding, before they are stored in the Vector Database / Knowledge Base. Finally, the LLM Integration Layer sends the generated answers back to the Backend Core, which then relays them to the User Interface. This architecture ensures a scalable, modular, and efficient flow of data and operations within the DocsGPT system.", + "components": [ + { + "name": "User Interface (UI)", + "description": "The interactive frontend for users to engage with DocsGPT, encompassing chat functionalities, document management, and application settings.", + "referenced_source_code": [ + { + "qualified_name": "frontend.src.App", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/frontend/src/App.tsx", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "frontend.src.conversation.Conversation", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/frontend/src/conversation/Conversation.tsx", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "frontend.src.upload.Upload", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/frontend/src/upload/Upload.tsx", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "frontend.src.settings.index", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/frontend/src/settings/index.tsx", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "extensions.react_widget.src.components.DocsGPTWidget", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/extensions/react-widget/src/components/DocsGPTWidget.tsx", + "reference_start_line": 0, + "reference_end_line": 0 
+ } + ], + "can_expand": true + }, + { + "name": "Backend Core", + "description": "Acts as the central entry point for all frontend requests, routing them to appropriate backend services, and managing core application logic, authentication, and configuration.", + "referenced_source_code": [ + { + "qualified_name": "application.app", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/app.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.api.answer.routes", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/api/answer/routes", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.api.user.routes", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/api/user/routes.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.auth", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/auth.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.core.settings", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/core/settings.py", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": true + }, + { + "name": "Data Ingestion & Storage", + "description": "Handles the entire lifecycle of data preparation, including loading, parsing, chunking, and embedding various data sources, and manages the persistent storage and retrieval of raw and processed files.", + "referenced_source_code": [ + { + "qualified_name": "application.parser.embedding_pipeline", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/embedding_pipeline.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.parser.chunking", + "reference_file": 
"/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/chunking.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.parser.file", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/file", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.parser.remote", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/remote", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.storage.storage_creator", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/storage/storage_creator.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.storage.s3", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/storage/s3.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.storage.local", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/storage/local.py", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": true + }, + { + "name": "Vector Database / Knowledge Base", + "description": "Serves as the persistent storage for embedded document chunks, enabling efficient semantic search and acting as the system's primary knowledge repository. 
Supports multiple backend implementations.", + "referenced_source_code": [ + { + "qualified_name": "application.vectorstore.base", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/base.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.vectorstore.faiss", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/faiss.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.vectorstore.mongodb", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/mongodb.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.vectorstore.pgvector", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/pgvector.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.vectorstore.vector_creator", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/vector_creator.py", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": true + }, + { + "name": "Retrieval Module", + "description": "Focuses on fetching the most relevant document chunks from the Vector Database based on user queries, preparing the contextual information required by the LLM.", + "referenced_source_code": [ + { + "qualified_name": "application.retriever.classic_rag", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/retriever/classic_rag.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.retriever.retriever_creator", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/retriever/retriever_creator.py", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": true + }, + { + "name": "LLM Integration Layer", + "description": "Provides a unified abstraction for interacting with 
diverse Large Language Models (LLMs), managing model selection, message formatting, and handling streaming or batch responses.", + "referenced_source_code": [ + { + "qualified_name": "application.llm.base", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/base.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.llm.llm_creator", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/llm_creator.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.llm.openai", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/openai.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.llm.google_ai", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/google_ai.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.llm.anthropic", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/anthropic.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.llm.handlers", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/handlers", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": true + }, + { + "name": "Agentic Reasoning & External Tools", + "description": "Empowers the LLM to execute complex, multi-step tasks by breaking them down into sub-problems and leveraging a suite of external tools and APIs (e.g., web search, TTS) to gather information or perform actions.", + "referenced_source_code": [ + { + "qualified_name": "application.agents.base", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/base.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": 
"application.agents.react_agent", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/react_agent.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.agents.agent_creator", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/agent_creator.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.agents.tools", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/tools", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.tts.elevenlabs", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/tts/elevenlabs.py", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": true + }, + { + "name": "Asynchronous Task Worker", + "description": "Manages and executes long-running or computationally intensive tasks asynchronously (e.g., document ingestion, remote data synchronization, agent webhooks), preventing blocking of the main API.", + "referenced_source_code": [ + { + "qualified_name": "application.worker", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/worker.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.celery_init", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/celery_init.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "application.celeryconfig", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/celeryconfig.py", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": true + } + ], + "components_relations": [ + { + "relation": "sends User Requests to", + "src_name": "User Interface (UI)", + "dst_name": "Backend Core" + }, + { + "relation": "sends Generated Responses to", + 
"src_name": "Backend Core", + "dst_name": "User Interface (UI)" + }, + { + "relation": "enqueues Background Tasks to", + "src_name": "Backend Core", + "dst_name": "Asynchronous Task Worker" + }, + { + "relation": "sends User Queries to", + "src_name": "Backend Core", + "dst_name": "Retrieval Module" + }, + { + "relation": "forwards User Queries & Context to", + "src_name": "Backend Core", + "dst_name": "LLM Integration Layer" + }, + { + "relation": "triggers Document Processing & Storage in", + "src_name": "Asynchronous Task Worker", + "dst_name": "Data Ingestion & Storage" + }, + { + "relation": "stores Embedded Chunks in", + "src_name": "Data Ingestion & Storage", + "dst_name": "Vector Database / Knowledge Base" + }, + { + "relation": "queries for Relevant Documents from", + "src_name": "Retrieval Module", + "dst_name": "Vector Database / Knowledge Base" + }, + { + "relation": "requests Context from", + "src_name": "LLM Integration Layer", + "dst_name": "Retrieval Module" + }, + { + "relation": "delegates Tool Execution to", + "src_name": "LLM Integration Layer", + "dst_name": "Agentic Reasoning & External Tools" + }, + { + "relation": "returns Tool Results to", + "src_name": "Agentic Reasoning & External Tools", + "dst_name": "LLM Integration Layer" + }, + { + "relation": "sends Generated Answers to", + "src_name": "LLM Integration Layer", + "dst_name": "Backend Core" + } + ] +} \ No newline at end of file diff --git a/.codeboarding/codeboarding_version.json b/.codeboarding/codeboarding_version.json new file mode 100644 index 00000000..0c496dc7 --- /dev/null +++ b/.codeboarding/codeboarding_version.json @@ -0,0 +1,4 @@ +{ + "commit_hash": "c68273706ca6b3e1bf7f7fb67b00a2c84bf9ad2c", + "code_boarding_version": "0.1.0" +} \ No newline at end of file diff --git a/.codeboarding/on_boarding.md b/.codeboarding/on_boarding.md index f1b79467..49716e40 100644 --- a/.codeboarding/on_boarding.md +++ b/.codeboarding/on_boarding.md @@ -1,235 +1,144 @@ ```mermaid - graph LR - 
- Experiment_Data_Core["Experiment Data Core"] - - Data_Ingestion_Validation["Data Ingestion & Validation"] - - Image_Processing_Management["Image Processing & Management"] - - Spot_Intensity_Analysis["Spot & Intensity Analysis"] - - Mask_Label_Management["Mask & Label Management"] - - Expression_Matrix_Generation["Expression Matrix Generation"] - - Core_Infrastructure["Core Infrastructure"] - - Data_Ingestion_Validation -- "provides data to" --> Experiment_Data_Core - - Experiment_Data_Core -- "provides experimental data to" --> Image_Processing_Management - - Data_Ingestion_Validation -- "loads and validates codebooks for" --> Spot_Intensity_Analysis - - Image_Processing_Management -- "provides image data to" --> Spot_Intensity_Analysis - - Image_Processing_Management -- "outputs masks to" --> Mask_Label_Management - - Mask_Label_Management -- "provides masks to" --> Spot_Intensity_Analysis - - Spot_Intensity_Analysis -- "provides data to" --> Expression_Matrix_Generation - - click Experiment_Data_Core href "https://github.com/spacetx/starfish/blob/master/.codeboarding//Experiment_Data_Core.md" "Details" - - click Image_Processing_Management href "https://github.com/spacetx/starfish/blob/master/.codeboarding//Image_Processing_Management.md" "Details" - - click Spot_Intensity_Analysis href "https://github.com/spacetx/starfish/blob/master/.codeboarding//Spot_Intensity_Analysis.md" "Details" - - click Mask_Label_Management href "https://github.com/spacetx/starfish/blob/master/.codeboarding//Mask_Label_Management.md" "Details" - - click Expression_Matrix_Generation href "https://github.com/spacetx/starfish/blob/master/.codeboarding//Expression_Matrix_Generation.md" "Details" - - click Core_Infrastructure href "https://github.com/spacetx/starfish/blob/master/.codeboarding//Core_Infrastructure.md" "Details" - + User_Interface_UI_["User Interface (UI)"] + Backend_Core["Backend Core"] + Data_Ingestion_Storage["Data Ingestion & Storage"] + 
Vector_Database_Knowledge_Base["Vector Database / Knowledge Base"] + Retrieval_Module["Retrieval Module"] + LLM_Integration_Layer["LLM Integration Layer"] + Agentic_Reasoning_External_Tools["Agentic Reasoning & External Tools"] + Asynchronous_Task_Worker["Asynchronous Task Worker"] + User_Interface_UI_ -- "sends User Requests to" --> Backend_Core + Backend_Core -- "sends Generated Responses to" --> User_Interface_UI_ + Backend_Core -- "enqueues Background Tasks to" --> Asynchronous_Task_Worker + Backend_Core -- "sends User Queries to" --> Retrieval_Module + Backend_Core -- "forwards User Queries & Context to" --> LLM_Integration_Layer + Asynchronous_Task_Worker -- "triggers Document Processing & Storage in" --> Data_Ingestion_Storage + Data_Ingestion_Storage -- "stores Embedded Chunks in" --> Vector_Database_Knowledge_Base + Retrieval_Module -- "queries for Relevant Documents from" --> Vector_Database_Knowledge_Base + LLM_Integration_Layer -- "requests Context from" --> Retrieval_Module + LLM_Integration_Layer -- "delegates Tool Execution to" --> Agentic_Reasoning_External_Tools + Agentic_Reasoning_External_Tools -- "returns Tool Results to" --> LLM_Integration_Layer + LLM_Integration_Layer -- "sends Generated Answers to" --> Backend_Core + click User_Interface_UI_ href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/User_Interface_UI_.md" "Details" + click Backend_Core href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/Backend_Core.md" "Details" + click Data_Ingestion_Storage href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/Data_Ingestion_Storage.md" "Details" + click Vector_Database_Knowledge_Base href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/Vector_Database_Knowledge_Base.md" "Details" + click Retrieval_Module href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/Retrieval_Module.md" "Details" + click LLM_Integration_Layer href 
"https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/LLM_Integration_Layer.md" "Details" + click Agentic_Reasoning_External_Tools href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/Agentic_Reasoning_External_Tools.md" "Details" + click Asynchronous_Task_Worker href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/Asynchronous_Task_Worker.md" "Details" ``` - - [![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) - - ## Details +DocsGPT operates on a clear client-server architecture, with the User Interface (UI) serving as the primary interaction point. User requests are sent to the Backend Core, which acts as the central orchestrator. The Backend Core handles routing, authentication, and core application logic. For long-running operations like document ingestion, tasks are enqueued to the Asynchronous Task Worker. +When a user query requires information retrieval, the Backend Core interacts with the Retrieval Module, which in turn queries the Vector Database / Knowledge Base to fetch relevant document chunks. The retrieved context, along with the user's query, is then forwarded to the LLM Integration Layer. This layer provides a unified interface for various Large Language Models. For complex tasks, the LLM Integration Layer can delegate to the Agentic Reasoning & External Tools component, which leverages external tools and APIs to fulfill the request. -High-level architecture overview of the `starfish` project, detailing its main components, their responsibilities, associated source code, and inter-component data flow and relationships. 
- - - -### Experiment Data Core [[Expand]](./Experiment_Data_Core.md) - -The central data structure representing a spatial transcriptomics experiment. It encapsulates fields of view, image stacks, and associated metadata, serving as the primary container for all experimental data. - - +The Data Ingestion & Storage component is responsible for processing and storing documents, including parsing, chunking, and embedding, before they are stored in the Vector Database / Knowledge Base. Finally, the LLM Integration Layer sends the generated answers back to the Backend Core, which then relays them to the User Interface. This architecture ensures a scalable, modular, and efficient flow of data and operations within the DocsGPT system. +### User Interface (UI) [[Expand]](./User_Interface_UI_.md) +The interactive frontend for users to engage with DocsGPT, encompassing chat functionalities, document management, and application settings. **Related Classes/Methods**: +- `frontend.src.App` +- `frontend.src.conversation.Conversation` +- `frontend.src.upload.Upload` +- `frontend.src.settings.index` +- `extensions.react_widget.src.components.DocsGPTWidget` -- `starfish.core.experiment.experiment.Experiment` (212:453) - -- `starfish.core.experiment.experiment.FieldOfView` (32:193) - - - - - -### Data Ingestion & Validation - -Responsible for loading various spatial transcriptomics datasets into the Experiment Data Core and validating their structure and content against SpaceTx schemas. It also handles the loading and validation of codebooks. - - - +### Backend Core [[Expand]](./Backend_Core.md) +Acts as the central entry point for all frontend requests, routing them to appropriate backend services, and managing core application logic, authentication, and configuration. 
**Related Classes/Methods**: +- `application.app` +- `application.api.answer.routes` +- `application.api.user.routes` +- `application.auth` +- `application.core.settings` -- `starfish.data.MERFISH` (3:26) - -- `starfish.data.ISS` (96:119) - -- `starfish.core.spacetx_format.util.SpaceTxValidator` (26:202) - -- `starfish.core.spacetx_format.validate_sptx.validate` (17:75) - -- `starfish.core.codebook.codebook.Codebook` (28:804) - - - - - -### Image Processing & Management [[Expand]](./Image_Processing_Management.md) - -Manages multi-dimensional image data, offering functionalities for loading, slicing, and basic transformations. It also applies various image processing techniques such as filtering, segmentation, and registration to image stacks. - - - +### Data Ingestion & Storage [[Expand]](./Data_Ingestion_Storage.md) +Handles the entire lifecycle of data preparation, including loading, parsing, chunking, and embedding various data sources, and manages the persistent storage and retrieval of raw and processed files. **Related Classes/Methods**: +- `application.parser.embedding_pipeline` +- `application.parser.chunking` +- `application.parser.file` +- `application.parser.remote` +- `application.storage.storage_creator` +- `application.storage.s3` +- `application.storage.local` -- `starfish.core.imagestack.imagestack.ImageStack` (67:1273) - -- `starfish.core.imagestack.parser` (0:0) - -- `starfish.core.image.Filter` (0:0) - -- `starfish.core.image.Segment` (0:0) - -- `starfish.core.image._registration` (0:0) - - - - - -### Spot & Intensity Analysis [[Expand]](./Spot_Intensity_Analysis.md) - -Contains algorithms for identifying potential spots (e.g., RNA molecules), decoding their intensity profiles using a codebook, detecting spots at the pixel level, and assigning decoded spots to specific biological targets or regions. It also manages the measured intensity values for detected spots. 
- - - +### Vector Database / Knowledge Base [[Expand]](./Vector_Database_Knowledge_Base.md) +Serves as the persistent storage for embedded document chunks, enabling efficient semantic search and acting as the system's primary knowledge repository. Supports multiple backend implementations. **Related Classes/Methods**: +- `application.vectorstore.base` +- `application.vectorstore.faiss` +- `application.vectorstore.mongodb` +- `application.vectorstore.pgvector` -- `starfish.core.spots.FindSpots` (0:0) - -- `starfish.core.spots.DecodeSpots` (0:0) - -- `starfish.core.spots.DetectPixels` (0:0) - -- `starfish.core.spots.AssignTargets` (0:0) - -- `starfish.core.intensity_table.intensity_table.IntensityTable` (26:455) - -- `starfish.core.intensity_table.decoded_intensity_table.DecodedIntensityTable` (15:190) - - - - - -### Mask & Label Management [[Expand]](./Mask_Label_Management.md) - -Manages and processes collections of binary masks, labeled images, and segmentation masks, which represent segmented regions or objects. It includes functionalities for binarization, filtering, merging, and general morphological operations. - - - +### Retrieval Module [[Expand]](./Retrieval_Module.md) +Focuses on fetching the most relevant document chunks from the Vector Database based on user queries, preparing the contextual information required by the LLM. 
 **Related Classes/Methods**:
+- `application.retriever.classic_rag`
+- `application.retriever.retriever_creator`
-- `starfish.core.morphology.binary_mask.binary_mask.BinaryMaskCollection` (48:760)
-
-- `starfish.core.morphology.label_image.label_image.LabelImage` (28:167)
-
-- `starfish.core.segmentation_mask.segmentation_mask.SegmentationMaskCollection` (15:48)
-
-- `starfish.core.morphology.Binarize` (0:0)
-
-- `starfish.core.morphology.Filter` (0:0)
-
-- `starfish.core.morphology.Merge` (0:0)
-
-- `starfish.core.morphology.Segment` (0:0)
-
-
-
-
-
-### Expression Matrix Generation [[Expand]](./Expression_Matrix_Generation.md)
-
-Creates and manages expression matrices, which quantify gene expression levels within defined regions or cells, serving as the final output for downstream biological analysis.
-
-
-
+### LLM Integration Layer [[Expand]](./LLM_Integration_Layer.md)
+Provides a unified abstraction for interacting with diverse Large Language Models (LLMs), managing model selection, message formatting, and handling streaming or batch responses.
 **Related Classes/Methods**:
+- `application.llm.base`
+- `application.llm.llm_creator`
+- `application.llm.openai`
+- `application.llm.google_ai`
+- `application.llm.anthropic`
+- `application.llm.handlers`
-- `starfish.core.expression_matrix.expression_matrix.ExpressionMatrix` (6:93)
-
-
-
-
-
-### Core Infrastructure [[Expand]](./Core_Infrastructure.md)
-
-Provides foundational utility functions for configuration management, logging, versioning, and defines fundamental data structures and constants used throughout the starfish library, ensuring consistent data representation.
-
-
-
+### Agentic Reasoning & External Tools [[Expand]](./Agentic_Reasoning_External_Tools.md)
+Empowers the LLM to execute complex, multi-step tasks by breaking them down into sub-problems and leveraging a suite of external tools and APIs (e.g., web search, TTS) to gather information or perform actions.
**Related Classes/Methods**: +- `application.agents.base` +- `application.agents.react_agent` +- `application.agents.agent_creator` +- `application.agents.tools` +- `application.tts.elevenlabs` -- `starfish.core.config.StarfishConfig` (0:0) - -- `starfish.core.util` (46:50) - -- `starfish.core._version` (0:0) - -- `starfish.core.types` (0:0) - - +### Asynchronous Task Worker [[Expand]](./Asynchronous_Task_Worker.md) +Manages and executes long-running or computationally intensive tasks asynchronously (e.g., document ingestion, remote data synchronization, agent webhooks), preventing blocking of the main API. +**Related Classes/Methods**: +- `application.worker` +- `application.celery_init` +- `application.celeryconfig` From 6885ca565d0379e2db492e6f8c23953a7680343c Mon Sep 17 00:00:00 2001 From: ivanmilevtues Date: Wed, 13 Aug 2025 11:59:47 +0200 Subject: [PATCH 3/4] Updated graphics --- .../Agentic_Reasoning_External_Tools.json | 151 --------- .../Agentic_Reasoning_External_Tools.md | 101 ------ .codeboarding/Asynchronous_Task_Worker.json | 56 ---- .codeboarding/Asynchronous_Task_Worker.md | 45 --- .codeboarding/Backend_Core.json | 112 ------- .codeboarding/Backend_Core.md | 71 ----- .codeboarding/Core_Data_Structures.json | 91 ++++++ .codeboarding/Core_Data_Structures.md | 59 ++++ .codeboarding/Data_Ingestion_Storage.json | 116 ------- .codeboarding/Data_Ingestion_Storage.md | 79 ----- .../Data_Input_Validation_Layer.json | 79 +++++ .codeboarding/Data_Input_Validation_Layer.md | 57 ++++ .codeboarding/Image_Processing_Engine.json | 156 +++++++++ .codeboarding/Image_Processing_Engine.md | 102 ++++++ .codeboarding/LLM_Integration_Layer.json | 57 ---- .codeboarding/LLM_Integration_Layer.md | 52 --- .codeboarding/Output_Export_Layer.json | 88 ++++++ .codeboarding/Output_Export_Layer.md | 51 +++ .codeboarding/Retrieval_Module.json | 38 --- .codeboarding/Retrieval_Module.md | 34 -- .codeboarding/Spot_Analysis_Engine.json | 120 +++++++ .codeboarding/Spot_Analysis_Engine.md 
| 80 +++++ .codeboarding/User_Interface_UI_.json | 112 ------- .codeboarding/User_Interface_UI_.md | 71 ----- .../Vector_Database_Knowledge_Base.json | 196 ------------ .../Vector_Database_Knowledge_Base.md | 110 ------- .codeboarding/Visualization_Utilities.json | 165 ++++++++++ .codeboarding/Visualization_Utilities.md | 103 ++++++ .codeboarding/analysis.json | 295 +++++------------- .codeboarding/codeboarding_version.json | 2 +- .codeboarding/on_boarding.md | 152 ++++----- 31 files changed, 1288 insertions(+), 1713 deletions(-) delete mode 100644 .codeboarding/Agentic_Reasoning_External_Tools.json delete mode 100644 .codeboarding/Agentic_Reasoning_External_Tools.md delete mode 100644 .codeboarding/Asynchronous_Task_Worker.json delete mode 100644 .codeboarding/Asynchronous_Task_Worker.md delete mode 100644 .codeboarding/Backend_Core.json delete mode 100644 .codeboarding/Backend_Core.md create mode 100644 .codeboarding/Core_Data_Structures.json create mode 100644 .codeboarding/Core_Data_Structures.md delete mode 100644 .codeboarding/Data_Ingestion_Storage.json delete mode 100644 .codeboarding/Data_Ingestion_Storage.md create mode 100644 .codeboarding/Data_Input_Validation_Layer.json create mode 100644 .codeboarding/Data_Input_Validation_Layer.md create mode 100644 .codeboarding/Image_Processing_Engine.json create mode 100644 .codeboarding/Image_Processing_Engine.md delete mode 100644 .codeboarding/LLM_Integration_Layer.json delete mode 100644 .codeboarding/LLM_Integration_Layer.md create mode 100644 .codeboarding/Output_Export_Layer.json create mode 100644 .codeboarding/Output_Export_Layer.md delete mode 100644 .codeboarding/Retrieval_Module.json delete mode 100644 .codeboarding/Retrieval_Module.md create mode 100644 .codeboarding/Spot_Analysis_Engine.json create mode 100644 .codeboarding/Spot_Analysis_Engine.md delete mode 100644 .codeboarding/User_Interface_UI_.json delete mode 100644 .codeboarding/User_Interface_UI_.md delete mode 100644 
.codeboarding/Vector_Database_Knowledge_Base.json delete mode 100644 .codeboarding/Vector_Database_Knowledge_Base.md create mode 100644 .codeboarding/Visualization_Utilities.json create mode 100644 .codeboarding/Visualization_Utilities.md diff --git a/.codeboarding/Agentic_Reasoning_External_Tools.json b/.codeboarding/Agentic_Reasoning_External_Tools.json deleted file mode 100644 index 50f96375..00000000 --- a/.codeboarding/Agentic_Reasoning_External_Tools.json +++ /dev/null @@ -1,151 +0,0 @@ -{ - "description": "The Agentic Subsystem in DocsGPT is designed to enable intelligent, multi-step interactions by leveraging Large Language Models (LLMs) and external tools. At its core, the `Stream Processor` orchestrates the execution flow, initiating the `ReActAgent`. The `ReActAgent`, inheriting from `BaseAgent`, implements the ReAct pattern to reason, decide on actions, and execute them. It interacts with the `LLM Tool Call Handler` to process LLM outputs, which in turn utilizes the `ToolActionParser` to interpret tool calls. The `ToolManager` is responsible for loading and providing `Individual Tools` (such as `Elevenlabs TTS`) that the `ReActAgent` can invoke to perform specific actions, thereby extending the LLM's capabilities. This modular design ensures a clear separation of concerns, facilitating robust and extensible agentic behavior.", - "components": [ - { - "name": "BaseAgent", - "description": "Defines the abstract interface and common functionalities for all agents. It establishes the blueprint for how agents should process queries, interact with tools, and generate responses. 
It's fundamental as the base contract for any agent.", - "referenced_source_code": [ - { - "qualified_name": "application.agents.base.BaseAgent", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/base.py", - "reference_start_line": 19, - "reference_end_line": 326 - } - ], - "can_expand": true - }, - { - "name": "ReActAgent", - "description": "Implements the ReAct (Reasoning and Acting) pattern, enabling the LLM to perform complex, multi-step tasks. It orchestrates the iterative process of generating thoughts, deciding on actions (tool calls), executing them, and formulating a final answer. This is the core of the agentic reasoning.", - "referenced_source_code": [ - { - "qualified_name": "application.agents.react_agent.ReActAgent", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/react_agent.py", - "reference_start_line": 26, - "reference_end_line": 229 - } - ], - "can_expand": true - }, - { - "name": "ToolManager", - "description": "Manages the lifecycle and availability of external tools. It's responsible for discovering, loading, and providing access to the various tools that agents can utilize. Essential for agents to access their capabilities.", - "referenced_source_code": [ - { - "qualified_name": "application.agents.tools.tool_manager.ToolManager", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/tools/tool_manager.py", - "reference_start_line": 9, - "reference_end_line": 42 - } - ], - "can_expand": true - }, - { - "name": "ToolActionParser", - "description": "Interprets the raw output from the LLM to identify and parse tool calls, extracting the tool name and its arguments. 
This component is critical for translating the LLM's textual output into executable actions.", - "referenced_source_code": [ - { - "qualified_name": "application.agents.tools.tool_action_parser.ToolActionParser", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/tools/tool_action_parser.py", - "reference_start_line": 7, - "reference_end_line": 37 - } - ], - "can_expand": false - }, - { - "name": "Individual Tools", - "description": "Represents the collection of specific external tools, each encapsulating the logic for interacting with an external service or performing a distinct action (e.g., web search, TTS). These are the actual capabilities the agent leverages.", - "referenced_source_code": [ - { - "qualified_name": "application.agents.tools", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/tools", - "reference_start_line": 0, - "reference_end_line": 0 - } - ], - "can_expand": true - }, - { - "name": "Elevenlabs TTS", - "description": "Provides Text-to-Speech functionality, allowing agents to generate spoken responses. This is a concrete example of an external tool, highlighting the subsystem's ability to integrate diverse external services.", - "referenced_source_code": [ - { - "qualified_name": "application.tts.elevenlabs.ElevenlabsTTS", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/tts/elevenlabs.py", - "reference_start_line": 9, - "reference_end_line": 66 - } - ], - "can_expand": false - }, - { - "name": "LLM Tool Call Handler", - "description": "Acts as the bridge between the raw LLM output and the agent's tool execution logic, specifically handling tool calls. 
It's crucial for the agent to correctly interpret and act upon the LLM's instructions for tool usage.", - "referenced_source_code": [ - { - "qualified_name": "application.llm.handlers.LLMToolCallHandler", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/handlers/base.py", - "reference_start_line": 0, - "reference_end_line": 0 - } - ], - "can_expand": true - }, - { - "name": "Stream Processor", - "description": "Initiates and orchestrates the agent's execution flow within the main application, particularly for streaming responses. It serves as the entry point for triggering the agent's reasoning and tool-use cycle.", - "referenced_source_code": [ - { - "qualified_name": "application.api.answer.services.stream_processor.StreamProcessor", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/api/answer/services/stream_processor.py", - "reference_start_line": 56, - "reference_end_line": 260 - } - ], - "can_expand": true - } - ], - "components_relations": [ - { - "relation": "inherits from", - "src_name": "ReActAgent", - "dst_name": "BaseAgent" - }, - { - "relation": "interacts with", - "src_name": "ReActAgent", - "dst_name": "LLM Tool Call Handler" - }, - { - "relation": "uses", - "src_name": "LLM Tool Call Handler", - "dst_name": "ToolActionParser" - }, - { - "relation": "invokes", - "src_name": "ReActAgent", - "dst_name": "Individual Tools" - }, - { - "relation": "invokes", - "src_name": "ReActAgent", - "dst_name": "Elevenlabs TTS" - }, - { - "relation": "provides tools to", - "src_name": "ToolManager", - "dst_name": "BaseAgent" - }, - { - "relation": "loads", - "src_name": "ToolManager", - "dst_name": "Individual Tools" - }, - { - "relation": "orchestrates", - "src_name": "Stream Processor", - "dst_name": "ReActAgent" - } - ] -} \ No newline at end of file diff --git a/.codeboarding/Agentic_Reasoning_External_Tools.md b/.codeboarding/Agentic_Reasoning_External_Tools.md deleted file mode 100644 index 
3a2ec8bc..00000000 --- a/.codeboarding/Agentic_Reasoning_External_Tools.md +++ /dev/null @@ -1,101 +0,0 @@ -```mermaid -graph LR - BaseAgent["BaseAgent"] - ReActAgent["ReActAgent"] - ToolManager["ToolManager"] - ToolActionParser["ToolActionParser"] - Individual_Tools["Individual Tools"] - Elevenlabs_TTS["Elevenlabs TTS"] - LLM_Tool_Call_Handler["LLM Tool Call Handler"] - Stream_Processor["Stream Processor"] - ReActAgent -- "inherits from" --> BaseAgent - ReActAgent -- "interacts with" --> LLM_Tool_Call_Handler - LLM_Tool_Call_Handler -- "uses" --> ToolActionParser - ReActAgent -- "invokes" --> Individual_Tools - ReActAgent -- "invokes" --> Elevenlabs_TTS - ToolManager -- "provides tools to" --> BaseAgent - ToolManager -- "loads" --> Individual_Tools - Stream_Processor -- "orchestrates" --> ReActAgent -``` - -[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) - -## Details - -The Agentic Subsystem in DocsGPT is designed to enable intelligent, multi-step interactions by leveraging Large Language Models (LLMs) and external tools. At its core, the `Stream Processor` orchestrates the execution flow, initiating the `ReActAgent`. The `ReActAgent`, inheriting from `BaseAgent`, implements the ReAct pattern to reason, decide on actions, and execute them. It interacts with the `LLM Tool Call Handler` to process LLM outputs, which in turn utilizes the `ToolActionParser` to interpret tool calls. The `ToolManager` is responsible for loading and providing `Individual Tools` (such as `Elevenlabs TTS`) that the `ReActAgent` can invoke to perform specific actions, thereby extending the LLM's capabilities. 
This modular design ensures a clear separation of concerns, facilitating robust and extensible agentic behavior. - -### BaseAgent -Defines the abstract interface and common functionalities for all agents. It establishes the blueprint for how agents should process queries, interact with tools, and generate responses. It's fundamental as the base contract for any agent. - - -**Related Classes/Methods**: - -- `application.agents.base.BaseAgent`:19-326 - - -### ReActAgent -Implements the ReAct (Reasoning and Acting) pattern, enabling the LLM to perform complex, multi-step tasks. It orchestrates the iterative process of generating thoughts, deciding on actions (tool calls), executing them, and formulating a final answer. This is the core of the agentic reasoning. - - -**Related Classes/Methods**: - -- `application.agents.react_agent.ReActAgent`:26-229 - - -### ToolManager -Manages the lifecycle and availability of external tools. It's responsible for discovering, loading, and providing access to the various tools that agents can utilize. Essential for agents to access their capabilities. - - -**Related Classes/Methods**: - -- `application.agents.tools.tool_manager.ToolManager`:9-42 - - -### ToolActionParser -Interprets the raw output from the LLM to identify and parse tool calls, extracting the tool name and its arguments. This component is critical for translating the LLM's textual output into executable actions. - - -**Related Classes/Methods**: - -- `application.agents.tools.tool_action_parser.ToolActionParser`:7-37 - - -### Individual Tools -Represents the collection of specific external tools, each encapsulating the logic for interacting with an external service or performing a distinct action (e.g., web search, TTS). These are the actual capabilities the agent leverages. - - -**Related Classes/Methods**: - -- `application.agents.tools` - - -### Elevenlabs TTS -Provides Text-to-Speech functionality, allowing agents to generate spoken responses. 
This is a concrete example of an external tool, highlighting the subsystem's ability to integrate diverse external services. - - -**Related Classes/Methods**: - -- `application.tts.elevenlabs.ElevenlabsTTS`:9-66 - - -### LLM Tool Call Handler -Acts as the bridge between the raw LLM output and the agent's tool execution logic, specifically handling tool calls. It's crucial for the agent to correctly interpret and act upon the LLM's instructions for tool usage. - - -**Related Classes/Methods**: - -- `application.llm.handlers.LLMToolCallHandler` - - -### Stream Processor -Initiates and orchestrates the agent's execution flow within the main application, particularly for streaming responses. It serves as the entry point for triggering the agent's reasoning and tool-use cycle. - - -**Related Classes/Methods**: - -- `application.api.answer.services.stream_processor.StreamProcessor`:56-260 - - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Asynchronous_Task_Worker.json b/.codeboarding/Asynchronous_Task_Worker.json deleted file mode 100644 index ef092466..00000000 --- a/.codeboarding/Asynchronous_Task_Worker.json +++ /dev/null @@ -1,56 +0,0 @@ -{ - "description": "This subsystem is responsible for managing and executing long-running or computationally intensive tasks asynchronously, such as document ingestion, remote data synchronization, and agent webhooks. It prevents the main API from blocking, ensuring responsiveness and scalability for the RAG system.", - "components": [ - { - "name": "Task Orchestrator", - "description": "Defines and encapsulates the actual long-running, computationally intensive tasks essential for the RAG system's operation. 
These tasks offload heavy processing from the main application thread, ensuring responsiveness.", - "referenced_source_code": [ - { - "qualified_name": "application.worker", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/worker.py", - "reference_start_line": 1, - "reference_end_line": 9999 - } - ], - "can_expand": true - }, - { - "name": "Celery Application Initializer", - "description": "Responsible for bootstrapping and initializing the Celery application instance. It establishes the connection to the message broker and result backend, effectively setting up the runtime environment for asynchronous tasks.", - "referenced_source_code": [ - { - "qualified_name": "application.celery_init", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/celery_init.py", - "reference_start_line": 1, - "reference_end_line": 9999 - } - ], - "can_expand": false - }, - { - "name": "Celery Configuration Manager", - "description": "Centralizes and provides all necessary configuration parameters for the Celery application. 
It ensures that the asynchronous system operates correctly by defining settings such as broker URLs, backend URLs, and task queues.", - "referenced_source_code": [ - { - "qualified_name": "application.celeryconfig", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/celeryconfig.py", - "reference_start_line": 1, - "reference_end_line": 9999 - } - ], - "can_expand": false - } - ], - "components_relations": [ - { - "relation": "registers tasks with", - "src_name": "Task Orchestrator", - "dst_name": "Celery Application Initializer" - }, - { - "relation": "provides configuration to", - "src_name": "Celery Configuration Manager", - "dst_name": "Celery Application Initializer" - } - ] -} \ No newline at end of file diff --git a/.codeboarding/Asynchronous_Task_Worker.md b/.codeboarding/Asynchronous_Task_Worker.md deleted file mode 100644 index fa98fced..00000000 --- a/.codeboarding/Asynchronous_Task_Worker.md +++ /dev/null @@ -1,45 +0,0 @@ -```mermaid -graph LR - Task_Orchestrator["Task Orchestrator"] - Celery_Application_Initializer["Celery Application Initializer"] - Celery_Configuration_Manager["Celery Configuration Manager"] - Task_Orchestrator -- "registers tasks with" --> Celery_Application_Initializer - Celery_Configuration_Manager -- "provides configuration to" --> Celery_Application_Initializer -``` - -[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) - -## Details - -This subsystem is responsible for managing and executing long-running or computationally intensive tasks asynchronously, such as document ingestion, remote data synchronization, and agent webhooks. 
It prevents the main API from blocking, ensuring responsiveness and scalability for the RAG system. - -### Task Orchestrator -Defines and encapsulates the actual long-running, computationally intensive tasks essential for the RAG system's operation. These tasks offload heavy processing from the main application thread, ensuring responsiveness. - - -**Related Classes/Methods**: - -- `application.worker`:1-9999 - - -### Celery Application Initializer -Responsible for bootstrapping and initializing the Celery application instance. It establishes the connection to the message broker and result backend, effectively setting up the runtime environment for asynchronous tasks. - - -**Related Classes/Methods**: - -- `application.celery_init`:1-9999 - - -### Celery Configuration Manager -Centralizes and provides all necessary configuration parameters for the Celery application. It ensures that the asynchronous system operates correctly by defining settings such as broker URLs, backend URLs, and task queues. - - -**Related Classes/Methods**: - -- `application.celeryconfig`:1-9999 - - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Backend_Core.json b/.codeboarding/Backend_Core.json deleted file mode 100644 index f611593d..00000000 --- a/.codeboarding/Backend_Core.json +++ /dev/null @@ -1,112 +0,0 @@ -{ - "description": "The Backend Core acts as the central entry point and orchestrator for the DocsGPT application, handling request routing, core application logic, authentication, and configuration.", - "components": [ - { - "name": "Application Orchestrator", - "description": "The main Flask application instance, serving as the primary orchestrator and central entry point for all incoming HTTP requests. 
It initializes the application and registers all API routes.", - "referenced_source_code": [ - { - "qualified_name": "application.app", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/app.py", - "reference_start_line": 1, - "reference_end_line": 1 - } - ], - "can_expand": true - }, - { - "name": "API Answer Routes", - "description": "Defines and handles API endpoints specifically for processing user queries and initiating the AI-powered answer generation process, delegating to the RAG services.", - "referenced_source_code": [ - { - "qualified_name": "application.api.answer.routes", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/api/answer/routes", - "reference_start_line": 1, - "reference_end_line": 1 - } - ], - "can_expand": true - }, - { - "name": "API User Routes", - "description": "Defines and handles API endpoints for user-specific functionalities, including document management, API key operations, and usage tracking.", - "referenced_source_code": [ - { - "qualified_name": "application.api.user.routes", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/api/user/routes.py", - "reference_start_line": 1, - "reference_end_line": 1 - } - ], - "can_expand": true - }, - { - "name": "Authentication & Authorization", - "description": "Manages user authentication and authorization, ensuring secure access to application functionalities and data.", - "referenced_source_code": [ - { - "qualified_name": "application.auth", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/auth.py", - "reference_start_line": 1, - "reference_end_line": 1 - } - ], - "can_expand": false - }, - { - "name": "Configuration Manager", - "description": "Centralized configuration management for application-wide parameters, environment variables, and secrets.", - "referenced_source_code": [ - { - "qualified_name": "application.core.settings", - "reference_file": 
"/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/core/settings.py", - "reference_start_line": 1, - "reference_end_line": 1 - } - ], - "can_expand": true - } - ], - "components_relations": [ - { - "relation": "registers", - "src_name": "Application Orchestrator", - "dst_name": "API Answer Routes" - }, - { - "relation": "registers", - "src_name": "Application Orchestrator", - "dst_name": "API User Routes" - }, - { - "relation": "utilizes", - "src_name": "Application Orchestrator", - "dst_name": "Configuration Manager" - }, - { - "relation": "relies on", - "src_name": "API Answer Routes", - "dst_name": "Authentication & Authorization" - }, - { - "relation": "relies on", - "src_name": "API User Routes", - "dst_name": "Authentication & Authorization" - }, - { - "relation": "provides services to", - "src_name": "Authentication & Authorization", - "dst_name": "API Answer Routes" - }, - { - "relation": "provides services to", - "src_name": "Authentication & Authorization", - "dst_name": "API User Routes" - }, - { - "relation": "provides configuration to", - "src_name": "Configuration Manager", - "dst_name": "Application Orchestrator" - } - ] -} \ No newline at end of file diff --git a/.codeboarding/Backend_Core.md b/.codeboarding/Backend_Core.md deleted file mode 100644 index 834e0e54..00000000 --- a/.codeboarding/Backend_Core.md +++ /dev/null @@ -1,71 +0,0 @@ -```mermaid -graph LR - Application_Orchestrator["Application Orchestrator"] - API_Answer_Routes["API Answer Routes"] - API_User_Routes["API User Routes"] - Authentication_Authorization["Authentication & Authorization"] - Configuration_Manager["Configuration Manager"] - Application_Orchestrator -- "registers" --> API_Answer_Routes - Application_Orchestrator -- "registers" --> API_User_Routes - Application_Orchestrator -- "utilizes" --> Configuration_Manager - API_Answer_Routes -- "relies on" --> Authentication_Authorization - API_User_Routes -- "relies on" --> Authentication_Authorization - 
Authentication_Authorization -- "provides services to" --> API_Answer_Routes - Authentication_Authorization -- "provides services to" --> API_User_Routes - Configuration_Manager -- "provides configuration to" --> Application_Orchestrator -``` - -[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) - -## Details - -The Backend Core acts as the central entry point and orchestrator for the DocsGPT application, handling request routing, core application logic, authentication, and configuration. - -### Application Orchestrator -The main Flask application instance, serving as the primary orchestrator and central entry point for all incoming HTTP requests. It initializes the application and registers all API routes. - - -**Related Classes/Methods**: - -- `application.app` - - -### API Answer Routes -Defines and handles API endpoints specifically for processing user queries and initiating the AI-powered answer generation process, delegating to the RAG services. - - -**Related Classes/Methods**: - -- `application.api.answer.routes` - - -### API User Routes -Defines and handles API endpoints for user-specific functionalities, including document management, API key operations, and usage tracking. - - -**Related Classes/Methods**: - -- `application.api.user.routes` - - -### Authentication & Authorization -Manages user authentication and authorization, ensuring secure access to application functionalities and data. - - -**Related Classes/Methods**: - -- `application.auth` - - -### Configuration Manager -Centralized configuration management for application-wide parameters, environment variables, and secrets. 
- - -**Related Classes/Methods**: - -- `application.core.settings` - - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Core_Data_Structures.json b/.codeboarding/Core_Data_Structures.json new file mode 100644 index 00000000..c1ca9d54 --- /dev/null +++ b/.codeboarding/Core_Data_Structures.json @@ -0,0 +1,91 @@ +{ + "description": "The `starfish` core image processing pipeline is centered around the `ImageStack` component, which serves as the primary in-memory representation for multi-dimensional image data. Data ingestion into the `ImageStack` is handled by `ImageStack Parsers`, which are responsible for converting various external data formats into the standardized `ImageStack` structure. Once loaded, the `ImageStack` can be manipulated by components like `ImageStack Cropping` for spatial transformations. The `Codebook` component plays a crucial role in interpreting the raw intensity data within the `ImageStack`, mapping observed fluorescent signals to biological targets, thereby providing biological meaning to the image data. This architecture ensures a clear separation of concerns, with dedicated components for data representation, ingestion, transformation, and biological interpretation, facilitating a modular and extensible image processing workflow.", + "components": [ + { + "name": "ImageStack", + "description": "Serves as the fundamental in-memory representation for multi-dimensional image data within the Starfish pipeline. It provides a unified interface for accessing, manipulating, and transforming image data, managing associated metadata and coordinates. 
This component is central to any image processing library.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish/core/imagestack/imagestack.py",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/imagestack/imagestack.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                }
+            ],
+            "can_expand": true
+        },
+        {
+            "name": "Codebook",
+            "description": "Defines the crucial mapping between observed fluorescent signals (channels, imaging rounds) and specific biological targets (e.g., genes). It is essential for interpreting raw intensity data into meaningful biological information, supporting loading, validation, and decoding operations. This is a core data structure for biological interpretation.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish/core/codebook/codebook.py",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/codebook/codebook.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                }
+            ],
+            "can_expand": true
+        },
+        {
+            "name": "ImageStack Parsers",
+            "description": "These components are responsible for reading and parsing various external data formats (e.g., tilesets, raw NumPy arrays) into the internal `ImageStack` representation. They handle the specifics of data layout, metadata extraction, and efficient data loading, acting as the data ingestion layer for `ImageStack`.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish/core/imagestack/parser/tileset/_parser.py",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/imagestack/parser/tileset/_parser.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                },
+                {
+                    "qualified_name": "starfish/core/imagestack/parser/tilefetcher/_parser.py",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/imagestack/parser/tilefetcher/_parser.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                },
+                {
+                    "qualified_name": "starfish/core/imagestack/parser/numpy/__init__.py",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/imagestack/parser/numpy/__init__.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                }
+            ],
+            "can_expand": true
+        },
+        {
+            "name": "ImageStack Cropping",
+            "description": "Manages the process of cropping `ImageStack` data. This includes determining the appropriate crop regions, applying the cropping operation to the image data, and adjusting associated metadata and coordinates.
It represents a fundamental data transformation utility directly operating on the core `ImageStack` data.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish/core/imagestack/parser/crop.py",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/imagestack/parser/crop.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                }
+            ],
+            "can_expand": true
+        }
+    ],
+    "components_relations": [
+        {
+            "relation": "utilizes",
+            "src_name": "ImageStack",
+            "dst_name": "ImageStack Cropping"
+        },
+        {
+            "relation": "interprets data from",
+            "src_name": "Codebook",
+            "dst_name": "ImageStack"
+        },
+        {
+            "relation": "produces data for",
+            "src_name": "ImageStack Parsers",
+            "dst_name": "ImageStack"
+        },
+        {
+            "relation": "produces data for",
+            "src_name": "ImageStack Cropping",
+            "dst_name": "ImageStack"
+        }
+    ]
+}
\ No newline at end of file
diff --git a/.codeboarding/Core_Data_Structures.md b/.codeboarding/Core_Data_Structures.md
new file mode 100644
index 00000000..7150a6f8
--- /dev/null
+++ b/.codeboarding/Core_Data_Structures.md
@@ -0,0 +1,59 @@
+```mermaid
+graph LR
+    ImageStack["ImageStack"]
+    Codebook["Codebook"]
+    ImageStack_Parsers["ImageStack Parsers"]
+    ImageStack_Cropping["ImageStack Cropping"]
+    ImageStack -- "utilizes" --> ImageStack_Cropping
+    Codebook -- "interprets data from" --> ImageStack
+    ImageStack_Parsers -- "produces data for" --> ImageStack
+    ImageStack_Cropping -- "produces data for" --> ImageStack
+```
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)
+
+## Details
+
+The `starfish` core image processing pipeline is centered around the `ImageStack` component, which serves as the primary in-memory representation for multi-dimensional image data. Data ingestion into the `ImageStack` is handled by `ImageStack Parsers`, which are responsible for converting various external data formats into the standardized `ImageStack` structure. Once loaded, the `ImageStack` can be manipulated by components like `ImageStack Cropping` for spatial transformations. The `Codebook` component plays a crucial role in interpreting the raw intensity data within the `ImageStack`, mapping observed fluorescent signals to biological targets, thereby providing biological meaning to the image data. This architecture ensures a clear separation of concerns, with dedicated components for data representation, ingestion, transformation, and biological interpretation, facilitating a modular and extensible image processing workflow.
+
+### ImageStack
+Serves as the fundamental in-memory representation for multi-dimensional image data within the Starfish pipeline. It provides a unified interface for accessing, manipulating, and transforming image data, managing associated metadata and coordinates. This component is central to any image processing library.
+
+
+**Related Classes/Methods**:
+
+- `starfish/core/imagestack/imagestack.py`
+
+
+### Codebook
+Defines the crucial mapping between observed fluorescent signals (channels, imaging rounds) and specific biological targets (e.g., genes). It is essential for interpreting raw intensity data into meaningful biological information, supporting loading, validation, and decoding operations. This is a core data structure for biological interpretation.
+
+
+**Related Classes/Methods**:
+
+- `starfish/core/codebook/codebook.py`
+
+
+### ImageStack Parsers
+These components are responsible for reading and parsing various external data formats (e.g., tilesets, raw NumPy arrays) into the internal `ImageStack` representation. They handle the specifics of data layout, metadata extraction, and efficient data loading, acting as the data ingestion layer for `ImageStack`.
+
+
+**Related Classes/Methods**:
+
+- `starfish/core/imagestack/parser/tileset/_parser.py`
+- `starfish/core/imagestack/parser/tilefetcher/_parser.py`
+- `starfish/core/imagestack/parser/numpy/__init__.py`
+
+
+### ImageStack Cropping
+Manages the process of cropping `ImageStack` data. This includes determining the appropriate crop regions, applying the cropping operation to the image data, and adjusting associated metadata and coordinates. It represents a fundamental data transformation utility directly operating on the core `ImageStack` data.
+
+
+**Related Classes/Methods**:
+
+- `starfish/core/imagestack/parser/crop.py`
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
\ No newline at end of file
diff --git a/.codeboarding/Data_Ingestion_Storage.json b/.codeboarding/Data_Ingestion_Storage.json
deleted file mode 100644
index 690afbfc..00000000
--- a/.codeboarding/Data_Ingestion_Storage.json
+++ /dev/null
@@ -1,116 +0,0 @@
-{
-    "description": "The DocsGPT system is designed around a modular data processing pipeline. It begins with the `Data Source Ingestion` component, responsible for acquiring raw data from diverse origins. This data then proceeds to `Document Chunking`, where it is segmented into optimized units. The `Embedding Pipeline` subsequently transforms these chunks into vector embeddings, which are crucial for knowledge representation. For persistent storage of these embeddings, the `Embedding Pipeline` interacts exclusively with the `Storage Abstraction` layer.
This abstraction layer intelligently delegates storage operations to specific backends, such as `Local Storage` for local persistence or `S3 Storage` for scalable cloud-based storage, thereby ensuring a flexible and decoupled storage mechanism.",
-    "components": [
-        {
-            "name": "Data Source Ingestion",
-            "description": "This logical component encompasses the initial ingestion of raw data. `application.parser.file` handles local file system inputs, while `application.parser.remote` manages data from external sources like GitHub or sitemaps. They are the entry points for all data into the system.",
-            "referenced_source_code": [
-                {
-                    "qualified_name": "application.parser.file",
-                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/file",
-                    "reference_start_line": 1,
-                    "reference_end_line": 1
-                },
-                {
-                    "qualified_name": "application.parser.remote",
-                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/remote",
-                    "reference_start_line": 1,
-                    "reference_end_line": 1
-                }
-            ],
-            "can_expand": true
-        },
-        {
-            "name": "Document Chunking",
-            "description": "Responsible for breaking down large documents received from ingestion components into smaller, manageable chunks. This is critical for optimizing the data for embedding models and fitting within LLM context windows.",
-            "referenced_source_code": [
-                {
-                    "qualified_name": "application.parser.chunking",
-                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/chunking.py",
-                    "reference_start_line": 1,
-                    "reference_end_line": 1
-                }
-            ],
-            "can_expand": true
-        },
-        {
-            "name": "Embedding Pipeline",
-            "description": "Orchestrates the conversion of text chunks into vector embeddings. This component is central to the data preparation process, bridging the gap between raw text and vector-based knowledge representation. It interacts with the `Storage Abstraction` for persistence.",
-            "referenced_source_code": [
-                {
-                    "qualified_name": "application.parser.embedding_pipeline",
-                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/embedding_pipeline.py",
-                    "reference_start_line": 1,
-                    "reference_end_line": 1
-                }
-            ],
-            "can_expand": true
-        },
-        {
-            "name": "Storage Abstraction",
-            "description": "Acts as a factory or manager for abstracting different storage backends. It provides a unified interface for the rest of the system to interact with persistent storage, whether it's local or cloud-based. This promotes flexibility and extensibility in storage solutions by delegating to concrete implementations.",
-            "referenced_source_code": [
-                {
-                    "qualified_name": "application.storage.storage_creator",
-                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/storage/storage_creator.py",
-                    "reference_start_line": 1,
-                    "reference_end_line": 1
-                }
-            ],
-            "can_expand": false
-        },
-        {
-            "name": "Local Storage",
-            "description": "Implements the concrete logic for persistent storage and retrieval of files and data on the local file system. It's one of the specific storage backends supported by the system, managed by the `Storage Abstraction`.",
-            "referenced_source_code": [
-                {
-                    "qualified_name": "application.storage.local",
-                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/storage/local.py",
-                    "reference_start_line": 1,
-                    "reference_end_line": 1
-                }
-            ],
-            "can_expand": true
-        },
-        {
-            "name": "S3 Storage",
-            "description": "Implements the concrete logic for persistent storage and retrieval using S3-compatible object storage services. This provides cloud-based, scalable storage capabilities, managed by the `Storage Abstraction`.",
-            "referenced_source_code": [
-                {
-                    "qualified_name": "application.storage.s3",
-                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/storage/s3.py",
-                    "reference_start_line": 1,
-                    "reference_end_line": 1
-                }
-            ],
-            "can_expand": true
-        }
-    ],
-    "components_relations": [
-        {
-            "relation": "passes parsed documents to",
-            "src_name": "Data Source Ingestion",
-            "dst_name": "Document Chunking"
-        },
-        {
-            "relation": "feeds text chunks to",
-            "src_name": "Document Chunking",
-            "dst_name": "Embedding Pipeline"
-        },
-        {
-            "relation": "utilizes",
-            "src_name": "Embedding Pipeline",
-            "dst_name": "Storage Abstraction"
-        },
-        {
-            "relation": "delegates to",
-            "src_name": "Storage Abstraction",
-            "dst_name": "Local Storage"
-        },
-        {
-            "relation": "delegates to",
-            "src_name": "Storage Abstraction",
-            "dst_name": "S3 Storage"
-        }
-    ]
-}
\ No newline at end of file
diff --git a/.codeboarding/Data_Ingestion_Storage.md b/.codeboarding/Data_Ingestion_Storage.md
deleted file mode 100644
index 5933f982..00000000
--- a/.codeboarding/Data_Ingestion_Storage.md
+++ /dev/null
@@ -1,79 +0,0 @@
-```mermaid
-graph LR
-    Data_Source_Ingestion["Data Source Ingestion"]
-    Document_Chunking["Document Chunking"]
-    Embedding_Pipeline["Embedding Pipeline"]
-    Storage_Abstraction["Storage Abstraction"]
-    Local_Storage["Local Storage"]
-    S3_Storage["S3 Storage"]
-    Data_Source_Ingestion -- "passes parsed documents to" --> Document_Chunking
-    Document_Chunking -- "feeds text chunks to" --> Embedding_Pipeline
-    Embedding_Pipeline -- "utilizes" --> Storage_Abstraction
-    Storage_Abstraction -- "delegates to" --> Local_Storage
-    Storage_Abstraction -- "delegates to" --> S3_Storage
-```
-
-[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)
-
-## Details
-
-The DocsGPT system is designed around a modular data processing pipeline. It begins with the `Data Source Ingestion` component, responsible for acquiring raw data from diverse origins. This data then proceeds to `Document Chunking`, where it is segmented into optimized units. The `Embedding Pipeline` subsequently transforms these chunks into vector embeddings, which are crucial for knowledge representation. For persistent storage of these embeddings, the `Embedding Pipeline` interacts exclusively with the `Storage Abstraction` layer. This abstraction layer intelligently delegates storage operations to specific backends, such as `Local Storage` for local persistence or `S3 Storage` for scalable cloud-based storage, thereby ensuring a flexible and decoupled storage mechanism.
-
-### Data Source Ingestion
-This logical component encompasses the initial ingestion of raw data. `application.parser.file` handles local file system inputs, while `application.parser.remote` manages data from external sources like GitHub or sitemaps. They are the entry points for all data into the system.
-
-
-**Related Classes/Methods**:
-
-- `application.parser.file`
-- `application.parser.remote`
-
-
-### Document Chunking
-Responsible for breaking down large documents received from ingestion components into smaller, manageable chunks. This is critical for optimizing the data for embedding models and fitting within LLM context windows.
-
-
-**Related Classes/Methods**:
-
-- `application.parser.chunking`
-
-
-### Embedding Pipeline
-Orchestrates the conversion of text chunks into vector embeddings. This component is central to the data preparation process, bridging the gap between raw text and vector-based knowledge representation. It interacts with the `Storage Abstraction` for persistence.
-
-
-**Related Classes/Methods**:
-
-- `application.parser.embedding_pipeline`
-
-
-### Storage Abstraction
-Acts as a factory or manager for abstracting different storage backends. It provides a unified interface for the rest of the system to interact with persistent storage, whether it's local or cloud-based. This promotes flexibility and extensibility in storage solutions by delegating to concrete implementations.
-
-
-**Related Classes/Methods**:
-
-- `application.storage.storage_creator`
-
-
-### Local Storage
-Implements the concrete logic for persistent storage and retrieval of files and data on the local file system. It's one of the specific storage backends supported by the system, managed by the `Storage Abstraction`.
-
-
-**Related Classes/Methods**:
-
-- `application.storage.local`
-
-
-### S3 Storage
-Implements the concrete logic for persistent storage and retrieval using S3-compatible object storage services. This provides cloud-based, scalable storage capabilities, managed by the `Storage Abstraction`.
-
-
-**Related Classes/Methods**:
-
-- `application.storage.s3`
-
-
-
-
-### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
\ No newline at end of file
diff --git a/.codeboarding/Data_Input_Validation_Layer.json b/.codeboarding/Data_Input_Validation_Layer.json
new file mode 100644
index 00000000..9e91532d
--- /dev/null
+++ b/.codeboarding/Data_Input_Validation_Layer.json
@@ -0,0 +1,79 @@
+{
+    "description": "This subsystem is the initial entry point for all data processing pipelines within starfish, responsible for loading raw experimental data and metadata and ensuring its conformity to the SpaceTx format. It embodies the \"Extract\" and initial \"Validate\" stages of an ETL pipeline, crucial for maintaining data integrity throughout the scientific analysis workflow.",
+    "components": [
+        {
+            "name": "Experiment Data Model",
+            "description": "Represents the in-memory, structured form of the experimental data and metadata after loading. It serves as the canonical data structure that all subsequent processing components operate on.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish.core.experiment.experiment:Experiment",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/experiment/experiment.py",
+                    "reference_start_line": 1,
+                    "reference_end_line": 1
+                }
+            ],
+            "can_expand": true
+        },
+        {
+            "name": "Experiment Builder",
+            "description": "Responsible for parsing raw experimental data (e.g., image files, JSON metadata) from various sources and constructing the Experiment Data Model (Experiment object) in memory.
It acts as the primary \"Data Loader/Reader.\"",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish.core.experiment.builder.builder",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/experiment/builder/builder.py",
+                    "reference_start_line": 1,
+                    "reference_end_line": 1
+                }
+            ],
+            "can_expand": true
+        },
+        {
+            "name": "SpaceTx Validator",
+            "description": "Provides the main entry point for validating an Experiment Data Model or its constituent parts against the SpaceTx schema. It ensures data integrity and adherence to the defined format, acting as a \"Validation Tool\" and a gatekeeper for data quality.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish.core.spacetx_format.validate_sptx",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/spacetx_format/validate_sptx.py",
+                    "reference_start_line": 1,
+                    "reference_end_line": 1
+                }
+            ],
+            "can_expand": true
+        },
+        {
+            "name": "Schema Utilities",
+            "description": "Encapsulates the core logic for SpaceTx schema validation, including loading the SpaceTx JSON schemas, performing structural and data type validation, and enforcing constraints. It supports the SpaceTx Validator.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish.core.spacetx_format.util",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/spacetx_format/util.py",
+                    "reference_start_line": 1,
+                    "reference_end_line": 1
+                }
+            ],
+            "can_expand": true
+        }
+    ],
+    "components_relations": [
+        {
+            "relation": "provides data to",
+            "src_name": "Experiment Builder",
+            "dst_name": "SpaceTx Validator"
+        },
+        {
+            "relation": "creates/populates",
+            "src_name": "Experiment Builder",
+            "dst_name": "Experiment Data Model"
+        },
+        {
+            "relation": "validates",
+            "src_name": "SpaceTx Validator",
+            "dst_name": "Experiment Data Model"
+        },
+        {
+            "relation": "delegates validation to",
+            "src_name": "SpaceTx Validator",
+            "dst_name": "Schema Utilities"
+        }
+    ]
+}
\ No newline at end of file
diff --git a/.codeboarding/Data_Input_Validation_Layer.md b/.codeboarding/Data_Input_Validation_Layer.md
new file mode 100644
index 00000000..e7ec508e
--- /dev/null
+++ b/.codeboarding/Data_Input_Validation_Layer.md
@@ -0,0 +1,57 @@
+```mermaid
+graph LR
+    Experiment_Data_Model["Experiment Data Model"]
+    Experiment_Builder["Experiment Builder"]
+    SpaceTx_Validator["SpaceTx Validator"]
+    Schema_Utilities["Schema Utilities"]
+    Experiment_Builder -- "provides data to" --> SpaceTx_Validator
+    Experiment_Builder -- "creates/populates" --> Experiment_Data_Model
+    SpaceTx_Validator -- "validates" --> Experiment_Data_Model
+    SpaceTx_Validator -- "delegates validation to" --> Schema_Utilities
+```
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)
+
+## Details
+
+This subsystem is the initial entry point for all data processing pipelines within starfish, responsible for loading raw experimental data and metadata and ensuring its conformity to the SpaceTx format. It embodies the "Extract" and initial "Validate" stages of an ETL pipeline, crucial for maintaining data integrity throughout the scientific analysis workflow.
+
+### Experiment Data Model
+Represents the in-memory, structured form of the experimental data and metadata after loading. It serves as the canonical data structure that all subsequent processing components operate on.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core.experiment.experiment:Experiment`
+
+
+### Experiment Builder
+Responsible for parsing raw experimental data (e.g., image files, JSON metadata) from various sources and constructing the Experiment Data Model (Experiment object) in memory. It acts as the primary "Data Loader/Reader."
+
+
+**Related Classes/Methods**:
+
+- `starfish.core.experiment.builder.builder`
+
+
+### SpaceTx Validator
+Provides the main entry point for validating an Experiment Data Model or its constituent parts against the SpaceTx schema. It ensures data integrity and adherence to the defined format, acting as a "Validation Tool" and a gatekeeper for data quality.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core.spacetx_format.validate_sptx`
+
+
+### Schema Utilities
+Encapsulates the core logic for SpaceTx schema validation, including loading the SpaceTx JSON schemas, performing structural and data type validation, and enforcing constraints. It supports the SpaceTx Validator.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core.spacetx_format.util`
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
\ No newline at end of file
diff --git a/.codeboarding/Image_Processing_Engine.json b/.codeboarding/Image_Processing_Engine.json
new file mode 100644
index 00000000..e1bbfa16
--- /dev/null
+++ b/.codeboarding/Image_Processing_Engine.json
@@ -0,0 +1,156 @@
+{
+    "description": "The `Image Processing Engine` subsystem is primarily defined by the `starfish.core.image` and `starfish.core.morphology` packages, specifically focusing on `Filter`, `_registration`, `Segment`, `Binarize`, and `label_image` modules. It operates as a pipeline, where raw or pre-processed images are fed into various transformation and analysis steps. This modular and pipeline-driven structure aligns with the project's \"Data Pipeline / ETL\" and \"Modular Design\" architectural patterns, allowing for flexible construction of image analysis workflows.",
+    "components": [
+        {
+            "name": "Image Filtering",
+            "description": "Responsible for applying various image enhancement and noise reduction algorithms to raw or pre-processed image data.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish.core.image.Filter",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/image/Filter/__init__.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                }
+            ],
+            "can_expand": true
+        },
+        {
+            "name": "Image Registration",
+            "description": "Manages the application of geometric transformations to align and register images, correcting for spatial distortions between different acquisitions or time points.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish.core.image._registration.ApplyTransform",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/image/_registration/ApplyTransform",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                }
+            ],
+            "can_expand": true
+        },
+        {
+            "name": "Transformation Parameters",
+            "description": "Provides a standardized mechanism for defining, serializing, and deserializing parameters required for various image transformations.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish.core.image._registration.transforms_list",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/image/_registration/transforms_list.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                }
+            ],
+            "can_expand": false
+        },
+        {
+            "name": "Image Segmentation",
+            "description": "Implements the watershed algorithm to segment images into distinct regions, typically used for separating touching objects or identifying individual structures.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish.core.image.Segment.watershed",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/image/Segment/watershed.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                }
+            ],
+            "can_expand": true
+        },
+        {
+            "name": "Image Binarization",
+            "description": "Performs image binarization by applying a threshold, converting grayscale images into binary (black and white) masks based on pixel intensity.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish.core.morphology.Binarize.threshold",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/morphology/Binarize/threshold.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                }
+            ],
+            "can_expand": false
+        },
+        {
+            "name": "Labeled Image Representation",
+            "description": "Encapsulates and manages image data where pixels are assigned integer labels corresponding to distinct regions or objects identified through segmentation.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish.core.morphology.label_image.label_image",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/morphology/label_image/label_image.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                }
+            ],
+            "can_expand": false
+        },
+        {
+            "name": "Binary Mask Collection",
+            "description": "Manages collections of binary masks, providing an organized structure for handling multiple mask datasets, often derived from binarization or segmentation.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish.core.morphology.binary_mask.binary_mask",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/morphology/binary_mask/binary_mask.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                }
+            ],
+            "can_expand": true
+        },
+        {
+            "name": "Mask Persistence",
+            "description": "Handles the reading and writing of binary mask data to and from persistent storage, ensuring data integrity and reusability.",
+            "referenced_source_code": [
+                {
+                    "qualified_name": "starfish.core.morphology.binary_mask._io",
+                    "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/morphology/binary_mask/_io.py",
+                    "reference_start_line": 0,
+                    "reference_end_line": 0
+                }
+            ],
+            "can_expand": false
+        }
+    ],
+    "components_relations": [
+        {
+            "relation": "provides processed images to",
+            "src_name": "Image Filtering",
+            "dst_name": "Image Segmentation"
+        },
+        {
+            "relation": "provides processed images to",
+            "src_name": "Image Filtering",
+            "dst_name": "Image Binarization"
+        },
+        {
+            "relation": "utilizes",
+            "src_name": "Image Registration",
+            "dst_name": "Transformation Parameters"
+        },
+        {
+            "relation": "provides transformation parameters to",
+            "src_name": "Transformation Parameters",
+            "dst_name": "Image Registration"
+        },
+        {
+            "relation": "outputs segmented regions, which are represented by",
+            "src_name": "Image Segmentation",
+            "dst_name": "Labeled Image Representation"
+        },
+        {
+            "relation": "generates binary output, which is managed by",
+            "src_name": "Image Binarization",
+            "dst_name": "Binary Mask Collection"
+        },
+        {
+            "relation": "contributes to the creation or population of",
+            "src_name": "Labeled Image Representation",
+            "dst_name": "Binary Mask Collection"
+        },
+        {
+            "relation": "uses",
+            "src_name": "Binary Mask Collection",
+            "dst_name": "Mask Persistence"
+        },
+        {
+            "relation": "provides persistence services for",
+            "src_name": "Mask Persistence",
+            "dst_name": "Binary Mask Collection"
+        }
+    ]
+}
\ No newline at end of file
diff --git a/.codeboarding/Image_Processing_Engine.md b/.codeboarding/Image_Processing_Engine.md
new file mode 100644
index 00000000..f635937c
--- /dev/null
+++ b/.codeboarding/Image_Processing_Engine.md
@@ -0,0 +1,102 @@
+```mermaid
+graph LR
+    Image_Filtering["Image Filtering"]
+    Image_Registration["Image Registration"]
+    Transformation_Parameters["Transformation Parameters"]
+    Image_Segmentation["Image Segmentation"]
+    Image_Binarization["Image Binarization"]
+    Labeled_Image_Representation["Labeled Image Representation"]
+    Binary_Mask_Collection["Binary Mask Collection"]
+    Mask_Persistence["Mask Persistence"]
+    Image_Filtering -- "provides processed images to" --> Image_Segmentation
+    Image_Filtering -- "provides processed images to" --> Image_Binarization
+    Image_Registration -- "utilizes" --> Transformation_Parameters
+    Transformation_Parameters -- "provides transformation parameters to" --> Image_Registration
+    Image_Segmentation -- "outputs segmented regions, which are represented by" --> Labeled_Image_Representation
+    Image_Binarization -- "generates binary output, which is managed by" --> Binary_Mask_Collection
+    Labeled_Image_Representation -- "contributes to the creation or population of" --> Binary_Mask_Collection
+    Binary_Mask_Collection -- "uses" --> Mask_Persistence
+    Mask_Persistence -- "provides persistence services for" --> Binary_Mask_Collection
+```
+
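The Filter -> Binarize -> Label flow that these components describe can be sketched in a few lines of Python. This is an illustrative sketch only: it uses `scipy.ndimage` as a stand-in, and the helper names are hypothetical, not the actual `starfish` API.

```python
import numpy as np
from scipy import ndimage

# Hypothetical helpers mirroring the pipeline stages described above;
# these are NOT starfish functions, just stand-ins for illustration.

def filter_image(image: np.ndarray, sigma: float) -> np.ndarray:
    """Image Filtering: suppress noise with a Gaussian low-pass."""
    return ndimage.gaussian_filter(image, sigma=sigma)

def binarize(image: np.ndarray, threshold: float) -> np.ndarray:
    """Image Binarization: threshold intensities into a binary mask."""
    return image > threshold

def label_regions(mask: np.ndarray):
    """Labeled Image Representation: one integer label per connected region."""
    labeled, n_regions = ndimage.label(mask)
    return labeled, n_regions

# Two well-separated bright squares stand in for objects to segment.
raw = np.zeros((32, 32))
raw[4:10, 4:10] = 1.0
raw[20:26, 20:26] = 1.0

smoothed = filter_image(raw, sigma=0.5)
mask = binarize(smoothed, threshold=0.5)
labeled, n_regions = label_regions(mask)
print(n_regions)
```

Each stage consumes the previous stage's output, which is the same dataflow the component relations above express: filtering feeds binarization, and the binary mask is turned into a labeled image from which per-region masks can be collected.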
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)
+
+## Details
+
+The `Image Processing Engine` subsystem is primarily defined by the `starfish.core.image` and `starfish.core.morphology` packages, specifically focusing on `Filter`, `_registration`, `Segment`, `Binarize`, and `label_image` modules. It operates as a pipeline, where raw or pre-processed images are fed into various transformation and analysis steps. This modular and pipeline-driven structure aligns with the project's "Data Pipeline / ETL" and "Modular Design" architectural patterns, allowing for flexible construction of image analysis workflows.
+
+### Image Filtering
+Responsible for applying various image enhancement and noise reduction algorithms to raw or pre-processed image data.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core.image.Filter`
+
+
+### Image Registration
+Manages the application of geometric transformations to align and register images, correcting for spatial distortions between different acquisitions or time points.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core.image._registration.ApplyTransform`
+
+
+### Transformation Parameters
+Provides a standardized mechanism for defining, serializing, and deserializing parameters required for various image transformations.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core.image._registration.transforms_list`
+
+
+### Image Segmentation
+Implements the watershed algorithm to segment images into distinct regions, typically used for separating touching objects or identifying individual structures.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core.image.Segment.watershed`
+
+
+### Image Binarization
+Performs image binarization by applying a threshold, converting grayscale images into binary (black and white) masks based on pixel intensity.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core.morphology.Binarize.threshold`
+
+
+### Labeled Image Representation
+Encapsulates and manages image data where pixels are assigned integer labels corresponding to distinct regions or objects identified through segmentation.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core.morphology.label_image.label_image`
+
+
+### Binary Mask Collection
+Manages collections of binary masks, providing an organized structure for handling multiple mask datasets, often derived from binarization or segmentation.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core.morphology.binary_mask.binary_mask`
+
+
+### Mask Persistence
+Handles the reading and writing of binary mask data to and from persistent storage, ensuring data integrity and reusability.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core.morphology.binary_mask._io`
+
+
+
+
+### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq)
\ No newline at end of file
diff --git a/.codeboarding/LLM_Integration_Layer.json b/.codeboarding/LLM_Integration_Layer.json
deleted file mode 100644
index 06a02966..00000000
--- a/.codeboarding/LLM_Integration_Layer.json
+++ /dev/null
@@ -1,57 +0,0 @@
-{
-    "description": "The LLM subsystem is designed around a core LLM Abstraction that provides a consistent interface for interacting with various LLM Provider Implementations. The LLM Creator is responsible for dynamically instantiating the appropriate LLM Provider Implementation based on configuration. Once an LLM interaction occurs, the LLM Response Handler orchestrates the subsequent message processing, including parsing responses and managing tool calls. The LLM Handler Creator facilitates the selection and instantiation of the correct LLM Response Handler. This architecture ensures modularity, extensibility, and LLM agnosticism by centralizing interactions through the LLM Abstraction and separating response processing logic.",
-    "components": [
-        {
-            "name": "LLM Abstraction",
-            "description": "Serves as the central contract for all LLM interactions, abstracting away the specifics of different LLM Provider Implementations.",
-            "referenced_source_code": [],
-            "can_expand": true
-        },
-        {
-            "name": "LLM Creator",
-            "description": "Acts as a factory, providing instances of concrete LLM Provider Implementations.",
-            "referenced_source_code": [],
-            "can_expand": true
-        },
-        {
-            "name": "LLM Provider Implementations",
-            "description": "Concrete implementations that adhere to the LLM Abstraction.",
-            "referenced_source_code": [],
-            "can_expand": true
-        },
-        {
-            "name": "LLM Response Handler",
-            "description": "Manages the entire post-response flow, including parsing, tool call execution, and message preparation.",
-            "referenced_source_code": [],
-            "can_expand": true
-        },
-        {
-            "name": "LLM Handler Creator",
-            "description": "Dynamically provides the appropriate LLM Response Handler instance.",
-            "referenced_source_code": [],
-            "can_expand": true
-        }
-    ],
-    "components_relations": [
-        {
-            "relation": "instantiates",
-            "src_name": "LLM Creator",
-            "dst_name": "LLM Provider Implementations"
-        },
-        {
-            "relation": "implements",
-            "src_name": "LLM Provider Implementations",
-            "dst_name": "LLM Abstraction"
-        },
-        {
-            "relation": "interacts with",
-            "src_name": "LLM Response Handler",
-            "dst_name": "LLM Abstraction"
-        },
-        {
-            "relation": "instantiates",
-            "src_name": "LLM Handler Creator",
-            "dst_name": "LLM Response Handler"
-        }
-    ]
-}
\ No newline at end of file
diff --git a/.codeboarding/LLM_Integration_Layer.md b/.codeboarding/LLM_Integration_Layer.md
deleted file mode 100644
index 09dead0f..00000000
--- a/.codeboarding/LLM_Integration_Layer.md
+++ /dev/null
@@ -1,52 +0,0 @@
-```mermaid
-graph LR
-    LLM_Abstraction["LLM Abstraction"]
-    LLM_Creator["LLM Creator"]
-    LLM_Provider_Implementations["LLM Provider Implementations"]
-    LLM_Response_Handler["LLM Response Handler"]
-    LLM_Handler_Creator["LLM Handler Creator"]
-    LLM_Creator -- "instantiates" --> LLM_Provider_Implementations
-    LLM_Provider_Implementations -- "implements" --> LLM_Abstraction
-    LLM_Response_Handler -- "interacts with" --> LLM_Abstraction
-    LLM_Handler_Creator -- "instantiates" --> LLM_Response_Handler
-```
-
-[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)
-
-## Details
-
-The LLM subsystem is designed around a core LLM Abstraction that provides a consistent interface for interacting with various LLM Provider Implementations. The LLM Creator is responsible for dynamically instantiating the appropriate LLM Provider Implementation based on configuration. Once an LLM interaction occurs, the LLM Response Handler orchestrates the subsequent message processing, including parsing responses and managing tool calls. The LLM Handler Creator facilitates the selection and instantiation of the correct LLM Response Handler. This architecture ensures modularity, extensibility, and LLM agnosticism by centralizing interactions through the LLM Abstraction and separating response processing logic.
-
-### LLM Abstraction
-Serves as the central contract for all LLM interactions, abstracting away the specifics of different LLM Provider Implementations.
- - -**Related Classes/Methods**: _None_ - -### LLM Creator -Acts as a factory, providing instances of concrete LLM Provider Implementations. - - -**Related Classes/Methods**: _None_ - -### LLM Provider Implementations -Concrete implementations that adhere to the LLM Abstraction. - - -**Related Classes/Methods**: _None_ - -### LLM Response Handler -Manages the entire post-response flow, including parsing, tool call execution, and message preparation. - - -**Related Classes/Methods**: _None_ - -### LLM Handler Creator -Dynamically provides the appropriate LLM Response Handler instance. - - -**Related Classes/Methods**: _None_ - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Output_Export_Layer.json b/.codeboarding/Output_Export_Layer.json new file mode 100644 index 00000000..7ea3bad1 --- /dev/null +++ b/.codeboarding/Output_Export_Layer.json @@ -0,0 +1,88 @@ +{ + "description": "The `starfish` core data persistence layer is responsible for serializing and exporting processed biological data, ensuring data integrity and accessibility for downstream analysis. Key components include `ImageStack` for managing and exporting multi-dimensional image data into formats like multi-page TIFF, `DecodedSpots` for persisting decoded biological spot information into tabular formats such as CSV, and `BinaryMaskIO` which handles the specialized, versioned storage of binary mask data. These components primarily interact by taking processed in-memory data structures and writing them to disk in standardized formats, facilitating data sharing and reproducibility in scientific workflows.", + "components": [ + { + "name": "ImageStack", + "description": "This component represents and manages multi-dimensional image data. 
Its primary responsibility within this layer is to provide methods for serializing and exporting processed image stacks into standard persistent formats, specifically multi-page TIFF files, which are crucial for scientific image data.", + "referenced_source_code": [ + { + "qualified_name": "ImageStack:to_multipage_tiff", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/imagestack/imagestack.py", + "reference_start_line": 0, + "reference_end_line": 0 + }, + { + "qualified_name": "ImageStack:export", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/imagestack/imagestack.py", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": false + }, + { + "name": "DecodedSpots", + "description": "This component encapsulates the results of the decoding process, holding information about identified biological spots (e.g., coordinates, intensities, gene assignments). Its core responsibility in this layer is to facilitate the persistence of this structured data, typically by saving it into tabular formats like CSV, enabling easy access and analysis.", + "referenced_source_code": [ + { + "qualified_name": "DecodedSpots:save_csv", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/types/_decoded_spots.py", + "reference_start_line": 0, + "reference_end_line": 0 + } + ], + "can_expand": false + }, + { + "name": "BinaryMaskIO", + "description": "This component is responsible for the specialized handling and persistence of binary mask data. It provides both low-level writing capabilities and a higher-level interface that ensures proper versioning of mask files, aligning with data integrity and reproducibility requirements in scientific workflows. 
This component abstracts the direct file writing operations for masks by delegating to specific versioned implementations.", + "referenced_source_code": [ + { + "qualified_name": "write_binary_mask", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/morphology/binary_mask/_io.py", + "reference_start_line": 136, + "reference_end_line": 162 + }, + { + "qualified_name": "write_versioned_binary_mask", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/morphology/binary_mask/_io.py", + "reference_start_line": 58, + "reference_end_line": 74 + } + ], + "can_expand": true + } + ], + "components_relations": [ + { + "relation": "exports", + "src_name": "ImageStack", + "dst_name": "Multi-page TIFF files" + }, + { + "relation": "serializes", + "src_name": "ImageStack", + "dst_name": "Image data" + }, + { + "relation": "saves to", + "src_name": "DecodedSpots", + "dst_name": "CSV files" + }, + { + "relation": "persists", + "src_name": "DecodedSpots", + "dst_name": "Decoded spot data" + }, + { + "relation": "delegates writing to", + "src_name": "BinaryMaskIO", + "dst_name": "Versioned Binary Mask Writers" + }, + { + "relation": "manages versioning for", + "src_name": "BinaryMaskIO", + "dst_name": "Binary Mask Data" + } + ] +} \ No newline at end of file diff --git a/.codeboarding/Output_Export_Layer.md b/.codeboarding/Output_Export_Layer.md new file mode 100644 index 00000000..ee16b526 --- /dev/null +++ b/.codeboarding/Output_Export_Layer.md @@ -0,0 +1,51 @@ +```mermaid +graph LR + ImageStack["ImageStack"] + DecodedSpots["DecodedSpots"] + BinaryMaskIO["BinaryMaskIO"] + ImageStack -- "exports" --> Multi_page_TIFF_files + ImageStack -- "serializes" --> Image_data + DecodedSpots -- "saves to" --> CSV_files + DecodedSpots -- "persists" --> Decoded_spot_data + BinaryMaskIO -- "delegates writing to" --> Versioned_Binary_Mask_Writers + BinaryMaskIO -- "manages versioning for" --> Binary_Mask_Data +``` + 
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + +## Details + +The `starfish` core data persistence layer is responsible for serializing and exporting processed biological data, ensuring data integrity and accessibility for downstream analysis. Key components include `ImageStack` for managing and exporting multi-dimensional image data into formats like multi-page TIFF, `DecodedSpots` for persisting decoded biological spot information into tabular formats such as CSV, and `BinaryMaskIO` which handles the specialized, versioned storage of binary mask data. These components primarily interact by taking processed in-memory data structures and writing them to disk in standardized formats, facilitating data sharing and reproducibility in scientific workflows. + +### ImageStack +This component represents and manages multi-dimensional image data. Its primary responsibility within this layer is to provide methods for serializing and exporting processed image stacks into standard persistent formats, specifically multi-page TIFF files, which are crucial for scientific image data. + + +**Related Classes/Methods**: + +- `ImageStack:to_multipage_tiff` +- `ImageStack:export` + + +### DecodedSpots +This component encapsulates the results of the decoding process, holding information about identified biological spots (e.g., coordinates, intensities, gene assignments). Its core responsibility in this layer is to facilitate the persistence of this structured data, typically by saving it into tabular formats like CSV, enabling easy access and analysis. 
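
As a rough sketch of what this tabular persistence amounts to — the column names below are illustrative assumptions, not the actual `DecodedSpots` schema, which is backed by a pandas DataFrame — decoded spot records can be serialized to CSV with nothing but the standard library:

```python
import csv
import io

# Hypothetical decoded-spot records; field names are illustrative only.
spots = [
    {"x": 10.5, "y": 22.1, "z": 0.0, "intensity": 0.87, "target": "ACTB"},
    {"x": 31.0, "y": 5.4, "z": 1.0, "intensity": 0.42, "target": "GAPDH"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["x", "y", "z", "intensity", "target"])
writer.writeheader()
writer.writerows(spots)
csv_text = buffer.getvalue()  # header row plus one row per decoded spot
```

In practice `save_csv` targets a file path on disk; an in-memory buffer is used here only to keep the sketch self-contained.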
+ + +**Related Classes/Methods**: + +- `DecodedSpots:save_csv` + + +### BinaryMaskIO +This component is responsible for the specialized handling and persistence of binary mask data. It provides both low-level writing capabilities and a higher-level interface that ensures proper versioning of mask files, aligning with data integrity and reproducibility requirements in scientific workflows. This component abstracts the direct file writing operations for masks by delegating to specific versioned implementations. + + +**Related Classes/Methods**: + +- `write_binary_mask`:136-162 +- `write_versioned_binary_mask`:58-74 + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Retrieval_Module.json b/.codeboarding/Retrieval_Module.json deleted file mode 100644 index 3ef9c695..00000000 --- a/.codeboarding/Retrieval_Module.json +++ /dev/null @@ -1,38 +0,0 @@ -{ - "description": "The Retrieval Module is a core subsystem responsible for efficiently fetching the most relevant document chunks from the Vector Database based on user queries. It acts as the bridge between the user's information need and the contextual data required by the Large Language Model (LLM) for generating responses. Its boundaries encompass the logic for query processing, interaction with the knowledge base, and preparation of retrieved content.", - "components": [ - { - "name": "Retriever Creator", - "description": "This component serves as a factory for instantiating various retriever implementations. It abstracts the creation process, allowing other parts of the system to obtain a retriever instance (e.g., `Classic RAG Retriever`) without needing to know the specific concrete class or its initialization details. 
This design promotes modularity, extensibility, and supports the project's \"LLM Agnosticism\" and \"Modularity\" architectural biases by enabling easy swapping or addition of different retrieval strategies.", - "referenced_source_code": [ - { - "qualified_name": "Retriever Creator", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/retriever/retriever_creator.py", - "reference_start_line": 1, - "reference_end_line": 9999 - } - ], - "can_expand": true - }, - { - "name": "Classic RAG Retriever", - "description": "This component embodies a concrete and fundamental retrieval strategy within the RAG pipeline. It encapsulates the core logic for transforming a user query into an effective search query, interacting with the Vector Database to retrieve the most relevant document chunks, and preparing these chunks as contextual information for the LLM. It represents the \"how\" of fetching information in a standard RAG flow.", - "referenced_source_code": [ - { - "qualified_name": "Classic RAG Retriever", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/retriever/classic_rag.py", - "reference_start_line": 1, - "reference_end_line": 9999 - } - ], - "can_expand": false - } - ], - "components_relations": [ - { - "relation": "instantiates", - "src_name": "Retriever Creator", - "dst_name": "Classic RAG Retriever" - } - ] -} \ No newline at end of file diff --git a/.codeboarding/Retrieval_Module.md b/.codeboarding/Retrieval_Module.md deleted file mode 100644 index 534c86d9..00000000 --- a/.codeboarding/Retrieval_Module.md +++ /dev/null @@ -1,34 +0,0 @@ -```mermaid -graph LR - Retriever_Creator["Retriever Creator"] - Classic_RAG_Retriever["Classic RAG Retriever"] - Retriever_Creator -- "instantiates" --> Classic_RAG_Retriever -``` - 
-[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) - -## Details - -The Retrieval Module is a core subsystem responsible for efficiently fetching the most relevant document chunks from the Vector Database based on user queries. It acts as the bridge between the user's information need and the contextual data required by the Large Language Model (LLM) for generating responses. Its boundaries encompass the logic for query processing, interaction with the knowledge base, and preparation of retrieved content. - -### Retriever Creator -This component serves as a factory for instantiating various retriever implementations. It abstracts the creation process, allowing other parts of the system to obtain a retriever instance (e.g., `Classic RAG Retriever`) without needing to know the specific concrete class or its initialization details. This design promotes modularity, extensibility, and supports the project's "LLM Agnosticism" and "Modularity" architectural biases by enabling easy swapping or addition of different retrieval strategies. - - -**Related Classes/Methods**: - -- `Retriever Creator`:1-9999 - - -### Classic RAG Retriever -This component embodies a concrete and fundamental retrieval strategy within the RAG pipeline. It encapsulates the core logic for transforming a user query into an effective search query, interacting with the Vector Database to retrieve the most relevant document chunks, and preparing these chunks as contextual information for the LLM. It represents the "how" of fetching information in a standard RAG flow. 
- - -**Related Classes/Methods**: - -- `Classic RAG Retriever`:1-9999 - - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Spot_Analysis_Engine.json b/.codeboarding/Spot_Analysis_Engine.json new file mode 100644 index 00000000..2a7c1773 --- /dev/null +++ b/.codeboarding/Spot_Analysis_Engine.json @@ -0,0 +1,120 @@ +{ + "description": "The `Spot Analysis Engine` subsystem is dedicated to identifying, quantifying, and decoding biological spots within processed image data. It encompasses the entire pipeline from initial spot detection and intensity measurement to genetic decoding and target assignment.", + "components": [ + { + "name": "Spot Detection Initializer", + "description": "Initiates the spot analysis by detecting preliminary spot candidates from raw image data. This component represents the \"Extract\" phase for initial spot identification, preparing the raw image for further processing.", + "referenced_source_code": [ + { + "qualified_name": "starfish.core.spots.FindSpots", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/spots/FindSpots/", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + }, + { + "name": "Pixel-Level Spot Refiner", + "description": "Refines spot detection at a pixel level, grouping adjacent features and contributing to the detailed attributes of spots. It's part of the \"Transform\" phase, enhancing raw detection data for greater accuracy.", + "referenced_source_code": [ + { + "qualified_name": "starfish.core.spots.DetectPixels", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/spots/DetectPixels/", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": false + }, + { + "name": "Intensity Data Manager", + "description": "This is the central data structure of the subsystem. 
It manages and stores all quantitative intensity data for detected spots across various imaging rounds and channels. It serves as the primary data hub, embodying the \"Data-Centric Architecture\" principle.", + "referenced_source_code": [ + { + "qualified_name": "starfish.core.intensity_table.intensity_table", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/intensity_table/intensity_table.py", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + }, + { + "name": "Intensity Overlap Resolver", + "description": "A supporting component for the `Intensity Data Manager`, responsible for resolving spatial overlaps between intensity regions, ensuring accurate quantification and preventing data redundancy or misinterpretation.", + "referenced_source_code": [ + { + "qualified_name": "starfish.core.intensity_table.overlap", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/intensity_table/overlap.py", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + }, + { + "name": "Spot Decoder", + "description": "This component is responsible for the genetic decoding process, assigning genetic identities to spots based on their intensity profiles. This is a crucial \"Transform\" stage, converting raw intensity data into meaningful biological information.", + "referenced_source_code": [ + { + "qualified_name": "starfish.core.spots.DecodeSpots", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/spots/DecodeSpots/", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": true + }, + { + "name": "Target Assignment Processor", + "description": "This component assigns detected and decoded spots to specific biological targets (e.g., cells, nuclei), providing biological context and completing the analysis by associating spots with higher-level entities. 
This can be seen as the \"Load\" or contextualization phase.", + "referenced_source_code": [ + { + "qualified_name": "starfish.core.spots.AssignTargets.label", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/spots/AssignTargets/label.py", + "reference_start_line": 1, + "reference_end_line": 1 + } + ], + "can_expand": false + } + ], + "components_relations": [ + { + "relation": "produces data for", + "src_name": "Spot Detection Initializer", + "dst_name": "Intensity Data Manager" + }, + { + "relation": "refines data in", + "src_name": "Pixel-Level Spot Refiner", + "dst_name": "Intensity Data Manager" + }, + { + "relation": "provides data to", + "src_name": "Intensity Data Manager", + "dst_name": "Spot Decoder" + }, + { + "relation": "provides data to", + "src_name": "Intensity Data Manager", + "dst_name": "Target Assignment Processor" + }, + { + "relation": "relies on", + "src_name": "Intensity Data Manager", + "dst_name": "Intensity Overlap Resolver" + }, + { + "relation": "supports", + "src_name": "Intensity Overlap Resolver", + "dst_name": "Intensity Data Manager" + }, + { + "relation": "updates data in", + "src_name": "Spot Decoder", + "dst_name": "Intensity Data Manager" + } + ] +} \ No newline at end of file diff --git a/.codeboarding/Spot_Analysis_Engine.md b/.codeboarding/Spot_Analysis_Engine.md new file mode 100644 index 00000000..b9be22ae --- /dev/null +++ b/.codeboarding/Spot_Analysis_Engine.md @@ -0,0 +1,80 @@ +```mermaid +graph LR + Spot_Detection_Initializer["Spot Detection Initializer"] + Pixel_Level_Spot_Refiner["Pixel-Level Spot Refiner"] + Intensity_Data_Manager["Intensity Data Manager"] + Intensity_Overlap_Resolver["Intensity Overlap Resolver"] + Spot_Decoder["Spot Decoder"] + Target_Assignment_Processor["Target Assignment Processor"] + Spot_Detection_Initializer -- "produces data for" --> Intensity_Data_Manager + Pixel_Level_Spot_Refiner -- "refines data in" --> Intensity_Data_Manager + Intensity_Data_Manager -- 
"provides data to" --> Spot_Decoder + Intensity_Data_Manager -- "provides data to" --> Target_Assignment_Processor + Intensity_Data_Manager -- "relies on" --> Intensity_Overlap_Resolver + Intensity_Overlap_Resolver -- "supports" --> Intensity_Data_Manager + Spot_Decoder -- "updates data in" --> Intensity_Data_Manager +``` + +[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) + +## Details + +The `Spot Analysis Engine` subsystem is dedicated to identifying, quantifying, and decoding biological spots within processed image data. It encompasses the entire pipeline from initial spot detection and intensity measurement to genetic decoding and target assignment. + +### Spot Detection Initializer +Initiates the spot analysis by detecting preliminary spot candidates from raw image data. This component represents the "Extract" phase for initial spot identification, preparing the raw image for further processing. + + +**Related Classes/Methods**: + +- `starfish.core.spots.FindSpots` + + +### Pixel-Level Spot Refiner +Refines spot detection at a pixel level, grouping adjacent features and contributing to the detailed attributes of spots. It's part of the "Transform" phase, enhancing raw detection data for greater accuracy. + + +**Related Classes/Methods**: + +- `starfish.core.spots.DetectPixels` + + +### Intensity Data Manager +This is the central data structure of the subsystem. It manages and stores all quantitative intensity data for detected spots across various imaging rounds and channels. It serves as the primary data hub, embodying the "Data-Centric Architecture" principle. 
+ + +**Related Classes/Methods**: + +- `starfish.core.intensity_table.intensity_table` + + +### Intensity Overlap Resolver +A supporting component for the `Intensity Data Manager`, responsible for resolving spatial overlaps between intensity regions, ensuring accurate quantification and preventing data redundancy or misinterpretation. + + +**Related Classes/Methods**: + +- `starfish.core.intensity_table.overlap` + + +### Spot Decoder +This component is responsible for the genetic decoding process, assigning genetic identities to spots based on their intensity profiles. This is a crucial "Transform" stage, converting raw intensity data into meaningful biological information. + + +**Related Classes/Methods**: + +- `starfish.core.spots.DecodeSpots` + + +### Target Assignment Processor +This component assigns detected and decoded spots to specific biological targets (e.g., cells, nuclei), providing biological context and completing the analysis by associating spots with higher-level entities. This can be seen as the "Load" or contextualization phase. + + +**Related Classes/Methods**: + +- `starfish.core.spots.AssignTargets.label` + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/User_Interface_UI_.json b/.codeboarding/User_Interface_UI_.json deleted file mode 100644 index 486a146a..00000000 --- a/.codeboarding/User_Interface_UI_.json +++ /dev/null @@ -1,112 +0,0 @@ -{ - "description": "The DocsGPT application is structured around a core `App` component that orchestrates the user interface and manages the overall application flow. It renders key functional components such as `Conversation`, `Upload`, and `Settings`. The `Conversation` component provides the interactive chat interface, allowing users to query the RAG system and view responses. The `Upload` component handles the backend processing of document ingestion, integrating new knowledge into the system. 
The `Settings` component centralizes the application's configuration, providing critical parameters for LLM providers, API keys, and vector store settings, which are utilized by other components like `Upload`. Additionally, the `DocsGPTWidget` offers an embeddable version of the chat functionality, leveraging the core logic of the `Conversation` component for external website integration. This architecture ensures a modular and maintainable system, with clear responsibilities and interaction pathways between components.", - "components": [ - { - "name": "App", - "description": "Acts as the main application orchestrator and entry point. It manages global UI state (e.g., authentication status, theming), handles client-side routing, and defines the overall layout and navigation structure of the DocsGPT application. It ensures a cohesive user experience across different functionalities.", - "referenced_source_code": [ - { - "qualified_name": "App", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/extensions/chatwoot/app.py", - "reference_start_line": 48, - "reference_end_line": 48 - } - ], - "can_expand": true - }, - { - "name": "Conversation", - "description": "Manages the core interactive chat interface. This component is central to the RAG system's user interaction, displaying conversation history, allowing users to input queries, and presenting the generated responses from the backend. It also facilitates user feedback on responses, which is crucial for iterative improvement of the RAG model.", - "referenced_source_code": [ - { - "qualified_name": "Conversation", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/extensions/chatwoot/app.py", - "reference_start_line": 61, - "reference_end_line": 61 - } - ], - "can_expand": false - }, - { - "name": "Upload", - "description": "Handles the backend logic for ingesting new documents into the RAG system's knowledge base. 
It processes file uploads, manages different ingestor types, and interacts with the parsing and vector store components to store the document content.", - "referenced_source_code": [ - { - "qualified_name": "Upload", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/api/user/routes.py", - "reference_start_line": 164, - "reference_end_line": 184 - } - ], - "can_expand": true - }, - { - "name": "Settings", - "description": "Manages application-wide configurations and environment variables. This component is responsible for loading and providing access to various settings, including LLM provider details, API keys, vector store configurations, and other operational parameters that influence the RAG system's behavior.", - "referenced_source_code": [ - { - "qualified_name": "Settings", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/elasticsearch.py", - "reference_start_line": 121, - "reference_end_line": 121 - } - ], - "can_expand": true - }, - { - "name": "DocsGPTWidget", - "description": "Serves as an embeddable DocsGPT chat widget, designed for seamless integration into external websites. 
It encapsulates core chat functionality, providing a portable and lightweight experience of the DocsGPT RAG system.", - "referenced_source_code": [ - { - "qualified_name": "DocsGPTWidget", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/extensions/react-widget/src/components/DocsGPTWidget.tsx", - "reference_start_line": 554, - "reference_end_line": 586 - } - ], - "can_expand": false - } - ], - "components_relations": [ - { - "relation": "renders", - "src_name": "App", - "dst_name": "Conversation" - }, - { - "relation": "renders", - "src_name": "App", - "dst_name": "Upload" - }, - { - "relation": "renders", - "src_name": "App", - "dst_name": "Settings" - }, - { - "relation": "manages state for", - "src_name": "App", - "dst_name": "Conversation" - }, - { - "relation": "manages state for", - "src_name": "App", - "dst_name": "Upload" - }, - { - "relation": "manages state for", - "src_name": "App", - "dst_name": "Settings" - }, - { - "relation": "leverages core logic from", - "src_name": "DocsGPTWidget", - "dst_name": "Conversation" - }, - { - "relation": "configures via", - "src_name": "Upload", - "dst_name": "Settings" - } - ] -} \ No newline at end of file diff --git a/.codeboarding/User_Interface_UI_.md b/.codeboarding/User_Interface_UI_.md deleted file mode 100644 index 3e67ca70..00000000 --- a/.codeboarding/User_Interface_UI_.md +++ /dev/null @@ -1,71 +0,0 @@ -```mermaid -graph LR - App["App"] - Conversation["Conversation"] - Upload["Upload"] - Settings["Settings"] - DocsGPTWidget["DocsGPTWidget"] - App -- "renders" --> Conversation - App -- "renders" --> Upload - App -- "renders" --> Settings - App -- "manages state for" --> Conversation - App -- "manages state for" --> Upload - App -- "manages state for" --> Settings - DocsGPTWidget -- "leverages core logic from" --> Conversation - Upload -- "configures via" --> Settings -``` - 
-[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) - -## Details - -The DocsGPT application is structured around a core `App` component that orchestrates the user interface and manages the overall application flow. It renders key functional components such as `Conversation`, `Upload`, and `Settings`. The `Conversation` component provides the interactive chat interface, allowing users to query the RAG system and view responses. The `Upload` component handles the backend processing of document ingestion, integrating new knowledge into the system. The `Settings` component centralizes the application's configuration, providing critical parameters for LLM providers, API keys, and vector store settings, which are utilized by other components like `Upload`. Additionally, the `DocsGPTWidget` offers an embeddable version of the chat functionality, leveraging the core logic of the `Conversation` component for external website integration. This architecture ensures a modular and maintainable system, with clear responsibilities and interaction pathways between components. - -### App -Acts as the main application orchestrator and entry point. It manages global UI state (e.g., authentication status, theming), handles client-side routing, and defines the overall layout and navigation structure of the DocsGPT application. It ensures a cohesive user experience across different functionalities. - - -**Related Classes/Methods**: - -- `App` - - -### Conversation -Manages the core interactive chat interface. 
This component is central to the RAG system's user interaction, displaying conversation history, allowing users to input queries, and presenting the generated responses from the backend. It also facilitates user feedback on responses, which is crucial for iterative improvement of the RAG model. - - -**Related Classes/Methods**: - -- `Conversation` - - -### Upload -Handles the backend logic for ingesting new documents into the RAG system's knowledge base. It processes file uploads, manages different ingestor types, and interacts with the parsing and vector store components to store the document content. - - -**Related Classes/Methods**: - -- `Upload`:164-184 - - -### Settings -Manages application-wide configurations and environment variables. This component is responsible for loading and providing access to various settings, including LLM provider details, API keys, vector store configurations, and other operational parameters that influence the RAG system's behavior. - - -**Related Classes/Methods**: - -- `Settings` - - -### DocsGPTWidget -Serves as an embeddable DocsGPT chat widget, designed for seamless integration into external websites. It encapsulates core chat functionality, providing a portable and lightweight experience of the DocsGPT RAG system. - - -**Related Classes/Methods**: - -- `DocsGPTWidget`:554-586 - - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Vector_Database_Knowledge_Base.json b/.codeboarding/Vector_Database_Knowledge_Base.json deleted file mode 100644 index c1c3e749..00000000 --- a/.codeboarding/Vector_Database_Knowledge_Base.json +++ /dev/null @@ -1,196 +0,0 @@ -{ - "description": "The feedback correctly identified an issue with the VectorStoreCreator component's source file reference. 
The original analysis had FileRef: None, which has been corrected to FileRef: /home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/vector_creator.py based on the readFile tool's output. It appears there was a typo in the original QName and FileRef for VectorStoreCreator, as the correct file is vector_creator.py not vectorstore_creator.py. The VectorCreator (formerly VectorStoreCreator) acts as a factory, centralizing the creation of various vector store instances. This design pattern allows the application to dynamically switch between different vector store backends (e.g., FAISS, MongoDB, PGVector) at runtime, promoting modularity and decoupling.", - "components": [ - { - "name": "VectorStoreBase", - "description": "Defines the abstract interface for all vector store operations, including adding documents, performing similarity searches, and managing embedding model configurations. It establishes the contract for how any vector store backend should behave, ensuring extensibility and interchangeability.", - "referenced_source_code": [ - { - "qualified_name": "application.vectorstore.base.VectorStoreBase", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/base.py", - "reference_start_line": 0, - "reference_end_line": 0 - } - ], - "can_expand": false - }, - { - "name": "VectorCreator", - "description": "Centralizes the logic for creating instances of specific vector store implementations (e.g., FAISS, MongoDB, PGVector) based on system configuration. 
This factory pattern enables dynamic backend switching at runtime and promotes modularity by decoupling the client from concrete vector store classes.", - "referenced_source_code": [ - { - "qualified_name": "application.vectorstore.vector_creator.VectorCreator", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/vector_creator.py", - "reference_start_line": 9, - "reference_end_line": 24 - } - ], - "can_expand": false - }, - { - "name": "EmbeddingsWrapper", - "description": "Encapsulates the logic for interacting with various embedding models. Its primary function is to convert textual data into high-dimensional numerical vectors (embeddings) that can be stored in the vector database and used for similarity search. This component abstracts away the specifics of different embedding providers.", - "referenced_source_code": [ - { - "qualified_name": "application.vectorstore.base.EmbeddingsWrapper", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/base.py", - "reference_start_line": 7, - "reference_end_line": 24 - } - ], - "can_expand": true - }, - { - "name": "FAISSVectorStore", - "description": "Provides a concrete implementation of the VectorStoreBase interface, leveraging the FAISS library for efficient similarity search on locally stored vector indexes. It's suitable for smaller-scale deployments or local development.", - "referenced_source_code": [ - { - "qualified_name": "application.vectorstore.faiss.FAISSVectorStore", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/faiss.py", - "reference_start_line": 0, - "reference_end_line": 0 - } - ], - "can_expand": true - }, - { - "name": "MongoDBVectorStore", - "description": "Provides a concrete implementation of the VectorStoreBase interface, utilizing MongoDB as the backend for storing documents and their associated embeddings. 
This allows for scalable, document-oriented storage with vector search capabilities.", - "referenced_source_code": [ - { - "qualified_name": "application.vectorstore.mongodb.MongoDBVectorStore", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/mongodb.py", - "reference_start_line": 7, - "reference_end_line": 177 - } - ], - "can_expand": false - }, - { - "name": "PGVectorStore", - "description": "Provides a concrete implementation of the VectorStoreBase interface, integrating with PostgreSQL databases via the PGVector extension. This enables storing and querying embeddings directly within a robust relational database system.", - "referenced_source_code": [ - { - "qualified_name": "application.vectorstore.pgvector.PGVectorStore", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/pgvector.py", - "reference_start_line": 8, - "reference_end_line": 303 - } - ], - "can_expand": true - }, - { - "name": "ElasticsearchVectorStore", - "description": "Provides a concrete implementation of the VectorStoreBase interface, leveraging Elasticsearch for scalable, distributed storage and search of documents and their embeddings. It's well-suited for large datasets and complex query capabilities.", - "referenced_source_code": [ - { - "qualified_name": "application.vectorstore.elasticsearch.ElasticsearchVectorStore", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/elasticsearch.py", - "reference_start_line": 0, - "reference_end_line": 0 - } - ], - "can_expand": true - }, - { - "name": "LanceDBVectorStore", - "description": "Provides a concrete implementation of the VectorStoreBase interface, utilizing LanceDB for efficient, local, and serverless vector storage. 
It's designed for high-performance similarity search on embedded data.", - "referenced_source_code": [ - { - "qualified_name": "application.vectorstore.lancedb.LanceDBVectorStore", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/lancedb.py", - "reference_start_line": 6, - "reference_end_line": 119 - } - ], - "can_expand": false - } - ], - "components_relations": [ - { - "relation": "utilizes", - "src_name": "VectorStoreBase", - "dst_name": "EmbeddingsWrapper" - }, - { - "relation": "implements", - "src_name": "FAISSVectorStore", - "dst_name": "VectorStoreBase" - }, - { - "relation": "implements", - "src_name": "MongoDBVectorStore", - "dst_name": "VectorStoreBase" - }, - { - "relation": "implements", - "src_name": "PGVectorStore", - "dst_name": "VectorStoreBase" - }, - { - "relation": "implements", - "src_name": "ElasticsearchVectorStore", - "dst_name": "VectorStoreBase" - }, - { - "relation": "implements", - "src_name": "LanceDBVectorStore", - "dst_name": "VectorStoreBase" - }, - { - "relation": "creates", - "src_name": "VectorCreator", - "dst_name": "FAISSVectorStore" - }, - { - "relation": "creates", - "src_name": "VectorCreator", - "dst_name": "MongoDBVectorStore" - }, - { - "relation": "creates", - "src_name": "VectorCreator", - "dst_name": "PGVectorStore" - }, - { - "relation": "creates", - "src_name": "VectorCreator", - "dst_name": "ElasticsearchVectorStore" - }, - { - "relation": "creates", - "src_name": "VectorCreator", - "dst_name": "LanceDBVectorStore" - }, - { - "relation": "relies on", - "src_name": "VectorCreator", - "dst_name": "VectorStoreBase" - }, - { - "relation": "utilizes", - "src_name": "FAISSVectorStore", - "dst_name": "EmbeddingsWrapper" - }, - { - "relation": "utilizes", - "src_name": "MongoDBVectorStore", - "dst_name": "EmbeddingsWrapper" - }, - { - "relation": "utilizes", - "src_name": "PGVectorStore", - "dst_name": "EmbeddingsWrapper" - }, - { - "relation": "utilizes", - "src_name": 
"ElasticsearchVectorStore", - "dst_name": "EmbeddingsWrapper" - }, - { - "relation": "utilizes", - "src_name": "LanceDBVectorStore", - "dst_name": "EmbeddingsWrapper" - } - ] -} \ No newline at end of file diff --git a/.codeboarding/Vector_Database_Knowledge_Base.md b/.codeboarding/Vector_Database_Knowledge_Base.md deleted file mode 100644 index 1b5598a9..00000000 --- a/.codeboarding/Vector_Database_Knowledge_Base.md +++ /dev/null @@ -1,110 +0,0 @@ -```mermaid -graph LR - VectorStoreBase["VectorStoreBase"] - VectorCreator["VectorCreator"] - EmbeddingsWrapper["EmbeddingsWrapper"] - FAISSVectorStore["FAISSVectorStore"] - MongoDBVectorStore["MongoDBVectorStore"] - PGVectorStore["PGVectorStore"] - ElasticsearchVectorStore["ElasticsearchVectorStore"] - LanceDBVectorStore["LanceDBVectorStore"] - VectorStoreBase -- "utilizes" --> EmbeddingsWrapper - FAISSVectorStore -- "implements" --> VectorStoreBase - MongoDBVectorStore -- "implements" --> VectorStoreBase - PGVectorStore -- "implements" --> VectorStoreBase - ElasticsearchVectorStore -- "implements" --> VectorStoreBase - LanceDBVectorStore -- "implements" --> VectorStoreBase - VectorCreator -- "creates" --> FAISSVectorStore - VectorCreator -- "creates" --> MongoDBVectorStore - VectorCreator -- "creates" --> PGVectorStore - VectorCreator -- "creates" --> ElasticsearchVectorStore - VectorCreator -- "creates" --> LanceDBVectorStore - VectorCreator -- "relies on" --> VectorStoreBase - FAISSVectorStore -- "utilizes" --> EmbeddingsWrapper - MongoDBVectorStore -- "utilizes" --> EmbeddingsWrapper - PGVectorStore -- "utilizes" --> EmbeddingsWrapper - ElasticsearchVectorStore -- "utilizes" --> EmbeddingsWrapper - LanceDBVectorStore -- "utilizes" --> EmbeddingsWrapper -``` - 
-[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)
-
-## Details
-
-The feedback correctly identified an issue with the VectorStoreCreator component's source file reference. The original analysis had FileRef: None, which has been corrected to FileRef: /home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/vector_creator.py based on the readFile tool's output. It appears there was a typo in the original QName and FileRef for VectorStoreCreator, as the correct file is vector_creator.py not vectorstore_creator.py. The VectorCreator (formerly VectorStoreCreator) acts as a factory, centralizing the creation of various vector store instances. This design pattern allows the application to dynamically switch between different vector store backends (e.g., FAISS, MongoDB, PGVector) at runtime, promoting modularity and decoupling.
-
-### VectorStoreBase
-Defines the abstract interface for all vector store operations, including adding documents, performing similarity searches, and managing embedding model configurations. It establishes the contract for how any vector store backend should behave, ensuring extensibility and interchangeability.
-
-
-**Related Classes/Methods**:
-
-- `application.vectorstore.base.VectorStoreBase`
-
-
-### VectorCreator
-Centralizes the logic for creating instances of specific vector store implementations (e.g., FAISS, MongoDB, PGVector) based on system configuration. This factory pattern enables dynamic backend switching at runtime and promotes modularity by decoupling the client from concrete vector store classes.
-
-
-**Related Classes/Methods**:
-
-- `application.vectorstore.vector_creator.VectorCreator`:9-24
-
-
-### EmbeddingsWrapper
-Encapsulates the logic for interacting with various embedding models. Its primary function is to convert textual data into high-dimensional numerical vectors (embeddings) that can be stored in the vector database and used for similarity search. This component abstracts away the specifics of different embedding providers.
-
-
-**Related Classes/Methods**:
-
-- `application.vectorstore.base.EmbeddingsWrapper`:7-24
-
-
-### FAISSVectorStore
-Provides a concrete implementation of the VectorStoreBase interface, leveraging the FAISS library for efficient similarity search on locally stored vector indexes. It's suitable for smaller-scale deployments or local development.
-
-
-**Related Classes/Methods**:
-
-- `application.vectorstore.faiss.FAISSVectorStore`
-
-
-### MongoDBVectorStore
-Provides a concrete implementation of the VectorStoreBase interface, utilizing MongoDB as the backend for storing documents and their associated embeddings. This allows for scalable, document-oriented storage with vector search capabilities.
-
-
-**Related Classes/Methods**:
-
-- `application.vectorstore.mongodb.MongoDBVectorStore`:7-177
-
-
-### PGVectorStore
-Provides a concrete implementation of the VectorStoreBase interface, integrating with PostgreSQL databases via the PGVector extension. This enables storing and querying embeddings directly within a robust relational database system.
-
-
-**Related Classes/Methods**:
-
-- `application.vectorstore.pgvector.PGVectorStore`:8-303
-
-
-### ElasticsearchVectorStore
-Provides a concrete implementation of the VectorStoreBase interface, leveraging Elasticsearch for scalable, distributed storage and search of documents and their embeddings. It's well-suited for large datasets and complex query capabilities.
- - -**Related Classes/Methods**: - -- `application.vectorstore.elasticsearch.ElasticsearchVectorStore` - - -### LanceDBVectorStore -Provides a concrete implementation of the VectorStoreBase interface, utilizing LanceDB for efficient, local, and serverless vector storage. It's designed for high-performance similarity search on embedded data. - - -**Related Classes/Methods**: - -- `application.vectorstore.lancedb.LanceDBVectorStore`:6-119 - - - - -### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/Visualization_Utilities.json b/.codeboarding/Visualization_Utilities.json new file mode 100644 index 00000000..73aab24a --- /dev/null +++ b/.codeboarding/Visualization_Utilities.json @@ -0,0 +1,165 @@ +{ + "description": "This subsystem provides a comprehensive set of tools for generating visual representations of intermediate and final analysis results, crucial for quality control, debugging, and interpretation within the `starfish` project. It adheres to the `Data Processing Library / Scientific Toolkit` patterns by offering modular and specialized visualization capabilities.", + "components": [ + { + "name": "starfish.core._display.display", + "description": "The primary orchestrator for high-level data visualization. It acts as a facade, coordinating various data preparation steps before rendering complex visual outputs. 
This component is fundamental as it provides the main entry point for users to visualize processed data.", + "referenced_source_code": [ + { + "qualified_name": "starfish.core._display._mask_low_intensity_spots", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/_display.py", + "reference_start_line": 70, + "reference_end_line": 83 + }, + { + "qualified_name": "starfish.core._display._spots_to_markers", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/_display.py", + "reference_start_line": 86, + "reference_end_line": 118 + }, + { + "qualified_name": "starfish.core._display._max_intensity_table_maintain_dims", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/_display.py", + "reference_start_line": 39, + "reference_end_line": 67 + } + ], + "can_expand": true + }, + { + "name": "starfish.util.plot.imshow_plane", + "description": "A foundational utility for displaying single image planes. It serves as a low-level building block for more complex plotting functions, providing the basic image rendering capability.", + "referenced_source_code": [ + { + "qualified_name": "starfish.util.plot.overlay_spot_calls", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/util/plot.py", + "reference_start_line": 102, + "reference_end_line": 165 + }, + { + "qualified_name": "starfish.util.plot.diagnose_registration", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/util/plot.py", + "reference_start_line": 175, + "reference_end_line": 223 + } + ], + "can_expand": false + }, + { + "name": "starfish.util.plot.overlay_spot_calls", + "description": "A specialized function for visualizing detected spots by overlaying them onto an image. 
This component is crucial for interpreting spot detection results.", + "referenced_source_code": [ + { + "qualified_name": "starfish.util.plot.imshow_plane", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/util/plot.py", + "reference_start_line": 15, + "reference_end_line": 61 + } + ], + "can_expand": false + }, + { + "name": "starfish.util.plot.diagnose_registration", + "description": "Provides visual diagnostics to assess the quality of image registration. This component is vital for quality control in image processing pipelines.", + "referenced_source_code": [ + { + "qualified_name": "starfish.util.plot.imshow_plane", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/util/plot.py", + "reference_start_line": 15, + "reference_end_line": 61 + } + ], + "can_expand": false + }, + { + "name": "starfish.core._display._mask_low_intensity_spots", + "description": "A utility function for pre-processing spot data by filtering out entries below a certain intensity threshold, ensuring cleaner visualizations.", + "referenced_source_code": [ + { + "qualified_name": "starfish.core._display.display", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/_display.py", + "reference_start_line": 121, + "reference_end_line": 296 + } + ], + "can_expand": false + }, + { + "name": "starfish.core._display._spots_to_markers", + "description": "Responsible for transforming raw spot data into a format suitable for graphical markers, often used in scatter plots or overlays. 
This is a key data preparation step for visualization.", + "referenced_source_code": [ + { + "qualified_name": "starfish.core._display.display", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/_display.py", + "reference_start_line": 121, + "reference_end_line": 296 + } + ], + "can_expand": false + }, + { + "name": "starfish.core._display._max_intensity_table_maintain_dims", + "description": "Handles the preparation of intensity data tables for display, ensuring that dimensional consistency is maintained across different data views. This is critical for accurate and consistent data representation.", + "referenced_source_code": [ + { + "qualified_name": "starfish.core._display.display", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/_display.py", + "reference_start_line": 121, + "reference_end_line": 296 + }, + { + "qualified_name": "starfish.core._display._normalize_axes", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/_display.py", + "reference_start_line": 25, + "reference_end_line": 36 + } + ], + "can_expand": false + }, + { + "name": "starfish.core._display._normalize_axes", + "description": "A utility function that normalizes image axes to ensure consistent display scaling, regardless of the original data dimensions. 
This ensures visual comparability across different datasets.", + "referenced_source_code": [ + { + "qualified_name": "starfish.core._display._max_intensity_table_maintain_dims", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/_display.py", + "reference_start_line": 39, + "reference_end_line": 67 + } + ], + "can_expand": false + } + ], + "components_relations": [ + { + "relation": "calls", + "src_name": "starfish.core._display.display", + "dst_name": "starfish.core._display._mask_low_intensity_spots" + }, + { + "relation": "calls", + "src_name": "starfish.core._display.display", + "dst_name": "starfish.core._display._spots_to_markers" + }, + { + "relation": "calls", + "src_name": "starfish.core._display.display", + "dst_name": "starfish.core._display._max_intensity_table_maintain_dims" + }, + { + "relation": "calls", + "src_name": "starfish.util.plot.overlay_spot_calls", + "dst_name": "starfish.util.plot.imshow_plane" + }, + { + "relation": "calls", + "src_name": "starfish.util.plot.diagnose_registration", + "dst_name": "starfish.util.plot.imshow_plane" + }, + { + "relation": "calls", + "src_name": "starfish.core._display._max_intensity_table_maintain_dims", + "dst_name": "starfish.core._display._normalize_axes" + } + ] +} \ No newline at end of file diff --git a/.codeboarding/Visualization_Utilities.md b/.codeboarding/Visualization_Utilities.md new file mode 100644 index 00000000..bb034d83 --- /dev/null +++ b/.codeboarding/Visualization_Utilities.md @@ -0,0 +1,103 @@ +```mermaid +graph LR + starfish_core__display_display["starfish.core._display.display"] + starfish_util_plot_imshow_plane["starfish.util.plot.imshow_plane"] + starfish_util_plot_overlay_spot_calls["starfish.util.plot.overlay_spot_calls"] + starfish_util_plot_diagnose_registration["starfish.util.plot.diagnose_registration"] + starfish_core__display__mask_low_intensity_spots["starfish.core._display._mask_low_intensity_spots"] + 
starfish_core__display__spots_to_markers["starfish.core._display._spots_to_markers"]
+    starfish_core__display__max_intensity_table_maintain_dims["starfish.core._display._max_intensity_table_maintain_dims"]
+    starfish_core__display__normalize_axes["starfish.core._display._normalize_axes"]
+    starfish_core__display_display -- "calls" --> starfish_core__display__mask_low_intensity_spots
+    starfish_core__display_display -- "calls" --> starfish_core__display__spots_to_markers
+    starfish_core__display_display -- "calls" --> starfish_core__display__max_intensity_table_maintain_dims
+    starfish_util_plot_overlay_spot_calls -- "calls" --> starfish_util_plot_imshow_plane
+    starfish_util_plot_diagnose_registration -- "calls" --> starfish_util_plot_imshow_plane
+    starfish_core__display__max_intensity_table_maintain_dims -- "calls" --> starfish_core__display__normalize_axes
+```
+
+[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)
+
+## Details
+
+This subsystem provides a comprehensive set of tools for generating visual representations of intermediate and final analysis results, crucial for quality control, debugging, and interpretation within the `starfish` project. It adheres to the `Data Processing Library / Scientific Toolkit` patterns by offering modular and specialized visualization capabilities.
+
+### starfish.core._display.display
+The primary orchestrator for high-level data visualization. It acts as a facade, coordinating various data preparation steps before rendering complex visual outputs. This component is fundamental as it provides the main entry point for users to visualize processed data.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core._display._mask_low_intensity_spots`:70-83
+- `starfish.core._display._spots_to_markers`:86-118
+- `starfish.core._display._max_intensity_table_maintain_dims`:39-67
+
+
+### starfish.util.plot.imshow_plane
+A foundational utility for displaying single image planes. It serves as a low-level building block for more complex plotting functions, providing the basic image rendering capability.
+
+
+**Related Classes/Methods**:
+
+- `starfish.util.plot.overlay_spot_calls`:102-165
+- `starfish.util.plot.diagnose_registration`:175-223
+
+
+### starfish.util.plot.overlay_spot_calls
+A specialized function for visualizing detected spots by overlaying them onto an image. This component is crucial for interpreting spot detection results.
+
+
+**Related Classes/Methods**:
+
+- `starfish.util.plot.imshow_plane`:15-61
+
+
+### starfish.util.plot.diagnose_registration
+Provides visual diagnostics to assess the quality of image registration. This component is vital for quality control in image processing pipelines.
+
+
+**Related Classes/Methods**:
+
+- `starfish.util.plot.imshow_plane`:15-61
+
+
+### starfish.core._display._mask_low_intensity_spots
+A utility function for pre-processing spot data by filtering out entries below a certain intensity threshold, ensuring cleaner visualizations.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core._display.display`:121-296
+
+
+### starfish.core._display._spots_to_markers
+Responsible for transforming raw spot data into a format suitable for graphical markers, often used in scatter plots or overlays. This is a key data preparation step for visualization.
+
+
+**Related Classes/Methods**:
+
+- `starfish.core._display.display`:121-296
+
+
+### starfish.core._display._max_intensity_table_maintain_dims
+Handles the preparation of intensity data tables for display, ensuring that dimensional consistency is maintained across different data views.
This is critical for accurate and consistent data representation. + + +**Related Classes/Methods**: + +- `starfish.core._display.display`:121-296 +- `starfish.core._display._normalize_axes`:25-36 + + +### starfish.core._display._normalize_axes +A utility function that normalizes image axes to ensure consistent display scaling, regardless of the original data dimensions. This ensures visual comparability across different datasets. + + +**Related Classes/Methods**: + +- `starfish.core._display._max_intensity_table_maintain_dims`:39-67 + + + + +### [FAQ](https://github.com/CodeBoarding/GeneratedOnBoardings/tree/main?tab=readme-ov-file#faq) \ No newline at end of file diff --git a/.codeboarding/analysis.json b/.codeboarding/analysis.json index 3d591380..77983ca1 100644 --- a/.codeboarding/analysis.json +++ b/.codeboarding/analysis.json @@ -1,37 +1,25 @@ { - "description": "DocsGPT operates on a clear client-server architecture, with the User Interface (UI) serving as the primary interaction point. User requests are sent to the Backend Core, which acts as the central orchestrator. The Backend Core handles routing, authentication, and core application logic. For long-running operations like document ingestion, tasks are enqueued to the Asynchronous Task Worker.\n\nWhen a user query requires information retrieval, the Backend Core interacts with the Retrieval Module, which in turn queries the Vector Database / Knowledge Base to fetch relevant document chunks. The retrieved context, along with the user's query, is then forwarded to the LLM Integration Layer. This layer provides a unified interface for various Large Language Models. 
For complex tasks, the LLM Integration Layer can delegate to the Agentic Reasoning & External Tools component, which leverages external tools and APIs to fulfill the request.\n\nThe Data Ingestion & Storage component is responsible for processing and storing documents, including parsing, chunking, and embedding, before they are stored in the Vector Database / Knowledge Base. Finally, the LLM Integration Layer sends the generated answers back to the Backend Core, which then relays them to the User Interface. This architecture ensures a scalable, modular, and efficient flow of data and operations within the DocsGPT system.", + "description": "The Starfish project is structured around a robust data processing pipeline for spatial transcriptomics. It begins with a `Data Input & Validation Layer` responsible for ingesting and validating raw experimental data against the SpaceTx format. This validated data is then transformed into `Core Data Structures`, primarily multi-dimensional image stacks and codebooks, which serve as the central in-memory representation. The `Image Processing Engine` operates on these image stacks, applying various algorithms for enhancement and transformation. Concurrently, the `Spot Analysis Engine` identifies, quantifies, and decodes biological spots using both the processed images and the codebook. Finally, the `Output & Export Layer` handles the serialization and persistence of all processed data, while `Visualization Utilities` provide crucial tools for inspecting intermediate and final results. 
This modular design ensures clear separation of concerns and facilitates a streamlined data flow from raw input to interpretable biological insights.", "components": [ { - "name": "User Interface (UI)", - "description": "The interactive frontend for users to engage with DocsGPT, encompassing chat functionalities, document management, and application settings.", + "name": "Data Input & Validation Layer", + "description": "Responsible for loading raw experimental data and metadata, ensuring it conforms to the SpaceTx format. It acts as the initial entry point for all data processing pipelines.", "referenced_source_code": [ { - "qualified_name": "frontend.src.App", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/frontend/src/App.tsx", + "qualified_name": "starfish/core/experiment/experiment.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/experiment/experiment.py", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "frontend.src.conversation.Conversation", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/frontend/src/conversation/Conversation.tsx", + "qualified_name": "starfish/core/experiment/builder/builder.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/experiment/builder/builder.py", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "frontend.src.upload.Upload", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/frontend/src/upload/Upload.tsx", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "frontend.src.settings.index", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/frontend/src/settings/index.tsx", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "extensions.react_widget.src.components.DocsGPTWidget", - "reference_file": 
"/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/extensions/react-widget/src/components/DocsGPTWidget.tsx", + "qualified_name": "starfish/core/spacetx_format/validate_sptx.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/spacetx_format/validate_sptx.py", "reference_start_line": 0, "reference_end_line": 0 } @@ -39,85 +27,24 @@ "can_expand": true }, { - "name": "Backend Core", - "description": "Acts as the central entry point for all frontend requests, routing them to appropriate backend services, and managing core application logic, authentication, and configuration.", + "name": "Core Data Structures", + "description": "Defines and manages the fundamental in-memory data structures used throughout the Starfish pipeline, primarily multi-dimensional image stacks and codebooks.", "referenced_source_code": [ { - "qualified_name": "application.app", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/app.py", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "application.api.answer.routes", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/api/answer/routes", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "application.api.user.routes", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/api/user/routes.py", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "application.auth", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/auth.py", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "application.core.settings", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/core/settings.py", - "reference_start_line": 0, - "reference_end_line": 0 - } - ], - "can_expand": true - }, - { - "name": "Data Ingestion & Storage", - "description": "Handles the entire lifecycle of 
data preparation, including loading, parsing, chunking, and embedding various data sources, and manages the persistent storage and retrieval of raw and processed files.", - "referenced_source_code": [ - { - "qualified_name": "application.parser.embedding_pipeline", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/embedding_pipeline.py", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "application.parser.chunking", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/chunking.py", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "application.parser.file", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/file", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "application.parser.remote", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/parser/remote", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "application.storage.storage_creator", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/storage/storage_creator.py", + "qualified_name": "starfish/core/imagestack/imagestack.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/imagestack/imagestack.py", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.storage.s3", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/storage/s3.py", + "qualified_name": "starfish/core/imagestack/parser/", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/imagestack/parser/", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.storage.local", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/storage/local.py", + "qualified_name": 
"starfish/core/codebook/codebook.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/codebook/codebook.py", "reference_start_line": 0, "reference_end_line": 0 } @@ -125,55 +52,36 @@ "can_expand": true }, { - "name": "Vector Database / Knowledge Base", - "description": "Serves as the persistent storage for embedded document chunks, enabling efficient semantic search and acting as the system's primary knowledge repository. Supports multiple backend implementations.", + "name": "Image Processing Engine", + "description": "Applies various algorithms to transform and enhance image data, including filtering, registration, and basic segmentation operations.", "referenced_source_code": [ { - "qualified_name": "application.vectorstore.base", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/base.py", + "qualified_name": "starfish/core/image/Filter/", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/image/Filter/", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.vectorstore.faiss", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/faiss.py", + "qualified_name": "starfish/core/image/_registration/", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/image/_registration/", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.vectorstore.mongodb", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/mongodb.py", + "qualified_name": "starfish/core/image/Segment/watershed.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/image/Segment/watershed.py", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.vectorstore.pgvector", - "reference_file": 
"/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/vectorstore/pgvector.py", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "application.vectorstore.vectorstore_creator", - "reference_file": null, - "reference_start_line": 0, - "reference_end_line": 0 - } - ], - "can_expand": true - }, - { - "name": "Retrieval Module", - "description": "Focuses on fetching the most relevant document chunks from the Vector Database based on user queries, preparing the contextual information required by the LLM.", - "referenced_source_code": [ - { - "qualified_name": "application.retriever.classic_rag", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/retriever/classic_rag.py", + "qualified_name": "starfish/core/morphology/Binarize/threshold.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/morphology/Binarize/threshold.py", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.retriever.retriever_creator", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/retriever/retriever_creator.py", + "qualified_name": "starfish/core/morphology/label_image/label_image.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/morphology/label_image/label_image.py", "reference_start_line": 0, "reference_end_line": 0 } @@ -181,42 +89,36 @@ "can_expand": true }, { - "name": "LLM Integration Layer", - "description": "Provides a unified abstraction for interacting with diverse Large Language Models (LLMs), managing model selection, message formatting, and handling streaming or batch responses.", + "name": "Spot Analysis Engine", + "description": "Dedicated to identifying, quantifying, and decoding biological spots within processed image data. 
It integrates spot detection, intensity measurement, and genetic decoding.", "referenced_source_code": [ { - "qualified_name": "application.llm.base", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/base.py", + "qualified_name": "starfish/core/spots/FindSpots/", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/spots/FindSpots/", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.llm.llm_creator", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/llm_creator.py", + "qualified_name": "starfish/core/intensity_table/intensity_table.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/intensity_table/intensity_table.py", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.llm.openai", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/openai.py", + "qualified_name": "starfish/core/spots/DecodeSpots/", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/spots/DecodeSpots/", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.llm.google_ai", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/google_ai.py", + "qualified_name": "starfish/core/spots/AssignTargets/label.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/spots/AssignTargets/label.py", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.llm.anthropic", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/anthropic.py", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "application.llm.handlers", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/llm/handlers", + "qualified_name": 
"starfish/core/spots/DetectPixels/", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/spots/DetectPixels/", "reference_start_line": 0, "reference_end_line": 0 } @@ -224,36 +126,24 @@ "can_expand": true }, { - "name": "Agentic Reasoning & External Tools", - "description": "Empowers the LLM to execute complex, multi-step tasks by breaking them down into sub-problems and leveraging a suite of external tools and APIs (e.g., web search, TTS) to gather information or perform actions.", + "name": "Output & Export Layer", + "description": "Handles the serialization and export of all processed data, including transformed image stacks, decoded spot information, and generated masks, into various persistent formats.", "referenced_source_code": [ { - "qualified_name": "application.agents.base", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/base.py", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "application.agents.react_agent", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/react_agent.py", + "qualified_name": "starfish/core/imagestack/imagestack.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/imagestack/imagestack.py", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.agents.agent_creator", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/agent_creator.py", + "qualified_name": "starfish/core/types/_decoded_spots.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/types/_decoded_spots.py", "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.agents.tools", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/agents/tools", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": 
"application.tts.elevenlabs", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/tts/elevenlabs.py", + "qualified_name": "starfish/core/morphology/binary_mask/_io.py", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/morphology/binary_mask/_io.py", "reference_start_line": 0, "reference_end_line": 0 } @@ -261,24 +151,18 @@ "can_expand": true }, { - "name": "Asynchronous Task Worker", - "description": "Manages and executes long-running or computationally intensive tasks asynchronously (e.g., document ingestion, remote data synchronization, agent webhooks), preventing blocking of the main API.", + "name": "Visualization Utilities", + "description": "Provides tools and functions for generating visual representations of intermediate and final analysis results, aiding in quality control, debugging, and interpretation.", "referenced_source_code": [ { - "qualified_name": "application.worker", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/worker.py", - "reference_start_line": 0, - "reference_end_line": 0 - }, - { - "qualified_name": "application.celery_init", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/celery_init.py", + "qualified_name": "starfish/util/plot/", + "reference_file": null, "reference_start_line": 0, "reference_end_line": 0 }, { - "qualified_name": "application.celeryconfig", - "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/DocsGPT/application/celeryconfig.py", + "qualified_name": "starfish/core/_display/", + "reference_file": "/home/ivan/StartUp/CodeBoarding/repos/starfish/starfish/core/_display.py", "reference_start_line": 0, "reference_end_line": 0 } @@ -288,64 +172,49 @@ ], "components_relations": [ { - "relation": "sends User Requests to", - "src_name": "User Interface (UI)", - "dst_name": "Backend Core" - }, - { - "relation": "sends Generated Responses to", - "src_name": "Backend Core", - "dst_name": "User Interface 
(UI)" - }, - { - "relation": "enqueues Background Tasks to", - "src_name": "Backend Core", - "dst_name": "Asynchronous Task Worker" - }, - { - "relation": "sends User Queries to", - "src_name": "Backend Core", - "dst_name": "Retrieval Module" + "relation": "provides SpaceTx Experiment Data to", + "src_name": "Data Input & Validation Layer", + "dst_name": "Core Data Structures" }, { - "relation": "forwards User Queries & Context to", - "src_name": "Backend Core", - "dst_name": "LLM Integration Layer" + "relation": "supplies Image Stacks to", + "src_name": "Core Data Structures", + "dst_name": "Image Processing Engine" }, { - "relation": "triggers Document Processing & Storage in", - "src_name": "Asynchronous Task Worker", - "dst_name": "Data Ingestion & Storage" + "relation": "provides Codebook to", + "src_name": "Core Data Structures", + "dst_name": "Spot Analysis Engine" }, { - "relation": "stores Embedded Chunks in", - "src_name": "Data Ingestion & Storage", - "dst_name": "Vector Database / Knowledge Base" + "relation": "outputs Processed Image Stacks to", + "src_name": "Image Processing Engine", + "dst_name": "Core Data Structures" }, { - "relation": "queries for Relevant Documents from", - "src_name": "Retrieval Module", - "dst_name": "Vector Database / Knowledge Base" + "relation": "feeds Processed Images to", + "src_name": "Image Processing Engine", + "dst_name": "Spot Analysis Engine" }, { - "relation": "requests Context from", - "src_name": "LLM Integration Layer", - "dst_name": "Retrieval Module" + "relation": "generates Binary Masks for", + "src_name": "Image Processing Engine", + "dst_name": "Output & Export Layer" }, { - "relation": "delegates Tool Execution to", - "src_name": "LLM Integration Layer", - "dst_name": "Agentic Reasoning & External Tools" + "relation": "outputs Decoded Spot Data & Intensity Tables to", + "src_name": "Spot Analysis Engine", + "dst_name": "Output & Export Layer" }, { - "relation": "returns Tool Results to", - "src_name": 
"Agentic Reasoning & External Tools", - "dst_name": "LLM Integration Layer" + "relation": "provides Data for Visualization to", + "src_name": "Core Data Structures", + "dst_name": "Visualization Utilities" }, { - "relation": "sends Generated Answers to", - "src_name": "LLM Integration Layer", - "dst_name": "Backend Core" + "relation": "provides Decoded Spots for Plotting to", + "src_name": "Spot Analysis Engine", + "dst_name": "Visualization Utilities" } ] } \ No newline at end of file diff --git a/.codeboarding/codeboarding_version.json b/.codeboarding/codeboarding_version.json index 0c496dc7..34f4c526 100644 --- a/.codeboarding/codeboarding_version.json +++ b/.codeboarding/codeboarding_version.json @@ -1,4 +1,4 @@ { - "commit_hash": "c68273706ca6b3e1bf7f7fb67b00a2c84bf9ad2c", + "commit_hash": "43762625fe9ef2497f5aa5ff9e6d7c33b88290fd", "code_boarding_version": "0.1.0" } \ No newline at end of file diff --git a/.codeboarding/on_boarding.md b/.codeboarding/on_boarding.md index 49716e40..95f67281 100644 --- a/.codeboarding/on_boarding.md +++ b/.codeboarding/on_boarding.md @@ -1,144 +1,100 @@ ```mermaid graph LR - User_Interface_UI_["User Interface (UI)"] - Backend_Core["Backend Core"] - Data_Ingestion_Storage["Data Ingestion & Storage"] - Vector_Database_Knowledge_Base["Vector Database / Knowledge Base"] - Retrieval_Module["Retrieval Module"] - LLM_Integration_Layer["LLM Integration Layer"] - Agentic_Reasoning_External_Tools["Agentic Reasoning & External Tools"] - Asynchronous_Task_Worker["Asynchronous Task Worker"] - User_Interface_UI_ -- "sends User Requests to" --> Backend_Core - Backend_Core -- "sends Generated Responses to" --> User_Interface_UI_ - Backend_Core -- "enqueues Background Tasks to" --> Asynchronous_Task_Worker - Backend_Core -- "sends User Queries to" --> Retrieval_Module - Backend_Core -- "forwards User Queries & Context to" --> LLM_Integration_Layer - Asynchronous_Task_Worker -- "triggers Document Processing & Storage in" --> 
Data_Ingestion_Storage - Data_Ingestion_Storage -- "stores Embedded Chunks in" --> Vector_Database_Knowledge_Base - Retrieval_Module -- "queries for Relevant Documents from" --> Vector_Database_Knowledge_Base - LLM_Integration_Layer -- "requests Context from" --> Retrieval_Module - LLM_Integration_Layer -- "delegates Tool Execution to" --> Agentic_Reasoning_External_Tools - Agentic_Reasoning_External_Tools -- "returns Tool Results to" --> LLM_Integration_Layer - LLM_Integration_Layer -- "sends Generated Answers to" --> Backend_Core - click User_Interface_UI_ href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/User_Interface_UI_.md" "Details" - click Backend_Core href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/Backend_Core.md" "Details" - click Data_Ingestion_Storage href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/Data_Ingestion_Storage.md" "Details" - click Vector_Database_Knowledge_Base href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/Vector_Database_Knowledge_Base.md" "Details" - click Retrieval_Module href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/Retrieval_Module.md" "Details" - click LLM_Integration_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/LLM_Integration_Layer.md" "Details" - click Agentic_Reasoning_External_Tools href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/Agentic_Reasoning_External_Tools.md" "Details" - click Asynchronous_Task_Worker href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/DocsGPT/Asynchronous_Task_Worker.md" "Details" + Data_Input_Validation_Layer["Data Input & Validation Layer"] + Core_Data_Structures["Core Data Structures"] + Image_Processing_Engine["Image Processing Engine"] + Spot_Analysis_Engine["Spot Analysis Engine"] + Output_Export_Layer["Output & Export Layer"] + Visualization_Utilities["Visualization 
Utilities"] + Data_Input_Validation_Layer -- "provides SpaceTx Experiment Data to" --> Core_Data_Structures + Core_Data_Structures -- "supplies Image Stacks to" --> Image_Processing_Engine + Core_Data_Structures -- "provides Codebook to" --> Spot_Analysis_Engine + Image_Processing_Engine -- "outputs Processed Image Stacks to" --> Core_Data_Structures + Image_Processing_Engine -- "feeds Processed Images to" --> Spot_Analysis_Engine + Image_Processing_Engine -- "generates Binary Masks for" --> Output_Export_Layer + Spot_Analysis_Engine -- "outputs Decoded Spot Data & Intensity Tables to" --> Output_Export_Layer + Core_Data_Structures -- "provides Data for Visualization to" --> Visualization_Utilities + Spot_Analysis_Engine -- "provides Decoded Spots for Plotting to" --> Visualization_Utilities + click Data_Input_Validation_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/starfish/Data_Input_Validation_Layer.md" "Details" + click Core_Data_Structures href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/starfish/Core_Data_Structures.md" "Details" + click Image_Processing_Engine href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/starfish/Image_Processing_Engine.md" "Details" + click Spot_Analysis_Engine href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/starfish/Spot_Analysis_Engine.md" "Details" + click Output_Export_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/starfish/Output_Export_Layer.md" "Details" + click Visualization_Utilities href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/starfish/Visualization_Utilities.md" "Details" ``` 
[![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org) ## Details -DocsGPT operates on a clear client-server architecture, with the User Interface (UI) serving as the primary interaction point. User requests are sent to the Backend Core, which acts as the central orchestrator. The Backend Core handles routing, authentication, and core application logic. For long-running operations like document ingestion, tasks are enqueued to the Asynchronous Task Worker. +The Starfish project is structured around a robust data processing pipeline for spatial transcriptomics. It begins with a `Data Input & Validation Layer` responsible for ingesting and validating raw experimental data against the SpaceTx format. This validated data is then transformed into `Core Data Structures`, primarily multi-dimensional image stacks and codebooks, which serve as the central in-memory representation. The `Image Processing Engine` operates on these image stacks, applying various algorithms for enhancement and transformation. Concurrently, the `Spot Analysis Engine` identifies, quantifies, and decodes biological spots using both the processed images and the codebook. Finally, the `Output & Export Layer` handles the serialization and persistence of all processed data, while `Visualization Utilities` provide crucial tools for inspecting intermediate and final results. This modular design ensures clear separation of concerns and facilitates a streamlined data flow from raw input to interpretable biological insights. 
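The staged flow described in that paragraph can be sketched end to end. The function names and the toy decoding rule below are illustrative stand-ins for the layers named above, not the real starfish API:

```python
import numpy as np

def load_experiment():
    # Data Input & Validation Layer: a pretend-validated 4-D stack
    # (round, channel, y, x) plus a codebook mapping barcodes to genes.
    stack = np.zeros((2, 2, 4, 4))
    stack[0, 1, 1, 1] = 1.0  # bright in round 0, channel 1
    stack[1, 0, 1, 1] = 1.0  # bright in round 1, channel 0
    codebook = {(1, 0): "GENE_A", (0, 1): "GENE_B"}
    return stack, codebook

def high_pass(stack):
    # Image Processing Engine: trivial stand-in for a background filter.
    return stack - stack.mean()

def decode_spots(stack, codebook):
    # Spot Analysis Engine: at each bright pixel, take the per-round
    # argmax over channels and look the barcode up in the codebook.
    spots = []
    ys, xs = np.nonzero(stack.max(axis=(0, 1)) > 0.5)
    for y, x in zip(ys, xs):
        barcode = tuple(int(stack[r, :, y, x].argmax())
                        for r in range(stack.shape[0]))
        spots.append((int(y), int(x), codebook.get(barcode, "nan")))
    return spots

stack, codebook = load_experiment()
spots = decode_spots(high_pass(stack), codebook)
# The Output & Export Layer would serialize `spots` from here.
```

The separation mirrors the component graph: each stage consumes only the outputs of the stage before it.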
-When a user query requires information retrieval, the Backend Core interacts with the Retrieval Module, which in turn queries the Vector Database / Knowledge Base to fetch relevant document chunks. The retrieved context, along with the user's query, is then forwarded to the LLM Integration Layer. This layer provides a unified interface for various Large Language Models. For complex tasks, the LLM Integration Layer can delegate to the Agentic Reasoning & External Tools component, which leverages external tools and APIs to fulfill the request. - -The Data Ingestion & Storage component is responsible for processing and storing documents, including parsing, chunking, and embedding, before they are stored in the Vector Database / Knowledge Base. Finally, the LLM Integration Layer sends the generated answers back to the Backend Core, which then relays them to the User Interface. This architecture ensures a scalable, modular, and efficient flow of data and operations within the DocsGPT system. - -### User Interface (UI) [[Expand]](./User_Interface_UI_.md) -The interactive frontend for users to engage with DocsGPT, encompassing chat functionalities, document management, and application settings. - - -**Related Classes/Methods**: - -- `frontend.src.App` -- `frontend.src.conversation.Conversation` -- `frontend.src.upload.Upload` -- `frontend.src.settings.index` -- `extensions.react_widget.src.components.DocsGPTWidget` - - -### Backend Core [[Expand]](./Backend_Core.md) -Acts as the central entry point for all frontend requests, routing them to appropriate backend services, and managing core application logic, authentication, and configuration. 
- - -**Related Classes/Methods**: - -- `application.app` -- `application.api.answer.routes` -- `application.api.user.routes` -- `application.auth` -- `application.core.settings` - - -### Data Ingestion & Storage [[Expand]](./Data_Ingestion_Storage.md) -Handles the entire lifecycle of data preparation, including loading, parsing, chunking, and embedding various data sources, and manages the persistent storage and retrieval of raw and processed files. +### Data Input & Validation Layer [[Expand]](./Data_Input_Validation_Layer.md) +Responsible for loading raw experimental data and metadata, ensuring it conforms to the SpaceTx format. It acts as the initial entry point for all data processing pipelines. **Related Classes/Methods**: -- `application.parser.embedding_pipeline` -- `application.parser.chunking` -- `application.parser.file` -- `application.parser.remote` -- `application.storage.storage_creator` -- `application.storage.s3` -- `application.storage.local` +- `starfish/core/experiment/experiment.py` +- `starfish/core/experiment/builder/builder.py` +- `starfish/core/spacetx_format/validate_sptx.py` -### Vector Database / Knowledge Base [[Expand]](./Vector_Database_Knowledge_Base.md) -Serves as the persistent storage for embedded document chunks, enabling efficient semantic search and acting as the system's primary knowledge repository. Supports multiple backend implementations. +### Core Data Structures [[Expand]](./Core_Data_Structures.md) +Defines and manages the fundamental in-memory data structures used throughout the Starfish pipeline, primarily multi-dimensional image stacks and codebooks. 
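The image stacks this component manages are conventionally five-dimensional, ordered (round, channel, z, y, x). A minimal stand-in for named-axis selection, illustrative only and not the real `ImageStack` class:

```python
import numpy as np

# SpaceTx-style axis order; `sel` is a hypothetical helper, not starfish API.
AXES = ("r", "c", "z", "y", "x")
data = np.zeros((3, 4, 1, 8, 8), dtype=np.float32)  # 3 rounds, 4 channels

def sel(data, **which):
    # Select by named axis, e.g. sel(data, r=0, c=2) -> one z-stack.
    index = tuple(which.get(ax, slice(None)) for ax in AXES)
    return data[index]

tile = sel(data, r=0, c=2, z=0)  # a single (y, x) plane
```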
**Related Classes/Methods**: -- `application.vectorstore.base` -- `application.vectorstore.faiss` -- `application.vectorstore.mongodb` -- `application.vectorstore.pgvector` +- `starfish/core/imagestack/imagestack.py` +- `starfish/core/imagestack/parser/` +- `starfish/core/codebook/codebook.py` -### Retrieval Module [[Expand]](./Retrieval_Module.md) -Focuses on fetching the most relevant document chunks from the Vector Database based on user queries, preparing the contextual information required by the LLM. +### Image Processing Engine [[Expand]](./Image_Processing_Engine.md) +Applies various algorithms to transform and enhance image data, including filtering, registration, and basic segmentation operations. **Related Classes/Methods**: -- `application.retriever.classic_rag` -- `application.retriever.retriever_creator` +- `starfish/core/image/Filter/` +- `starfish/core/image/_registration/` +- `starfish/core/image/Segment/watershed.py` +- `starfish/core/morphology/Binarize/threshold.py` +- `starfish/core/morphology/label_image/label_image.py` -### LLM Integration Layer [[Expand]](./LLM_Integration_Layer.md) -Provides a unified abstraction for interacting with diverse Large Language Models (LLMs), managing model selection, message formatting, and handling streaming or batch responses. +### Spot Analysis Engine [[Expand]](./Spot_Analysis_Engine.md) +Dedicated to identifying, quantifying, and decoding biological spots within processed image data. It integrates spot detection, intensity measurement, and genetic decoding. 
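The intensity measurement this engine performs can be pictured as a spots × rounds × channels array with per-spot coordinates. A hedged sketch of that shape and of a per-round-max-channel decoding pass, with invented values, not the real intensity-table API:

```python
import numpy as np

# Hypothetical stand-in for an intensity table: one row per detected spot,
# one value per (round, channel), plus spot coordinates.
rng = np.random.default_rng(0)
intensities = rng.random((3, 2, 2))          # 3 spots, 2 rounds, 2 channels
coords = {"y": np.array([10, 20, 30]), "x": np.array([5, 15, 25])}

# A per-round-max-channel decoder collapses each spot to a barcode:
barcodes = [tuple(int(c) for c in spot.argmax(axis=1)) for spot in intensities]
```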
**Related Classes/Methods**: -- `application.llm.base` -- `application.llm.llm_creator` -- `application.llm.openai` -- `application.llm.google_ai` -- `application.llm.anthropic` -- `application.llm.handlers` +- `starfish/core/spots/FindSpots/` +- `starfish/core/intensity_table/intensity_table.py` +- `starfish/core/spots/DecodeSpots/` +- `starfish/core/spots/AssignTargets/label.py` +- `starfish/core/spots/DetectPixels/` -### Agentic Reasoning & External Tools [[Expand]](./Agentic_Reasoning_External_Tools.md) -Empowers the LLM to execute complex, multi-step tasks by breaking them down into sub-problems and leveraging a suite of external tools and APIs (e.g., web search, TTS) to gather information or perform actions. +### Output & Export Layer [[Expand]](./Output_Export_Layer.md) +Handles the serialization and export of all processed data, including transformed image stacks, decoded spot information, and generated masks, into various persistent formats. **Related Classes/Methods**: -- `application.agents.base` -- `application.agents.react_agent` -- `application.agents.agent_creator` -- `application.agents.tools` -- `application.tts.elevenlabs` +- `starfish/core/imagestack/imagestack.py` +- `starfish/core/types/_decoded_spots.py` +- `starfish/core/morphology/binary_mask/_io.py` -### Asynchronous Task Worker [[Expand]](./Asynchronous_Task_Worker.md) -Manages and executes long-running or computationally intensive tasks asynchronously (e.g., document ingestion, remote data synchronization, agent webhooks), preventing blocking of the main API. +### Visualization Utilities [[Expand]](./Visualization_Utilities.md) +Provides tools and functions for generating visual representations of intermediate and final analysis results, aiding in quality control, debugging, and interpretation. 
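One common quality-control view of the kind these utilities support is a per-gene spot count. A stdlib-only sketch with made-up records:

```python
from collections import Counter

# Made-up decoded-spot records: (y, x, target gene)
decoded = [(10, 5, "GENE_A"), (20, 15, "GENE_B"), (30, 25, "GENE_A")]

counts = Counter(target for _, _, target in decoded)
# A bar chart of `counts` is the typical rendering; here we just tabulate.
```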
**Related Classes/Methods**: -- `application.worker` -- `application.celery_init` -- `application.celeryconfig` +- `starfish/core/_display/` From 04865c15af244806a88b8ea1a7d5d860885eeb75 Mon Sep 17 00:00:00 2001 From: ivanmilevtues Date: Wed, 13 Aug 2025 12:03:18 +0200 Subject: [PATCH 4/4] Updated links --- .codeboarding/on_boarding.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/.codeboarding/on_boarding.md b/.codeboarding/on_boarding.md index 95f67281..f5e601bb 100644 --- a/.codeboarding/on_boarding.md +++ b/.codeboarding/on_boarding.md @@ -15,12 +15,12 @@ graph LR Spot_Analysis_Engine -- "outputs Decoded Spot Data & Intensity Tables to" --> Output_Export_Layer Core_Data_Structures -- "provides Data for Visualization to" --> Visualization_Utilities Spot_Analysis_Engine -- "provides Decoded Spots for Plotting to" --> Visualization_Utilities - click Data_Input_Validation_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/starfish/Data_Input_Validation_Layer.md" "Details" - click Core_Data_Structures href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/starfish/Core_Data_Structures.md" "Details" - click Image_Processing_Engine href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/starfish/Image_Processing_Engine.md" "Details" - click Spot_Analysis_Engine href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/starfish/Spot_Analysis_Engine.md" "Details" - click Output_Export_Layer href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/starfish/Output_Export_Layer.md" "Details" - click Visualization_Utilities href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main/starfish/Visualization_Utilities.md" "Details" + click Data_Input_Validation_Layer href "https://github.com/spacetx/starfish/blob/main/starfish/Data_Input_Validation_Layer.md" "Details" + click Core_Data_Structures href 
"https://github.com/spacetx/starfish/blob/main/starfish/Core_Data_Structures.md" "Details" + click Image_Processing_Engine href "https://github.com/spacetx/starfish/blob/main/starfish/Image_Processing_Engine.md" "Details" + click Spot_Analysis_Engine href "https://github.com/spacetx/starfish/blob/main/starfish/Spot_Analysis_Engine.md" "Details" + click Output_Export_Layer href "https://github.com/spacetx/starfish/blob/main/starfish/Output_Export_Layer.md" "Details" + click Visualization_Utilities href "https://github.com/spacetx/starfish/blob/main/starfish/Visualization_Utilities.md" "Details" ``` [![CodeBoarding](https://img.shields.io/badge/Generated%20by-CodeBoarding-9cf?style=flat-square)](https://github.com/CodeBoarding/GeneratedOnBoardings)[![Demo](https://img.shields.io/badge/Try%20our-Demo-blue?style=flat-square)](https://www.codeboarding.org/demo)[![Contact](https://img.shields.io/badge/Contact%20us%20-%20contact@codeboarding.org-lightgrey?style=flat-square)](mailto:contact@codeboarding.org)
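Rounding out the onboarding above, the Output & Export Layer's serialization step can be pictured as writing decoded spots to a tabular format. This stand-in uses the stdlib `csv` module and invented records; the real layer's formats are defined by the modules listed in that section:

```python
import csv
import io

# Invented decoded-spot records standing in for a decoded-spots table.
decoded = [(10, 5, "GENE_A"), (20, 15, "GENE_B")]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["y", "x", "target"])
writer.writerows(decoded)
csv_text = buf.getvalue()
```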