Lollms Text-to-Image (TTI) Module Documentation

Objective: To provide a standardized way to integrate various Text-to-Image generation services (APIs, local models, etc.) into the Lollms ecosystem.

Core Idea: Lollms uses a modular approach. Each specific TTI implementation (e.g., Stable Diffusion via Automatic1111 or ComfyUI, or an online API such as DALL-E, Midjourney, or Google Imagen/Gemini) is contained in its own binding or module. These modules all inherit from a common base class, LollmsTTI, ensuring they present a consistent interface to the main Lollms application.


1. The LollmsTTI Base Class

The lollms.tti.LollmsTTI class serves as the foundation for all TTI modules. It defines the essential methods and properties that Lollms expects any TTI service to provide; a condensed sketch of this interface follows the list below.

  • Inheritance: LollmsTTI inherits from lollms.service.LollmsSERVICE. This is important because LollmsSERVICE provides the basic framework for any Lollms service, including:
    • Access to the main LollmsApplication instance (self.app).
    • Handling of configuration via TypedConfig (self.service_config).
    • A unique name for the service.
    • Standard logging methods via self.app (e.g., self.app.info, self.app.error).
  • Key Purpose: To define a contract. Any class inheriting from LollmsTTI must implement (or override) certain methods to be compatible with Lollms’ image generation features.
  • Initialization (__init__):
    • name: str: A unique identifier for this specific TTI service (e.g., “google_gemini”, “automatic1111”, “comfyui”).
    • app: LollmsApplication: The main Lollms application instance, providing access to global settings, paths, logging, etc.
    • service_config: TypedConfig: An object holding the specific configuration settings for this TTI service (e.g., API keys, model paths, default parameters). This is usually created from a ConfigTemplate.
    • output_folder: str | Path | None: Specifies where generated images should be saved. If None, it defaults to a standard location within the Lollms personal outputs path (lollms_paths.personal_outputs_path / name). The constructor ensures this folder exists.
  • Core Generation Methods:
    • paint(...): The primary method for standard text-to-image generation.
      • Input: positive_prompt, negative_prompt, and common Stable Diffusion-style parameters (sampler_name, seed, scale, steps, width, height). It also takes optional output_folder and output_file_name overrides.
      • Implementation Detail: Crucially, not all TTI backends support all these parameters. A specific module (like LollmsGoogleGemini) might ignore parameters like seed, steps, sampler_name, or derive width/height from an aspect_ratio setting, as seen in the example. The implementation should handle the parameters relevant to its backend API or model.
      • Output: Tuple[Path | None, Dict | None]. A tuple containing the Path to the first successfully generated image and a Dict with its corresponding metadata, OR (None, error_dict) on failure.
    • paint_from_images(...): Intended for image-to-image, inpainting, or image variation tasks.
      • Input: positive_prompt, a List[str] of input image file paths (images), negative_prompt, and the same optional parameters as paint.
      • Implementation Detail: Similar to paint, the specific module must adapt this to its backend’s capabilities. Some backends might only use the first image, others might support multiple. The LollmsGoogleGemini example shows using the first input image with Gemini for image+text prompting.
      • Output: Same format as paint: Tuple[Path | None, Dict | None]. Returns the path to the first successfully generated output image and its metadata, or (None, error_dict) on failure or if the backend doesn’t support image-to-image.
  • Static Utility Methods: These methods may be called by Lollms before any instance of the TTI module has been created; they operate at the class level.
    • verify(app: LollmsApplication) -> bool: Checks if the necessary prerequisites for this TTI module are met. This usually involves checking if required libraries are installed (e.g., google-generativeai for the Gemini example) or if essential configuration (like an API key placeholder) exists. It often uses helper libraries like PackageManager. Should return True if usable, False otherwise.
    • install(app: LollmsApplication) -> bool: Attempts to install any missing prerequisites identified by verify. Typically uses PackageManager.install_package. Should return True on success, False on failure.
    • get(app: LollmsApplication, config: Optional[dict]=None, lollms_paths: Optional[LollmsPaths]=None) -> 'LollmsTTI': A factory method. Lollms calls this static method to get an actual instance of the specific TTI class (e.g., LollmsGoogleGemini). It passes the app instance and potentially pre-loaded configuration (config) and paths (lollms_paths).
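
Condensed into code, the contract looks roughly like this (a sketch paraphrased from the descriptions above; the import paths and default parameter values shown are assumptions and may vary between Lollms versions):

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple

from lollms.app import LollmsApplication  # import paths assumed
from lollms.config import TypedConfig
from lollms.paths import LollmsPaths
from lollms.service import LollmsSERVICE

class LollmsTTI(LollmsSERVICE):
    def __init__(self, name: str, app: LollmsApplication,
                 service_config: TypedConfig,
                 output_folder: str | Path | None = None):
        # Defaults output_folder to personal_outputs_path / name and creates it
        ...

    def paint(self, positive_prompt: str, negative_prompt: str = "",
              sampler_name: str = "Euler", seed: Optional[int] = None,
              scale: float = 7.5, steps: int = 20,
              width: int = 512, height: int = 512,
              output_folder: str | Path | None = None,
              output_file_name: Optional[str] = None
              ) -> Tuple[Path | None, Dict | None]:
        """Text-to-image: returns (image_path, metadata) or (None, error_dict)."""
        ...

    def paint_from_images(self, positive_prompt: str, images: List[str],
                          negative_prompt: str = "", **kwargs
                          ) -> Tuple[Path | None, Dict | None]:
        """Image-to-image / variation: same return convention as paint()."""
        ...

    @staticmethod
    def verify(app: LollmsApplication) -> bool: ...

    @staticmethod
    def install(app: LollmsApplication) -> bool: ...

    @staticmethod
    def get(app: LollmsApplication, config: Optional[dict] = None,
            lollms_paths: Optional[LollmsPaths] = None) -> "LollmsTTI": ...
```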

2. How TTI Modules are Used by Lollms

  1. Discovery: Lollms scans designated directories for bindings/modules.
  2. Verification & Installation: For each potential TTI module found, Lollms might call its static verify method. If verification fails and installation is requested or automatic, it calls the static install method.
  3. Listing: Verified modules are presented to the user as available TTI services.
  4. Selection & Configuration: The user selects a TTI service. Lollms loads its configuration template (ConfigTemplate) and presents the settings (like API key, model choice, etc.) to the user. User inputs are saved.
  5. Instantiation: When the user wants to generate an image using the selected service, Lollms calls the static get method of the chosen module’s class, passing the app instance and the loaded service config. This returns an active instance (e.g., an instance of LollmsGoogleGemini).
  6. Generation Request: When the user provides prompts (and potentially other parameters or input images), Lollms calls the appropriate method on the instantiated object:
    • For text-to-image: instance.paint(...)
    • For image-to-image/variation: instance.paint_from_images(...)
  7. Execution: The module’s paint or paint_from_images method executes:
    • It potentially initializes its specific client or loads its model if not already done (as seen in _initialize_client in the example).
    • It translates the Lollms parameters (prompts, dimensions, etc.) into the format required by its backend (API call parameters, model inference arguments). It may ignore unsupported Lollms parameters.
    • It communicates with the backend (makes the API call, runs the inference).
    • It receives the image data (e.g., bytes, base64). Even if the backend generates multiple images, the implementation should focus on processing and saving the first one for the standard return value.
    • It saves the first image’s data to a file in the designated output folder (using helpers like find_next_available_filename if no specific name is given). PIL (Pillow) is commonly used for saving.
    • It constructs the metadata dictionary for the saved image.
    • It returns the tuple (saved_path, metadata_dict) or (None, error_dict) back to Lollms.
  8. Display: Lollms receives the tuple. If the first element (image_path) is not None, it displays the generated image and potentially the metadata. If it’s None, it displays the error message from the second element (error_dict).
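
For illustration, here is that flow condensed into a short host-side sketch (simplified; the real Lollms orchestration adds discovery, UI, and persistence, LollmsGoogleGemini stands in for any concrete binding, and generate_image is a hypothetical helper):

```python
def generate_image(app, module_cls, saved_config: dict):
    # Step 2: verify prerequisites, installing them if allowed
    if not module_cls.verify(app) and not module_cls.install(app):
        raise RuntimeError("TTI module prerequisites could not be installed")

    # Step 5: instantiate the service through its static factory
    tti = module_cls.get(app, config=saved_config)

    # Steps 6-7: run a text-to-image request
    image_path, metadata = tti.paint(
        positive_prompt="a watercolor fox in a snowy forest",
        negative_prompt="blurry, low quality",
        width=1024, height=1024,
    )

    # Step 8: display the result or report the error
    if image_path is not None:
        app.info(f"Image saved to {image_path}")
    else:
        app.error(metadata.get("error", "Unknown error"))
```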

3. Developing a New TTI Module

Here’s a step-by-step guide based on the LollmsGoogleGemini example; a condensed skeleton that ties the steps together follows the list:

  1. Create the Module File: Create a Python file in the appropriate Lollms bindings directory (e.g., bindings/google_gemini/binding.py). Add header comments detailing the binding (Title, Author, Licence, Description, Requirements).
  2. Import Necessary Libraries: Import LollmsTTI, LollmsApplication, Path, List, Dict, Optional, Tuple, configuration classes (TypedConfig, ConfigTemplate), utility functions (find_next_available_filename, ASCIIColors, trace_exception), and any libraries required for your specific TTI backend (e.g., requests, google.generativeai, PIL, diffusers, etc.).
  3. Handle Dependencies:
    • Use a try...except ImportError block to check if the required backend libraries are installed.
    • If missing, use ASCIIColors.info to inform the user and PackageManager.install_package to install them. Re-import after installation or raise an error if it fails. This ensures the binding doesn’t crash Lollms if dependencies are missing initially.
  4. Define the Class: Create a class that inherits from LollmsTTI:

```python
from lollms.tti import LollmsTTI
# ... other imports

class LollmsMyTTIService(LollmsTTI):
    # ... implementation ...
```
  5. Implement __init__:
    • Define the __init__ method accepting app, an optional config dict, and (if you need paths beyond what app already provides) an optional lollms_paths.
    • Define the ConfigTemplate for your service’s settings (API keys, models, defaults).
    • Create the TypedConfig instance: service_config = TypedConfig(service_config_template, config or {}).
    • Crucially, call the parent constructor: super().__init__("my_tti_service_name", app, service_config, output_folder). Ensure the name is unique. Handle the output_folder logic as in the base class or example.
    • Initialize any state needed for your service (e.g., set API clients to None, load default values from service_config).
    • Optionally, attempt to initialize the backend client immediately if configuration allows (like the Gemini example checking for an API key).
  6. Implement paint:
    • Define the paint method with the -> Tuple[Path | None, Dict | None] signature.
    • Add logic to ensure your backend client/model is initialized (call an internal _initialize_client or similar helper if needed). Handle initialization failures by returning (None, {"error": ...}).
    • Retrieve necessary parameters from self.service_config (e.g., model name, specific quality settings).
    • Translate the input parameters (positive_prompt, negative_prompt, etc.) into the format expected by your backend API or model. Handle unsupported parameters and combine prompts as needed.
    • Make the call to your TTI backend within a try...except block. Catch specific API/model errors and generic exceptions. In except blocks, log the error using self.app.error and trace_exception, then return (None, {"error": f"Descriptive error message: {e}"}).
    • Process the response. Extract the image data for the first image received.
    • Determine the output filename (use output_file_name if provided, otherwise generate one using find_next_available_filename).
    • Save the image data to a file using PIL or another library. Ensure the output folder exists (self._ensure_output_folder). If saving fails, return (None, {"error": "Failed to save image"}).
    • Create the metadata dictionary for the saved image.
    • Return (saved_file_path, metadata). If processing fails after the API call but before saving, return (None, {"error": "Failed to process image data"}).
  7. Implement paint_from_images:
    • Define the paint_from_images method with the -> Tuple[Path | None, Dict | None] signature.
    • Check if your backend supports image inputs. If not, log an error and return (None, {"error": "Backend does not support image-to-image"}).
    • Load the first input image from images[0]. Handle errors (file not found, load errors) by returning (None, {"error": ...}).
    • Follow similar steps as paint: initialize client, translate parameters, make the backend call (providing the image data), handle errors (returning (None, error_dict)), process the response for the first output image, save it, create metadata (include input image info), and return (saved_file_path, metadata).
  8. Implement Static Methods:
    • verify(app: LollmsApplication) -> bool: Implement the check for dependencies (e.g., return PackageManager.check_package_installed("my_required_library")).
    • install(app: LollmsApplication) -> bool: Implement the installation command (e.g., return PackageManager.install_package("my_required_library")).
    • get(app: LollmsApplication, config: Optional[dict]=None, lollms_paths: Optional[LollmsPaths]=None) -> 'LollmsMyTTIService': Implement the factory method: return LollmsMyTTIService(app=app, config=config, lollms_paths=lollms_paths).
  9. Helper Methods (Optional but Recommended):
    • Create private helper methods like _initialize_client, _ensure_output_folder, etc., to keep paint and paint_from_images cleaner, as shown in the Gemini example.
    • Consider a settings_updated method if client re-initialization is needed when settings (like API key) change via the UI.
  10. Testing: Test thoroughly with different prompts, negative prompts, parameters, error conditions (invalid API key, network down, invalid input image), and ensure the correct tuple (Path, Dict) or (None, Dict) is returned in all scenarios.
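
Putting these steps together, a condensed skeleton of a hypothetical binding file might look like the following. The service name, settings, and HTTP endpoint are placeholders, and the import paths and helper signatures (e.g., find_next_available_filename) are assumptions that may differ between Lollms versions:

```python
# Title: LollmsMyTTIService
# Author: Your Name
# Licence: Apache 2.0
# Description: Minimal example binding for a hypothetical HTTP image API
import io
from pathlib import Path
from typing import Dict, List, Optional, Tuple

from ascii_colors import ASCIIColors, trace_exception  # import paths assumed
from lollms.app import LollmsApplication
from lollms.config import ConfigTemplate, TypedConfig
from lollms.tti import LollmsTTI
from lollms.utilities import PackageManager, find_next_available_filename
from PIL import Image

# Step 3: make sure backend dependencies are installed before importing them
if not PackageManager.check_package_installed("requests"):
    ASCIIColors.info("Installing requests ...")
    PackageManager.install_package("requests")
import requests


class LollmsMyTTIService(LollmsTTI):
    def __init__(self, app: LollmsApplication, config: Optional[dict] = None,
                 lollms_paths=None, output_folder=None):
        # Step 5: settings template presented to the user in the Lollms UI
        template = ConfigTemplate([
            {"name": "api_key", "type": "str", "value": "", "help": "Backend API key"},
            {"name": "model", "type": "str", "value": "my-model-v1", "help": "Model to use"},
        ])
        super().__init__("my_tti_service", app, TypedConfig(template, config or {}),
                         output_folder)
        self.session = None  # backend client, created lazily

    def _initialize_client(self) -> bool:
        # Step 9 helper: build an authenticated session from the stored key
        if not self.service_config.api_key:
            self.app.error("my_tti_service: no API key configured")
            return False
        self.session = requests.Session()
        self.session.headers["Authorization"] = f"Bearer {self.service_config.api_key}"
        return True

    def settings_updated(self):
        # Drop the session so the next call re-authenticates with new settings
        self.session = None

    def paint(self, positive_prompt, negative_prompt="", sampler_name="Euler",
              seed=None, scale=7.5, steps=20, width=512, height=512,
              output_folder=None, output_file_name=None) -> Tuple[Path | None, Dict | None]:
        if self.session is None and not self._initialize_client():
            return None, {"error": "Client not initialized (missing API key?)"}
        try:
            # Step 6: translate Lollms parameters; this imaginary backend only
            # understands prompt, model, and size, so sampler/seed/steps are ignored
            r = self.session.post(
                "https://api.example.com/v1/images",  # placeholder endpoint
                json={"prompt": positive_prompt, "negative_prompt": negative_prompt,
                      "model": self.service_config.model, "width": width, "height": height},
                timeout=300)
            r.raise_for_status()
            image_bytes = r.content  # assume raw PNG bytes in the response body
        except Exception as e:
            trace_exception(e)
            self.app.error(f"Generation failed: {e}")
            return None, {"error": f"Generation failed: {e}"}
        folder = Path(output_folder) if output_folder else self.output_folder
        folder.mkdir(parents=True, exist_ok=True)
        file_path = (folder / output_file_name if output_file_name
                     else find_next_available_filename(folder, "img_tti_"))  # assumed signature
        try:
            Image.open(io.BytesIO(image_bytes)).save(file_path, format="PNG")
        except Exception as e:
            trace_exception(e)
            return None, {"error": "Failed to save image"}
        metadata = {"positive_prompt": positive_prompt, "negative_prompt": negative_prompt,
                    "model": self.service_config.model, "width": width, "height": height}
        return file_path, metadata

    def paint_from_images(self, positive_prompt, images: List[str], negative_prompt="",
                          **kwargs) -> Tuple[Path | None, Dict | None]:
        # Step 7: this imaginary backend exposes no image-to-image endpoint
        return None, {"error": "Backend does not support image-to-image"}

    @staticmethod
    def verify(app: LollmsApplication) -> bool:
        return PackageManager.check_package_installed("requests")

    @staticmethod
    def install(app: LollmsApplication) -> bool:
        return PackageManager.install_package("requests")

    @staticmethod
    def get(app: LollmsApplication, config: Optional[dict] = None,
            lollms_paths=None) -> "LollmsMyTTIService":
        return LollmsMyTTIService(app=app, config=config, lollms_paths=lollms_paths)
```

A real binding would replace the placeholder endpoint with its actual backend call, implement paint_from_images when the backend supports image inputs, and extend the settings template and metadata as needed.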