**Objective:** To provide a standardized way to integrate various Text-to-Image generation services (APIs, local models, etc.) into the Lollms ecosystem.

**Core Idea:** Lollms uses a modular approach. Each specific TTI implementation (such as Stable Diffusion via Automatic1111 or ComfyUI, or an online API like DALL-E, Midjourney, or Google Imagen/Gemini) is contained within its own binding or module. These modules all inherit from a common base class, `LollmsTTI`, ensuring they provide a consistent interface for the main Lollms application.
## 1. The `LollmsTTI` Base Class

The `lollms.tti.LollmsTTI` class serves as the foundation for all TTI modules. It defines the essential methods and properties that Lollms expects any TTI service to provide.
- **Inheritance:** `LollmsTTI` inherits from `lollms.service.LollmsSERVICE`. This is important because `LollmsSERVICE` provides the basic framework for any Lollms service, including:
  - Access to the main `LollmsApplication` instance (`self.app`).
  - Handling of configuration via `TypedConfig` (`self.service_config`).
  - A unique `name` for the service.
  - Standard logging methods via `self.app` (e.g., `self.app.info`, `self.app.error`).
- **Key Purpose:** To define a contract. Any class inheriting from `LollmsTTI` must implement (or override) certain methods to be compatible with Lollms' image generation features.
- **Initialization (`__init__`):**
  - `name: str`: A unique identifier for this specific TTI service (e.g., "google_gemini", "automatic1111", "comfyui").
  - `app: LollmsApplication`: The main Lollms application instance, providing access to global settings, paths, logging, etc.
  - `service_config: TypedConfig`: An object holding the specific configuration settings for this TTI service (e.g., API keys, model paths, default parameters). This is usually created from a `ConfigTemplate`.
  - `output_folder: str | Path | None`: Specifies where generated images should be saved. If `None`, it defaults to a standard location within the Lollms personal outputs path (`lollms_paths.personal_outputs_path / name`). The constructor ensures this folder exists.
- **Core Generation Methods:**
  - `paint(...)`: The primary method for standard text-to-image generation.
    - **Input:** `positive_prompt`, `negative_prompt`, and common Stable Diffusion-style parameters (`sampler_name`, `seed`, `scale`, `steps`, `width`, `height`). It also takes optional `output_folder` and `output_file_name` overrides.
    - **Implementation Detail:** Crucially, not all TTI backends support all of these parameters. A specific module (like `LollmsGoogleGemini`) might ignore parameters such as `seed`, `steps`, or `sampler_name`, or derive `width`/`height` from an `aspect_ratio` setting, as seen in the example. The implementation should handle the parameters relevant to its backend API or model.
    - **Output:** `Tuple[Path | None, Dict | None]`: a tuple containing the `Path` to the first successfully generated image and a `Dict` with its corresponding metadata, or `(None, error_dict)` on failure.
  - `paint_from_images(...)`: Intended for image-to-image, inpainting, or image variation tasks.
    - **Input:** `positive_prompt`, a `List[str]` of input image file paths (`images`), `negative_prompt`, and the same optional parameters as `paint`.
    - **Implementation Detail:** As with `paint`, the specific module must adapt this to its backend's capabilities. Some backends might only use the first image; others might support multiple. The `LollmsGoogleGemini` example shows using the first input image with Gemini for image+text prompting.
    - **Output:** Same format as `paint`: `Tuple[Path | None, Dict | None]`. Returns the path to the first successfully generated output image and its metadata, or `(None, error_dict)` on failure or if the backend doesn't support image-to-image.
- **Static Utility Methods:** These methods are called by Lollms before an instance of the TTI module is necessarily created. They operate at the class level.
  - `verify(app: LollmsApplication) -> bool`: Checks whether the prerequisites for this TTI module are met. This usually involves checking that required libraries are installed (e.g., `google-generativeai` for the Gemini example) or that essential configuration (like an API key placeholder) exists. It often uses helpers like `PackageManager`. Should return `True` if usable, `False` otherwise.
  - `install(app: LollmsApplication) -> bool`: Attempts to install any missing prerequisites identified by `verify`. Typically uses `PackageManager.install_package`. Should return `True` on success, `False` on failure.
  - `get(app: LollmsApplication, config: Optional[dict]=None, lollms_paths: Optional[LollmsPaths]=None) -> 'LollmsTTI'`: A factory method. Lollms calls this static method to obtain an actual instance of the specific TTI class (e.g., `LollmsGoogleGemini`). It passes the `app` instance and, optionally, pre-loaded configuration (`config`) and paths (`lollms_paths`).
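Putting the contract together, here is a minimal stand-in for the base class (simplified stubs, not the real `lollms.tti` code; the signatures follow the description above and the default parameter values are illustrative only):

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple

class LollmsTTIContract:
    """Simplified stand-in illustrating the LollmsTTI interface."""

    def __init__(self, name: str, app, service_config, output_folder=None):
        self.name = name                      # unique service identifier
        self.app = app                        # main LollmsApplication instance
        self.service_config = service_config  # TypedConfig with service settings
        # Real code defaults to lollms_paths.personal_outputs_path / name
        self.output_folder = Path(output_folder) if output_folder else Path("outputs") / name

    def paint(self, positive_prompt: str, negative_prompt: str = "",
              sampler_name: str = "Euler", seed: int = -1, scale: float = 7.5,
              steps: int = 20, width: int = 512, height: int = 512,
              output_folder=None, output_file_name=None
              ) -> Tuple[Optional[Path], Optional[Dict]]:
        raise NotImplementedError  # each binding implements this

    def paint_from_images(self, positive_prompt: str, images: List[str],
                          negative_prompt: str = "", **kwargs
                          ) -> Tuple[Optional[Path], Optional[Dict]]:
        raise NotImplementedError

    @staticmethod
    def verify(app) -> bool:       # are prerequisites installed?
        return True

    @staticmethod
    def install(app) -> bool:      # install missing prerequisites
        return True

    @staticmethod
    def get(app, config: Optional[dict] = None, lollms_paths=None) -> "LollmsTTIContract":
        return LollmsTTIContract("contract_demo", app, config or {})
```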
## 2. How TTI Modules are Used by Lollms
- **Discovery:** Lollms scans designated directories for bindings/modules.
- **Verification & Installation:** For each potential TTI module found, Lollms may call its static `verify` method. If verification fails and installation is requested or automatic, it calls the static `install` method.
- **Listing:** Verified modules are presented to the user as available TTI services.
- **Selection & Configuration:** The user selects a TTI service. Lollms loads its configuration template (`ConfigTemplate`) and presents the settings (API key, model choice, etc.) to the user. User inputs are saved.
- **Instantiation:** When the user wants to generate an image using the selected service, Lollms calls the static `get` method of the chosen module's class, passing the `app` instance and the loaded service `config`. This returns an active instance (e.g., an instance of `LollmsGoogleGemini`).
- **Generation Request:** When the user provides prompts (and potentially other parameters or input images), Lollms calls the appropriate method on the instantiated object:
  - For text-to-image: `instance.paint(...)`
  - For image-to-image/variation: `instance.paint_from_images(...)`
- **Execution:** The module's `paint` or `paint_from_images` method then:
  - Initializes its specific client or loads its model if not already done (as seen in `_initialize_client` in the example).
  - Translates the Lollms parameters (prompts, dimensions, etc.) into the format required by its backend (API call parameters, model inference arguments), ignoring unsupported Lollms parameters as needed.
  - Communicates with the backend (makes the API call, runs the inference).
  - Receives the image data (e.g., bytes, base64). Even if the backend generates multiple images, the implementation should focus on processing and saving the first one for the standard return value.
  - Saves the first image's data to a file in the designated output folder (using helpers like `find_next_available_filename` if no specific name is given). PIL (Pillow) is commonly used for saving.
  - Constructs the metadata dictionary for the saved image.
  - Returns the tuple `(saved_path, metadata_dict)` or `(None, error_dict)` back to Lollms.
- **Display:** Lollms receives the tuple. If the first element (`image_path`) is not `None`, it displays the generated image and potentially the metadata. If it is `None`, it displays the error message from the second element (`error_dict`).
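The lifecycle above can be run end to end with a toy module (all names mirror the description; the image backend is faked so the flow is self-contained):

```python
import tempfile
from pathlib import Path

class ToyTTI:
    """Toy binding that satisfies the lifecycle described above."""

    def __init__(self, app, config=None):
        self.app = app
        self.service_config = config or {}
        self.output_folder = Path(tempfile.mkdtemp())

    @staticmethod
    def verify(app) -> bool:   # pretend dependencies are present
        return True

    @staticmethod
    def install(app) -> bool:  # nothing to install for the toy
        return True

    @staticmethod
    def get(app, config=None):
        return ToyTTI(app, config)

    def paint(self, positive_prompt, negative_prompt="", **params):
        path = self.output_folder / "out.png"
        path.write_bytes(b"fake-image-bytes")  # stands in for backend output
        return path, {"prompt": positive_prompt}

# The sequence Lollms follows: verify -> (install) -> get -> paint -> display.
if not ToyTTI.verify(app=None):
    ToyTTI.install(app=None)
instance = ToyTTI.get(app=None, config={"api_key": "..."})
image_path, metadata = instance.paint("a red fox in the snow")
```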
## 3. Developing a New TTI Module

Here's a step-by-step guide based on the `LollmsGoogleGemini` example:
- **Create the Module File:** Create a Python file in the appropriate Lollms bindings directory (e.g., `bindings/google_gemini/binding.py`). Add header comments detailing the binding (Title, Author, Licence, Description, Requirements).
- **Import Necessary Libraries:** Import `LollmsTTI`, `LollmsApplication`, `Path`, `List`, `Dict`, `Optional`, `Tuple`, the configuration classes (`TypedConfig`, `ConfigTemplate`), utility functions (`find_next_available_filename`, `ASCIIColors`, `trace_exception`), and any libraries required for your specific TTI backend (e.g., `requests`, `google.generativeai`, `PIL`, `diffusers`).
, etc.). - Handle Dependencies:
- Use a
try...except ImportError
block to check if the required backend libraries are installed. - If missing, use
ASCIIColors.info
to inform the user andPackageManager.install_package
to install them. Re-import after installation or raise an error if it fails. This ensures the binding doesn’t crash Lollms if dependencies are missing initially.
- Use a
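The check-then-install pattern can be factored into a generic helper (a sketch; the real binding would pass `PackageManager.install_package` as the installer and `ASCIIColors.info` as the logger):

```python
import importlib

def ensure_package(module_name: str, pip_name: str, installer, logger=print) -> bool:
    """Try to import module_name; if missing, install pip_name and retry.
    Returns True when the module ends up importable."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        logger(f"{module_name} not found. Installing {pip_name}...")
        if not installer(pip_name):
            return False
        try:
            importlib.import_module(module_name)  # re-import after install
            return True
        except ImportError:
            return False
```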
- **Define the Class:** Create a class that inherits from `LollmsTTI`:

  ```python
  from lollms.tti import LollmsTTI
  # ... other imports

  class LollmsMyTTIService(LollmsTTI):
      # ... implementation ...
  ```
- **Implement `__init__`:**
  - Define the `__init__` method accepting `app`, `service_config`, and optionally `lollms_paths` (if needed beyond what `app` provides) or a `config` dict.
  - Define the `ConfigTemplate` for your service's settings (API keys, models, defaults).
  - Create the `TypedConfig` instance: `service_config = TypedConfig(service_config_template, config or {})`.
  - Crucially, call the parent constructor: `super().__init__("my_tti_service_name", app, service_config, output_folder)`. Ensure the `name` is unique, and handle the `output_folder` logic as in the base class or example.
  - Initialize any state needed for your service (e.g., set API clients to `None`, load default values from `service_config`).
  - Optionally, attempt to initialize the backend client immediately if the configuration allows (like the Gemini example checking for an API key).
- **Implement `paint`:**
  - Define the `paint` method with the `-> Tuple[Path | None, Dict | None]` signature.
  - Ensure your backend client/model is initialized (call an internal `_initialize_client` or similar helper if needed). Handle initialization failures by returning `(None, {"error": ...})`.
  - Retrieve necessary parameters from `self.service_config` (e.g., model name, specific quality settings).
  - Translate the input parameters (`positive_prompt`, `negative_prompt`, etc.) into the format expected by your backend API or model. Handle unsupported parameters and combine prompts as needed.
  - Make the call to your TTI backend within a `try...except` block. Catch specific API/model errors as well as generic exceptions. In `except` blocks, log the error using `self.app.error` and `trace_exception`, then `return (None, {"error": f"Descriptive error message: {e}"})`.
  - Process the response and extract the image data for the first image received.
  - Determine the output filename (use `output_file_name` if provided, otherwise generate one with `find_next_available_filename`).
  - Save the image data to a file using `PIL` or another library, ensuring the output folder exists (`self._ensure_output_folder`). If saving fails, return `(None, {"error": "Failed to save image"})`.
  - Create the `metadata` dictionary for the saved image.
  - Return `(saved_file_path, metadata)`. If processing fails after the API call but before saving, return `(None, {"error": "Failed to process image data"})`.
- **Implement `paint_from_images`:**
  - Define the `paint_from_images` method with the `-> Tuple[Path | None, Dict | None]` signature.
  - Check whether your backend supports image inputs. If not, log an error and return `(None, {"error": "Backend does not support image-to-image"})`.
  - Load the first input image from `images[0]`. Handle errors (file not found, load failures) by returning `(None, {"error": ...})`.
  - Follow similar steps as `paint`: initialize the client, translate parameters, make the backend call (providing the image data), handle errors (returning `(None, error_dict)`), process the response for the first output image, save it, create metadata (including input image info), and return `(saved_file_path, metadata)`.
- **Implement the Static Methods:**
  - `verify(app: LollmsApplication) -> bool`: Implement the dependency check (e.g., `return PackageManager.check_package_installed("my_required_library")`).
  - `install(app: LollmsApplication) -> bool`: Implement the installation command (e.g., `return PackageManager.install_package("my_required_library")`).
  - `get(app: LollmsApplication, config: Optional[dict]=None, lollms_paths: Optional[LollmsPaths]=None) -> 'LollmsMyTTIService'`: Implement the factory method: `return LollmsMyTTIService(app=app, config=config, lollms_paths=lollms_paths)`.
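These three methods are usually thin. A stand-alone approximation (`importlib.util.find_spec` substitutes here for `PackageManager.check_package_installed`, a plain pip subprocess for `PackageManager.install_package`, and the dependency name is a placeholder):

```python
import importlib.util
import subprocess
import sys

REQUIRED_PACKAGE = "json"  # placeholder; a real binding names its backend SDK

class StaticMethodsSketch:
    def __init__(self, app, config=None, lollms_paths=None):
        self.app, self.config, self.lollms_paths = app, config, lollms_paths

    @staticmethod
    def verify(app) -> bool:
        # True when the required module is importable
        return importlib.util.find_spec(REQUIRED_PACKAGE) is not None

    @staticmethod
    def install(app) -> bool:
        cmd = [sys.executable, "-m", "pip", "install", REQUIRED_PACKAGE]
        return subprocess.call(cmd) == 0

    @staticmethod
    def get(app, config=None, lollms_paths=None) -> "StaticMethodsSketch":
        return StaticMethodsSketch(app=app, config=config, lollms_paths=lollms_paths)
```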
- **Helper Methods (Optional but Recommended):**
  - Create private helper methods like `_initialize_client` and `_ensure_output_folder` to keep `paint` and `paint_from_images` cleaner, as shown in the Gemini example.
  - Consider a `settings_updated` method if the client needs to be re-initialized when settings (like the API key) change via the UI.
- **Testing:** Test thoroughly with different prompts, negative prompts, parameters, and error conditions (invalid API key, network down, invalid input image), and ensure the correct tuple, `(Path, Dict)` or `(None, Dict)`, is returned in all scenarios.