Embedding API

Note

This page is generated automatically from the repository’s maintained Python module inventory.

Embedding providers, payload preparation, and the embedding service layer.

lalandre_embedding

Source: packages/lalandre_embedding/lalandre_embedding/__init__.py

Embedding package public API.

lalandre_embedding.base

Source: packages/lalandre_embedding/lalandre_embedding/base.py

Base class for embedding providers.

class lalandre_embedding.base.EmbeddingProvider[source]

Bases: ABC

Abstract base class for embedding providers.

abstractmethod embed_text(text)[source]

Generate an embedding for a single text.

Parameters:

text (str)

Return type:

List[float]

abstractmethod embed_batch(texts)[source]

Generate embeddings for multiple texts.

Parameters:

texts (List[str])

Return type:

List[List[float]]

abstractmethod get_vector_size()[source]

Get the dimension of the embedding vectors.

Return type:

int

estimate_tokens(text)[source]

Return a provider-native token count when available.

Parameters:

text (str)

Return type:

int | None

get_max_input_tokens()[source]

Return the provider-native max input length when known.

Return type:

int | None
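
A concrete provider only needs to implement the three abstract methods; the optional token hooks default to returning None. The sketch below is a self-contained stand-in that mirrors the documented interface with a toy deterministic provider (the ToyEmbedding class is illustrative only, not part of the package):

```python
from abc import ABC, abstractmethod
from typing import List, Optional

# Stand-in mirroring the documented interface (the real class lives in
# lalandre_embedding.base; this copy keeps the sketch self-contained).
class EmbeddingProvider(ABC):
    @abstractmethod
    def embed_text(self, text: str) -> List[float]: ...

    @abstractmethod
    def embed_batch(self, texts: List[str]) -> List[List[float]]: ...

    @abstractmethod
    def get_vector_size(self) -> int: ...

    def estimate_tokens(self, text: str) -> Optional[int]:
        return None  # default: no provider-native token counting

    def get_max_input_tokens(self) -> Optional[int]:
        return None  # default: limit unknown

# Toy provider: deterministic character-hash vectors, no external model.
class ToyEmbedding(EmbeddingProvider):
    def __init__(self, size: int = 4) -> None:
        self._size = size

    def embed_text(self, text: str) -> List[float]:
        vec = [0.0] * self._size
        for i, ch in enumerate(text):
            vec[i % self._size] += ord(ch) / 1000.0
        return vec

    def embed_batch(self, texts: List[str]) -> List[List[float]]:
        return [self.embed_text(t) for t in texts]

    def get_vector_size(self) -> int:
        return self._size

provider = ToyEmbedding(size=4)
vectors = provider.embed_batch(["loi", "décret"])
```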

class lalandre_embedding.base.SupportsCacheSize[source]

Bases: ABC

Protocol-like mixin for providers exposing cache occupancy.

abstractmethod get_cache_size()[source]

Return the current number of cached embeddings.

Return type:

int

class lalandre_embedding.base.SupportsNumKeys[source]

Bases: ABC

Protocol-like mixin for providers exposing API-key fan-out.

abstractmethod get_num_keys()[source]

Return the number of configured upstream API keys.

Return type:

int

lalandre_embedding.pipeline

Source: packages/lalandre_embedding/lalandre_embedding/pipeline.py

Reusable business logic for vector embedding.

Contains payload construction, batched embedding, mean-pooling, and incremental state tracking. The worker imports this module and delegates the heavy lifting here.

class lalandre_embedding.pipeline.ChunkEmbeddingService(*args, **kwargs)[source]

Bases: Protocol

Embedding interface required by the chunk embedding pipeline.

embed_batch(texts, batch_size=None)[source]

Embed a batch of texts and preserve input ordering.

Parameters:
  • texts (list[str])

  • batch_size (int | None)

Return type:

list[list[float]]

estimate_tokens(text)[source]

Estimate the token count for one input text when supported.

Parameters:

text (str)

Return type:

int | None
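
Because ChunkEmbeddingService is a Protocol, any object with matching method shapes satisfies it structurally; no inheritance is required. A minimal sketch (DummyService and run are hypothetical names for illustration):

```python
from typing import Optional, Protocol

# Structural stand-in for the documented pipeline protocol.
class ChunkEmbeddingService(Protocol):
    def embed_batch(self, texts: list[str], batch_size: Optional[int] = None) -> list[list[float]]: ...
    def estimate_tokens(self, text: str) -> Optional[int]: ...

class DummyService:
    # Matching the method signatures is enough to satisfy the Protocol.
    def embed_batch(self, texts: list[str], batch_size: Optional[int] = None) -> list[list[float]]:
        return [[float(len(t))] for t in texts]

    def estimate_tokens(self, text: str) -> Optional[int]:
        return len(text.split())

def run(service: ChunkEmbeddingService, texts: list[str]) -> list[list[float]]:
    # Type-checks against the Protocol, works with any conforming object.
    return service.embed_batch(texts)

vecs = run(DummyService(), ["aa", "bbb"])  # [[2.0], [3.0]]
```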

lalandre_embedding.pipeline.compute_payload_hash(payload)[source]

Stable hash of payload to detect changes across runs.

Parameters:

payload (dict[str, Any])

Return type:

str
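
One common way to obtain such a stable hash (a sketch of the idea, not necessarily the package's exact implementation) is to serialize the payload as canonical JSON with sorted keys and digest it with SHA-256, so key order never affects the result:

```python
import hashlib
import json
from typing import Any

def compute_payload_hash(payload: dict[str, Any]) -> str:
    """Stable hash: canonical JSON (sorted keys, compact separators) fed to SHA-256."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Reordered payloads hash identically, so unchanged objects are detected across runs.
h1 = compute_payload_hash({"act_id": 1, "title": "Loi"})
h2 = compute_payload_hash({"title": "Loi", "act_id": 1})
```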

lalandre_embedding.pipeline.mean_pool_vectors(vectors)[source]

Return the element-wise mean of vectors, or None if empty.

Parameters:

vectors (list[list[float]])

Return type:

list[float] | None
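
Mean pooling here is the plain element-wise average; a self-contained sketch of the documented behaviour:

```python
from typing import Optional

def mean_pool_vectors(vectors: list[list[float]]) -> Optional[list[float]]:
    """Element-wise mean across vectors; None for an empty input."""
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

pooled = mean_pool_vectors([[1.0, 2.0], [3.0, 4.0]])  # [2.0, 3.0]
```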

lalandre_embedding.pipeline.truncate_subdivision_content(content, max_chars)[source]

Truncate content for embedding if it exceeds max_chars.

Parameters:
  • content (str)

  • max_chars (int)

Return type:

str

lalandre_embedding.pipeline.resolved_subdivision_embed_max_chars(config)[source]

Compute the max embedding chars from a LalandreConfig.

Parameters:

config (Any)

Return type:

int

lalandre_embedding.pipeline.make_state_record(*, object_type, object_id, provider, model_name, vector_size, content_hash)[source]

Build one embedding-state upsert record.

Parameters:
  • object_type (str)

  • object_id (int)

  • provider (str)

  • model_name (str)

  • vector_size (int)

  • content_hash (str)

Return type:

dict[str, Any]

lalandre_embedding.pipeline.extract_metadata(act)[source]

Extract metadata entries from an ORM act, or None.

Parameters:

act (Any)

Return type:

dict[str, str] | None

lalandre_embedding.pipeline.build_subdivision_payload(payload_builder, subdivision, act, version, metadata, content_override=None)[source]

Assemble the Qdrant payload dict from ORM models for a subdivision.

Parameters:
  • payload_builder (PayloadBuilder)

  • subdivision (Any)

  • act (Any)

  • version (Any | None)

  • metadata (dict[str, str] | None)

  • content_override (str | None)

Return type:

dict[str, Any]

lalandre_embedding.pipeline.build_chunk_payload(payload_builder, chunk, subdivision, act)[source]

Assemble the Qdrant payload dict from ORM models for a chunk.

Parameters:
  • payload_builder (PayloadBuilder)

  • chunk (Any)

  • subdivision (Any)

  • act (Any)

Return type:

dict[str, Any]

lalandre_embedding.pipeline.build_act_document_payload(payload_builder, act, subdivisions, *, vector_method, chunk_count)[source]

Assemble the Qdrant payload dict from ORM models for a whole-act vector.

Parameters:
  • payload_builder (PayloadBuilder)

  • act (Any)

  • subdivisions (list[Any])

  • vector_method (str)

  • chunk_count (int)

Return type:

dict[str, Any]

class lalandre_embedding.pipeline.EmbedBatchResult(points=<factory>, state_records=<factory>, delete_filters=<factory>, embedded_count=0, skipped_count=0)[source]

Bases: object

Result of preparing one embedding batch — ready to be persisted by the caller.

Parameters:
  • points (list[VectorPoint])

  • state_records (list[dict[str, Any]])

  • delete_filters (list[dict[str, Any]])

  • embedded_count (int)

  • skipped_count (int)

lalandre_embedding.pipeline.prepare_subdivision_embeddings(*, act, subdivisions, version, metadata, embedding_service, payload_builder, state_map, batch_size, max_batch_size, max_chars, provider, model_name, vector_size, force=False)[source]

Prepare embedding batches for subdivisions. Pure logic — no DB access.

Parameters:
  • act (Any) – The ORM act object (for payload building).

  • subdivisions (list[Any]) – ORM subdivision objects.

  • version (Any | None) – Current version for the act (or None).

  • metadata (dict[str, str] | None) – Act metadata dict (or None).

  • embedding_service (ChunkEmbeddingService) – Service to produce vectors.

  • payload_builder (PayloadBuilder) – Builds Qdrant payloads.

  • state_map (dict[int, str]) – Pre-fetched {subdivision_id: content_hash} from embedding_state.

  • batch_size (int)

  • max_batch_size (int) – Embedding batch sizing.

  • max_chars (int) – Truncation limit for subdivision content.

  • provider (str)

  • model_name (str)

  • vector_size (int) – Embedding identity.

  • force (bool) – If True, embed everything regardless of state_map.

Returns:

One EmbedBatchResult per batch, ready to be persisted.

Return type:

list[EmbedBatchResult]
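
The incremental behaviour driven by state_map can be sketched as follows: hash each object's current content and embed only the objects whose hash differs from the last recorded one, unless force is set. The helper names below (content_hash, select_stale) are illustrative, not part of the package:

```python
import hashlib

def content_hash(text: str) -> str:
    """Stable per-object content fingerprint."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def select_stale(items: dict[int, str], state_map: dict[int, str], force: bool = False) -> list[int]:
    """Return ids whose content changed since the last run (or all ids when force=True)."""
    if force:
        return list(items)
    return [oid for oid, text in items.items() if state_map.get(oid) != content_hash(text)]

items = {1: "Article 1", 2: "Article 2"}
state_map = {1: content_hash("Article 1")}   # id 1 unchanged, id 2 never embedded
stale = select_stale(items, state_map)       # only id 2 needs re-embedding
```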

lalandre_embedding.pipeline.prepare_chunk_embeddings(*, chunks, act, embedding_service, payload_builder, state_map, batch_size, retrieval_segment_chars, retrieval_segment_overlap, provider, model_name, vector_size, force)[source]

Prepare embedding batches for chunks. Pure logic — no DB access.

Parameters:
  • chunks (list[Any]) – List of (chunk, subdivision) tuples.

  • act (Any) – The ORM act object (for payload building).

  • embedding_service (ChunkEmbeddingService) – Service to produce vectors.

  • payload_builder (PayloadBuilder) – Builds Qdrant payloads.

  • state_map (dict[int, str]) – Pre-fetched {chunk_id: content_hash} from embedding_state.

  • batch_size (int) – Embedding batch sizing.

  • retrieval_segment_chars (int)

  • retrieval_segment_overlap (int) – Target segmentation budget for long article retrieval vectors.

  • provider (str)

  • model_name (str)

  • vector_size (int) – Embedding identity.

  • force (bool) – If True, embed everything regardless of state_map.

Returns:

One EmbedBatchResult per batch, ready to be persisted.

Return type:

list[EmbedBatchResult]

lalandre_embedding.pipeline.prepare_act_document_embedding(*, act, subdivisions, chunks, chunk_vectors, payload_builder, state_map, provider, model_name, vector_size, force=False)[source]

Prepare whole-act embedding via mean pooling. Pure logic — no DB access.

Parameters:
  • act (Any) – The ORM act object.

  • subdivisions (list[Any]) – ORM subdivision objects for the act.

  • chunks (list[Any]) – ORM (chunk, subdivision) tuples for the act.

  • chunk_vectors (dict[int, list[float]]) – Pre-fetched {chunk_id: vector} from Qdrant.

  • payload_builder (PayloadBuilder) – Builds Qdrant payloads.

  • state_map (dict[int, str]) – Pre-fetched {act_id: content_hash} from embedding_state.

  • provider (str)

  • model_name (str)

  • vector_size (int) – Embedding identity.

  • force (bool) – If True, embed regardless of state_map.

Returns:

EmbedBatchResult with a single point, or None if skipped.

Return type:

EmbedBatchResult | None

lalandre_embedding.providers

Source: packages/lalandre_embedding/lalandre_embedding/providers/__init__.py

Embedding provider implementations.

lalandre_embedding.providers.local

Source: packages/lalandre_embedding/lalandre_embedding/providers/local.py

Local embedding provider.

class lalandre_embedding.providers.local.LocalEmbedding(model_name=None, device=None, cache_dir=None, normalize_embeddings=True, enable_cache=None, cache_max_size=None)[source]

Bases: EmbeddingProvider, SupportsCacheSize

Local embedding provider using sentence-transformers with an in-memory LRU cache.

Parameters:
  • model_name (str | None)

  • device (str | None)

  • cache_dir (str | None)

  • normalize_embeddings (bool)

  • enable_cache (bool | None)

  • cache_max_size (int | None)

estimate_tokens(text)[source]

Estimate token usage with the underlying sentence-transformers tokenizer.

Parameters:

text (str)

Return type:

int | None

get_max_input_tokens()[source]

Return the maximum sequence length supported by the local model.

Return type:

int | None

embed_text(text)[source]

Generate an embedding with cache support.

Parameters:

text (str)

Return type:

List[float]

embed_batch(texts)[source]

Generate batch embeddings with cache support.

Parameters:

texts (List[str])

Return type:

List[List[float]]

get_vector_size()[source]

Return the embedding vector dimension exposed by the model.

Return type:

int

get_cache_size()[source]

Return the current number of cached embeddings.

Return type:

int
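
The in-memory LRU cache this provider documents can be sketched with an OrderedDict: hits are moved to the most-recently-used end, and inserts beyond the size limit evict the least-recently-used entry. The LRUEmbeddingCache class below is an illustrative stand-in, not the provider's actual cache:

```python
from collections import OrderedDict
from typing import Optional

class LRUEmbeddingCache:
    """Minimal sketch of an in-memory text -> vector LRU cache."""

    def __init__(self, max_size: int) -> None:
        self._max_size = max_size
        self._data: "OrderedDict[str, list]" = OrderedDict()

    def get(self, text: str) -> Optional[list]:
        vec = self._data.get(text)
        if vec is not None:
            self._data.move_to_end(text)  # mark as most recently used
        return vec

    def put(self, text: str, vector: list) -> None:
        self._data[text] = vector
        self._data.move_to_end(text)
        if len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict least recently used

    def __len__(self) -> int:
        return len(self._data)

cache = LRUEmbeddingCache(max_size=2)
cache.put("a", [0.1])
cache.put("b", [0.2])
cache.get("a")          # touch "a" so "b" becomes least recently used
cache.put("c", [0.3])   # evicts "b"
```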

lalandre_embedding.providers.mistral

Source: packages/lalandre_embedding/lalandre_embedding/providers/mistral.py

Mistral embedding provider with multi-key round-robin support and Redis cache.

Token counting uses mistral-common with the v1 SentencePiece tokenizer (the tokenizer family used by mistral-embed).

class lalandre_embedding.providers.mistral.MistralEmbedding(model_name=None, redis_client=None, cache_ttl=None, key_pool=None)[source]

Bases: EmbeddingProvider, SupportsNumKeys

Mistral embedding provider with multi-key round-robin support and a Redis cache.

Parameters:
  • model_name (str | None)

  • redis_client (Any | None)

  • cache_ttl (int | None)

  • key_pool (APIKeyPool | None)

get_max_input_tokens()[source]

Return the documented maximum input length for mistral-embed.

Return type:

int | None

estimate_tokens(text)[source]

Count tokens using the Mistral v1 SentencePiece tokenizer.

Parameters:

text (str)

Return type:

int | None

embed_text(text)[source]

Generate an embedding using the next available client (round-robin) with cache support.

Parameters:

text (str)

Return type:

List[float]

embed_batch(texts)[source]

Generate batch embeddings using the next available client (round-robin) with cache support.

Parameters:

texts (List[str])

Return type:

List[List[float]]

get_vector_size()[source]

Return the embedding vector dimension configured for the provider.

Return type:

int

get_num_keys()[source]

Return the number of API keys participating in round-robin calls.

Return type:

int

lalandre_embedding.service

Source: packages/lalandre_embedding/lalandre_embedding/service.py

Embedding service.

Supports multiple providers (Mistral API, local sentence-transformers) with an automatic token-limit guard, adaptive text splitting, and weighted-average aggregation for long documents.

class lalandre_embedding.service.EmbeddingService(provider=None, model_name=None, api_key=None, device=None, cache_dir=None, normalize_embeddings=None, enable_cache=None, cache_max_size=None, key_pool=None)[source]

Bases: EmbeddingProvider

Supports multiple providers: Mistral and local models.

The service itself exposes the EmbeddingProvider interface so callers can benefit from token guards and adaptive splitting without reaching into the raw provider implementation.

Initialize the embedding service.

Parameters:
  • provider (str | None)

  • model_name (str | None)

  • api_key (str | None)

  • device (str | None)

  • cache_dir (str | None)

  • normalize_embeddings (bool | None)

  • enable_cache (bool | None)

  • cache_max_size (int | None)

  • key_pool (Any | None)

embed_text(text)[source]

Generate an embedding for a single text.

Parameters:

text (str)

Returns:

List of floats representing the embedding vector.

Return type:

List[float]

embed_batch(texts, batch_size=None)[source]

Generate embeddings for multiple texts efficiently.

Parameters:
  • texts (List[str]) – List of input texts

  • batch_size (int | None) – Number of texts to process at once (None = use config default)

Returns:

List of embedding vectors

Return type:

List[List[float]]

estimate_tokens(text)[source]

Delegate to the internal token estimation chain (provider → char-ratio).

Parameters:

text (str)

Return type:

int | None

get_max_input_tokens()[source]

Return the effective token limit (the tighter of the configured and provider-native limits).

Return type:

int | None

get_vector_size()[source]

Get the dimension of the embedding vectors.

Return type:

int
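
The long-document path described above can be sketched as: split the text under the size limit, embed each segment, then combine the segment vectors with a length-weighted mean so longer segments count for more. The helpers below (split_for_limit, weighted_average) are illustrative names under simplified character-based assumptions, not the service's actual code:

```python
def split_for_limit(text: str, max_chars: int) -> list:
    """Naive splitter: fixed-size character windows (stand-in for adaptive splitting)."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)] or [""]

def weighted_average(vectors: list, weights: list) -> list:
    """Combine per-segment vectors, weighting each by its segment length."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(v[i] * w for v, w in zip(vectors, weights)) / total for i in range(dim)]

segments = split_for_limit("abcdefgh", max_chars=3)    # ["abc", "def", "gh"]
fake_vecs = [[1.0], [2.0], [5.0]]                      # one (fake) vector per segment
pooled = weighted_average(fake_vecs, [len(s) for s in segments])
```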