Storage API

Note

This page is generated automatically from the repository’s maintained Python module inventory.

PostgreSQL, Qdrant, and Neo4j persistence layers.

lalandre_db_neo4j

Source: packages/lalandre_db_neo4j/lalandre_db_neo4j/__init__.py

Neo4j repository module. Provides graph database functionality for Graph RAG.

lalandre_db_neo4j.models

Source: packages/lalandre_db_neo4j/lalandre_db_neo4j/models.py

Neo4j Graph Models

class lalandre_db_neo4j.models.ActNode(*, id, celex, title, act_type, language, adoption_date=None, force_date=None, end_date=None, sector=None, level=None, official_journal_reference=None, eli=None, url_eurlex=None)[source]

Bases: BaseModel

Represents an Act as a node in the knowledge graph

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:
  • id (int)

  • celex (str)

  • title (str)

  • act_type (str)

  • language (str)

  • adoption_date (datetime | None)

  • force_date (datetime | None)

  • end_date (datetime | None)

  • sector (int | None)

  • level (int | None)

  • official_journal_reference (str | None)

  • eli (str | None)

  • url_eurlex (str | None)

to_neo4j_properties()[source]

Convert to Neo4j node properties

Return type:

dict[str, Any]
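A minimal sketch of what a property-map conversion of this kind might look like, using a standard-library dataclass as a stand-in for the Pydantic model. The ActNodeSketch class, the ISO-8601 date serialization, and the omission of unset fields are assumptions made for illustration; the actual to_neo4j_properties() may behave differently.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class ActNodeSketch:
    """Hypothetical stand-in for ActNode, carrying a subset of its fields."""

    id: int
    celex: str
    title: str
    act_type: str
    language: str
    adoption_date: Optional[datetime] = None
    eli: Optional[str] = None

    def to_neo4j_properties(self) -> Dict[str, Any]:
        # Assumption: unset optional fields are omitted, and datetimes are
        # stored as ISO-8601 strings so the map stays JSON-friendly.
        props: Dict[str, Any] = {}
        for key, value in asdict(self).items():
            if value is None:
                continue
            props[key] = value.isoformat() if isinstance(value, datetime) else value
        return props


act = ActNodeSketch(
    id=1,
    celex="32016R0679",
    title="General Data Protection Regulation",
    act_type="regulation",
    language="en",
    adoption_date=datetime(2016, 4, 27),
)
properties = act.to_neo4j_properties()
```

Omitting None values keeps absent attributes from being written as explicit nulls; whether the real model does this is not stated on this page.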

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class lalandre_db_neo4j.models.ActRelationship(*, source_act_id, target_act_id, relation_type, effect_date=None, description=None, source_subdivision_id=None, target_subdivision_id=None)[source]

Bases: BaseModel

Represents a relationship between two acts

Parameters:
  • source_act_id (int)

  • target_act_id (int)

  • relation_type (str)

  • effect_date (datetime | None)

  • description (str | None)

  • source_subdivision_id (int | None)

  • target_subdivision_id (int | None)

to_neo4j_properties()[source]

Convert to Neo4j relationship properties

Return type:

dict[str, Any]

get_neo4j_type()[source]

Get Neo4j relationship type (uppercase)

Return type:

str
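The documentation only states that the relationship type is uppercased. The sketch below shows one way such a normalizer could work; the function name and the collapsing of separators into underscores are assumptions, not the library's confirmed behavior.

```python
import re


def neo4j_type_sketch(relation_type: str) -> str:
    """Hypothetical normalizer turning a free-form label into a type name."""
    # Assumption: runs of non-alphanumeric characters become one underscore.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", relation_type.strip()).strip("_")
    if not cleaned:
        raise ValueError("relation_type must contain at least one letter or digit")
    return cleaned.upper()


simple_type = neo4j_type_sketch("amends")
spaced_type = neo4j_type_sketch("amended by")
```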

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class lalandre_db_neo4j.models.EntityNode(*, name, entity_type, description=None)[source]

Bases: BaseModel

Represents a named legal entity (org, concept, jurisdiction, topic) in the knowledge graph. Linked to Acts via MENTIONS relationships.

Parameters:
  • name (str)

  • entity_type (str)

  • description (str | None)

to_neo4j_properties()[source]

Return the serializable property map for the entity node.

Return type:

dict[str, Any]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class lalandre_db_neo4j.models.CommunityNode(*, id, num_acts=0, num_relations=0, relation_types='{}', central_acts='[]', summary='', modularity=None, resolution=None)[source]

Bases: BaseModel

Represents a detected community of Acts in the knowledge graph. Created by Louvain community detection; linked to Acts via BELONGS_TO.

Parameters:
  • id (int)

  • num_acts (int)

  • num_relations (int)

  • relation_types (str)

  • central_acts (str)

  • summary (str)

  • modularity (float | None)

  • resolution (float | None)

to_neo4j_properties()[source]

Return the serializable property map for the community node.

Return type:

dict[str, Any]

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class lalandre_db_neo4j.models.GraphQueryResult(*, nodes=<factory>, relationships=<factory>, metadata=<factory>)[source]

Bases: BaseModel

Result from a graph query. Contains act nodes and relationships that form a subgraph.

Parameters:
  • nodes (list[dict[str, Any]])

  • relationships (list[dict[str, Any]])

  • metadata (dict[str, Any])

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

lalandre_db_neo4j.repository

Source: packages/lalandre_db_neo4j/lalandre_db_neo4j/repository.py

Neo4j repository. Handles all interactions with the Neo4j graph database.

class lalandre_db_neo4j.repository.Neo4jRepository(settings=None)[source]

Bases: BaseRepository

Repository for Neo4j graph operations

Responsibilities:
  • Manage Neo4j driver and sessions

  • Create/update Act nodes and relationships

  • Execute graph traversal queries

  • Support Graph RAG operations

Note: Subdivisions and Versions are managed in PostgreSQL/Qdrant only.

Initialize Neo4j connection

Parameters:

settings (GraphConfig | None) – Neo4j connection settings (defaults to global config)

get_session()[source]

Context manager for Neo4j sessions

Return type:

Generator[Session, None, None]
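The Generator[Session, None, None] return type is consistent with a generator-based context manager. The sketch below illustrates that pattern with the standard library; FakeSession and get_session_sketch are hypothetical stand-ins, and the real method presumably opens a driver session instead.

```python
from contextlib import contextmanager
from typing import Generator


class FakeSession:
    """Hypothetical stand-in for a database driver session."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


@contextmanager
def get_session_sketch() -> Generator[FakeSession, None, None]:
    session = FakeSession()
    try:
        yield session
    finally:
        # Always release the session, even if the caller's block raises.
        session.close()


with get_session_sketch() as session:
    was_open_inside = not session.closed
```

The try/finally around the yield is what guarantees cleanup when the body of the with block fails.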

close()[source]

Close the Neo4j driver

health_check()[source]

Verify Neo4j connectivity

Return type:

bool

validate_read_only_cypher(cypher)[source]

Public wrapper: validate that a Cypher statement is read-only.

Parameters:

cypher (str)

Return type:

str
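The validation rules themselves are not documented on this page. As a rough illustration only, a read-only check could be built on a keyword denylist, as sketched below; the keyword list, the single-statement rule, and the function name are all assumptions.

```python
import re

# Assumption: a conservative denylist of clauses that can modify the graph.
_WRITE_KEYWORDS = (
    "CREATE", "MERGE", "DELETE", "DETACH", "SET", "REMOVE", "DROP", "LOAD", "FOREACH",
)
_WRITE_PATTERN = re.compile(r"\b(" + "|".join(_WRITE_KEYWORDS) + r")\b", re.IGNORECASE)


def validate_read_only_cypher_sketch(cypher: str) -> str:
    """Return the cleaned statement, or raise ValueError if it looks like a write."""
    statement = cypher.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("Cypher statement is empty")
    if ";" in statement:
        raise ValueError("Multiple statements are not allowed")
    match = _WRITE_PATTERN.search(statement)
    if match:
        raise ValueError(f"Write keyword not allowed: {match.group(1).upper()}")
    return statement


validated = validate_read_only_cypher_sketch("MATCH (a:Act) RETURN a LIMIT 5;")
```

A keyword scan like this is deliberately strict: it also rejects harmless queries that merely contain a denied word inside a string literal or property name, and it does not inspect procedures invoked through CALL.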

serialize_neo4j_value(value)[source]

Public wrapper: convert a Neo4j value to a JSON-friendly payload.

Parameters:

value (Any)

Return type:

Any

execute_read_only_query(cypher, *, params=None, result_limit=None)[source]

Execute a validated read-only Cypher query and return JSON-safe rows.

Parameters:
  • cypher (str)

  • params (Dict[str, Any] | None)

  • result_limit (int | None)

Return type:

List[Dict[str, Any]]

create_act_node(act)[source]

Create an Act node in the graph

Parameters:

act (ActNode) – ActNode data

Returns:

The act ID

Return type:

int

upsert_entity_mention(act_celex, entity)[source]

Idempotently create (or update) an Entity node and a MENTIONS edge from the Act.

Uses MERGE on (name, type) so duplicate extractions are safe to call repeatedly.

Parameters:
  • act_celex

  • entity

Return type:

None
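To illustrate why MERGE makes repeated calls safe, the sketch below builds a Cypher statement and parameter map of the kind this method might send. The node labels, property names, and helper function are assumptions derived from the descriptions on this page, not the actual query.

```python
from typing import Any, Dict, Optional

# Assumption: Act nodes are keyed by celex and Entity nodes by (name, type).
UPSERT_ENTITY_MENTION_CYPHER = """
MATCH (a:Act {celex: $celex})
MERGE (e:Entity {name: $name, type: $entity_type})
SET e.description = coalesce($description, e.description)
MERGE (a)-[:MENTIONS]->(e)
"""


def entity_mention_params(
    act_celex: str,
    name: str,
    entity_type: str,
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper that assembles the query parameters."""
    return {
        "celex": act_celex,
        "name": name,
        "entity_type": entity_type,
        "description": description,
    }


params = entity_mention_params("32016R0679", "European Data Protection Board", "org")
```

Because both the node and the edge are matched-or-created, running the same statement twice leaves a single Entity node and a single MENTIONS edge.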

create_act_relationship(relationship)[source]

Create a relationship between two Acts

Parameters:

relationship (ActRelationship) – ActRelationship data

Returns:

True if created successfully

Return type:

bool

clear_act_relationships()[source]

Delete all relationships between Act nodes and return the count removed.

Return type:

int

clear_communities()[source]

Delete all Community nodes and BELONGS_TO relationships. Returns count deleted.

Return type:

int

upsert_community(community)[source]

Create or update a Community node.

Parameters:

community (CommunityNode)

Return type:

None

upsert_communities_batch(communities, batch_size=100)[source]

Batch upsert Community nodes.

Parameters:
  • communities

  • batch_size (int)

Return type:

None

Create BELONGS_TO relationships from Act nodes to Community nodes.

Parameters:
  • partition (Dict[int, int])

  • batch_size (int)

Return type:

None
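A sketch of how a partition could be split into batches for an UNWIND-style write. It assumes the partition maps act ID to community ID, which is the usual shape of community-detection output but is not confirmed here; the helper name, the batch size of 100, and the Cypher text are likewise illustrative.

```python
from typing import Dict, Iterator, List

# Assumption: Act and Community nodes are both matched on an integer id property.
LINK_CYPHER_SKETCH = """
UNWIND $rows AS row
MATCH (a:Act {id: row.act_id})
MATCH (c:Community {id: row.community_id})
MERGE (a)-[:BELONGS_TO]->(c)
"""


def partition_batches(
    partition: Dict[int, int], batch_size: int = 100
) -> Iterator[List[Dict[str, int]]]:
    """Yield lists of {act_id, community_id} rows, at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    rows = [
        {"act_id": act_id, "community_id": community_id}
        for act_id, community_id in partition.items()
    ]
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


example_partition = {act_id: act_id % 3 for act_id in range(250)}
batch_lengths = [len(batch) for batch in partition_batches(example_partition, 100)]
```

Each yielded batch would be passed as the $rows parameter of one query, which bounds the size of every transaction.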

get_communities_for_acts(act_ids)[source]

Given seed act IDs, return the Community nodes they belong to.

Returns a list of community dicts with all properties.

Parameters:

act_ids (List[int])

Return type:

List[Dict[str, Any]]

expand_from_acts(act_ids, max_depth=None)[source]

Expand graph context from multiple seed act IDs in a single query.

Returns deduplicated nodes and relationships reachable within max_depth hops from any of the seed acts.

Parameters:
  • act_ids (List[int])

  • max_depth (int | None)

Return type:

GraphQueryResult
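The expansion described above can be pictured as a multi-source breadth-first search. The in-memory sketch below is an illustration under stated assumptions: edges are followed in both directions, and the function and type names are hypothetical. The real method runs a graph query rather than a Python loop.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, str, int]  # (source_act_id, relation_type, target_act_id)


def expand_from_acts_sketch(
    edges: List[Edge], act_ids: List[int], max_depth: int
) -> Tuple[Set[int], Set[Edge]]:
    # Index every edge under both endpoints so traversal ignores direction.
    adjacency: Dict[int, List[Edge]] = {}
    for edge in edges:
        adjacency.setdefault(edge[0], []).append(edge)
        adjacency.setdefault(edge[2], []).append(edge)

    nodes: Set[int] = set(act_ids)
    relationships: Set[Edge] = set()
    queue = deque((act_id, 0) for act_id in act_ids)
    while queue:
        current, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow edges beyond the hop limit
        for edge in adjacency.get(current, []):
            relationships.add(edge)  # sets deduplicate shared edges
            neighbor = edge[2] if edge[0] == current else edge[0]
            if neighbor not in nodes:
                nodes.add(neighbor)
                queue.append((neighbor, depth + 1))
    return nodes, relationships


sample_edges = [(1, "AMENDS", 2), (2, "CITES", 3), (3, "CITES", 4), (5, "REPEALS", 1)]
found_nodes, found_edges = expand_from_acts_sketch(sample_edges, [1], max_depth=2)
```

Starting all seeds at depth 0 in one queue is what lets several seed acts share a single traversal without visiting any node twice.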

get_statistics()[source]

Returns:

Dictionary with counts and metrics

Return type:

Dict[str, Any]

lalandre_db_postgres

Source: packages/lalandre_db_postgres/lalandre_db_postgres/__init__.py

PostgreSQL Repository Package

lalandre_db_postgres.models

Source: packages/lalandre_db_postgres/lalandre_db_postgres/models.py

PostgreSQL SQLAlchemy models

class lalandre_db_postgres.models.Base(**kwargs)[source]

Bases: DeclarativeBase

Base class for all SQLAlchemy declarative models.

A simple constructor that allows initialization from kwargs.

Sets attributes on the constructed instance using the names and values in kwargs.

Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.

Parameters:

kwargs (Any)

class lalandre_db_postgres.models.ActsSQL(**kwargs)[source]

Bases: Base

Persisted legislative act.

class lalandre_db_postgres.models.ActRelationsSQL(**kwargs)[source]

Bases: Base

Directed relationship extracted between two acts.

class lalandre_db_postgres.models.ActMetadataSQL(**kwargs)[source]

Bases: Base

Key-value metadata attached to an act.

class lalandre_db_postgres.models.ActSubjectsSQL(**kwargs)[source]

Bases: Base

Association table between acts and subject matters.

class lalandre_db_postgres.models.SubjectMattersSQL(**kwargs)[source]

Bases: Base

EuroVoc-like subject taxonomy entry.

class lalandre_db_postgres.models.VersionsSQL(**kwargs)[source]

Bases: Base

Version row describing one published state of an act.

class lalandre_db_postgres.models.SubdivisionsSQL(**kwargs)[source]

Bases: Base

Hierarchical subdivision belonging to an act version.

class lalandre_db_postgres.models.ChunksSQL(**kwargs)[source]

Bases: Base

Chunks of subdivisions for fine-grained retrieval

class lalandre_db_postgres.models.ConversationSQL(**kwargs)[source]

Bases: Base

Multi-turn conversation session.

class lalandre_db_postgres.models.ConversationMessageSQL(**kwargs)[source]

Bases: Base

Single message within a conversation (human or assistant).

class lalandre_db_postgres.models.EmbeddingStateSQL(**kwargs)[source]

Bases: Base

Tracks embedding status per object and model to avoid unnecessary re-embedding

class lalandre_db_postgres.models.ActSummarySQL(**kwargs)[source]

Bases: Base

Canonical or derived summary generated for an act.

lalandre_db_postgres.repository

Source: packages/lalandre_db_postgres/lalandre_db_postgres/repository.py

PostgreSQL repository implementation

class lalandre_db_postgres.repository.PostgresRepository(connection_string)[source]

Bases: BaseRepository

Repository for PostgreSQL database operations for RAG: retrieval, context enrichment, and structured legal data access

Parameters:

connection_string (str)

get_session()[source]

Get a database session

Return type:

Session

close()[source]

Close the database connection

health_check()[source]

Verify PostgreSQL connectivity

Return type:

bool

static ensure_embedding_state_table(session)[source]

Ensure embedding_state table exists and supports all runtime object types.

Parameters:

session (Session)

Return type:

None

static purge_orphan_embedding_states(session)[source]

Delete embedding-state rows referencing missing acts/chunks/subdivisions.

Parameters:

session (Session)

Return type:

int

static count_subdivisions(session)[source]

Return the total number of stored subdivisions.

Parameters:

session (Session)

Return type:

int

static count_chunks(session)[source]

Return the total number of stored chunks.

Parameters:

session (Session)

Return type:

int

static count_embedded_subdivisions(session, provider, model_name, vector_size)[source]

Count subdivisions already embedded for one embedding runtime.

Parameters:
  • session (Session)

  • provider (str)

  • model_name (str)

  • vector_size (int)

Return type:

int

static count_embedded_chunks(session, provider, model_name, vector_size)[source]

Count chunks already embedded for one embedding runtime.

Parameters:
  • session (Session)

  • provider (str)

  • model_name (str)

  • vector_size (int)

Return type:

int

static get_embedding_state_map(session, object_type, object_ids, provider, model_name, vector_size)[source]

Return {object_id: content_hash} for existing embeddings.

Parameters:
  • session (Session)

  • object_type (str)

  • object_ids (list[int])

  • provider (str)

  • model_name (str)

  • vector_size (int)

Return type:

dict[int, str]
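A map of object ID to content hash lends itself to skipping unchanged content. The sketch below shows that comparison; the SHA-256 hashing scheme and both helper names are assumptions, since the page does not say how content_hash is computed.

```python
import hashlib
from typing import Dict, List


def content_hash(text: str) -> str:
    # Assumption: SHA-256 over the UTF-8 text; the real scheme may differ.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def select_stale_ids(contents: Dict[int, str], state_map: Dict[int, str]) -> List[int]:
    """Return IDs whose content is new or changed since the last embedding."""
    return sorted(
        object_id
        for object_id, text in contents.items()
        if state_map.get(object_id) != content_hash(text)
    )


current_contents = {1: "Article 1", 2: "Article 2 (amended)", 3: "Article 3"}
stored_state = {1: content_hash("Article 1"), 2: content_hash("Article 2")}
to_embed = select_stale_ids(current_contents, stored_state)
```

Object 1 is unchanged and is skipped, object 2 has a different hash, and object 3 has no stored state, so only the last two would be re-embedded.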

static upsert_embedding_states(session, records)[source]

Upsert embedding-state rows for a batch.

Parameters:
  • session (Session)

  • records (list[dict[str, Any]])

Return type:

None

static list_acts_with_metadata(session)[source]

Return acts with eager-loaded metadata, versions, and summaries.

Parameters:

session (Session)

Return type:

list[Any]

static get_act_by_celex(session, celex)[source]

Return the act identified by celex, if present.

Parameters:
  • session (Session)

  • celex (str)

Return type:

Any | None

static list_subdivisions_for_act(session, act_id)[source]

List subdivisions for one act ordered by sequence.

Parameters:
  • session (Session)

  • act_id (int)

Return type:

list[Any]

static list_subdivisions_for_act_version(session, act_id, version_id)[source]

List subdivisions for an act, optionally restricted to one version.

Parameters:
  • session (Session)

  • act_id (int)

  • version_id (int | None)

Return type:

list[Any]

static get_current_version_for_act(session, act_id)[source]

Return the current version row for an act, if any.

Parameters:
  • session (Session)

  • act_id (int)

Return type:

Any | None

static get_act_summary(session, *, act_id, language, summary_kind='canonical')[source]

Return one summary row for an act/language/kind triplet.

Parameters:
  • session (Session)

  • act_id (int)

  • language (str)

  • summary_kind (str)

Return type:

Any | None

static upsert_act_summary(session, record)[source]

Insert or update one act summary record.

Parameters:
  • session (Session)

  • record (Dict[str, Any])

Return type:

None

static list_chunks_for_act(session, act_id)[source]

Return chunk rows paired with their subdivisions for one act.

Parameters:
  • session (Session)

  • act_id (int)

Return type:

list[tuple[Any, Any]]

static count_acts(session)[source]

Return the total number of acts.

Parameters:

session (Session)

Return type:

int

static count_acts_pending_extraction(session)[source]

Return the number of acts not yet marked as extracted.

Parameters:

session (Session)

Return type:

int

static reset_extraction_status(session, act_id)[source]

Reset extraction state so the act is re-extracted on next run.

Parameters:
  • session (Session)

  • act_id (int)

Return type:

None

static reset_all_extraction_statuses(session)[source]

Reset extraction state for all acts after a global pipeline purge.

Parameters:

session (Session)

Return type:

int

static reset_stale_extracting_acts(session, timeout_minutes)[source]

Reset acts stuck in ‘extracting’ for longer than timeout_minutes.

Parameters:
  • session (Session)

  • timeout_minutes (int)

Return type:

int
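The timeout logic amounts to comparing a start timestamp against a cutoff. The sketch below demonstrates that comparison on plain dictionaries; the field names (status, started_at) and the helper are hypothetical, and the real method issues an update against the acts table instead.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional


def find_stale_act_ids(
    acts: List[Dict[str, Any]],
    timeout_minutes: int,
    now: Optional[datetime] = None,
) -> List[int]:
    """Return IDs of acts that have been 'extracting' for longer than the timeout."""
    current_time = now or datetime.now(timezone.utc)
    cutoff = current_time - timedelta(minutes=timeout_minutes)
    return [
        act["id"]
        for act in acts
        if act["status"] == "extracting" and act["started_at"] < cutoff
    ]


reference_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample_acts = [
    {"id": 1, "status": "extracting", "started_at": reference_time - timedelta(minutes=90)},
    {"id": 2, "status": "extracting", "started_at": reference_time - timedelta(minutes=5)},
    {"id": 3, "status": "extracted", "started_at": reference_time - timedelta(minutes=200)},
]
stale_ids = find_stale_act_ids(sample_acts, timeout_minutes=30, now=reference_time)
```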

static count_subdivisions_without_chunks(session, min_content_length)[source]

Count eligible subdivisions that still have no generated chunks.

Parameters:
  • session (Session)

  • min_content_length (int)

Return type:

int

static list_chunk_ids_for_subdivision(session, subdivision_id)[source]

Return all chunk identifiers attached to one subdivision.

Parameters:
  • session (Session)

  • subdivision_id (int)

Return type:

list[int]

static list_chunk_ids_for_act(session, act_id)[source]

Return all chunk identifiers attached to one act.

Parameters:
  • session (Session)

  • act_id (int)

Return type:

list[int]

static subdivision_has_chunks(session, subdivision_id)[source]

Return whether a subdivision already has at least one chunk.

Parameters:
  • session (Session)

  • subdivision_id (int)

Return type:

bool

static delete_embedding_states_for_chunk_ids(session, chunk_ids)[source]

Delete embedding-state rows for the given chunk identifiers.

Parameters:
  • session (Session)

  • chunk_ids (list[int])

Return type:

int

static delete_embedding_states_for_act_ids(session, act_ids)[source]

Delete embedding-state rows for the given act identifiers.

Parameters:
  • session (Session)

  • act_ids (list[int])

Return type:

int

static delete_chunks_for_subdivision(session, subdivision_id)[source]

Delete all chunks belonging to one subdivision.

Parameters:
  • session (Session)

  • subdivision_id (int)

Return type:

int

static delete_chunks_for_act(session, act_id)[source]

Delete all chunks belonging to one act.

Parameters:
  • session (Session)

  • act_id (int)

Return type:

int

static insert_chunk_records(session, records)[source]

Insert a batch of precomputed chunk rows.

Parameters:
  • session (Session)

  • records (list[dict[str, Any]])

Return type:

None

static clear_chunks(session)[source]

Delete every chunk row.

Parameters:

session (Session)

Return type:

int

static clear_embedding_states(session)[source]

Delete every embedding-state row.

Parameters:

session (Session)

Return type:

int

static clear_act_relations(session)[source]

Delete every extracted act relation.

Parameters:

session (Session)

Return type:

int

static clear_act_summaries(session)[source]

Delete every persisted act summary.

Parameters:

session (Session)

Return type:

int

search_bm25(query, top_k=None, language=None, filter_conditions=None)[source]

Full-text search using PostgreSQL native ts_rank_cd (BM25-like ranking). Returns subdivisions with relevance scores for hybrid RAG retrieval.

Searches across all active languages (FR + EN) in parallel and merges results by best score, unless a language override is provided via the language parameter or filter_conditions['language'].

Parameters:
  • query (str) – Search query

  • top_k (int | None) – Number of results to return (defaults to config.search.default_limit)

  • language (str | None) – PostgreSQL text-search config override (e.g. ‘french’). When set, only that language is searched. Defaults to multilingual UNION.

  • filter_conditions (Dict[str, Any] | None) – Optional filters (e.g., {"act_id": 123, "celex": "32016R0679"}). filter_conditions['language'] restricts results to a single act language.

Return type:

List[Dict[str, Any]]
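Merging per-language results by best score can be sketched as follows. The row shape (a subdivision_id and a score key) and the function name are assumptions for illustration; the actual merge may happen inside the SQL query rather than in Python.

```python
from typing import Any, Dict, List


def merge_by_best_score(
    result_sets: List[List[Dict[str, Any]]],
    top_k: int,
    id_key: str = "subdivision_id",
) -> List[Dict[str, Any]]:
    """Keep the highest-scoring row per ID, then return the top_k rows."""
    best: Dict[Any, Dict[str, Any]] = {}
    for results in result_sets:
        for row in results:
            key = row[id_key]
            if key not in best or row["score"] > best[key]["score"]:
                best[key] = row
    return sorted(best.values(), key=lambda row: row["score"], reverse=True)[:top_k]


french_hits = [
    {"subdivision_id": 1, "score": 0.4},
    {"subdivision_id": 2, "score": 0.9},
]
english_hits = [
    {"subdivision_id": 1, "score": 0.7},
    {"subdivision_id": 3, "score": 0.2},
]
merged = merge_by_best_score([french_hits, english_hits], top_k=2)
```

Subdivision 1 appears in both result sets and keeps its higher English score of 0.7, so it ranks second behind subdivision 2.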

static get_conversation(session, conversation_id)[source]

Return a conversation session by identifier.

Parameters:
  • session (Session)

  • conversation_id (str)

Return type:

ConversationSQL | None

static create_conversation(session, conversation_id, title, user_id=None)[source]

Create and flush a new conversation session.

Parameters:
  • session (Session)

  • conversation_id (str)

  • title (str)

  • user_id (str | None)

Return type:

ConversationSQL

static list_conversations(session, user_id, limit=50)[source]

List the most recent conversations for one user.

Parameters:
  • session (Session)

  • user_id (str | None)

  • limit (int)

Return type:

List[ConversationSQL]

static delete_conversation(session, conversation_id)[source]

Delete one conversation and report whether a row was removed.

Parameters:
  • session (Session)

  • conversation_id (str)

Return type:

bool

static add_conversation_message(session, message_id, conversation_id, role, content, query_id=None, mode=None, metadata=None)[source]

Append and flush one message inside a conversation.

Parameters:
  • session (Session)

  • message_id (str)

  • conversation_id (str)

  • role (str)

  • content (str)

  • query_id (str | None)

  • mode (str | None)

  • metadata (Dict[str, Any] | None)

Return type:

ConversationMessageSQL

static get_conversation_messages(session, conversation_id, limit=20)[source]

Return the latest conversation messages in chronological order.

Parameters:
  • session (Session)

  • conversation_id (str)

  • limit (int)

Return type:

List[ConversationMessageSQL]
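Returning the latest messages in chronological order typically means selecting newest-first with a limit and then reversing. A sketch of that two-step ordering, using plain dictionaries with a hypothetical created_at field:

```python
from typing import Any, Dict, List


def latest_messages_chronological(
    messages: List[Dict[str, Any]], limit: int = 20
) -> List[Dict[str, Any]]:
    # Step 1: take the `limit` most recent messages (newest first).
    newest_first = sorted(messages, key=lambda m: m["created_at"], reverse=True)[:limit]
    # Step 2: flip them so the caller reads oldest to newest.
    return list(reversed(newest_first))


history = [{"content": f"message {i}", "created_at": i} for i in range(1, 6)]
window = latest_messages_chronological(history, limit=3)
```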

static touch_conversation(session, conversation_id)[source]

Refresh the updated_at timestamp for a conversation.

Parameters:
  • session (Session)

  • conversation_id (str)

Return type:

None

search_bm25_chunks(query, top_k=None, language=None, filter_conditions=None)[source]

Full-text search using PostgreSQL ts_rank_cd over chunk content. Returns chunks with relevance scores for fine-grained retrieval.

Searches across all active languages (FR + EN) in parallel and merges results by best score, unless a language override is provided via the language parameter or filter_conditions['language'].

Parameters:
  • query (str)

  • top_k (int | None)

  • language (str | None)

  • filter_conditions (Dict[str, Any] | None)

Return type:

List[Dict[str, Any]]

lalandre_db_qdrant

Source: packages/lalandre_db_qdrant/lalandre_db_qdrant/__init__.py

Qdrant repository package. Provides vector database operations for similarity search.

lalandre_db_qdrant.models

Source: packages/lalandre_db_qdrant/lalandre_db_qdrant/models.py

Qdrant vector models. Data structures for vector database operations.

class lalandre_db_qdrant.models.VectorPoint(*, id, vector, payload)[source]

Bases: BaseModel

Represents a point in vector space with metadata

Abstraction layer over Qdrant’s PointStruct to decouple application code from Qdrant implementation details.

Parameters:
  • id (str | int)

  • vector (List[float])

  • payload (Dict[str, Any])

to_qdrant_point()[source]

Convert to Qdrant’s native PointStruct

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class lalandre_db_qdrant.models.SearchResult(*, id, score, payload)[source]

Bases: BaseModel

Result of a vector search

Parameters:
  • id (str)

  • score (float)

  • payload (Dict[str, Any])

model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

lalandre_db_qdrant.repository

Source: packages/lalandre_db_qdrant/lalandre_db_qdrant/repository.py

Qdrant repository implementation for similarity search and vector retrieval.

class lalandre_db_qdrant.repository.QdrantRepository(host=None, port=None, collection_name=None, vector_size=None, api_key=None, use_https=None)[source]

Bases: BaseRepository

Repository for Qdrant vector database operations. Handles all low-level operations, including collection management and data ingestion.

Collection naming: {base}_{model}_{dimension}

Examples:
  • chunk_embeddings_mistral_1024

  • chunk_embeddings_gte_large_1024

  • chunk_embeddings_bge_m3_1024

  • chunk_embeddings_e5_large_1024

This allows multiple embedding models to coexist without confusion.

Parameters:
  • host (str | None)

  • port (int | None)

  • collection_name (str | None)

  • vector_size (int | None)

  • api_key (str | None)

  • use_https (bool | None)

static make_collection_name(base_name, model_name, dimension)[source]

Generate collection name: {base}_{model}_{dimension}

Parameters:
  • base_name (str) – Base collection name

  • model_name (str) – Model name (e.g., ‘thenlper/gte-large’)

  • dimension (int) – Vector dimension (e.g., 1024)

Returns:

Collection name like ‘chunk_embeddings_gte_large_1024’

Return type:

str
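One possible way to derive such a name is sketched below. It reproduces the documented 'thenlper/gte-large' case, but the slug rules (dropping the organisation prefix, replacing non-alphanumeric characters with underscores) are assumptions and may not match how other model names are shortened.

```python
import re


def make_collection_name_sketch(base_name: str, model_name: str, dimension: int) -> str:
    # Assumption: keep only the part after the last '/', e.g. 'gte-large'.
    short_name = model_name.rsplit("/", 1)[-1]
    # Assumption: lowercase and collapse anything non-alphanumeric to '_'.
    slug = re.sub(r"[^0-9a-z]+", "_", short_name.lower()).strip("_")
    return f"{base_name}_{slug}_{dimension}"


collection = make_collection_name_sketch("chunk_embeddings", "thenlper/gte-large", 1024)
```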

Example

>>> QdrantRepository.make_collection_name(
...     'chunk_embeddings',
...     'thenlper/gte-large',
...     1024
... )
'chunk_embeddings_gte_large_1024'

property vector_size: int

Get vector size - auto-detects from existing collection if not set

close()[source]

Close Qdrant client connection

health_check()[source]

Verify Qdrant connectivity

Return type:

bool

collection_exists()[source]

Check if collection exists

Return type:

bool

create_collection(recreate=False, distance=Distance.COSINE)[source]

Parameters:
  • recreate (bool) – If True, delete existing collection and recreate

  • distance (Distance) – Distance metric (COSINE, EUCLID, DOT)

Returns:

True if collection was created, False if already exists

Return type:

bool

classmethod from_embedding_service_with_auto_collection(embedding_service, base_collection_name=None)[source]

Create repository with automatic collection naming based on embedding model

Format: {base}_{model}_{dimension}

Parameters:
  • embedding_service (Any) – EmbeddingService instance

  • base_collection_name (str | None) – Base collection name

Returns:

QdrantRepository with auto-generated collection name

Return type:

QdrantRepository

Example

>>> embedding_service = EmbeddingService(provider="local", model_name="thenlper/gte-large")
>>> repo = QdrantRepository.from_embedding_service_with_auto_collection(embedding_service)
>>> # Collection: 'chunk_embeddings_gte_large_1024'

create_payload_index(field_name, field_schema)[source]

Create a payload index for efficient filtering.

Parameters:
  • field_name (str) – Name of the payload field to index

  • field_schema (PayloadSchemaType) – Schema type (KEYWORD, INTEGER, FLOAT, etc.)

Returns:

True if index created successfully

Return type:

bool

setup_standard_indexes()[source]

Create standard payload indexes used for legal document filtering.

Return type:

None

upsert_points(points, batch_size=None)[source]

Insert or update points in the collection with retry on transient transport errors.

Parameters:
  • points (Sequence[VectorPoint | PointStruct]) – List of VectorPoint or PointStruct objects to upsert

  • batch_size (int | None) – Optional batch size for large inserts

Returns:

Number of points upserted

Return type:

int
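Batched upserts with retry on transient errors can be sketched generically as below. The send_batch callback, the use of ConnectionError as the transient error type, and the exponential backoff schedule are assumptions; the errors the real method retries on and its delay policy are not documented here.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def upsert_in_batches(
    points: Sequence[T],
    send_batch: Callable[[List[T]], None],
    batch_size: int = 100,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> int:
    """Send points in batches, retrying each batch on ConnectionError."""
    upserted = 0
    for start in range(0, len(points), batch_size):
        batch = list(points[start:start + batch_size])
        for attempt in range(1, max_attempts + 1):
            try:
                send_batch(batch)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # give up after the final attempt
                # Exponential backoff: base_delay, 2x, 4x, ...
                time.sleep(base_delay * 2 ** (attempt - 1))
        upserted += len(batch)
    return upserted


call_count = {"n": 0}
sent_points: List[int] = []


def flaky_sender(batch: List[int]) -> None:
    """Hypothetical transport that fails once, then succeeds."""
    call_count["n"] += 1
    if call_count["n"] == 1:
        raise ConnectionError("transient failure")
    sent_points.extend(batch)


total = upsert_in_batches([1, 2, 3, 4, 5], flaky_sender, batch_size=2, base_delay=0.0)
```

Retrying per batch rather than per call means a single dropped connection only repeats the points in the failed batch.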

delete_points(point_ids)[source]

Delete points by IDs from the current collection.

Parameters:

point_ids (List[str | int])

Return type:

int

delete_points_by_filter(query_filter)[source]

Delete points matching a payload filter from the current collection.

Parameters:

query_filter (Dict[str, Any] | Filter)

Return type:

int

search(query_vector, limit=None, score_threshold=None, query_filter=None, hnsw_ef=None, exact=None)[source]

Search for similar vectors.

Parameters:
  • query_vector (List[float]) – The query embedding

  • limit (int | None) – Maximum number of results (None -> config.search.default_limit)

  • score_threshold (float | None) – Minimum similarity score (0-1)

  • query_filter (Dict[str, Any] | Filter | None) – Metadata filters (e.g., {"document_type": "directive"})

  • hnsw_ef (int | None) – ANN breadth parameter for HNSW search (None -> config.search.hnsw_ef)

  • exact (bool | None) – Force exact vector search (None -> config.search.exact_search)

Return type:

List[SearchResult]
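To make the limit and score_threshold parameters concrete, here is a small in-memory sketch of exact (brute-force) cosine search. It is only an illustration of the semantics; the real method delegates to Qdrant, and the function names and data layout below are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_search(
    vectors: Dict[int, List[float]],
    query_vector: List[float],
    limit: int = 10,
    score_threshold: Optional[float] = None,
) -> List[Tuple[int, float]]:
    """Score every stored vector, filter by threshold, return the best `limit`."""
    scored = [
        (point_id, cosine_similarity(query_vector, vector))
        for point_id, vector in vectors.items()
    ]
    if score_threshold is not None:
        scored = [item for item in scored if item[1] >= score_threshold]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


stored = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
hits = exact_search(stored, [1.0, 0.0], limit=5, score_threshold=0.5)
```

Point 2 is orthogonal to the query and falls below the threshold, so only points 1 and 3 are returned, best match first.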

retrieve_vectors_by_ids(point_ids)[source]

Retrieve stored embedding vectors by point IDs.

Returns a dict mapping point_id -> vector. Points not found in the collection are silently omitted.

Parameters:

point_ids (List[int])

Return type:

Dict[int, List[float]]

get_collection_info()[source]

Return collection metadata and point counts for the active collection.

Return type:

Dict[str, Any]

get_statistics()[source]

Get vector database statistics

Return type:

Dict[str, Any]