Storage API
Note
This page is generated automatically from the repository’s maintained Python module inventory.
PostgreSQL, Qdrant, and Neo4j persistence layers.
lalandre_db_neo4j
Source: packages/lalandre_db_neo4j/lalandre_db_neo4j/__init__.py
Neo4j repository module. Provides graph database functionality for Graph RAG.
lalandre_db_neo4j.models
Source: packages/lalandre_db_neo4j/lalandre_db_neo4j/models.py
Neo4j graph models.
- class lalandre_db_neo4j.models.ActNode(*, id, celex, title, act_type, language, adoption_date=None, force_date=None, end_date=None, sector=None, level=None, official_journal_reference=None, eli=None, url_eurlex=None)
Bases: BaseModel
Represents an Act as a node in the knowledge graph.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
id (int)
celex (str)
title (str)
act_type (str)
language (str)
adoption_date (datetime | None)
force_date (datetime | None)
end_date (datetime | None)
sector (int | None)
level (int | None)
official_journal_reference (str | None)
eli (str | None)
url_eurlex (str | None)
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class lalandre_db_neo4j.models.ActRelationship(*, source_act_id, target_act_id, relation_type, effect_date=None, description=None, source_subdivision_id=None, target_subdivision_id=None)
Bases: BaseModel
Represents a relationship between two acts.
- Parameters:
source_act_id (int)
target_act_id (int)
relation_type (str)
effect_date (datetime | None)
description (str | None)
source_subdivision_id (int | None)
target_subdivision_id (int | None)
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class lalandre_db_neo4j.models.EntityNode(*, name, entity_type, description=None)
Bases: BaseModel
Represents a named legal entity (organization, concept, jurisdiction, or topic) in the knowledge graph. Linked to Acts via MENTIONS relationships.
- Parameters:
name (str)
entity_type (str)
description (str | None)
- to_neo4j_properties()
Return the serializable property map for the entity node.
- Return type:
dict[str, Any]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class lalandre_db_neo4j.models.CommunityNode(*, id, num_acts=0, num_relations=0, relation_types='{}', central_acts='[]', summary='', modularity=None, resolution=None)
Bases: BaseModel
Represents a community of Acts in the knowledge graph, created by Louvain community detection. Acts are linked to it via BELONGS_TO relationships.
- Parameters:
id (int)
num_acts (int)
num_relations (int)
relation_types (str)
central_acts (str)
summary (str)
modularity (float | None)
resolution (float | None)
- to_neo4j_properties()
Return the serializable property map for the community node.
- Return type:
dict[str, Any]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
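The defaults relation_types='{}' and central_acts='[]' suggest these fields hold JSON-encoded text, which fits Neo4j's rule that a property value cannot be a map. The sketch below shows one way such fields could be derived from a Louvain partition. The helper name summarize_communities, the edge tuple shape, and the definitions used for num_relations (intra-community edges only) and central_acts (highest intra-community degree) are assumptions, not the package's documented behavior.

```python
import json
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def summarize_communities(
    partition: Dict[int, int],
    edges: Iterable[Tuple[int, int, str]],
    top_k: int = 3,
) -> List[dict]:
    """Hypothetical helper: build CommunityNode-style property dicts.

    partition maps act_id -> community_id (the shape link_acts_to_communities takes);
    edges are (source_act_id, target_act_id, relation_type) tuples.
    """
    members: Dict[int, List[int]] = defaultdict(list)
    for act_id, community_id in partition.items():
        members[community_id].append(act_id)

    rel_counts: Dict[int, Counter] = defaultdict(Counter)  # community -> relation_type -> count
    degree: Dict[int, Counter] = defaultdict(Counter)      # community -> act_id -> degree
    for src, tgt, rel_type in edges:
        community = partition.get(src)
        # Assumption: only edges whose endpoints share a community are counted.
        if community is None or community != partition.get(tgt):
            continue
        rel_counts[community][rel_type] += 1
        degree[community][src] += 1
        degree[community][tgt] += 1

    summaries = []
    for community_id in sorted(members):
        central = [act for act, _ in degree[community_id].most_common(top_k)]
        summaries.append({
            "id": community_id,
            "num_acts": len(members[community_id]),
            "num_relations": sum(rel_counts[community_id].values()),
            # Nested data is serialized to JSON text before being stored as a property.
            "relation_types": json.dumps(dict(rel_counts[community_id]), sort_keys=True),
            "central_acts": json.dumps(central),
        })
    return summaries
```

Each resulting dict uses CommunityNode field names, so it could be passed as CommunityNode(**summary) before calling upsert_communities_batch.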
- class lalandre_db_neo4j.models.GraphQueryResult(*, nodes=<factory>, relationships=<factory>, metadata=<factory>)
Bases: BaseModel
Result of a graph query: the act nodes and relationships that form a subgraph.
- Parameters:
nodes (list[dict[str, Any]])
relationships (list[dict[str, Any]])
metadata (dict[str, Any])
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
lalandre_db_neo4j.repository
Source: packages/lalandre_db_neo4j/lalandre_db_neo4j/repository.py
Neo4j repository. Handles all interactions with the Neo4j graph database.
- class lalandre_db_neo4j.repository.Neo4jRepository(settings=None)
Bases: BaseRepository
Repository for Neo4j graph operations.
Responsibilities: manage the Neo4j driver and sessions; create and update Act nodes and relationships; execute graph traversal queries; support Graph RAG operations.
Note: Subdivisions and Versions are managed in PostgreSQL/Qdrant only.
Initialize the Neo4j connection.
- Parameters:
settings (GraphConfig | None) – Neo4j connection settings (defaults to global config)
- get_session()
Context manager for Neo4j sessions
- Return type:
Generator[Session, None, None]
- validate_read_only_cypher(cypher)
Public wrapper: validate that a Cypher statement is read-only.
- Parameters:
cypher (str)
- Return type:
str
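The validator's actual rules are not documented on this page. As a rough illustration of what a read-only check can look like, the sketch below masks string literals and then rejects statements that contain common Cypher write clauses or more than one statement. The deny-list is an assumption, and keyword matching is only a heuristic (it does not, for example, inspect procedures invoked with CALL).

```python
import re

# Hypothetical deny-list of Cypher clauses that modify the graph.
_WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|FOREACH|LOAD\s+CSV)\b",
    re.IGNORECASE,
)
# Single- or double-quoted string literals, allowing backslash escapes.
_STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'" + r'|"(?:[^"\\]|\\.)*"')


def validate_read_only_cypher(cypher: str) -> str:
    """Return the trimmed statement, or raise ValueError if it looks like a write."""
    statement = cypher.strip().rstrip(";").strip()
    if not statement:
        raise ValueError("empty Cypher statement")
    # Mask literals so a value such as 'Create Act' does not trigger a false positive.
    masked = _STRING_LITERAL.sub("''", statement)
    if ";" in masked:
        raise ValueError("only a single statement is allowed")
    match = _WRITE_CLAUSES.search(masked)
    if match:
        raise ValueError(f"write clause not allowed: {match.group(1).upper()}")
    return statement
```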
- serialize_neo4j_value(value)
Public wrapper: convert a Neo4j value to a JSON-friendly payload.
- Parameters:
value (Any)
- Return type:
Any
- execute_read_only_query(cypher, *, params=None, result_limit=None)
Execute a validated read-only Cypher query and return JSON-safe rows.
- Parameters:
cypher (str)
params (Dict[str, Any] | None)
result_limit (int | None)
- Return type:
List[Dict[str, Any]]
- create_act_node(act)
Create an Act node in the graph
- Parameters:
act (ActNode) – ActNode data
- Returns:
The act ID
- Return type:
int
- upsert_entity_mention(act_celex, entity)
Idempotently create (or update) an Entity node and a MENTIONS edge from the Act.
Uses MERGE on (name, type) so duplicate extractions are safe to call repeatedly.
- Parameters:
act_celex (str)
entity (EntityNode)
- Return type:
None
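To make the MERGE-based idempotency concrete, the toy in-memory model below mimics it: entities are keyed on (name, entity_type), and each act/entity pair yields at most one MENTIONS edge, so repeated calls leave the graph unchanged apart from refreshed properties. InMemoryGraph is hypothetical, and the Cypher in the comment is an assumed shape of the statement, not the repository's actual text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Assumed Cypher shape (labels and property names are illustrative):
#   MATCH (a:Act {celex: $celex})
#   MERGE (e:Entity {name: $name, type: $entity_type})
#   SET e.description = coalesce($description, e.description)
#   MERGE (a)-[:MENTIONS]->(e)


@dataclass
class InMemoryGraph:
    """Toy stand-in for Neo4j that mimics MERGE semantics."""
    entities: Dict[Tuple[str, str], dict] = field(default_factory=dict)
    mentions: Set[Tuple[str, str, str]] = field(default_factory=set)

    def upsert_entity_mention(self, act_celex: str, name: str, entity_type: str,
                              description: Optional[str] = None) -> None:
        # MERGE on (name, entity_type): create the node only if it is missing.
        props = self.entities.setdefault(
            (name, entity_type), {"name": name, "entity_type": entity_type}
        )
        if description is not None:
            props["description"] = description
        # MERGE on the edge: a set holds each (act, entity) pair at most once.
        self.mentions.add((act_celex, name, entity_type))
```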
- create_act_relationship(relationship)
Create a relationship between two Acts
- Parameters:
relationship (ActRelationship) – ActRelationship data
- Returns:
True if created successfully
- Return type:
bool
- clear_act_relationships()
Delete all relationships between Act nodes and return the count removed.
- Return type:
int
- clear_communities()
Delete all Community nodes and BELONGS_TO relationships and return the count deleted.
- Return type:
int
- upsert_community(community)
Create or update a Community node.
- Parameters:
community (CommunityNode)
- Return type:
None
- upsert_communities_batch(communities, batch_size=100)
Batch upsert Community nodes.
- Parameters:
communities (List[CommunityNode])
batch_size (int)
- Return type:
None
- link_acts_to_communities(partition, batch_size=500)
Create BELONGS_TO relationships from Act nodes to Community nodes.
- Parameters:
partition (Dict[int, int])
batch_size (int)
- Return type:
None
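link_acts_to_communities takes the partition as an act_id -> community_id map, and sending it in slices of batch_size keeps each transaction bounded. The sketch below shows the batching step together with an assumed UNWIND statement; partition_batches and the Cypher labels and property names are illustrative, not the repository's code.

```python
from itertools import islice
from typing import Dict, Iterator, List

# Hypothetical per-batch statement; labels and property names are assumptions.
LINK_CYPHER = """
UNWIND $rows AS row
MATCH (a:Act {id: row.act_id})
MATCH (c:Community {id: row.community_id})
MERGE (a)-[:BELONGS_TO]->(c)
"""


def partition_batches(partition: Dict[int, int], batch_size: int = 500) -> Iterator[List[dict]]:
    """Yield lists of {"act_id", "community_id"} rows, at most batch_size per list."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    items = iter(sorted(partition.items()))
    while True:
        chunk = list(islice(items, batch_size))
        if not chunk:
            return
        yield [{"act_id": act_id, "community_id": cid} for act_id, cid in chunk]
```

Each yielded list would be bound to $rows, for example via session.run(LINK_CYPHER, rows=batch) with the official Neo4j Python driver.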
- get_communities_for_acts(act_ids)
Return the Community nodes that the given seed acts belong to, as a list of dicts containing all community properties.
- Parameters:
act_ids (List[int])
- Return type:
List[Dict[str, Any]]
lalandre_db_postgres
Source: packages/lalandre_db_postgres/lalandre_db_postgres/__init__.py
PostgreSQL Repository Package
lalandre_db_postgres.models
Source: packages/lalandre_db_postgres/lalandre_db_postgres/models.py
PostgreSQL SQLAlchemy models
- class lalandre_db_postgres.models.Base(**kwargs)
Bases: DeclarativeBase
Base class for all SQLAlchemy declarative models.
A simple constructor that allows initialization from kwargs.
Sets attributes on the constructed instance using the names and values in kwargs. Only keys that are present as attributes of the instance’s class are allowed. These could be, for example, any mapped columns or relationships.
- Parameters:
kwargs (Any)
- class lalandre_db_postgres.models.ActsSQL(**kwargs)
Bases: Base
Persisted legislative act.
- class lalandre_db_postgres.models.ActRelationsSQL(**kwargs)
Bases: Base
Directed relationship extracted between two acts.
- class lalandre_db_postgres.models.ActMetadataSQL(**kwargs)
Bases: Base
Key-value metadata attached to an act.
- class lalandre_db_postgres.models.ActSubjectsSQL(**kwargs)
Bases: Base
Association table between acts and subject matters.
- class lalandre_db_postgres.models.SubjectMattersSQL(**kwargs)
Bases: Base
EuroVoc-like subject taxonomy entry.
- class lalandre_db_postgres.models.VersionsSQL(**kwargs)
Bases: Base
Version row describing one published state of an act.
- class lalandre_db_postgres.models.SubdivisionsSQL(**kwargs)
Bases: Base
Hierarchical subdivision belonging to an act version.
- class lalandre_db_postgres.models.ChunksSQL(**kwargs)
Bases: Base
Chunk of a subdivision, used for fine-grained retrieval.
- class lalandre_db_postgres.models.ConversationSQL(**kwargs)
Bases: Base
Multi-turn conversation session.
- class lalandre_db_postgres.models.ConversationMessageSQL(**kwargs)
Bases: Base
Single message within a conversation (human or assistant).
- class lalandre_db_postgres.models.EmbeddingStateSQL(**kwargs)
Bases: Base
Tracks embedding status per object and model to avoid unnecessary re-embedding.
- class lalandre_db_postgres.models.ActSummarySQL(**kwargs)
Bases: Base
Canonical or derived summary generated for an act.
lalandre_db_postgres.repository
Source: packages/lalandre_db_postgres/lalandre_db_postgres/repository.py
PostgreSQL repository implementation
- class lalandre_db_postgres.repository.PostgresRepository(connection_string)
Bases: BaseRepository
Repository for PostgreSQL database operations supporting RAG: retrieval, context enrichment, and structured legal data access.
- Parameters:
connection_string (str)
- static ensure_embedding_state_table(session)
Ensure the embedding_state table exists and supports all runtime object types.
- Parameters:
session (Session)
- Return type:
None
- static purge_orphan_embedding_states(session)
Delete embedding-state rows referencing missing acts/chunks/subdivisions.
- Parameters:
session (Session)
- Return type:
int
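One way to express the orphan purge is a single DELETE with a NOT EXISTS anti-join per object type. The sketch runs against an in-memory SQLite database so it stays self-contained; the table names, the object_type values ('act', 'subdivision', 'chunk'), and the object_id column are assumptions. The same statement shape is also valid in PostgreSQL.

```python
import sqlite3

# Assumed schema: embedding_state(object_type, object_id) plus acts/subdivisions/chunks(id).
PURGE_ORPHANS_SQL = """
DELETE FROM embedding_state
WHERE (object_type = 'act'
       AND NOT EXISTS (SELECT 1 FROM acts a WHERE a.id = embedding_state.object_id))
   OR (object_type = 'subdivision'
       AND NOT EXISTS (SELECT 1 FROM subdivisions s WHERE s.id = embedding_state.object_id))
   OR (object_type = 'chunk'
       AND NOT EXISTS (SELECT 1 FROM chunks c WHERE c.id = embedding_state.object_id))
"""


def purge_orphans(conn: sqlite3.Connection) -> int:
    """Delete state rows whose referenced object no longer exists; return rows removed."""
    cur = conn.execute(PURGE_ORPHANS_SQL)
    conn.commit()
    return cur.rowcount
```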
- static count_subdivisions(session)
Return the total number of stored subdivisions.
- Parameters:
session (Session)
- Return type:
int
- static count_chunks(session)
Return the total number of stored chunks.
- Parameters:
session (Session)
- Return type:
int
- static count_embedded_subdivisions(session, provider, model_name, vector_size)
Count subdivisions already embedded for one embedding runtime.
- Parameters:
session (Session)
provider (str)
model_name (str)
vector_size (int)
- Return type:
int
- static count_embedded_chunks(session, provider, model_name, vector_size)
Count chunks already embedded for one embedding runtime.
- Parameters:
session (Session)
provider (str)
model_name (str)
vector_size (int)
- Return type:
int
- static get_embedding_state_map(session, object_type, object_ids, provider, model_name, vector_size)
Return {object_id: content_hash} for existing embeddings.
- Parameters:
session (Session)
object_type (str)
object_ids (list[int])
provider (str)
model_name (str)
vector_size (int)
- Return type:
dict[int, str]
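Because get_embedding_state_map returns the recorded content_hash per object, a pipeline can skip objects whose text has not changed since they were last embedded. A sketch of that comparison, assuming a SHA-256 hex digest of the UTF-8 text (the actual hash function is not documented here); content_hash and ids_needing_embedding are hypothetical names.

```python
import hashlib
from typing import Dict, List


def content_hash(text: str) -> str:
    # Assumption: SHA-256 over UTF-8 bytes; the repository may use a different digest.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def ids_needing_embedding(contents: Dict[int, str], state_map: Dict[int, str]) -> List[int]:
    """Return IDs that are new (absent from state_map) or whose content hash changed."""
    return sorted(
        object_id
        for object_id, text in contents.items()
        if state_map.get(object_id) != content_hash(text)
    )
```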
- static upsert_embedding_states(session, records)
Upsert embedding-state rows for a batch.
- Parameters:
session (Session)
records (list[dict[str, Any]])
- Return type:
None
- static list_acts_with_metadata(session)
Return acts with eager-loaded metadata, versions, and summaries.
- Parameters:
session (Session)
- Return type:
list[Any]
- static get_act_by_celex(session, celex)
Return the act identified by celex, if present.
- Parameters:
session (Session)
celex (str)
- Return type:
Any | None
- static list_subdivisions_for_act(session, act_id)
List subdivisions for one act ordered by sequence.
- Parameters:
session (Session)
act_id (int)
- Return type:
list[Any]
- static list_subdivisions_for_act_version(session, act_id, version_id)
List subdivisions for an act, optionally restricted to one version.
- Parameters:
session (Session)
act_id (int)
version_id (int | None)
- Return type:
list[Any]
- static get_current_version_for_act(session, act_id)
Return the current version row for an act, if any.
- Parameters:
session (Session)
act_id (int)
- Return type:
Any | None
- static get_act_summary(session, *, act_id, language, summary_kind='canonical')
Return one summary row for an act/language/kind triplet.
- Parameters:
session (Session)
act_id (int)
language (str)
summary_kind (str)
- Return type:
Any | None
- static upsert_act_summary(session, record)
Insert or update one act summary record.
- Parameters:
session (Session)
record (Dict[str, Any])
- Return type:
None
- static list_chunks_for_act(session, act_id)
Return chunk rows paired with their subdivisions for one act.
- Parameters:
session (Session)
act_id (int)
- Return type:
list[tuple[Any, Any]]
- static count_acts(session)
Return the total number of acts.
- Parameters:
session (Session)
- Return type:
int
- static count_acts_pending_extraction(session)
Return the number of acts not yet marked as extracted.
- Parameters:
session (Session)
- Return type:
int
- static reset_extraction_status(session, act_id)
Reset extraction state so the act is re-extracted on next run.
- Parameters:
session (Session)
act_id (int)
- Return type:
None
- static reset_all_extraction_statuses(session)
Reset extraction state for all acts after a global pipeline purge.
- Parameters:
session (Session)
- Return type:
int
- static reset_stale_extracting_acts(session, timeout_minutes)
Reset acts stuck in ‘extracting’ for longer than timeout_minutes.
- Parameters:
session (Session)
timeout_minutes (int)
- Return type:
int
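The stale-extraction reset can be written as a single conditional UPDATE. The sketch runs against in-memory SQLite to stay self-contained; the column names extraction_status and extraction_started_at, the 'pending' target status, and storing start times as Unix seconds are all assumptions. In PostgreSQL the cutoff would more naturally be computed from now() minus an interval.

```python
import sqlite3
import time
from typing import Optional


def reset_stale_extracting(conn: sqlite3.Connection, timeout_minutes: int,
                           now: Optional[float] = None) -> int:
    """Move acts stuck in 'extracting' back to 'pending' and return how many were reset."""
    now = time.time() if now is None else now
    cutoff = now - timeout_minutes * 60  # start times stored as Unix seconds (assumption)
    cur = conn.execute(
        "UPDATE acts SET extraction_status = 'pending', extraction_started_at = NULL "
        "WHERE extraction_status = 'extracting' AND extraction_started_at < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```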
- static count_subdivisions_without_chunks(session, min_content_length)
Count eligible subdivisions that still have no generated chunks.
- Parameters:
session (Session)
min_content_length (int)
- Return type:
int
- static list_chunk_ids_for_subdivision(session, subdivision_id)
Return all chunk identifiers attached to one subdivision.
- Parameters:
session (Session)
subdivision_id (int)
- Return type:
list[int]
- static list_chunk_ids_for_act(session, act_id)
Return all chunk identifiers attached to one act.
- Parameters:
session (Session)
act_id (int)
- Return type:
list[int]
- static subdivision_has_chunks(session, subdivision_id)
Return whether a subdivision already has at least one chunk.
- Parameters:
session (Session)
subdivision_id (int)
- Return type:
bool
- static delete_embedding_states_for_chunk_ids(session, chunk_ids)
Delete embedding-state rows for the given chunk identifiers.
- Parameters:
session (Session)
chunk_ids (list[int])
- Return type:
int
- static delete_embedding_states_for_act_ids(session, act_ids)
Delete embedding-state rows for the given act identifiers.
- Parameters:
session (Session)
act_ids (list[int])
- Return type:
int
- static delete_chunks_for_subdivision(session, subdivision_id)
Delete all chunks belonging to one subdivision.
- Parameters:
session (Session)
subdivision_id (int)
- Return type:
int
- static delete_chunks_for_act(session, act_id)
Delete all chunks belonging to one act.
- Parameters:
session (Session)
act_id (int)
- Return type:
int
- static insert_chunk_records(session, records)
Insert a batch of precomputed chunk rows.
- Parameters:
session (Session)
records (list[dict[str, Any]])
- Return type:
None
- static clear_chunks(session)
Delete every chunk row.
- Parameters:
session (Session)
- Return type:
int
- static clear_embedding_states(session)
Delete every embedding-state row.
- Parameters:
session (Session)
- Return type:
int
- static clear_act_relations(session)
Delete every extracted act relation.
- Parameters:
session (Session)
- Return type:
int
- static clear_act_summaries(session)
Delete every persisted act summary.
- Parameters:
session (Session)
- Return type:
int
- search_bm25(query, top_k=None, language=None, filter_conditions=None)
Full-text search using PostgreSQL's native ts_rank_cd (cover-density ranking, used as a BM25-like score). Returns subdivisions with relevance scores for hybrid RAG retrieval.
Searches all active languages (FR + EN) in parallel and merges the results by best score, unless a language override is provided via the language parameter or filter_conditions['language'].
- Parameters:
query (str) – Search query
top_k (int | None) – Number of results to return (defaults to config.search.default_limit)
language (str | None) – PostgreSQL text-search configuration override (e.g. ‘french’). When set, only that language is searched. Defaults to a multilingual UNION.
filter_conditions (Dict[str, Any] | None) – Optional filters (e.g., {“act_id”: 123, “celex”: “32016R0679”}). filter_conditions['language'] restricts results to a single act language.
- Return type:
List[Dict[str, Any]]
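In multilingual mode each text-search configuration produces its own ranked list (each scored with something like ts_rank_cd(to_tsvector('french', content), plainto_tsquery('french', :query))), and each subdivision keeps its best score. The merge step could look like the sketch below; merge_by_best_score is a hypothetical name, and the row keys subdivision_id and score are assumptions about the result dicts.

```python
from typing import Any, Dict, Iterable, List


def merge_by_best_score(
    result_lists: Iterable[List[Dict[str, Any]]],
    top_k: int,
    key: str = "subdivision_id",
) -> List[Dict[str, Any]]:
    """Keep the highest-scoring row per key across languages, then return the top_k rows."""
    best: Dict[Any, Dict[str, Any]] = {}
    for results in result_lists:
        for row in results:
            current = best.get(row[key])
            if current is None or row["score"] > current["score"]:
                best[row[key]] = row
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)[:top_k]
```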
- static get_conversation(session, conversation_id)
Return a conversation session by identifier.
- Parameters:
session (Session)
conversation_id (str)
- Return type:
ConversationSQL | None
- static create_conversation(session, conversation_id, title, user_id=None)
Create and flush a new conversation session.
- Parameters:
session (Session)
conversation_id (str)
title (str)
user_id (str | None)
- Return type:
- static list_conversations(session, user_id, limit=50)
List the most recent conversations for one user.
- Parameters:
session (Session)
user_id (str | None)
limit (int)
- Return type:
List[ConversationSQL]
- static delete_conversation(session, conversation_id)
Delete one conversation and report whether a row was removed.
- Parameters:
session (Session)
conversation_id (str)
- Return type:
bool
- static add_conversation_message(session, message_id, conversation_id, role, content, query_id=None, mode=None, metadata=None)
Append and flush one message inside a conversation.
- Parameters:
session (Session)
message_id (str)
conversation_id (str)
role (str)
content (str)
query_id (str | None)
mode (str | None)
metadata (Dict[str, Any] | None)
- Return type:
- static get_conversation_messages(session, conversation_id, limit=20)
Return the latest conversation messages in chronological order.
- Parameters:
session (Session)
conversation_id (str)
limit (int)
- Return type:
List[ConversationMessageSQL]
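Returning the latest messages in chronological order typically means selecting the newest limit rows (ORDER BY created_at DESC LIMIT :limit) and then reversing them. The sketch reproduces that logic in memory; Message is a hypothetical stand-in for ConversationMessageSQL, and created_at is an assumed column name.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Message:  # hypothetical stand-in for ConversationMessageSQL
    message_id: str
    role: str
    content: str
    created_at: datetime


def latest_in_chronological_order(messages: List[Message], limit: int = 20) -> List[Message]:
    """Take the newest `limit` messages, then return them oldest-first."""
    newest_first = sorted(messages, key=lambda m: m.created_at, reverse=True)[:limit]
    return list(reversed(newest_first))
```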
- static touch_conversation(session, conversation_id)
Refresh the updated_at timestamp for a conversation.
- Parameters:
session (Session)
conversation_id (str)
- Return type:
None
- search_bm25_chunks(query, top_k=None, language=None, filter_conditions=None)
Full-text search using PostgreSQL ts_rank_cd over chunk content. Returns chunks with relevance scores for fine-grained retrieval.
Searches all active languages (FR + EN) in parallel and merges the results by best score, unless a language override is provided via the language parameter or filter_conditions['language'].
- Parameters:
query (str)
top_k (int | None)
language (str | None)
filter_conditions (Dict[str, Any] | None)
- Return type:
List[Dict[str, Any]]
lalandre_db_qdrant
Source: packages/lalandre_db_qdrant/lalandre_db_qdrant/__init__.py
Qdrant repository package. Provides vector database operations for similarity search.
lalandre_db_qdrant.models
Source: packages/lalandre_db_qdrant/lalandre_db_qdrant/models.py
Qdrant vector models: data structures for vector database operations.
- class lalandre_db_qdrant.models.VectorPoint(*, id, vector, payload)
Bases: BaseModel
Represents a point in vector space with metadata.
Abstraction layer over Qdrant’s PointStruct to decouple application code from Qdrant implementation details.
- Parameters:
id (str | int)
vector (List[float])
payload (Dict[str, Any])
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class lalandre_db_qdrant.models.SearchResult(*, id, score, payload)
Bases: BaseModel
Result of a vector search.
- Parameters:
id (str)
score (float)
payload (Dict[str, Any])
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
lalandre_db_qdrant.repository
Source: packages/lalandre_db_qdrant/lalandre_db_qdrant/repository.py
Qdrant repository implementation for similarity search and vector retrieval.
- class lalandre_db_qdrant.repository.QdrantRepository(host=None, port=None, collection_name=None, vector_size=None, api_key=None, use_https=None)
Bases: BaseRepository
Repository for Qdrant vector database operations. Handles all low-level operations, including collection management and data ingestion.
Collection naming: {base}_{model}_{dimension}. Examples: chunk_embeddings_mistral_1024, chunk_embeddings_gte_large_1024, chunk_embeddings_bge_m3_1024, chunk_embeddings_e5_large_1024.
This allows multiple embedding models to coexist without confusion.
- Parameters:
host (str | None)
port (int | None)
collection_name (str | None)
vector_size (int | None)
api_key (str | None)
use_https (bool | None)
- static make_collection_name(base_name, model_name, dimension)
Generate a collection name of the form {base}_{model}_{dimension}.
- Parameters:
base_name (str) – Base collection name
model_name (str) – Model name (e.g., ‘thenlper/gte-large’)
dimension (int) – Vector dimension (e.g., 1024)
- Returns:
Collection name like ‘chunk_embeddings_gte_large_1024’
- Return type:
str
Example
>>> QdrantRepository.make_collection_name(
...     'chunk_embeddings',
...     'thenlper/gte-large',
...     1024,
... )
'chunk_embeddings_gte_large_1024'
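A minimal sketch of a naming rule that reproduces the documented gte-large example: keep the part of the model name after the last '/', lowercase it, and replace runs of non-alphanumeric characters with underscores. This normalization is an assumption; the chunk_embeddings_mistral_1024 example in the class description hints that the real method may shorten some model names further, which this sketch does not attempt.

```python
import re


def make_collection_name(base_name: str, model_name: str, dimension: int) -> str:
    # Keep only the part after the last "/" (drops prefixes such as "thenlper/").
    short = model_name.rsplit("/", 1)[-1]
    # Lowercase and collapse any run of non-alphanumeric characters into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", short.lower()).strip("_")
    return f"{base_name}_{slug}_{dimension}"
```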
- property vector_size: int
Vector size; auto-detected from the existing collection if not set.
- create_collection(recreate=False, distance=Distance.COSINE)
- Parameters:
recreate (bool) – If True, delete the existing collection and recreate it
distance (Distance) – Distance metric (COSINE, EUCLID, DOT)
- Returns:
True if the collection was created, False if it already exists
- Return type:
bool
- classmethod from_embedding_service_with_auto_collection(embedding_service, base_collection_name=None)
Create a repository whose collection name is derived automatically from the embedding model.
Format: {base}_{model}_{dimension}
- Parameters:
embedding_service (Any) – EmbeddingService instance
base_collection_name (str | None) – Base collection name
- Returns:
QdrantRepository with auto-generated collection name
- Return type:
Example
>>> embedding_service = EmbeddingService(provider="local", model_name="thenlper/gte-large")
>>> repo = QdrantRepository.from_embedding_service_with_auto_collection(embedding_service)
>>> # Collection: 'chunk_embeddings_gte_large_1024'
- create_payload_index(field_name, field_schema)
Create a payload index on a field for efficient filtering.
- Parameters:
field_name (str) – Name of the payload field to index
field_schema (PayloadSchemaType) – Schema type (KEYWORD, INTEGER, FLOAT, etc.)
- Returns:
True if index created successfully
- Return type:
bool
- setup_standard_indexes()
Create standard payload indexes used for legal document filtering.
- Return type:
None
- upsert_points(points, batch_size=None)
Insert or update points in the collection with retry on transient transport errors.
- Parameters:
points (Sequence[VectorPoint | PointStruct]) – List of VectorPoint or PointStruct objects to upsert
batch_size (int | None) – Optional batch size for large inserts
- Returns:
Number of points upserted
- Return type:
int
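upsert_points batches large inserts and retries on transient transport errors. The generic sketch below shows that pattern with exponential backoff. send_batch would wrap the real client call (for example qdrant_client's QdrantClient.upsert(collection_name=..., points=batch)); the helper name, retried exception types, attempt count, and delays are assumptions rather than the repository's settings.

```python
import time
from typing import Callable, List, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def upsert_in_batches(
    points: Sequence[T],
    send_batch: Callable[[List[T]], None],
    batch_size: int = 256,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send points in batches, retrying each failed batch with exponential backoff."""
    sent = 0
    for start in range(0, len(points), batch_size):
        batch = list(points[start:start + batch_size])
        for attempt in range(1, max_attempts + 1):
            try:
                send_batch(batch)
                break
            except retry_on:
                if attempt == max_attempts:
                    raise  # give up after the last attempt
                sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
        sent += len(batch)
    return sent
```

Injecting sleep keeps the function testable without real delays.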
- delete_points(point_ids)
Delete points by IDs from the current collection.
- Parameters:
point_ids (List[str | int])
- Return type:
int
- delete_points_by_filter(query_filter)
Delete points matching a payload filter from the current collection.
- Parameters:
query_filter (Dict[str, Any] | Filter)
- Return type:
int
- search(query_vector, limit=None, score_threshold=None, query_filter=None, hnsw_ef=None, exact=None)
Search for similar vectors.
- Parameters:
query_vector (List[float]) – The query embedding
limit (int | None) – Maximum number of results (None -> config.search.default_limit)
score_threshold (float | None) – Minimum similarity score (0-1)
query_filter (Dict[str, Any] | Filter | None) – Metadata filters (e.g., {“document_type”: “directive”})
hnsw_ef (int | None) – ANN breadth parameter for HNSW search (None -> config.search.hnsw_ef)
exact (bool | None) – Force exact vector search (None -> config.search.exact_search)
- Return type:
List[SearchResult]
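With exact search, Qdrant scores every candidate instead of traversing the HNSW graph, so hnsw_ef only matters on the approximate path. The brute-force sketch below illustrates what an exact cosine search with an equality payload filter and score_threshold computes. It is a plain-Python illustration, not the Qdrant client API, and the (id, vector, payload) tuple shape is an assumption.

```python
import math
from typing import Any, Dict, List, Optional, Sequence, Tuple

Point = Tuple[Any, Sequence[float], Dict[str, Any]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_cosine_search(query: Sequence[float], points: List[Point], limit: int = 10,
                        score_threshold: Optional[float] = None,
                        query_filter: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Score every point, apply equality filters and the threshold, return the top `limit`."""
    hits = []
    for point_id, vector, payload in points:
        # Equality-only filter, e.g. {"document_type": "directive"}.
        if query_filter and any(payload.get(k) != v for k, v in query_filter.items()):
            continue
        score = cosine(query, vector)
        if score_threshold is not None and score < score_threshold:
            continue
        hits.append({"id": point_id, "score": score, "payload": payload})
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:limit]
```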
- retrieve_vectors_by_ids(point_ids)
Retrieve stored embedding vectors by point IDs.
Returns a dict mapping point_id -> vector. Points not found in the collection are silently omitted.
- Parameters:
point_ids (List[int])
- Return type:
Dict[int, List[float]]