Core API¶
Note
This page is generated automatically from the repository’s maintained Python module inventory.
Shared runtime configuration, repositories, HTTP helpers, queue primitives, and utilities.
lalandre_core¶
Source: packages/lalandre_core/lalandre_core/__init__.py
Core shared utilities for Lalandre services.
lalandre_core.config¶
Source: packages/lalandre_core/lalandre_core/config.py
Configuration management for the project.
- class lalandre_core.config.EnvSettings(_case_sensitive=None, _nested_model_default_partial_update=None, _env_prefix=None, _env_prefix_target=None, _env_file=PosixPath('.'), _env_file_encoding=None, _env_ignore_empty=None, _env_nested_delimiter=None, _env_nested_max_split=None, _env_parse_none_str=None, _env_parse_enums=None, _cli_prog_name=None, _cli_parse_args=None, _cli_settings_source=None, _cli_parse_none_str=None, _cli_hide_none_type=None, _cli_avoid_json=None, _cli_enforce_required=None, _cli_use_class_docs_for_groups=None, _cli_exit_on_error=None, _cli_prefix=None, _cli_flag_prefix_char=None, _cli_implicit_flags=None, _cli_ignore_unknown_args=None, _cli_kebab_case=None, _cli_shortcuts=None, _secrets_dir=None, _build_sources=None, *, APP_CONFIG_FILE=None, APP_CONFIG_OVERRIDE_FILE=None, GATEWAY_ALLOWED_ORIGINS=None, DB_PASSWORD=None, QDRANT_API_KEY=None, NEO4J_PASSWORD=None, LLM_API_KEY=None, SEARCH_INTENT_PARSER_API_KEY=None, MISTRAL_API_KEY=None, MISTRAL_API_KEY_2=None, MISTRAL_API_KEY_3=None, MISTRAL_API_KEY_4=None, MISTRAL_API_KEY_5=None, MISTRAL_API_KEY_6=None, MISTRAL_API_KEY_7=None, MISTRAL_API_KEY_8=None, MISTRAL_API_KEY_9=None, MISTRAL_API_KEY_10=None)[source]¶
Bases: BaseSettings
Environment-backed settings loaded before the YAML application config.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
_case_sensitive (bool | None)
_nested_model_default_partial_update (bool | None)
_env_prefix (str | None)
_env_prefix_target (EnvPrefixTarget | None)
_env_file (DotenvType | None)
_env_file_encoding (str | None)
_env_ignore_empty (bool | None)
_env_nested_delimiter (str | None)
_env_nested_max_split (int | None)
_env_parse_none_str (str | None)
_env_parse_enums (bool | None)
_cli_prog_name (str | None)
_cli_parse_args (bool | list[str] | tuple[str, ...] | None)
_cli_settings_source (CliSettingsSource[Any] | None)
_cli_parse_none_str (str | None)
_cli_hide_none_type (bool | None)
_cli_avoid_json (bool | None)
_cli_enforce_required (bool | None)
_cli_use_class_docs_for_groups (bool | None)
_cli_exit_on_error (bool | None)
_cli_prefix (str | None)
_cli_flag_prefix_char (str | None)
_cli_implicit_flags (bool | Literal['dual', 'toggle'] | None)
_cli_ignore_unknown_args (bool | None)
_cli_kebab_case (bool | Literal['all', 'no_enums'] | None)
_cli_shortcuts (Mapping[str, str | list[str]] | None)
_secrets_dir (PathType | None)
_build_sources (tuple[tuple[PydanticBaseSettingsSource, ...], dict[str, Any]] | None)
APP_CONFIG_FILE (str | None)
APP_CONFIG_OVERRIDE_FILE (str | None)
GATEWAY_ALLOWED_ORIGINS (str | None)
DB_PASSWORD (str | None)
QDRANT_API_KEY (str | None)
NEO4J_PASSWORD (str | None)
LLM_API_KEY (str | None)
SEARCH_INTENT_PARSER_API_KEY (str | None)
MISTRAL_API_KEY (str | None)
MISTRAL_API_KEY_2 (str | None)
MISTRAL_API_KEY_3 (str | None)
MISTRAL_API_KEY_4 (str | None)
MISTRAL_API_KEY_5 (str | None)
MISTRAL_API_KEY_6 (str | None)
MISTRAL_API_KEY_7 (str | None)
MISTRAL_API_KEY_8 (str | None)
MISTRAL_API_KEY_9 (str | None)
MISTRAL_API_KEY_10 (str | None)
- model_config: ClassVar[SettingsConfigDict] = {'arbitrary_types_allowed': True, 'case_sensitive': False, 'cli_avoid_json': False, 'cli_enforce_required': False, 'cli_exit_on_error': True, 'cli_flag_prefix_char': '-', 'cli_hide_none_type': False, 'cli_ignore_unknown_args': False, 'cli_implicit_flags': False, 'cli_kebab_case': False, 'cli_parse_args': None, 'cli_parse_none_str': None, 'cli_prefix': '', 'cli_prog_name': None, 'cli_shortcuts': None, 'cli_use_class_docs_for_groups': False, 'enable_decoding': True, 'env_file': '.env', 'env_file_encoding': 'utf-8', 'env_ignore_empty': True, 'env_nested_delimiter': None, 'env_nested_max_split': None, 'env_parse_enums': None, 'env_parse_none_str': None, 'env_prefix': '', 'env_prefix_target': 'variable', 'extra': 'ignore', 'json_file': None, 'json_file_encoding': None, 'nested_model_default_partial_update': False, 'protected_namespaces': ('model_validate', 'model_dump', 'settings_customise_sources'), 'secrets_dir': None, 'toml_file': None, 'validate_default': True, 'yaml_config_section': None, 'yaml_file': None, 'yaml_file_encoding': None}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
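The numbered MISTRAL_API_KEY fields suggest that several keys can be supplied side by side. The sketch below shows one way such variables could be gathered into an ordered list. The helper `collect_mistral_keys` is hypothetical and not part of lalandre_core; how the package actually assembles its keys is not documented here.

```python
from typing import Mapping


def collect_mistral_keys(env: Mapping[str, str], max_keys: int = 10) -> list[str]:
    """Gather MISTRAL_API_KEY, MISTRAL_API_KEY_2, ... in order (hypothetical helper)."""
    names = ["MISTRAL_API_KEY"] + [f"MISTRAL_API_KEY_{i}" for i in range(2, max_keys + 1)]
    keys: list[str] = []
    for name in names:
        value = env.get(name, "").strip()
        # Skip unset or blank variables and avoid listing the same key twice.
        if value and value not in keys:
            keys.append(value)
    return keys
```

Passing `os.environ` as `env` would read the process environment; a plain dict works equally well in tests.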
- class lalandre_core.config.DatabaseConfig(*, host=None, port=None, database=None, user=None, password=None)[source]¶
Bases: BaseModel
PostgreSQL database configuration.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
host (str | None)
port (int | None)
database (str | None)
user (str | None)
password (str | None)
- property connection_string: str¶
Return a PostgreSQL connection string for the configured database.
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
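As an illustration of the connection_string property, the following sketch builds a PostgreSQL URL from the documented fields. `DatabaseConfigSketch` is a stand-in dataclass, not the real pydantic model, and the exact URL format (scheme, credential encoding) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import quote


@dataclass
class DatabaseConfigSketch:
    """Stand-in mirroring the documented DatabaseConfig fields (not the real class)."""

    host: Optional[str] = None
    port: Optional[int] = None
    database: Optional[str] = None
    user: Optional[str] = None
    password: Optional[str] = None

    @property
    def connection_string(self) -> str:
        # Percent-encode credentials so characters such as '@' stay unambiguous.
        user = quote(self.user or "", safe="")
        password = quote(self.password or "", safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.database}"
```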
- class lalandre_core.config.VectorConfig(*, host=None, port=None, api_key=None, collection_chunks=None, collection_acts=None, vector_size=1024, timeout=30, use_https=False)[source]¶
Bases: BaseModel
Qdrant configuration.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
host (str | None)
port (int | None)
api_key (str | None)
collection_chunks (str | None)
collection_acts (str | None)
vector_size (int)
timeout (int)
use_https (bool)
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class lalandre_core.config.GraphConfig(*, uri=None, user=None, password=None, database=None, max_connection_lifetime=3600, max_connection_pool_size=50, connection_timeout=30, strict_mode=False, acts_limit=10, relationships_limit=20, depth=2, cypher_timeout_seconds=30.0, cypher_max_rows=80, ranking_relation_weights=<factory>, ranking_default_relation_weight=0.3, community_relation_weights=<factory>, community_default_relation_weight=0.5, ranking_hop_decay=0.5, ranking_semantic_boost=0.3, ranking_relation_weight_factor=0.25, budget_semantic_share=0.6, budget_graph_share=0.3, budget_relation_share=0.1, map_reduce_threshold=24000, map_reduce_chunk_chars=5000, map_reduce_max_parallel=3, map_reduce_map_timeout=45.0, map_reduce_reduce_timeout=50.0, expansion_relation_types=<factory>, expansion_max_related_per_node=50, expansion_max_relationships_per_node=100, use_graph_in_rag=True, hybrid_enrichment_depth=2, use_communities_in_rag=True, community_central_act_title_chars=60, community_central_acts_display=3)[source]¶
Bases: BaseModel
Neo4j configuration.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
uri (str | None)
user (str | None)
password (str | None)
database (str | None)
max_connection_lifetime (int)
max_connection_pool_size (int)
connection_timeout (int)
strict_mode (bool)
acts_limit (int)
relationships_limit (int)
depth (int)
cypher_timeout_seconds (float)
cypher_max_rows (int)
ranking_relation_weights (Dict[str, float])
ranking_default_relation_weight (float)
community_relation_weights (Dict[str, float])
community_default_relation_weight (float)
ranking_hop_decay (float)
ranking_semantic_boost (float)
ranking_relation_weight_factor (float)
budget_semantic_share (float)
budget_graph_share (float)
budget_relation_share (float)
map_reduce_threshold (int)
map_reduce_chunk_chars (int)
map_reduce_max_parallel (int)
map_reduce_map_timeout (float)
map_reduce_reduce_timeout (float)
expansion_relation_types (list[str])
expansion_max_related_per_node (int)
expansion_max_relationships_per_node (int)
use_graph_in_rag (bool)
hybrid_enrichment_depth (int)
use_communities_in_rag (bool)
community_central_act_title_chars (int)
community_central_acts_display (int)
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class lalandre_core.config.TokenLimitsConfig(*, embedding_max_input_tokens=8192, chars_per_token=3.3, embedding_safety_ratio=0.9)[source]¶
Bases: BaseModel
Token limits for API models.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
embedding_max_input_tokens (int)
chars_per_token (float)
embedding_safety_ratio (float)
- property embedding_max_chars: int¶
Max characters for embedding based on token limit.
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
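A plausible reading of embedding_max_chars is the token limit converted to characters and scaled by the safety ratio. The sketch below assumes that formula; the real property may round or combine the fields differently.

```python
from dataclasses import dataclass


@dataclass
class TokenLimitsSketch:
    """Stand-in with the documented defaults (not the real TokenLimitsConfig)."""

    embedding_max_input_tokens: int = 8192
    chars_per_token: float = 3.3
    embedding_safety_ratio: float = 0.9

    @property
    def embedding_max_chars(self) -> int:
        # Assumed formula: tokens * chars-per-token * safety margin, truncated to int.
        return int(
            self.embedding_max_input_tokens
            * self.chars_per_token
            * self.embedding_safety_ratio
        )
```

With the documented defaults this assumed formula gives 24330 characters.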
- class lalandre_core.config.EmbeddingPresetConfig(*, preset_id, provider, model_name, device='cpu', label, enabled=True, indexing_enabled=True, queue_name=None, vector_size=1024)[source]¶
Bases: BaseModel
Named embedding runtime preset used for indexing and query-time routing.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
preset_id (str)
provider (str)
model_name (str)
device (str)
label (str)
enabled (bool)
indexing_enabled (bool)
queue_name (str | None)
vector_size (int)
- resolved_queue_name()[source]¶
Return the queue name used by the embedding worker for this preset.
- Return type:
str
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
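Since queue_name is optional, resolved_queue_name() presumably falls back to a derived name when none is configured. The sketch below illustrates that pattern; the `embed:<preset_id>` naming scheme is purely hypothetical and the real default is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PresetSketch:
    """Stand-in for EmbeddingPresetConfig with only the fields used here."""

    preset_id: str
    queue_name: Optional[str] = None

    def resolved_queue_name(self) -> str:
        explicit = (self.queue_name or "").strip()
        if explicit:
            return explicit
        # Hypothetical fallback scheme; the real derived name may differ.
        return f"embed:{self.preset_id}"
```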
- class lalandre_core.config.EmbeddingConfig(*, provider=None, model_name=None, batch_size=None, device=None, cache_dir=None, normalize_embeddings=True, enable_cache=True, cache_max_size=10000, redis_socket_timeout=2, cache_ttl_seconds=604800, retry_min_tokens=64, retry_fallback_threshold=96, retry_reduction_factor=0.7)[source]¶
Bases: BaseModel
Embedding model configuration.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
provider (str | None)
model_name (str | None)
batch_size (int | None)
device (str | None)
cache_dir (str | None)
normalize_embeddings (bool)
enable_cache (bool)
cache_max_size (int)
redis_socket_timeout (int)
cache_ttl_seconds (int)
retry_min_tokens (int)
retry_fallback_threshold (int)
retry_reduction_factor (float)
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class lalandre_core.config.ChunkingEmbeddingConfig(*, provider='mistral', model_name='mistral-embed', device='cpu')[source]¶
Bases: BaseModel
Embedding runtime used internally by the chunking algorithm.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
provider (str)
model_name (str)
device (str)
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class lalandre_core.config.SearchConfig(*, default_limit=10, default_mode='rag', default_search_mode='hybrid', default_granularity='all', default_embedding_preset='mistral', fulltext_language=None, bm25_normalization=32, fusion_method=None, lexical_weight=None, semantic_weight=None, candidate_multiplier=None, min_candidates=None, max_candidates=200, semantic_per_collection_oversampling=1.25, hnsw_ef=None, exact_search=False, query_expansion_enabled=True, query_expansion_max_variants=3, query_expansion_min_query_chars=24, intent_parser_enabled=False, intent_parser_provider=None, intent_parser_model=None, intent_parser_base_url=None, intent_parser_api_key=None, intent_parser_timeout_seconds=20.0, intent_parser_temperature=0.0, intent_parser_max_output_tokens=180, rerank_enabled=True, rerank_model='BAAI/bge-reranker-v2-m3', rerank_device='cpu', rerank_batch_size=4, rerank_max_candidates=5, rerank_max_chars=256, rerank_cache_dir=None, rerank_service_url=None, rerank_service_timeout_seconds=15.0, rerank_fallback_to_skip=True, rerank_circuit_failure_threshold=2, rerank_circuit_cooldown_seconds=30.0, score_threshold_default=0.15, relevance_gate_threshold=0.35, max_lexical_query_chars=200, fts_max_lexemes=12, dynamic_fusion_enabled=True, lexical_boost_factor=1.8, lexical_boost_max=0.75, result_cache_ttl_seconds=300, query_router_broad_query_min_chars=220, query_router_global_overview_min_top_k=10, query_router_citation_precision_min_top_k=7, query_router_relationship_focus_min_top_k=8, query_router_contextual_default_min_top_k=6, fusion_rrf_k=60, query_expansion_max_variants_cap=8, query_expansion_abbreviation_weight=0.96, query_expansion_keyword_focus_weight=0.92, query_expansion_reference_focus_weight=0.9, query_expansion_bilingual_weight=0.88, adaptive_score_drop_threshold=0.15, complementary_max_queries=2, complementary_top_k=5, compression_threshold_ratio=1.3, mmr_enabled=True, mmr_max_per_act=2, crag_enabled=False, crag_max_iterations=1, 
crag_skip_score_threshold=0.82, max_parallel_workers=4, query_parser_max_top_k=40, intent_parser_min_output_tokens=80, summary_min_chars=50)[source]¶
Bases: BaseModel
Search configuration.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
default_limit (int)
default_mode (str)
default_search_mode (str)
default_granularity (str)
default_embedding_preset (str)
fulltext_language (str | None)
bm25_normalization (int)
fusion_method (str | None)
lexical_weight (float | None)
semantic_weight (float | None)
candidate_multiplier (float | None)
min_candidates (int | None)
max_candidates (int)
semantic_per_collection_oversampling (float)
hnsw_ef (int | None)
exact_search (bool)
query_expansion_enabled (bool)
query_expansion_max_variants (int)
query_expansion_min_query_chars (int)
intent_parser_enabled (bool)
intent_parser_provider (str | None)
intent_parser_model (str | None)
intent_parser_base_url (str | None)
intent_parser_api_key (str | None)
intent_parser_timeout_seconds (float)
intent_parser_temperature (float)
intent_parser_max_output_tokens (int)
rerank_enabled (bool)
rerank_model (str)
rerank_device (str)
rerank_batch_size (int)
rerank_max_candidates (int)
rerank_max_chars (int)
rerank_cache_dir (str | None)
rerank_service_url (str | None)
rerank_service_timeout_seconds (float)
rerank_fallback_to_skip (bool)
rerank_circuit_failure_threshold (int)
rerank_circuit_cooldown_seconds (float)
score_threshold_default (float | None)
relevance_gate_threshold (float | None)
max_lexical_query_chars (int)
fts_max_lexemes (int)
dynamic_fusion_enabled (bool)
lexical_boost_factor (float)
lexical_boost_max (float)
result_cache_ttl_seconds (int)
query_router_broad_query_min_chars (int)
query_router_global_overview_min_top_k (int)
query_router_citation_precision_min_top_k (int)
query_router_relationship_focus_min_top_k (int)
query_router_contextual_default_min_top_k (int)
fusion_rrf_k (int)
query_expansion_max_variants_cap (int)
query_expansion_abbreviation_weight (float)
query_expansion_keyword_focus_weight (float)
query_expansion_reference_focus_weight (float)
query_expansion_bilingual_weight (float)
adaptive_score_drop_threshold (float | None)
complementary_max_queries (int)
complementary_top_k (int)
compression_threshold_ratio (float)
mmr_enabled (bool)
mmr_max_per_act (int)
crag_enabled (bool)
crag_max_iterations (int)
crag_skip_score_threshold (float)
max_parallel_workers (int)
query_parser_max_top_k (int)
intent_parser_min_output_tokens (int)
summary_min_chars (int)
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
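The fusion_rrf_k parameter points to reciprocal rank fusion (RRF), where each document scores the sum of 1 / (k + rank) over the rankings it appears in. The sketch below is a generic RRF implementation, not the package's own code; whether and how lalandre_core applies RRF (fusion_method defaults to None) is an assumption.

```python
from collections import defaultdict
from typing import Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists with reciprocal rank fusion (ranks are 1-based)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, fusing a lexical ranking `["a", "b", "c"]` with a semantic ranking `["b", "c", "a"]` places `"b"` first, because it ranks near the top of both lists.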
- class lalandre_core.config.GenerationConfig(*, provider='mistral', model_name=None, temperature=None, max_tokens=8000, max_context_chars=20000, summarize_max_context_chars=60000, base_url=None, mistral_base_url='https://api.mistral.ai/v1', context_window=32000, api_key=None, timeout_seconds=45.0, lightweight_model_name=None, key_pool_max=10)[source]¶
Bases: BaseModel
LLM generation configuration.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
provider (str)
model_name (str | None)
temperature (float | None)
max_tokens (int)
max_context_chars (int)
summarize_max_context_chars (int)
base_url (str | None)
mistral_base_url (str)
context_window (int)
api_key (str | None)
timeout_seconds (float)
lightweight_model_name (str | None)
key_pool_max (int)
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class lalandre_core.config.ChunkingConfig(*, min_chunk_size, max_chunk_size, chunk_overlap=0, subdivision_max_chars=30000, extraction_max_chunk_chars=3200, breakpoint_percentile=90.0, breakpoint_max_threshold=1.0, sentence_window_size=1, embedding_batch_size=32, article_level_chunking=True, embedding=<factory>)[source]¶
Bases: BaseModel
Chunking configuration.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
min_chunk_size (int)
max_chunk_size (int)
chunk_overlap (int)
subdivision_max_chars (int)
extraction_max_chunk_chars (int)
breakpoint_percentile (float)
breakpoint_max_threshold (float)
sentence_window_size (int)
embedding_batch_size (int)
article_level_chunking (bool)
embedding (ChunkingEmbeddingConfig)
- resolve_max_chunk_size(token_limits)[source]¶
Cap max_chunk_size so it never exceeds the global embedding token budget.
Each embedding model handles its own per-provider limit at embed time (split + weighted-average for oversized chunks). This guard only prevents chunks from exceeding the largest model’s hard ceiling.
- Parameters:
token_limits (TokenLimitsConfig)
- Return type:
int
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
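The capping rule described for resolve_max_chunk_size can be sketched as a simple minimum. This stand-alone function assumes both values are expressed in characters and that the ceiling comes from TokenLimitsConfig.embedding_max_chars; the real method may apply further rules.

```python
def resolve_max_chunk_size_sketch(max_chunk_size: int, embedding_max_chars: int) -> int:
    """Cap the configured chunk size at the embedding character budget (assumed rule)."""
    return min(max_chunk_size, embedding_max_chars)
```

A configured size below the budget passes through unchanged; a larger one is reduced to the budget.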
- class lalandre_core.config.ContextBudgetConfig(*, rag_max_sources=10, rag_min_chars_per_source=200, rag_relation_lines=8, global_reports_share=0.45, global_sources_share=0.55, global_max_reports=4, global_min_cluster_size=2, global_max_evidence_per_report=3, global_max_source_docs=7, standard_relation_budget_fraction=0.15, global_graph_budget_fraction=0.1, community_top_relation_types=5, community_central_acts=3, content_preview_chars=200, snippet_preview_chars=300, fallback_preview_chars=180, compression_min_chars=3000, compression_min_budget=500)[source]¶
Bases: BaseModel
Token/character budgets used to compose RAG context blocks.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
rag_max_sources (int)
rag_min_chars_per_source (int)
rag_relation_lines (int)
global_reports_share (float)
global_sources_share (float)
global_max_reports (int)
global_min_cluster_size (int)
global_max_evidence_per_report (int)
global_max_source_docs (int)
standard_relation_budget_fraction (float)
global_graph_budget_fraction (float)
community_top_relation_types (int)
community_central_acts (int)
content_preview_chars (int)
snippet_preview_chars (int)
fallback_preview_chars (int)
compression_min_chars (int)
compression_min_budget (int)
Return normalized (reports_share, sources_share) ratios for global mode.
- Return type:
tuple[float, float]
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
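Normalizing global_reports_share and global_sources_share means rescaling them so they sum to 1. The function below is a sketch of that idea; its name and the even-split fallback for a zero total are assumptions, not documented behavior.

```python
def normalized_global_shares(reports_share: float, sources_share: float) -> tuple[float, float]:
    """Rescale two non-negative shares so they sum to 1 (hypothetical helper)."""
    reports = max(reports_share, 0.0)
    sources = max(sources_share, 0.0)
    total = reports + sources
    if total <= 0:
        # Assumed fallback when both shares are zero: split evenly.
        return 0.5, 0.5
    return reports / total, sources / total
```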
- class lalandre_core.config.GatewayConfig(*, redis_host=None, redis_port=None, rag_service_url=None, embedding_service_url=None, rerank_service_url=None, allowed_origins=None, auto_bootstrap=False, job_ttl_seconds=None, bootstrap_lock_ttl_seconds=None, healthcheck_timeout_seconds=5.0, rag_proxy_timeout_seconds=300.0, rate_limit_query='20/minute', rate_limit_stream='15/minute', rate_limit_search='30/minute', rate_limit_jobs='10/minute', job_chunk_min_content_length=None, job_embed_batch_size=None, job_extract_min_confidence=None, job_extract_skip_existing_default=None)[source]¶
Bases: BaseModel
API Gateway configuration.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
redis_host (str | None)
redis_port (int | None)
rag_service_url (str | None)
embedding_service_url (str | None)
rerank_service_url (str | None)
allowed_origins (list[str] | None)
auto_bootstrap (bool)
job_ttl_seconds (int | None)
bootstrap_lock_ttl_seconds (int | None)
healthcheck_timeout_seconds (float)
rag_proxy_timeout_seconds (float)
rate_limit_query (str)
rate_limit_stream (str)
rate_limit_search (str)
rate_limit_jobs (str)
job_chunk_min_content_length (int | None)
job_embed_batch_size (int | None)
job_extract_min_confidence (float | None)
job_extract_skip_existing_default (bool | None)
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class lalandre_core.config.ExtractionConfidenceConfig(*, base=0.75, non_cites_bonus=0.03, explicit_resolution_bonus=0.1, alias_resolution_bonus=0.05, normalize_fallback_score=0.0, fuzzy_min_factor=0.75, evidence_min_chars=20, evidence_bonus=0.02, max_confidence=0.95)[source]¶
Bases: BaseModel
Tuning knobs for post-extraction confidence scoring.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
base (float)
non_cites_bonus (float)
explicit_resolution_bonus (float)
alias_resolution_bonus (float)
normalize_fallback_score (float)
fuzzy_min_factor (float)
evidence_min_chars (int)
evidence_bonus (float)
max_confidence (float)
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
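The field names suggest an additive score: a base value plus bonuses, capped at max_confidence. The sketch below shows one such combination using the documented defaults. The function, its arguments (`relation_type`, `resolution`, `evidence`) and the way bonuses are applied are all assumptions; the fuzzy and fallback fields are left out because their semantics are not described here.

```python
def score_confidence_sketch(
    *,
    relation_type: str,
    resolution: str,
    evidence: str,
    base: float = 0.75,
    non_cites_bonus: float = 0.03,
    explicit_resolution_bonus: float = 0.1,
    alias_resolution_bonus: float = 0.05,
    evidence_min_chars: int = 20,
    evidence_bonus: float = 0.02,
    max_confidence: float = 0.95,
) -> float:
    """Add bonuses to a base score and cap the result (assumed combination rule)."""
    score = base
    if relation_type != "cites":
        score += non_cites_bonus
    if resolution == "explicit":
        score += explicit_resolution_bonus
    elif resolution == "alias":
        score += alias_resolution_bonus
    if len(evidence.strip()) >= evidence_min_chars:
        score += evidence_bonus
    return min(score, max_confidence)
```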
- class lalandre_core.config.ExtractionConfig(*, llm_provider='mistral', llm_model='mistral-small-latest', llm_base_url='https://api.mistral.ai/v1', llm_timeout_seconds=120.0, llm_temperature=0.0, llm_max_output_tokens=1024, llm_min_output_tokens=80, llm_system_prompt='You are an EU/FR legal relation extractor. Return valid JSON only.', llm_min_evidence_chars=8, llm_min_rationale_chars=24, llm_max_parallel_chunks=2, llm_chunk_cache_size=256, validation_enabled=True, min_evidence_chars=28, min_description_chars=240, entity_linker_fuzzy_threshold=0.89, entity_linker_fuzzy_min_gap=0.03, entity_linker_fuzzy_limit=2, entity_linker_min_alias_chars=6, confidence=<factory>, max_evidence_chars=420)[source]¶
Bases: BaseModel
Extraction LLM behavior configuration.
Two-stage filtering:
- llm_min_* fields apply during raw LLM output parsing (first pass).
- min_evidence_chars applies during post-extraction validation (second pass).
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
llm_provider (str)
llm_model (str)
llm_base_url (str)
llm_timeout_seconds (float)
llm_temperature (float)
llm_max_output_tokens (int)
llm_min_output_tokens (int)
llm_system_prompt (str)
llm_min_evidence_chars (int)
llm_min_rationale_chars (int)
llm_max_parallel_chunks (int)
llm_chunk_cache_size (int)
validation_enabled (bool)
min_evidence_chars (int)
min_description_chars (int)
entity_linker_fuzzy_threshold (float)
entity_linker_fuzzy_min_gap (float)
entity_linker_fuzzy_limit (int)
entity_linker_min_alias_chars (int)
confidence (ExtractionConfidenceConfig)
max_evidence_chars (int)
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class lalandre_core.config.WorkersConfig(*, brpop_timeout_seconds=1, chunk_db_commit_batch_size=10, embed_worker_max_batch_size=32, embed_qdrant_upsert_batch_size=1000, auto_embed_reconcile=True, auto_embed_reconcile_interval=300, auto_embed_reconcile_ttl=600, auto_chunk_reconcile=True, auto_chunk_reconcile_interval=300, auto_chunk_reconcile_ttl=600, auto_extract_reconcile=True, auto_extract_reconcile_interval=600, auto_extract_reconcile_ttl=600, extract_metrics_port=9107, embed_metrics_port=9108, chunk_metrics_port=9109, extract_stale_timeout_minutes=60, community_resolution=1.0, community_min_size=2)[source]¶
Bases: BaseModel
Worker runtime tuning.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
brpop_timeout_seconds (int)
chunk_db_commit_batch_size (int)
embed_worker_max_batch_size (int)
embed_qdrant_upsert_batch_size (int)
auto_embed_reconcile (bool)
auto_embed_reconcile_interval (int)
auto_embed_reconcile_ttl (int)
auto_chunk_reconcile (bool)
auto_chunk_reconcile_interval (int)
auto_chunk_reconcile_ttl (int)
auto_extract_reconcile (bool)
auto_extract_reconcile_interval (int)
auto_extract_reconcile_ttl (int)
extract_metrics_port (int)
embed_metrics_port (int)
chunk_metrics_port (int)
extract_stale_timeout_minutes (int)
community_resolution (float)
community_min_size (int)
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class lalandre_core.config.LalandreConfig(*, database=<factory>, vector=<factory>, graph=<factory>, token_limits=<factory>, embedding=<factory>, embedding_presets=<factory>, search=<factory>, generation=<factory>, chunking, context_budget=<factory>, gateway=<factory>, extraction=<factory>, workers=<factory>, models_cache_dir=None)[source]¶
Bases: BaseModel
Main configuration class.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
database (DatabaseConfig)
vector (VectorConfig)
graph (GraphConfig)
token_limits (TokenLimitsConfig)
embedding (EmbeddingConfig)
embedding_presets (list[EmbeddingPresetConfig])
search (SearchConfig)
generation (GenerationConfig)
chunking (ChunkingConfig)
context_budget (ContextBudgetConfig)
gateway (GatewayConfig)
extraction (ExtractionConfig)
workers (WorkersConfig)
models_cache_dir (str | None)
- enabled_embedding_presets()[source]¶
Return the embedding presets that are enabled for runtime use.
- Return type:
list[EmbeddingPresetConfig]
- indexing_enabled_embedding_presets()[source]¶
Return the embedding presets that are enabled for indexing workflows.
- Return type:
list[EmbeddingPresetConfig]
- get_embedding_preset(preset_id)[source]¶
Return one embedding preset by ID, or None when it is unknown.
- Parameters:
preset_id (str | None)
- Return type:
EmbeddingPresetConfig | None
- get_default_embedding_preset()[source]¶
Return the default enabled embedding preset for query-time operations.
- Return type:
- model_config: ClassVar[ConfigDict] = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
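The preset accessors on LalandreConfig amount to filtering and looking up entries in embedding_presets. The sketch below reproduces that logic on a stand-in dataclass. Whether indexing-enabled presets must also be enabled is an assumption made here, and the helper names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Preset:
    """Stand-in for EmbeddingPresetConfig with only the flags used here."""

    preset_id: str
    enabled: bool = True
    indexing_enabled: bool = True


def enabled_presets(presets: Sequence[Preset]) -> list[Preset]:
    return [p for p in presets if p.enabled]


def indexing_enabled_presets(presets: Sequence[Preset]) -> list[Preset]:
    # Assumption: a preset must be enabled before it can be used for indexing.
    return [p for p in presets if p.enabled and p.indexing_enabled]


def find_preset(presets: Sequence[Preset], preset_id: Optional[str]) -> Optional[Preset]:
    # Returns None for an unknown or missing ID, matching the documented contract.
    for preset in presets:
        if preset.preset_id == preset_id:
            return preset
    return None
```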
- lalandre_core.config.get_env_settings()[source]¶
Get or create the environment settings instance.
- Return type:
- lalandre_core.config.get_config()[source]¶
Get or create global configuration instance.
- Return type:
- lalandre_core.config.reset_config()[source]¶
Invalidate the config singleton so the next call to get_config() reloads from disk.
- Return type:
None
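The get_config() / reset_config() pair describes a cached singleton that can be invalidated. The sketch below shows that pattern in isolation. `ConfigCache` and its loader argument are hypothetical; the real module keeps its own module-level state and loads from YAML.

```python
from typing import Any, Callable, Optional


class ConfigCache:
    """Lazy, resettable holder for a single configuration object (illustrative)."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._value: Optional[Any] = None

    def get(self) -> Any:
        # Load on first access, then keep returning the same instance.
        if self._value is None:
            self._value = self._loader()
        return self._value

    def reset(self) -> None:
        # Drop the cached instance so the next get() reloads it.
        self._value = None
```

Calling `reset()` is mainly useful in tests or after the configuration files change on disk.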
- lalandre_core.config.get_postgres_connection_string()[source]¶
Get PostgreSQL connection string.
- Return type:
str
lalandre_core.embedding_presets¶
Source: packages/lalandre_core/lalandre_core/embedding_presets.py
Helpers for embedding preset resolution across services.
- lalandre_core.embedding_presets.list_embedding_presets(*, enabled_only=False, indexing_only=False)[source]¶
Return configured embedding presets.
- Parameters:
enabled_only (bool)
indexing_only (bool)
- Return type:
list[EmbeddingPresetConfig]
- lalandre_core.embedding_presets.get_embedding_preset(preset_id, *, enabled_only=False, indexing_only=False)[source]¶
Resolve a preset by ID.
- Parameters:
preset_id (str | None)
enabled_only (bool)
indexing_only (bool)
- Return type:
EmbeddingPresetConfig | None
- lalandre_core.embedding_presets.get_default_embedding_preset()[source]¶
Return the configured default embedding preset.
- Return type:
- lalandre_core.embedding_presets.get_default_embedding_preset_id()[source]¶
Return the ID of the default embedding preset.
- Return type:
str
- lalandre_core.embedding_presets.resolve_embedding_preset_or_default(preset_id)[source]¶
Resolve a preset, falling back to the configured default when missing/invalid.
- Parameters:
preset_id (str | None)
- Return type:
- lalandre_core.embedding_presets.resolve_embed_queue_name(preset_id=None)[source]¶
Return the queue name for a preset, defaulting to the configured default preset.
- Parameters:
preset_id (str | None)
- Return type:
str
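The fallback behavior of resolve_embedding_preset_or_default can be sketched as follows. The function operates on a plain mapping of preset ID to enabled flag rather than on the real configuration, and treating disabled presets as invalid is an assumption.

```python
from typing import Mapping, Optional


def resolve_preset_id_or_default(
    preset_id: Optional[str],
    presets: Mapping[str, bool],
    default_id: str,
) -> str:
    """Return preset_id when it names an enabled preset, else default_id."""
    candidate = (preset_id or "").strip()
    if candidate and presets.get(candidate, False):
        return candidate
    # Missing, unknown, or disabled presets fall back to the default.
    return default_id
```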
lalandre_core.http¶
Source: packages/lalandre_core/lalandre_core/http/__init__.py
HTTP helpers shared across Lalandre services.
lalandre_core.http.llm_client¶
Source: packages/lalandre_core/lalandre_core/http/llm_client.py
Shared HTTP client for compact JSON-oriented LLM calls.
- lalandre_core.http.llm_client.coerce_json_object(value)[source]¶
Safely coerce a runtime value to a JSON object with string keys.
- Parameters:
value (Any)
- Return type:
Dict[str, Any] | None
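One plausible reading of "safely coerce" is sketched below: accept only dicts whose keys are all strings and return None otherwise. This is an assumption about the behaviour; the real function may instead stringify non-string keys.

```python
from typing import Any, Dict, Optional


def coerce_json_object_sketch(value: Any) -> Optional[Dict[str, Any]]:
    """Return value as a str-keyed dict, or None if it is not one."""
    if not isinstance(value, dict):
        return None
    if not all(isinstance(key, str) for key in value):
        return None
    # Shallow copy so the caller cannot mutate the original mapping.
    return dict(value)
```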
- class lalandre_core.http.llm_client.JSONHTTPLLMClient(provider, model, base_url, timeout_seconds, max_output_tokens, temperature, api_key=None, system_prompt='Return valid JSON only.', error_preview_chars=240)[source]¶
Bases: object
Thin HTTP client for OpenAI-compatible JSON generation.
- Parameters:
provider (str)
model (str)
base_url (str)
timeout_seconds (float)
max_output_tokens (int)
temperature (float)
api_key (str | None)
system_prompt (str)
error_preview_chars (int)
Bases: object
Dispatch JSON HTTP LLM calls through a shared API key pool.
- Parameters:
key_pool (APIKeyPool)
clients_by_key (Mapping[str, JSONHTTPLLMClient])
Build one JSON HTTP client per API key and wrap them in a shared pool.
- Parameters:
key_pool (APIKeyPool)
provider (str)
model (str)
base_url (str)
timeout_seconds (float)
max_output_tokens (int)
temperature (float)
system_prompt (str)
error_preview_chars (int)
- Return type:
Generate one JSON response using the next key selected by the pool.
- Parameters:
prompt (str)
- Return type:
str
lalandre_core.http.middleware¶
Source: packages/lalandre_core/lalandre_core/http/middleware.py
Reusable HTTP instrumentation middleware factory for FastAPI services.
lalandre_core.linking¶
Source: packages/lalandre_core/lalandre_core/linking/__init__.py
Entity linking: resolve legal references to canonical CELEX identifiers.
lalandre_core.linking.entity_linker¶
Source: packages/lalandre_core/lalandre_core/linking/entity_linker.py
Local entity linking utilities for legal acts (EU/France).
- class lalandre_core.linking.entity_linker.ActAliasEntry(celex, title, aliases=(), act_id=None, eli=None, acronyms=())[source]¶
Bases: object
Canonical act entry and its known alias forms.
- Parameters:
celex (str)
title (str)
aliases (tuple[str, ...])
act_id (int | None)
eli (str | None)
acronyms (tuple[str, ...])
- eli: str | None = None¶
Optional European Legislation Identifier URI for interop with ELI-aware systems.
- acronyms: tuple[str, ...] = ()¶
Short acronyms (DORA, CRR, MAR, …) that bypass min_alias_chars.
- class lalandre_core.linking.entity_linker.LinkResolution(celex, score, method, matched_text, act_id=None, subdivision_id=None, article_number=None, eli=None)[source]¶
Bases: object
Resolved reference returned by the entity linker.
- Parameters:
celex (str)
score (float)
method (str)
matched_text (str)
act_id (int | None)
subdivision_id (int | None)
article_number (str | None)
eli (str | None)
- eli: str | None = None¶
Canonical ELI URI of the resolved act (propagated from the matching entry).
- class lalandre_core.linking.entity_linker.LegalEntityLinker(entries, *, fuzzy_threshold, fuzzy_min_gap, fuzzy_limit=2, min_alias_chars, article_lookup=None)[source]¶
Bases: object
Resolve legal references to canonical CELEX-like identifiers.
- Parameters:
entries (Iterable[ActAliasEntry])
fuzzy_threshold (float)
fuzzy_min_gap (float)
fuzzy_limit (int)
min_alias_chars (int)
article_lookup (Callable[[int, str], int | None] | None)
- property alias_count: int¶
Return the number of normalized aliases indexed by the linker.
- classmethod derive_acronyms(title)[source]¶
Extract short acronyms from a title’s parenthesised content.
Returns a tuple of strings like ("DORA",) for a title that contains Digital Operational Resilience Regulation (DORA). The caller passes this to ActAliasEntry.acronyms so the linker can match them without applying min_alias_chars.
- Parameters:
title (str)
- Return type:
tuple[str, …]
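Acronym extraction from parenthesised content can be sketched with a regular expression. The pattern below is an assumption (2 to 6 uppercase letters or digits, starting with a letter, alone inside the parentheses); the real rule in derive_acronyms may differ.

```python
import re
from typing import List, Tuple

# Assumed shape of an acronym: "(DORA)", "(CRR)", "(MAR)".
_ACRONYM_RE = re.compile(r"\(([A-Z][A-Z0-9]{1,5})\)")


def derive_acronyms_sketch(title: str) -> Tuple[str, ...]:
    """Collect distinct parenthesised acronyms in order of appearance."""
    seen: List[str] = []
    for match in _ACRONYM_RE.findall(title):
        if match not in seen:
            seen.append(match)
    return tuple(seen)
```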
- classmethod derive_aliases(title, *, eli=None, official_journal_reference=None, form_number=None)[source]¶
Derive stable alias candidates from act metadata fields.
- Parameters:
title (str)
eli (str | None)
official_journal_reference (str | None)
form_number (str | None)
- Return type:
tuple[str, …]
- resolve(reference)[source]¶
Resolve a free-text legal reference to a canonical CELEX identifier.
- Parameters:
reference (str)
- Return type:
LinkResolution | None
- resolve_with_article(reference, article_number=None, *, article_lookup=None)[source]¶
Resolve a reference and optionally enrich with a subdivision_id for an article.
If article_number is provided and the act is known, try to resolve the corresponding subdivision id via article_lookup (or the linker’s default one). Falls back to returning the base resolution if the article can’t be resolved.
- Parameters:
reference (str)
article_number (str | None)
article_lookup (Callable[[int, str], int | None] | None)
- Return type:
LinkResolution | None
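The article_lookup parameter is typed Callable[[int, str], int | None]. A minimal callable of that shape is sketched below, assuming the two arguments are an act id and an article number and the result is a subdivision id. The in-memory index and its values are hypothetical; a real lookup would more likely query the database.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical index: (act_id, article_number) -> subdivision_id.
_ARTICLE_INDEX: Dict[Tuple[int, str], int] = {
    (1, "5"): 101,
    (1, "6"): 102,
}


def lookup_article(act_id: int, article_number: str) -> Optional[int]:
    """Return the subdivision id for an article, or None when unknown."""
    return _ARTICLE_INDEX.get((act_id, article_number.strip()))


# The function matches the documented callable type.
article_lookup: Callable[[int, str], Optional[int]] = lookup_article
```

Returning None for an unknown article fits the documented fallback, in which the base resolution is returned without a subdivision id.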
lalandre_core.linking.heuristics¶
Source: packages/lalandre_core/lalandre_core/linking/heuristics.py
Shared heuristics and regex patterns for legal entity linking. Centralized here to keep extraction, RAG, and validation rules in sync. Values are package-local (not config-driven) to avoid leaking concerns.
lalandre_core.linking.ner_client¶
Source: packages/lalandre_core/lalandre_core/linking/ner_client.py
Tiny HTTP client for the dedicated NER service.
The NER service exposes a single POST /detect endpoint that runs GLiNER
in zero-shot mode. The client is deliberately minimal: it does one synchronous
request per call, surfaces a typed result, and never raises on network errors —
it returns an empty span list and lets the caller log/skip.
Designed to be reused outside RAG (extraction pipeline, evaluation scripts).
- class lalandre_core.linking.ner_client.NerClient(base_url, *, timeout_seconds=5.0, default_threshold=0.5, default_entity_types=None)[source]¶
Bases: object
Thin HTTP wrapper around ner-service /detect.
Failure modes (network error, non-2xx, malformed JSON) are swallowed and logged at WARNING level; the call returns [] in that case so the caller can degrade gracefully.
- Parameters:
base_url (str)
timeout_seconds (float)
default_threshold (float)
default_entity_types (Optional[Iterable[str]])
- property base_url: str¶
Return the base URL of the configured NER service.
lalandre_core.llm¶
Source: packages/lalandre_core/lalandre_core/llm/__init__.py
Shared LLM utilities for provider normalization and ChatModel construction.
lalandre_core.llm.langchain¶
Source: packages/lalandre_core/lalandre_core/llm/langchain.py
Unified LangChain ChatModel factory.
- class lalandre_core.llm.langchain.PooledChatModel(models)[source]¶
Bases: Runnable
Round-robin wrapper over multiple ChatModel instances.
- Parameters:
models (List[Any])
- invoke(input, config=None, **kwargs)[source]¶
Invoke the next model in the pool with one request.
- Parameters:
input (Any)
config (Any)
kwargs (Any)
- Return type:
Any
Bases: Runnable
Dispatch each LangChain call through a shared API key pool.
- Parameters:
key_pool (APIKeyPool)
models_by_key (Mapping[str, Any])
Invoke the model selected by the shared API key pool.
- Parameters:
input (Any)
config (Any)
kwargs (Any)
- Return type:
Any
Stream a response from the model selected by the shared API key pool.
- Parameters:
input (Any)
config (Any)
kwargs (Any)
- Return type:
Iterator[Any]
Execute a batch call through the model selected by the shared API key pool.
- Parameters:
inputs (List[Any])
config (Any)
kwargs (Any)
- Return type:
List[Any]
- lalandre_core.llm.langchain.build_chat_model(*, provider, model, api_key, base_url='', temperature=0.0, max_tokens=None, timeout_seconds=None)[source]¶
Build a LangChain ChatModel (ChatMistralAI or ChatOpenAI).
Returns the raw ChatModel instance — callers can wrap it (e.g. LangchainLLMWrapper, StrOutputParser) as needed.
- Parameters:
provider (str)
model (str)
api_key (str)
base_url (str)
temperature (float)
max_tokens (int | None)
timeout_seconds (float | None)
- Return type:
Any
lalandre_core.llm.providers¶
Source: packages/lalandre_core/lalandre_core/llm/providers.py
Shared LLM provider utilities: normalization, URL resolution, API key resolution.
- lalandre_core.llm.providers.normalize_provider(provider)[source]¶
Normalize provider name: strip, lowercase, openai → openai_compatible.
- Parameters:
provider (str)
- Return type:
str
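The rule stated above (strip, lowercase, map openai to openai_compatible) is small enough to sketch directly. The function name is illustrative, and the real helper may handle further aliases not listed here.

```python
def normalize_provider_sketch(provider: str) -> str:
    """Strip and lowercase a provider name, mapping 'openai' to its alias."""
    normalized = provider.strip().lower()
    if normalized == "openai":
        return "openai_compatible"
    return normalized
```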
lalandre_core.llm.structured¶
Source: packages/lalandre_core/lalandre_core/llm/structured.py
Shared helpers for running PydanticAI structured-output agents.
Extracted from lalandre_rag.agentic.tools so that any package
(extraction, RAG, summaries) can reuse the same FunctionModel bridge
without depending on the RAG layer.
- lalandre_core.llm.structured.json_payload_from_text(raw)[source]¶
Extract a JSON object from potentially noisy LLM text.
- Parameters:
raw (str)
- Return type:
dict[str, Any] | None
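Extracting a JSON object from noisy model output can be done in several ways. The sketch below scans for each opening brace and tries the standard library's JSONDecoder.raw_decode from that position. It illustrates the general technique and is not necessarily how json_payload_from_text is implemented.

```python
import json
from typing import Any, Dict, Optional


def json_payload_from_text_sketch(raw: str) -> Optional[Dict[str, Any]]:
    """Return the first decodable JSON object embedded in raw, else None."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            value, _end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # Not a valid object at this brace; try the next one.
        start = raw.find("{", start + 1)
    return None
```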
- lalandre_core.llm.structured.build_structured_prompt(*, messages, agent_info)[source]¶
Build a single text prompt from PydanticAI messages + output schema.
- Parameters:
messages (list[Annotated[ModelRequest | ModelResponse, Discriminator(discriminator=kind, custom_error_type=None, custom_error_message=None, custom_error_context=None)]])
agent_info (AgentInfo)
- Return type:
str
- lalandre_core.llm.structured.to_text_generator(llm_or_generate)[source]¶
Normalize an LLM object or callable into a simple str -> str function.
- Parameters:
llm_or_generate (Any)
- Return type:
Callable[[str], str]
- lalandre_core.llm.structured.run_structured_agent(*, agent, prompt, llm_or_generate, model_name)[source]¶
Run a PydanticAI agent using a FunctionModel bridge to any LLM.
Returns (output, retries) where retries is the number of output-validation retries triggered.
- Parameters:
agent (Agent[Any, T])
prompt (str)
llm_or_generate (Any)
model_name (str)
- Return type:
tuple[T, int]
lalandre_core.logging_setup¶
Source: packages/lalandre_core/lalandre_core/logging_setup.py
Shared logging configuration for Lalandre workers.
lalandre_core.models¶
Source: packages/lalandre_core/lalandre_core/models/__init__.py
Data models
lalandre_core.models.act_metadata¶
Source: packages/lalandre_core/lalandre_core/models/act_metadata.py
Pydantic model for key-value metadata attached to one act.
- class lalandre_core.models.act_metadata.ActMetadata(*, id=None, act_id, key, value, created_at=None)[source]¶
Bases: BaseModel
Represent one metadata entry linked to a legal act.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
id (int | None)
act_id (int)
key (str)
value (str)
created_at (datetime | None)
- model_config: ClassVar[ConfigDict] = {'from_attributes': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
lalandre_core.models.act_relations¶
Source: packages/lalandre_core/lalandre_core/models/act_relations.py
Pydantic model for relationships extracted between legal acts.
- class lalandre_core.models.act_relations.ActRelations(*, id=None, source_act_id, target_act_id=None, target_celex=None, relation_type, source_subdivision_id=None, target_subdivision_id=None, effect_date=None, description=None, evidence=None, rationale=None, resolution_method=None, resolution_score=None, target_reference=None, confidence=None, source=None, validated=False, synced_to_neo4j_at=None, is_resolved=True, created_at=None)[source]¶
Bases: BaseModel
Represent one typed relationship between two legal acts.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
id (int | None)
source_act_id (int)
target_act_id (int | None)
target_celex (str | None)
relation_type (RelationType)
source_subdivision_id (int | None)
target_subdivision_id (int | None)
effect_date (datetime | None)
description (str | None)
evidence (str | None)
rationale (str | None)
resolution_method (str | None)
resolution_score (float | None)
target_reference (str | None)
confidence (float | None)
source (str | None)
validated (bool | None)
synced_to_neo4j_at (datetime | None)
is_resolved (bool | None)
created_at (datetime | None)
- model_config: ClassVar[ConfigDict] = {'from_attributes': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
lalandre_core.models.act_subjects¶
Source: packages/lalandre_core/lalandre_core/models/act_subjects.py
Pydantic model for the act-to-subject association table.
- class lalandre_core.models.act_subjects.ActSubjects(*, act_id, subject_id)[source]¶
Bases: BaseModel
Represent one link between an act and a subject matter.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
act_id (int)
subject_id (int)
- model_config: ClassVar[ConfigDict] = {'from_attributes': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
lalandre_core.models.acts¶
Source: packages/lalandre_core/lalandre_core/models/acts.py
Pydantic model for top-level legal act records.
- class lalandre_core.models.acts.Acts(*, id=None, celex, eli=None, act_type, title, language, adoption_date=None, force_date=None, end_date=None, official_journal_reference=None, sector=None, level=None, form_number=None, url_eurlex=None, created_at=None, updated_at=None, last_synced_at=None, content_hash=None, sync_status='pending', extracted_at=None, extraction_status='pending')[source]¶
Bases: BaseModel
Represent one legal act stored in the core domain model.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
id (int | None)
celex (str)
eli (str | None)
act_type (ActType)
title (str)
language (LanguageCode)
adoption_date (datetime | None)
force_date (datetime | None)
end_date (datetime | None)
official_journal_reference (str | None)
sector (int | None)
level (int | None)
form_number (str | None)
url_eurlex (str | None)
created_at (datetime | None)
updated_at (datetime | None)
last_synced_at (datetime | None)
content_hash (str | None)
sync_status (str | None)
extracted_at (datetime | None)
extraction_status (str | None)
- model_config: ClassVar[ConfigDict] = {'from_attributes': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
lalandre_core.models.chunks¶
Source: packages/lalandre_core/lalandre_core/models/chunks.py
Pydantic model for chunk records derived from subdivisions.
- class lalandre_core.models.chunks.Chunks(*, id=None, subdivision_id, chunk_index, content, char_start, char_end, token_count=None, chunk_metadata=None, created_at=None)[source]¶
Bases: BaseModel
Represent one persisted chunk of subdivision content.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
id (int | None)
subdivision_id (int)
chunk_index (int)
content (str)
char_start (int)
char_end (int)
token_count (int | None)
chunk_metadata (dict[str, Any] | None)
created_at (datetime | None)
- model_config: ClassVar[ConfigDict] = {'from_attributes': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
lalandre_core.models.embedding_state¶
Source: packages/lalandre_core/lalandre_core/models/embedding_state.py
Pydantic model tracking the embedding status of stored objects.
- class lalandre_core.models.embedding_state.EmbeddingState(*, id=None, object_type, object_id, provider, model_name, vector_size, content_hash, embedded_at=None)[source]¶
Bases: BaseModel
Represent one embedding state snapshot for an act, chunk, or subdivision.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
id (int | None)
object_type (Literal['subdivision', 'chunk', 'act'])
object_id (int)
provider (str)
model_name (str)
vector_size (int)
content_hash (str)
embedded_at (datetime | None)
- model_config: ClassVar[ConfigDict] = {'from_attributes': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
lalandre_core.models.subdivisions¶
Source: packages/lalandre_core/lalandre_core/models/subdivisions.py
Pydantic model for hierarchical subdivisions inside one act.
- class lalandre_core.models.subdivisions.Subdivisions(*, id=None, act_id, version_id=None, parent_id=None, subdivision_type, number=None, title=None, content, content_hash=None, sequence_order, hierarchy_path, depth=0, created_at=None)[source]¶
Bases: BaseModel
Represent one structured subdivision extracted from an act.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
id (int | None)
act_id (int)
version_id (int | None)
parent_id (int | None)
subdivision_type (SubdivisionType)
number (str | None)
title (str | None)
content (str)
content_hash (str | None)
sequence_order (int)
hierarchy_path (str)
depth (int)
created_at (datetime | None)
- model_config: ClassVar[ConfigDict] = {'from_attributes': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
lalandre_core.models.subject_matters¶
Source: packages/lalandre_core/lalandre_core/models/subject_matters.py
Pydantic model for EuroVoc subject matter records.
- class lalandre_core.models.subject_matters.SubjectMatters(*, id=None, eurovoc_code, label_en, label_fr=None, parent_code=None)[source]¶
Bases: BaseModel
Represent one subject matter entry used to classify acts.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
id (int | None)
eurovoc_code (str)
label_en (str)
label_fr (str | None)
parent_code (str | None)
- model_config: ClassVar[ConfigDict] = {'from_attributes': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
lalandre_core.models.types¶
Source: packages/lalandre_core/lalandre_core/models/types/__init__.py
Type enumerations
lalandre_core.models.types.act_type¶
Source: packages/lalandre_core/lalandre_core/models/types/act_type.py
Enumeration of legal act categories handled by the platform.
lalandre_core.models.types.language_code¶
Source: packages/lalandre_core/lalandre_core/models/types/language_code.py
Enumeration of language codes supported by the core models.
lalandre_core.models.types.relation_type¶
Source: packages/lalandre_core/lalandre_core/models/types/relation_type.py
Enumeration of supported relationship types between legal acts.
lalandre_core.models.types.subdivision_type¶
Source: packages/lalandre_core/lalandre_core/models/types/subdivision_type.py
Enumeration of structured subdivision kinds extracted from acts.
lalandre_core.models.types.version_type¶
Source: packages/lalandre_core/lalandre_core/models/types/version_type.py
Enumeration of act version categories stored by the platform.
lalandre_core.models.versions¶
Source: packages/lalandre_core/lalandre_core/models/versions.py
Pydantic model for version records associated with one act.
- class lalandre_core.models.versions.Versions(*, id=None, act_id, version_number, version_type, version_date, source_url=None, is_current=False, created_at=None)[source]¶
Bases: BaseModel
Represent one dated version of a legal act.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
id (int | None)
act_id (int)
version_number (int)
version_type (VersionType)
version_date (datetime)
source_url (str | None)
is_current (bool)
created_at (datetime | None)
- model_config: ClassVar[ConfigDict] = {'from_attributes': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
lalandre_core.queue¶
Source: packages/lalandre_core/lalandre_core/queue/__init__.py
Shared Redis queue helpers for chunking, embedding, and extraction workers.
lalandre_core.queue.dispatch_all¶
Source: packages/lalandre_core/lalandre_core/queue/dispatch_all.py
Generic dispatch-all-acts helper shared by workers.
- lalandre_core.queue.dispatch_all.dispatch_all_act_jobs(*, runtime, job_id, queue_name, job_type, label, acts, build_params, skip_filter=None, error_label='Processing')[source]¶
Iterate over acts and enqueue one per-act job with progress tracking.
- Parameters:
runtime (QueueRuntime) – Queue runtime containing the Redis client and TTL policy.
job_id (str) – Parent job identifier used for status updates.
queue_name (str) – Redis queue that receives the child jobs.
job_type (str) – Job type string for the child jobs, such as "chunk_act".
label (str) – Human-readable label used in log and status messages.
acts (list[Any]) – Iterable of act objects exposing a celex attribute.
build_params (Callable[[Any], dict[str, Any]]) – Callable receiving one CELEX value and returning job params.
skip_filter (Callable[[Any], bool] | None) – Optional predicate returning True when one act should be skipped.
error_label (str) – Verb phrase used in failure messages, such as "Chunking".
- Return type:
None
lalandre_core.queue.job_queue¶
Source: packages/lalandre_core/lalandre_core/queue/job_queue.py
Shared Redis queue helpers for worker services.
Centralizes job enqueue/dedup/status operations used by chunking, embedding, and extraction workers.
- class lalandre_core.queue.job_queue.QueueRuntime(redis_client, job_ttl_seconds)[source]¶
Bases: object
Base runtime data required by queue helper functions.
- Parameters:
redis_client (Any)
job_ttl_seconds (int)
- class lalandre_core.queue.job_queue.JobPayload[source]¶
Bases: TypedDict
Typed representation of one serialized Redis job payload.
- lalandre_core.queue.job_queue.update_job_status(runtime, job_id, status, message=None, progress=None, ttl=None)[source]¶
Update job status metadata in Redis.
- Parameters:
runtime (QueueRuntime)
job_id (str)
status (str)
message (str | None)
progress (int | float | None)
ttl (int | None)
- Return type:
None
- lalandre_core.queue.job_queue.job_already_queued(runtime, *, queue_name, job_type, celex=None)[source]¶
Check whether a matching job is already queued.
If celex is provided, deduplication is scoped to that CELEX value.
- Parameters:
runtime (QueueRuntime)
queue_name (str)
job_type (str)
celex (str | None)
- Return type:
bool
- lalandre_core.queue.job_queue.enqueue_job(runtime, *, queue_name, job_type, params, dedupe_celex=None)[source]¶
Push a job into queue + initialize status hash, with optional dedupe.
- Parameters:
runtime (QueueRuntime)
queue_name (str)
job_type (str)
params (dict[str, Any])
dedupe_celex (str | None)
- Return type:
str | None
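The enqueue flow described above (optional dedupe, push to the queue, initialize a status hash) can be sketched against an in-memory stand-in. Everything specific here is an assumption: the payload keys (job_id, type, params), the job:{id} status key, the "queued" status value, and the reading that None means a duplicate was found. FakeRedis mimics only the redis-py calls used (lpush, lrange, hset, expire).

```python
import json
import uuid
from typing import Any, Dict, List, Optional


class FakeRedis:
    """In-memory stand-in for the few Redis commands the sketch needs."""

    def __init__(self) -> None:
        self.lists: Dict[str, List[str]] = {}
        self.hashes: Dict[str, Dict[str, str]] = {}
        self.ttls: Dict[str, int] = {}

    def lpush(self, name: str, *values: str) -> int:
        items = self.lists.setdefault(name, [])
        for value in values:
            items.insert(0, value)
        return len(items)

    def lrange(self, name: str, start: int, end: int) -> List[str]:
        items = self.lists.get(name, [])
        return items[start:None if end == -1 else end + 1]

    def hset(self, name: str, mapping: Dict[str, str]) -> int:
        self.hashes.setdefault(name, {}).update(mapping)
        return len(mapping)

    def expire(self, name: str, time: int) -> bool:
        self.ttls[name] = time
        return True


def enqueue_job_sketch(
    redis_client: Any,
    job_ttl_seconds: int,
    *,
    queue_name: str,
    job_type: str,
    params: Dict[str, Any],
    dedupe_celex: Optional[str] = None,
) -> Optional[str]:
    """Push one job and its status hash; return None when deduplicated."""
    if dedupe_celex is not None:
        for raw in redis_client.lrange(queue_name, 0, -1):
            queued = json.loads(raw)
            same_type = queued.get("type") == job_type
            same_celex = queued.get("params", {}).get("celex") == dedupe_celex
            if same_type and same_celex:
                return None
    job_id = str(uuid.uuid4())
    payload = {"job_id": job_id, "type": job_type, "params": params}
    redis_client.lpush(queue_name, json.dumps(payload))
    status_key = f"job:{job_id}"  # assumed key layout
    redis_client.hset(status_key, mapping={"status": "queued"})
    redis_client.expire(status_key, job_ttl_seconds)
    return job_id
```

Pairing LPUSH on the producer side with BRPOP on the worker side gives first-in, first-out ordering.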
lalandre_core.queue.reconcile¶
Source: packages/lalandre_core/lalandre_core/queue/reconcile.py
Shared Redis lock pattern for worker auto-reconcile.
Each worker calls with_reconcile_lock to acquire a distributed Redis lock,
run its domain-specific reconcile check, and release the lock.
- lalandre_core.queue.reconcile.with_reconcile_lock(redis_client, lock_key, lock_ttl, action)[source]¶
Acquire a Redis NX lock, execute action, then release.
If the lock is already held, returns silently.
If Redis is unreachable, logs a warning and returns.
The lock is always released in the finally block.
- Parameters:
redis_client (Any)
lock_key (str)
lock_ttl (int)
action (Callable[[], None])
- Return type:
None
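The acquire, run, release sequence described above can be sketched as follows. The set(..., nx=True, ex=...) and delete calls follow redis-py's signatures; FakeRedis is a hypothetical in-memory stand-in so the example runs without a server, and catching a bare Exception is a simplification of whatever Redis-specific error the real helper handles.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)


class FakeRedis:
    """In-memory stand-in exposing only set(nx=..., ex=...) and delete."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def set(self, name: str, value: str, nx: bool = False, ex: Optional[int] = None) -> Optional[bool]:
        if nx and name in self.store:
            return None  # redis-py returns None when NX prevents the write
        self.store[name] = value
        return True

    def delete(self, name: str) -> int:
        return 1 if self.store.pop(name, None) is not None else 0


def with_reconcile_lock_sketch(
    redis_client: Any, lock_key: str, lock_ttl: int, action: Callable[[], None]
) -> None:
    """Run action under an NX lock; skip silently if the lock is held."""
    try:
        acquired = redis_client.set(lock_key, "1", nx=True, ex=lock_ttl)
    except Exception as exc:  # simplification: real code likely narrows this
        logger.warning("Redis unreachable, skipping reconcile: %s", exc)
        return
    if not acquired:
        return  # another worker holds the lock
    try:
        action()
    finally:
        redis_client.delete(lock_key)
```

The TTL bounds how long the lock can outlive a worker that crashes before reaching the finally block.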
lalandre_core.queue.worker_config¶
Source: packages/lalandre_core/lalandre_core/queue/worker_config.py
Shared config accessor helpers for Redis-based workers.
lalandre_core.queue.worker_loop¶
Source: packages/lalandre_core/lalandre_core/queue/worker_loop.py
Worker-loop utilities for Redis-based job workers.
Provides reusable building blocks (functions, not a class hierarchy)
- class lalandre_core.queue.worker_loop.BaseRuntimeParams(redis_client, job_ttl_seconds, brpop_timeout_seconds)[source]¶
Bases: object
Common parameters resolved identically by every worker.
- Parameters:
redis_client (Any)
job_ttl_seconds (int)
brpop_timeout_seconds (int)
- lalandre_core.queue.worker_loop.resolve_base_runtime_params(*, redis_host, redis_port, job_ttl_seconds, brpop_timeout_seconds)[source]¶
Resolve Redis client + base tunables shared by all workers.
- Parameters:
redis_host (str | None)
redis_port (int | None)
job_ttl_seconds (int | None)
brpop_timeout_seconds (int)
- Return type:
- lalandre_core.queue.worker_loop.parse_job_payload(job_data)[source]¶
Extract (job_id, job_type, params) from a raw job dict.
Returns empty strings when required fields are missing or have the wrong type so that callers can validate cheaply.
- Parameters:
- Parameters:
job_data (dict[str, Any])
- Return type:
tuple[str, str, dict[str, Any]]
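The tolerant parsing described above can be sketched as below. The payload key names (job_id, type, params) are assumptions, since the JobPayload fields are not listed on this page, and falling back to an empty dict for missing params is likewise a guess.

```python
from typing import Any, Dict, Tuple


def parse_job_payload_sketch(job_data: Dict[str, Any]) -> Tuple[str, str, Dict[str, Any]]:
    """Return (job_id, job_type, params) with safe defaults for bad fields."""
    job_id = job_data.get("job_id")
    job_type = job_data.get("type")
    params = job_data.get("params")
    return (
        job_id if isinstance(job_id, str) else "",
        job_type if isinstance(job_type, str) else "",
        params if isinstance(params, dict) else {},
    )
```

A caller can then reject a malformed job with a single truthiness check such as `if not job_id or not job_type`.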
- lalandre_core.queue.worker_loop.instrumented_process_job(*, runtime, job_data, dispatch, observe_execution, observe_error)[source]¶
Parse, dispatch, and instrument a single job.
- Parameters:
runtime (Any) – Worker runtime instance passed to the dispatched handlers.
job_data (dict[str, Any]) – Raw deserialized job payload fetched from Redis.
dispatch (dict[str, Callable[[Any, str, dict[str, Any]], None]]) – Mapping of job_type to handler(runtime, job_id, params).
observe_execution (Callable[[...], None]) – Observer called in finally with execution metrics.
observe_error (Callable[[...], None]) – Observer called when the handler raises an exception.
- Return type:
None
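The dispatch-and-observe flow can be sketched as a dictionary lookup wrapped in try/except/finally. The observer keyword arguments (job_type, error, duration_seconds) and the payload keys are assumptions, and this sketch swallows handler errors after reporting them, which may not match the real function.

```python
import time
from typing import Any, Callable, Dict

Handler = Callable[[Any, str, Dict[str, Any]], None]


def process_job_sketch(
    runtime: Any,
    job_data: Dict[str, Any],
    dispatch: Dict[str, Handler],
    observe_execution: Callable[..., None],
    observe_error: Callable[..., None],
) -> None:
    """Look up the handler for the job type, run it, and report metrics."""
    job_id = str(job_data.get("job_id", ""))
    job_type = str(job_data.get("type", ""))
    params = job_data.get("params") or {}
    started = time.perf_counter()
    try:
        handler = dispatch[job_type]  # KeyError for unknown job types
        handler(runtime, job_id, params)
    except Exception as exc:
        observe_error(job_type=job_type, error=exc)
    finally:
        elapsed = time.perf_counter() - started
        observe_execution(job_type=job_type, duration_seconds=elapsed)
```

Keeping the handlers in a plain dict means adding a job type is a one-line change with no subclassing.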
- lalandre_core.queue.worker_loop.run_worker_loop(*, queue_name, worker_name, redis_client, brpop_timeout_seconds, process_job, reconcile_callback=None, reconcile_interval_seconds=0, reconcile_hour_start=22, reconcile_hour_end=24)[source]¶
Generic BRPOP loop shared by all workers.
- Parameters:
queue_name (str) – Redis list to BRPOP from.
worker_name (str) – Human-readable worker name used in log messages.
redis_client (Any) – Synchronous Redis client backing the worker loop.
brpop_timeout_seconds (int) – Polling timeout passed to BRPOP.
process_job (Callable[[dict[str, Any]], None]) – Callback invoked with each deserialized payload.
reconcile_callback (Callable[[], None] | None) – Optional reconciliation callback executed periodically.
reconcile_interval_seconds (int) – Seconds between two reconciliation runs.
reconcile_hour_start (int) – UTC hour at which the reconciliation window opens.
reconcile_hour_end (int) – UTC hour at which the reconciliation window closes.
- Return type:
None
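The reconciliation window is given as a pair of UTC hours, defaulting to 22 and 24. One way to test whether a moment falls inside it is sketched below, assuming a half-open interval [start, end) that does not wrap past midnight. Both assumptions go beyond what the source states.

```python
from datetime import datetime, timezone


def in_reconcile_window(now: datetime, hour_start: int = 22, hour_end: int = 24) -> bool:
    """Return True when now (timezone-aware) falls in [hour_start, hour_end) UTC."""
    hour = now.astimezone(timezone.utc).hour
    return hour_start <= hour < hour_end
```

With the defaults, the window covers 22:00 through 23:59 UTC.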
lalandre_core.queue.worker_metrics¶
Source: packages/lalandre_core/lalandre_core/queue/worker_metrics.py
Reusable Prometheus metrics factory for Redis-based workers.
- class lalandre_core.queue.worker_metrics.WorkerMetrics(observe_execution, observe_error)[source]¶
Bases: object
Pre-built Prometheus instruments + observe helpers for a worker.
- Parameters:
observe_execution (Callable[[...], None])
observe_error (Callable[[...], None])
- lalandre_core.queue.worker_metrics.build_worker_metrics(worker_name, valid_job_types)[source]¶
Create Prometheus counters/histograms for a worker and return observe helpers.
- Parameters:
worker_name (str) – Short name used in metric names, such as "embedding".
valid_job_types (set[str]) – Whitelist of known job type labels for normalization.
- Return type:
lalandre_core.redis_client¶
Source: packages/lalandre_core/lalandre_core/redis_client.py
Redis client factory helpers for services.
lalandre_core.repositories¶
Source: packages/lalandre_core/lalandre_core/repositories/__init__.py
Repository Base Abstractions
lalandre_core.repositories.base¶
Source: packages/lalandre_core/lalandre_core/repositories/base/__init__.py
Base repository abstractions
lalandre_core.repositories.base.exceptions¶
Source: packages/lalandre_core/lalandre_core/repositories/base/exceptions.py
Repository exceptions
- exception lalandre_core.repositories.base.exceptions.RepositoryError[source]¶
Bases: Exception
Base exception for repository errors
- exception lalandre_core.repositories.base.exceptions.DatabaseConnectionError[source]¶
Bases: RepositoryError
Raised when connection to database fails
- exception lalandre_core.repositories.base.exceptions.DatabaseOperationError[source]¶
Bases: RepositoryError
Raised when a database operation fails
lalandre_core.repositories.base.repository¶
Source: packages/lalandre_core/lalandre_core/repositories/base/repository.py
Base repository abstraction
lalandre_core.repositories.common¶
Source: packages/lalandre_core/lalandre_core/repositories/common/__init__.py
Common repository helpers.
lalandre_core.repositories.common.payload_builder¶
Source: packages/lalandre_core/lalandre_core/repositories/common/payload_builder.py
Build Qdrant payloads from JSON schemas.
- class lalandre_core.repositories.common.payload_builder.PayloadBuilder(loader=None)[source]¶
Bases: object
Schema-driven payload builder.
- Parameters:
loader (PayloadSchemaLoader | None)
- build_subdivision_payload(subdivision_data, act_data, version_data=None, metadata=None)[source]¶
Build payload for subdivision embeddings.
- Parameters:
subdivision_data (Dict[str, Any])
act_data (Dict[str, Any])
version_data (Dict[str, Any] | None)
metadata (Dict[str, str] | None)
- Return type:
Dict[str, Any]
lalandre_core.repositories.common.schema_loader¶
Source: packages/lalandre_core/lalandre_core/repositories/common/schema_loader.py
Load JSON payload schemas and render payloads.
- class lalandre_core.repositories.common.schema_loader.PayloadSchemaLoader(schema_file=None)[source]¶
Bases: object
Loads and applies payload schemas.
Initialize loader (defaults to payload_schemas.json).
- Parameters:
schema_file (str | PathLike[str] | None)
lalandre_core.runtime_values¶
Source: packages/lalandre_core/lalandre_core/runtime_values.py
Small coercion helpers shared across service entrypoints.
- lalandre_core.runtime_values.require_int(value, setting_name)[source]¶
Return an int or raise a clear configuration error.
- Parameters:
value (int | None)
setting_name (str)
- Return type:
int
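A helper of this kind typically narrows an optional setting to a definite int at service startup. The sketch below assumes a ValueError naming the setting; the exception type and message used by the real helper are not documented here.

```python
from typing import Optional


def require_int_sketch(value: Optional[int], setting_name: str) -> int:
    """Return value as int, or fail with a message naming the setting."""
    if value is None:
        raise ValueError(f"Missing required integer setting: {setting_name}")
    return int(value)
```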
lalandre_core.utils¶
Source: packages/lalandre_core/lalandre_core/utils/__init__.py
Utility functions. Common helpers used across the project.
lalandre_core.utils.api_key_pool¶
Source: packages/lalandre_core/lalandre_core/utils/api_key_pool.py
API Key Pool Manager. Distributes API calls across multiple keys using a round-robin strategy.
- class lalandre_core.utils.api_key_pool.APIKeyPool(keys)[source]¶
Bases: object
Thread-safe container for API keys with round-robin distribution.
- Keys are loaded from environment variables following the pattern:
BASE_VAR, BASE_VAR_2, BASE_VAR_3, …, BASE_VAR_{max_keys}
- Parameters:
keys (List[str])
- classmethod from_env(base_env_var='MISTRAL_API_KEY', max_keys=10, start_index=1)[source]¶
Load keys from environment variables.
Looks for:
- {base_env_var} (index 1, main key)
- {base_env_var}_2, …, {base_env_var}_{max_keys}
start_index and max_keys control the range: indices [start_index .. max_keys] are loaded. This allows splitting keys between services (e.g. 1-5 for RAG, 6-10 for workers).
- Parameters:
base_env_var (str)
max_keys (int)
start_index (int)
- Return type:
lalandre_core.utils.celex_utils¶
Source: packages/lalandre_core/lalandre_core/utils/celex_utils.py
CELEX Utility Functions
- lalandre_core.utils.celex_utils.normalize_celex(celex)[source]¶
Handles various input formats and normalizes them to the standard CELEX format. Removes spaces and handles EUR-Lex format conversions.
Examples
>>> normalize_celex('32016R0679')
'32016R0679'
>>> normalize_celex(' 32016 R 0679 ')
'32016R0679'
>>> normalize_celex('(UE) 2016/679')
'32016R0679'
>>> normalize_celex('(CE) n° 1219/2011')
'32011R1219'
>>> normalize_celex('Directive 2003/41/CE')
'32003L0041'
>>> normalize_celex('AMF-RG-L1-20250331')
'AMF-RG-L1-20250331'
>>> normalize_celex('AMF-SANCTION-SanctionAMF2026-01-20260112')
'AMF-SAN-2026-01'
- Parameters:
celex (str)
- Return type:
str
- lalandre_core.utils.celex_utils.is_eurlex_celex(celex)[source]¶
Return True iff celex follows the EUR-Lex standard format.
EUR-Lex CELEXes start with a sector digit followed by the 4-digit year and a document-type letter (e.g. 32016R0679). All other sources (AMF-, EBA-, EIOPA-, ESMA-, LEGITEXT…) use alphabetical prefixes.
- Parameters:
celex (str)
- Return type:
bool
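The format rule above (sector digit, 4-digit year, document-type letter) lends itself to a short prefix check. The sketch below is one way to express it, assuming an uppercase type letter and checking only the prefix; `looks_like_eurlex_celex` is a hypothetical name, and the real function may apply stricter rules.

```python
import re

# Sector digit, 4-digit year, then a document-type letter (assumed uppercase).
_EURLEX_PREFIX = re.compile(r"^\d\d{4}[A-Z]")


def looks_like_eurlex_celex(celex: str) -> bool:
    """Return True when the identifier starts like a EUR-Lex CELEX number."""
    return bool(_EURLEX_PREFIX.match(celex))


print(looks_like_eurlex_celex("32016R0679"))          # True
print(looks_like_eurlex_celex("AMF-RG-L1-20250331"))  # False
```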
lalandre_core.utils.collection_utils¶
Source: packages/lalandre_core/lalandre_core/utils/collection_utils.py
Collection utilities. Helpers for de-duplicating lists of dictionaries.
lalandre_core.utils.date_utils¶
Source: packages/lalandre_core/lalandre_core/utils/date_utils.py
Date Utility Functions. Centralized date formatting and conversion utilities.
- lalandre_core.utils.date_utils.format_date(date_value)[source]¶
Format a date as an ISO 8601 string.
Handles multiple input types:
- datetime objects
- date objects
- ISO strings (pass-through)
- None (returns None)
- Parameters:
date_value (Any) – Date to format (datetime, date, str, or None)
- Returns:
ISO format string (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS) or None
Examples
>>> format_date(datetime(2016, 4, 27))
'2016-04-27T00:00:00'
>>> format_date(date(2016, 4, 27))
'2016-04-27'
>>> format_date("2016-04-27")
'2016-04-27'
>>> format_date(None)
None
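The documented behaviour of format_date can be approximated with a few type checks. This is a sketch under assumptions rather than the module's code: `format_date_sketch` is a hypothetical name, and raising TypeError for unsupported inputs is a guess, since the documentation does not say what happens in that case.

```python
from datetime import date, datetime
from typing import Any, Optional


def format_date_sketch(date_value: Any) -> Optional[str]:
    """Mirror the documented input types: datetime, date, ISO string, None."""
    if date_value is None:
        return None
    if isinstance(date_value, str):
        return date_value  # ISO strings pass through untouched
    if isinstance(date_value, (datetime, date)):
        # datetime subclasses date; isoformat() yields the right shape for both.
        return date_value.isoformat()
    # Assumption: unsupported types are rejected rather than stringified.
    raise TypeError(f"Unsupported date value: {date_value!r}")


print(format_date_sketch(datetime(2016, 4, 27)))  # 2016-04-27T00:00:00
print(format_date_sketch(date(2016, 4, 27)))      # 2016-04-27
```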
- lalandre_core.utils.date_utils.to_timestamp(date_value)[source]¶
Convert a date to Unix timestamp (seconds since epoch)
Handles the same input types as format_date.
- Parameters:
date_value (Any) – Date to convert (datetime, date, str, or None)
- Returns:
Unix timestamp (int) or None
- Return type:
int | None
Examples
>>> to_timestamp(datetime(2016, 4, 27, 12, 0, 0))
1461758400  # (approximate, depends on timezone)
>>> to_timestamp("2016-04-27")
1461715200
>>> to_timestamp(None)
None
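Because the result depends on timezone handling, a sketch has to pick a convention. The version below assumes naive values are interpreted as UTC, which reproduces the figures in the examples above; whether the real function does the same is not stated in the documentation. `to_timestamp_sketch` is a hypothetical name.

```python
from datetime import date, datetime, timezone
from typing import Any, Optional


def to_timestamp_sketch(date_value: Any) -> Optional[int]:
    """Convert datetime, date, or ISO string to whole seconds since the epoch."""
    if date_value is None:
        return None
    if isinstance(date_value, str):
        # fromisoformat accepts both 'YYYY-MM-DD' and 'YYYY-MM-DDTHH:MM:SS'.
        date_value = datetime.fromisoformat(date_value)
    elif isinstance(date_value, date) and not isinstance(date_value, datetime):
        # Promote a plain date to midnight of that day.
        date_value = datetime(date_value.year, date_value.month, date_value.day)
    if date_value.tzinfo is None:
        # Assumption: naive values are treated as UTC, not local time.
        date_value = date_value.replace(tzinfo=timezone.utc)
    return int(date_value.timestamp())


print(to_timestamp_sketch("2016-04-27"))  # 1461715200
```

Pinning naive values to UTC makes the output independent of the host's local timezone, which matters when timestamps are compared across services.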
- lalandre_core.utils.date_utils.convert_dates_to_strings(props, date_fields)[source]¶
Convert datetime objects to strings in a dictionary for database storage
Useful for Neo4j, or other databases that require date strings. Mutates the input dictionary in place.
- Parameters:
props (dict[str, Any]) – Dictionary of properties (will be modified)
date_fields (list[str]) – List of field names that contain dates
- Returns:
Modified properties dict with dates as ISO format strings
- Return type:
dict[str, Any]
Examples
>>> data = {'created_at': datetime(2024, 1, 1), 'name': 'Test'}
>>> convert_dates_to_strings(data, ['created_at'])
{'created_at': '2024-01-01T00:00:00', 'name': 'Test'}
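The in-place mutation noted above is easy to overlook, so a small sketch may help. It assumes that only datetime and date values are converted and that missing or non-date fields are left alone; `convert_dates_sketch` is a hypothetical name, not the module's function.

```python
from datetime import date, datetime
from typing import Any


def convert_dates_sketch(props: dict[str, Any], date_fields: list[str]) -> dict[str, Any]:
    """Replace date-like values with ISO strings, mutating ``props`` itself."""
    for field in date_fields:
        value = props.get(field)
        if isinstance(value, (datetime, date)):
            props[field] = value.isoformat()  # overwrite in place
    return props


data = {"created_at": datetime(2024, 1, 1), "name": "Test"}
result = convert_dates_sketch(data, ["created_at"])
print(result is data)      # True: the same dict object is returned
print(data["created_at"])  # 2024-01-01T00:00:00
```

Callers that still need the original datetime objects should pass a copy, for example `dict(data)`.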
lalandre_core.utils.metrics_utils¶
Source: packages/lalandre_core/lalandre_core/utils/metrics_utils.py
Shared Prometheus metric helpers reused across services.
- lalandre_core.utils.metrics_utils.status_class(status_code)[source]¶
Collapse an HTTP status code into its class label such as 2xx.
- Parameters:
status_code (int)
- Return type:
str
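Collapsing a status code into its class is a one-line integer division. The sketch below shows the idea; `status_class_sketch` is a hypothetical name, and how the real helper treats out-of-range codes is not documented.

```python
def status_class_sketch(status_code: int) -> str:
    """Map 200 to '2xx', 404 to '4xx', and so on."""
    # Integer division keeps only the leading digit: 404 // 100 == 4.
    return f"{status_code // 100}xx"


print(status_class_sketch(200))  # 2xx
print(status_class_sketch(404))  # 4xx
```

Grouping by class keeps the number of distinct metric label values small, which is the usual reason for this kind of helper in Prometheus instrumentation.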
- lalandre_core.utils.metrics_utils.normalize_label(value)[source]¶
Normalize arbitrary metric label values into a lowercase token.
- Parameters:
value (Any)
- Return type:
str
- lalandre_core.utils.metrics_utils.normalize_search_mode(mode)[source]¶
Normalize a search mode label to one of the supported metric values.
- Parameters:
mode (str | None)
- Return type:
str
lalandre_core.utils.mode_aliases¶
Source: packages/lalandre_core/lalandre_core/utils/mode_aliases.py
RAG query mode aliases.
Single source of truth for legacy mode names → canonical mode mapping. Used by both api-gateway and rag-service to resolve mode aliases consistently.
lalandre_core.utils.parse_utils¶
Source: packages/lalandre_core/lalandre_core/utils/parse_utils.py
Generic parsing utilities.
- lalandre_core.utils.parse_utils.extract_json_object(text)[source]¶
Try to extract a JSON object from text (which may contain markdown fences).
Returns the first valid dict found, or None.
- Parameters:
text (str)
- Return type:
Dict[str, Any] | None
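One way to pull the first JSON object out of surrounding text is to try decoding at each opening brace. The sketch below uses the standard library's `json.JSONDecoder.raw_decode` for that; it is an illustration of the technique, not necessarily how the real function works, and `extract_json_object_sketch` and the sample payload are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def extract_json_object_sketch(text: str) -> Optional[Dict[str, Any]]:
    """Return the first decodable JSON object found in ``text``, else None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value and ignores trailing text.
            value, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        start = text.find("{", start + 1)  # try the next candidate brace
    return None


fence = "`" * 3  # built indirectly so this example does not nest code fences
reply = "Here you go:\n" + fence + 'json\n{"mode": "hybrid", "top_k": 5}\n' + fence
print(extract_json_object_sketch(reply))  # {'mode': 'hybrid', 'top_k': 5}
```

Scanning for braces means markdown fences and leading prose need no special handling; they are simply skipped.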
- lalandre_core.utils.parse_utils.as_dict(value)[source]¶
Return value if it is a dict, else an empty dict.
- Parameters:
value (Any)
- Return type:
Dict[str, Any]
- lalandre_core.utils.parse_utils.as_optional_dict(value)[source]¶
Return value if it is a dict, else None.
- Parameters:
value (Any)
- Return type:
Dict[str, Any] | None
- lalandre_core.utils.parse_utils.as_str(value, *, default='')[source]¶
Coerce value to str.
- Parameters:
value (Any)
default (str)
- Return type:
str
- lalandre_core.utils.parse_utils.as_document_list(value)[source]¶
Return only the dict items from value (must be a list).
- Parameters:
value (Any)
- Return type:
List[Dict[str, Any]]
- lalandre_core.utils.parse_utils.to_optional_int(value)[source]¶
Convert value to int when possible, else None.
- Parameters:
value (Any)
- Return type:
int | None
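Four of the coercion helpers above follow the same defensive pattern: check the type, otherwise fall back to a neutral value. The sketch below illustrates that pattern with hypothetical `_sketch` names. The handling of non-list input and of booleans is an assumption, since the documentation does not spell it out.

```python
from typing import Any, Dict, List, Optional


def as_dict_sketch(value: Any) -> Dict[str, Any]:
    return value if isinstance(value, dict) else {}


def as_optional_dict_sketch(value: Any) -> Optional[Dict[str, Any]]:
    return value if isinstance(value, dict) else None


def as_document_list_sketch(value: Any) -> List[Dict[str, Any]]:
    # Assumption: a non-list input yields an empty list instead of an error.
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, dict)]


def to_optional_int_sketch(value: Any) -> Optional[int]:
    # Assumption: booleans are rejected even though bool subclasses int.
    if value is None or isinstance(value, bool):
        return None
    try:
        return int(value)
    except (TypeError, ValueError, OverflowError):
        return None


print(as_document_list_sketch([{"id": 1}, "noise", {"id": 2}]))  # [{'id': 1}, {'id': 2}]
print(to_optional_int_sketch("42"))                              # 42
```

Helpers like these are typically applied to loosely typed payloads such as parsed LLM output, where any field may be missing or of the wrong type.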
- lalandre_core.utils.parse_utils.sanitize_error_text(error, *, max_chars=220)[source]¶
Return a truncated, safe string representation of error.
- Parameters:
error (Exception)
max_chars (int)
- Return type:
str
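What "safe" means here is not specified, so the sketch below covers only the parts that can be inferred from the signature: a single-line message capped at max_chars. The whitespace collapsing and the class-name fallback are assumptions, and `sanitize_error_text_sketch` is a hypothetical name.

```python
def sanitize_error_text_sketch(error: Exception, *, max_chars: int = 220) -> str:
    """Return a single-line, length-capped rendering of ``error``."""
    # Collapse newlines and repeated whitespace so the message fits on one log line.
    text = " ".join(str(error).split())
    if not text:
        # Assumption: fall back to the exception class name for empty messages.
        text = type(error).__name__
    return text[:max_chars]


print(sanitize_error_text_sketch(ValueError("bad\n  input"), max_chars=20))  # bad input
```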
lalandre_core.utils.regulatory_level¶
Source: packages/lalandre_core/lalandre_core/utils/regulatory_level.py
Regulatory level inference from act metadata.
- EU financial regulation follows a 3-level hierarchy:
1 (L1): Framework legislation (Regulations, Directives) adopted by Parliament/Council
2 (L2): Delegated/implementing acts (RTS, ITS) adopted by the Commission
3 (L3): Supervisory guidance (Guidelines, Q&A, Recommendations) by the ESAs (EBA/ESMA/EIOPA)
- lalandre_core.utils.regulatory_level.infer_regulatory_level(celex, act_type, title=None, form_number=None)[source]¶
Infer regulatory level from act metadata.
Returns 1 (L1), 2 (L2), 3 (L3), or None (outside scope).
- Parameters:
celex (str)
act_type (str)
title (str | None)
form_number (str | None)
- Return type:
int | None
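To make the three-level hierarchy concrete, here is a deliberately simplified keyword heuristic. It is not the function's actual logic: the keyword lists are hypothetical, the celex and form_number inputs are ignored, and `infer_level_sketch` is an illustrative name. The only idea it demonstrates is checking from the most specific level down, so that a "Delegated Regulation" is not mistaken for framework legislation.

```python
import re
from typing import Optional

# Hypothetical keyword patterns, checked from most to least specific.
_L3 = re.compile(r"\b(guidelines?|recommendations?|q&a)\b", re.IGNORECASE)
_L2 = re.compile(r"\b(delegated|implementing)\b", re.IGNORECASE)
_L2_ACRONYM = re.compile(r"\b(RTS|ITS)\b")  # case-sensitive, so "its" does not match
_L1 = re.compile(r"\b(regulation|directive)\b", re.IGNORECASE)


def infer_level_sketch(act_type: str, title: Optional[str] = None) -> Optional[int]:
    """Return 1, 2, 3, or None based on keywords in the act type and title."""
    text = f"{act_type} {title or ''}"
    if _L3.search(text):
        return 3
    if _L2.search(text) or _L2_ACRONYM.search(text):
        return 2
    if _L1.search(text):
        return 1
    return None  # outside the three-level scope


print(infer_level_sketch("regulation", "Commission Delegated Regulation (EU) 2017/565"))  # 2
print(infer_level_sketch("guidelines"))                                                   # 3
```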
lalandre_core.utils.text_utils¶
Source: packages/lalandre_core/lalandre_core/utils/text_utils.py
Text normalization utilities.