The Problem
GQLAlchemy, the Python OGM (Object Graph Mapper) for Memgraph, lacked programmatic access to the ANALYZE GRAPH query. This query matters for database performance because it calculates how property values are distributed across nodes, which lets Memgraph make better-informed decisions about which index to use during query execution.
Without this feature, developers had two suboptimal choices:
- Execute raw Cypher queries, bypassing the type-safe query builder entirely
- Skip graph analysis, leaving the database to select indexes based solely on node count, which can lead to slower queries
The ANALYZE GRAPH command generates statistics including:
- num estimation nodes: Number of nodes used for statistical sampling
- num groups: Number of distinct property values found
- avg group size: Average number of nodes sharing each distinct value (used for cost estimation)
- chi-squared value: Measure of how uniformly the values are distributed
- avg degree: Average connectivity (degree) of the indexed nodes
These statistics help Memgraph choose optimal indexes and improve MERGE operation performance by expanding from nodes with lower connectivity.
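To make the statistics above concrete, here is a minimal sketch of how the group-based figures could be computed for one property's values. It assumes the chi-squared value is measured against a uniform distribution in which every group holds the average group size; the function name `property_statistics` is hypothetical, and this is not Memgraph's actual implementation.

```python
from collections import Counter


def property_statistics(values):
    """Summarize how one indexed property's values are distributed.

    Illustrative only: assumes chi-squared is computed against a uniform
    distribution where every group is expected to hold avg_group_size nodes.
    """
    if not values:
        raise ValueError("at least one property value is required")

    groups = Counter(values)  # distinct value -> number of nodes with it
    num_nodes = len(values)
    num_groups = len(groups)
    avg_group_size = num_nodes / num_groups

    # Sum of (observed - expected)^2 / expected over all groups.
    chi_squared = sum(
        (count - avg_group_size) ** 2 / avg_group_size
        for count in groups.values()
    )

    return {
        "num estimation nodes": num_nodes,
        "num groups": num_groups,
        "avg group size": avg_group_size,
        "chi-squared value": chi_squared,
    }


skewed = property_statistics(["a", "a", "a", "b"])
uniform = property_statistics(["a", "a", "b", "b"])
```

Under these assumptions, a perfectly even distribution yields a chi-squared value of zero, and the value grows as the distribution becomes more skewed.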
The Solution
I implemented two new methods on the Memgraph class that provide a clean, Pythonic interface for graph analysis:
analyze_graph() Method
Executes the graph analysis query with optional label filtering:
from gqlalchemy import Memgraph
memgraph = Memgraph()
# Analyze all labels in the graph
results = memgraph.analyze_graph()
# Analyze specific labels only
results = memgraph.analyze_graph(labels=["Person", "Company"])
delete_graph_statistics() Method
Removes previously calculated statistics, with support for label-specific deletion:
# Delete all statistics
memgraph.delete_graph_statistics()
# Delete statistics for specific labels only
memgraph.delete_graph_statistics(labels=["Person"])
The label-specific deletion support was added after reviewing the official Memgraph documentation, which revealed the ON LABELS syntax variant:
ANALYZE GRAPH ON LABELS :Label1 DELETE STATISTICS;
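The query variants can be assembled from an optional label list and a delete flag. The helper below is a hypothetical sketch of that string construction, not the code merged into GQLAlchemy; the name `build_analyze_graph_query` and its parameters are assumptions made for illustration.

```python
from typing import Optional, Sequence


def build_analyze_graph_query(
    labels: Optional[Sequence[str]] = None,
    delete: bool = False,
) -> str:
    """Build an ANALYZE GRAPH query string (hypothetical helper)."""
    parts = ["ANALYZE GRAPH"]

    # Restrict the analysis to specific labels, e.g. "ON LABELS :A, :B".
    if labels:
        parts.append("ON LABELS " + ", ".join(f":{label}" for label in labels))

    # Switch from calculating statistics to deleting them.
    if delete:
        parts.append("DELETE STATISTICS")

    return " ".join(parts) + ";"


analyze_all = build_analyze_graph_query()
analyze_some = build_analyze_graph_query(["Person", "Company"])
delete_some = build_analyze_graph_query(["Label1"], delete=True)
```

Because labels are interpolated into the query text rather than passed as parameters, a real implementation would likely want to validate or escape them first.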
Files Changed
| File | Change Type | Description |
|---|---|---|
| gqlalchemy/vendors/memgraph.py | Modified | Added analyze_graph() and delete_graph_statistics() methods |
| tests/memgraph/test_analyze_graph.py | New | Unit tests for the new functionality |
| docs/reference/gqlalchemy/vendors/memgraph.md | Modified | API reference documentation |
Timeline
| Date | Event |
|---|---|
| April 7, 2023 | Issue #238 opened requesting ANALYZE GRAPH support |
| December 14, 2025 | PR #373 opened with initial implementation |
| December 14, 2025 | Added label-specific deletion support after documentation review |
| December 14, 2025 | Added API reference documentation (3rd commit) |
| December 16, 2025 | PR approved by antejavor and Josipmrden |
| December 16, 2025 | All 6 CI checks passed, PR merged |
Technical Notes
The implementation follows GQLAlchemy’s established patterns for vendor-specific methods. The methods construct the appropriate Cypher queries internally and handle the response parsing, returning typed results that integrate with the rest of the OGM’s query builder ecosystem.
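As an illustration of the response-parsing step, the sketch below maps result rows onto a small dataclass. The `IndexStatistics` class, the `parse_statistics` function, and the exact row keys (including `label` and `property`) are assumptions for this example; the shape GQLAlchemy actually returns may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class IndexStatistics:
    """Statistics for one label/property pair (hypothetical result type)."""

    label: str
    property_name: str
    num_estimation_nodes: int
    num_groups: int
    avg_group_size: float
    chi_squared_value: float
    avg_degree: float


def parse_statistics(rows: Iterable[Dict[str, Any]]) -> List[IndexStatistics]:
    """Convert raw result rows into typed records.

    The row keys below are assumed column names, not confirmed output.
    """
    return [
        IndexStatistics(
            label=row["label"],
            property_name=row["property"],
            num_estimation_nodes=int(row["num estimation nodes"]),
            num_groups=int(row["num groups"]),
            avg_group_size=float(row["avg group size"]),
            chi_squared_value=float(row["chi-squared value"]),
            avg_degree=float(row["avg degree"]),
        )
        for row in rows
    ]


sample_rows = [
    {
        "label": "Person",
        "property": "name",
        "num estimation nodes": 4,
        "num groups": 2,
        "avg group size": 2.0,
        "chi-squared value": 1.0,
        "avg degree": 1.5,
    }
]
parsed = parse_statistics(sample_rows)
```

A typed record like this lets callers use attribute access and static type checking instead of string keys containing spaces.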
The underlying issue was labeled “good first issue” and “Effort - Low”, making it an approachable entry point for new contributors while still delivering meaningful functionality to the library.