Pull Request
Merged
287

Add ANALYZE GRAPH query methods

memgraph/gqlalchemy

Added analyze_graph() and delete_graph_statistics() methods to calculate graph statistics for better index selection

The Problem

GQLAlchemy, the Python OGM (Object Graph Mapper) for Memgraph, lacked programmatic access to the ANALYZE GRAPH query. This query matters for performance: it calculates how property values are distributed across nodes, which lets Memgraph decide which index to use during query execution based on actual data rather than node count alone.

Without this feature, developers had two suboptimal choices:

  1. Execute raw Cypher queries - Bypassing the type-safe query builder entirely
  2. Skip graph analysis - Leaving the database to select indexes based solely on node count, potentially leading to slower queries

The ANALYZE GRAPH command generates statistics including:

  • num estimation nodes: Nodes used for statistical sampling
  • num groups: Distinct property values found
  • avg group size: Average clustering per value (used for cost estimation)
  • chi-squared value: Distribution uniformity measurement
  • avg degree: Average connectivity of indexed nodes

These statistics help Memgraph choose optimal indexes and improve MERGE operation performance by expanding from nodes with lower connectivity.
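To make the statistics above concrete, here is a small sketch that computes a group count, an average group size, and a chi-squared value for the values of a single property. It assumes the standard chi-squared statistic measured against a perfectly uniform distribution; Memgraph's exact internal computation may differ, and the function name `property_statistics` is hypothetical.

```python
from collections import Counter


def property_statistics(values):
    """Summarize how the values of one indexed property are distributed.

    Illustration only: this is not Memgraph's implementation.
    """
    if not values:
        raise ValueError("at least one property value is required")

    groups = Counter(values)  # distinct value -> number of nodes holding it
    num_nodes = len(values)
    num_groups = len(groups)
    avg_group_size = num_nodes / num_groups

    # Chi-squared against a uniform distribution, where every group would
    # have exactly avg_group_size members. 0.0 means perfectly uniform.
    chi_squared = sum(
        (size - avg_group_size) ** 2 / avg_group_size
        for size in groups.values()
    )

    return {
        "num_estimation_nodes": num_nodes,
        "num_groups": num_groups,
        "avg_group_size": avg_group_size,
        "chi_squared_value": chi_squared,
    }


# Three nodes share the value "a" and one node has "b": a skewed distribution.
skewed = property_statistics(["a", "a", "a", "b"])

# Every value occurs exactly once: a uniform distribution.
uniform = property_statistics(["a", "b", "c"])
```

A small average group size means an index lookup on that property returns few nodes per value, which is why it is useful for cost estimation.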

The Solution

I implemented two new methods on the Memgraph class that provide a clean, Pythonic interface for graph analysis:

analyze_graph() Method

Executes the graph analysis query with optional label filtering:

from gqlalchemy import Memgraph

memgraph = Memgraph()

# Analyze all labels in the graph
results = memgraph.analyze_graph()

# Analyze specific labels only
results = memgraph.analyze_graph(labels=["Person", "Company"])

delete_graph_statistics() Method

Removes previously calculated statistics, with support for label-specific deletion:

# Continues with the `memgraph` instance created in the previous example

# Delete all statistics
memgraph.delete_graph_statistics()

# Delete statistics for specific labels only
memgraph.delete_graph_statistics(labels=["Person"])

The label-specific deletion support was added after reviewing the official Memgraph documentation, which revealed the ON LABELS syntax variant:

ANALYZE GRAPH ON LABELS :Label1 DELETE STATISTICS;
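The query variants can be assembled from two optional pieces, the label filter and the DELETE STATISTICS suffix. The following is a minimal sketch of that construction, not the code merged in the PR: the helper name `build_analyze_graph_query` is hypothetical, and the comma-separated form for multiple labels is an assumption based on Memgraph's documented syntax.

```python
from typing import List, Optional


def build_analyze_graph_query(
    labels: Optional[List[str]] = None,
    delete_statistics: bool = False,
) -> str:
    """Build an ANALYZE GRAPH query string (hypothetical helper)."""
    parts = ["ANALYZE GRAPH"]

    # Optional label filter, e.g. "ON LABELS :Person, :Company"
    if labels:
        parts.append("ON LABELS " + ", ".join(f":{label}" for label in labels))

    # Optional suffix that removes statistics instead of calculating them
    if delete_statistics:
        parts.append("DELETE STATISTICS")

    return " ".join(parts) + ";"


analyze_all = build_analyze_graph_query()
analyze_some = build_analyze_graph_query(labels=["Person", "Company"])
delete_all = build_analyze_graph_query(delete_statistics=True)
delete_one = build_analyze_graph_query(labels=["Label1"], delete_statistics=True)
```

Note that this sketch interpolates label names directly into the query, so a real implementation would likely want to validate or escape them first.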

Files Changed

File                                            Change Type   Description
gqlalchemy/vendors/memgraph.py                  Modified      Added analyze_graph() and delete_graph_statistics() methods
tests/memgraph/test_analyze_graph.py            New           Unit tests for the new functionality
docs/reference/gqlalchemy/vendors/memgraph.md   Modified      API reference documentation

Timeline

Date                Event
April 7, 2023       Issue #238 opened requesting ANALYZE GRAPH support
December 14, 2025   PR #373 opened with initial implementation
December 14, 2025   Added label-specific deletion support after documentation review
December 14, 2025   Added API reference documentation (3rd commit)
December 16, 2025   PR approved by antejavor and Josipmrden
December 16, 2025   All 6 CI checks passed, PR merged

Technical Notes

The implementation follows gqlalchemy’s established patterns for vendor-specific methods. The methods construct the appropriate Cypher queries internally and handle the response parsing, returning typed results that integrate with the rest of the OGM’s query builder ecosystem.
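As an illustration of what response parsing into typed results could look like, the sketch below maps result rows onto a dataclass. It is not the PR's implementation. The class name, the function name, and the `label` and `property` keys are assumptions; the remaining keys follow the statistic names listed earlier.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass(frozen=True)
class LabelPropertyStatistics:
    """One row of graph statistics (hypothetical type)."""

    label: str
    property_name: str
    num_estimation_nodes: int
    num_groups: int
    avg_group_size: float
    chi_squared_value: float
    avg_degree: float


def parse_statistics(rows: Iterable[Dict[str, Any]]) -> List[LabelPropertyStatistics]:
    """Convert result rows (dicts keyed by column name) into typed objects."""
    return [
        LabelPropertyStatistics(
            label=row["label"],        # assumed column name
            property_name=row["property"],  # assumed column name
            num_estimation_nodes=int(row["num estimation nodes"]),
            num_groups=int(row["num groups"]),
            avg_group_size=float(row["avg group size"]),
            chi_squared_value=float(row["chi-squared value"]),
            avg_degree=float(row["avg degree"]),
        )
        for row in rows
    ]


# Made-up sample row, used only to exercise the parser
sample_rows = [
    {
        "label": "Person",
        "property": "name",
        "num estimation nodes": 4,
        "num groups": 2,
        "avg group size": 2.0,
        "chi-squared value": 1.0,
        "avg degree": 1.5,
    }
]
statistics = parse_statistics(sample_rows)
```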

This contribution was marked as a “good first issue” with “Effort - Low” classification, making it an approachable entry point for new contributors while delivering meaningful functionality to the library.