Embedding-Based SQL Query Similarity for Spatiotemporal Data Exploration
Abstract
Estimating the similarity of SQL queries is a building block for query recommendation systems, workload analysis, and interactive data exploration. We introduce TESS (Tree-Embedding SQL Similarity), a query similarity algorithm that constructs a weighted embedding vector from the label tree of a query, capturing both its structure and its semantic content. We also present GESS (Global-Embedding SQL Similarity), a lightweight companion algorithm that embeds the full query text as a single vector. Our motivating application is query recommendation for spatiotemporal data exploration, where an analyst’s queries must be compared by semantic intent rather than by syntactic structure alone. We describe an evaluation framework and use it to perform an evaluation across five datasets and seven metrics. Our experiments show that TESS achieves superior performance across multiple metrics, making it well suited for interactive recommendation at scale.
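As a rough illustration of building one weighted vector from a query's label tree, the sketch below sums per-label embeddings over a small tree and compares two queries. It is not the TESS algorithm as defined in this paper. The `(label, children)` tree representation, the hash-based `toy_embed` stand-in for a real embedding model, the depth-decay weighting, and the use of cosine similarity are all assumptions made for the example.

```python
import hashlib
import math

DIM = 16  # toy embedding dimension (assumption)


def toy_embed(label):
    """Deterministic stand-in for a learned embedding model (hypothetical)."""
    digest = hashlib.sha256(label.lower().encode("utf-8")).digest()
    # Map the first DIM bytes of the hash to values in [-1, 1].
    return [(b - 127.5) / 127.5 for b in digest[:DIM]]


def tree_embedding(node, depth=0, decay=0.5):
    """Weighted sum of label embeddings over a (label, children) tree.

    Weighting each node by decay**depth is an illustrative assumption,
    not the weighting scheme of the paper.
    """
    label, children = node
    weight = decay ** depth
    vec = [weight * x for x in toy_embed(label)]
    for child in children:
        child_vec = tree_embedding(child, depth + 1, decay)
        vec = [a + b for a, b in zip(vec, child_vec)]
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hand-built label trees for three hypothetical queries.
q1 = ("SELECT", [("FROM", [("trips", [])]), ("WHERE", [("pickup_time", [])])])
q2 = ("SELECT", [("FROM", [("trips", [])]), ("WHERE", [("dropoff_time", [])])])
q3 = ("SELECT", [("FROM", [("weather", [])])])

if __name__ == "__main__":
    print(cosine(tree_embedding(q1), tree_embedding(q2)))
    print(cosine(tree_embedding(q1), tree_embedding(q3)))
```

Because `toy_embed` is hash-based, the resulting scores carry no semantic meaning. In practice a learned embedding model would take its place, and the tree would come from a SQL parser rather than being written by hand.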