sqlexercise

Generate SQL assignments based on specified SQL errors and difficulty levels.

Submodules

Attributes

ERROR_REQUIREMENTS_MAP

Mapping of SQL errors to their requirements.

Exceptions

ExerciseGenerationError

Custom exception for errors during exercise generation.

Classes

DifficultyLevel

Difficulty levels for SQL assignments.

Assignment

A full SQL assignment consisting of a dataset and exercises.

Dataset

A SQL dataset related to a specific domain, including schema creation and data insertion commands.

Exercise

A SQL exercise consisting of a title, request, and solutions.

SchemaConstraint

Base class for schema-related constraints.

QueryConstraint

Base class for query-related constraints.

SqlErrorRequirements

Requirements for generating an assignment likely to trigger a specific error

Functions

random_domain(language)

Select and return a random domain from predefined list.

generate_assignment(errors, db_host, db_port, db_user, ...)

Generate SQL assignments based on the given SQL errors and their corresponding difficulty levels.

Package Contents

class sqlexercise.DifficultyLevel

Bases: enum.IntEnum

Difficulty levels for SQL assignments.

EASY = 1

Minimal cognitive load, the assignments contains only elements related to triggering the error

MEDIUM = 2

Moderate cognitive load, the assignments contains some elements not related to triggering the error

HARD = 3

High cognitive load, the assignments contains elements not related to triggering the error and may require complex reasoning

sqlexercise.random_domain(language)

Select and return a random domain from predefined list.

Parameters:

language (str)

Return type:

str

class sqlexercise.Assignment

A full SQL assignment consisting of a dataset and exercises.

dataset: sqlexercise.assignments.dataset.Dataset

The dataset associated with the assignment.

exercises: list[sqlexercise.assignments.exercise.Exercise]

The exercises included in the assignment.

class sqlexercise.Dataset

A SQL dataset related to a specific domain, including schema creation and data insertion commands.

create_commands: list[str]

SQL commands to create the database schema.

insert_commands: list[str]

SQL commands to insert data into the database.

domain: str

The domain associated with the dataset.

_catalog_cache: sqlscope.Catalog | None = None

Cached SQLScope Catalog for the dataset.

_catalog_cache_commands_hash: int | None = None

Hash of the CREATE TABLE commands used to build the cached Catalog.

property catalog: sqlscope.Catalog

Build and return a SQLScope Catalog from the dataset’s SQL commands. The result is cached for handling multiple accesses efficiently. Cache is properly invalidated if the CREATE TABLE commands change.

Return type:

sqlscope.Catalog

to_sql_no_context()

Generate the SQL commands to create and populate the dataset without schema context.

Return type:

str

to_sql(schema)

Generate the SQL commands to create and populate the dataset within the specified schema.

Parameters:

schema (str)

Return type:

str

static from_sql(sql_str, sql_dialect)

Create a Dataset instance from a raw SQL string containing CREATE TABLE and INSERT INTO commands.

Parameters:
  • sql_str (str)

  • sql_dialect (str)

Return type:

Dataset

static generate(domain, sql_dialect, constraints, extra_details=[], *, db_host, db_port, db_user, db_password, language, max_attempts=5, on_attempt_start=lambda : ...)

Generate a SQL dataset based on the specified parameters.

Parameters:
  • domain (str)

  • sql_dialect (str)

  • constraints (collections.abc.Sequence[sqlexercise.constraints.SchemaConstraint])

  • extra_details (list[str])

  • db_host (str)

  • db_port (int)

  • db_user (str)

  • db_password (str)

  • language (str)

  • max_attempts (int)

  • on_attempt_start (Callable[[], None])

Return type:

Dataset

class sqlexercise.Exercise

A SQL exercise consisting of a title, request, and solutions.

title: str

The title of the exercise.

request: str

The natural language request or question for the exercise.

solutions: list[sqlscope.Query]

The list of SQL query solutions for the exercise.

difficulty: sqlexercise.difficulty_level.DifficultyLevel

The difficulty level of the exercise.

error: sqlerrors.SqlErrors

The SQL error type associated with the exercise.

static generate(error, difficulty, constraints, *, db_host, db_port, db_user, db_password, extra_details, dataset, title, sql_dialect, language, max_attempts=3, on_attempt_start=lambda : ...)

Generate a SQL exercise based on the specified parameters.

Parameters:
  • error (sqlerrors.SqlErrors)

  • difficulty (sqlexercise.difficulty_level.DifficultyLevel)

  • constraints (list[sqlexercise.constraints.QueryConstraint])

  • db_host (str)

  • db_port (int)

  • db_user (str)

  • db_password (str)

  • extra_details (str)

  • dataset (sqlexercise.assignments.dataset.Dataset)

  • title (str)

  • sql_dialect (str)

  • language (str)

  • max_attempts (int)

  • on_attempt_start (Callable[[], None])

Return type:

Exercise

class sqlexercise.SchemaConstraint

Bases: sqlexercise.constraints.base.BaseConstraint

Base class for schema-related constraints.

abstractmethod validate(catalog, tables_sql, values_sql)

Validate if the given table creation and insertion statements satisfy the constraint.

Args:

catalog (Catalog): The catalog representing the database schema. tables_sql (list[exp.Create]): List of CREATE TABLE expressions. values_sql (list[exp.Insert]): List of INSERT INTO expressions.

Raises:

ConstraintValidationError: If the schema does not satisfy the constraint.

Parameters:
  • catalog (sqlscope.Catalog)

  • tables_sql (list[sqlglot.exp.Create])

  • values_sql (list[sqlglot.exp.Insert])

Return type:

None

abstractmethod merge(other)

Merges this constraint with another constraint of the same type.

Parameters:

other (SchemaConstraint)

Return type:

SchemaConstraint

class sqlexercise.QueryConstraint

Bases: sqlexercise.constraints.base.BaseConstraint

Base class for query-related constraints.

abstractmethod validate(query)

Validate if the given SQL query satisfies the constraint.

Args:

query (Query): The SQL query to validate.

Raises:

ConstraintValidationError: If the query does not satisfy the constraint.

Parameters:

query (sqlscope.Query)

Return type:

None

class sqlexercise.SqlErrorRequirements(language)

Bases: abc.ABC

Requirements for generating an assignment likely to trigger a specific error

Parameters:

language (str)

dataset_constraints(difficulty)

Constraints the dataset must satisfy to likely trigger the error.

Parameters:

difficulty (sqlexercise.difficulty_level.DifficultyLevel)

Return type:

list[sqlexercise.constraints.SchemaConstraint]

exercise_constraints(difficulty)

Constraints the exercise must satisfy to likely trigger the error.

Parameters:

difficulty (sqlexercise.difficulty_level.DifficultyLevel)

Return type:

list[sqlexercise.constraints.QueryConstraint]

exercise_extra_details()

Additional details or instructions for the exercise.

Return type:

sqlexercise.translatable_text.TranslatableText

dataset_extra_details()

Additional details or instructions for the dataset.

Return type:

sqlexercise.translatable_text.TranslatableText

sqlexercise.ERROR_REQUIREMENTS_MAP: dict[sqlerrors.SqlErrors, type[base.SqlErrorRequirements]]

Mapping of SQL errors to their requirements.

exception sqlexercise.ExerciseGenerationError

Bases: Exception

Custom exception for errors during exercise generation.

sqlexercise.generate_assignment(errors, db_host, db_port, db_user, db_password, sql_dialect='postgres', *, language='en', domain=None, dataset_str=None, shuffle_exercises=False, naming_func=lambda error, difficulty: ..., max_dataset_attempts=3, max_exercise_attempts=3, max_unique_attempts=3, max_workers=None, on_domain_selection=lambda domain: ..., on_dataset_generation_progress=lambda n, m: ..., on_exercise_generation_progress=lambda n, m: ..., on_dataset_generation_success=lambda : ..., on_exercise_generation_success=lambda e, d: ..., on_exercise_generation_failure=lambda e, d: ...)

Generate SQL assignments based on the given SQL errors and their corresponding difficulty levels.

  • Exercises are returned in the same order as the input errors.

  • Logging happens as soon as possible (during generation), and each message uses the exercise title as its id.

  • Deduplication is global across all generated exercises (thread-safe).

Args:

errors (list[tuple[SqlErrors, DifficultyLevel]]): A list of (error, difficulty) pairs. sql_dialect (str): The SQL dialect to use for generating the dataset and exercises (e.g., ‘postgres’, ‘mysql’). domain (str | None): The domain for the assignments. If None, a random domain will be selected. language (str): The language for the assignment generation (e.g., ‘en’ for English). dataset_str (str | None): Optional SQL string to use as the dataset. If provided, it will be used instead of generating a new dataset. shuffle_exercises (bool): Whether to shuffle exercises to prevent ordering bias (shuffles input order). naming_func (Callable[[SqlErrors, DifficultyLevel], str]): Generates exercise titles. max_dataset_attempts (int): Maximum retries for generating a valid dataset before skipping. max_exercise_attempts (int): Maximum retries for generating a valid exercise before skipping. max_unique_attempts (int): Maximum retries to avoid duplicate solutions per (error, difficulty). max_workers (int | None): Thread pool size. If None, uses ThreadPoolExecutor default. on_domain_selection (Callable[[str], None]): Callback for when a domain is selected. on_dataset_generation_progress (Callable[[int, int], None]): Callback for dataset generation progress (current attempt, max attempts). on_exercise_generation_progress (Callable[[int, int], None]): Callback for exercise generation progress (current attempt, max attempts). on_dataset_generation_success (Callable[[], None]): Callback for successful dataset generation. on_exercise_generation_success (Callable[[SqlErrors, DifficultyLevel], None]): Callback for successful exercise generation. on_exercise_generation_failure (Callable[[SqlErrors, DifficultyLevel], None]): Callback for failed exercise generation.

Returns:

Assignment: The generated assignment (stable order).

Parameters:
  • errors (list[tuple[sqlerrors.SqlErrors, difficulty_level.DifficultyLevel]])

  • db_host (str)

  • db_port (int)

  • db_user (str)

  • db_password (str)

  • sql_dialect (str)

  • language (str)

  • domain (str | None)

  • dataset_str (str | None)

  • shuffle_exercises (bool)

  • naming_func (Callable[[sqlerrors.SqlErrors, difficulty_level.DifficultyLevel], str])

  • max_dataset_attempts (int)

  • max_exercise_attempts (int)

  • max_unique_attempts (int)

  • max_workers (int | None)

  • on_domain_selection (Callable[[str], None])

  • on_dataset_generation_progress (Callable[[int, int], None])

  • on_exercise_generation_progress (Callable[[int, int], None])

  • on_dataset_generation_success (Callable[[], None])

  • on_exercise_generation_success (Callable[[sqlerrors.SqlErrors, difficulty_level.DifficultyLevel], None])

  • on_exercise_generation_failure (Callable[[sqlerrors.SqlErrors, difficulty_level.DifficultyLevel], None])

Return type:

assignments.Assignment