sqlexercise
===========

.. py:module:: sqlexercise

.. autoapi-nested-parse::

   Generate SQL assignments based on specified SQL errors and difficulty levels.


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/sqlexercise/assignments/index
   /autoapi/sqlexercise/constraints/index
   /autoapi/sqlexercise/llm/index


Attributes
----------

.. autoapisummary::

   sqlexercise.ERROR_REQUIREMENTS_MAP


Exceptions
----------

.. autoapisummary::

   sqlexercise.ExerciseGenerationError


Classes
-------

.. autoapisummary::

   sqlexercise.DifficultyLevel
   sqlexercise.Assignment
   sqlexercise.Dataset
   sqlexercise.Exercise
   sqlexercise.SchemaConstraint
   sqlexercise.QueryConstraint
   sqlexercise.SqlErrorRequirements


Functions
---------

.. autoapisummary::

   sqlexercise.random_domain
   sqlexercise.generate_assignment


Package Contents
----------------

.. py:class:: DifficultyLevel

   Bases: :py:obj:`enum.IntEnum`

   Difficulty levels for SQL assignments.

   .. py:attribute:: EASY
      :value: 1

      Minimal cognitive load: the assignment contains only elements related to triggering the error.

   .. py:attribute:: MEDIUM
      :value: 2

      Moderate cognitive load: the assignment contains some elements not related to triggering the error.

   .. py:attribute:: HARD
      :value: 3

      High cognitive load: the assignment contains elements not related to triggering the error and may require complex reasoning.


.. py:function:: random_domain(language)

   Select and return a random domain from a predefined list.


.. py:class:: Assignment

   A full SQL assignment consisting of a dataset and exercises.

   .. py:attribute:: dataset
      :type: sqlexercise.assignments.dataset.Dataset

      The dataset associated with the assignment.

   .. py:attribute:: exercises
      :type: list[sqlexercise.assignments.exercise.Exercise]

      The exercises included in the assignment.


.. py:class:: Dataset

   A SQL dataset related to a specific domain, including schema creation and data insertion commands.

   .. py:attribute:: create_commands
      :type: list[str]

      SQL commands to create the database schema.

   .. py:attribute:: insert_commands
      :type: list[str]

      SQL commands to insert data into the database.

   .. py:attribute:: domain
      :type: str

      The domain associated with the dataset.

   .. py:attribute:: _catalog_cache
      :type: sqlscope.Catalog | None
      :value: None

      Cached SQLScope Catalog for the dataset.

   .. py:attribute:: _catalog_cache_commands_hash
      :type: int | None
      :value: None

      Hash of the CREATE TABLE commands used to build the cached Catalog.

   .. py:property:: catalog
      :type: sqlscope.Catalog

      Build and return a SQLScope Catalog from the dataset's SQL commands.

      The result is cached so that repeated accesses are cheap. The cache is invalidated if the CREATE TABLE commands change.

   .. py:method:: to_sql_no_context()

      Generate the SQL commands to create and populate the dataset without schema context.

   .. py:method:: to_sql(schema)

      Generate the SQL commands to create and populate the dataset within the specified schema.

   .. py:method:: from_sql(sql_str, sql_dialect)
      :staticmethod:

      Create a Dataset instance from a raw SQL string containing CREATE TABLE and INSERT INTO commands.

   .. py:method:: generate(domain, sql_dialect, constraints, extra_details = [], *, db_host, db_port, db_user, db_password, language, max_attempts = 5, on_attempt_start = lambda: None)
      :staticmethod:

      Generate a SQL dataset based on the specified parameters.


.. py:class:: Exercise

   A SQL exercise consisting of a title, request, and solutions.

   .. py:attribute:: title
      :type: str

      The title of the exercise.

   .. py:attribute:: request
      :type: str

      The natural language request or question for the exercise.

   .. py:attribute:: solutions
      :type: list[sqlscope.Query]

      The list of SQL query solutions for the exercise.

   .. py:attribute:: difficulty
      :type: sqlexercise.difficulty_level.DifficultyLevel

      The difficulty level of the exercise.

   .. py:attribute:: error
      :type: sqlerrors.SqlErrors

      The SQL error type associated with the exercise.

   .. py:method:: generate(error, difficulty, constraints, *, db_host, db_port, db_user, db_password, extra_details, dataset, title, sql_dialect, language, max_attempts = 3, on_attempt_start = lambda: None)
      :staticmethod:

      Generate a SQL exercise based on the specified parameters.


.. py:class:: SchemaConstraint

   Bases: :py:obj:`sqlexercise.constraints.base.BaseConstraint`

   Base class for schema-related constraints.

   .. py:method:: validate(catalog, tables_sql, values_sql)
      :abstractmethod:

      Validate whether the given table creation and insertion statements satisfy the constraint.

      Args:
          catalog (Catalog): The catalog representing the database schema.
          tables_sql (list[exp.Create]): List of CREATE TABLE expressions.
          values_sql (list[exp.Insert]): List of INSERT INTO expressions.

      Raises:
          ConstraintValidationError: If the schema does not satisfy the constraint.

   .. py:method:: merge(other)
      :abstractmethod:

      Merge this constraint with another constraint of the same type.


.. py:class:: QueryConstraint

   Bases: :py:obj:`sqlexercise.constraints.base.BaseConstraint`

   Base class for query-related constraints.

   .. py:method:: validate(query)
      :abstractmethod:

      Validate whether the given SQL query satisfies the constraint.

      Args:
          query (Query): The SQL query to validate.

      Raises:
          ConstraintValidationError: If the query does not satisfy the constraint.


.. py:class:: SqlErrorRequirements(language)

   Bases: :py:obj:`abc.ABC`

   Requirements for generating an assignment likely to trigger a specific error.

   .. py:method:: dataset_constraints(difficulty)

      Constraints the dataset must satisfy to be likely to trigger the error.

   .. py:method:: exercise_constraints(difficulty)

      Constraints the exercise must satisfy to be likely to trigger the error.

   .. py:method:: exercise_extra_details()

      Additional details or instructions for the exercise.

   .. py:method:: dataset_extra_details()

      Additional details or instructions for the dataset.


.. py:data:: ERROR_REQUIREMENTS_MAP
   :type: dict[sqlerrors.SqlErrors, type[base.SqlErrorRequirements]]

   Mapping of SQL errors to their requirements.


.. py:exception:: ExerciseGenerationError

   Bases: :py:obj:`Exception`

   Custom exception for errors during exercise generation.


.. py:function:: generate_assignment(errors, db_host, db_port, db_user, db_password, sql_dialect = 'postgres', *, language = 'en', domain = None, dataset_str = None, shuffle_exercises = False, naming_func = lambda error, difficulty: f'{error.name} - {difficulty.name}', max_dataset_attempts = 3, max_exercise_attempts = 3, max_unique_attempts = 3, max_workers = None, on_domain_selection = lambda domain: None, on_dataset_generation_progress = lambda n, m: None, on_exercise_generation_progress = lambda n, m: None, on_dataset_generation_success = lambda: None, on_exercise_generation_success = lambda e, d: None, on_exercise_generation_failure = lambda e, d: None)

   Generate SQL assignments based on the given SQL errors and their corresponding difficulty levels.

   - Exercises are returned in the same order as the input ``errors``.
   - Logging happens as soon as possible (during generation), and each message uses the exercise title as its id.
   - Deduplication is global across all generated exercises (thread-safe).

   Args:
       errors (list[tuple[SqlErrors, DifficultyLevel]]): A list of (error, difficulty) pairs.
       sql_dialect (str): The SQL dialect to use for generating the dataset and exercises (e.g., 'postgres', 'mysql').
       domain (str | None): The domain for the assignments. If None, a random domain will be selected.
       language (str): The language for the assignment generation (e.g., 'en' for English).
       dataset_str (str | None): Optional SQL string to use as the dataset. If provided, it will be used instead of generating a new dataset.
       shuffle_exercises (bool): Whether to shuffle exercises to prevent ordering bias (shuffles input order).
       naming_func (Callable[[SqlErrors, DifficultyLevel], str]): Generates exercise titles.
       max_dataset_attempts (int): Maximum retries for generating a valid dataset before skipping.
       max_exercise_attempts (int): Maximum retries for generating a valid exercise before skipping.
       max_unique_attempts (int): Maximum retries to avoid duplicate solutions per (error, difficulty).
       max_workers (int | None): Thread pool size. If None, uses ThreadPoolExecutor default.
       on_domain_selection (Callable[[str], None]): Callback for when a domain is selected.
       on_dataset_generation_progress (Callable[[int, int], None]): Callback for dataset generation progress (current attempt, max attempts).
       on_exercise_generation_progress (Callable[[int, int], None]): Callback for exercise generation progress (current attempt, max attempts).
       on_dataset_generation_success (Callable[[], None]): Callback for successful dataset generation.
       on_exercise_generation_success (Callable[[SqlErrors, DifficultyLevel], None]): Callback for successful exercise generation.
       on_exercise_generation_failure (Callable[[SqlErrors, DifficultyLevel], None]): Callback for failed exercise generation.

   Returns:
       Assignment: The generated assignment (stable order).
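
The callback and naming parameters of ``generate_assignment`` can be illustrated without a database or the package itself. The following is a minimal sketch, not the package's implementation: ``DifficultyLevel`` is re-declared locally with the documented values, ``DemoSqlError`` is a hypothetical stand-in for ``sqlerrors.SqlErrors`` (its member names are invented), and the progress callback is invoked by hand to simulate what the generator would presumably do on each attempt.

```python
from enum import Enum, IntEnum


class DifficultyLevel(IntEnum):
    """Local mirror of the documented values (EASY=1, MEDIUM=2, HARD=3)."""
    EASY = 1
    MEDIUM = 2
    HARD = 3


class DemoSqlError(Enum):
    """Hypothetical stand-in for sqlerrors.SqlErrors; member names are invented."""
    DEMO_ERROR_A = 1
    DEMO_ERROR_B = 2


def default_title(error, difficulty):
    # Same expression as the documented default value of naming_func.
    return f"{error.name} - {difficulty.name}"


progress_log = []


def on_exercise_progress(attempt, max_attempts):
    # Matches the documented Callable[[int, int], None] shape:
    # (current attempt, max attempts).
    progress_log.append(f"exercise attempt {attempt}/{max_attempts}")


# The documented shape of the `errors` argument: (error, difficulty) pairs.
errors = [
    (DemoSqlError.DEMO_ERROR_A, DifficultyLevel.EASY),
    (DemoSqlError.DEMO_ERROR_B, DifficultyLevel.HARD),
]

# Titles the default naming_func would produce for these pairs.
titles = [default_title(error, difficulty) for error, difficulty in errors]

# Keyword arguments one might pass on to generate_assignment(...).
callback_kwargs = {
    "naming_func": default_title,
    "on_exercise_generation_progress": on_exercise_progress,
    "max_exercise_attempts": 3,
}

# Simulate a single progress notification by calling the callback directly.
on_exercise_progress(1, callback_kwargs["max_exercise_attempts"])

print(titles)
print(progress_log)
```

With the real package, ``callback_kwargs`` would be passed alongside the required ``errors`` and database connection arguments; because the documentation states that generation uses a thread pool, callbacks that mutate shared state may need their own locking.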