Schema

Column

Bases: BaseModel

to_pandera_column()

Convert Column to Pandera Column for Pandera data validation. Note: The 'data_type' attribute should not be used in Pandera Column as we have our own custom data type validation.
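To illustrate the note about 'data_type', the following is a minimal sketch of how such a conversion might leave the data type out. It is not rAPId's implementation: `ColumnSketch` and `pandera_column_kwargs` are hypothetical names, and mapping `allow_null` to Pandera's `nullable` argument is an assumption. To stay dependency-free, the sketch only builds the keyword arguments that could be passed to `pandera.Column`.

```python
from dataclasses import dataclass


@dataclass
class ColumnSketch:
    """Hypothetical stand-in for rAPId's Column model (not the real class)."""
    name: str
    data_type: str
    allow_null: bool = True


def pandera_column_kwargs(column: ColumnSketch) -> dict:
    """Build keyword arguments for pandera.Column, leaving dtype unset."""
    return {
        "name": column.name,
        # Assumed mapping: allow_null controls Pandera's nullable flag.
        "nullable": column.allow_null,
        # No "dtype" key is emitted: the data type is validated separately.
    }


kwargs = pandera_column_kwargs(ColumnSketch(name="column_a", data_type="Float64"))
print(kwargs)
```

With Pandera installed, the resulting dictionary could be unpacked as `pandera.Column(**kwargs)`.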

Schema

Bases: BaseModel

A Schema is a Pydantic class representing a rAPId schema. It allows you to programmatically define a schema to generate, create and update within rAPId.

Example

A Schema can be created by setting the values directly on the classes, as in the example below:

schema = Schema(
    metadata=SchemaMetadata(
        layer='default',
        domain="domain",
        dataset="dataset",
        sensitivity=SensitivityLevel.PUBLIC,
        owners=[Owner(name="test", email="test@email.com")]
    ),
    columns=[
        Column(
            name="column_a",
            data_type="Float64",
            allow_null=True
        )
    ]
)

Alternatively, you can create a schema directly from a Python dictionary that specifies the values, as in the example below:

schema = Schema(
    **{
        "metadata": {
            ....
        },
        "columns": [
            ....
        ]
    }
)
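As a rough illustration of the dictionary form, the sketch below restates the values from the first example as plain Python data. The key names mirror the keyword arguments shown earlier. The owner keys and the string form of the sensitivity level are assumptions, not confirmed by this page. The final construction step is left as a comment so the sketch runs without the SDK installed.

```python
import json

# Values taken from the literal example, expressed as plain data.
schema_definition = {
    "metadata": {
        "layer": "default",
        "domain": "domain",
        "dataset": "dataset",
        "sensitivity": "PUBLIC",  # assumed string form of SensitivityLevel.PUBLIC
        "owners": [{"name": "test", "email": "test@email.com"}],  # owner keys are an assumption
    },
    # Columns are a list, one dictionary per column.
    "columns": [
        {"name": "column_a", "data_type": "Float64", "allow_null": True},
    ],
}

# Because the definition is plain data, it can be stored as JSON and loaded back.
restored = json.loads(json.dumps(schema_definition))

# With the rAPId SDK installed, the dictionary would then be unpacked into the model:
# schema = Schema(**restored)
```

Keeping the definition as plain data makes it easy to store alongside other configuration and to load it back when the schema needs to be created or updated.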

are_columns_the_same(new_columns)

Checks whether the Schema's columns match the columns passed into this function.

Parameters:

Name: new_columns
Type: Union[List[Column], List[dict]]
Description: The new columns, passed either as a list of Column instances or as a list of Python dictionaries holding the column values. If the latter is chosen and an incorrect value is passed, the function raises a rapid.exceptions.ColumnNotDifferentException.
Default: required

Returns:

Type: bool
Description: True if the new columns match the columns in the Schema, otherwise False.
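The comparison described above can be sketched with standard-library dataclasses. This is an illustration of the idea only, not rAPId's code: `ColumnSketch` and `columns_match` are hypothetical names, and treating the comparison as order-sensitive is an assumption.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class ColumnSketch:
    """Hypothetical stand-in for rAPId's Column model (not the real class)."""
    name: str
    data_type: str
    allow_null: bool = True


def columns_match(
    current: List[ColumnSketch],
    new_columns: Union[List[ColumnSketch], List[dict]],
) -> bool:
    """Return True when new_columns describe the same columns as current."""
    # Normalise dictionaries into ColumnSketch objects; an unknown key raises TypeError.
    normalised = [
        col if isinstance(col, ColumnSketch) else ColumnSketch(**col)
        for col in new_columns
    ]
    # Dataclasses compare field by field, so list equality checks order and values.
    return current == normalised


current = [ColumnSketch(name="column_a", data_type="Float64", allow_null=True)]
same = columns_match(
    current, [{"name": "column_a", "data_type": "Float64", "allow_null": True}]
)
different = columns_match(
    current, [{"name": "column_a", "data_type": "Int64", "allow_null": True}]
)
print(same, different)
```

Normalising both input forms into one type before comparing keeps the equality check itself trivial.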

pandera_validate(df, **kwargs)

Validate a DataFrame using Pandera based on the schema's column definitions and checks.

Parameters:

Name: df
Description: The pandas DataFrame to validate.
Default: required

Name: **kwargs
Description: Additional arguments to pass to Pandera's validate method (e.g., lazy=True).
Default: {}

Returns:

Description: The validated DataFrame.

Raises:

Type: SchemaErrors
Description: If validation fails.
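The `lazy=True` option mentioned in the parameters changes how failures are reported: in Pandera, lazy validation collects every failure before raising, while the default stops at the first one. The standard-library sketch below illustrates that difference only. `ValidationFailures` and `validate_rows` are hypothetical names and do not reproduce Pandera's or rAPId's API.

```python
from typing import Callable, Dict, List


class ValidationFailures(Exception):
    """Hypothetical error that carries every failure found so far."""

    def __init__(self, failures: List[str]):
        super().__init__("; ".join(failures))
        self.failures = failures


def validate_rows(
    rows: List[dict],
    checks: Dict[str, Callable[[object], bool]],
    lazy: bool = False,
) -> List[dict]:
    """Return rows unchanged if every check passes, otherwise raise."""
    failures: List[str] = []
    for index, row in enumerate(rows):
        for column, check in checks.items():
            if not check(row.get(column)):
                failures.append(f"row {index}: column '{column}' failed")
                if not lazy:
                    # Eager mode: stop at the first problem.
                    raise ValidationFailures(failures)
    if failures:
        # Lazy mode: report everything that went wrong at once.
        raise ValidationFailures(failures)
    return rows


rows = [{"column_a": 1.5}, {"column_a": "oops"}, {"column_a": "bad"}]
checks = {"column_a": lambda value: value is None or isinstance(value, float)}

try:
    validate_rows(rows, checks, lazy=True)
except ValidationFailures as err:
    lazy_failures = err.failures

try:
    validate_rows(rows, checks, lazy=False)
except ValidationFailures as err:
    eager_failures = err.failures

print(len(lazy_failures), len(eager_failures))
```

Collecting all failures in one pass is usually more convenient when cleaning a dataset, because every problem can be fixed before validating again.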