Wicker Schemas

Every Wicker dataset has an associated schema, which is declared at dataset write-time.

Wicker schemas are Python objects that are serialized to storage as Avro-compatible JSON files. To declare a schema, use the wicker.schema.DatasetSchema object:

from wicker.schema import DatasetSchema

my_schema = DatasetSchema(
    primary_keys=["foo", "bar"],
    fields=[...],
)
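
To give a feel for what "Avro-compatible JSON" means, here is a minimal sketch that builds an Avro-style record schema by hand using only the standard library. The record name, the field names, and the overall layout are illustrative assumptions; the file Wicker actually writes is not shown in this section and may carry additional Wicker-specific metadata.

```python
import json

# Hypothetical hand-written Avro record schema; this is NOT produced by Wicker.
avro_style_schema = {
    "type": "record",
    "name": "my_dataset",  # assumed name, for illustration only
    "fields": [
        {"name": "foo", "type": "string"},
        {"name": "bar", "type": "int"},
        # Avro expresses an optional field as a union that includes "null".
        {"name": "score", "type": ["null", "double"]},
    ],
}

# Serialize to JSON text and read it back, as a storage round trip would.
serialized = json.dumps(avro_style_schema, indent=2, sort_keys=True)
restored = json.loads(serialized)

field_names = [field["name"] for field in restored["fields"]]
print(field_names)
```

Because the schema is plain JSON, it can be inspected with any JSON tooling, independently of the Python objects that declared it.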

Every schema must declare its primary_keys. Each primary key must be the name of a string, float, int, or bool field in the schema, and the primary keys are used to order the dataset.
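
Ordering by primary keys can be pictured with plain Python. This sketch assumes rows are sorted lexicographically by the tuple of primary-key values, taken in the order the keys are declared; the source does not spell out the exact comparison Wicker uses. The sample rows and the helper primary_key_tuple are made up for illustration.

```python
from typing import Any, Dict, List, Tuple

primary_keys = ["foo", "bar"]

# Made-up rows; "payload" stands in for any non-key field.
rows: List[Dict[str, Any]] = [
    {"foo": "b", "bar": 1, "payload": "x"},
    {"foo": "a", "bar": 2, "payload": "y"},
    {"foo": "a", "bar": 1, "payload": "z"},
]


def primary_key_tuple(row: Dict[str, Any]) -> Tuple[Any, ...]:
    """Hypothetical helper: collect a row's primary-key values in declared order."""
    return tuple(row[key] for key in primary_keys)


# Assumed ordering: compare "foo" first, then break ties with "bar".
ordered = sorted(rows, key=primary_key_tuple)
ordered_keys = [primary_key_tuple(row) for row in ordered]
print(ordered_keys)
```

Under this assumption, restricting primary keys to simple comparable types (string, float, int, bool) is what keeps such a sort well defined.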

Schema Fields

Wicker provides the schema fields listed below. Notably, users can define custom fields by implementing their own codec and passing it to an ObjectField.
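
A codec is, at its core, a pair of operations that turn an object into bytes and back. The sketch below shows that idea with a stand-alone class. It deliberately does not subclass wicker.schema.codecs.Codec, whose required methods are not listed in this section, so the class name JsonPointCodec and the method names encode and decode are assumptions for illustration only.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Example user-defined object to be stored in a dataset."""
    x: float
    y: float


class JsonPointCodec:
    """Hypothetical stand-in for a codec; not Wicker's actual Codec interface."""

    def encode(self, obj: Point) -> bytes:
        # Validate the type before encoding so bad rows fail early.
        if not isinstance(obj, Point):
            raise TypeError(f"expected Point, got {type(obj).__name__}")
        return json.dumps({"x": obj.x, "y": obj.y}).encode("utf-8")

    def decode(self, data: bytes) -> Point:
        raw = json.loads(data.decode("utf-8"))
        return Point(x=raw["x"], y=raw["y"])


codec = JsonPointCodec()
encoded = codec.encode(Point(1.5, -2.0))
decoded = codec.decode(encoded)
```

In a real schema, an object implementing Wicker's own Codec interface would be passed as the codec argument of ObjectField.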

class wicker.schema.schema.ArrayField(element_field: wicker.schema.schema.SchemaField, required: bool = True)

Bases: wicker.schema.schema.SchemaField

A field that contains an array of values that all share the same schema type

class wicker.schema.schema.BoolField(name: str, description: str = '', required: bool = True, custom_field_tags: Dict[str, str] = {})

Bases: wicker.schema.schema.SchemaField

A field that stores a boolean

class wicker.schema.schema.BytesField(name: str, description: str = '', required: bool = True, is_heavy_pointer: bool = True)

Bases: wicker.schema.schema.ObjectField

An ObjectField that uses a no-op Codec to store raw bytes

class wicker.schema.schema.DoubleField(name: str, description: str = '', required: bool = True, custom_field_tags: Dict[str, str] = {})

Bases: wicker.schema.schema.SchemaField

A field that stores a 64-bit double

class wicker.schema.schema.FloatField(name: str, description: str = '', required: bool = True, custom_field_tags: Dict[str, str] = {})

Bases: wicker.schema.schema.SchemaField

A field that stores a 32-bit float

class wicker.schema.schema.IntField(name: str, description: str = '', required: bool = True, custom_field_tags: Dict[str, str] = {})

Bases: wicker.schema.schema.SchemaField

A field that stores a 32-bit int

class wicker.schema.schema.LongField(name: str, description: str = '', required: bool = True, custom_field_tags: Dict[str, str] = {})

Bases: wicker.schema.schema.SchemaField

A field that stores a 64-bit long

class wicker.schema.schema.NumpyField(name: str, shape: Optional[Tuple[int, ...]], dtype: str, description: str = '', required: bool = True, is_heavy_pointer: bool = True)

Bases: wicker.schema.schema.ObjectField

An ObjectField that uses a Codec for encoding Numpy arrays

class wicker.schema.schema.ObjectField(name: str, codec: wicker.schema.codecs.Codec, description: str = '', required: bool = True, is_heavy_pointer: bool = True)

Bases: wicker.schema.schema.SchemaField

A field that contains objects of a specific type.

property is_heavy_pointer: bool

Whether to store this field as a separate file
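
One way to picture the distinction this flag draws: a "heavy" value is written to its own file and the row keeps only a reference to it, while any other value stays inline in the row. The sketch below is a toy model of that idea, not Wicker's storage code; the function store_value, the file naming, and the row layout are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, Union

Row = Dict[str, Union[bytes, str]]


def store_value(row: Row, name: str, data: bytes,
                is_heavy_pointer: bool, directory: Path) -> None:
    """Hypothetical helper: store data inline or as a separate file."""
    if is_heavy_pointer:
        # Write the bytes to their own file and keep only the path in the row.
        path = directory / f"{name}.bin"
        path.write_bytes(data)
        row[name] = str(path)
    else:
        # Small values stay inline alongside the other columns.
        row[name] = data


with tempfile.TemporaryDirectory() as tmp:
    row: Row = {}
    store_value(row, "image", b"\x00" * 1024, True, Path(tmp))
    store_value(row, "thumbnail_hash", b"abc", False, Path(tmp))
    image_reference = row["image"]
    heavy_size = Path(image_reference).stat().st_size
    inline_value = row["thumbnail_hash"]
```

Keeping large payloads out of the row means a reader that only needs the light columns never has to load the heavy bytes.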

class wicker.schema.schema.RecordField(name: str, fields: List[wicker.schema.schema.SchemaField], description: str = '', required: bool = True, custom_field_tags: Dict[str, str] = {}, top_level: bool = False)

Bases: wicker.schema.schema.SchemaField

A field that contains nested fields

class wicker.schema.schema.StringField(name: str, description: str = '', required: bool = True, custom_field_tags: Dict[str, str] = {})

Bases: wicker.schema.schema.SchemaField

A field that stores a string