Schema
Generate
To upload a dataset for the first time, you must first define its schema. This endpoint is provided for convenience: it generates a schema from an existing dataset.
Only the first 50MB of the uploaded file are used to infer the schema, regardless of the file's size. Consider uploading a representative sample of your dataset (e.g. the first 10,000 rows) rather than the entire file, which could take a long time to upload.
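The sampling suggestion above can be sketched with the Python standard library. This is a minimal sketch that assumes a UTF-8 CSV file with a header row; the function name `write_csv_sample` is hypothetical, and whether the first rows are representative of your data is something you need to judge yourself.

```python
import csv
from itertools import islice
from pathlib import Path


def write_csv_sample(source: Path, target: Path, max_rows: int = 10_000) -> int:
    """Copy the header and at most `max_rows` data rows from `source` to `target`.

    Returns the number of data rows written (the header is not counted).
    """
    with source.open(newline="", encoding="utf-8") as src, \
            target.open("w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:  # empty file: nothing to sample
            return 0
        writer.writerow(header)
        written = 0
        # islice stops after max_rows rows without reading the rest of the file
        for row in islice(reader, max_rows):
            writer.writerow(row)
            written += 1
        return written
```

For example, `write_csv_sample(Path("train_journeys.csv"), Path("train_journeys_sample.csv"))` would produce a smaller file that can be sent to the generate endpoint in place of the full dataset.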
Permissions
Any
Path
POST /schema/{layer}/{sensitivity}/{domain}/{dataset}/generate
Inputs
Parameters | Usage | Example values | Definition |
---|---|---|---|
layer | URL parameter | default | layer of the dataset |
sensitivity | URL parameter | PUBLIC, PRIVATE, PROTECTED | sensitivity of the dataset |
domain | URL parameter | land | domain of the dataset |
dataset | URL parameter | train_journeys | dataset title |
file | File in form data with key value file | train_journeys.csv | the dataset file itself |
Outputs
The generated schema, in JSON format, in the response body:
{
  "metadata": {
    "layer": "default",
    "domain": "land",
    "dataset": "train_journeys",
    "sensitivity": "PUBLIC",
    "key_value_tags": {},
    "key_only_tags": [],
    "owners": [
      {
        "name": "change_me",
        "email": "change_me@email.com"
      }
    ],
    "update_behaviour": "APPEND"
  },
  "columns": [
    {
      "name": "date",
      "partition_index": 0,
      "data_type": "date",
      "format": "%d/%m/%Y",
      "allow_null": false
    },
    {
      "name": "num_journeys",
      "partition_index": null,
      "data_type": "integer",
      "allow_null": false
    }
  ]
}
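The generated example contains `change_me` placeholders in the `owners` list, which presumably need replacing with real contact details before the schema is uploaded. A minimal sketch of that step follows; the helper names `set_owners` and `has_placeholder_owner` are hypothetical, and treating `change_me` as the only placeholder is an assumption based on the example output above.

```python
import copy

PLACEHOLDER = "change_me"  # placeholder text seen in the generated example (assumption)


def set_owners(schema: dict, owners: list) -> dict:
    """Return a copy of `schema` with metadata.owners replaced by `owners`."""
    for owner in owners:
        if any(PLACEHOLDER in str(value) for value in owner.values()):
            raise ValueError(f"owner still contains a placeholder: {owner}")
    updated = copy.deepcopy(schema)  # leave the generated schema untouched
    updated["metadata"]["owners"] = [dict(owner) for owner in owners]
    return updated


def has_placeholder_owner(schema: dict) -> bool:
    """True if any owner field still contains the placeholder text."""
    return any(
        PLACEHOLDER in str(value)
        for owner in schema["metadata"].get("owners", [])
        for value in owner.values()
    )
```

Calling `has_placeholder_owner` on the result before uploading gives a quick local check that the schema no longer carries the generated defaults.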
Upload
Once you have a schema definition, use this endpoint to upload it. You can then upload datasets that match the schema.
Permissions
DATA_ADMIN
Path
POST /schema
Inputs
Parameters | Usage | Example values | Definition |
---|---|---|---|
schema | JSON request body | see below | the schema definition |
Example schema JSON body:
{
  "metadata": {
    "layer": "default",
    "domain": "land",
    "dataset": "train_journeys",
    "sensitivity": "PUBLIC",
    "key_value_tags": {
      "train": "passenger"
    },
    "key_only_tags": ["land"],
    "owners": [
      {
        "name": "Stanley Shunpike",
        "email": "stan.shunpike@email.com"
      }
    ],
    "update_behaviour": "APPEND"
  },
  "columns": [
    {
      "name": "date",
      "partition_index": 0,
      "data_type": "date",
      "format": "%d/%m/%Y",
      "allow_null": false
    },
    {
      "name": "num_journeys",
      "partition_index": null,
      "data_type": "integer",
      "allow_null": false
    }
  ]
}
Outputs
None
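As a rough illustration, the schema upload request can be prepared with the Python standard library. This is only a sketch: the base URL `https://rapid.example.com` is hypothetical, and since this section does not describe authentication, any credentials have to be supplied by the caller through `extra_headers`.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://rapid.example.com"  # hypothetical host, replace with your instance


def build_schema_upload_request(
    schema: dict,
    base_url: str = BASE_URL,
    extra_headers: Optional[dict] = None,
) -> urllib.request.Request:
    """Prepare (but do not send) a POST /schema request with a JSON body."""
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})  # e.g. authentication headers (assumption)
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/schema",
        data=json.dumps(schema).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

The prepared request can then be sent with `urllib.request.urlopen(request)`; building it separately makes it easy to inspect the URL, headers, and body before anything goes over the network.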
Update (new dataset version)
This endpoint is for uploading an updated schema definition. This will allow you to subsequently upload datasets that match the updated schema.
Permissions
Any relevant WRITE permission that matches the dataset's sensitivity level, e.g. WRITE_ALL, WRITE_PUBLIC, WRITE_PRIVATE, WRITE_PROTECTED_{DOMAIN}.
Path
POST /datasets/{layer}/{domain}/{dataset}
Inputs
Parameters | Required | Usage | Example values | Definition |
---|---|---|---|---|
layer | True | URL parameter | default | layer of the dataset |
domain | True | URL parameter | air | domain of the dataset |
dataset | True | URL parameter | passengers_by_airport | dataset title |
version | False | Query parameter | 3 | dataset version |
file | True | File in form data with key value file | passengers_by_airport.csv | the dataset file itself |
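The path and inputs listed above can be assembled roughly as follows. This is a sketch using only the Python standard library; the helper names are hypothetical, the base URL passed in is up to the caller, and the `application/octet-stream` part type is an assumption rather than something this section specifies.

```python
import uuid
from typing import Optional, Tuple
from urllib.parse import quote, urlencode


def build_upload_url(
    base_url: str, layer: str, domain: str, dataset: str, version: Optional[int] = None
) -> str:
    """Build the /datasets/{layer}/{domain}/{dataset} URL with an optional version query."""
    path = "/".join(quote(part, safe="") for part in (layer, domain, dataset))
    url = f"{base_url.rstrip('/')}/datasets/{path}"
    if version is not None:  # version is the only optional input in the table
        url += "?" + urlencode({"version": version})
    return url


def encode_multipart_file(
    filename: str, content: bytes, field: str = "file"
) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data under the form key `field`.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",  # assumed part type
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ])
    return body, f"multipart/form-data; boundary={boundary}"
```

The returned body and content type can be passed to any HTTP client as the payload of a POST request to the URL from `build_upload_url`.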
Outputs
If successful, returns the file name with a timestamp included, e.g.:
{
  "details": {
    "original_filename": "the-filename.csv",
    "raw_filename": "661c9467-5d0e-4ec7-ad05-b8651598b675.csv",
    "dataset_version": 3,
    "status": "Data processing",
    "job_id": "3bd7d98f-2264-4f88-bd65-5a2089161650"
  }
}
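A client will typically want to keep the `job_id` and `dataset_version` from this response. The sketch below parses the example body into a small typed record; the class and function names are hypothetical, and it assumes every field shown in the example is always present.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class UploadDetails:
    original_filename: str
    raw_filename: str
    dataset_version: int
    status: str
    job_id: str


def parse_upload_response(body: str) -> UploadDetails:
    """Parse the JSON response body into an UploadDetails record.

    Raises KeyError if an expected field is missing.
    """
    details = json.loads(body)["details"]
    return UploadDetails(
        original_filename=details["original_filename"],
        raw_filename=details["raw_filename"],
        dataset_version=int(details["dataset_version"]),
        status=details["status"],
        job_id=details["job_id"],
    )
```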