Schema

Generate

Before you can upload a dataset for the first time, you need to define its schema. This convenience endpoint generates a schema from an existing dataset file.

Only the first 50MB of the uploaded file are used to infer the schema, regardless of the file's size. Consider uploading a representative sample of your dataset (e.g. the first 10,000 rows) rather than the entire file, which could take a long time.
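One way to prepare such a sample is sketched below. It assumes the dataset is a CSV file with a header row; the function name `write_sample` and the file names in the usage note are illustrative and not part of the API.

```python
import csv
from pathlib import Path


def write_sample(source: Path, target: Path, max_rows: int = 10_000,
                 encoding: str = "utf-8") -> int:
    """Copy the header and up to max_rows data rows from source to target.

    Returns the number of data rows written (the header is not counted).
    Assumes the first row of the CSV is a header row.
    """
    written = 0
    with source.open(newline="", encoding=encoding) as src, \
            target.open("w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            # Empty source file: nothing to sample.
            return 0
        writer.writerow(header)
        for row in reader:
            if written >= max_rows:
                break
            writer.writerow(row)
            written += 1
    return written
```

For example, `write_sample(Path("train_journeys.csv"), Path("train_journeys_sample.csv"))` would produce a file that can then be sent to the generate endpoint in place of the full dataset.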

Permissions

Any

Path

POST /schema/{layer}/{sensitivity}/{domain}/{dataset}/generate

Inputs

| Parameters | Usage | Example values | Definition |
|---|---|---|---|
| layer | URL parameter | default | layer of the dataset |
| sensitivity | URL parameter | PUBLIC, PRIVATE, PROTECTED | sensitivity of the dataset |
| domain | URL parameter | land | domain of the dataset |
| dataset | URL parameter | train_journeys | dataset title |
| file | File in form data with key value file | train_journeys.csv | the dataset file itself |

Outputs

Schema in JSON format in the response body:

{
  "metadata": {
    "layer": "default",
    "domain": "land",
    "dataset": "train_journeys",
    "sensitivity": "PUBLIC",
    "key_value_tags": {},
    "key_only_tags": [],
    "owners": [
      {
        "name": "change_me",
        "email": "change_me@email.com"
      }
    ],
    "update_behaviour": "APPEND"
  },
  "columns": [
    {
      "name": "date",
      "partition_index": 0,
      "data_type": "date",
      "format": "%d/%m/%Y",
      "allow_null": false
    },
    {
      "name": "num_journeys",
      "partition_index": null,
      "data_type": "integer",
      "allow_null": false
    }
  ]
}
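The generated schema contains placeholder owner details (`change_me`) and empty tags, which presumably need replacing before the schema is uploaded. The sketch below shows one way to do that on the parsed JSON; the helper name `fill_in_metadata` is hypothetical and the function only touches the metadata fields shown in the example above.

```python
import copy


def fill_in_metadata(schema, owners, key_value_tags=None, key_only_tags=None):
    """Return a copy of a generated schema with owners and tags filled in.

    The input schema is left unchanged. Tags are only overwritten when a
    value is passed for them.
    """
    updated = copy.deepcopy(schema)
    metadata = updated["metadata"]
    metadata["owners"] = [dict(owner) for owner in owners]
    if key_value_tags is not None:
        metadata["key_value_tags"] = dict(key_value_tags)
    if key_only_tags is not None:
        metadata["key_only_tags"] = list(key_only_tags)
    return updated


# Usage with a trimmed-down version of the generated schema.
generated = {
    "metadata": {
        "owners": [{"name": "change_me", "email": "change_me@email.com"}],
        "key_value_tags": {},
        "key_only_tags": [],
    },
    "columns": [],
}
ready = fill_in_metadata(
    generated,
    owners=[{"name": "Stanley Shunpike", "email": "stan.shunpike@email.com"}],
    key_value_tags={"train": "passenger"},
    key_only_tags=["land"],
)
```

Working on a deep copy keeps the generated schema available for comparison if the edited version is rejected.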

Upload

Once you have a schema definition, use this endpoint to upload it. You can then upload datasets that match the schema.

Permissions

DATA_ADMIN

Path

POST /schema

Inputs

| Parameters | Usage | Example values | Definition |
|---|---|---|---|
| schema | JSON request body | see below | the schema definition |

Example schema JSON body:

{
  "metadata": {
    "layer": "default",
    "domain": "land",
    "dataset": "train_journeys",
    "sensitivity": "PUBLIC",
    "key_value_tags": {
      "train": "passenger"
    },
    "key_only_tags": ["land"],
    "owners": [
      {
        "name": "Stanley Shunpike",
        "email": "stan.shunpike@email.com"
      }
    ],
    "update_behaviour": "APPEND"
  },
  "columns": [
    {
      "name": "date",
      "partition_index": 0,
      "data_type": "date",
      "format": "%d/%m/%Y",
      "allow_null": false
    },
    {
      "name": "num_journeys",
      "partition_index": null,
      "data_type": "integer",
      "allow_null": false
    }
  ]
}

Outputs

None
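A minimal sketch of building the `POST /schema` request with Python's standard library follows. The base URL `https://rapid.example.com` is a placeholder, and authentication is not covered in this section, so any required headers are passed in by the caller as an assumption. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_schema_upload_request(base_url, schema, extra_headers=None):
    """Build (but do not send) a POST /schema request with a JSON body.

    extra_headers is a hypothetical hook for whatever authentication
    headers your instance requires.
    """
    headers = {"Content-Type": "application/json"}
    headers.update(extra_headers or {})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/schema",
        data=json.dumps(schema).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Trimmed-down schema, for illustration only.
example_schema = {"metadata": {"domain": "land", "dataset": "train_journeys"}, "columns": []}
request = build_schema_upload_request("https://rapid.example.com", example_schema)
# To send it: urllib.request.urlopen(request)
```

Separating request construction from sending makes it easy to inspect the URL and body before anything reaches the server.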

Update (new dataset version)

This endpoint is for uploading an updated schema definition. This will allow you to subsequently upload datasets that match the updated schema.

Permissions

Any relevant WRITE permission that matches the dataset's sensitivity level, e.g. WRITE_ALL, WRITE_PUBLIC, WRITE_PRIVATE, WRITE_PROTECTED_{DOMAIN}.

Path

POST /datasets/{layer}/{domain}/{dataset}

Inputs

| Parameters | Required | Usage | Example values | Definition |
|---|---|---|---|---|
| layer | True | URL parameter | default | layer of the dataset |
| domain | True | URL parameter | air | domain of the dataset |
| dataset | True | URL parameter | passengers_by_airport | dataset title |
| version | False | Query parameter | 3 | dataset version |
| file | True | File in form data with key value file | passengers_by_airport.csv | the dataset file itself |
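The sketch below shows how the URL and query parameters listed above could be combined into a request URL, with `version` added only when it is supplied. The base URL is a placeholder and `build_dataset_url` is a hypothetical helper, not part of the API.

```python
from urllib.parse import quote, urlencode


def build_dataset_url(base_url, layer, domain, dataset, version=None):
    """Build the URL for POST /datasets/{layer}/{domain}/{dataset}.

    Each path segment is percent-encoded; the optional version is sent
    as a query parameter.
    """
    segments = "/".join(quote(part, safe="") for part in (layer, domain, dataset))
    url = f"{base_url.rstrip('/')}/datasets/{segments}"
    if version is not None:
        url += "?" + urlencode({"version": version})
    return url


url = build_dataset_url(
    "https://rapid.example.com", "default", "air", "passengers_by_airport", version=3
)
```

The file itself would then be sent as multipart form data under the key `file`, using whichever HTTP client you prefer.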

Outputs

If successful, the response body contains the upload details, including the raw filename, dataset version and job ID, e.g.:

{
  "details": {
    "original_filename": "the-filename.csv",
    "raw_filename": "661c9467-5d0e-4ec7-ad05-b8651598b675.csv",
    "dataset_version": 3,
    "status": "Data processing",
    "job_id": "3bd7d98f-2264-4f88-bd65-5a2089161650"
  }
}
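A client will typically want the `job_id` and `dataset_version` from this response. The following sketch parses a body of the shape shown above into a typed record; the `UploadDetails` type is an illustrative assumption, and the field names are taken from the example response.

```python
import json
from typing import NamedTuple


class UploadDetails(NamedTuple):
    original_filename: str
    raw_filename: str
    dataset_version: int
    status: str
    job_id: str


def parse_upload_response(body: str) -> UploadDetails:
    """Extract the "details" object from a response body.

    Raises KeyError if "details" or any expected field is missing.
    """
    details = json.loads(body)["details"]
    return UploadDetails(**{field: details[field] for field in UploadDetails._fields})


response_body = """
{
  "details": {
    "original_filename": "the-filename.csv",
    "raw_filename": "661c9467-5d0e-4ec7-ad05-b8651598b675.csv",
    "dataset_version": 3,
    "status": "Data processing",
    "job_id": "3bd7d98f-2264-4f88-bd65-5a2089161650"
  }
}
"""
details = parse_upload_response(response_body)
```

Failing loudly on a missing field is a deliberate choice here, since a response without a `job_id` cannot be followed up.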