Dataset

Upload

Once a schema has been uploaded, you can upload data that matches it. Uploading a CSV or Parquet file via this endpoint validates the data against the schema and ensures it is consistent and sanitised. Any errors detected during upload are returned in the response so that you can fix them.

Permissions

You will need a relevant WRITE permission that matches the dataset sensitivity level, e.g.: WRITE_ALL, WRITE_PUBLIC, WRITE_PRIVATE, WRITE_PROTECTED_{DOMAIN}.

Path

POST /datasets/{layer}/{domain}/{dataset}

Inputs

Parameters | Required | Usage | Example values | Definition
layer | True | URL parameter | default | layer of the dataset
domain | True | URL parameter | air | domain of the dataset
dataset | True | URL parameter | passengers_by_airport | dataset title
version | False | Query parameter | 3 | dataset version
file | True | File in form data with key "file" | passengers_by_airport.csv | the dataset file itself

Outputs

If successful, returns the original and raw file names along with the dataset version, processing status and job ID, e.g.:

{
  "details": {
    "original_filename": "the-filename.csv",
    "raw_filename": "661c9467-5d0e-4ec7-ad05-b8651598b675.csv",
    "dataset_version": 3,
    "status": "Data processing",
    "job_id": "3bd7d98f-2264-4f88-bd65-5a2089161650"
  }
}
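
For illustration, a minimal Python sketch of an upload call using the requests library. The base URL and bearer token are assumptions and will depend on your deployment; the path, version query parameter and file form key follow the inputs table above.

import requests

BASE_URL = "https://rapid.example.com"  # assumption: your deployment's base URL
TOKEN = "..."  # assumption: a token holding the relevant WRITE permission

# Upload a CSV file to default/air/passengers_by_airport, targeting version 3
with open("passengers_by_airport.csv", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/datasets/default/air/passengers_by_airport",
        params={"version": 3},
        headers={"Authorization": f"Bearer {TOKEN}"},
        files={"file": f},  # form-data key must be "file"
    )

print(response.json())  # includes raw_filename, dataset_version, status and job_id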

Delete

Use this endpoint to delete all the contents linked to a layer/domain/dataset. It deletes the table, the raw data, the uploaded data and all schemas. When all valid items in the layer/domain/dataset have been deleted, a success message is returned.

Permissions

DATA_ADMIN

Path

DELETE /datasets/{layer}/{domain}/{dataset}

Inputs

Parameters | Required | Usage | Example values | Definition
layer | True | URL parameter | raw | layer of the dataset
domain | True | URL parameter | land | domain of the dataset
dataset | True | URL parameter | train_journeys | dataset title

Outputs

If successful, returns confirmation that the dataset has been deleted:

{
  "details": "{dataset} has been deleted."
}
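
As a sketch, the same call in Python (base URL and token are assumptions; the token must carry DATA_ADMIN):

import requests

BASE_URL = "https://rapid.example.com"  # assumption
TOKEN = "..."  # assumption: a token with DATA_ADMIN

# Delete everything linked to raw/land/train_journeys: table, raw data, uploaded data and schemas
response = requests.delete(
    f"{BASE_URL}/datasets/raw/land/train_journeys",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(response.json())  # {"details": "train_journeys has been deleted."}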

Delete Data File

Use this endpoint to delete a specific file linked to a layer/domain/dataset/version. If there is no data stored for the layer/domain/dataset/version, or if the file name is invalid, an error will be thrown.

When a valid file in the layer/domain/dataset/version is deleted, a success message will be displayed.

Permissions

You will need a relevant WRITE permission that matches the dataset sensitivity level, e.g.: WRITE_ALL, WRITE_PUBLIC, WRITE_PRIVATE, WRITE_PROTECTED_{DOMAIN}.

Path

DELETE /datasets/{layer}/{domain}/{dataset}/{version}/{filename}

Inputs

Parameters | Required | Usage | Example values | Definition
layer | True | URL parameter | raw | layer of the dataset
domain | True | URL parameter | land | domain of the dataset
dataset | True | URL parameter | train_journeys | dataset title
version | True | URL parameter | 3 | dataset version
filename | True | URL parameter | 2022-01-21T17:12:31-file1.csv | previously uploaded file name

Outputs

If successful, returns confirmation that the file has been deleted:

{
  "details": "{filename} has been deleted."
}
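
A hedged Python sketch of deleting one file (base URL and token are assumptions; the filename comes from the List Raw Files endpoint described later):

import requests

BASE_URL = "https://rapid.example.com"  # assumption
TOKEN = "..."  # assumption: a token with the relevant WRITE permission

# Remove a single uploaded file from version 3 of raw/land/train_journeys
filename = "2022-01-21T17:12:31-file1.csv"
response = requests.delete(
    f"{BASE_URL}/datasets/raw/land/train_journeys/3/{filename}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(response.json())  # {"details": "2022-01-21T17:12:31-file1.csv has been deleted."}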

List

Use this endpoint to retrieve a list of available datasets. You can also filter by the dataset sensitivity level or by tags specified on the dataset.

If you do not specify any filter values, you will retrieve all available datasets.

You can optionally enrich the information returned; this will include values such as Last Updated Time, Description and Tags.

Required Permissions

None

Path

POST /datasets/

Inputs

Parameters | Required | Usage | Example values | Definition
enriched | False | Boolean query parameter | True | enriches the metadata
query | False | JSON request body | Consult the docs | the filtering query

Filtering Query

Example 1 - Filtering by tags

Here we retrieve all datasets that have the tag tag1 with any value and the tag tag2 with the value value2.

{
  "key_value_tags": {
    "tag1": null,
    "tag2": "value2"
  }
}

Example 2 - Filtering by sensitivity

{
  "sensitivity": "PUBLIC"
}

Example 3 - Filtering by tags and sensitivity

{
  "sensitivity": "PUBLIC",
  "key_value_tags": {
    "tag1": null,
    "tag2": "value2"
  }
}

Example 4 - Filtering by key value tags and key only tags

{
  "sensitivity": "PUBLIC",
  "key_value_tags": {
    "tag2": "value2"
  },
  "key_only_tags": ["tag1"]
}
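
For illustration, a Python sketch sending one of these filtering queries with the enriched flag set (base URL and token are assumptions; whether a token is required for this endpoint depends on your deployment):

import requests

BASE_URL = "https://rapid.example.com"  # assumption
TOKEN = "..."  # assumption

# List PUBLIC datasets tagged tag2=value2 that also carry a tag1 key, with enriched metadata
query = {
    "sensitivity": "PUBLIC",
    "key_value_tags": {"tag2": "value2"},
    "key_only_tags": ["tag1"],
}
response = requests.post(
    f"{BASE_URL}/datasets/",
    params={"enriched": True},
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=query,
)
print(response.json())  # a list of matching datasets, or [] if none match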

Outputs

Returns a list of datasets matching the query request, e.g.:

[
  {
    "layer": "layer",
    "domain": "military",
    "dataset": "purchases",
    "version": 1,
    "tags": {
      "tag1": "weaponry",
      "sensitivity": "PUBLIC",
      "no_of_versions": "1"
    }
  },
  {
    "domain": "military",
    "dataset": "armoury",
    "version": 1,
    "tags": {
      "tag1": "weaponry",
      "sensitivity": "PRIVATE",
      "no_of_versions": "1"
    }
  }
]

If no datasets exist, or none match the query, you will get an empty response, e.g.:

[]

List Raw Files

Use this endpoint to retrieve all raw files linked to a specific layer/domain/dataset/version. If there is no data stored for the layer/domain/dataset/version, an error will be thrown.

When a valid layer/domain/dataset/version is retrieved, the available raw file uploads are returned as a list.

Required Permissions

None

Path

GET /datasets/{layer}/{domain}/{dataset}/{version}/files

Inputs

Parameters | Required | Usage | Example values | Definition
layer | True | URL parameter | raw | layer of the dataset
domain | True | URL parameter | land | domain of the dataset
dataset | True | URL parameter | train_journeys | dataset title
version | True | URL parameter | 3 | dataset version

Outputs

List of raw files in JSON format, e.g.:

["2022-01-21T17:12:31-file1.csv", "2022-01-24T11:43:28-file2.csv"]

Query

Data can be queried provided it has been uploaded at some point in the past. Large result sets are not supported by this endpoint; for datasets larger than 100,000 rows, use the Query Large endpoint below.

Required Permissions

You will need READ permission appropriate to the dataset sensitivity level, e.g.: READ_ALL, READ_PUBLIC, READ_PRIVATE, READ_PROTECTED_{DOMAIN}.

Path

POST /datasets/{layer}/{domain}/{dataset}/query

Inputs

Parameters | Required | Usage | Example values | Definition
layer | True | URL parameter | raw | layer of the dataset
domain | True | URL parameter | space | domain of the dataset
dataset | True | URL parameter | rocket_launches | dataset title
version | False | Query parameter | 3 | dataset version
query | False | JSON request body | Consult the docs | the query object

Outputs

JSON

By default, the results of the query are returned in JSON format, where each key represents a row, e.g.:

{
    "0": {
        "column1": "value1",
        "column2": "value2"
    },
    ...
}

CSV

To get a CSV response, set the Accept header to text/csv. The response will come back as a table, e.g.:

"","column1","column2"
0,"value1","value2"
...
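
To illustrate the Accept header, a minimal Python sketch querying the dataset (base URL and token are assumptions; omitting the optional request body presumably returns unfiltered rows, and dropping the Accept header gives the default JSON response):

import requests

BASE_URL = "https://rapid.example.com"  # assumption
TOKEN = "..."  # assumption: a token with the relevant READ permission

# Query version 3 of raw/space/rocket_launches and ask for CSV back
response = requests.post(
    f"{BASE_URL}/datasets/raw/space/rocket_launches/query",
    params={"version": 3},
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Accept": "text/csv",  # remove (or use application/json) for the JSON output shown above
    },
)
print(response.text)  # CSV table, one line per row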

Query Large

Data can be queried provided it has been uploaded at some point in the past. This endpoint allows querying datasets larger than 100,000 rows.

The only download format currently available is CSV.

Required Permissions

You will need a READ permission appropriate to the dataset sensitivity level, e.g.: READ_ALL, READ_PUBLIC, READ_PRIVATE, READ_PROTECTED_{DOMAIN}.

Path

POST /datasets/{layer}/{domain}/{dataset}/query/large

Inputs

Parameters | Required | Usage | Example values | Definition
layer | True | URL parameter | raw | layer of the dataset
domain | True | URL parameter | space | domain of the dataset
dataset | True | URL parameter | rocket_launches | dataset title
version | False | Query parameter | 3 | dataset version
query | False | JSON request body | Consult the docs | the query object

Outputs

Returns an asynchronous job ID that can be used to track the progress of the query. Once the query has completed successfully, you can call the /jobs/<job-id> endpoint to retrieve the download URL for the query results.
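
A hedged Python sketch of starting a large query and later polling the jobs endpoint (the base URL, token and exact shape of the job responses are assumptions):

import requests

BASE_URL = "https://rapid.example.com"  # assumption
TOKEN = "..."  # assumption: a token with the relevant READ permission
headers = {"Authorization": f"Bearer {TOKEN}"}

# Kick off the asynchronous query job for version 3 of raw/space/rocket_launches
start = requests.post(
    f"{BASE_URL}/datasets/raw/space/rocket_launches/query/large",
    params={"version": 3},
    headers=headers,
)
print(start.json())  # contains the asynchronous job ID

# Later, poll the jobs endpoint with that ID to retrieve the CSV download URL
job_id = "3bd7d98f-2264-4f88-bd65-5a2089161650"  # placeholder: the ID returned above
status = requests.get(f"{BASE_URL}/jobs/{job_id}", headers=headers)
print(status.json())  # on success, includes the download URL for the query results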

Dataset Info

Use this endpoint to retrieve basic information for a specific dataset. If there is no data stored for the dataset, an error will be thrown.

When a valid dataset is retrieved, the available data will be the schema definition with some extra values, such as:

- number of rows
- number of columns
- statistics data for date columns

Required Permissions

You will need any READ permission, e.g.: READ_ALL, READ_PUBLIC, READ_PRIVATE, READ_PROTECTED_{DOMAIN}.

Path

GET /datasets/{layer}/{domain}/{dataset}/info

Inputs

Parameters Required Usage Example values Definition
layer True URL parameter raw layer of the dataset
domain True URL parameter land domain of the dataset
dataset True URL parameter train_journeys dataset title
version False Query parameter 3 dataset version

Outputs

Schema in JSON format in the response body, e.g.:

{
  "metadata": {
    "layer": "default",
    "domain": "dot",
    "dataset": "trains_departures",
    "sensitivity": "PUBLIC",
    "version": 3,
    "tags": {},
    "owners": [
      {
        "name": "user_name",
        "email": "user@email.email"
      }
    ],
    "update_behaviour": "APPEND",
    "number_of_rows": 123,
    "number_of_columns": 2,
    "last_updated": "2022-03-01 11:03:49+00:00"
  },
  "columns": [
    {
      "name": "date",
      "partition_index": 0,
      "data_type": "date",
      "format": "%d/%m/%Y",
      "allow_null": false,
      "statistics": {
        "max": "2021-07-01",
        "min": "2014-01-01"
      }
    },
    {
      "name": "num_journeys",
      "partition_index": null,
      "data_type": "integer",
      "allow_null": false,
      "statistics": null
    }
  ]
}
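
Finally, a Python sketch of an info call (base URL and token are assumptions; the fields read at the end follow the example response above):

import requests

BASE_URL = "https://rapid.example.com"  # assumption
TOKEN = "..."  # assumption: a token with any READ permission

# Fetch the schema plus row/column counts and date statistics for version 3 of raw/land/train_journeys
response = requests.get(
    f"{BASE_URL}/datasets/raw/land/train_journeys/info",
    params={"version": 3},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
info = response.json()
print(info["metadata"]["number_of_rows"], info["metadata"]["number_of_columns"])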