Creating JSON Schema
JSON Schema is a tool used to validate data. In Synapse, JSON Schemas can be used to validate the metadata applied to an entity such as a project, file, folder, table, or view, including the annotations applied to it. To learn more about JSON Schemas, check out JSON-Schema.org.
Synapse supports a subset of features from json-schema-draft-07. For the list of currently supported features, see the JSON Schema object definition in Synapse's REST API Documentation.
In this tutorial, you will learn how to create JSON Schemas from an existing data model.
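To make the idea of validating metadata concrete, here is a minimal sketch of a draft-07 style schema and a hand-written check against it. The `PATIENT_SCHEMA` contents and the `check_annotations` helper are hypothetical and exist only for illustration. The helper covers just the `required`, `type`, and `enum` keywords; it is not how Synapse validates entities.

```python
# Hypothetical draft-07 style schema for a "Patient" record (illustration only).
PATIENT_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Patient",
    "type": "object",
    "properties": {
        "patientId": {"type": "string"},
        "age": {"type": "integer"},
        "sex": {"enum": ["Female", "Male", "Other"]},
    },
    "required": ["patientId", "sex"],
}

# Map JSON Schema type names to Python types (only the ones used in this sketch).
_JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_annotations(annotations, schema):
    """Return a list of problems; an empty list means the annotations pass.

    Hypothetical helper: only `required`, `type`, and `enum` are checked.
    """
    problems = []
    for key in schema.get("required", []):
        if key not in annotations:
            problems.append(f"missing required key: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key not in annotations:
            continue
        value = annotations[key]
        expected = _JSON_TYPES.get(rule.get("type"))
        if expected is not None:
            # bool is a subclass of int in Python, so reject it for non-boolean types.
            bool_mismatch = isinstance(value, bool) and rule["type"] != "boolean"
            if bool_mismatch or not isinstance(value, expected):
                problems.append(f"{key}: expected {rule['type']}")
        if "enum" in rule and value not in rule["enum"]:
            problems.append(f"{key}: {value!r} not in {rule['enum']}")
    return problems


print(check_annotations({"patientId": "P-001", "sex": "Female", "age": 42}, PATIENT_SCHEMA))
print(check_annotations({"age": "forty"}, PATIENT_SCHEMA))
```

The first call reports no problems, while the second reports the two missing required keys and the wrongly typed `age`.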
Tutorial Purpose
You will create a JSON Schema from your data model.
Prerequisites
- You have a working installation of the Synapse Python Client.
- You have a data model; see this example data model.
1. Imports
from synapseclient import Synapse
from synapseclient.extensions.curator import generate_jsonschema
2. Set up your variables
# Path or URL to your data model (CSV or JSONLD format)
# Example: "path/to/my_data_model.csv" or "https://raw.githubusercontent.com/example.csv"
DATA_MODEL_SOURCE = "tests/unit/synapseclient/extensions/schema_files/example.model.csv"
# List of component names/data types to create schemas for, or None for all components/data types
# Example: ["Patient", "Biospecimen"] or None
DATA_TYPE = ["Patient"]
# Directory where JSON Schema files will be saved
OUTPUT_DIRECTORY = "./"
To create a JSON Schema you need a data model and the data types you want to create schemas for. The data model must be in CSV or JSON-LD format and may be given as a local path or a URL. See the example data model.
The data types must exist in your data model. Pass a list of data types, or None to create schemas for every data type in the data model.
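Since the source can be either a local path or a URL, it can help to sanity-check the value before generating anything. The sketch below is one way to do that with the standard library. The `describe_source` helper is hypothetical, and the `.csv` and `.jsonld` extensions are an assumption about how CSV and JSON-LD files are usually named, not a rule stated by the Synapse client.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed file extensions for the CSV and JSON-LD formats.
SUPPORTED_SUFFIXES = {".csv", ".jsonld"}


def describe_source(source):
    """Classify a data model source as 'url' or 'local' and return its suffix.

    Hypothetical helper: raises ValueError when the extension is not one of
    the assumed suffixes above.
    """
    parsed = urlparse(source)
    is_url = parsed.scheme in ("http", "https")
    # For URLs, look only at the path part so query strings are ignored.
    path = parsed.path if is_url else source
    suffix = PurePosixPath(path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unexpected data model extension: {suffix or '(none)'}")
    return ("url" if is_url else "local", suffix)


print(describe_source("tests/unit/synapseclient/extensions/schema_files/example.model.csv"))
print(describe_source("https://raw.githubusercontent.com/example.csv"))
```

Both sample values from the variables above are accepted; a source with another extension raises a `ValueError` before any schema generation is attempted.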
3. Log into Synapse
syn = Synapse()
syn.login()
4. Create the JSON Schema
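Call generate_jsonschema with the variables defined above. The call returns two values, unpacked here as schemas and file_paths.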
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output_directory=OUTPUT_DIRECTORY,
    data_type=DATA_TYPE,
    data_model_labels="class_label",
    synapse_client=syn,
)
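Once the schema files are saved, you may want a quick look at what each one contains. The sketch below loads a schema file with the standard library and summarizes it. The `summarize_schema_file` helper, the `Patient.schema.json` file name, and the stand-in schema contents are all hypothetical; a temporary file is used so the example runs without Synapse.

```python
import json
import tempfile
from pathlib import Path


def summarize_schema_file(path):
    """Load a JSON Schema file and return its title, required keys, and property names.

    Hypothetical helper for inspecting a saved schema.
    """
    with open(path, encoding="utf-8") as handle:
        schema = json.load(handle)
    return {
        "title": schema.get("title"),
        "required": sorted(schema.get("required", [])),
        "properties": sorted(schema.get("properties", {})),
    }


# Stand-in for a generated schema file (contents invented for this sketch).
demo_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Patient",
    "type": "object",
    "properties": {"patientId": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["patientId"],
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "Patient.schema.json"  # hypothetical file name
    demo_path.write_text(json.dumps(demo_schema), encoding="utf-8")
    summary = summarize_schema_file(demo_path)

print(summary)
```

With the tutorial's own results, the same helper could be called on each entry of `file_paths`, assuming those entries are paths to the saved JSON files.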
Source Code for this Tutorial
from synapseclient import Synapse
from synapseclient.extensions.curator import generate_jsonschema
# Path or URL to your data model (CSV or JSONLD format)
# Example: "path/to/my_data_model.csv" or "https://raw.githubusercontent.com/example.csv"
DATA_MODEL_SOURCE = "tests/unit/synapseclient/extensions/schema_files/example.model.csv"
# List of component names/data types to create schemas for, or None for all components/data types
# Example: ["Patient", "Biospecimen"] or None
DATA_TYPE = ["Patient"]
# Directory where JSON Schema files will be saved
OUTPUT_DIRECTORY = "./"
syn = Synapse()
syn.login()
schemas, file_paths = generate_jsonschema(
    data_model_source=DATA_MODEL_SOURCE,
    output_directory=OUTPUT_DIRECTORY,
    data_type=DATA_TYPE,
    data_model_labels="class_label",
    synapse_client=syn,
)
print(schemas[0])