BioSamples Metadata Model
The BioSamples repository stores and displays metadata about samples to enable their discovery and re-analysis. Each sample receives a unique sample accession, which can be referenced in other archives such as ENA, EGA, and PRIDE, increasing findability and interoperability through cross-referencing.
BioSamples requires only a few mandatory fields for each sample:
sample name
release date (publication date for the sample)
organism (must be in NCBI Taxonomy)
Partners should submit rich metadata where possible as this will enable discovery and reuse of registered samples. Submitters may add as many custom metadata attributes as desired, which will be indexed and searchable in BioSamples.
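The mandatory fields listed above can be checked locally before submission. The following is a minimal sketch, not the BioSamples validator: it assumes a sample held as a flat Python dict with the hypothetical keys `name`, `release`, and `organism`, and an ISO 8601 release date. The real BioSamples JSON layout may differ, and the check against NCBI Taxonomy is not attempted here.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical flat keys for illustration; the real BioSamples JSON layout may differ.
MANDATORY_FIELDS = ("name", "release", "organism")


def missing_mandatory_fields(sample: dict) -> list[str]:
    """Return the mandatory fields that are absent or blank in `sample`."""
    missing = []
    for field in MANDATORY_FIELDS:
        value = sample.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def has_valid_release_date(sample: dict) -> bool:
    """Check that the release date parses as an ISO 8601 timestamp (assumed format)."""
    try:
        datetime.fromisoformat(sample["release"])
        return True
    except (KeyError, TypeError, ValueError):
        return False
```

A sample that passes both checks still needs an organism that exists in NCBI Taxonomy, which would have to be verified separately.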
Sample Checklists
To increase standardisation and ensure that each sample is registered with at least a minimum amount of metadata, ENA provides Genomic Standards Consortium (GSC) Sample Checklists. Each checklist defines a minimum set of mandatory attributes required for an ENA submission from a particular environment, along with recommended and optional attributes. It is possible to update your samples with the appropriate metadata later. If you cannot provide a value for a mandatory field, please see Reporting Missing Values for the appropriate values.
Note
Registering a BioSample with an ENA checklist is a requirement for submitting data related to this sample to ENA.
These checklists are developed in collaboration with different research communities to ensure that they are relevant and realistic for their context. When registering a sample, it is important to choose the most relevant sample checklist available and provide the most metadata possible.
Checklists are maintained in collaboration with the ENA team and are available in the JSON Schema Store. Submissions are automatically validated against their selected checklist via bioValidator at the time of submission or curation. This ensures that key fields are present and consistent.
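To illustrate what a checklist check involves, the sketch below reports which mandatory and recommended attributes are absent from a sample. It is not bioValidator and does not use a real ENA checklist: `EXAMPLE_CHECKLIST`, its attribute names, and the `recommended` key are hypothetical placeholders chosen for illustration.

```python
# Hypothetical checklist fragment; attribute names are placeholders, not an ENA checklist.
EXAMPLE_CHECKLIST = {
    "required": ["collection date", "geographic location"],
    "recommended": ["depth"],  # assumed key, not a JSON Schema keyword
}


def check_against_checklist(attributes: dict, checklist: dict) -> dict:
    """Report which mandatory and recommended attributes are absent or blank."""
    present = {name for name, value in attributes.items() if str(value).strip()}
    return {
        "missing_mandatory": [
            name for name in checklist.get("required", []) if name not in present
        ],
        "missing_recommended": [
            name for name in checklist.get("recommended", []) if name not in present
        ],
    }
```

Running a check like this before submission gives earlier feedback, but the server-side validation remains the authority on whether a sample satisfies its checklist.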
Sample Relationships in BioSamples
Sample relationships describe the relationship between two BioSamples. A relationship can be a submission, technical, or biological relationship. Relationships link different samples together and support relationship-based graph searches. A sample relationship is submitted to BioSamples by providing the source, type, and target. Below is an example of sample relationships in BioSamples.
Please note that the direction of relationships should always start from the source to the target. For example, if adding a sample relationship to a sample with accession SAME123456, the ‘source’ should always be SAME123456.
"relationships" : [ {
"source" : "SAMEA1111111",
"type" : "derived from",
"target" : "SAMEA2222222"
}, {
"source" : "SAMEG00000",
"type" : "has member",
"target" : "SAMEA1111111"
} ]
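Because BioSamples does not validate direction itself, the direction rule can be checked on the client side before adding relationships to a sample. This is a minimal sketch under the assumption that relationships are plain dicts with the `source`, `type`, and `target` keys shown in the example; the function name is hypothetical.

```python
def misdirected_relationships(accession: str, relationships: list) -> list:
    """Return relationships whose 'source' is not the sample they are being added to."""
    return [rel for rel in relationships if rel.get("source") != accession]


relationships = [
    {"source": "SAMEA1111111", "type": "derived from", "target": "SAMEA2222222"},
    {"source": "SAMEG00000", "type": "has member", "target": "SAMEA1111111"},
]

# When adding to SAMEA1111111, only the first entry starts from that sample.
flagged = misdirected_relationships("SAMEA1111111", relationships)
```

Here `flagged` contains the `has member` entry, which starts from SAMEG00000 and would therefore be added to that sample instead.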
When the submitter provides relationship information in one sample, the reverse relationships in the corresponding samples are generated automatically. BioSamples does not validate the type, direction, or logic of the relationships. BioSamples currently supports four types of sample relationships:
| Relationship types | Reverse relationships | Description |
|---|---|---|
| | | |
| | | Sample A is the same as Sample B. This can be used to link duplicated samples. |
| | | |
| | | Sample A is the negative control of Sample B. e.g. |
Sample Dates
BioSamples keeps records of different dates related to the sample lifecycle. The dates can be generated either by the data repositories or by the data submitters for data exchange or experiment purposes.
| Date type | Description |
|---|---|
| | The earliest date at which valid metadata has been provided by the submitter. This attribute is generated by BioSamples and other INSDC partners. |
| | The user-supplied date at which the sample metadata is made publicly available for the first time. |
| | The date at which a new curation object has been created or automatic curation pipelines have been run on the sample metadata. This field is only present if at least one curation object has been added by the curation pipelines. The “last reviewed” date is updated when the curation objects are reviewed, even if they are found still valid and unmodified, and indicates that the sample is compliant with the latest BioSamples curation rules. See Submit curation object. This attribute is generated by BioSamples. |
| | You might see additional dates or timestamps in the sample’s |
Reporting Missing Values
The International Nucleotide Sequence Database Collaboration (INSDC) has a standardised vocabulary for reporting missing/null values, to be used where a value of the expected format cannot be provided for sample metadata.
The controlled vocabulary takes into account different types of constraints. Submitters are strongly encouraged to always provide true values. However, if a missing/null value must be reported, submitters are asked to use the most specific term that fits their situation. See the table below for accepted missing value reporting terms.
| Value | Definition |
|---|---|
| not collected | Information was not given because it has not been collected, and will always be missing. |
| not provided | Information may have been collected but was not provided with the submission. It may be added later. |
| restricted access | Information exists but cannot be released openly because of privacy or confidentiality concerns. |
Important: Any other placeholder values (such as n/a, na, n.a, none, unknown, --, ., null, missing, not reported, not requested, not applicable, not specified, and not known) should not be used and must be removed from submissions. If included, these will be eliminated during automatic curation.
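Submitters may want to catch disallowed placeholders before automatic curation removes them. The sketch below classifies attribute values; it is an illustration, not the BioSamples curation pipeline. The accepted terms are assumed from the INSDC missing-value vocabulary, the disallowed set is copied from the note above, and case-insensitive matching is an assumption of this example.

```python
# Accepted terms, assumed here from the INSDC missing-value vocabulary.
ACCEPTED_MISSING_TERMS = {"not collected", "not provided", "restricted access"}

# Placeholders listed in the note above as values that must be removed.
DISALLOWED_PLACEHOLDERS = {
    "n/a", "na", "n.a", "none", "unknown", "--", ".", "null", "missing",
    "not reported", "not requested", "not applicable", "not specified",
    "not known",
}


def classify_value(value: str) -> str:
    """Classify a value as 'accepted-missing', 'placeholder', or 'value'."""
    normalised = value.strip().lower()  # case-insensitive matching is an assumption
    if normalised in ACCEPTED_MISSING_TERMS:
        return "accepted-missing"
    if not normalised or normalised in DISALLOWED_PLACEHOLDERS:
        return "placeholder"
    return "value"


def find_placeholders(attributes: dict) -> list:
    """Return the names of attributes whose values are disallowed placeholders."""
    return sorted(
        name
        for name, value in attributes.items()
        if classify_value(str(value)) == "placeholder"
    )
```

Attributes reported by `find_placeholders` can then be replaced with a true value or with the most specific accepted term.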