How do I use SMS?

Submitting a SNP file

  • The service requires a GLH to programmatically upload a VCF file containing SNP genotypes for each patient sample in a referral (there is no Graphical User Interface).

  • GLHs must create a client to create and submit VCFs to the API. See Sample Matching Service – Access and Connectivity Test.

  • GLHs may name their local VCF files as they wish, because the VCF content is sent to the service in the request body.

The service supports the following methods:

PUT – for creating or updating a VCF file
  • A PUT VCF will be stored until replaced by a future PUT action for a sample.
  • A PUT succeeds only if the VCF passes syntax validation; the content itself is not checked.
GET – for retrieving the current uploaded VCF file which matches the specified identifiers within the parameters
  • This could be used by a GLH to make a record of the current VCF prior to a PUT request.
DELETE – remove the current uploaded VCF file which matches the specified identifiers within the parameters
  • The VCF is not backed up so once a DELETE is successful that VCF is no longer available.
  • This delete is a “soft delete” with the exception of withdrawals for audit purposes
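
As a rough sketch of how a GLH client might assemble these three calls, the Python below builds (without sending) a request for each method. The host name, the order of the identifiers in the path, the bearer-token header and the content type are all assumptions made for illustration; the swagger documentation is the authority for the real paths and authentication.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical base URL: the real host and path layout come from the swagger documentation.
BASE_URL = "https://sms.example.org/sample_vcf"


def build_request(method, referral_id, patient_id, sample_id,
                  vcf_text=None, token="<access token>"):
    """Build (but do not send) a PUT, GET or DELETE request for one sample's VCF."""
    if method not in {"PUT", "GET", "DELETE"}:
        raise ValueError(f"unsupported method: {method}")
    if method == "PUT" and vcf_text is None:
        raise ValueError("PUT needs the VCF content for the request body")
    # Assumed layout: the three identifiers as consecutive path segments.
    path = "/".join(quote(part, safe="")
                    for part in (referral_id, patient_id, sample_id))
    # Only PUT carries a body: the VCF content itself, not a file name.
    body = vcf_text.encode("utf-8") if method == "PUT" else None
    headers = {"Authorization": f"Bearer {token}"}  # assumed auth scheme
    if body is not None:
        headers["Content-Type"] = "text/plain"  # assumed content type
    return Request(f"{BASE_URL}/{path}", data=body, headers=headers, method=method)
```

The returned object can be passed to `urllib.request.urlopen` once the placeholders have been replaced with real values.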

Access and Connectivity Testing

VCF requirements

Files should be submitted using the VCF 4.2 specification.

  • Each VCF must represent one sample
  • Each VCF must contain genotype information for a minimum of 24 SNPs, including SNP calls to the reference base.
  • When the VCF is submitted (via a secure REST web service), the following must also be supplied as path parameters:
    • The human readable NGIS Referral ID (12 characters)
    • The human readable NGIS Patient ID (12 characters)
    • The dispatched_lab_sample_id from GeL1001 CSV
  • It is preferred that GLHs upload SNP VCF files with the ALT allele populated even at homozygous reference (wild type) positions (i.e. not missing or “.”) to avoid any potential for ambiguity in the VCF comparison. If this is not possible, the SMS can handle homozygous reference ALT alleles specified as missing (“.”).
  • Additionally, it is preferred that all VCF entries are biallelic, i.e. have only a single ALT allele specified. The service will attempt to decompose multi-allelic sites into their biallelic components, but such sites are not expected to be provided.

The Sample Matching Service will verify that the IDs provided are valid in TOMS.

  • Variants must utilise the GRCh38 genome reference.

  • GLHs should represent their alleles and positions using the same reference file as Genomics England

  • Chromosome names should be prefixed with "chr". The mitochondrial genome is named "chrM". The following values are permitted: {chr1, chr2, chr3, chr4, chr5, ..., chr21, chr22, chrX, chrY, chrM}.

The header shall include the following tags:

##fileformat=VCFv4.2
##fileDate=<date the file was produced: eg. 20090805>
##source=<data source: e.g. mygenotypingplatformV3.4>
##reference=<name of the fasta file used>
##contig=<as per the specification>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

The SMS does not check that the sample column name in the VCF header line matches the “dispatched_sample_lsid” value, but recording it there is advised for provenance.

However, the SMS does take the “dispatched_sample_lsid” value from the URL parameters of the “PUT /sample_vcf” request and checks it against the TOMS API to confirm that a GeL1001 CSV entry exists for the same “dispatched_sample_lsid”.

For example, a VCF header where 123456789 is the dispatched_sample_lsid value for provenance:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 123456789
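
To show how the required header tags and the sample column fit together, here is a minimal sketch that assembles a single-sample VCF as text. The function name, the record tuple layout and the use of `PASS` and `.` for the unused columns are illustrative choices, not something the service prescribes; the `##contig` lines carry only an ID here, and any position or allele passed in is the caller's own data.

```python
from datetime import date


def build_vcf(sample_id, records, source, reference, contigs, file_date=None):
    """Return VCF 4.2 text for one sample.

    records: iterable of (chrom, pos, ref, alt, gt) tuples, already sorted
    by contig and position.
    """
    file_date = file_date or date.today()
    lines = [
        "##fileformat=VCFv4.2",
        f"##fileDate={file_date:%Y%m%d}",
        f"##source={source}",
        f"##reference={reference}",
    ]
    lines += [f"##contig=<ID={name}>" for name in contigs]
    lines.append('##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">')
    # The last column name records the dispatched sample ID for provenance.
    lines.append("\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                            "FILTER", "INFO", "FORMAT", sample_id]))
    for chrom, pos, ref, alt, gt in records:
        # ALT stays populated even for a homozygous reference call such as 0/0.
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".",
                                "PASS", ".", "GT", gt]))
    return "\n".join(lines) + "\n"
```

Columns are joined with tabs, as the VCF specification requires, even though the example line above is shown with spaces.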

  • The source field must reflect the standardised names of the assays used for genotyping so that the assay and version used are easily identifiable.
  • Genotypes must contain the GT field.
  • It is highly desirable, but not mandatory, to include the Genotype Likelihood PL field where possible. This allows a more sophisticated concordance check.
  • It is desirable, but not mandatory, to include a genotype quality (GQ) value.
  • Only biallelic SNVs are permitted. The ALT field must not contain more than one value.
  • Only PASS variants will be used.
  • Variants should be normalised, i.e. parsimonious and left-aligned.
  • Variants should be sorted by reference contig name and position.
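
A GLH might run a local pre-flight check along the following lines before uploading. This is only a sketch of some of the requirements listed above (file format line, single sample, permitted chromosome names, biallelic single-base records, GT present, at least 24 SNPs); it is not the validation the SMS itself performs. Counting only PASS records towards the minimum of 24 is an assumption based on the statement that only PASS variants are used.

```python
ALLOWED_CHROMS = {f"chr{n}" for n in range(1, 23)} | {"chrX", "chrY", "chrM"}
MIN_SNPS = 24


def check_vcf(vcf_text):
    """Return a list of problems; an empty list means none of these checks failed."""
    problems = []
    lines = vcf_text.splitlines()
    if not lines or lines[0] != "##fileformat=VCFv4.2":
        problems.append("first line is not ##fileformat=VCFv4.2")
    column_lines = [line for line in lines if line.startswith("#CHROM")]
    if len(column_lines) != 1:
        problems.append("expected exactly one #CHROM line")
    elif len(column_lines[0].split("\t")) != 10:
        # 9 fixed columns plus exactly one sample column
        problems.append("expected exactly one sample column")
    pass_records = 0
    for number, line in enumerate(lines, start=1):
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 10:
            problems.append(f"line {number}: expected 10 tab-separated columns")
            continue
        chrom, _pos, _id, ref, alt, _qual, filt, _info, fmt, _sample = fields
        if chrom not in ALLOWED_CHROMS:
            problems.append(f"line {number}: chromosome {chrom!r} is not permitted")
        if "," in alt:
            problems.append(f"line {number}: more than one ALT allele")
        elif len(ref) != 1 or len(alt) != 1:
            # a missing ALT (".") is one character long, so it is accepted here
            problems.append(f"line {number}: not a single-base SNV")
        if "GT" not in fmt.split(":"):
            problems.append(f"line {number}: FORMAT lacks GT")
        if filt == "PASS":
            pass_records += 1
    if pass_records < MIN_SNPS:
        problems.append(
            f"only {pass_records} PASS records; at least {MIN_SNPS} SNPs are required")
    return problems
```

The check does not look at sort order, normalisation or the reference build, which would need the reference file itself.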

Dealing with duplicate submissions

If genotypes for the same sample are re-submitted either on purpose or by mistake, the latest genotype record will overwrite the previously stored variant calls.

A GLH can resubmit after the case has gone to the DSS (a case is not dispatched to the DSS in the first place without an SMS match). If the GLH resubmits a new set of SNPs for a patient for any reason, the new set overrides what was sent previously.
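
Because a resubmission overwrites the stored VCF, a GLH may want to keep a local copy of the current file first, as suggested for GET earlier. The sketch below shows one way to order those two calls. The `send` callable is a hypothetical transport supplied by the caller, and treating status 200 as "a VCF is currently stored" is an assumption; the actual response codes are described in the swagger documentation.

```python
def backup_then_put(send, url, new_vcf, backup_path):
    """Save the currently stored VCF (if any) to backup_path, then PUT the new one.

    send(method, url, body) -> (status_code, response_text) is a hypothetical
    caller-supplied function that performs the HTTP call.
    """
    status, current_vcf = send("GET", url, None)
    if status == 200:  # assumed to mean a VCF is currently stored
        with open(backup_path, "w", encoding="utf-8") as handle:
            handle.write(current_vcf)
    # The PUT replaces whatever was stored for this sample.
    put_status, _ = send("PUT", url, new_vcf)
    return put_status
```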

If the SMS check fails, the case will appear in the "Identity Check Failures" tab in the Interpretation Portal; however, the DSS link-out button to the case in the Interpretation Portal will still be active. Please be aware of this when resubmitting a case.

Warning

Although the uploaded VCF must include a minimum of 24 SNPs, the comparison is based on a probabilistic model. To pass the SMS check, the combination of uploaded sample SNPs therefore has to reach a specific probability threshold that rules out a sample mix-up.

If the GLH SNP set is available, the SMS checks that the genotypes match before dispatch to the DSS. The check is triggered by the CIP-API.

For more information on how a “match” between the GLH generated SNP set and the Genomics England generated SNP set is determined, please see Sample Matching Methodology.

Manually overriding the SMS

"How do I manually override the SMS?"

Where the GLH cannot upload a complete set of SNP VCF files for a referral (please contact the Genomics England Service desk - www.bit.ly/ge-servicedesk - if this is the case), a “no_snp_sample” query parameter can be used on the PUT request on all eligible patients in the referral to enable the referral to keep flowing to the Decision Support System. This query parameter can also be used where the SNP sample set submitted by the GLH has not passed, but the GLH wishes for the sample to continue to the DSS.
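
As an illustration only, the query parameter could be appended to a PUT URL as sketched below. The parameter name comes from the text above, but the value it expects is not stated here, so `"true"` is a placeholder assumption, as is the example host; the swagger documentation should be checked for the accepted values and for whether a request body is still required.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_no_snp_sample(url, value="true"):
    """Return url with a no_snp_sample query parameter set (the value is an assumed placeholder)."""
    parts = urlsplit(url)
    # Keep any other query parameters, but drop an existing no_snp_sample entry.
    query = [(key, val)
             for key, val in parse_qsl(parts.query, keep_blank_values=True)
             if key != "no_snp_sample"]
    query.append(("no_snp_sample", value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```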

If the GLH has investigated, performed a risk assessment to determine the likelihood of a sample mismatch, is content to continue with the data analysis without SNP genotyping comparison data and has approval from NHS England, then the GLH can request an override of the Sample Matching Check by contacting the Genomics England Service desk to progress the case into Decision Support Services.

GLHs must manually compare the request in the portal if the GLH VCF file was not submitted before the case enters the Interpretation Portal.

For specific detail on the APIs and syntax related to their calls, please see the swagger documentation generated from the openapi.yaml

The API will respond with standard HTTP error codes as described in the swagger documentation. No comparison is presented via the API.

Currently, the SMS is designed to be used programmatically to upload the GLH SNP VCF files. Passed, Pending and Failed SMS results affect the status of the case in the Interpretation Portal; for example, only referrals where all SMS checks pass will be sent to the Decision Support Systems (DSS).

For further information please refer to the GMS Interpretation Portal User Guide


Last update: 2022-11-15