
What is SMS?

Introduction

The Sample Matching Service (SMS) provides an identity check for DNA samples undergoing WGS in the GMS.

As part of the Genomics England bioinformatics pipeline, the clinical data reported for a referral are compared with the corresponding data inferred from the genomic sequencing data to identify inconsistencies (Genomic and Data checks). SMS adds a further check because the Genomic and Data checks do not cover every scenario in which a sample swap could occur during sequencing and analysis (for example, between singletons, or between siblings of the same sex).

  • Through SNP genotyping, SMS ensures that Rare Disease samples processed by Illumina and Genomics England match the patient samples held at the GLH, i.e. the SMS will identify any sample swaps or data inconsistencies that arise between the sample leaving the GLH and the result being returned to the GLH.

  • GLHs perform a SNP genotyping experiment on a locally stored DNA sample and provide the resulting genotypes to the SMS, which compares them against the WGS data for the same sample.

  • Based on the genotypes provided, SMS calculates the probability that there has been a sample mix-up.

  • If a patient sample passes the SMS check, the probability that there has been a sample mix-up is < TBC, and a check-circle symbol is shown next to the sample in the patient information. If all samples in a referral pass the SMS check, the referral is automatically dispatched to the DSS.

  • If a patient sample fails the SMS check, the probability of a sample mix-up is higher; this is shown as a close-circle symbol and the referral is moved to the Identity Check Failures tab of the Interpretation Portal.

  • Samples that do not yet have a GLH SNP VCF file uploaded, or that require a manual trigger of the SMS check, are given the pending or sync symbol.

Workflow of the process

The requesting GLH should submit a VCF containing at least 24 SNPs for each patient sample in a referral through the SMS API (information on how to submit SNP VCFs can be found here). For each sample, the Genomics England WGS bioinformatics pipeline applies "forced genotyping" (see forced genotyping here for more information) to the WGS VCFs over a similar set of SNPs.
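Before uploading, a GLH might want to confirm locally that a single-sample SNP VCF really carries at least 24 called SNP genotypes. The sketch below is a hypothetical pre-submission check using only the Python standard library; `count_called_snps`, `ready_to_submit` and `MIN_SNPS` are illustrative names, it assumes the genotype sits in the first sample column, and it is not part of the SMS API or the Genomics England pipeline.

```python
"""Hypothetical local sanity check for a GLH SNP VCF (not part of the SMS API)."""

MIN_SNPS = 24  # minimum number of SNPs per sample stated on this page


def count_called_snps(vcf_text: str) -> int:
    """Count biallelic SNV records whose first sample has a fully called GT."""
    count = 0
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and the #CHROM header line
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # record has no FORMAT/sample columns
        ref, alt = fields[3], fields[4]
        # Biallelic SNV: one-base REF and exactly one one-base ALT allele.
        if len(ref) != 1 or len(alt) != 1 or alt == ".":
            continue
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        sample_values = fields[9].split(":")
        gt_index = format_keys.index("GT")
        if gt_index >= len(sample_values):
            continue
        # Treat phased ("|") and unphased ("/") genotypes the same way.
        alleles = sample_values[gt_index].replace("|", "/").split("/")
        if "." not in alleles:
            count += 1  # fully called genotype, e.g. 0/1, not ./.
    return count


def ready_to_submit(vcf_text: str) -> bool:
    """True if the VCF has at least MIN_SNPS called biallelic SNVs."""
    return count_called_snps(vcf_text) >= MIN_SNPS
```

Counting only fully called biallelic SNVs is a deliberately conservative choice: a no-call (`./.`) presumably cannot be compared against the WGS genotype, so it is not counted towards the minimum.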

Once the referral reaches the CIP-API, a comparison between the GLH submitted SNPs and the Genomics England generated SNPs is triggered.

If the sample passes the SMS comparison, the case moves into the normal process of dispatch to the DSS.

If the sample fails the SMS comparison, the case is not sent to the DSS and moves to the "identity check failures" tab.

If the GLH has not submitted SNP VCFs, the SMS status is “pending” and the referral will not be sent to the DSS. In this case, once the GLH has submitted its SNP VCF, it can trigger the comparison manually in the Interpretation Portal.

If a referral raises a query in the Genomic and Data checks, Genomics England will trigger the SNP check; if the SNP check fails, the sample is reported in the Sample Failure report.
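The routing rules above can be summarised as a small decision function. This is a hypothetical sketch: `SampleStatus` and `route_referral` are illustrative names, and giving a failure precedence over a pending sample is an assumption, since this page does not say what happens to a referral that contains both.

```python
from enum import Enum
from typing import List


class SampleStatus(Enum):
    PASS = "pass"        # shown with the check-circle symbol
    FAIL = "fail"        # shown with the close-circle symbol
    PENDING = "pending"  # no GLH SNP VCF yet, or a manual trigger is needed


def route_referral(statuses: List[SampleStatus]) -> str:
    """Return where a referral goes, given the SMS status of each of its samples."""
    if not statuses:
        raise ValueError("a referral must contain at least one sample")
    # Assumption: one failed sample is enough to divert the whole referral.
    if any(s is SampleStatus.FAIL for s in statuses):
        return "Identity Check Failures tab"
    # Any sample still pending holds the referral back from the DSS.
    if any(s is SampleStatus.PENDING for s in statuses):
        return "held: SMS pending, not sent to DSS"
    # Every sample passed, so the referral is dispatched automatically.
    return "dispatch to DSS"
```

For example, a trio in which all three samples pass is dispatched, while a trio with one pending sample waits until the GLH uploads its SNP VCF and the comparison is re-run.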


How does the check happen?

The Whole Genome Sequencing pipeline at Genomics England creates a SNP VCF for each sample in the referral, which will be compared to the GLH submitted SNP VCF.

SMS comparison can be triggered by one of two events:

  1. When a referral successfully loads into the CIP-API, the CIP-API triggers the comparison.
  2. A GLH user logged into the Interpretation Portal can manually trigger a comparison (for example, after resubmitting a corrected SNP set to resolve a failure) by clicking on the icon beside the sample ID on the Rare Disease case page.

The check is performed against whichever version of the genotype file is stored on the SMS at the time the comparison is requested. The submitted SNP VCF is therefore compared with the SNP VCF generated by Genomics England at two points: when the referral first arrives in the CIP-API, and whenever a GLH triggers a comparison manually.

The CIP-API sends only one compare query to the SMS, at the time the referral is first ingested into the CIP-API. This means that if any SNP files are uploaded to the SMS after a case is available in the GeL Interpretation Portal, a user must manually trigger the comparison in the Interpretation Portal for the latest results to be displayed.
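The trigger behaviour described above (one automatic comparison at ingestion, later comparisons only on manual request, each using the genotype file stored at request time) can be modelled with a toy in-memory class. `ToySms`, `SampleRecord`, `ingest_referral` and `manual_trigger` are hypothetical names, file versions are stand-in integers, and this is an illustration of the rules, not how the SMS or CIP-API is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class SampleRecord:
    stored_version: Optional[int] = None    # latest GLH SNP VCF held by the SMS
    compared_version: Optional[int] = None  # version used by the last comparison


@dataclass
class ToySms:
    samples: Dict[str, SampleRecord] = field(default_factory=dict)

    def upload(self, sample_id: str, version: int) -> None:
        """Store a (re)submitted GLH SNP VCF; this alone does NOT run a comparison."""
        self.samples.setdefault(sample_id, SampleRecord()).stored_version = version

    def compare(self, sample_id: str) -> Optional[int]:
        """Compare using whatever version is stored at the time of the request."""
        record = self.samples.setdefault(sample_id, SampleRecord())
        record.compared_version = record.stored_version  # None means still pending
        return record.compared_version


def ingest_referral(sms: ToySms, sample_ids: Iterable[str]) -> None:
    """The single automatic comparison made when a referral is ingested."""
    for sample_id in sample_ids:
        sms.compare(sample_id)


def manual_trigger(sms: ToySms, sample_id: str) -> Optional[int]:
    """A GLH user re-running the comparison from the Interpretation Portal."""
    return sms.compare(sample_id)
```

In this model, uploading a corrected file after ingestion leaves the displayed result tied to the old version until `manual_trigger` is called, which mirrors why a manual trigger is needed for late uploads.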

Sample Matching Methodology


Probability of a match by someone else

  • Assumes the two individuals are of the same ancestry.
  • Gives the probability for a single biallelic SNP.
  • P(match): the total probability of a match against another individual is the product of these per-SNP probabilities.
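Formula 1.1 is part of an image that is not reproduced on this page. For reference, a standard way to write these per-SNP probabilities is shown below, assuming Hardy-Weinberg equilibrium, full siblings and independent (unlinked) SNPs, with p_i the allele frequency of SNP i in the chosen ancestry and the product taken over whichever per-SNP term (sibling or unrelated) applies. This is a reconstruction, not a copy of the SMS formula; the middle sibling term is the chance that genotypes agree when exactly one allele is shared identical by descent. With p = 1/2 it gives inverse match probabilities of (8/3)^N and (32/19)^N, which reproduces the MAF = 0.5 values tabulated in this section (for example 18,184 and 184 for N = 10).

```latex
\begin{aligned}
P_{\text{unrel}}(p) &= (p^2)^2 + (2pq)^2 + (q^2)^2 = p^4 + 4p^2q^2 + q^4, \qquad q = 1 - p \\
P_{\text{sib}}(p)   &= \tfrac{1}{4}\cdot 1 + \tfrac{1}{2}\left(p^3 + q^3 + pq\right) + \tfrac{1}{4}\,P_{\text{unrel}}(p) \\
P(\text{match})     &= \prod_{i=1}^{N} P(p_i), \qquad P \in \{P_{\text{sib}}, P_{\text{unrel}}\} \\
P_{\text{unrel}}\!\left(\tfrac{1}{2}\right) &= \tfrac{3}{8}, \qquad P_{\text{sib}}\!\left(\tfrac{1}{2}\right) = \tfrac{19}{32}
\end{aligned}
```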

Probability of matching to another individual

As a rule of thumb, for well-balanced SNPs (MAF = 0.5), the inverse probabilities (1/P(match)) of matching a sibling or an unrelated individual by chance, as a function of the number of SNPs, are shown in the table below.

Number of SNPs   1/P(match), sibling   1/P(match), unrelated
10               184                   18,184
15               2,489                 2.45E+06
20               33,723                3.31E+08
25               456,993               4.46E+10
30               6,192,861             6.01E+12
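The MAF = 0.5 table can be reproduced from standard per-SNP match probabilities: p^4 + 4p^2q^2 + q^4 for an unrelated individual and 1/4 + 1/2(p^3 + q^3 + pq) + 1/4 of the unrelated term for a full sibling, with q = 1 - p. This is a sketch under Hardy-Weinberg and independent-SNP assumptions; `p_unrelated`, `p_sibling` and `inverse_match_probability` are illustrative names, not functions from the SMS.

```python
def p_unrelated(p: float) -> float:
    """Chance that an unrelated person of the same ancestry has the same genotype at one SNP."""
    q = 1.0 - p
    return p**4 + 4 * p**2 * q**2 + q**4


def p_sibling(p: float) -> float:
    """Same chance for a full sibling (share 2, 1 or 0 alleles IBD with prob 1/4, 1/2, 1/4)."""
    q = 1.0 - p
    return 0.25 + 0.5 * (p**3 + q**3 + p * q) + 0.25 * p_unrelated(p)


def inverse_match_probability(per_snp: float, n_snps: int) -> float:
    """1/P(match) when n_snps independent SNPs all share the same per-SNP probability."""
    return 1.0 / per_snp**n_snps


if __name__ == "__main__":
    # Rebuild the table for well-balanced SNPs (MAF = 0.5).
    for n in (10, 15, 20, 25, 30):
        sib = inverse_match_probability(p_sibling(0.5), n)
        unrel = inverse_match_probability(p_unrelated(0.5), n)
        print(f"{n:>2}  {sib:>12,.0f}  {unrel:.2E}")
```

Each additional balanced SNP multiplies the sibling figure by 32/19 (about 1.68) and the unrelated figure by 8/3 (about 2.67), which is why sibling swaps need many more SNPs to rule out.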

Estimates of 1/P(match) across all 24 SNPs, for siblings and for unrelated individuals in various ancestries, are shown in the table below.

Population            1/P(match), sibling   1/P(match), unrelated
GNOMAD_GENOMES-EAS    106,250               4.34E+09
GNOMAD_GENOMES-AMR    158,610               7.83E+09
GNOMAD_GENOMES-ASJ    131,144               5.79E+09
GNOMAD_GENOMES-FIN    134,919               6.01E+09
GNOMAD_GENOMES-NFE    170,439               8.67E+09
GNOMAD_GENOMES-AFR    161,553               7.97E+09

Comparison Method

  • Compare only those SNPs that are present in both the WGS data and the SNP genotyping data.
  • Calculate probabilities for all ethnicities, for siblings and for unrelated individuals, according to the formula (Formula 1.1, highlighted in grey). The check fails if, for any ethnicity, the probability of a sibling matching by chance is greater than 1 in 50,000 (threshold TBC).
  • Return the probabilities for all ethnicities and whether the sample passes according to the threshold above.
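A minimal sketch of these three steps, under stated assumptions: genotypes are encoded as alternate-allele counts (0, 1, 2) keyed by SNP ID; per-population allele frequencies are supplied by the caller (for example from gnomAD); the per-SNP probabilities use the standard Hardy-Weinberg expressions (unrelated p^4 + 4p^2q^2 + q^4, sibling 1/4 + 1/2(p^3 + q^3 + pq) + 1/4 of the unrelated term) rather than the SMS's own Formula 1.1; any discordant genotype at a common SNP is treated as a failure, which this page does not state; and the sibling threshold is the 1 in 50,000 value still marked TBC. `compare_sample` and its helpers are hypothetical names.

```python
import math
from typing import Dict, List

SIBLING_THRESHOLD = 1 / 50_000  # value still marked TBC on this page


def p_unrelated(p: float) -> float:
    q = 1.0 - p
    return p**4 + 4 * p**2 * q**2 + q**4


def p_sibling(p: float) -> float:
    q = 1.0 - p
    return 0.25 + 0.5 * (p**3 + q**3 + p * q) + 0.25 * p_unrelated(p)


def compare_sample(
    glh: Dict[str, int],
    wgs: Dict[str, int],
    pop_freqs: Dict[str, Dict[str, float]],
    threshold: float = SIBLING_THRESHOLD,
) -> dict:
    # Step 1: use only SNPs present in both the GLH and the WGS genotype sets.
    common: List[str] = sorted(set(glh) & set(wgs))
    # Assumption: any genotype disagreement at a common SNP fails the sample.
    discordant = [snp for snp in common if glh[snp] != wgs[snp]]
    # Step 2: chance-match probabilities per population, multiplied across SNPs.
    probabilities = {
        pop: {
            "sibling": math.prod(p_sibling(freqs[snp]) for snp in common),
            "unrelated": math.prod(p_unrelated(freqs[snp]) for snp in common),
        }
        for pop, freqs in pop_freqs.items()
    }
    # Step 3: fail if the sibling probability exceeds the threshold for ANY population.
    passed = (
        bool(common)
        and bool(probabilities)
        and not discordant
        and all(r["sibling"] <= threshold for r in probabilities.values())
    )
    return {"passed": passed, "discordant": discordant, "probabilities": probabilities}
```

With 24 concordant SNPs at MAF 0.5, the sibling chance-match probability is about 1 in 270,000, so the sample passes; with only 10 such SNPs it is about 1 in 184 and the sample fails even though every genotype agrees.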



Last update: 2022-11-16