
dbSNP, rsID, NIH, etc

dbSNP is a central database maintained by the USA National Institutes of Health (NIH). In their own words:
dbSNP contains human single nucleotide variations, microsatellites, and small-scale insertions and deletions along with publication, population frequency, molecular consequence, and genomic and RefSeq mapping information for both common variations and clinical mutations.
As the definition shows, dbSNP covers more than SNPs in the strict sense: it also includes InDels, STRs (microsatellites) and similar small-scale variations.

In contrast, here is the NIH's definition of its companion dbVar database:
dbVar is NCBI's database of human genomic Structural Variation — large variants >50 bp including insertions, deletions, duplications, inversions, mobile elements, translocations, and complex variants
Large-scale variations, such as those often involved in translocations, are therefore covered by dbVar rather than dbSNP.

Most variations are defined in one or the other of these two databases.
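To make the dbSNP/dbVar split concrete, here is a minimal Python sketch that sorts a REF/ALT allele pair by size. It is a hypothetical helper, not an NCBI rule: it assumes the alleles are spelled out as bases (no symbolic alleles such as <DEL>), takes an InDel's size to be the length difference between the alleles, and uses only the "> 50 bp" cutoff quoted above to decide which database a variant would most likely land in.

```python
# Cutoff taken from dbVar's own description: structural variants are "> 50 bp".
SV_MIN_BP = 50


def classify_variant(ref: str, alt: str) -> tuple[str, str]:
    """Roughly classify a REF/ALT pair and name the likely database.

    Hypothetical helper under simplifying assumptions:
    - alleles are literal bases, not symbolic alleles like <DEL>;
    - an InDel's size is the length difference between the alleles;
    - the 50 bp cutoff alone decides dbSNP versus dbVar.
    """
    if not ref or not alt:
        raise ValueError("alleles must be non-empty")
    if len(ref) == len(alt):
        # Equal lengths: a single- or multi-nucleotide substitution.
        kind = "SNV" if len(ref) == 1 else "MNV"
        size = len(ref)
    else:
        kind = "InDel"
        size = abs(len(ref) - len(alt))
    if size > SV_MIN_BP:
        return "structural variant", "dbVar"
    return kind, "dbSNP"
```

For example, `classify_variant("AT", "A")` gives `("InDel", "dbSNP")`, while an insertion of 60 bases gives `("structural variant", "dbVar")`. In practice the boundary is fuzzier than a single threshold, so treat this as an illustration of the size criterion only.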

rsIDs are assigned by the dbSNP database; "rsID" is short for reference SNP ID. An rsID is commonly defined by a specific chromosome and a position within it (per a defined reference model), along with an ancestral and one or more derived values (alleles). When a variant spans more than one base pair, or is an InDel, the position given in a file can depend on who (or which tool) wrote the file, but it is fixed in the reference database. Most commonly, a multi-base-pair variant is left-aligned on the forward strand, meaning the start (lowest-numbered) position of the affected base pairs is used; for an InDel inside a repeat, that is the leftmost position at which the same change could be placed.
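Left alignment is easiest to see with an InDel in a short repeat, where the same deletion can be written at several positions. The Python sketch below follows the trim-then-extend normalization procedure used by common VCF tools; it is not dbSNP's own code. It assumes VCF-style alleles (InDels carry one anchor base) and a reference sequence held as a plain string in which position 1 is `seq[0]`. The function name `left_normalize` is hypothetical.

```python
def left_normalize(seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-align and trim a VCF-style variant against a reference sequence.

    seq: reference sequence as a string (position 1 is seq[0]).
    pos: 1-based position of the first base of ref.
    A sketch of the usual trim-then-extend normalization, not dbSNP's code.
    """
    if ref == alt:
        raise ValueError("ref and alt are identical")
    if pos < 1 or seq[pos - 1:pos - 1 + len(ref)] != ref:
        raise ValueError("ref does not match the reference sequence at pos")
    while True:
        changed = False
        # Drop a trailing base shared by both (non-empty) alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, re-anchor on the preceding reference base.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot shift left past the start of seq")
            pos -= 1
            base = seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
        if not changed:
            break
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

With the made-up reference `"GCACACAT"`, deleting one `CA` unit written as `(5, "ACA", "A")` or as `(3, "ACA", "A")` both normalize to `(1, "GCA", "G")`, which is why two files describing the same InDel can disagree on position until they are normalized.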

The rsID is commonly used in microarray file formats and annotated VCFs, alongside the chromosome name and the position within it. Of these, the position is usually the only value that changes with the reference model (build) being used; the rsID stays the same. Most basic VCF files delivered by WGS testing companies are not annotated with the rsID, nor with the gene region a variant may reside in.
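In a VCF, the rsID lives in the third (ID) column, which holds "." when the file has not been annotated, or a semicolon-separated list of identifiers when it has. The Python sketch below reads data lines and reports which records carry an rsID. The names `VcfRecord` and `parse_vcf_line` are hypothetical, and the sample records are made up for illustration (the rsID shown is a placeholder, not a real dbSNP entry).

```python
from typing import NamedTuple, Optional


class VcfRecord(NamedTuple):
    chrom: str
    pos: int              # 1-based, relative to the build the file was called against
    rsid: Optional[str]   # None when the ID column carries no rs identifier
    ref: str
    alt: list[str]


def parse_vcf_line(line: str) -> Optional[VcfRecord]:
    """Parse one VCF data line; return None for header or blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    chrom, pos, ids, ref, alt = line.rstrip("\n").split("\t")[:5]
    # ID is "." when unannotated, otherwise a ";"-separated list of IDs.
    rsid = next((i for i in ids.split(";") if i.startswith("rs")), None)
    return VcfRecord(chrom, int(pos), rsid, ref, alt.split(","))


# Made-up example lines: one unannotated record, one annotated with a placeholder rsID.
lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.",
    "chr1\t2000\trs999999999\tC\tT\t50\tPASS\t.",
]
records = [r for r in map(parse_vcf_line, lines) if r is not None]
unannotated = [r for r in records if r.rsid is None]
```

Counting `unannotated` against `records` gives a quick sense of whether a WGS vendor's VCF has been run through an rsID annotation step at all.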

Well-studied rsIDs are often given descriptive SNP names and are frequently located within a named gene. Names used to be defined only in the curated, refereed journal publications that introduced or first identified them. Now, especially in the yDNA haplogroup arena, naming has turned into a wild-west race to see who can grab the most territory. This has led to names being claimed for what are not really variants (or for regions where the genome cannot be read reliably), and to variants being named before they have even been submitted to dbSNP and curated with an rsID.

In the USA, most of this genetics infrastructure falls under the purview of the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM), which is part of the National Institutes of Health (NIH), which in turn reports to the Department of Health and Human Services (HHS).

A near-parallel effort, working in concert with the NIH, is the European Variation Archive (EVA), hosted by the European Bioinformatics Institute (EBI) of the European Molecular Biology Laboratory (EMBL). Although currently EU funded, this organization is located on the Wellcome Genome Campus near Cambridge University and the Sanger Institute. EVA is part of the ELIXIR hub of cooperating repositories of public data for the biological sciences. Ensembl is another arm of EBI; it handles the reference genome models.

External References