Genetic Marker

Note: This page has some duplication and inconsistency. A clean-up pass is needed.

Genetic markers are what we test for in DNA. They are locations where known differences exist in DNA among the human population. For genetic genealogy, we generally test for only two main types: Single-Nucleotide Polymorphisms (SNPs) and Short Tandem Repeats (STRs). One caveat is that some more complex InDel markers are also tested and reported, but they are lumped in with those two types.

Each diploid cell in our body contains over 6 billion base-pairs of DNA, with an estimated 20 million of those base-pairs varying between any two of us. Over 600 million markers are identified in the dbSNP database. On average, a person will have 3 to 6 million simple variants when compared to the reference model, and about 10% of the values returned from a microarray test will be derived (not ancestral relative to the reference model). Not enough people have been fully sequenced and compared to understand the full extent of the real differences between humans, although existing work indicates there can be some large variation; some variances may be millions of base-pairs in size. Nor is sequencing technology developed enough to generate a full, accurate sequence of all your DNA. In fact, the constantly updated Genome Build still has gaps and questionable areas. Biology is messy even though scientists try to bring order and classification to it.

Not all markers exist in all people, nor are they all necessarily tested for in each test. A marker may not even be found in every individual who takes the same test from the same company. This is partly due to the testing process, partly due to changes near the marker that may disrupt finding it, and partly due to the statistical nature of the testing process.

Biological science now coalesces around the term variant to describe all markers (basically, any detected difference). Historically, they were termed mutations. Most often, the variance is measured against a human genome reference model, but it can also be determined by comparing two individuals directly to each other, without reference to a common model. Let's put some numbers to it. It is widely reported that humans share 99.5% of their DNA with each other. As we each have nearly 6.4 billion base-pairs, this implies about 32 million base-pairs of difference. In practice, we tend to measure around 4 to 6 million differences between any two individuals, out of the roughly 30 million variants identified where people differ from the reference model or from the majority of the population. Why dbSNP has over 600 million entries in its catalog, roughly 20 times this observed variance, is a mystery at this time.
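The arithmetic behind the 32 million figure can be checked with a short sketch. The function name and the use of exact fractions are illustrative choices only; the inputs are the figures quoted in the paragraph above.

```python
from fractions import Fraction


def expected_differences(genome_base_pairs, shared_percent):
    """Base-pairs expected to differ, given the percent of DNA that is shared.

    shared_percent is passed as a string (for example "99.5") so that the
    calculation is exact rather than subject to floating-point rounding.
    """
    differing_fraction = 1 - Fraction(shared_percent) / 100
    return int(genome_base_pairs * differing_fraction)


# Figures quoted in the text: about 6.4 billion base-pairs, 99.5% shared.
implied = expected_differences(6_400_000_000, "99.5")
print(f"{implied:,} base-pairs implied to differ")
```

The result is several times larger than the 4 to 6 million differences typically measured between two individuals, which is the gap the paragraph points out.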

Variants, as delivered from NGS testing, come in four general forms, sometimes in separate files and sometimes combined in one:
  1. Single Nucleotide Variants (SNVs), which are generally limited to a change in one or a few values of a single base-pair and are what we have been calling SNPs here
  2. Insertions and Deletions (InDels) of up to 5 base-pairs (something we have not talked much about, but they comprise nearly 1% of the values in a microarray file)
  3. Copy-Number Variants (CNVs), of which STRs are a special case, and
  4. Structural Variants (SVs), where very large sections of major change are consistent among certain populations (and are the source of some alternate contiguous regions, or alt-contigs, in the alignment / analysis versions of the human genome reference model).

General biology distinguishes between the many types of variants that are observed. The SNPs and STRs generally tested for in genetic genealogy are simply special forms of the SNVs and CNVs defined in the genetics field. SNPs are simple changes in a single base-pair, or at most 5 base-pairs. STRs are observed areas of repeated patterns where the number of repeats of the pattern varies between people. Besides these general Single Nucleotide Variants (SNVs) and Copy Number Variants (CNVs), there are also Structural Variants (SVs) that represent larger (often greater than 1,000 base-pairs) changes in a DNA segment. Insertions / Deletions (InDels) of up to 5 base-pairs are a special category; they are often simply reported as SNPs and defined in those databases. Much larger InDels may be termed SVs but have further clarifying properties, such as Translocation and Inversion. These SVs really only apply to health analysis. For the most part in genetic genealogy, SNPs with a few InDels are the dominant form checked and reported, with STRs a special case used in yDNA testing for haplogroup determination.
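As a rough illustration of the size-based categories described above, the following sketch buckets a variant by how much its reference and alternate alleles differ in length. The function name, the category labels, and the exact cut-offs (5 and 1,000 base-pairs, taken from the approximate figures on this page) are assumptions for illustration, not a formal standard. CNVs and STRs are not covered, because identifying them requires counting repeats rather than comparing two allele strings.

```python
# Approximate thresholds taken from the descriptions on this page (assumptions).
SMALL_INDEL_MAX_BP = 5
STRUCTURAL_MIN_BP = 1000


def classify_by_size(ref, alt):
    """Roughly bucket a variant by how its alleles differ in length.

    ref and alt are allele strings such as "A" or "ATG".
    """
    if ref == alt:
        return "no change"
    size_change = abs(len(ref) - len(alt))
    if size_change == 0 and len(ref) == 1:
        return "SNV"  # one base-pair swapped for another
    if 0 < size_change <= SMALL_INDEL_MAX_BP:
        return "small InDel"  # often reported alongside SNPs
    if size_change > STRUCTURAL_MIN_BP:
        return "SV"  # large structural change
    return "other"  # mid-sized or same-length multi-base changes


for ref, alt in [("A", "T"), ("A", "ATG"), ("A", "A" + "C" * 1500)]:
    print(ref[:5], "->", alt[:5], ":", classify_by_size(ref, alt))
```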

A marker result may be not-tested (inconclusive), but it can also be a "null" value. The testing process often has difficulty finding a marker when there is additional variance in and around the marker's location in the DNA strand. Sometimes the test process can detect this and report it as a "null" instead of not-tested.

SNPs are generally a change in the value of a single base-pair and are reported as the base-pair value (or allele). Sometimes the result is reported as the name of the SNP followed by a plus sign (+), implying "positive for change", or a derived value. The derived value differs from the "negative for change" (-), ancestral base-pair value. For example, L20+ implies "positive for change" ('T' instead of the ancestral 'A') for the SNP named L20, located on the Y chromosome at base-pair position 12,110,586 in the GRCh38 build of the human genome reference model. Often, if the value could not be measured, the allele is reported as a dash ('-'). Note that this dash appears in place of the 'A' or 'T' allele, not as an annotation on the name. Occasionally, the name will have a '.n' trailer, which means the change is another occurrence in the population when tracked and placed on the phylogenetic tree. Similarly, a '.n' specification can be used to reference someone who is ancestral in value but who, based on placement on the phylogenetic tree, represents a group in which the SNP changed back. See the YCC page for more detail on naming. We have focused on yDNA naming here because autosomal SNPs have longer, more complicated names, such as "rsID"s.
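The name-plus-sign notation can be split into its parts with a small parser. This is a minimal sketch under stated assumptions: the SnpCall class and parse_snp_call function are hypothetical names, the pattern assumes a name made of letters followed by digits (as in L20), and it assumes the '.n' trailer is a dot followed by a number. Real SNP names vary more than this, so treat the pattern as illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape: letters, digits, optional ".n" trailer, then "+" or "-".
_CALL_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z]+\d+)(?:\.(?P<occurrence>\d+))?(?P<sign>[+-])$"
)


@dataclass(frozen=True)
class SnpCall:
    name: str                   # SNP name, e.g. "L20"
    occurrence: Optional[int]   # the n of a ".n" trailer, if present
    derived: bool               # True for "+", False for "-" (ancestral)


def parse_snp_call(text):
    """Parse a call such as "L20+" into its name, trailer, and state."""
    match = _CALL_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized SNP call: {text!r}")
    occurrence = match.group("occurrence")
    return SnpCall(
        name=match.group("name"),
        occurrence=int(occurrence) if occurrence else None,
        derived=match.group("sign") == "+",
    )


print(parse_snp_call("L20+"))
print(parse_snp_call("L20.2-"))  # hypothetical ".n" example, not a real result
```

Note that the sign here is an annotation on the name; a dash reported in place of the allele value ("could not be measured") is a different thing and is not handled by this parser.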

STRs are repeat patterns and are reported as a count of repeat instances. If the value is null or 0, it could not be measured. Multi-copy markers are named in groups and reported with multiple values, with no indication of which value applies to which copy of the marker on the genome.
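To make the idea of a repeat count concrete, here is a minimal sketch with two hypothetical helpers. The first counts the longest run of back-to-back copies of a motif in a sequence, which is the kind of number an STR result represents. The second reads a reported value into a list of counts, assuming that multi-copy values are joined with hyphens (as in "11-14") and that an empty value, "null", or 0 means not measured. Both the function names and the hyphen format are assumptions, and the sequence used is made up.

```python
def count_tandem_repeats(sequence, motif):
    """Return the longest run of consecutive copies of motif in sequence."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        count = 0
        position = start
        # Walk forward one motif length at a time while copies keep matching.
        while sequence.startswith(motif, position):
            count += 1
            position += step
        best = max(best, count)
    return best


def parse_str_result(value):
    """Parse a reported STR value into counts; None marks 'not measured'."""
    counts = []
    for part in value.split("-"):
        part = part.strip().lower()
        if part in ("", "0", "null"):
            counts.append(None)
        else:
            counts.append(int(part))
    return counts


print(count_tandem_repeats("TTGATAGATAGATACC", "GATA"))
print(parse_str_result("11-14"))
print(parse_str_result("0"))
```

The list returned for a multi-copy marker keeps the values in reported order only; as noted above, nothing in the result says which value belongs to which copy.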

See Also

SNPs, STRs, Null Allele

External Resources