Short Tandem Repeat (STR)

A Short Tandem Repeat (STR) is a marker used to measure how the DNA of one individual differs from that of another.  Currently, all STR markers are defined on the chromosomes; none exist in the mitochondria.  An STR is a short motif of 1 to 5 base pairs repeated back to back, and is also known as a Microsatellite.  Repeats of longer motifs go by another term (minisatellites).  The number of repetitions (or count) defines the marker’s value.  Examples are ...AAAAAAAAAA... for a count of 10 (motif A) or ...CATACATACATACATA... for a count of 4 (motif CATA).
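To make the idea of a "count" concrete, here is a minimal sketch that finds the longest run of back-to-back copies of a motif in a sequence.  The function name and the flanking bases in the examples are made up for illustration; real STR callers work from sequencing reads and are far more involved.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Return the longest run of back-to-back copies of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    step = len(motif)
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Walk forward one motif at a time while the copies keep coming.
        while sequence.startswith(motif, pos):
            run += 1
            pos += step
        best = max(best, run)
    return best


# The two examples from the text, with arbitrary flanking bases added.
print(count_tandem_repeats("GG" + "A" * 10 + "TC", "A"))        # 10
print(count_tandem_repeats("TT" + "CATA" * 4 + "GG", "CATA"))   # 4
```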

STR markers on the Y chromosome (yDNA) are what we care about in Genetic Genealogy, and they play a vital role in surname studies such as this one.  The yDNA is passed down from father to son each generation, relatively unchanged.  It thus provides a useful patrilineal identification technique that also happens to match surname inheritance in most European populations.

A Haplotype is the collection of all the STR marker counts a tester has.  Matching Haplotypes between testers usually imply a common ancestor, unless the STR values of different patrilineal lines have re-converged.  SNP values should therefore be checked for a match before concluding that two testers genuinely share a haplotype.  If the SNP values differ, the STR values have re-converged.

SNP values define a Haplogroup and are not to be confused with STR values and the Haplotype they define.
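The check described above (compare STR haplotypes first, then confirm with SNPs) can be sketched roughly as follows.  This is an illustration under assumptions, not a description of any vendor's matching algorithm: the Tester class, the function names, the SNP label "SNP-A" and all marker values are hypothetical, and any conflicting SNP result is treated as a sign of re-convergence, whereas a real analysis would also consider where each SNP sits on the haplogroup tree.

```python
from dataclasses import dataclass


@dataclass
class Tester:
    name: str
    haplotype: dict   # STR marker name -> repeat count
    snps: dict        # SNP name -> True if the tester is positive for it


def str_differences(a: Tester, b: Tester) -> int:
    """Count the markers tested by both people whose values differ."""
    shared = a.haplotype.keys() & b.haplotype.keys()
    return sum(1 for marker in shared if a.haplotype[marker] != b.haplotype[marker])


def snps_conflict(a: Tester, b: Tester) -> bool:
    """True if any SNP tested by both people has different results."""
    shared = a.snps.keys() & b.snps.keys()
    return any(a.snps[snp] != b.snps[snp] for snp in shared)


def assess_match(a: Tester, b: Tester, max_differences: int = 0) -> str:
    if str_differences(a, b) > max_differences:
        return "no STR match"
    if snps_conflict(a, b):
        return "STR match, but SNPs differ: likely re-convergence"
    return "STR match consistent with SNPs"


# Illustrative values only.
strs = {"DYS393": 13, "DYS390": 24, "DYS19": 14}
t1 = Tester("T1", dict(strs), {"SNP-A": True})
t2 = Tester("T2", dict(strs), {"SNP-A": True})
t3 = Tester("T3", dict(strs), {"SNP-A": False})

print(assess_match(t1, t2))  # STR match consistent with SNPs
print(assess_match(t1, t3))  # STR match, but SNPs differ: likely re-convergence
```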

STR markers on the autosomal chromosomes are used for forensic and criminal identification and NOT Genetic Genealogy.  The US Government’s CODIS database catalogs a core set of STR markers on the autosomes (13 originally, expanded to 20 in 2017).  There is no overlap between Genetic Genealogy testing and criminal or government tracking (at this time).

Most STR markers are in the intergenic (i.e. non-coding, or possibly “junk DNA”) regions, as are most base pairs in general.

By definition (or by design, when a marker is found and chosen for inclusion), the STR markers tested change count infrequently.  STR markers that change randomly every generation are considered too noisy and are not used.  A number of the yDNA STR markers, commonly numbered 26 to 37 by FTDNA, are on the fringe of usefulness, but they are still included in testing panels and results.  STR markers that rarely change (say, much less often than once every five thousand years) behave too much like SNP markers and are not of use either.  Rarely changing marker values tend to be shared by too many people, while frequently changing ones do not track with the surname or patrilineal line.  So scientists look for just the right “frequency” of change in a marker to make it useful for Genetic Genealogy purposes.  The more varied a marker’s value among the general population, the more valuable the marker.  Once haplotypes and genealogy are confirmed to converge between multiple testers, slight variances in marker values can be used to differentiate family lines in a genealogical time frame.
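One way to put a number on "how varied" a marker is among a group of testers is a simple diversity index: the chance that two testers picked at random (with replacement) carry different values.  This measure is our own choice for illustration and is not taken from any testing company's selection process; the function name and the sample values are made up.

```python
from collections import Counter


def marker_diversity(values: list) -> float:
    """Chance that two randomly drawn testers have different marker values."""
    if not values:
        raise ValueError("need at least one observed value")
    total = len(values)
    counts = Counter(values)
    # 1 minus the chance that both draws land on the same value.
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Illustrative repeat counts for one marker across four testers.
print(marker_diversity([13, 13, 13, 13]))  # 0.0  (everyone shares the value)
print(marker_diversity([13, 13, 14, 14]))  # 0.5
print(marker_diversity([12, 13, 14, 15]))  # 0.75 (every tester differs)
```

A marker scoring near 0 says little about which line a tester belongs to.  Note that a high score alone does not separate a usefully varied marker from one that is simply too noisy; that still requires knowing how often the marker changes between generations.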

The STR markers are processed by a testing lab in “panels”, usually of 12 or so markers each.  The markers are labeled in various ways, usually by the discoverer.  The most commonly adopted nomenclature is DYS followed by a number.  When the middle character is a number instead of Y (for example, D3S1358), the marker is on that numbered chromosome and not on the Y chromosome.  We sometimes see markers labeled by a single numeric value; this is often simply the order in which FamilyTreeDNA (FTDNA) added them to their testing product and reports.  FTDNA started with a yDNA 12 marker panel, expanded to 25 markers in two panels, then to 37 markers in three panels.  FTDNA is now up to 111 markers in an estimated 9 to 10 panels.
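As a small illustration of the naming convention, the sketch below reads the chromosome out of a D..S style marker name.  The function name and the regular expression are our own assumptions about how such names are laid out; markers named under other schemes are simply reported as unrecognized.

```python
import re
from typing import Optional

# "D", then a chromosome (a number, X or Y), then "S", then the marker number.
_LOCUS = re.compile(r"^D(\d{1,2}|X|Y)S(\d+)", re.IGNORECASE)


def chromosome_of(marker: str) -> Optional[str]:
    """Return the chromosome encoded in a D..S marker name, or None."""
    match = _LOCUS.match(marker.strip())
    return match.group(1).upper() if match else None


print(chromosome_of("DYS393"))   # Y
print(chromosome_of("D3S1358"))  # 3
print(chromosome_of("YCAII"))    # None (not a D..S style name)
```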

There are well over 450 STR markers identified on the Y chromosome to date.  yFull, when processing an NGS test like BigY, can extract values for most of these markers (see the LobSTR tool description).  This extraction technique does not appear to be as accurate as the dedicated panel testing done by FTDNA, and we are analyzing the accuracy of yFull STR extractions in this project.  The Compound (or duplicative, or multi-copy) markers appear to be the most difficult to determine from BAM file analysis.
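A comparison of this kind can be sketched as a marker-by-marker check of the extracted values against the panel values for the same tester.  The function name, the report layout and every value below are hypothetical; values are held as text so that a multi-copy marker such as DYS385 can be written as "11-14".  This is not the project's actual analysis code.

```python
def compare_calls(panel: dict, extracted: dict) -> dict:
    """Sort each panel marker into concordant, discordant or missing."""
    report = {"concordant": [], "discordant": [], "missing": []}
    for marker in sorted(panel):
        if marker not in extracted:
            report["missing"].append(marker)       # no value was extracted
        elif extracted[marker] == panel[marker]:
            report["concordant"].append(marker)    # both sources agree
        else:
            report["discordant"].append(marker)    # sources disagree
    return report


# Illustrative values only.
panel = {"DYS393": "13", "DYS390": "24", "DYS385": "11-14", "DYS19": "14"}
extracted = {"DYS393": "13", "DYS390": "25", "DYS385": "11-15"}

report = compare_calls(panel, extracted)
print(report["concordant"])  # ['DYS393']
print(report["discordant"])  # ['DYS385', 'DYS390']
print(report["missing"])     # ['DYS19']
```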

STRs are known as Simple Sequence Repeats (SSRs) in some communities and as Microsatellites in general.  They are distinct from SNPs.

External Resources