Consanguinity

Consanguinity simply means kinship with someone else: a blood relationship, in historical terms. A blood relative is someone with whom you share an ancestor (that is, you share an MRCA, or most recent common ancestor). Legal and religious authorities define a measure of how close a relationship is so that they can specify laws and rules governing prohibited behavior between close blood relations. This measure of closeness is called the degree of relationship (aka the degree of kinship), and it serves as a mathematical measure of consanguinity.

"Coefficient of relatedness" by Citynoise - Own work. Licensed under CC BY-SA 4.0 via Commons - https://commons.wikimedia.org/wiki/File:Coefficient_of_relatedness.png#/media/File:Coefficient_of_relatedness.png

Coefficient of Relatedness
adapted from Citynoise on Wikimedia

This degree of relationship is based on the number of generations, or meiosis events, from each side to the MRCA. In genetic genealogy, we use a slightly different measure, one that reflects the fact that full siblings inherit DNA from both parents. As a result, they appear one step closer.

The diagram to the right better illustrates this DNA-based measure, showing a distance of 1 for parents, children and full siblings, and then expanding out in waves from there. Each wave represents 1/2 as much shared atDNA as the one before, because you inherit only 1/2 of each parent's DNA. Do note that after the first distance of 1, there are statistical variances from the simple model values here. The simple model gives the expected average without further refinement.
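The halving per wave can be written out as a short calculation. This is only a sketch of the simple model: the function names are illustrative, and the 7,200cM total is the rough base this page uses for its table, not a value any one testing company guarantees.

```python
# Simple "perfect dilution" model: each step out halves the expected shared atDNA.
# Assumption: 7,200 cM total autosomal base, the rough figure used on this page.
TOTAL_AUTOSOMAL_CM = 7200.0


def expected_shared_fraction(degree: int) -> float:
    """Expected average fraction of atDNA shared at a genetic degree of relationship."""
    if degree < 1:
        raise ValueError("degree must be 1 or greater")
    return 0.5 ** degree


def expected_shared_cm(degree: int) -> float:
    """Expected average shared atDNA in cM under the simple model."""
    return TOTAL_AUTOSOMAL_CM * expected_shared_fraction(degree)


if __name__ == "__main__":
    for degree in range(1, 11):
        share = expected_shared_fraction(degree)
        print(f"degree {degree:2d}: {share:8.4%}  ~{expected_shared_cm(degree):7.1f} cM")
```

These are expected averages only; actual matches scatter around them, increasingly so at larger distances.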

Wikimedia also has a Table of Consanguinity by Sg647112c, which uses a more traditional genealogical chart to express this degree of relationship in a path form. That table is expanded and adapted here: it is changed to show the genetic genealogy degree of relationship measure (or generation count), and other clarifying material is added. A dotted-line shortcut is added between full siblings in the chart, so the distance between them is shortened by one. Subsequent generations below are then one closer as well.

For half siblings (and anyone whose relationship to a common ancestor runs through half siblings as the first step below that ancestor), you remove the shortcut and add one to the degree of relationship shown here. This is because half siblings share only a single parent, and thus 1/2 the DNA that full siblings do. (Full siblings share 50% of their DNA with each other; half siblings share only 25%.)
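The shortcut rule can be sketched as a small function. The name `genetic_degree` and its parameters are hypothetical, chosen for illustration; the logic simply counts meiosis events along the path and subtracts one when the path runs through full siblings.

```python
def genetic_degree(up: int, down: int, full_sibling_path: bool = True) -> int:
    """
    Genetic genealogy degree of relationship between two people.

    up:   generations from person A up to the common ancestor(s)
    down: generations from the common ancestor(s) down to person B
    full_sibling_path: True when the two lines descend from full siblings
        (two shared ancestors), False when they descend from half siblings.
    """
    if up < 0 or down < 0 or up + down == 0:
        raise ValueError("need non-negative steps and at least one step in total")
    meioses = up + down
    direct_line = up == 0 or down == 0  # parent, grandparent, etc.: no sibling step
    if direct_line or not full_sibling_path:
        return meioses
    return meioses - 1  # the dotted-line shortcut between full siblings


if __name__ == "__main__":
    print("full siblings:    ", genetic_degree(1, 1))
    print("half siblings:    ", genetic_degree(1, 1, full_sibling_path=False))
    print("aunt/uncle:       ", genetic_degree(2, 1))
    print("1st cousins:      ", genetic_degree(2, 2))
    print("half 1st cousins: ", genetic_degree(2, 2, full_sibling_path=False))
```

Under the simple halving model, a degree of 1 corresponds to 50% shared and a degree of 2 to 25%, which agrees with the full versus half sibling percentages given above.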

The grayed boxes represent areas where detectable matching of autosomal DNA between cousins is less likely than not to occur: generally, below 0.1% total matching across the autosomal DNA, or less than 7cM for the longest matching segment. This corresponds to roughly 10 generation events, just beyond 4th cousins. Note that you may still match out as far as 8th cousins or so; the likelihood is just greatly reduced with each further generation out. According to empirical study, you are only guaranteed to match 2nd cousins or closer.
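As a rough check on where the gray region begins, the simple halving model can be compared against the 0.1% threshold. This is a sketch only: the function name is illustrative, and real detection depends on segment lengths and the testing company's own cutoffs, not just the expected average.

```python
# Threshold taken from the text: below 0.1% total matching, detection is unlikely.
MIN_SHARED_FRACTION = 0.001


def likely_detectable(degree: int) -> bool:
    """True if the simple-model average share is at or above the 0.1% threshold."""
    return 0.5 ** degree >= MIN_SHARED_FRACTION


# First degree at which the simple-model average drops below the threshold.
first_unlikely_degree = next(d for d in range(1, 30) if not likely_detectable(d))

if __name__ == "__main__":
    print("simple model falls below 0.1% at degree", first_unlikely_degree)
```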

The light yellow boxes represent generations more than 3 away, and thus relatives you are not likely to be able to test and compare against using the recently available DNA test technology. Sometimes, though, generations quickly get out of step between different family lines: we have a case of a sticky-segment match between 1st cousins, 5x removed. In summary, for the most part, the colored (non-gray) boxes represent those you will likely match using autosomal DNA testing technology.

The traditional Table of Consanguinity, with the degree of relationship modified for Genetic Genealogy illustration purposes. v2, updated after 4 years, adds color to make it easier to match the match strength to the table entries.

Table of Consanguinity
adapted from Sg647112c on Wikimedia

Many in the community have since developed charts similar to the one we developed above (DNA Detectives, DNA Adoption, etc.). Data from several of these sources has been combined into the Shared cM Project Tool by the DNA Painter developer. We mention it because, although the data underlying the tool has issues and warts, the tool is overall more helpful and explanatory to the lay user, and many of our newer members may find it easier to use. Most of the caveats described here apply to the tool and to other charts as well. See the references section at the end for more on the tool and its underlying data. The tool does an especially nice job of reporting the overlapping likelihoods taken from the Ancestry white paper, and of making explicit in the chart the half sibling relationships that we simply mention in words here.

The key point to remember is that a half sibling relationship only matters when it sits immediately below the common ancestor. That is, you share a single common ancestor parent (first) and not two parents. A possibly better way to think of it and represent it in the chart, instead of the dotted-line shortcut, is to show two lines up to the parents and two down from the parents. This shows the doubling of DNA strength that comes from there being two parents in common, not just the one depicted in the chart. For a half sibling, one of those lines gets dropped, depending on which parent is shared.

Average versus Actual shared DNA

The chart estimate above is based on a predicted, Mendel-style "perfect" dilution of the autosomes. It ignores the fact that there are 44 discrete autosomes (22 pairs) of greatly varying length, which disrupts this perfect estimate. Remember that chromosome 1 is five times larger than chromosome 22, so which chromosome passes down from which ancestor contributes to a wide variance in matching strength. Additionally, cross-overs (recombination) occur during meiosis and mix the DNA strands you inherit from your grandparents through your parent. This actually tempers the variance seen with discrete chromosome strand inheritance, because it makes the chromosome strands in the child more of a mix of the strands from the grandparents. Finally, the perfect dilution also assumes that a new, outside individual (not related through other ancestors) contributes 1/2 of the DNA each generation. Sharing more than a single ancestor is termed Endogamy and is discussed elsewhere.

Although matching after 10 generations is less likely than not, discernible matching may still be found up to 15 generation events out. This is because some matching segments of a chromosome are less susceptible to breaking up during recombination in meiosis. The shorter the matching segment, the less chance it has of being split. Once a segment gets down to 20 cM or less in length, it can be passed down intact for many generations. The farther beyond 10 generations you go, the rarer it is for a matching segment from a common ancestor to remain, simply due to statistical chance: at each generation, there is a chance the segment is not included in the 1/2 of DNA passed from a parent. Up to a count of 10 meiosis events, you are more likely than not to detect a segment match in the chromosomes between related individuals.
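Both effects described above (the 1/2 chance of being passed on, and the chance of being split by a cross-over) can be combined in a rough per-segment estimate. The sketch below assumes the standard Haldane model, in which cross-overs fall as a Poisson process and a segment of L cM escapes a cross-over in one meiosis with probability exp(-L/100). That model and the function name are assumptions brought in for illustration, not something this page derives.

```python
import math


def prob_segment_intact(length_cm: float, meioses: int) -> float:
    """
    Rough chance that one specific segment is passed down whole through
    a number of meiosis events.

    Assumptions (not from this page): Haldane's no-interference model, so a
    segment of length_cm escapes recombination with probability
    exp(-length_cm / 100), and a 1/2 chance per meiosis that the segment's
    strand is the one transmitted.
    """
    if length_cm < 0 or meioses < 0:
        raise ValueError("length_cm and meioses must be non-negative")
    per_meiosis = 0.5 * math.exp(-length_cm / 100.0)
    return per_meiosis ** meioses


if __name__ == "__main__":
    for length in (7, 20, 50):
        p = prob_segment_intact(length, 10)
        print(f"{length:2d} cM segment intact after 10 meioses: {p:.2e}")
```

This is the chance for one particular segment. An ancestor passes down many segments, so the chance that at least one detectable segment survives is considerably higher than any single value printed here.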

Shared DNA has a lot of statistical variance through the generations of inheritance. Although a parent and child will always share (roughly) 50%, there are no guarantees beyond that. Although it is highly, highly improbable, two full siblings can possibly NOT share ANY matching DNA segments in their chromosomes (1-23). So the possible variance, even for siblings, is 50% plus or minus 50%. But the actual variance, as measured by the standard deviation, is ±5%, and the match strength will very rarely fall outside double that standard deviation.

When adding up the length of all autosomal matching segments (defined by at least 500 SNPs and a minimum length of 5cM) to form a total matched length, one can estimate the degree of kinship from the resultant value.
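A minimal sketch of that summation follows. The `Segment` record and its field names are hypothetical; real tools export matching segments in their own formats. Only the two thresholds (500 SNPs and 5cM) come from the text.

```python
from dataclasses import dataclass
from typing import Iterable

# Thresholds from the text: a segment counts if it has at least 500 SNPs
# and is at least 5 cM long.
MIN_SNPS = 500
MIN_LENGTH_CM = 5.0


@dataclass(frozen=True)
class Segment:
    """One matching segment (hypothetical record layout for illustration)."""
    chromosome: int
    length_cm: float
    snp_count: int


def total_matched_cm(segments: Iterable[Segment],
                     min_snps: int = MIN_SNPS,
                     min_length_cm: float = MIN_LENGTH_CM) -> float:
    """Sum the lengths of all segments that pass both thresholds."""
    return sum(
        s.length_cm
        for s in segments
        if s.snp_count >= min_snps and s.length_cm >= min_length_cm
    )


if __name__ == "__main__":
    # Made-up example values, not real test results.
    example = [
        Segment(chromosome=1, length_cm=12.5, snp_count=1800),
        Segment(chromosome=4, length_cm=4.9, snp_count=900),   # too short
        Segment(chromosome=9, length_cm=8.0, snp_count=450),   # too few SNPs
        Segment(chromosome=15, length_cm=7.5, snp_count=700),
    ]
    print("total matched length:", total_matched_cm(example), "cM")
```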

So far, we have been talking about actual average shared DNA. But now we need to start translating that into what autosomal genetic genealogy tools measure and report as well as deviations from that average. For a reported 200cM or more of shared DNA (corresponding to roughly 2nd cousins and closer or a degree of kinship), you can determine the actual degree of kinship with a pretty strong certainty. Below 150 cM, roughly, it is much more of a guess as to what the degree of kinship is just knowing the total match strength. When below a 40 cM total match strength, it is a total guess as to the degree of kinship as 5 or more relationships are equally likely. So if you have a match strength of 30cM, your degree of relationship is just as likely to be 6, 7, 8, 9, or even 10.

This variance in possible relationships arises because the standard deviation around a degree of 8 overlaps the expected average of the other values (and vice versa). Other effects, like false segment matching, creep in as well. (A false match is a percentage of matching DNA reported when no real relationship exists between the matching testers. It is mostly due to DNA measurement that cannot phase the results.) By the simple model represented here, the nominal value is known. But because the variances overlap with those of the more distant degree of kinship values, there are many possible relationships for a given amount of shared DNA, often all with a similar probability. Even at 80cM of match strength, there are multiple just-as-probable relationships. This is covered in an Ancestry white paper (see reference below). This is why you see a range of possible relationships reported by the various testing companies for a given match strength.
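The nominal value from the simple model can be recovered by inverting the halving rule: degree ≈ log2(7,200cM / total shared cM). The sketch below does that and attaches the rough confidence bands described above. The function names and band labels are illustrative, and the 150cM to 200cM range is labeled as unspecified because the text gives no guidance for it.

```python
import math

# Assumption: 7,200 cM total autosomal base, the rough figure used on this page.
TOTAL_AUTOSOMAL_CM = 7200.0


def nominal_degree(total_cm: float) -> int:
    """Nominal degree of relationship under the simple halving model."""
    if not 0 < total_cm <= TOTAL_AUTOSOMAL_CM:
        raise ValueError("total_cm must be in (0, 7200]")
    return round(math.log2(TOTAL_AUTOSOMAL_CM / total_cm))


def confidence_band(total_cm: float) -> str:
    """Rough confidence in the nominal degree, following the text's thresholds."""
    if total_cm >= 200:
        return "fairly certain"
    if total_cm >= 150:
        return "unspecified (the text gives no guidance here)"
    if total_cm >= 40:
        return "more of a guess"
    return "total guess"


if __name__ == "__main__":
    for cm in (3600, 900, 225, 80, 30):
        print(f"{cm:5d} cM -> nominal degree {nominal_degree(cm)}, {confidence_band(cm)}")
```

For a 30cM match this gives a nominal degree of 8, but as described above, neighboring degrees are about as likely, so the single number should be read as the center of a range rather than an answer.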

The table above uses a rough base of 7,200cM (or 3,600cM for a half-identical, 50% match) to represent all of the autosomes. This is approximately what GEDMatch and 23andMe report. FTDNA's value is roughly 3,400cM for a half-identical match. The centiMorgan (cM) is not an exact measure like the count of base-pairs, and hence different sources compute the value in different ways. If the X chromosome is added into the total shared measure (which 23andMe does), this adds about 90cM to the 3,600cM half-identical match total (180cM in total for biological females). The Y chromosome and mtDNA strand are never included in the total matching length or shared DNA percentage count. mtDNA is so small as to not be significant anyway, and because it changes so rarely, there are not really segments of matching areas to measure. Besides the Y chromosome only appearing in males, Y matching is measured not in segments but by looking at the individual measured differences (or markers).
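Because the base differs between sources, the same cM total converts to slightly different percentages. A small sketch, using only the rough half-identical totals quoted above (the dictionary keys and function name are illustrative, and the totals are approximations, not official company figures):

```python
# Rough half-identical (50% match) totals in cM, as quoted in the text.
HALF_IDENTICAL_TOTAL_CM = {
    "GEDMatch": 3600.0,
    "23andMe": 3600.0,
    "FTDNA": 3400.0,
}


def percent_shared(shared_cm: float, half_identical_total_cm: float,
                   x_cm: float = 0.0) -> float:
    """
    Convert a shared cM total to a percentage, where the half-identical
    total corresponds to a 50% match. x_cm optionally adds the X chromosome
    contribution (about 90 cM per the text) to the base.
    """
    base = half_identical_total_cm + x_cm
    if base <= 0:
        raise ValueError("base must be positive")
    return 50.0 * shared_cm / base


if __name__ == "__main__":
    for source, total in HALF_IDENTICAL_TOTAL_CM.items():
        print(f"{source:9s}: 850 cM is {percent_shared(850, total):.2f}%")
```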

Full Identical versus Half-Identical

This is really a tool and testing issue and not one related to the actual, underlying biology like the rest of this page above deals with. But we emphasize it here and describe it more fully elsewhere.

Generally, only full siblings have full-identical match regions. A full-identical match region is one where each chromosome of your pair matches the respective chromosome of the other tester's pair in that region. That is, the same region on both chromosomes in one tester matches the same region on both chromosomes of the other tester. Note that the chromosomes' full-identical match regions may not be identical in size and placement. Full-identical matching simply represents where one sibling matches the other on both the paternally and maternally contributed autosomal chromosomes. (There are some special cases, in endogamous populations with related parents, that introduce full-identical regions in those related more distantly than siblings. A tester can even have their own chromosomes match each other in an identical area on both chromosomes; this usually indicates that their parents are related.)

Because SNP results are reported as un-ordered pairs of values (un-phased) from microarray tests and only full siblings generally have full-identical match regions, most tools simply look for and report on half-identical matching. This simplifies the analysis for them but is a lazy approach.

Half-Identical matching under-reports the total matching between full siblings.

The average match for full siblings is reported as just under 40% by half-identical tools and charts; as opposed to the actual and expected 50% like between parents and children. Furthermore, it only shows a tighter variance of about 2% from that average as opposed to the measured and expected 5% variance. You cannot reliably use half-identical matching / reporting tools on full siblings unless you use corresponding phased results, measure between each phase independently, and sum the two independently measured match strengths.

Phasing takes the un-ordered pairs of values and assigns each value in the pair to a maternal or paternal contributed chromosome. Assuming parents are not related or there is no real endogamy, only full siblings will have ANY full-identical match regions. Similarly, half-siblings will have a match strength more like an Uncle/Aunt/Niece/Nephew and have NO full-identical match regions. Thus a test for the existence of any full identical match regions between siblings is a way to determine if they are full or half-siblings. Confused? Just realize that most tools and charts claim full siblings only share around 2900cM of DNA. When in reality, if measured properly by full identical techniques, they should report the same as the parent-child or about 3600cM and possibly even higher. Or as a trick, if the tool allows it, compare yourself to yourself. Technically, you should match 100% like a certain type of full-identical twin would. Tools based on only half-identical match techniques will only report you are a 50% match to yourself as seen in a parent-child relationship. 23andMe is the only company to do full-identical matching.
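The under-reporting can be illustrated with a little arithmetic. The sketch below assumes the textbook Mendelian expectation that full siblings are, on average, full-identical over 25% of the genome, half-identical over 50%, and not matching over the remaining 25%. That split is an assumption brought in for illustration, as are the function names. A half-identical tool counts a full-identical region once; summing the two phases counts it twice.

```python
# Assumption: 3,600 cM represents the whole genome measured once
# (the half-identical, 50% match total used on this page).
HALF_IDENTICAL_BASE_CM = 3600.0


def half_identical_total(hir_only_cm: float, fir_cm: float) -> float:
    """What a half-identical tool reports: every matching region counted once."""
    return hir_only_cm + fir_cm


def full_total(hir_only_cm: float, fir_cm: float) -> float:
    """Phase-aware total: full-identical regions match on both strands, so count twice."""
    return hir_only_cm + 2.0 * fir_cm


def looks_like_full_siblings(fir_cm: float) -> bool:
    """Any full-identical region suggests full siblings (assuming unrelated parents)."""
    return fir_cm > 0


if __name__ == "__main__":
    # Expected averages for full siblings under the 25% / 50% / 25% assumption.
    fir = 0.25 * HALF_IDENTICAL_BASE_CM
    hir_only = 0.50 * HALF_IDENTICAL_BASE_CM
    print("siblings, half-identical tool:", half_identical_total(hir_only, fir), "cM")
    print("siblings, phase-aware total:  ", full_total(hir_only, fir), "cM")

    # Comparing yourself to yourself: the whole genome is full-identical.
    print("self, half-identical tool:", half_identical_total(0.0, HALF_IDENTICAL_BASE_CM), "cM")
    print("self, phase-aware total:  ", full_total(0.0, HALF_IDENTICAL_BASE_CM), "cM")
```

Under these assumptions the half-identical figure for siblings comes out at 2,700cM (37.5%), in line with the "just under 40%" average mentioned above and somewhat below the roughly 2,900cM that tools are said to report, while the phase-aware total matches the parent-child 3,600cM. In practice, a real check for full-identical regions would need a minimum segment size to guard against noise.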

As a result of all this possible variance, investigations continue to look into whether the longest matching segment is a better measure of the likely degree of relationship between two testers. It has its own warts and wide distribution of values as the match becomes more distant. But it appears initially to have less overlapping variance for more of the closer matches. Stay tuned as the research is developed further.

Another interesting factor is that recombination is more frequent in biological females than in males. A grandmother will usually match more strongly than 25% and a grandfather less. The two added together will always be 50%, but they generally are not evenly split as implied in the chart above.

External References

¹ Autosomal Match Spread from Ancestry.com help topic "How do we estimate relationships?"
² White Paper on Matching from Ancestry.com (see specifically Figure 5.2 around page 40)
³ ISOGG Autosomal DNA Statistics page on ISOGG
⁴ Match Probability spreadsheet from Tim Janzen giving the probability that a given total matching segment length is related to a given number of generation events.
⁵ DNA Inheritance by Angela Cone (with a nice crossover explanation)
⁶ CentiMorgan in ISOGG Wiki
⁷ Limits of Predicting Relationships Using DNA blog post by Leah Larkin (where she extracted Ancestry table 5.2 mentioned above into an Excel chart; developed later than this page but a nice clarification)
⁸ Shared cM Project Tool as developed by the DNA Painter group
