
yDNA Haplogroup Comparisons

The question constantly comes up:
Why do different phylogenetic trees of haplogroups give different leaf haplogroups for my NGS test?

There can be many reasons, and the reasons change over time. But let's try to cover the most common ones in bullet-point form before delving into the details through an example.
  1. The trees and haplogroups are made after comparing test samples. Different trees have access to different samples and so create different haplogroups and branching when forming their tree.
  2. The trees are constructed independently and in parallel. Each has a different convention for naming a haplogroup block / branch based on the SNPs contained within it. The list of haplogroups, and of the SNPs within each, will change over time.
  3. The trees define different areas of the Y chromosome from which to find valid SNPs to define the haplogroups.
  4. The trees are finding new SNPs daily and in parallel, and sometimes they create different names for the same SNP (where an SNP is uniquely defined by its position on the Y chromosome and its derived value).
Now, some of this happened even before the "gold rush" of SNP naming (akin to grabbing mining claims) started a few years ago, when things were much slower and SNPs came out in academic papers that took years to publish. So the issues are not necessarily new; they have just been accelerated and commercialized.

The two main yDNA phylogenetic trees of haplogroups are from FamilyTreeDNA and yFull. Minor ones come from yTree.net and yDNA-Warehouse. Let's focus on the two big players for now. Historically, the YCC collected peer-reviewed publications into a tree, and then ISOGG took over. This academic tree formed the basis for most trees today. But with widespread consumer testing, samples became available much more quickly than those sponsored through small-scale academic studies. Hence the commercialization of tree formation and curation.

To help explain some of the details of why two trees are different, it can help to walk through an example. A recent yFull group Facebook post went as follows:
FTDNA gave me R-FTA24153, and YFull gave me R1b-YFS154845. How do I compare these?
(Paraphrased here for clarity.) We chose this example because it points out a new difference that has crept in recently. We will explain that later, but first let's get to the fundamentals.

Part 1: (Semi) Public Tree Analysis

FTDNA's R-FTA24153
It is first useful to simply search for each SNP in the other's tree. Haplogroups are named after an SNP that defines them; there are often many. But different tree companies choose different SNPs from that list to name the haplogroup, even if the lists of SNPs are identical. Maybe the two haplogroup names are aliases for the same SNP? So it is worth a check. You can simply look at the list of SNPs in each given terminal haplogroup for the SNP name from the other tree. But as the haplogroups are rarely equivalent blocks, searching the whole tree is sometimes quicker overall. In this particular case, neither SNP naming the given haplogroups is in the other tree. (Or so I thought.)

Each service has a different way to search for an SNP in the tree. On FTDNA's BigY Block Tree, they only allow searching by haplogroup name, using "Go to Branch Name" in the upper right. Note that you do have to enter the major letter category the SNP appears within, because branches on their site are named by the major letter branch of the tree plus an SNP. So, in this case, you would search for R-YFS154845 (note: dropping the "1b" given in the question). If you use FTDNA's public tree, select "View by Variants" in the upper left; you can then use its "search by variant" feature as another way. (This is useful if you do not have a BigY result at FTDNA from which to view the otherwise private BigY Block Tree.) In yFull, you can simply use the "Search" button in the upper right of the tree to search by SNP name, and you do not need to know the major letter of the tree the SNP may reside in (as you do with FTDNA). This is helpful, as the top-level tree in yFull is now much more complex, with deep branching into the R1b haplogroup appearing in the top-level tree. FTDNA does not have an easily accessible top-level tree in its public tree. Both services made these top-level changes because otherwise they would have to load the whole tree in its entirety, and R1b has almost as many branches below it as the rest of the tree combined. So making you pick a deeper branch from the start improves the performance of loading the tree page later.
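The name conversion described above (dropping the "1b" so that R1b-YFS154845 becomes R-YFS154845) is mechanical enough to sketch. This is a minimal, hypothetical Python helper, not anything either service provides. It assumes FTDNA branch names always take the form "<major letter>-<SNP>" and that a long-form name starts with the capital major letter, followed by optional subclade letters and digits, then a hyphen and the SNP.

```python
import re
from typing import Optional

# Hypothetical helper, not an FTDNA or yFull API. Assumes long-form names look like
# "R1b-YFS154845": one capital major letter, optional subclade characters, a hyphen, the SNP.
_LONG_FORM = re.compile(r"^([A-Z])[0-9A-Za-z]*-(\S+)$")


def to_ftdna_branch_name(name: str, major_letter: Optional[str] = None) -> str:
    """Return the "<major letter>-<SNP>" form expected by "Go to Branch Name"."""
    name = name.strip()
    match = _LONG_FORM.match(name)
    if match:
        # Keep only the major letter and the SNP, dropping any subclade such as "1b".
        return f"{match.group(1)}-{match.group(2)}"
    if major_letter is None:
        # A bare SNP name carries no letter, so the caller must say which branch it is in.
        raise ValueError(f"major letter required for bare SNP name {name!r}")
    return f"{major_letter.strip().upper()}-{name}"


print(to_ftdna_branch_name("R1b-YFS154845"))   # R-YFS154845
print(to_ftdna_branch_name("YFS154845", "R"))  # R-YFS154845
```

Names already in short form, such as R-P89.2, pass through unchanged. yFull's own search needs no such conversion, since it accepts the bare SNP name.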

So if the search for the SNP that names the haplogroup fails, a next step may be to look at the additional SNPs defined in the haplogroup and search for those in the other tree. Often, you will look for older, more established SNPs: lower numeric values, names without FT/FA/FB/FTT prefixes, and similar. Most of the SNPs found here in the FTDNA leaf haplogroup are very new additions. There is one, Y95348, that looks like an excellent candidate to find in yFull, since SNPs with a Y prefix were first identified and named by yFull. But alas, this SNP is not in the yFull tree! So in this case, looking at the SNPs in the haplogroup does not help either. We did not mention it yet, but the yFull-identified SNP given in the question is not even the name of a haplogroup in the yFull tree. More on that shortly.
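The rule of thumb above for picking which SNPs to try first can be written as a sort key. This is a rough, hypothetical sketch: the prefix list comes only from this article, and because each lab numbers its own SNPs independently, comparing numbers across different prefixes is a crude proxy for age at best. The example list is simply SNP names mentioned in this article, not the contents of any one haplogroup block.

```python
import re

# Prefixes the article associates with very new FTDNA SNP names. startswith("FT")
# also covers FTT- and FTA-prefixed names such as FTA24153.
NEW_PREFIXES = ("FT", "FA", "FB")
_SNP_NAME = re.compile(r"^([A-Za-z]+)(\d+)")


def establishment_key(snp: str) -> tuple:
    """Sort key: names without a new FTDNA prefix first, then lower numbers first."""
    match = _SNP_NAME.match(snp)
    if match:
        prefix, number = match.group(1).upper(), int(match.group(2))
    else:
        # Names that are not letters followed by digits sort after the numbered ones.
        prefix, number = snp.upper(), float("inf")
    return (prefix.startswith(NEW_PREFIXES), number, snp)


def rank_candidates(snps):
    """Order SNP names so the (heuristically) more established ones come first."""
    return sorted(snps, key=establishment_key)


print(rank_candidates(["FTA24153", "BY111812", "Y95348", "FGC13314"]))
# ['FGC13314', 'Y95348', 'BY111812', 'FTA24153']
```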

Searching yFull for YFS154845
So entering the yFull-named SNP from the FTDNA haplogroup (Y95348) does not yield a match at yFull. Confusing. Sometimes, if you do not test positive (derived) for the SNP that names the haplogroup, a service will give you the name of an SNP in that haplogroup that you are positive for. Searching yFull's tree for the SNP it provided yields the result shown to the right: it exists in a haplogroup named R-P89.2.

yFull R-P89.2 Haplogroup
Clicking on the green, defined haplogroup yields the next image, showing the actual yFull tree for that haplogroup. Initially, and even if you do a browser "find in page" search, you will still not see the SNP you are looking for. You have to click on the grey "+15 SNPs" box to get a floating pop-up, which then contains the SNP of interest (YFS154845). Whew. So the terminal haplogroup on yFull is R-P89.2, not R1b-YFS154845. You can get more information on YFS154845 from its entry in the yBrowse database.

Note: Not to add confusion, but "YFS"-prefixed SNPs often cannot be found with the search in yFull. This is because such a name is usually a temporary, unique name for a novel SNP found in a single tester, and the SNP is generally not yet placed in the tree. Lately, we are seeing more of these placed in the tree without being renamed. Not sure why.

So we are left with a manual tree search and pattern match to find out how these two seemingly disparate haplogroups compare. Luckily, the BigY Block Tree and yFull's YTree both display the "path" from the root down to the found haplogroup across the top. In FTDNA's case, the list across the top is often partial, or even empty, depending on the size of your display. (Using Desktop Mode on a mobile device really helps to get past this issue on FTDNA.)

Capturing each haplogroup path and then manually comparing the names is now the task at hand. Here is a first-step comparison of the paths to the two SNPs in their respective trees, as done for our example above using the trees available in mid-February 2022.
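Once both paths are copied out as lists of names, the first-pass matching can be automated. This sketch swaps in a technique the article does not use: Python's standard-library `difflib.SequenceMatcher`, which pairs up equal entries in order and reports the rest as insertions. The `aliases` dictionary is a hypothetical, hand-built table mapping a name in one tree to the equivalent name in the other; the example uses only the top of the paths discussed below (yFull's plain "R" standing for FTDNA's R-M207, and yFull's extra R-Y482).

```python
from difflib import SequenceMatcher


def align_paths(ftdna_path, yfull_path, aliases=None):
    """Pair up two root-to-leaf haplogroup paths.

    aliases maps a name in either path to a shared canonical name (e.g. yFull's
    top-level "R" to FTDNA's "R-M207"). Returns (ftdna, yfull) rows where None
    marks a haplogroup that has no counterpart in the other path.
    """
    aliases = aliases or {}
    a = [aliases.get(name, name) for name in ftdna_path]
    b = [aliases.get(name, name) for name in yfull_path]
    rows = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            rows.extend(zip(ftdna_path[i1:i2], yfull_path[j1:j2]))
        else:
            # "replace", "delete" or "insert": list each unmatched haplogroup on its own row.
            rows.extend((name, None) for name in ftdna_path[i1:i2])
            rows.extend((None, name) for name in yfull_path[j1:j2])
    return rows


for row in align_paths(["R-M207", "R-M173"], ["R", "R-Y482", "R-M173"], {"R": "R-M207"}):
    print(row)
# ('R-M207', 'R')
# (None, 'R-Y482')
# ('R-M173', 'R-M173')
```

Unmatched rows are deliberately left for manual review: two differently named haplogroups may still be the same clade under an alias that is not yet in the table.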
Initial Simple Matching of FTDNA and yFull Paths to each respective SNP


Do not worry. We know the chart may be unreadable even when clicked. You can try to download it and zoom in further. But we will simply look at it in sections, with a blow-up of each as we go, so proceed without worry.

Leftmost matching of Haplogroup Paths
First, at the very beginning, we notice that yFull keeps the top-level, simple naming (actually, old-style long path naming), while FTDNA uses the defining SNPs, i.e., the newer short-style YCC naming. So R in yFull compares to R-M207 in FTDNA, and so on.

Next, we notice that right at the top, yFull has an extra branch named R-Y482 in the path between FTDNA's R-M207 and R-M173. We can tell because we know R-M173 is the definition of R1. Y482 has a number of aliases (see the yBrowse entry for Y482 for the many names), but none appear in FTDNA's definition of R-M207 or R-M173. Essentially, this SNP is not in FTDNA's tree yet. So why, this close to the root of the tree, are we already seeing different branching?

yFull R Branch Top Level
This has to do with the different sources of samples used to create the trees. While FTDNA has many more current (living) test results making up its tree (and so has many more branches), yFull has many more ancient DNA (aDNA) samples loaded, which can often lead to additional early branching in the tree. (Note: when you see a branch in FTDNA's tree that does not seem explained by a sample, it is usually an aDNA sample as well; FTDNA simply never displays the ones it adds. Also, when a similar thing happens in yFull, it is often a Nebula Genomics tester who transferred their result but never paid for the analysis that would show them permanently in the tree. Lots of little nuances like this in each tree.) And, in fact, looking at the yFull tree detail here, we see an ancient DNA sample that was negative for a number of SNPs that are positive for the R1 and R2 haplogroup members below it.

We should take this moment to point out something else about yFull's tree. It provides the alias names for most SNPs, where such aliases exist. Next to each haplogroup is the list of SNPs that make up that group. Different SNPs are separated by an asterisk (*); aliases for the same SNP are separated by forward slashes (/). Often, only a few SNPs are listed, with the rest in a pop-up as shown earlier. FTDNA's BigY Block Tree shows only a single name for each SNP of the visible haplogroups (as seen in the blue box of haplogroup R-FTA24153 earlier).
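If you copy a yFull SNP list out of the page (including the pop-up), the separator convention just described makes it easy to turn into alias groups. A minimal sketch, assuming the copied text uses exactly "*" between SNPs and "/" between aliases; the SNP names in the example are hypothetical placeholders.

```python
def parse_yfull_snp_list(text: str) -> list[list[str]]:
    """Split a yFull-style SNP list into alias groups, one group per distinct SNP.

    Assumes the layout described above: "*" separates different SNPs and "/"
    separates aliases of the same SNP.
    """
    groups = []
    for chunk in text.split("*"):
        names = [name.strip() for name in chunk.split("/") if name.strip()]
        if names:  # skip empty chunks left by stray or trailing separators
            groups.append(names)
    return groups


groups = parse_yfull_snp_list("SNP_A1/SNP_A2 * SNP_B * SNP_C1/SNP_C2/SNP_C3")
print(groups)
# [['SNP_A1', 'SNP_A2'], ['SNP_B'], ['SNP_C1', 'SNP_C2', 'SNP_C3']]
print(len(groups))  # 3 distinct SNPs, written with 6 names
```

Counting groups rather than names gives the number of distinct SNPs, which is the figure to compare against FTDNA's one-name-per-SNP block.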

Next 9 Haplogroups on Paths
Getting back to our overview of the haplogroup path comparison, we look at the next 9 haplogroups. We see a pretty strong 1:1 correspondence, with the same names used by both trees. This is mostly because it is an older part of the tree, largely fleshed out earlier by academic papers, so both trees chose to keep the known haplogroup names. We do have one hiccup, where FTDNA has a single haplogroup R-P310 compared to yFull's two haplogroups, R-L52 and R-PF6538.

FTDNA's R-P310
Quickly looking at the FTDNA BigY Block Tree, we can see that haplogroup R-P310 consists of 6 SNPs in total, one of which is L52, the name in the yFull path we had not yet associated. So, in the end, we simply have another "insertion" of a haplogroup in the yFull path compared to the FTDNA one. So far, a pretty good comparison and match down the line for the first 12 haplogroups of FTDNA's path, starting with haplogroup R.
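The check just made (finding which FTDNA block holds the SNP that names a yFull haplogroup) can be sketched as a set lookup over all of that SNP's aliases. This is an illustrative helper, not a site feature; `ftdna_blocks` is a hypothetical hand-copied dictionary, and only the two R-P310 SNPs named in this article are listed, although the real block holds 6.

```python
def locate_in_blocks(snp_aliases, ftdna_blocks):
    """Return the FTDNA blocks whose SNP list contains any alias of one SNP."""
    wanted = set(snp_aliases)
    return [block for block, snps in ftdna_blocks.items() if wanted & set(snps)]


# Only the two SNPs named in this article are listed; the real R-P310 block holds 6.
ftdna_blocks = {"R-P310": ["P310", "L52"]}

print(locate_in_blocks(["L52"], ftdna_blocks))   # ['R-P310']: yFull's R-L52 splits this block
print(locate_in_blocks(["Y482"], ftdna_blocks))  # []: not found in these blocks
```

A block that contains the SNP naming a yFull haplogroup signals an insertion (a split) on the yFull side, as with R-L52 here. An empty result, checked against every alias from yBrowse and every FTDNA block, is what "not in FTDNA's tree yet" means for an SNP such as Y482.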

Middle 10 Haplogroups on Paths
Now things get really messed up. We have 10 haplogroup branches on the FTDNA path but only 3 non-matching branches in yFull. What happened? Most likely, this area had significant results from many BigY testers who never transferred to yFull, while testers at companies like Full Genomes, ySeq, Dante, Nebula and others (who also transfer into yFull) were not very dominant in this area. There are a lot of SNPs in this area of the tree, and so lots of opportunity for haplogroups to be split, creating these extra steps in the path.

Aliases for 3 SNPs
Well, it is quickly resolved by looking at the yFull tree. The three yFull haplogroups R-S263, R-S264 and R-S497 are named by aliases of the SNPs that name FTDNA's R-Z381, R-Z156 and R-Z306. So, as shown here, there is a direct match-up. That leaves a group of 3 haplogroups inserted at the start of this chain and 4 added at the end. This time, the insertions are by FTDNA.

But we are not done yet, as we want to point out one other item. Note the FTDNA haplogroup named after an SNP called FTT8 (R-FTT8). If you have been following the breaking news of 2021-22 (see our earlier article), then you know about the recent work of the Telomere-to-Telomere (T2T) consortium. FTDNA has found some new SNPs in previously unmapped areas of the Y chromosome and actually added them into its tree. These SNPs do not appear in the Build 38 reference the tree is based on; they are only identified when samples are remapped to the new T2T model of the Y released in late 2021. So their tree has already become a hybrid. You can only check your own result for this SNP by remapping your original BAM file to this new model.

Next 4 Haplogroups on Path
Moving along, we look at the next 4 haplogroups in the FTDNA path and compare them to the yFull path down to yFull's leaf haplogroup for this tester. Here again we have a strong correspondence, with only a single insertion by yFull.

This is a good time to mention that two haplogroups having the same name does not mean they are identical. In fact, where there are these insertions, the SNPs forming the newly inserted haplogroup were most likely pulled from a neighboring haplogroup. So the list of SNPs in a same-named haplogroup may differ between the two trees.
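To see how two same-named haplogroups actually differ, compare their SNP contents as sets. A minimal sketch with hypothetical SNP names: the `aliases` dictionary (an assumption, built by hand, for example from yBrowse) maps alternative names to one canonical name so that the same SNP is not reported as a difference merely because the trees named it differently.

```python
def diff_blocks(ftdna_snps, yfull_snps, aliases=None):
    """Compare the SNP contents of two same-named haplogroups.

    aliases maps alternative SNP names to one canonical name so that the same SNP
    is not reported as different just because the trees chose different names.
    """
    aliases = aliases or {}
    a = {aliases.get(snp, snp) for snp in ftdna_snps}
    b = {aliases.get(snp, snp) for snp in yfull_snps}
    return {"shared": a & b, "ftdna_only": a - b, "yfull_only": b - a}


# Hypothetical names: yFull lists SNP_2 as "SNP_2b", and SNP_1 sits in a neighboring yFull block.
result = diff_blocks(["SNP_1", "SNP_2", "SNP_3"], ["SNP_2b", "SNP_3"], {"SNP_2b": "SNP_2"})
print(sorted(result["shared"]))      # ['SNP_2', 'SNP_3']
print(sorted(result["ftdna_only"]))  # ['SNP_1']
print(sorted(result["yfull_only"]))  # []
```

SNPs that appear on only one side are the ones to look for in the neighboring (possibly newly inserted) haplogroups of the other tree.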

Last 6 Haplogroups on Path
So now we are down to the last 6 haplogroups in the FTDNA-defined path for this tester. Of the 6, only one directly matches the yFull tree, with just one additional yFull haplogroup on either side of it that could possibly match FTDNA. So, as in the middle section, FTDNA's BigY tree appears to define much more branching here than yFull's YTree.

Placement of yFull path
Looking more closely at the FTDNA BigY tree in this area, we actually see that R-P89 is defined there, as a parallel branch to R-BY111812. And, in fact, in the sub-branch of R-P89 on FTDNA, we have haplogroup branch R-FGC13314, which contains YFS154845. Not sure why it did not show up when searching by SNP before. (Hopefully, I did not do only a branch search in the BigY Block Tree, or use the haplogroup name with the leading R- when searching by variant in the public tree.)

The yFull tree, although much less populated, has only R-P89.2 below R-Z27559, but then somewhat mimics the FTDNA tree further down. If the tester is really positive for R-P89 and the SNPs below it in the FTDNA tree, then they would not be placed in the parallel branch R-FTA24153. If you recall, yFull reported the tester's haplogroup not by the branch name R-P89.2 but by an SNP within the branch.

To resolve this further, each of the SNPs shown in both trees has to be looked up individually and the tester's value for it determined. This "manual" analysis is required to properly understand the tester's real placement in what both trees represent. More than likely, because yFull has less structure in its tree here, FTDNA's placement on the sister branch in its more detailed tree is the correct answer. yFull just needs more testers in this area to refine its tree further.
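Once the tester's value for each SNP has been looked up, tallying those values per candidate branch makes the comparison concrete. This is a hypothetical sketch: how the calls are obtained (from a VCF, a BAM viewer, or a vendor report) is outside its scope, the "+", "-" and "?" codes are an assumed convention, and the branch and SNP names are placeholders rather than real tree contents.

```python
from collections import Counter


def score_branches(calls, branches):
    """Tally a tester's calls over each candidate branch's SNPs.

    calls maps SNP name -> "+" (derived), "-" (ancestral) or "?" (no call);
    SNPs missing from calls count as "?". Returns branch -> (derived, ancestral, no_call).
    """
    scores = {}
    for branch, snps in branches.items():
        tally = Counter(calls.get(snp, "?") for snp in snps)
        scores[branch] = (tally["+"], tally["-"], tally["?"])
    return scores


# Hypothetical branches and calls, for illustration only.
branches = {"branch_A": ["SNP_1", "SNP_2", "SNP_3"], "branch_B": ["SNP_4", "SNP_5"]}
calls = {"SNP_1": "+", "SNP_2": "+", "SNP_4": "-", "SNP_5": "-"}
print(score_branches(calls, branches))
# {'branch_A': (2, 0, 1), 'branch_B': (0, 2, 0)}
```

Derived calls with no ancestral ones support placement on a branch, while ancestral calls argue against it. For sister branches such as R-P89 and R-FTA24153, one would normally expect derived calls on only one side.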

FTDNA R-Z27559


So if we had searched for each SNP correctly from the start, we would have reached the above diagram of the FTDNA BigY Block Tree and compared it to the yFull tree more quickly. But walking down the paths was a useful exercise to show how to compare the branching in the trees (by path). yFull is missing the whole right side of the branches below R-Z27559.

Comparing FTDNA to yFull (in similar format)


Part 2: Deep Dive SNP Look-up