Microarray File Formats (aka RAW)

Documented here are the various microarray file formats that carry the "RAW data" of a comprehensive SNP test. These files are generally the result of a DNA Microarray Testing lab procedure. They are simplifications of the "RAW" VCF Sequencing File Format already familiar to many in the genetics community; in computer industry terms, they are simply TSV (or CSV) format files. The vast majority of the content (and, in FTDNA's case, sometimes the only content) covers the autosomes, which is why some call them Autosomal tests and formats. But that is a misnomer, because the laboratory test and its results include SNPs from all the DNA: the allosomes and mtDNA as well. The autosomal content, being much the larger portion, is what tends to be most used in the matching-segment mechanism between testers.

Note that the microarray file formats are somewhat independent of their content and of the company generating them. That is, they do not specify which company, or which version of that company's test, produced the file, nor even necessarily which reference genome model was used. This makes automated analysis tools harder to develop: only with carefully developed heuristics can a tool determine the source of a file and its content.
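
As a rough illustration of such a heuristic, the sketch below guesses the vendor from the first lines of a file. The function name `guess_vendor` and the hint table are hypothetical; the signature strings are taken only from the sample headers reproduced further down this page, so a match is a hint and not proof.

```python
from typing import Iterable

# Signature substrings copied from the sample headers shown on this page.
# Real files vary between versions, so treat a match as a hint only.
HEADER_HINTS = [
    ("23andMe", "generated by 23andMe"),
    ("AncestryDNA", "generated by AncestryDNA"),
    ("LivingDNA", "Living DNA customer genotype"),
    ("MyHeritage", "MyHeritage DNA raw data"),
]


def guess_vendor(lines: Iterable[str]) -> str:
    """Return a best-guess vendor name from the leading lines of a file."""
    seen_hash = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            seen_hash = True
            for vendor, hint in HEADER_HINTS:
                if hint in line:
                    return vendor
            continue
        # First non-comment line. An unquoted, comma-separated title row with
        # no "#" header before it matches the FTDNA sample on this page.
        if not seen_hash and line.upper().startswith("RSID,"):
            return "FTDNA"
        break
    return "unknown"
```

A file with only a column-title row (such as the TellMeGen sample) falls through to "unknown"; a fuller tool would combine this with line counts and chromosome labels.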

Although more than just autosomes are included in the file, the microarray file formats are not the method of documenting specific test results from most yDNA, mtDNA or NGS targeted tests. The latter use the Sequencing File Formats well known to geneticists. With that said, these microarray file formats include xDNA, yDNA and mtDNA SNP values as provided by most of the testing companies. Check the companion page SNP Databases to learn more about the content itself (for example, how an rsID compares to a common SNP name such as P312).

Some common features exist between the formats (and even the VCF standard). For example, all use a form of Tab-Separated Value (TSV) columnar text format, something that grew out of the simple textual table and the follow-on spreadsheet processing. The extended Comma-Separated Value (CSV) form is a more robust format that is still textual but not really directly readable by a human. It is a superior format for capturing a variety of information but must usually be computer processed. Spreadsheets support both TSV and CSV forms in the files they read and write. Another common feature of the microarray file formats is that most have one or more lines of header information. This header is free form, but each line starts with the hash ("#") symbol to distinguish its content. The hash was introduced in the 1970s by the UNIX shell to indicate a comment line in a script file that should not be processed. These hash lines are not defined as part of the spreadsheet formats and, as such, make the microarray file formats described here a little more unique and not 100% compatible with spreadsheet programs.

Both file forms in the spreadsheet world tend to be given a .csv file extension or suffix. But in this industry, the TSV file is often given a .txt file name extension. VCF happens to also use a simple, free-form TSV design. Often the microarray .csv/.txt files are compressed to save space, as they contain 600 to 700 thousand diploid SNP values and can get quite large. The standard ZIP format is most often used to compress them, so the files are commonly delivered with a .zip container suffix.

These microarray file formats have one row (text line) for each Probe result, most often an SNP. They identify the Probes by rsID as well as by the chromosome (sequence) name and position within it. Some yDNA-only and mtDNA-only files use SNP names in place of the rsID. The key is that at least one of three forms is needed to identify the SNP row: (1) chromosome name and position, (2) rsID, or (3) SNP name. Usually more than one identification form is present; most microarray file formats use both the rsID and the chromosome name and position to identify each SNP. Sometimes the two forms conflict with each other, leading to more confusion. The chromosome coordinate position is most often defined for the forward / 5'-3' / positive direction. Occasionally, and without indication, a file uses the backward / 3'-5' / negative direction (and then often a complemented value), leading to further confusion.
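
When a value is known to be reported on the negative strand, converting it means complementing each base. The sketch below is a minimal illustration; `complement_genotype` is a hypothetical helper name, and it assumes single-base genotype strings such as "AG" where the InDel and no-call symbols (I, D, -, 0) have no complement and pass through unchanged.

```python
# Each base maps to its complement. The symbols I, D, "-" and "0" are not
# bases, so str.translate leaves them untouched.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_genotype(genotype: str, sort_alleles: bool = False) -> str:
    """Complement every base of an unordered genotype such as "AG".

    sort_alleles=True re-sorts the result alphabetically, matching vendors
    that always write the pair in increasing order (AG, never GA).
    """
    flipped = genotype.upper().translate(_COMPLEMENT)
    if sort_alleles:
        flipped = "".join(sorted(flipped))
    return flipped
```

Because the pair is unordered and each allele is a single base, no reversal of the string is needed; a multi-base sequence would need a reverse complement instead.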

The microarray file formats are a simple form of a RAW, annotated VCF. Basic, unannotated VCFs do not include the rsID. Normal VCFs only contain the derived variants, whereas microarray file formats contain all tested values, whether they are derived or ancestral. A RAW VCF, before filtering, may include ancestral value SNPs. This is likely why the term "RAW Data File" is so often associated with these microarray file formats.

There are a number of tools that read and process the microarray files. See the Third Party Analysis Tools page for more details. In particular, DNA Kit Studio allows the manipulation and even merger of various microarray test result files. Felix Immanuel was the first to provide such tools early on. Even bcftools, from the samtools / HTSlib project, can read the TSV files and generate basic VCF ones, due to the popularity of this free-form mechanism with early, simple DNA testing results.

Features and Variances

Let us first cover some of the basic features and variances of the files between vendors, and then delve into examples of the actual file formats.

Feature Comparison Quick Summary

The table below summarizes the major features / content from each company. This is taken from our original chart at the bottom of the Genetic Genealogy Testing page that we introduced back in 2014.

Feature	23-v3	23-v4	23-v5	Anc-v1	Anc-v2	FTDNA	NGG Geno2.0	NGG Geno2.0+ NextGen	MyHer	LivDNA
Approx Size (MB, compressed)	8	5	6	6	6	6.5	1	6.5	6.4	6
Build Type	37	37	37	37	37	36/37	37	37	37	38
DNA Type(s)	Auto,X, Y,MT	Auto,X, Y,MT	Auto,X, Y,MT	Auto,X, Y	Auto,X, Y	Auto, X	Auto,X, Y,MT	Auto,X, Y,MT	Auto, X	Auto,X, Y,MT
SNP ID	RS#, i	RS#, i	RS#, i	RS #, i	RS #, i	RS #, i	RS #, SNP²	RS #, SNP²	RS #, i	RS #,i SNP²
Auto Probes 1-22	930,281	577,382	614,007	682,549	650,647	690,715	126,306	698,192	702,442	603,129
X/23	26,007	19,487	16,530	?	25,250	17,478	3,803	17,812	17,892	15,511
Y/24	1,766	2,329	3,734	885	1,668	-	11,978	13,533	482	382²
MT/26¹	2,459	3,154	4,273	-	262	-	44²	41²	-	21²

¹ Ancestry has the X/Y PAR as chr 25. v2 has MT as chr 26. 23andMe simply includes Y PAR values in the X result (as the 2nd value in the un-ordered pair for males).
² LivingDNA and NGG only report positive-for-change (derived, positive, changed) SNPs for Y and MT, so it is not clear how many they are actually testing nor how many are ancestral (negative, unchanged). They also only report these values as a list of SNP names and not by rsID or position. Each is in a separate file.

File Sizes and Versions

It is not enough to know the vendor of your test. You need to know which version of the chip microarray (CMA) they used in the lab on your sample. And even, as it turns out, which minor version of file format they have provided your data in. Note that some of these minor versions were coding errors and later fixed. You can sometimes get an updated, corrected file simply by re-downloading a new RAW file. If you have a file that does not fit into the metrics of the chart below, please let us know so we can catalog another minor version.

A quick and dirty way to figure out your particular test company and file version is to count the number of lines in your file. The number of header lines is always under two dozen and so does not really affect the rounded-to-thousands count. This count method is more reliable if you also know the test company, as some of the test company files for a particular version are very similar in size. Each data row (line) contains the result for one Probe, or marker, from the test.

On any Unix or BASH shell, one can simply execute the command

zcat <microarray>.zip | wc -l

On Win10 Powershell, the command is (using a 7Zip 64 bit installation):

7z.exe e -so <microarray>.zip | Measure-Object -Line

This assumes you were given a compressed file. If it is not compressed, use the "wc" or "Measure-Object" command directly on the file by placing the file name after the command. Note that some text editors, like Notepad++, can load the files and will report the number of lines within the tool. Spreadsheet programs and most other text editors cannot load such a large file.
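
For those who prefer a single cross-platform approach, the same count can be sketched in Python using only the standard library. The function names are hypothetical, and the sketch assumes a zipped download holds exactly one data file (it reads the first member of the archive).

```python
import zipfile
from pathlib import Path


def _count(fh) -> tuple[int, int]:
    """Count (total_lines, hash_header_lines) from a binary file handle."""
    total = header = 0
    for line in fh:  # binary iteration splits on "\n", so "\r\n" works too
        total += 1
        if line.startswith(b"#"):
            header += 1
    return total, header


def count_lines(path: str) -> tuple[int, int]:
    """Return (total_lines, hash_header_lines) for a plain or zipped file."""
    p = Path(path)
    if zipfile.is_zipfile(p):
        with zipfile.ZipFile(p) as zf:
            # Assumption: the archive contains a single data file.
            first_member = zf.namelist()[0]
            with zf.open(first_member) as fh:
                return _count(fh)
    with open(p, "rb") as fh:
        return _count(fh)
```

Dividing the total by 1,000 and rounding gives the "K lines" figure used in the table that follows. Files whose only header is an unhashed column-title row will report zero header lines.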

Vendor	Version	Start Date	End Date	File Size (K lines)	ISOGG Table (K SNPs)	WGS Extract (K SNPs)	HGR Model⁰	Microarray Chip Used
23andMe	API	-	Sep 2018		-	1,498		Supported API interface SNP list (now researcher access only)
23andMe	v2 ⁷	late 2007	-		571	-	NCBI36	Illumina Hap550+ (Human BeadChip)
23andMe	v3	Nov 2010	Nov 2013	961	956	959		Illumina Omniexpress (Human BeadChip)
23andMe	v4	Nov 2013	Aug 2017	602, 611 (599) ⁶	605	602		Illumina Infinium HTS iSelect HD
23andMe	v5	Aug 2017	-	639	630	638		Illumina GSA
Ancestry	v1	Jan 2012	May 2016	701 ⁴	700	701		Illumina Omniexpress (Genotyping BeadChip)
Ancestry	v2 a-b	May 2016	May 2018	669 / 650 ⁵	-	669		Illumina Omniexpress+ (Genotyping BeadChip)
Ancestry	v2 c-d	May 2018	-	664 / 678 ⁵	662?	-		Illumina Omniexpress+ (Genotyping BeadChip)
FTDNA	v1	-	Feb 2011	564 (550)	-	548	HG16 / NCBI34	Affymetrix Axiom xxx ¹ (No Y, MT)
FTDNA	v2	Feb 2011	Apr 2019	725 (708 / 716), 720 ⁹	725 (v1) ⁸	720		Illumina OmniExpress (Microarray Chip) (No Y, MT)
FTDNA	v3	Apr 2019	-		630 (v2) ⁸	614		Illumina GSA (No Y, MT)
LivingDNA	v1	Sep 2016	Oct 2018	619	619	619		Illumina GSA
LivingDNA	v2	Oct 2018	-	692 (660 Fem)¹¹	699	699		Affymetrix¹² Axiom Sirius
MyHeritage	v1	Nov 2016	Mar 2019	721	720	721		Illumina OmniExpress (Microarray Chip)
MyHeritage	v2	Mar 2019	-		607	610		Illumina GSA
TellMeGen	v?	?	-	780 (609 / 678)	-	-		Illumina GSA
MTHFR Genetics	v?		-	640	-			UK (no male Y sample)
Genera	v?		-	640	-			BR
meuDNA	v1	-	Dec 2021	632	-			BR
meuDNA	v2	Jan 2022	-	654	-			BR
SelfDecode	v?	?	?	687	-		GRCh38	USA
Reich Lab	v1	??? 2015	-	598	-			Affymetrix¹² Human Origins v1
Reich Lab			-	1,233	-			1240K panel (Allen Ancient DNA Resource - AADR)
NGG Geno	v2 ²	Oct 2012	Nov 2015	142	-
NGG Geno	v2+ ²	Nov 2015	May 2019 ³	730	-		NCBI36	Illumina custom GenoChip
WGS Extract CombinedKit	v2	Nov 2019	Jun 2020	2,080	-	2,080	HG19 / GRCh37	WGS Extract's "CombinedKit"¹⁰ (Superkit on Steroids) option from WGS Results

note⁰: Build is 37 unless otherwise noted (most are 36 otherwise)
note¹: FTDNA retested all FamilyFinder v1 samples using the new v2 Illumina chip and replaced the output files
note²: National Geographic Genographic files are separated by chromosome type and use SNP names and not rsIDs to identify the yDNA and mtDNA entries. v2 is mainly a haplogroup test. v2+ is better known as NextGen.
note³: After Nov 2016, this is only for non-North-America orders (non-Helix, still FTDNA) till the shutdown of testing in Nov 2019.
note⁴: During Fall 2015 (Sep-Nov), Ancestry put out their RAW files with a truncated header not giving version numbers and other information
note⁵: 669 is the norm that was started with. Winter 2018 (650) and Summer 2018 (664) saw smaller sizes that were often "fixed" on request; Feb 2019 began to see larger files (678). v2a-b have only minor variations between them, and similarly v2c-d. But v2b to v2c saw over 150K entries dropped and another ~150K different entries added. (SNPedia picked up on this and calls them variations 2c and 2d, although we see a 4th, which we call 2b, that they do not mention.)
note⁶: All of 2015 and beyond saw file sizes of 611K lines for the 23andMe v4 test with a few 599K ones scattered that year (both sexes). 602K was for 2013 and first half of 2014. The variance between kit versions is on the order of 20K entries or less.
note⁷: 23andMe v1 used the same chip as v2. But we have found no data about its output size and characteristics. So have left it out of the table.
note⁸: ISOGG chose to ignore / skip the original Affymetrix FamilyFinder test and starting numbering from 1. Most others do not follow this convention.
note⁹: 720K is the final standard (v2d). Pre-2015 are all reprocessed to 725K entries (v2a) (and may be really v1 kits originally?). Some 716K entry sizes (v2c) are seen in 2016 (both sexes). The earliest and single occurrence we saw of 708K entries (v2b) in 2015 is still being investigated. Note these are based on build 37 model and the Auto+X download. v2a seems to be an almost exact superset of v2b-d.
note¹⁰: There are possibly as many as 10k InDels in these files that are not currently properly handled and called correctly. Genetic Genealogy sites ignore InDels so this is generally not a problem.
note¹¹: LivingDNA supplies the yDNA and mtDNA in separate positive-for-change SNP name lists only. Main files are atxDNA only like for FamilyTreeDNA.
note¹²: Thermo-Fisher Scientific acquired Affymetrix and their Axiom microarray product line in 2016
Table Sources: Reference below and Randy's 80+kits covering most versions and companies.

Note that TellMeGen, SelfDecode, MTHFR Genetics, Genera and meuDNA are not genetic genealogy focused companies, but they are expanding into that area as they grow the market for their consumer DNA test products. Their result files can be used in Third Party Analysis Tools just like the others, just as the result files from the traditional genetic genealogy companies can be used on other sites that provide health, wellness and trait analysis.

Minor variations in major versions

Minor Variations in Microarray Files

The chart here provides a few more details on the variations of microarray file formats within a major version. Only Ancestry made a very significant change.

ISOGG has since created comparison tables of the various test kits. Their covered SNP counts often vary considerably from our measured values. We have not yet determined the reason. The companies vary the outputs within a version and time period, as is shown in the table above, but this does not seem to account for ISOGG's generally lower counts.

Not incorporated in the above is an article detailing some variations in the 23andMe files for mitochondria over time. This is mostly found in files downloaded before 2012. If the file is re-downloaded, it is often corrected. Similar documented and undocumented changes occurred in 23andMe, Ancestry and FTDNA file content within major versions over time.

UCSC Templates

We have discovered "templates" for many of the microarray chips on the UCSC server. It is not clear why they are there or what they are used for. They do not appear to include the vendor-introduced variations. (Illumina and others let larger customers customize around 50K entries on a microarray chip. This is how NGG was able to have around 13K Y SNPs defined.) Here is the template listing found when we visited the site in 2020.

Affy5	Affy6, Affy6SV	Affy250Nsp, Affy250Sty
Illumina1M	Illumina1MRaw	IlluminaGDA
Illumina300	IlluminaHuman660W_Quad	IlluminaHuman660W_QuadRaw
Illumina550	IlluminaHumanCytoSNP_12	IlluminaHumanCytoSNP_12Raw
Illumina650	IlluminaHumanOmni1_Quad	IlluminaHumanOmni1_QuadRaw

See also SNP Genotyping Arrays, Recombination Hotspots for Genotyping Arrays, Recombination Arrays for Genotyping Arrays, and Formatting of Data (Genotyping Arrays) for more information on what these various files are used for.

Study of Available Arrays

Long after we compiled the information for this page, a study came out on the utility of the various genotyping arrays. Part of the study includes the data and analysis of the various arrays, some of which we capture in the list below. (Sizes are in thousands of entries.) The list shows much more diversity than we expected, and larger counts too: we had thought most arrays were 1,000 x 1,000 at most, limiting the result to around 1 million entries.

Array	Size	Array	Size	Array	Size	Array	Size	Array	Size
Affymetrix¹² 6.0	932	Axiom AveraNTR	671	Axiom GW ASI	630	Axiom GW CHB2	658	Axiom GW EUR	675
Axiom GW LAT	818	Axiom GW PanAFR	2,268	Axiom PRNA	920	Axiom UKB WCSG	842	CytoSNP 850k_b	850
Drug Dev Consortium 15073507 A1	475	GSA 24v3 A1	653	GSA MD 24v1-0 20011747 A4	693	Human 660W quad v1	591	Human Core 12v1-0 a	298
Human CytoSNP 12v2-1 H	295	Human Omni 2.5-4v1 h	2,434	Human Omni 5-4v1 c	4,269	Human OmniExpress 12v1-1 b	718	Human OmniZhongHua 8v1-0 c	899
Infinium Exome 24v1-1 A1	245	Infinium Immuno Array 24v2-0 a	252	Multi-Ethnic AMR AFR 8v1-0 A1	1,425	Multi-Ethnic EUR EAS SAS 8v1-0 A1	1,474	Multi-Ethnic Global A1	1,761
Onco Array 500K B	498	PMDA hg19	918	Psych Array B	570

* Names will be improved once the papers are thoroughly reviewed

External Links

Xcode.life
SNPedia Company Testing Overlap with SNPedia / ClinVar, and individual Ancestry-FTDNA-23andMe entries
Louis Kessler's summary of file formats he studied. Unfortunately, does not identify the different versions of kits and thus the different chips used. This may explain some of his discrepancies.
Rebekkah Canada's Exploring Microarray Chips compendium articles on her haplogroup.org site (only available from archive.org now)
Enlis Genomics blog posts
See also the more recent works that were found (sometimes years) after writing the above:
- "A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research" by the openSNP group
- https://countingchromosomes.com/blog/70-analysis-of-ancestrydna-tests-processed-from-june-2016-to-august-2019
ISOGG Autosomal Comparison Chart — less accurate than most sources for unknown reasons. But a top level summary of the overlap seen by them between different kits. Based on Louis Kessler's work mentioned above
NGG FAQ (archive copy)
Tim Jansen's Rootsweb email of 30 Jan 2013 (archive copy)
MAGE-TAB (doc) Specification from the Functional Genomics Data Society (FGED) (now defunct), which is the closest relative to a RAW file format specification we can find. It is really aimed at more complex results and may have come mostly after the vendors settled on their own formats. The work is now really in the Genomics Standards Consortium (GSC). The GSC can be difficult to comprehend from their website, but the product is more easily discovered at Fair-Sharing (GSC page). Many of the documents are more descriptions of services and sites than interchange formats and standards.

Actual File Formats

So let's get on to describing the actual file formats themselves. A reminder that all files share a few common features: for example, being a TSV or CSV format file and having a header of one or more lines that often, but not always, start with a hash ("#"). Most are Build 37 delivered results, sorted in the expected order of chromosomes 1-22, X, Y and MT. Variations exist and are indicated below.

We start with a summary table and then introduce each of the formats.

Summary table of formats

Vendor	File Ext	File Form	File Line End	Chr Labels & Entry Order	Allele Form	Allele Values	Ref Build	IDs	Header	Notes
23andMe	.txt	TSV	\r\n	1-22, X, Y and MT	AG	ACGT, DI, —	37	rsID, iNNN	~20 # lines including last column title row	Single value in X, Y and MT (for males); dash always homozygous. Female Y is all double dash.
Ancestry	.txt	TSV	\r\n	1-22, 23 (X), 24(Y), 25 (PAR), 26 (M)	T C	ACGT, ID, zero ; any order	37	rsID	~18 # lines followed by column title row starting with "rsid"	Always double values; zero always homozygous. Female Y is zeros but PAR is heterozygous in both. DI only in v2c and beyond
FTDNA	.csv	true CSV	\r\n	1-22, X (if selected), (XY)	AT	ACGT,( DI,) —	37 (36 v1)	rsID, VG, (seq-rsID. kgp, 2010-, GSA, LDLR, IDS, DY, CF, DrGene, FAM, HPS, PEX, 1SNP, indel, ...)	Single row column-definition starting RSID (only unquoted value row except v3 is quoted)	v3 has wide variety of IDs; v2 only first two. Early v2 ONLY generated separate 1-22 and X files or concatenated them so header appears in middle again; InDel only in v2b and v3; chr/pos 0 in v2a and v3; only v3 has XY
LivingDNA	.txt	TSV	\n	1-22, X	AT	ACGT, — ; any order	37	rsID, AX, AFFX (, 1:, exm2, JHU, var, kgp, 1kg, SNP. gw)	~11 # lines of header including last column title row	Y and MT in separate files listing only derived SNPs; v1 has the large variance in names; v2 has >2 allele values. Often two sets of similar sequences (two inserts?) but not always (insert and delete?); longest is 21x2
MyHeritage	.csv	true CSV	\n	1-22, X, Y	AT	ACGT, DI, —	37	rsID (,VG)	~7-12 # lines followed by column title row starting RSID (only unquoted value row)	only v1 has VG ID's; only v2 has ID alleles and only on X ; early v2 had no quotes EXCEPT on chromosome 17 where they quoted coordinates and inserted commas as thousand separator
TellMeGen	.csv	TSV	\r\n	1, 10, 11 ... 22, 3, ... 9, MT, X, XY, Y	TA	ACGT, ID, — ; Any order	37	rsID, chr1, dupseq, ilmnseq_rs, GSA_rs, seq-rs, TOP, ...	Single row column-definition starting "# rsid"	Very large assortment of names including just a single dot
MTHFR Gen	.txt	TSV	\r\n	1-22, MT, X(, Y?)	TA	ACGT, ID, — ; Any order	37	rsID	Single row column-definition starting with "rsid"	No male sample obtained yet; One RSnnn (cap)
meuDNA	.csv	CSV unquoted	\n	0-22, X, Y, (XY, )MT	AT	ACGT, DI, —	37	rsID, 2010-, GSA, ... (similar to LivingDNA v1 but no AFFX)	Single row column-definition starting with RSID	782 0,0 entries ; no quotes ; no XY in v2; diff mix of IDs between v1 and v2
Genera	.csv	CSV unquoted	\n	0-22, X, Y, MT	AT	ACGT, —	37	rsID, GSA, ilmseq, MTR, 2006, ...	single row column-definition starting with RSID	Y and MT is single value ; template only so cannot tell if InDels
Self Decode	.txt	TSV	\n	1-22, X, Y, MT	TA	ACGT	38	rsID, GSA, ilmseq, exm, seq, 1:, JHU, MFN, variant, indel, BOT, chr1:, newrs, ...	8 lines including single row of column definitions	X, Y and MT single value (in males)
Reich 1240K	.txt	TSV	\r\n	1-22, X, Y, MT	TA	ACGT, —	37	rsID, snp_, Affx_, 1kg, Y SNP names	two lines including single row of column definitions	No format defined. So utilize 23andMe one.
Reich HumOrig	.txt	TSV	\r\n	1-22, 23(X), 24(Y)	TA	ACGT ; Any order	37	rsID, snp_, Affx_	two lines including single row of column definitions	No format defined. So utilize 23andMe one with minor exceptions.
NGGeno	.csv	TSV					36			Handled by FTDNA till near the end. Near identical files and formats.

*Note: unless otherwise specified, (1) heterozygous InDel alleles exist, (2) two values always exist, (3) Increasing order alleles only.

Sometimes the PAR region is split out from either X or Y. The PAR1 region is at the same position in X and Y for build 38; in build 37 the X coordinates are shifted by 50K. The PAR2 region starts at ~154.9 million on X in build 37 and ~155.7 million in build 38. Any alleles defined in a PAR region of X or Y cannot be reliably attributed to one chromosome or the other. The Pseudo-Autosomal Regions for the two builds are:

Region	Build	Chr	Start	Stop	Length
PAR1	37	X	60,001	2,699,520	2,639,519
PAR1	37	Y	10,001	2,649,520	2,639,519
PAR1	38	X or Y	10,001	2,781,479	2,771,478
PAR2	37	X	154,931,044	155,260,560	329,516
PAR2	37	Y	59,034,050	59,363,566	329,516
PAR2	38	X	155,701,383	156,030,895	329,512
PAR2	38	Y	56,887,903	57,217,415	329,512
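
A tool that needs to flag PAR entries can encode the table above as a simple lookup. This is only a sketch: the names `PAR_REGIONS` and `par_region` are hypothetical, the boundaries are copied from the table as given, and both ends are treated as inclusive.

```python
from typing import Optional

# (build, chromosome) -> [(PAR1 start, stop), (PAR2 start, stop)]
# Boundaries are copied from the table above and treated as inclusive.
PAR_REGIONS = {
    (37, "X"): [(60_001, 2_699_520), (154_931_044, 155_260_560)],
    (37, "Y"): [(10_001, 2_649_520), (59_034_050, 59_363_566)],
    (38, "X"): [(10_001, 2_781_479), (155_701_383, 156_030_895)],
    (38, "Y"): [(10_001, 2_781_479), (56_887_903, 57_217_415)],
}


def par_region(build: int, chrom: str, pos: int) -> Optional[str]:
    """Return "PAR1" or "PAR2" if pos lies in a PAR region, else None."""
    regions = PAR_REGIONS.get((build, chrom.upper()))
    if regions is None:
        return None  # not X or Y, or a build that is not in the table
    for name, (start, stop) in zip(("PAR1", "PAR2"), regions):
        if start <= pos <= stop:
            return name
    return None
```

For example, `par_region(37, "X", 60001)` returns "PAR1", while any autosomal chromosome returns None.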

23andMe

File formats from all versions are the same. But the SNPs reported change between versions of Microarray Testing chips used. To date, all versions use Illumina products.

20 lines of header
Tab separated (TSV) pseudo RAW-VCF file
Column definition included
Chromosomes labeled 1-22, X, Y and MT
Genotype values: A, G, C, T, I, D, - (I and D are for Insert and Delete. InDels are not really SNPs but reported as such here)
Both genotype values together (unordered pair); always increasing alphabetic order (AG but not GA)
No calls: --
Single value in X, Y and MT but still double dash for no call (double value for X in females; Y in females is all no call)

Sample:

# This data file generated by 23andMe at: Thu Dec 17 14:11:20 2015
#
# This file contains raw genotype data, including data that is not used in 23andMe reports.
# This data has undergone a general quality review however only a subset of markers have been 
# individually validated for accuracy. As such, this data is suitable only for research, 
# educational, and informational use and not for medical or other use.
# 
# Below is a text version of your data.  Fields are TAB-separated
# Each line corresponds to a single SNP.  For each SNP, we provide its identifier 
# (an rsid or an internal id), its location on the reference human genome, and the 
# genotype call oriented with respect to the plus strand on the human reference sequence.
# We are using reference human assembly build 37 (also known as Annotation Release 104).
# Note that it is possible that data downloaded at different times may be different due to ongoing 
# improvements in our ability to call genotypes. More information about these changes can be found at:
# https://www.23andme.com/you/download/revisions/
# 
# More information on reference human assembly build 37 (aka Annotation Release 104):
# http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=9606
#
# rsid	chromosome	position	genotype
rs12564807	1	734462	AA
i3001395	MT	15530	--
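
A reader for this layout can be sketched in a few lines. The names `Call`, `parse_23andme` and `is_no_call` are hypothetical, and the sketch assumes the four-column layout shown in the sample; it splits on tabs first and falls back to any whitespace in case a copy of the file has had its tabs converted to spaces.

```python
from typing import Iterable, Iterator, NamedTuple


class Call(NamedTuple):
    snp_id: str    # rsID or internal "i" identifier
    chrom: str     # "1".."22", "X", "Y" or "MT"
    pos: int
    genotype: str  # one or two characters, e.g. "AA", "G", "--"


def parse_23andme(lines: Iterable[str]) -> Iterator[Call]:
    """Yield one Call per data row, skipping blank and '#' header lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            fields = line.split()  # tolerate space-separated copies
        # A row that still lacks four fields raises ValueError here.
        snp_id, chrom, pos, genotype = fields
        yield Call(snp_id, chrom, int(pos), genotype)


def is_no_call(genotype: str) -> bool:
    """True for a no call ("--", or a single "-" if one ever appears)."""
    return bool(genotype) and set(genotype) == {"-"}
```

Since the TellMeGen sample on this page uses the same four columns, the same sketch should read it as well.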

From their Downloads page FAQ; a change log to the format:

July 27, 2017: As part of our continuous efforts to improve the quality of data present in your raw data download, the number of SNPs available in your download may have changed.
July 22, 2015: We updated call filtering in the downloaded file so it matches filtering in the Raw Data tool. Some customers may see "--" (a "no call") as their genotype for some SNPs on the X chromosome, Y chromosome, or in their MT DNA, where their downloaded data file previously showed a "D" call.
July 28, 2014: Analysis of our data has allowed us to improve the interpretation of over 10,000 SNPs genome-wide on the V4 chip. In the next couple of days, V4 customers will see calls for SNPs that previously did not appear in their raw data.
August 9, 2012: We updated our database to report SNP positions using the NCBI Build 37 (also known as Annotation Release 104) genome assembly. Users will see changes in their raw data positions.
September 29, 2011: Analysis of our data has allowed us to improve the interpretation of several SNPs. In the next week, customers may see changes in their raw data.
January 13, 2011: We updated our database to incorporate data from a more recent build of dbSNP. Some rsids have changed location and/or flanking sequence in dbSNP such that our probes are no longer meaningful to assay them. The names of these rsids have been changed in the raw data to internal ids starting with "i499...". We have also improved the interpretation of a number of SNPs and removed others that had poor data quality. In the next couple of days, customers may see changes in calls for those SNPs.
March 25, 2010: Analysis of our data has allowed us to improve the interpretation of several dozen SNPs. A portion of the SNPs are on the mitochondrial chromosome. In the next couple of days, customers may see changes in calls for those SNPs.
October 8, 2009: Analysis of our data has allowed us to improve the interpretation of over 1500 SNPs. A portion of the SNPs are on the mitochondrial chromosome. In the next couple of days, customers may see changes in calls for those SNPs.
June 4, 2009: Analysis of our data has allowed us to improve the interpretation of over 500 SNPs. Most of these SNPs are on the Y chromosome. In the next couple of days, customers will see calls for SNPs that previously had a no-call or appeared not genotyped.
April 9, 2009: Analysis of our data has allowed us to improve the interpretation of 10 SNPs: rs4420638, rs34276300, rs3091244, rs34601266, rs2033003, rs7900194, rs9332239, rs28371685, rs1229984, and rs28399504. In the next couple of days, some customers will see calls for SNPs that previously had a no-call or appeared not genotyped.

AncestryDNA

16 lines of header intro, 17th line is column headers.
Tab separated. NoCalls appear as '0' (zero) and always appear in pairs.
Alleles in separate columns (but still unordered); can be in any alphabetic order (A T and T A)
Chromosomes labeled 1-22, 23 for X, 24 for Y, 25 for X/Y PAR region values (not sure if position from X or Y), and 26 for M (later kits only)

Sample:

#This file was generated by AncestryDNA at: 06/27/2015 09:23:22 MDT
#Data was collected using AncestryDNA array version: V1.0
#Data is formatted using AncestryDNA converter version: V1.0
#Below is a text version of your DNA file from Ancestry.com DNA, LLC.  THIS 
#INFORMATION IS FOR YOUR PERSONAL USE AND IS INTENDED FOR GENEALOGICAL RESEARCH 
#ONLY.  IT IS NOT INTENDED FOR MEDICAL OR HEALTH PURPOSES.  THE EXPORTED DATA IS 
#SUBJECT TO THE AncestryDNA TERMS AND CONDITIONS, BUT PLEASE BE AWARE THAT THE 
#DOWNLOADED DATA WILL NO LONGER BE PROTECTED BY OUR SECURITY MEASURES.
#
#Genetic data is provided below as five TAB delimited columns.  Each line 
#corresponds to a SNP.  Column one provides the SNP identifier (rsID where 
#possible).  Columns two and three contain the chromosome and basepair position 
#of the SNP using human reference build 37.1 coordinates.  Columns four and five 
#contain the two alleles observed at this SNP (genotype).  The genotype is reported 
#on the forward (+) strand with respect to the human reference.
rsid	chromosome	position	allele1	allele2
rs4477212	1	82154	T	T
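
To bring this layout in line with the others, a reader has to rename the numeric chromosomes and join the two allele columns. The sketch below is one possible approach; `parse_ancestry` and `ANCESTRY_CHROM` are hypothetical names, "PAR" is an arbitrary label chosen here for chromosome 25, and writing a no call as "--" is a choice made for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Numeric labels as described in the list above; "PAR" is our own label.
ANCESTRY_CHROM = {"23": "X", "24": "Y", "25": "PAR", "26": "MT"}


def parse_ancestry(lines: Iterable[str]) -> Iterator[Tuple[str, str, int, str]]:
    """Yield (snp_id, chromosome, position, genotype) per data row."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blanks, '#' header lines and the unhashed "rsid ..." title row.
        if not line or line.startswith("#") or line.lower().startswith("rsid"):
            continue
        snp_id, chrom, pos, allele1, allele2 = line.split("\t")
        if "0" in (allele1, allele2):
            genotype = "--"  # zero marks a no call
        else:
            # Sort so "T A" and "A T" both become "AT".
            genotype = "".join(sorted((allele1, allele2)))
        yield snp_id, ANCESTRY_CHROM.get(chrom, chrom), int(pos), genotype
```

Sorting the pair loses nothing because the two alleles are unordered to begin with.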

FamilyTreeDNA

FTDNA started out using an Affymetrix microarray chip but moved to an Illumina one very quickly after introduction.

CSV with commas and each field surrounded by double quotes (true CSV)
Single column-header definition header; no other header information
Build37 or Build36 (selected at download time; cannot tell which by header content)
Separate file for Auto and X (or now combined if desired)
Chromosomes numbered 1-22; X if X file or combined file
Both genotype values together (un-ordered pair)
No calls: "--"

Sample:

RSID,CHROMOSOME,POSITION,RESULT
"rs4477212","1","72017","AA"
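
Because the fields are quoted, it is safer to read this layout with a real CSV parser than to split on commas. The sketch below uses Python's standard `csv` module; `parse_quoted_csv` is a hypothetical name. It also skips any repeated title row, since the summary table notes that early v2 downloads could be concatenated so that the header appears again mid-file.

```python
import csv
from typing import Iterable, Iterator, Tuple


def parse_quoted_csv(lines: Iterable[str]) -> Iterator[Tuple[str, str, int, str]]:
    """Yield (snp_id, chromosome, position, result) from RSID,... CSV rows."""
    for row in csv.reader(lines):
        if not row or row[0].startswith("#"):
            continue  # blank line or '#' comment line
        if row[0].upper() == "RSID":
            continue  # title row, possibly repeated in concatenated files
        snp_id, chrom, pos, result = row
        yield snp_id, chrom, int(pos), result
```

The `csv` module strips the double quotes and accepts unquoted rows equally, so files that mix the two styles are read the same way.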

LivingDNA

For Autosomal & X: 10 rows of header, then a single row of column headers. Tab-separated (TSV) columns in pseudo RAW VCF style. rsID identifiers.

TSV with dual column alleles
Build 37
Separate file for Y and MT with derived, named SNPs only
Chromosomes labeled 1-22, X
Both alleles together; unordered pair
Identifiers are rsID, AX, or AFFX

Sample (Auto/X):

# Living DNA customer genotype data download file version: 1.0.1
# File creation date 11-29-2017
# The content of this file is subject to updates and changes depending on the time of download.
# This genotype data should be treated as personal information.
# This genotype data is not suitable for clinical/medical research or diagnosis.
# The user assumes all responsibility for the security of this file.
# Please refer to the Living DNA Terms and Conditions on our website (www.livingdna.com) for more information.
# Human Genome Reference Build 37 (GRCh37.p13).
# Genotypes are presented on the forward strand.
#
# rsid	chromosome	position	genotype
rs9283150	1	565508	AA
1:726912	1	726912	AA
rs116587930	1	727841	GG

For Y: Simple list of only derived (positive, changed) SNP names. So not clear how many tested nor any that are ancestral (negative, unchanged). Sample file has 382 entries. Each row is an SNP. Variant names appear to be given on the same row with intervening slashes (/).
Sample (Y):

AM00847/AMM008/B65
AM01921.2/S475.2/Z2983.2
CTS10083
CTS10085/M1250/PF5948

For MT, simple list of only derived (positive, changed) SNP locations. So not clear how many tested nor any that are ancestral (negative, unchanged). Sample has 21 entries (which is similar to the changed value list typical in 23andMe's test). The derived value is given attached to the position number.
Sample (MT):

263G
462T
482C
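
Each MT entry can be separated into its position and derived value with a simple pattern. The Python sketch below assumes every entry is a run of digits followed by a single A, C, G, or T, as in the sample; any other notation (for example for insertions or deletions) is not covered and raises an error. The names `MT_VARIANT` and `parse_mt_variant` are illustrative.

```python
import re

# Assumed shape: digits (position) followed by one base letter (derived value).
MT_VARIANT = re.compile(r"^(\d+)([ACGT])$")

def parse_mt_variant(text):
    """Return (position, derived_base) for an entry such as '263G'."""
    match = MT_VARIANT.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized MT entry: {text!r}")
    return int(match.group(1)), match.group(2)

variants = [parse_mt_variant(entry) for entry in ["263G", "462T", "482C"]]
```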

MyHeritage

6 lines of header, then a single line of column headings. Comma-separated entries, each enclosed in double quotes (") (note: early v2 files are not quoted, but some tools will not accept that). Chromosomes 1-22, X, Y (no MT). Double allele values. All rsID names (except v1, which has some VG names).
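
A minimal Python sketch of reading this layout with the standard `csv` module follows. It assumes the hash-prefixed header lines come first and the column headings are RSID, CHROMOSOME, POSITION, RESULT; the function name `parse_quoted_csv` is an illustrative choice. Because the `csv` module treats quoting as optional per field, the same sketch should also read unquoted rows.

```python
import csv
import io

def parse_quoted_csv(stream):
    """Yield (rsid, chromosome, position, genotype) from a MyHeritage-style CSV."""
    # Drop the hash-prefixed header lines before handing the rest to csv.
    data_lines = (line for line in stream if not line.startswith("#"))
    for row in csv.DictReader(data_lines):
        yield row["RSID"], row["CHROMOSOME"], int(row["POSITION"]), row["RESULT"]

sample = io.StringIO(
    "# MyHeritage DNA raw data.\n"
    "RSID,CHROMOSOME,POSITION,RESULT\n"
    '"rs4477212","1","82154","AA"\n'
    '"rs3094315","1","752566","AG"\n'
)
rows = list(parse_quoted_csv(sample))
```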

Sample:

# MyHeritage DNA raw data. 
# This file was generated on 2018-06-18 14:06:02 
# For each SNP, we provide the identifier, chromosome number, base pair position and genotype.The genotype is reported on the forward (+) strand with respect to the human reference build 37. 
# THIS INFORMATION IS FOR YOUR PERSONAL USE AND IS INTENDED FOR GENEALOGICAL RESEARCH 
# ONLY. IT IS NOT INTENDED FOR MEDICAL OR HEALTH PURPOSES. PLEASE BE AWARE THAT THE 
# DOWNLOADED DATA WILL NO LONGER BE PROTECTED BY OUR SECURITY MEASURES.
RSID,CHROMOSOME,POSITION,RESULT
"rs4477212","1","82154","AA"
"rs3094315","1","752566","AG"

TellMeGen

Nearly identical to the 23andMe format. Uses the Illumina GSA. There is no header except the one-line column header. It is unique in two ways: (1) it is the only format with Unix-style line endings (\n only; not the \r\n of DOS or the \r of classic Mac OS), and (2) it delivers TSV content, like 23andMe, but under a .csv file extension. The line endings have broken some tools.

Sample:

# rsid	chromosome	position	genotype
rs12564807	1	734462	AA
i3001395	MT	15530	--

Tab separated (TSV) pseudo RAW-VCF file with .csv file name extension (UNIQUE)
Column definition included as only header row
Chromosomes labeled 1-22, MT, X, and Y (in that order)
Genotype values: A, G, C, T, I, D, - (I and D are for Insert and Delete. InDels are not really SNPs but reported as such here)
Both genotype values together (unordered pair)
No calls: --
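
Since the extension cannot be trusted and the line endings differ from the other vendors, a tool can decide both from the content instead. The Python sketch below is one hedged way to do that: it relies on `str.splitlines()`, which accepts \n, \r\n, and bare \r, and guesses the delimiter from the first non-comment line. The function names `detect_delimiter` and `read_rows` are illustrative only.

```python
def detect_delimiter(line):
    """Guess the column delimiter from the content, not the file extension."""
    return "\t" if "\t" in line else ","

def read_rows(raw_text):
    """Return rows as lists of fields, whatever the line-ending style."""
    # splitlines() treats \n, \r\n, and a bare \r as line breaks.
    lines = [ln for ln in raw_text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    if not lines:
        return []
    delim = detect_delimiter(lines[0])
    # Strip optional double quotes so quoted CSV rows come out the same way.
    # An un-commented column heading row, if present, is returned as row 0.
    return [[field.strip('"') for field in ln.split(delim)] for ln in lines]

unix_text = "# rsid\tchromosome\tposition\tgenotype\nrs12564807\t1\t734462\tAA\n"
dos_text = unix_text.replace("\n", "\r\n")
same = read_rows(unix_text) == read_rows(dos_text)
```

This simple split does not handle commas embedded inside quoted fields; none of the samples on this page contain any.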

meuDNA

Genera

MTHFR Genetics

Self Decode

We only have a single sample to go by that was delivered in June 2023. That sample was delivered in Build 38.

NGG Geno2.0

Comma-separated list. First row is the column header. Identifiers are rsID or "kgp" (1000 Genomes Project) names; no positions are given. 130,110 entries in the Autosomal/X file. The Y file uses SNP names and has 11,978 rows of values (in one example); it includes DD and II values. The MT file has ~45 values, so likely only variants (but relative to which reference model is unclear).

note: A combined ALL file is also delivered that has the three files mashed up together.

Sample (Geno2.0 Autosomal and X single file):

SNP,Chr,Allele1,Allele2
kgp10004422,12,A,G
kgp10025979,7,C,C
kgp22732377,X,A,A
kgp22734373,X,C,C
rs10000081,4,T,T
rs10000092,4,T,T
rs1000014,16,G,G

Sample (Geno2.0 Y file):

SNP,Chr,Allele1,Allele2
CTS100,Y,C,C
CTS10004,Y,G,G

Sample (Geno2.0 mt File):

SNP,Chr,Allele1,Allele2
73,Mt,A,A
195,Mt,A,A
225,Mt,A,A
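
Unlike the other formats, Geno2.0 puts the two alleles in separate columns. A tool that expects a single two-letter genotype can join them, as in this Python sketch. It assumes the column headings shown in the samples above (SNP, Chr, Allele1, Allele2); the function name `read_geno2` is illustrative. Note that no position is available to carry along.

```python
import csv
import io

def read_geno2(stream):
    """Yield (snp, chromosome, genotype) with the two allele columns joined."""
    for row in csv.DictReader(stream):
        yield row["SNP"], row["Chr"], row["Allele1"] + row["Allele2"]

# Rows taken from the samples above.
sample = io.StringIO(
    "SNP,Chr,Allele1,Allele2\n"
    "kgp10004422,12,A,G\n"
    "rs10000081,4,T,T\n"
)
genotypes = list(read_geno2(sample))
```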

NGG Geno2.0+ (NextGen)

Comma-separated list. First row is the column header. The Autosomal and X files have rsIDs and positions like all the others, unlike NGG Geno2.0. In one sample, there are 698,194 rows in the Autosomal file, 17,813 in X, 13,534 in Y, and xx in M (only a simple list of derived-value SNPs; not all tested). A typical result is a pair of values: two of A, T, C, or G, along with I (Insert), D (Delete) and '--' (no call). The Y file is like the older Geno2.0 in that it has SNP names and no coordinates. Aliases for some SNPs are given in the name, joined by an underscore.

note: it is not clear if this is always the case, but the files we have seen are sorted as whole lines rather than by specific columns. Because the SNP name comes first, the result is an alphabetic sort on the names with chromosomes totally intermixed. A combined ALL file is also delivered that has the three files mashed up together.

Sample (Geno2.0+, separate Autosomal and X files with same format):

RSID,CHROMOSOME,POSITION,RESULT
"rs3748597","1","878522","TC"
"rs13303106","1","881808","AA"
"rs28415373","1","883844","--"
"rs13303010","1","884436","AG"
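
Because these files may arrive sorted by SNP name, a tool that expects genome order can re-sort the rows by chromosome and then position. The Python sketch below shows one way. The ordering of X, Y, and MT after chromosome 22 is an assumption (vendors differ), and the last input row is a made-up placeholder added only to show a non-numeric chromosome.

```python
# Assumed ordering: 1-22, then X, Y, MT. Adjust to taste.
CHROM_ORDER = {str(n): n for n in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24, "MT": 25})

def sort_key(row):
    """Sort by chromosome rank, then by numeric position."""
    snp_id, chrom, pos, genotype = row
    # Unknown chromosome labels sort last.
    return (CHROM_ORDER.get(chrom.upper(), 99), int(pos))

rows = [
    ("rs13303010", "1", "884436", "AG"),
    ("example_x", "X", "100", "AA"),  # hypothetical row, not real data
    ("rs3748597", "1", "878522", "TC"),
    ("rs13303106", "1", "881808", "AA"),
]
ordered = sorted(rows, key=sort_key)
```

Converting the position to an integer in the key matters: as text, "99" would sort after "878522".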

Sample (Geno2.0+, Y file):

"SnpName","Chromosome","Result"
"CTS6704","Y","AA"
"CTS5286","Y","GG"
"BY1786","Y","GG"
"Y5543_Z20122","Y","CC"
"M3153_S7535","Y","AA"
"M245","Y","II"

Sample (Geno2.0+, mt file with ~40 entries; variants only):

"Chromosome","Position","Result"
"mt","2885","T"
"mt","16230","A"
"mt","11719","G"
