VCF Bulk Export

This form lets you filter an existing VCF file and export the result in one of several common formats. Most of the filter criteria and many of the export formats are provided by VCFtools+.
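If you want to reproduce a similar filter locally, the sketch below shows one plausible mapping from the filter settings in step 2 to VCFtools command-line options. This is an assumption, not a description of how this form actually calls VCFtools. The helper `vcftools_args` and the file names are hypothetical; the flags themselves are standard VCFtools options. Note that VCFtools' `--max-missing` is expressed the other way round from this form: it takes the minimum fraction of non-missing data, where 1 means no missing data is allowed.

```python
def vcftools_args(vcf_path, out_prefix, biallelic=False, min_depth=None,
                  maf=None, max_missing_count=None, max_missing_fraction=None):
    """Build a VCFtools command line from filter settings.

    Hypothetical mapping: this page does not document how it invokes VCFtools.
    """
    args = ["vcftools", "--vcf", vcf_path]
    if biallelic:
        # Keep only sites with exactly two alleles.
        args += ["--min-alleles", "2", "--max-alleles", "2"]
    if min_depth is not None:
        # Genotypes supported by fewer reads than this are set to missing.
        args += ["--minDP", str(min_depth)]
    if maf is not None:
        # Keep sites whose minor allele frequency is >= this value (0 to 1).
        args += ["--maf", str(maf)]
    if max_missing_count is not None:
        # Drop sites with more than this many missing genotypes.
        args += ["--max-missing-count", str(max_missing_count)]
    if max_missing_fraction is not None:
        # --max-missing expects the minimum fraction of non-missing data,
        # so convert from the maximum fraction of missing data.
        args += ["--max-missing", str(round(1 - max_missing_fraction, 6))]
    # Write the filtered sites to <out_prefix>.recode.vcf.
    args += ["--recode", "--recode-INFO-all", "--out", out_prefix]
    return args


print(" ".join(vcftools_args("snps.vcf", "filtered", biallelic=True,
                             min_depth=5, maf=0.45)))
```

If VCFtools is installed, the returned list can be passed directly to `subprocess.run(args, check=True)`.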
1. Choose your VCF file.
The following table lists all available VCF files. Select the one you want to filter and export by clicking the circle at the start of its row.
Name | Assembly | Number of SNPs
AGILE LDP Exome Capture SNP Set | Lc2.0 | 578,890
2. Specify filter criteria.
If you check this box, only SNPs with exactly two alleles across all individuals are kept. For example, SNP Chr3p34567 in the example data below would be removed, because its calls contain three alleles (A, C and T).
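A minimal sketch of this check, assuming each genotype call is a two-letter string of single-base alleles as in the example table and that missing calls are `None`. The helper names are hypothetical.

```python
def observed_alleles(calls):
    """Return the set of distinct alleles seen in the non-missing calls."""
    alleles = set()
    for call in calls:
        if call is not None:      # missing genotypes contribute no alleles
            alleles.update(call)  # "AC" adds both "A" and "C"
    return alleles


def is_biallelic(calls):
    """True if exactly two distinct alleles occur across all individuals."""
    return len(observed_alleles(calls)) == 2


# Calls for SNP Chr3p34567 from the example table (read depths omitted).
chr3 = ["AA", "CC", "AC", "TT", "CC", "TC"]
print(sorted(observed_alleles(chr3)))  # ['A', 'C', 'T']
print(is_biallelic(chr3))              # False, so the SNP is removed
```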
Only include SNP calls supported by at least the specified number of reads; calls with fewer reads are set to missing data. For example, if you specify 5 for this filter, then for SNP Chr2p25678 in the example data below, only the call for Germ4 (TT:2) is set to missing.
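A minimal sketch of the read-depth filter, assuming each cell is a `(call, read_depth)` tuple and missing cells are `None`. The threshold is inclusive, so Germ3's GG:5 survives a threshold of 5. The function name is hypothetical.

```python
def apply_min_depth(cells, min_depth):
    """Set calls supported by fewer than min_depth reads to missing (None).

    Each cell is a (call, read_depth) tuple, or None if already missing.
    """
    filtered = []
    for cell in cells:
        if cell is not None and cell[1] >= min_depth:
            filtered.append(cell)   # enough reads: keep the call
        else:
            filtered.append(None)   # too few reads (or already missing)
    return filtered


# SNP Chr2p25678, Germ1 to Germ6, from the example table.
chr2 = [("GG", 7), ("GG", 13), ("GG", 5), ("TT", 2), ("GG", 22), ("GT", 24)]
result = apply_min_depth(chr2, 5)
print(result)
# [('GG', 7), ('GG', 13), ('GG', 5), None, ('GG', 22), ('GT', 24)]
```

The SNP itself is kept; only the individual low-depth call becomes missing data.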
Only include SNP positions with a minor allele frequency greater than or equal to this value. Allele frequency is defined as the number of times an allele appears over all individuals at that site, divided by the total number of non-missing alleles at that site. For example, if you enter 45% in this filter, SNPs with a minor allele frequency below 45% would be removed. SNP Chr1p12344 in the example data below is one of them: its minor allele, A, appears 3 times among 10 non-missing alleles, a frequency of 30%.
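The definition above can be sketched as follows, assuming two-letter diploid calls with missing calls as `None`. The helper name is hypothetical, and returning 0.0 for monomorphic or entirely missing sites is an assumption of this sketch, not documented behaviour of the form.

```python
from collections import Counter


def minor_allele_frequency(calls):
    """Frequency of the least common observed allele among non-missing alleles.

    Returns 0.0 for monomorphic or entirely missing sites (an assumption).
    """
    counts = Counter()
    for call in calls:
        if call is not None:
            counts.update(call)  # each letter of a diploid call is one allele
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    return min(counts.values()) / total


# SNP Chr1p12344: its five non-missing calls (the missing genotype adds no alleles).
chr1 = ["AA", "TT", "TT", "AT", "TT"]
maf = minor_allele_frequency(chr1)
print(maf)          # 0.3  (3 A alleles out of 10)
print(maf >= 0.45)  # False, so the SNP is removed at a 45% threshold
```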
Exclude SNPs with more than this number of missing genotypes over all individuals/germplasm. For example, if you enter 1 for this filter, SNPs with more than one missing genotype would be removed (SNP Chr4p48765 in the example data below, which has two).
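A minimal sketch of the missing-count filter, assuming missing genotypes are represented as `None`. The function names are hypothetical.

```python
def count_missing(calls):
    """Number of missing genotypes (None) at a site."""
    return sum(1 for call in calls if call is None)


def passes_max_missing_count(calls, max_missing):
    """Keep the SNP only if it has at most max_missing missing genotypes."""
    return count_missing(calls) <= max_missing


# SNP Chr4p48765: four calls among six germplasm. The table does not show
# which two germplasm are missing, so their placement here is arbitrary.
chr4 = ["CC", "AC", "CC", "AA", None, None]
print(count_missing(chr4))                 # 2
print(passes_max_missing_count(chr4, 1))   # False, so the SNP is removed
```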
Exclude SNPs based on the proportion of missing data. For example, if you enter 25% for this filter, SNPs with a missing data frequency higher than 25% would be removed (SNP Chr4p48765 in the example data below, where 2 of 6 genotypes, about 33%, are missing).
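A minimal sketch of the missing-proportion filter, assuming the percentage entered is converted to a fraction (25% becomes 0.25) and missing genotypes are `None`. The function names are hypothetical.

```python
def missing_fraction(calls):
    """Fraction of genotypes at a site that are missing (None)."""
    return sum(1 for call in calls if call is None) / len(calls)


def passes_max_missing_fraction(calls, max_fraction):
    """Keep the SNP only if its missing fraction does not exceed max_fraction."""
    return missing_fraction(calls) <= max_fraction


# SNP Chr4p48765; the placement of its two missing calls is arbitrary.
chr4 = ["CC", "AC", "CC", "AA", None, None]
print(round(missing_fraction(chr4), 3))          # 0.333
print(passes_max_missing_fraction(chr4, 0.25))   # False, so the SNP is removed
```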
Example Table: Example Data for Filter Explanation.
SNP NameSNP BackboneSNP PositionGerm1Germ2Germ3Germ4Germ5Germ6
Chr1p12344Chr112344AA:5TT:12TT:15AT:19TT:15
Chr2p25678Chr225678GG:7GG:13GG:5TT:2GG:22GT:24
Chr3p34567Chr334567AA:5CC:12AC:7TT:15CC:19TC:23
Chr4p48765Chr448765CC:12AC:7CC:19AA:23

* The example table above is used in the description of each filter criterion to show how it affects your data. NOTE: each cell for a SNP-by-germplasm combination contains the call and the read depth separated by a colon (:). For example, AA:5 means a call of AA with a read depth of 5. Blank cells are missing genotypes.
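The call:depth notation is a shorthand for this example only; in an actual VCF file, genotypes are stored as allele indices in the GT field and read depth in a separate FORMAT field such as DP. A small parser for the table's notation, with a hypothetical helper name, might look like this:

```python
def parse_cell(cell):
    """Parse an example-table cell such as "AA:5" into ("AA", 5).

    Blank cells represent missing genotypes and are returned as None.
    """
    cell = cell.strip()
    if not cell:
        return None
    call, depth = cell.split(":", 1)
    return call, int(depth)


row = ["GG:7", "GG:13", "GG:5", "TT:2", "GG:22", "GT:24"]  # SNP Chr2p25678
print([parse_cell(c) for c in row])
# [('GG', 7), ('GG', 13), ('GG', 5), ('TT', 2), ('GG', 22), ('GT', 24)]
print(parse_cell(""))  # None
```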

3. Pick your export format.
Select one of the formats listed below and the filtered VCF will be converted accordingly. If you choose a format without quality information, apply stringent filter criteria in step 2, because you will not be able to assess call quality after export.
Format | Has Quality Info? | Description
ABH Format | No | Alleles are coded as A if they match the maternal parent and B if they match the paternal parent; H represents a heterozygous call and "-" a missing call.
NOTE: This format is only suitable for biparental crosses. Any SNP for which either parent is missing or heterozygous, or for which both parents have the same genotypic call, will be excluded!
Genotype Matrix | No | Variant-by-germplasm matrix of the genotype for each call, in tab-separated values (TSV) format.
Quality Matrix | Yes | Variant-by-germplasm matrix of the read depth for each call.
Variant Call Format (VCF) | Yes | A variant-by-germplasm matrix in which each cell combines the SNP call and its quality information. See the VCF specification for more information.
Haplotype Map (Hapmap) | No | A tab-separated values (TSV) format for storing genotypic data. Hapmap files are easier to edit and handle than VCF files but less informative.
NOTE: This format is only suitable for SNPs; any indels will be removed.
Bgzipped VCF | Yes | An archive containing a bgzip-compressed VCF file and its tabix index file. This combination is required by various programs, such as the R package VariantAnnotation. See the tabix manual for more information.
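A minimal sketch of ABH coding, assuming diploid calls written as two-letter strings and that the maternal and paternal parents' calls for each SNP are already known. How this page treats offspring calls that match neither parent is not stated; the sketch codes them as missing, which is an assumption. The function names are hypothetical.

```python
def parents_informative(maternal, paternal):
    """True if both parents are called, homozygous, and differ from each other."""
    if maternal is None or paternal is None:
        return False
    homozygous = maternal[0] == maternal[1] and paternal[0] == paternal[1]
    return homozygous and maternal != paternal


def to_abh(calls, maternal, paternal):
    """Code offspring calls as A/B/H/-, or return None if the SNP is excluded."""
    if not parents_informative(maternal, paternal):
        return None  # SNP excluded, as described for the ABH format
    het = sorted(maternal[0] + paternal[0])  # e.g. ['G', 'T']
    coded = []
    for call in calls:
        if call == maternal:
            coded.append("A")
        elif call == paternal:
            coded.append("B")
        elif call is not None and sorted(call) == het:
            coded.append("H")
        else:
            # Missing calls, and (an assumption) calls matching neither parent.
            coded.append("-")
    return coded


# Hypothetical cross: maternal parent GG, paternal parent TT.
print(to_abh(["GG", "TT", "GT", "TG", None], "GG", "TT"))
# ['A', 'B', 'H', 'H', '-']
print(to_abh(["GG", "GT"], "GG", "GG"))  # None: parents have the same call
```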
+ The Variant Call Format and VCFtools, Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, Mark A. DePristo, Robert Handsaker, Gerton Lunter, Gabor Marth, Stephen T. Sherry, Gilean McVean, Richard Durbin and 1000 Genomes Project Analysis Group, Bioinformatics, 2011.