

This line converts the BCF file into a VCF file (a flat text file rather than a binary, which makes it a lot easier to view), and then we pipe that into the filtering step with the varFilter -D100 option, which filters out SNPs that had read depth higher than 100. We don't want to trust SNPs at sites with super high coverage, because they might represent variation between variable copy number repeats, i.e., the reads that map to this location in the reference are actually from duplicated sites in your sample. You can, and should, change this parameter based on the kind of coverage you have in your dataset.

How does samtools detect SNPs? Every time a mapped read shows a mismatch from the reference genome, it does some fancy statistics to try and figure out whether the mismatch is because of a real SNP. It incorporates different types of information, such as the number of different reads that share a mismatch from the reference, the sequence quality data, and the expected sequencing error rates, and it essentially figures out whether it's more likely that the observed mismatch is due simply to a sequencing error or to a true SNP.

The resulting VCF file has a lot of information about your SNPs. For a detailed description of the VCF format, see the VCF format specification. For now, let's look at the first few lines of the VCF file. Each line lists the chromosome, position, ID, reference allele at that nucleotide position, and alternate alleles detected in our dataset. It also tells you the "Quality", which is basically a measure of how confident Samtools is that there really is a SNP there (higher is better), and whether or not that SNP passed the filters.
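If you want to poke at those columns yourself, here is a minimal Python sketch of reading the fixed VCF columns (chromosome, position, ID, reference allele, alternate alleles, quality, filter) and applying a maximum-depth cutoff in the spirit of -D100. It is not part of samtools. The sample records, the function names, and the assumption that the INFO column carries a `DP=` entry are all made up for illustration, so check them against your own file.

```python
# Illustrative sketch only: a tiny hand-rolled VCF reader, not a samtools tool.
# The records below are invented for demonstration.
SAMPLE_VCF = (
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1042\t.\tA\tG\t221.0\t.\tDP=35\n"
    "chr1\t2310\t.\tC\tT\t12.3\t.\tDP=240\n"
    "chr2\t77\t.\tG\tA,C\t88.0\t.\tDP=18\n"
)


def parse_record(line):
    """Split one VCF data line into its first eight fixed columns."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": flt,
        "info": info,
    }


def read_records(text, limit=None):
    """Return parsed data lines, skipping '#' header lines."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        records.append(parse_record(line))
        if limit is not None and len(records) >= limit:
            break
    return records


def depth(record):
    """Return the DP value from the INFO column, or None if it is absent."""
    for field in record["info"].split(";"):
        if field.startswith("DP="):
            return int(field[3:])
    return None


def passes_max_depth(record, max_depth=100):
    """Keep a site unless its read depth is above max_depth (assumed DP-based)."""
    dp = depth(record)
    return dp is None or dp <= max_depth


for rec in read_records(SAMPLE_VCF, limit=3):
    verdict = "keep" if passes_max_depth(rec) else "too deep"
    print(rec["chrom"], rec["pos"], rec["id"], rec["ref"],
          ",".join(rec["alt"]), rec["qual"], rec["filter"], verdict)
```

To try it on a real file, read the file's text and pass it to `read_records`, and set `max_depth` to something that suits your own coverage.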

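To give a feel for the "sequencing error or real SNP?" question, here is a toy Python sketch that compares how likely an observed number of mismatching reads is under three simple hypotheses. This is not samtools' actual model, which also uses per-base quality scores and other information. The fixed 1% error rate, the binomial assumption, and the function names are all assumptions made for this example.

```python
from math import comb


def binom_likelihood(k, n, p):
    """Probability of seeing exactly k mismatching reads out of n,
    if each read mismatches independently with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def compare_hypotheses(mismatches, depth, error_rate=0.01):
    """Likelihood of the observed mismatch count under three toy hypotheses."""
    return {
        # No SNP: every mismatch is a sequencing error.
        "sequencing_error": binom_likelihood(mismatches, depth, error_rate),
        # Heterozygous SNP: about half the reads carry the alternate allele.
        "heterozygous_snp": binom_likelihood(mismatches, depth, 0.5),
        # Homozygous SNP: nearly all reads carry the alternate allele.
        "homozygous_snp": binom_likelihood(mismatches, depth, 1 - error_rate),
    }


def most_likely(mismatches, depth, error_rate=0.01):
    """Name of the hypothesis with the highest likelihood."""
    scores = compare_hypotheses(mismatches, depth, error_rate)
    return max(scores, key=scores.get)


# One stray mismatch in 20 reads versus 9 mismatches in 20 reads.
print(most_likely(1, 20))
print(most_likely(9, 20))
```

The point is just that a single mismatching read out of many is better explained by error, while many reads sharing the same mismatch are better explained by a real variant.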