API

vcf.Reader

class vcf.Reader(fsock=None, filename=None, compressed=False, prepend_chr=False)

Reader for a VCF v4.0 file: an iterator returning _Record objects.

fetch(chrom, start, end=None)

Fetch records from a Tabix-indexed VCF; requires pysam. If both start and end are specified, return an iterator over the records in that range. If end is not specified, return the individual _Call at start, or None.

filters = None

FILTER fields from header

formats = None

FORMAT fields from header

infos = None

INFO fields from header

metadata = None

metadata fields from header

next()

Return the next record in the file.

vcf.Writer

class vcf.Writer(stream, template)

VCF writer. Writes records to stream, using template (typically a vcf.Reader) to supply the header.

write_record(record)

Write a record to the output stream.

vcf._Record

class vcf.parser._Record(CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, sample_indexes, samples=None)

A set of calls at a site. Equivalent to a row in a VCF file.

The standard VCF fields CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO and FORMAT are available as properties.

The list of genotype calls is in the samples property.

aaf

The allele frequency of the alternate allele. Note 1: no frequency is computed if there is more than one alternate allele. Note 2: the denominator is calculated from called genotypes only.
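As a rough illustration of these two notes, the sketch below computes an ALT allele frequency from GT strings in plain Python. It is not PyVCF's code. The helper name alt_allele_frequency, its n_alt parameter, and the rule that any genotype containing '.' counts as uncalled are all assumptions made for this example.

```python
from typing import List, Optional


def alt_allele_frequency(genotypes: List[str], n_alt: int = 1) -> Optional[float]:
    """Hypothetical helper: ALT allele frequency at a site.

    genotypes: GT strings such as '0/1' or '1|1'.
    n_alt: number of ALT alleles at the site.
    """
    if n_alt > 1:
        return None  # mirrors Note 1: no value for multi-allelic sites
    alt = total = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # mirrors Note 2: uncalled genotypes stay out of the denominator
        total += len(alleles)
        alt += alleles.count("1")
    return alt / total if total else None


print(alt_allele_frequency(["0/0", "0/1", "1/1", "./."]))  # prints 0.5: 3 ALT alleles among 6 called alleles
```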

alleles = None

List of alleles: [0] is REF, [1:] are the ALTs.

call_rate

The fraction of genotypes that were actually called.

end = None

1-based end coordinate

genotype(name)

Look up the _Call for the sample given in name.
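Here is a minimal sketch of a name-based lookup. It assumes sample_indexes maps each sample name to its position in samples, which follows the order of the sample columns in the #CHROM header line. The helper names and the dict-based calls are hypothetical stand-ins for PyVCF's objects.

```python
from typing import Dict, List


def sample_indexes_from_header(header_line: str) -> Dict[str, int]:
    # Sample names follow the nine fixed columns CHROM ... FORMAT.
    names = header_line.rstrip("\n").split("\t")[9:]
    return {name: i for i, name in enumerate(names)}


def genotype(samples: List[dict], sample_indexes: Dict[str, int], name: str) -> dict:
    # Raises KeyError if the sample name is not in the header.
    return samples[sample_indexes[name]]


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA00001\tNA00002"
indexes = sample_indexes_from_header(header)
calls = [{"sample": "NA00001", "GT": "0|0"}, {"sample": "NA00002", "GT": "1|0"}]
print(genotype(calls, indexes, "NA00002")["GT"])  # prints 1|0
```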

get_hets()

The list of heterozygous genotypes

get_hom_alts()

The list of homozygous-alternate genotypes

get_hom_refs()

The list of homozygous-reference genotypes

get_unknowns()

The list of unknown genotypes

is_deletion

Return whether or not the INDEL is a deletion

is_indel

Return whether or not the variant is an INDEL

is_monomorphic

Return True for reference calls

is_snp

Return whether or not the variant is a SNP
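The three predicates above can be approximated with simple length checks on REF and ALT. This is a simplified sketch, not PyVCF's exact logic. It assumes ALT is given as a list of strings and does not handle missing ALT alleles or the '*' allele. Symbolic alleles such as <DEL> are simply not counted as SNPs or INDELs.

```python
from typing import List

BASES = {"A", "C", "G", "T", "N"}


def is_snp(ref: str, alts: List[str]) -> bool:
    # Single-base REF, and every ALT is a single base (A, C, G, T or N).
    return len(ref) == 1 and len(alts) > 0 and all(
        len(a) == 1 and a.upper() in BASES for a in alts
    )


def is_indel(ref: str, alts: List[str]) -> bool:
    # Any non-symbolic ALT whose length differs from REF.
    return any(not a.startswith("<") and len(a) != len(ref) for a in alts)


def is_deletion(ref: str, alts: List[str]) -> bool:
    # A single-ALT INDEL where ALT is shorter than REF.
    return len(alts) == 1 and is_indel(ref, alts) and len(alts[0]) < len(ref)


print(is_snp("A", ["G"]), is_indel("A", ["AT"]), is_deletion("AT", ["A"]))
```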

is_sv

Return whether or not the variant is a structural variant

is_sv_precise

Return whether the SV coordinates are mapped to 1 bp resolution.

is_transition

Return whether or not the SNP is a transition
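For a SNP, a transition swaps a purine for a purine (A<->G) or a pyrimidine for a pyrimidine (C<->T). Every other single-base change is a transversion. Below is a minimal standalone sketch of that test; the function is illustrative, not PyVCF's property.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    # Both bases must differ and fall in the same chemical class.
    pair = {ref.upper(), alt.upper()}
    return pair == PURINES or pair == PYRIMIDINES


for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(ref, alt, is_transition(ref, alt))
```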

nucl_diversity

pi_hat (an estimate of nucleotide diversity) for the site. This metric can be summed across multiple sites to compute regional nucleotide diversity estimates, for example pi_hat for all variants in a given gene.

Derived from: John Gillespie, “Population Genetics: A Concise Guide”, 2nd ed., p. 45.

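Here is a sketch of a per-site pi_hat. It assumes diploid samples and the common unbiased heterozygosity estimator pi_hat = n / (n - 1) * 2pq, where p is the ALT allele frequency, q = 1 - p, and n = 2 * num_called is the number of sampled chromosomes. Whether this matches PyVCF's exact formula is an assumption, and the helper name site_pi_hat is hypothetical.

```python
from typing import Optional


def site_pi_hat(p: float, num_called: int) -> Optional[float]:
    """Hypothetical helper: nucleotide diversity estimate for one bi-allelic site."""
    n = 2 * num_called  # chromosomes sampled, assuming diploid calls
    if n < 2:
        return None  # estimator undefined with fewer than two chromosomes
    q = 1.0 - p
    return n / (n - 1) * 2 * p * q


# Summing per-site values gives a regional estimate, e.g. over a gene's variants.
gene_sites = [(0.5, 3), (0.25, 2)]  # (ALT frequency, num_called) per site
print(sum(site_pi_hat(p, k) for p, k in gene_sites))
```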
num_called

The number of called samples

num_het

The number of heterozygous genotypes

num_hom_alt

The number of genotypes that are homozygous for the alternate allele

num_hom_ref

The number of genotypes that are homozygous for the reference allele

num_unknown

The number of unknown genotypes
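The count properties above, together with call_rate, can be derived from per-sample genotype codes. This is a minimal sketch. It assumes the codes follow the gt_type convention documented for _Call: 0 = hom_ref, 1 = het, 2 = hom_alt, None = uncalled. The helper genotype_summary is hypothetical.

```python
from typing import Dict, List, Optional


def genotype_summary(gt_types: List[Optional[int]]) -> Dict[str, Optional[float]]:
    """Hypothetical helper: per-site genotype counts and call rate."""
    num_hom_ref = gt_types.count(0)
    num_het = gt_types.count(1)
    num_hom_alt = gt_types.count(2)
    num_unknown = gt_types.count(None)
    num_called = len(gt_types) - num_unknown
    return {
        "num_called": num_called,
        "num_het": num_het,
        "num_hom_alt": num_hom_alt,
        "num_hom_ref": num_hom_ref,
        "num_unknown": num_unknown,
        # call_rate: fraction of genotypes that were actually called
        "call_rate": num_called / len(gt_types) if gt_types else None,
    }


print(genotype_summary([0, 1, 1, 2, None]))
```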

samples = None

List of _Call objects, one per sample, ordered as in the source VCF

start = None

0-based start coordinate
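Here is a sketch of how these two coordinates relate to POS and REF, assuming start = POS - 1 and end = start + len(REF). Under that assumption, [start, end) is a 0-based half-open interval, so it can be used directly as a Python slice of the reference sequence. The helper record_span and the toy sequence are hypothetical.

```python
from typing import Tuple


def record_span(pos: int, ref: str) -> Tuple[int, int]:
    start = pos - 1  # VCF POS is 1-based; shift to a 0-based start
    end = start + len(ref)  # 1-based inclusive end == 0-based exclusive end
    return start, end


chrom_seq = "ACGTACGT"  # toy reference sequence
start, end = record_span(3, "GTA")
print(start, end, chrom_seq[start:end])  # prints: 2 5 GTA
```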

sv_end

Return the end position for the SV

var_subtype

Return the subtype of the variant.

- For SNPs and INDELs, return one of: ts (transition), tv (transversion), ins (insertion), del (deletion).
- For SVs, return either “complex” or the SV type defined in the ALT field, with the brackets removed. For example:

<DEL> -> DEL
<INS:ME:L1> -> INS:ME:L1
<DUP> -> DUP

The logic is meant to follow the rules outlined in the following paragraph at:

http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41

“For precisely known variants, the REF and ALT fields should contain the full sequences for the alleles, following the usual VCF conventions. For imprecise variants, the REF field may contain a single base and the ALT fields should contain symbolic alleles (e.g. <ID>), described in more detail below. Imprecise variants should also be marked by the presence of an IMPRECISE flag in the INFO field.”
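Here is a simplified sketch of the rules above for a single REF/ALT pair. It is not PyVCF's implementation. It assumes one ALT allele given as a string and returns "unknown" for same-length multi-base changes. It does not model when "complex" is returned for SVs. The function name mirrors the property only for readability.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def var_subtype(ref: str, alt: str) -> str:
    # Symbolic SV allele such as <DEL>: report the type without brackets.
    if alt.startswith("<") and alt.endswith(">"):
        return alt[1:-1]
    if len(ref) == 1 and len(alt) == 1:
        # SNP: transition within purines or pyrimidines, otherwise transversion.
        pair = {ref.upper(), alt.upper()}
        return "ts" if pair == PURINES or pair == PYRIMIDINES else "tv"
    if len(alt) > len(ref):
        return "ins"
    if len(alt) < len(ref):
        return "del"
    return "unknown"


for ref, alt in [("A", "G"), ("A", "C"), ("A", "AT"), ("AT", "A"), ("N", "<INS:ME:L1>")]:
    print(ref, alt, var_subtype(ref, alt))
```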

var_type

Return the type of variant: one of snp, indel or unknown. TODO: support SVs.

vcf._Call

class vcf.parser._Call(site, sample, data)

called

True if the GT is not ./.

data

Dictionary of data from the VCF file

gt_bases

The actual genotype alleles. For example, if the VCF genotype is 0/1 at a site with REF A and ALT G, return A/G.
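Here is a minimal sketch of the index-to-base translation. It assumes the allele list is [REF] + ALTs, as in _Record.alleles. It also assumes the GT separator ('/' unphased, '|' phased) is kept in the output. Returning None for an uncalled genotype is an assumption of this sketch, and the helper is hypothetical.

```python
from typing import List, Optional


def gt_bases(gt: str, ref: str, alts: List[str]) -> Optional[str]:
    alleles = [ref] + alts  # index 0 = REF, 1.. = ALTs
    sep = "|" if "|" in gt else "/"
    indexes = gt.split(sep)
    if "." in indexes:
        return None  # uncalled genotype such as './.'
    return sep.join(alleles[int(i)] for i in indexes)


print(gt_bases("0/1", "A", ["G"]))  # prints A/G
```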

gt_type

The type of genotype:

- hom_ref = 0
- het = 1
- hom_alt = 2 (we don't track which ALT)
- uncalled = None
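The mapping above can be sketched from a GT string alone. This is an illustration under stated assumptions, not PyVCF's code. Any '.' allele makes the call uncalled, and any two different allele indexes (including 1/2) count as het. Because the code does not track which ALT is present, 1/1 and 2/2 both map to hom_alt.

```python
from typing import Optional


def gt_type(gt: str) -> Optional[int]:
    """Hypothetical helper: 0 = hom_ref, 1 = het, 2 = hom_alt, None = uncalled."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None  # uncalled, e.g. './.'
    if len(set(alleles)) > 1:
        return 1  # different alleles, e.g. '0/1' or '1/2'
    return 0 if alleles[0] == "0" else 2  # '1/1' and '2/2' are both hom_alt


for gt in ["0/0", "0|1", "1/1", "2/2", "./."]:
    print(gt, gt_type(gt))
```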

is_het

Return True for heterozygous calls

is_variant

Return True if not a reference call

phased

A boolean indicating whether or not the genotype is phased for this sample

sample

The sample name

site

The _Record for this _Call
