hapsburg.PackagesSupport.h5_python.h5_functions

Functions

`load_h5`(path[, output])	Load HDF5 from path and return hdf5 object
`save_data_h5`(gt, ad, ref, alt, pos, rec, samples, path)	Create a new HDF5 File with Input Data.
`to_vcf`(chrom, pos, ref, alt, gt, iids, vcf_path[, ...])	Saves VCF. If Genotype Likelihoods given (pl), save them too.
`add_gt_data`(df, gt[, pl, iids, m_sym])	Add Genotype and Allele Depth Fields [l,n,2] for iids to pandas dataframe df.
`hdf5_to_vcf`(path_h5, path_vcf[, iids, markers, chrom, ...])	Load HDF5 from path_h5, extract iids and (if given) markers by position, and save VCF to path_vcf.
`ad_to_gentoypeL`(ad[, error])	Convert Allele Depth Fields to Genotype Likelihoods.
`gl_to_pl`(gl)	Convert Genotype Probabilities to normalized PHRED scores
`merge_in_ld_map`(path_h5, path_snp1240k[, chs, write_mode])	Merge in MAP from eigenstrat .snp file into hdf5 file.
`bring_over_samples`(h5_original, h5_target[, field, dt])	Bring over field from one h5 to another. Assume field does not exist in target.
`merge_chr_hdf5`(fs[, path_combined_h5, chs])	Combine Genotype hdf5s from different chromosomes into one hdf5.
`concat_fields`(f, f2, field1, field2[, axis])	Concatenate two hdf5 fields and return data
`combine_hdf5s`(f, g, path_new)	Combine Genotype hdf5s and save at path_new
`mpileup2hdf5`(path2mpileup, refHDF5[, iid, s, e, ...])	Function to convert Pileup to HDF5 format.
`mpileups2hdf5`([iid, chs, mpileup_path, out_path, ...])	Function to transform Pileups from several chromosomes to hdf5s.
`pull_down_pileup`([path_bam, iid, chs, processes, ...])	Produces a Pull Down File from a .bam file, output a pileup file in the standard format IID.chrX.mpileup.
`bam_to_hdf5`([iid, chs, processes, path_bam, path_bed, ...])	Converts a bam file to an HDF5 file.

Module Contents

hapsburg.PackagesSupport.h5_python.h5_functions.load_h5(path, output=True): Load HDF5 from path and return hdf5 object

hapsburg.PackagesSupport.h5_python.h5_functions.save_data_h5(gt, ad, ref, alt, pos, rec, samples, path, gp=[], af=[], ch=[], compression='gzip', ad_group=True, gt_type='int8'): Create a new HDF5 file with the input data. Genotype data are saved as gt_type (int8 by default), read-count data as int16.
gt: Genotype data [l,k,2]
ad: Allele depth [l,k,2]
ref: Reference allele [l]
alt: Alternate allele [l]
pos: Position [l]
rec: Map position [l]
ch: Chromosome [l]; only numerical values (int8) are allowed
af: Allele frequencies [l]
samples: Sample IDs [k]
ad_group: Whether to save allele depths
gt_type: Data type in which to save the genotype data
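The shape conventions above (l markers, k samples) and the stated storage types can be checked before anything is written. A minimal sketch, assuming numpy arrays as input; `prepare_h5_arrays` is a hypothetical helper, not part of hapsburg, and it only validates and casts the arrays, it does not create the HDF5 file.

```python
import numpy as np


def prepare_h5_arrays(gt, ad, pos, samples, gt_type="int8"):
    """Hypothetical helper: check [l,k,2] shapes and cast to the storage dtypes."""
    gt = np.asarray(gt)
    ad = np.asarray(ad)
    l, k = len(pos), len(samples)
    # Genotypes and allele depths are both markers x samples x 2
    for name, arr in (("gt", gt), ("ad", ad)):
        if arr.shape != (l, k, 2):
            raise ValueError(f"{name} has shape {arr.shape}, expected {(l, k, 2)}")
    # Genotypes as gt_type (int8 by default), read counts as int16
    return gt.astype(gt_type), ad.astype("int16")
```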

hapsburg.PackagesSupport.h5_python.h5_functions.to_vcf(chrom, pos, ref, alt, gt, iids, vcf_path, header=[], pl=[]): Saves genotype data as a VCF file at vcf_path. If genotype likelihoods (pl) are given, saves them too.
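For orientation, a VCF written from these inputs follows the fixed column layout of the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per sample). The sketch below assumes a single scalar chrom for all markers and genotypes already formatted as strings such as `0/1`; `vcf_lines` is a hypothetical helper, not the module's own writer.

```python
def vcf_lines(chrom, pos, ref, alt, gt_strings, iids):
    """Hypothetical sketch: yield the text lines of a minimal GT-only VCF."""
    yield "##fileformat=VCFv4.2"
    yield '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
    yield "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                     "FILTER", "INFO", "FORMAT", *iids])
    for i in range(len(pos)):
        # ID, QUAL, FILTER and INFO are left missing (".")
        fixed = [str(chrom), str(pos[i]), ".", ref[i], alt[i], ".", ".", ".", "GT"]
        yield "\t".join(fixed + list(gt_strings[i]))
```

Writing the file is then `open(vcf_path, "w").write("\n".join(lines) + "\n")` or an equivalent loop.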

hapsburg.PackagesSupport.h5_python.h5_functions.add_gt_data(df, gt, pl=[], iids=[], m_sym='.'): Add genotype and allele depth fields [l,n,2] for iids to the pandas DataFrame df and return the modified DataFrame. If pl (genotype likelihoods) is given, add it too.
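Each per-sample VCF entry combines the GT call and, optionally, the PL values. A minimal sketch, assuming missing alleles are encoded as negative integers (for example -1) and written as m_sym; `format_sample_field` is hypothetical and not part of the module.

```python
def format_sample_field(gt_pair, pl=None, m_sym="."):
    """Hypothetical sketch: format one sample's GT (and optional PL) entry.

    Assumes missing alleles are stored as negative integers such as -1.
    """
    alleles = [m_sym if a < 0 else str(int(a)) for a in gt_pair]
    field = "/".join(alleles)                  # unphased genotype, e.g. "0/1"
    if pl is not None:
        field += ":" + ",".join(str(int(p)) for p in pl)   # FORMAT GT:PL
    return field
```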

hapsburg.PackagesSupport.h5_python.h5_functions.hdf5_to_vcf(path_h5, path_vcf, iids=[], markers=[], chrom=0, pl_field=False): Load HDF5 from path_h5, extract iids and (if given) markers by position, and save the VCF to path_vcf.
iids: Which individuals to match and save. If none are given, save all.
chrom: Value for the chromosome (otherwise loaded from the h5)
pl_field: If True, also save genotype likelihoods
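The selection step (match iids, keep markers by position, keep everything when nothing is given) can be sketched with plain index lookups. `select_indices` is a hypothetical helper; silently skipping IIDs that are not in the file is an assumption, and the real function may handle that differently.

```python
def select_indices(all_iids, all_pos, iids=(), markers=()):
    """Hypothetical sketch: return (sample indices, marker indices) to keep.

    Empty iids or markers means "keep all", mirroring the docstring.
    """
    if len(iids) == 0:
        sample_idx = list(range(len(all_iids)))
    else:
        lookup = {iid: i for i, iid in enumerate(all_iids)}
        # Keep the requested order; unknown IIDs are skipped (assumption)
        sample_idx = [lookup[iid] for iid in iids if iid in lookup]
    if len(markers) == 0:
        marker_idx = list(range(len(all_pos)))
    else:
        wanted = set(markers)
        marker_idx = [i for i, p in enumerate(all_pos) if p in wanted]
    return sample_idx, marker_idx
```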

hapsburg.PackagesSupport.h5_python.h5_functions.ad_to_gentoypeL(ad, error=0.001): Convert allele depth fields to genotype likelihoods.
ad: Read counts (integers) per allele [l,n,2]
error: Flip error rate per read
return: Genotype probabilities Pr(G|RC) [l,n,3] for 00/01/11
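The conversion can be expressed as a binomial read model. Assuming a flat prior over the three genotypes, a read shows the alternate allele with probability p_G = error for 00, 0.5 for 01 and 1 - error for 11, so Pr(G|RC) is proportional to (1 - p_G)^ref * p_G^alt. The sketch `ad_to_genotype_probs` illustrates that model; it is not hapsburg's implementation, it works in log space to avoid underflow, and it needs 0 < error < 0.5.

```python
import numpy as np


def ad_to_genotype_probs(ad, error=0.001):
    """Sketch: [l,n,2] read counts -> [l,n,3] Pr(G|RC) for 00/01/11 (flat prior)."""
    ad = np.asarray(ad, dtype=float)
    ref, alt = ad[..., 0], ad[..., 1]
    # Probability that a single read carries the ALT allele under 00, 01, 11
    p_alt = np.array([error, 0.5, 1.0 - error])
    # Log-likelihood per genotype; broadcasting gives shape [l,n,3]
    ll = ref[..., None] * np.log(1.0 - p_alt) + alt[..., None] * np.log(p_alt)
    ll -= ll.max(axis=-1, keepdims=True)     # stabilize before exponentiating
    lik = np.exp(ll)
    return lik / lik.sum(axis=-1, keepdims=True)
```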

hapsburg.PackagesSupport.h5_python.h5_functions.gl_to_pl(gl): Convert genotype probabilities to normalized PHRED scores.
gl: Probabilities Pr(G|RC) [l,n,3] (not log scale)
return: [l,n,3] vector
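Normalized PHRED scores are PL = -10 * log10(Pr), shifted so that the most likely genotype gets 0, as in the VCF PL field. A sketch, assuming probabilities are clipped at a small floor (1e-30, a hypothetical choice) so that zero probabilities do not turn into infinite scores; `gl_to_pl_sketch` is not the module's function.

```python
import numpy as np


def gl_to_pl_sketch(gl, floor=1e-30):
    """Sketch: [l,n,3] probabilities -> [l,n,3] integer PL scores."""
    gl = np.clip(np.asarray(gl, dtype=float), floor, 1.0)
    pl = -10.0 * np.log10(gl)                   # PHRED scale
    pl -= pl.min(axis=-1, keepdims=True)        # best genotype gets PL = 0
    return np.round(pl).astype(int)
```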

hapsburg.PackagesSupport.h5_python.h5_functions.merge_in_ld_map(path_h5, path_snp1240k, chs=range(1, 23), write_mode='a'): Merge in the map from an Eigenstrat .snp file into an hdf5 file. The modified h5 is saved in place.
path_h5: Path to the hdf5 file to modify
path_snp1240k: Path to the Eigenstrat .snp file whose map to use
chs: Which chromosomes to merge in the HDF5 [list]
write_mode: Which mode to use on the hdf5. a: create a new field. r+: change an existing field.
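An Eigenstrat .snp file has one SNP per line with the whitespace-separated columns SNP ID, chromosome, genetic position, physical position, and the two alleles. Merging the map then amounts to looking up each HDF5 position in that table. A minimal sketch with hypothetical helpers `read_snp_map` and `map_positions`; filling unmatched positions with NaN is an assumption.

```python
def read_snp_map(handle, ch):
    """Hypothetical sketch: {physical position: genetic position} for chromosome ch.

    handle is any iterable of text lines, e.g. an open .snp file.
    """
    pos_to_map = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 4:
            continue                      # skip blank or malformed lines
        if fields[1] == str(ch):
            pos_to_map[int(fields[3])] = float(fields[2])
    return pos_to_map


def map_positions(positions, pos_to_map, missing=float("nan")):
    """Look up the map value for each position; unmatched ones get `missing`."""
    return [pos_to_map.get(p, missing) for p in positions]
```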

hapsburg.PackagesSupport.h5_python.h5_functions.bring_over_samples(h5_original, h5_target, field='samples', dt='S32'): Bring over a field from one h5 to another. Assumes the field does not exist in the target.
h5_original: The original hdf5 path
h5_target: The target hdf5 path
field: Which field to copy over

hapsburg.PackagesSupport.h5_python.h5_functions.merge_chr_hdf5(fs, path_combined_h5='', chs=[]): Combine genotype hdf5s from different chromosomes into one hdf5 and save it at path_combined_h5. For now, only allele depths and GT are saved, not GP (not yet implemented).
fs: List of hdf5s
path_combined_h5: Where to save the new combined hdf5
chs: List of chromosomes
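Merging chromosomes amounts to concatenating each field along the marker axis (axis 0 of [l,k,2]) and recording which chromosome each marker came from. The sketch uses dicts of numpy arrays as stand-ins for the per-chromosome hdf5s; `merge_chromosomes` and the field names `gt`, `ad`, `pos` are assumptions, not the module's actual layout.

```python
import numpy as np


def merge_chromosomes(per_chr, chs, fields=("gt", "ad", "pos")):
    """Hypothetical sketch: stack per-chromosome fields along the marker axis."""
    merged = {f: np.concatenate([d[f] for d in per_chr], axis=0) for f in fields}
    # Chromosome label per marker, stored as int8 like the ch field [l]
    merged["ch"] = np.concatenate(
        [np.full(len(d["pos"]), c, dtype="int8") for d, c in zip(per_chr, chs)]
    )
    return merged
```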

hapsburg.PackagesSupport.h5_python.h5_functions.concat_fields(f, f2, field1, field2, axis=0): Concatenate two hdf5 fields and return data
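A sketch of concatenating two fields, assuming f and f2 behave like mappings from field name to array that support `[:]` reads (numpy arrays and h5py datasets both do); `concat_fields_sketch` is hypothetical. With axis=1, two [l,k,2] genotype arrays would be joined along the sample axis instead of the marker axis.

```python
import numpy as np


def concat_fields_sketch(f, f2, field1, field2, axis=0):
    """Read field1 from f and field2 from f2, then concatenate along axis."""
    return np.concatenate((f[field1][:], f2[field2][:]), axis=axis)
```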

hapsburg.PackagesSupport.h5_python.h5_functions.combine_hdf5s(f, g, path_new): Combine genotype hdf5s and save at path_new.
f, g: Genotype hdf5s; g is appended to f
path_new: Where to save the new combined hdf5

hapsburg.PackagesSupport.h5_python.h5_functions.mpileup2hdf5(path2mpileup, refHDF5, iid='', s=-np.inf, e=np.inf, outPath='', output=True): Function to convert Pileup to HDF5 format. Outputs HDF5 file at outPath in format
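Converting a pileup mostly means counting, per site, the reads supporting the reference and the alternate allele. In the samtools mpileup read-base column, `.` and `,` match the reference, letters are mismatches, `^` plus one mapping-quality character marks a read start, `$` a read end, and `+N`/`-N` followed by N bases an indel. A minimal sketch of that counting step, assuming ref and alt come from the reference HDF5; `count_ref_alt` is hypothetical, ignores base qualities, and does not apply the s/e position window.

```python
def count_ref_alt(bases, ref, alt):
    """Hypothetical sketch: count REF and ALT reads in one mpileup read-base string."""
    ref, alt = ref.upper(), alt.upper()
    n_ref = n_alt = 0
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":                 # read start: skip '^' and its mapping-quality char
            i += 2
            continue
        if c in "+-":                # indel: skip sign, length digits and the N bases
            j = i + 1
            while j < len(bases) and bases[j].isdigit():
                j += 1
            i = j + int(bases[i + 1:j])
            continue
        if c in ".,":                # match to the reference (forward / reverse)
            n_ref += 1
        elif c.upper() == alt:
            n_alt += 1
        elif c.upper() == ref:       # explicit reference base
            n_ref += 1
        i += 1                       # '$', '*' and other alleles are not counted
    return n_ref, n_alt
```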

hapsburg.PackagesSupport.h5_python.h5_functions.mpileups2hdf5(iid='', chs=range(1, 23), mpileup_path='', out_path='', refh5_path='', s=-np.inf, e=np.inf, output=True, processes=1): Function to transform pileups from several chromosomes to hdf5s. Effectively a wrapper of mpileup2hdf5. Assumes input pileups are in the format IID.chrX.mpileup and produces output in the standard file name chrX.hdf5.
iid: Name of the individual to run. Used in the encoding of the input and as the name in the hdf5
chs: List of chromosomes to run. Used in input and output file names
mpileup_path: Where to find the actual input pileups
out_path: Where to save the output files. Folder in form /PATH/
refh5_path: Reference HDF5s, in format /PATH/chr
processes: How many processes to run in parallel
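The file-name conventions above can be made explicit as a per-chromosome job list. A sketch, assuming mpileup_path and out_path are folders ending in `/` and that reference files are named `refh5_path + f"{ch}.hdf5"` (the `.hdf5` suffix is an assumption); `plan_mpileup_jobs` is hypothetical.

```python
def plan_mpileup_jobs(iid, chs, mpileup_path, out_path, refh5_path):
    """Hypothetical sketch: (input pileup, reference h5, output h5) per chromosome."""
    jobs = []
    for ch in chs:
        jobs.append((
            f"{mpileup_path}{iid}.chr{ch}.mpileup",   # input: IID.chrX.mpileup
            f"{refh5_path}{ch}.hdf5",                  # reference: /PATH/chr + X (assumed suffix)
            f"{out_path}chr{ch}.hdf5",                 # output: chrX.hdf5
        ))
    return jobs
```

Each tuple could then be handed to a worker through `multiprocessing.Pool(processes).starmap`, which is one way to realize the processes parameter.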

hapsburg.PackagesSupport.h5_python.h5_functions.pull_down_pileup(path_bam='', iid='', chs=range(1, 23), processes=4, path_bed='', out_path='', q=30, Q=30, output=True): Produces a pulldown file from a .bam file and outputs a pileup file in the standard format IID.chrX.mpileup.
path_bam: From which bam to pull down
chs: Which chromosomes to run [LIST]
path_bed: Path to the BED file of the SNP set to pull down. Format PATH.chr
out_path: Where to pull down to
processes: How many processes to run
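Since the pulldown goes via samtools, one chromosome's step can be pictured as a `samtools mpileup` call restricted to the SNP positions. A hypothetical sketch of that command, assuming q and Q map to samtools' minimum mapping and base quality (`-q`, `-Q`), that the BED file is named `path_bed + f"{ch}.bed"`, and that chromosome names in the bam match ch; it only builds the argument list and does not run it.

```python
def mpileup_command(path_bam, path_bed, out_path, iid, ch, q=30, Q=30):
    """Hypothetical sketch: argument list for a samtools mpileup pulldown of one chromosome."""
    return [
        "samtools", "mpileup",
        "-q", str(q),                              # minimum mapping quality
        "-Q", str(Q),                              # minimum base quality
        "-l", f"{path_bed}{ch}.bed",               # only positions in the SNP-set BED file
        "-r", str(ch),                             # region: this chromosome
        "-o", f"{out_path}{iid}.chr{ch}.mpileup",  # standard output name IID.chrX.mpileup
        path_bam,
    ]
```

The list can be executed with `subprocess.run(cmd, check=True)`; running one call per chromosome in a process pool is one way to use the processes parameter.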

hapsburg.PackagesSupport.h5_python.h5_functions.bam_to_hdf5(iid='', chs=range(1, 23), processes=4, path_bam='', path_bed='', pileup_path='', outh5_path='', refh5_path='', q=30, Q=30, output=False): Converts a bam file to an HDF5 file. Goes via a samtools pulldown file as intermediate and produces output in the standard file name chrX.hdf5. Runs multiple chromosomes and can be parallelized into multiple processes.
Parameters:
iid: What IID to save to [STRING]
chs: Which chromosomes to run [LIST]
path_bam: Complete path to the bam to pull down
path_bed: Complete path to the BED file of the SNP set to pull down. Format PATH.chr, without the X.bed suffix
pileup_path: Where to pull down to. Folder in form /PATH/
outh5_path: Where to put the hdf5 output files. Folder in form /PATH/
refh5_path: Reference HDF5s, in format /PATH/chr
processes: How many processes to run in parallel