hapsburg.PackagesSupport.h5_python.h5_functions
===============================================

.. py:module:: hapsburg.PackagesSupport.h5_python.h5_functions

.. autoapi-nested-parse::

   Contains various functions to operate on HDF5 files: loading HDF5 files, as
   well as converting them to VCFs.

   @ Author: Harald Ringbauer, 2019, All rights reserved



Functions
---------

.. autoapisummary::

   hapsburg.PackagesSupport.h5_python.h5_functions.load_h5
   hapsburg.PackagesSupport.h5_python.h5_functions.save_data_h5
   hapsburg.PackagesSupport.h5_python.h5_functions.to_vcf
   hapsburg.PackagesSupport.h5_python.h5_functions.add_gt_data
   hapsburg.PackagesSupport.h5_python.h5_functions.hdf5_to_vcf
   hapsburg.PackagesSupport.h5_python.h5_functions.ad_to_gentoypeL
   hapsburg.PackagesSupport.h5_python.h5_functions.gl_to_pl
   hapsburg.PackagesSupport.h5_python.h5_functions.merge_in_ld_map
   hapsburg.PackagesSupport.h5_python.h5_functions.bring_over_samples
   hapsburg.PackagesSupport.h5_python.h5_functions.merge_chr_hdf5
   hapsburg.PackagesSupport.h5_python.h5_functions.concat_fields
   hapsburg.PackagesSupport.h5_python.h5_functions.combine_hdf5s
   hapsburg.PackagesSupport.h5_python.h5_functions.mpileup2hdf5
   hapsburg.PackagesSupport.h5_python.h5_functions.mpileups2hdf5
   hapsburg.PackagesSupport.h5_python.h5_functions.pull_down_pileup
   hapsburg.PackagesSupport.h5_python.h5_functions.bam_to_hdf5


Module Contents
---------------

.. py:function:: load_h5(path, output=True)

   Load the HDF5 file at ``path`` and return the HDF5 object.


.. py:function:: save_data_h5(gt, ad, ref, alt, pos, rec, samples, path, gp=[], af=[], ch=[], compression='gzip', ad_group=True, gt_type='int8')

   Create a new HDF5 file with the input data. In the shapes below, l is the
   number of positions (markers) and k the number of samples.

   - gt: Genotype data [l,k,2]
   - ad: Allele depth [l,k,2]
   - ref: Reference allele [l]
   - alt: Alternate allele [l]
   - pos: Position [l]
   - ch: Chromosome [l]; only numerical values (int8) allowed
   - rec: Map position [l]
   - af: Allele frequencies [l]
   - samples: Sample IDs [k]
   - ad_group: Whether to save allele depths
   - gt_type: Data type used to save the genotype data

   By default, genotype data are saved as int8 and read-count data as int16.


.. py:function:: to_vcf(chrom, pos, ref, alt, gt, iids, vcf_path, header=[], pl=[])

   Save a VCF. If genotype likelihoods (``pl``) are given, save them too.


.. py:function:: add_gt_data(df, gt, pl=[], iids=[], m_sym='.')

   Add genotype and allele depth fields [l,n,2] for ``iids`` to the pandas
   DataFrame ``df`` and return the modified DataFrame. If ``pl`` (genotype
   likelihoods) is given, add it too.


.. py:function:: hdf5_to_vcf(path_h5, path_vcf, iids=[], markers=[], chrom=0, pl_field=False)

   Load the HDF5 file at ``path_h5``, extract ``iids`` and (if given)
   ``markers`` by position, and save the VCF to ``path_vcf``.

   - pl_field: If True, also save genotype likelihoods.
   - chrom: Value for the chromosome (otherwise loaded from the HDF5 file)
   - iids: Which individuals to match and save. If none are given, save all.


.. py:function:: ad_to_gentoypeL(ad, error=0.001)

   Convert allele depth fields to genotype likelihoods.

   - ad: [l,n,2] allele read counts (integers)
   - error: Flip error per read
   - return: Genotype probabilities Pr(G|RC) [l,n,3] for 00/01/11


.. py:function:: gl_to_pl(gl)

   Convert genotype probabilities to normalized PHRED scores.

   - gl: [l,n,3] probabilities Pr(G|RC) (not log-scale)
   - return: [l,n,3] array


.. py:function:: merge_in_ld_map(path_h5, path_snp1240k, chs=range(1, 23), write_mode='a')

   Merge the map from an Eigenstrat .snp file into an HDF5 file. The modified
   HDF5 file is saved in place.

   - path_h5: Path to the HDF5 file to modify
   - path_snp1240k: Path to the Eigenstrat .snp file whose map to use
   - chs: Which chromosomes to merge into the HDF5 file [list]
   - write_mode: Which mode to open the HDF5 file in. ``a``: create a new field; ``r+``: change an existing field


.. py:function:: bring_over_samples(h5_original, h5_target, field='samples', dt='S32')

   Copy a field from one HDF5 file to another. Assumes the field does not yet
   exist in the target.

   - h5_original: Path of the original HDF5 file
   - h5_target: Path of the target HDF5 file
   - field: Which field to copy over


.. py:function:: merge_chr_hdf5(fs, path_combined_h5='', chs=[])

   Combine genotype HDF5 files from different chromosomes into one HDF5 file
   and save it at ``path_combined_h5``.

   - fs: List of HDF5 files
   - path_combined_h5: Where to save the new, combined HDF5 file. Currently only allele depths and GT are saved, not GP (TODO: implement).
   - chs: List of chromosomes


.. py:function:: concat_fields(f, f2, field1, field2, axis=0)

   Concatenate two HDF5 fields and return the data.


.. py:function:: combine_hdf5s(f, g, path_new)

   Combine genotype HDF5 files and save the result at ``path_new``.

   - f, g: Genotype HDF5 files; ``g`` is appended to ``f``
   - path_new: Where to save the new, combined HDF5 file


.. py:function:: mpileup2hdf5(path2mpileup, refHDF5, iid='', s=-np.inf, e=np.inf, outPath='', output=True)

   Convert a pileup file to HDF5 format. Outputs an HDF5 file at ``outPath``
   in format


.. py:function:: mpileups2hdf5(iid='', chs=range(1, 23), mpileup_path='', out_path='', refh5_path='', s=-np.inf, e=np.inf, output=True, processes=1)

   Transform pileups from several chromosomes into HDF5 files. Effectively a
   wrapper of ``mpileup2hdf5``. Assumes input pileups are named
   ``IID.chrX.mpileup``; produces output in the standard file name
   ``chrX.hdf5``.

   - iid: Name of the individual to run. Used in the input file names and as the sample name in the HDF5 file
   - chs: List of chromosomes to run. Used in input and output file names.
   - mpileup_path: Where to find the input pileup files
   - out_path: Where to write the output files. Folder in form /PATH/
   - refh5_path: Reference HDF5s, in format /PATH/chr
   - processes: How many processes to run in parallel


.. py:function:: pull_down_pileup(path_bam='', iid='', chs=range(1, 23), processes=4, path_bed='', out_path='', q=30, Q=30, output=True)

   Produce a pulldown file from a .bam file: output a pileup file in the
   standard format ``IID.chrX.mpileup``.

   - path_bam: Which BAM file to pull down from
   - chs: Which chromosomes to run [LIST]
   - path_bed: Path to the BED file of the SNP set to pull down, in format PATH.chr
   - out_path: Where to write the pulldown
   - processes: How many processes to run


.. py:function:: bam_to_hdf5(iid='', chs=range(1, 23), processes=4, path_bam='', path_bed='', pileup_path='', outh5_path='', refh5_path='', q=30, Q=30, output=False)

   Convert a BAM file to HDF5 files, using a samtools pulldown file as an
   intermediate. Produces output in the standard file name ``chrX.hdf5``.
   Runs multiple chromosomes and can be parallelized into multiple processes.

   Parameters:

   - iid: What IID to save to [STRING]
   - chs: Which chromosomes to run [LIST]
   - path_bam: Complete path to the BAM file to pull down
   - path_bed: Complete path to the BED file of the SNP set to pull down. Format PATH.chr, without the X.bed part
   - pileup_path: Where to write the pulldown. Folder in form /PATH/
   - outh5_path: Where to put the HDF5 output files. Folder in form /PATH/
   - refh5_path: Reference HDF5s, in format /PATH/chr
   - processes: How many processes to run in parallel
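
``to_vcf`` and ``add_gt_data`` write genotypes [l,n,2] into VCF GT fields, and ``m_sym`` ('.' by default) is presumably the symbol for missing calls. The sketch below shows that conversion under three assumptions that the docstrings do not confirm: alleles are coded 0 (reference) / 1 (alternate), negative values mark missing calls, and the unphased ``/`` separator is used. The function name ``gt_to_vcf_strings`` is hypothetical and nested lists stand in for the genotype array.

.. code-block:: python

   def gt_to_vcf_strings(gt, m_sym="."):
       """Hypothetical sketch: genotypes [l][n][2] -> VCF GT strings [l][n].

       Assumes 0/1 allele coding and negative values for missing calls,
       which are written as m_sym. Uses the unphased '/' separator.
       """
       return [
           ["/".join(m_sym if a < 0 else str(a) for a in pair) for pair in locus]
           for locus in gt
       ]

For phased data, ``|`` would replace ``/`` as the separator in this sketch.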
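
The docstrings of ``ad_to_gentoypeL`` and ``gl_to_pl`` do not spell out the read model. The following pure-Python sketch shows one common way to go from allele depths to Pr(G|RC) and then to normalized PHRED scores, assuming a binomial read model with a flat genotype prior: a read shows the alternate allele with probability ``error`` under genotype 00, 0.5 under 01, and ``1 - error`` under 11. It also assumes ``ad[i][j] = [ref_count, alt_count]``. The names ``ad_to_probs`` and ``probs_to_pl`` are hypothetical, nested lists stand in for the [l,n,2] and [l,n,3] arrays, and the module's actual implementation may differ.

.. code-block:: python

   import math

   def ad_to_probs(ad, error=0.001):
       """Hypothetical sketch: allele depths [l][n][2] -> Pr(G|RC) [l][n][3].

       Assumes ad[i][j] = [ref_count, alt_count], a binomial read model and
       a flat prior over the genotypes 00/01/11.
       """
       if not 0 < error < 0.5:
           raise ValueError("error must lie strictly between 0 and 0.5")
       # Probability that a single read shows the alternate allele, per genotype
       p_alt = (error, 0.5, 1 - error)
       out = []
       for locus in ad:
           row = []
           for ref_c, alt_c in locus:
               # Log-likelihoods; the binomial coefficient is shared by all
               # genotypes and cancels in the normalization below
               logl = [alt_c * math.log(p) + ref_c * math.log(1 - p) for p in p_alt]
               top = max(logl)  # subtract the maximum to avoid underflow
               w = [math.exp(x - top) for x in logl]
               total = sum(w)
               row.append([x / total for x in w])
           out.append(row)
       return out

   def probs_to_pl(gl, min_prob=1e-300):
       """Hypothetical sketch: Pr(G|RC) [l][n][3] -> normalized PHRED scores.

       PL = -10 * log10(p), shifted so the most likely genotype gets 0 and
       rounded to integers. min_prob (an assumption) caps the score when a
       probability has underflowed to zero.
       """
       out = []
       for locus in gl:
           row = []
           for probs in locus:
               phred = [-10 * math.log10(max(p, min_prob)) for p in probs]
               best = min(phred)
               row.append([round(x - best) for x in phred])
           out.append(row)
       return out

For one locus with two samples, ``probs_to_pl(ad_to_probs([[[10, 0], [5, 5]]]))`` gives the homozygous-reference genotype a PL of 0 for the first sample and the heterozygote a PL of 0 for the second. Working in log space keeps high-coverage sites from underflowing.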
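
``save_data_h5`` stores allele depths ``ad`` as [l,k,2] read counts, and ``mpileup2hdf5`` builds an HDF5 file from a pileup. The sketch below shows one way to count reference and alternate reads at a single site from the read-bases column of ``samtools mpileup`` output, where ``.`` and ``,`` are reference matches, ``^`` plus one mapping-quality character marks a read start, ``$`` a read end, and ``+N``/``-N`` followed by N bases an indel. The helper ``count_ref_alt`` is hypothetical, is not taken from the module, and assumes the alternate allele is already known.

.. code-block:: python

   def count_ref_alt(read_bases, alt):
       """Hypothetical helper: (ref, alt) read counts from one mpileup read-bases column."""
       ref_n = alt_n = 0
       alt = alt.upper()
       i = 0
       while i < len(read_bases):
           c = read_bases[i]
           if c == "^":
               i += 2  # read start: skip '^' and the mapping-quality character
               continue
           if c in "+-":
               # Indel: sign, length digits, then that many inserted/deleted bases
               j = i + 1
               while j < len(read_bases) and read_bases[j].isdigit():
                   j += 1
               i = j + int(read_bases[i + 1:j])
               continue
           if c in ".,":
               ref_n += 1  # match to the reference, forward or reverse strand
           elif c.upper() == alt:
               alt_n += 1  # mismatch equal to the alternate allele
           # '$', '*', '#', '<', '>' and other alleles are ignored
           i += 1
       return ref_n, alt_n

For example, ``count_ref_alt("..,,^]A$a+2CT.-1G,", "A")`` counts six reference reads and two ``A`` reads; the ``^]`` read-start marker and the ``+2CT`` and ``-1G`` indels are skipped rather than counted.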
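
``mpileups2hdf5``, ``pull_down_pileup`` and ``bam_to_hdf5`` locate their per-chromosome files through naming conventions stated in their docstrings: pileups named ``IID.chrX.mpileup``, outputs named ``chrX.hdf5``, folders given as ``/PATH/``, and BED files given as a ``PATH.chr`` prefix without the ``X.bed`` part. The hypothetical helper ``chrom_paths`` below spells out how those pieces would combine under plain string concatenation; this is an assumption about how the module joins them, not its code.

.. code-block:: python

   def chrom_paths(iid, ch, mpileup_path, out_path, path_bed):
       """Hypothetical helper: per-chromosome file names implied by the docstrings.

       Assumes folders end with '/' and are joined by plain concatenation.
       """
       return {
           # IID.chrX.mpileup inside the pileup folder
           "mpileup": f"{mpileup_path}{iid}.chr{ch}.mpileup",
           # chrX.hdf5 inside the output folder
           "hdf5": f"{out_path}chr{ch}.hdf5",
           # BED prefix given as PATH.chr; the chromosome and '.bed' are appended
           "bed": f"{path_bed}{ch}.bed",
       }

With made-up inputs, ``chrom_paths("I1234", 3, "/data/pileup/", "/data/h5/", "/ref/snps.chr")`` yields ``/data/pileup/I1234.chr3.mpileup``, ``/data/h5/chr3.hdf5`` and ``/ref/snps.chr3.bed``. Under plain concatenation, a folder without its trailing slash would silently produce wrong paths, which is one reason to pass folders in the ``/PATH/`` form.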