hapsburg.preprocessing_lowmem

Classes

PreProcessingHDF5_lowmem

Class for PreProcessing the Data.

PreProcessingEigenstrat_lowmem

Class for PreProcessing Eigenstrat Files

PreProcessingEigenstratX_lowmem

Class for PreProcessing Eigenstrat Files

Functions

extract_snps_hdf5_lowmem(h5, ids_ref, markers, ch, ...)

Extract genotypes from h5 on ids and markers.

load_preprocessing_lowmem([p_model, conPop, save, output])

Factory method to load the preprocessing class.

Module Contents

hapsburg.preprocessing_lowmem.extract_snps_hdf5_lowmem(h5, ids_ref, markers, ch, meta_path_ref, verbose=True, diploid=True)

Extract genotypes from h5 on ids and markers. If diploid, concatenate haplotypes along axis 0. Extract individuals first, then subset to SNPs. Return a 2D array [# haplotypes, # markers].
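The order of operations described above can be illustrated on a plain NumPy array. This is a minimal sketch, not the function's implementation: it assumes a genotype array of shape [markers, individuals, 2], the helper name extract_haplotypes is hypothetical, and the handling of the non-diploid case is a guess.

```python
import numpy as np

def extract_haplotypes(gt, ind_idx, marker_idx, diploid=True):
    """Hypothetical helper; gt is assumed to have shape [markers, individuals, 2]."""
    gt = np.asarray(gt)
    # Step 1: extract individuals first -> [markers, n_ind, 2]
    sub = gt[:, ind_idx, :]
    # Step 2: then subset to the requested SNPs -> [n_markers, n_ind, 2]
    sub = sub[marker_idx, :, :]
    if diploid:
        # Concatenate both haplotypes along axis 0 -> [2 * n_ind, n_markers]
        return np.concatenate((sub[:, :, 0].T, sub[:, :, 1].T), axis=0)
    # Non-diploid case (assumption): keep only the first haplotype per individual
    return sub[:, :, 0].T

# Toy data: 5 markers, 4 individuals, 2 haplotypes each
gt = np.arange(5 * 4 * 2).reshape(5, 4, 2) % 2
haps = extract_haplotypes(gt, ind_idx=[0, 2, 3], marker_idx=[1, 4])
print(haps.shape)  # (6, 2)
```

With three individuals and two markers, the diploid result has six rows (all first haplotypes, followed by all second haplotypes) and two columns.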

class hapsburg.preprocessing_lowmem.PreProcessingHDF5_lowmem(conPop=[], save=True, output=True)

Bases: hapsburg.preprocessing.PreProcessingHDF5

Class for PreProcessing the Data. Standard: intersect the reference data with the individual data and return the intersection dataset.

load_data(iid='MA89', ch=6, start=-np.inf, end=np.inf)

Return the reference matrix [k, l], the individual data matrix [2, l], and the linkage map [l].
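The intersection of reference and individual data, and the resulting shapes, can be sketched with np.intersect1d. This is an illustration under stated assumptions rather than the class's code: markers are matched by position only, all arrays and values below are made up, and the variable names merely mirror the parameter names used elsewhere on this page.

```python
import numpy as np

# Toy inputs (all values are invented for illustration)
pos_ref = np.array([100, 250, 300, 480, 900])   # SNP positions in the reference
pos_ind = np.array([250, 300, 700, 900])        # SNP positions in the individual
gts_ref = np.array([[0, 1, 1, 0, 1],
                    [1, 1, 0, 0, 0],
                    [0, 0, 1, 1, 1]])            # reference haplotypes [k=3, 5]
gts_tgt = np.array([[0, 1, 1, 0],
                    [1, 1, 0, 0]])              # individual data [2, 4]
map_ref = np.array([0.001, 0.004, 0.005, 0.009, 0.020])  # map values, ordered like pos_ref

# Shared positions plus their indices in each input
pos, i_ref, i_ind = np.intersect1d(pos_ref, pos_ind, return_indices=True)

gts = gts_ref[:, i_ref]       # reference matrix [k, l]
gts_ind = gts_tgt[:, i_ind]   # individual matrix [2, l]
r_map = map_ref[i_ref]        # linkage map [l]

print(gts.shape, gts_ind.shape, r_map.shape)  # (3, 3) (2, 3) (3,)
```

Here l = 3 because only positions 250, 300, and 900 occur in both inputs.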

optional_postprocessing(gts_ind, gts, r_map, pos, out_folder, pCon, read_counts=[])

Postprocessing steps of gts_ind, gts, r_map, and the folder, based on boolean fields of the class.

class hapsburg.preprocessing_lowmem.PreProcessingEigenstrat_lowmem(save=True, output=True, packed=1, sep='\\s+')

Bases: hapsburg.preprocessing.PreProcessingEigenstrat

Class for PreProcessing Eigenstrat Files. Same as PreProcessingHDF5 for the reference, but with Eigenstrat code for the target.

optional_postprocessing(gts_ind, gts, r_map, pos, out_folder, read_counts=[])

Postprocessing steps of gts_ind, gts, r_map, and the folder, based on boolean fields of the class.

load_data(iid='MA89', ch=6)

Return the reference matrix [k, l], the individual data matrix [2, l], the linkage map [l], and the output folder. Save the loaded data if self.save == True. Various modifiers are set via class fields (see also PreProcessingHDF5).

class hapsburg.preprocessing_lowmem.PreProcessingEigenstratX_lowmem(save=True, output=True, packed=1, sep='\\s+')

Bases: PreProcessingEigenstrat_lowmem, hapsburg.preprocessing.PreProcessingEigenstratX

Class for PreProcessing Eigenstrat Files. Same as PreProcessingHDF5 for the reference, but with Eigenstrat code for the target.

set_output_folder(iid, ch='X')

Set the output folder after folder_out. General Structure for HAPSBURG: folder_out/iid/chrX/
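The folder layout above can be sketched with os.path.join. The helper name output_folder and the example base folder ./Output are hypothetical; whether the real method also creates the directory is not stated here, so this sketch only builds the path string.

```python
import os

def output_folder(folder_out, iid, ch="X"):
    """Hypothetical helper that builds folder_out/iid/chr<ch>/."""
    # The trailing empty component makes os.path.join end the path with a separator
    return os.path.join(folder_out, str(iid), "chr" + str(ch), "")

path = output_folder("./Output", "MA89")
print(path)  # ./Output/MA89/chrX/ on POSIX systems
```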

get_1000G_path(h5_path1000g, ch='X')

Construct and return the path to the 1000 Genomes reference panel.

es_get_index_iid(es, iid)

Get the index of the individual with the given IID.

extract_snps_es(es, id, markers)

Use the Eigenstrat object es. Extract genotypes for the individual at integer index id for the list of markers. Convert from the Eigenstrat genotype encoding to the format used here.

load_data(iid='MA89', ch='X')

Return the reference matrix [k, l], the individual data matrix [2, l], the linkage map [l], and the output folder. Save the loaded data if self.save == True. Various modifiers are set via class fields (see also PreProcessingHDF5).

hapsburg.preprocessing_lowmem.load_preprocessing_lowmem(p_model='Eigenstrat', conPop=[], save=True, output=True)

Factory method to load the preprocessing class.
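A factory of this kind typically dispatches on the p_model string. The sketch below shows the general pattern only and is not the module's code: the stub classes stand in for the real preprocessing classes, and of the keys only the default 'Eigenstrat' comes from the signature above; 'HDF5' is a placeholder.

```python
class HDF5Stub:
    """Stand-in for an HDF5-based preprocessing class (hypothetical)."""
    def __init__(self, conPop=None, save=True, output=True):
        self.conPop = list(conPop) if conPop is not None else []
        self.save = save
        self.output = output

class EigenstratStub:
    """Stand-in for an Eigenstrat-based preprocessing class (hypothetical)."""
    def __init__(self, save=True, output=True):
        self.save = save
        self.output = output

def load_preprocessing_sketch(p_model="Eigenstrat", conPop=None, save=True, output=True):
    """Return a preprocessing object chosen by p_model (illustrative keys)."""
    if p_model == "HDF5":            # placeholder key, not taken from the source
        return HDF5Stub(conPop=conPop, save=save, output=output)
    if p_model == "Eigenstrat":      # default value from the documented signature
        return EigenstratStub(save=save, output=output)
    raise NotImplementedError(f"Unknown preprocessing mode: {p_model}")

pp = load_preprocessing_sketch(p_model="Eigenstrat", save=False)
print(type(pp).__name__, pp.save)  # EigenstratStub False
```

Raising on an unknown key keeps a misspelled p_model from silently falling back to a default.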