hapsburg.PackagesSupport.parallel_runs.helper_functions

Helper functions for notebook runs on a cluster. Author: Harald Ringbauer, 2019

Functions

`prepare_path`(base_path, iid, ch, prefix_out[, logfile])	Prepare the path and pipe printing for one Individual.
`multi_run`(fun, prms[, processes, output])	Implementation of running in Parallel.
`split_up_roh_df`(base_path, path_out, iid[, file_in, ...])	Splits up the ROH-dataframe from base_path/file_in into file_out.
`get_sep_from_extension`(path)	Get the separator for csv/tsv from the file extension.
`combine_individual_data`(base_path, iid[, delete, chs, ...])	Merge the data from the analysis of one individual (all chromosomes).
`move_X_to_parent_folder`(base_path, iid[, delete, ch, ...])	Take ROH result table from X folder, and move it to parent folder.
`create_folders`(input_base_folder[, outfolder])	Create Folders for ROH analysis with Plink/BCFTOOLs.
`split_up_inferred_roh`(df_t, iid, save_path)	Extract only the ROH of individual iid and save it to save_path.
`postprocess_iid`(df_plink, input_base_folder, iids[, ...])	Split up results into roh.csv and roh_gt.csv for each IID.

hapsburg.PackagesSupport.parallel_runs.helper_functions.prepare_path(base_path, iid, ch, prefix_out, logfile=True): Prepare the path and pipe printing for one individual. Creates the path if it does not already exist.
logfile: Whether to pipe output to a log file.

hapsburg.PackagesSupport.parallel_runs.helper_functions.multi_run(fun, prms, processes=4, output=False): Implementation of running in parallel.
fun: The function to run.
prms: The parameter files.
processes: How many processes to use.
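The signature suggests a thin wrapper around a process pool. The following is a minimal sketch of that pattern, not the package's own code. It assumes that `prms` is a list in which each entry holds the positional arguments for one call of `fun`; the name `multi_run_sketch` and the serial fallback are hypothetical choices made for illustration.

```python
import multiprocessing as mp


def multi_run_sketch(fun, prms, processes=4, output=False):
    """Call fun(*p) for every entry p of prms and return the results in order."""
    if len(prms) == 0:
        raise ValueError("Nothing to run; prms is empty.")
    if output:
        print(f"Running {len(prms)} jobs, up to {processes} in parallel.")
    if processes <= 1 or len(prms) == 1:
        # Serial fallback (an assumption): skip the cost of starting a pool.
        return [fun(*p) for p in prms]
    with mp.Pool(processes=processes) as pool:
        # starmap unpacks each entry of prms into positional arguments.
        return pool.starmap(fun, prms)
```

With a pool, `fun` has to be picklable, so it should be defined at module level. On platforms that start workers with the spawn method, the call also belongs under an `if __name__ == "__main__":` guard.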

hapsburg.PackagesSupport.parallel_runs.helper_functions.split_up_roh_df(base_path, path_out, iid, file_in='roh_info.csv', file_out='roh_gt.csv'): Splits up the ROH dataframe from base_path/file_in into file_out, picking out individual iid. Done to pass on the “ground truth”.
base_path: Where to find roh_info.csv.
path_out: Where to save roh_gt.csv to.
iid: Which individual to extract from roh_info.csv.
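The filtering step described here can be sketched as follows. This is an illustration only: it uses the standard-library `csv` module rather than a dataframe, and it assumes that the individual is identified by a column named `iid`, which the documentation above does not confirm. The name `split_up_roh_sketch` and the returned row count are hypothetical.

```python
import csv
import os


def split_up_roh_sketch(base_path, path_out, iid,
                        file_in="roh_info.csv", file_out="roh_gt.csv"):
    """Copy the rows of base_path/file_in that belong to iid into path_out/file_out."""
    with open(os.path.join(base_path, file_in), newline="") as f_in:
        reader = csv.DictReader(f_in)
        fieldnames = reader.fieldnames
        # Assumption: the individual is stored in a column named "iid".
        rows = [row for row in reader if row["iid"] == str(iid)]
    os.makedirs(path_out, exist_ok=True)
    with open(os.path.join(path_out, file_out), "w", newline="") as f_out:
        writer = csv.DictWriter(f_out, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)  # number of rows written for this individual
```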

hapsburg.PackagesSupport.parallel_runs.helper_functions.get_sep_from_extension(path): Get the separator for csv/tsv from the file extension. Returns the delimiter, either a comma or a tab.
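A lookup of this kind might look like the sketch below. The name `get_sep_sketch` is hypothetical, and raising an error for any other extension is an assumption, since the behavior for unrecognized extensions is not documented here.

```python
import os


def get_sep_sketch(path):
    """Return a comma for .csv paths and a tab for .tsv paths."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".csv":
        return ","
    if ext == ".tsv":
        return "\t"
    # Assumption: any other extension is treated as an error.
    raise ValueError(f"Unrecognized extension: {ext!r}")
```

The returned value can be passed on directly as the `sep` or `delimiter` argument of a table reader.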

hapsburg.PackagesSupport.parallel_runs.helper_functions.combine_individual_data(base_path, iid, delete=False, chs=range(1, 23), prefix_out='', file='roh.csv', file_result='_roh_full.csv'): Function to merge the data from the analysis of one individual (all chromosomes).
chs: Which chromosomes to combine.
file: Which files to combine. Either roh.csv or ibd.csv.
delete: Whether to delete the individual folder and its contents after combining.
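The merge step can be sketched as a concatenation of per-chromosome tables. The folder layout `base_path/<iid>/chr<ch>/<prefix_out>/<file>` and the output location `base_path/<iid><file_result>` used below are assumptions made for illustration, as is the name `combine_individual_sketch`; the package's real layout may differ.

```python
import csv
import os
import shutil


def combine_individual_sketch(base_path, iid, delete=False, chs=range(1, 23),
                              prefix_out="", file="roh.csv",
                              file_result="_roh_full.csv"):
    """Concatenate the per-chromosome tables of one individual into one CSV."""
    header, rows = None, []
    for ch in chs:
        # Assumed layout: base_path/<iid>/chr<ch>/<prefix_out>/<file>
        path = os.path.join(base_path, str(iid), f"chr{ch}", prefix_out, file)
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if header is None:
                header = file_header  # keep the header of the first table only
            rows.extend(reader)
    # Assumed output name: base_path/<iid><file_result>
    path_result = os.path.join(base_path, str(iid) + file_result)
    with open(path_result, "w", newline="") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    if delete:
        # Remove the individual's folder with all per-chromosome results.
        shutil.rmtree(os.path.join(base_path, str(iid)))
    return path_result
```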

hapsburg.PackagesSupport.parallel_runs.helper_functions.move_X_to_parent_folder(base_path, iid, delete=False, ch=23, prefix_out='', file_result='_roh_full.csv'): Take the ROH result table from the X folder and move it to the parent folder. Delete the original result folder.

hapsburg.PackagesSupport.parallel_runs.helper_functions.create_folders(input_base_folder, outfolder='plink_out/'): Create folders for ROH analysis with Plink/BCFTOOLs. Operates within the HAPSBURG Mosaic data structure. Returns the h5 path, the vcf path, and the folder for intermediary output.

hapsburg.PackagesSupport.parallel_runs.helper_functions.split_up_inferred_roh(df_t, iid, save_path): Extract only the ROH of individual iid and save it to save_path.

hapsburg.PackagesSupport.parallel_runs.helper_functions.postprocess_iid(df_plink, input_base_folder, iids, ch=3, prefix_out=''): Split up results into roh.csv and roh_gt.csv for each IID.
df_plink: Data frame with the Plink results, formatted correctly.