hapsburg.PackagesSupport.parallel_runs.helper_functions

Helper functions for notebook runs on a cluster. Author: Harald Ringbauer, 2019

Functions

prepare_path(base_path, iid, ch, prefix_out[, logfile])

Prepare the path and pipe printing for one Individual.

multi_run(fun, prms[, processes, output])

Implementation of running in Parallel.

split_up_roh_df(base_path, path_out, iid[, file_in, ...])

Splits up the ROH-dataframe from base_path/file_in into file_out.

get_sep_from_extension(path)

Get the separator for csv/tsv from the file extension.

combine_individual_data(base_path, iid[, delete, chs, ...])

Merges the data from one individual's analysis (all chromosomes).

move_X_to_parent_folder(base_path, iid[, delete, ch, ...])

Take ROH result table from X folder, and move it to parent folder.

create_folders(input_base_folder[, outfolder])

Create Folders for ROH analysis with Plink/BCFTOOLs.

split_up_inferred_roh(df_t, iid, save_path)

Extracts only the ROH of individual iid and saves them to save_path.

postprocess_iid(df_plink, input_base_folder, iids[, ...])

Split up results into roh.csv and roh_gt.csv for each IID.

Module Contents

hapsburg.PackagesSupport.parallel_runs.helper_functions.prepare_path(base_path, iid, ch, prefix_out, logfile=True)

Prepare the path and pipe printing for one individual. Creates the path if it does not already exist.
logfile: Whether to pipe output to a log file.
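The description above can be illustrated with a minimal sketch. It is not the module's implementation: the name prepare_path_sketch, the folder layout base_path/iid/chr<ch>/prefix_out and the log file name run_log.txt are all assumptions made for illustration.

```python
import os
import sys


def prepare_path_sketch(base_path, iid, ch, prefix_out, logfile=True):
    """Hypothetical sketch: create an output folder and optionally log to a file."""
    # Assumed layout: <base_path>/<iid>/chr<ch>/<prefix_out>/
    path_out = os.path.join(base_path, str(iid), f"chr{ch}", prefix_out, "")

    # Create the folder (and any missing parents); do nothing if it exists.
    os.makedirs(path_out, exist_ok=True)

    if logfile:
        # Assumed log file name. After this line, print() calls in this
        # process write to the file instead of the console.
        sys.stdout = open(os.path.join(path_out, "run_log.txt"), "w")

    return path_out
```

Redirecting sys.stdout affects the whole process, which is convenient when every individual runs in its own worker process but surprising in an interactive session; passing logfile=False keeps output on the console.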

hapsburg.PackagesSupport.parallel_runs.helper_functions.multi_run(fun, prms, processes=4, output=False)

Implementation of running in parallel.
fun: The function to run.
prms: The parameter files.
processes: How many processes to use.
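One common way to build such a helper is multiprocessing.Pool.starmap from the Python standard library. The sketch below is an assumption about the general pattern, not the module's code: the name multi_run_sketch, the serial fallback, the meaning of output (a progress message) and the reading of prms as a list of positional-argument lists are all illustrative choices.

```python
import multiprocessing as mp


def multi_run_sketch(fun, prms, processes=4, output=False):
    """Hypothetical sketch: call fun(*p) for every argument list p in prms."""
    if output:
        # Assumed meaning of `output`: print a short progress message.
        print(f"Running {len(prms)} jobs; {processes} in parallel.")

    if processes == 1 or len(prms) == 1:
        # Serial fallback: no worker start-up cost, and tracebacks stay readable.
        return [fun(*p) for p in prms]

    with mp.Pool(processes=processes) as pool:
        # starmap unpacks each entry of prms into positional arguments
        # and returns the results in the same order as prms.
        return pool.starmap(fun, prms)
```

For example, multi_run_sketch(pow, [[2, 3], [3, 2]], processes=1) evaluates pow(2, 3) and pow(3, 2) in turn. With more than one process, fun must be picklable (defined at module level), and on platforms that spawn workers the call should sit under an if __name__ == "__main__": guard.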

hapsburg.PackagesSupport.parallel_runs.helper_functions.split_up_roh_df(base_path, path_out, iid, file_in='roh_info.csv', file_out='roh_gt.csv')

Splits up the ROH dataframe from base_path/file_in into file_out, picking out individual iid. Done to pass on the “ground truth”.
base_path: Where to find roh_info.csv.
path_out: Where to save roh_gt.csv.
iid: Which individual to extract from roh_info.csv.
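The filtering step can be sketched with the standard csv module. This is an illustration only: the function name, the extra iid_col parameter and the assumption that the table has a column called "iid" are hypothetical, and the real function may well operate on a pandas data frame instead.

```python
import csv
import os


def split_up_roh_df_sketch(base_path, path_out, iid,
                           file_in="roh_info.csv", file_out="roh_gt.csv",
                           iid_col="iid"):
    """Hypothetical sketch: copy the rows of one individual into its own file."""
    # Read the full table and keep only the rows of the requested individual.
    with open(os.path.join(base_path, file_in), newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = [row for row in reader if row[iid_col] == iid]

    # Write the selected rows, with the original header, to path_out/file_out.
    os.makedirs(path_out, exist_ok=True)
    with open(os.path.join(path_out, file_out), "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    return len(rows)  # number of rows written
```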

hapsburg.PackagesSupport.parallel_runs.helper_functions.get_sep_from_extension(path)

Get the separator for csv/tsv from the file extension: either comma or tab. Returns the delimiter.
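A minimal sketch of this kind of lookup follows. The function name is hypothetical, and raising ValueError for any other extension is an assumption; the documentation above does not say how the real function handles that case.

```python
def get_sep_from_extension_sketch(path):
    """Hypothetical sketch: map a .csv/.tsv file name to its delimiter."""
    if path.endswith(".tsv"):
        return "\t"
    if path.endswith(".csv"):
        return ","
    # Assumption: unknown extensions are treated as an error.
    raise ValueError(f"Cannot infer separator from extension: {path}")
```

The returned value can be passed directly as the delimiter argument of csv.reader or as sep to a table reader.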

hapsburg.PackagesSupport.parallel_runs.helper_functions.combine_individual_data(base_path, iid, delete=False, chs=range(1, 23), prefix_out='', file='roh.csv', file_result='_roh_full.csv')

Merges the data from one individual's analysis (all chromosomes).
chs: Which chromosomes to combine.
file: Which files to combine, either roh.csv or ibd.csv.
delete: Whether to delete the individual folder and its contents after combining.
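The merge can be pictured as concatenating one table per chromosome. The sketch below makes several assumptions that the documentation above does not confirm: per-chromosome results live in base_path/iid/chr<ch>/file, all files share the same header, and the combined table is written to base_path/iid + file_result. The function name is hypothetical and prefix_out is left out for brevity.

```python
import csv
import os
import shutil


def combine_individual_data_sketch(base_path, iid, delete=False,
                                   chs=range(1, 23), file="roh.csv",
                                   file_result="_roh_full.csv"):
    """Hypothetical sketch: stack per-chromosome tables of one individual."""
    iid_folder = os.path.join(base_path, iid)
    header, rows = None, []

    for ch in chs:
        # Assumed layout: one sub-folder per chromosome.
        with open(os.path.join(iid_folder, f"chr{ch}", file), newline="") as f:
            reader = csv.reader(f)
            header = next(reader)  # assumes every file starts with the same header
            rows.extend(reader)

    # Assumed output location: next to the individual's folder.
    path_out = os.path.join(base_path, iid + file_result)
    with open(path_out, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

    if delete:
        # Remove the per-chromosome folders once the combined file is written.
        shutil.rmtree(iid_folder)

    return path_out
```

Writing the combined file outside the individual's folder means delete=True cannot remove the result it has just produced.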

hapsburg.PackagesSupport.parallel_runs.helper_functions.move_X_to_parent_folder(base_path, iid, delete=False, ch=23, prefix_out='', file_result='_roh_full.csv')

Take the ROH result table from the X folder and move it to the parent folder. Delete the original result folder.

hapsburg.PackagesSupport.parallel_runs.helper_functions.create_folders(input_base_folder, outfolder='plink_out/')

Create folders for ROH analysis with Plink/BCFTOOLs. Operates within the HAPSBURG Mosaic data structure. Returns the h5 path, the vcf path, and the folder for intermediary output.

hapsburg.PackagesSupport.parallel_runs.helper_functions.split_up_inferred_roh(df_t, iid, save_path)

Extracts only the ROH of individual iid and saves them to save_path.

hapsburg.PackagesSupport.parallel_runs.helper_functions.postprocess_iid(df_plink, input_base_folder, iids, ch=3, prefix_out='')

Split up results into roh.csv and roh_gt.csv for each IID.
df_plink: Data frame with Plink results, formatted correctly.