hapsburg.PackagesSupport.pp_individual_roh_csvs
===============================================

.. py:module:: hapsburg.PackagesSupport.pp_individual_roh_csvs

.. autoapi-nested-parse::

   Helper functions to post-process a large number of individuals on a cluster
   into summary ROH data tables (saved as pandas dataframes).

   Structure: Produce a list of paths to the individual ROH files for a group.
   These are then combined into one summary dataframe (some post-processing,
   such as gap merging, can be done at this step).
   The result is saved as a single "summary" dataframe (csv).

   @ Author: Harald Ringbauer, 2019


Functions
---------

.. autoapisummary::

   hapsburg.PackagesSupport.pp_individual_roh_csvs.give_iid_paths
   hapsburg.PackagesSupport.pp_individual_roh_csvs.create_combined_ROH_df
   hapsburg.PackagesSupport.pp_individual_roh_csvs.combine_ROH_df
   hapsburg.PackagesSupport.pp_individual_roh_csvs.merge_called_blocks
   hapsburg.PackagesSupport.pp_individual_roh_csvs.merge_called_blocks_custom
   hapsburg.PackagesSupport.pp_individual_roh_csvs.post_process_roh_df
   hapsburg.PackagesSupport.pp_individual_roh_csvs.individual_roh_statistic
   hapsburg.PackagesSupport.pp_individual_roh_csvs.pp_individual_roh
   hapsburg.PackagesSupport.pp_individual_roh_csvs.pp_X_roh
   hapsburg.PackagesSupport.pp_individual_roh_csvs.extract_df_geo
   hapsburg.PackagesSupport.pp_individual_roh_csvs.extract_df_age
   hapsburg.PackagesSupport.pp_individual_roh_csvs.give_df_clsts
   hapsburg.PackagesSupport.pp_individual_roh_csvs.extract_sub_df_geo_kw
   hapsburg.PackagesSupport.pp_individual_roh_csvs.get_post_processed_df
   hapsburg.PackagesSupport.pp_individual_roh_csvs.give_pair_iids
   hapsburg.PackagesSupport.pp_individual_roh_csvs.calc_average_roh


Module Contents
---------------

.. py:function:: give_iid_paths(iids, base_folder='./Empirical/HO/', suffix='_roh_full.csv')

   Return list of the paths to each ROH.csv.
   Combines base_folder, iids and suffix.


.. py:function:: create_combined_ROH_df(paths, iids, pops, min_cm=[4, 8, 12], snp_cm=100, gap=0.5, min_len1=2, min_len2=4, output=True, sort=True)

   Create ROH summary dataframe.

   paths: List of .csv paths to load the data from
   snp_cm: Minimum SNP density per cM
   min_cm: Minimum block lengths [in cM] used in post-processing
   savepath: If given, where to save the summary .csv to
   gap: Maximum gap to merge [in cM]
   min_len1: Minimum length of both blocks for merge [in cM]
   min_len2: Minimum length of longer block to merge [in cM]


.. py:function:: combine_ROH_df(df_rohs, iids=[], pops=[], min_cm=[4, 8, 12], snp_cm=100, gap=0.5, min_len1=2, min_len2=4, output=True, sort=True)

   Take a list of ROH dataframes and create a single summary dataframe
   that is also post-processed. Return the single combined dataframe.
   If iids and pops are given, add these to the dataframe.
   Wrapped by create_combined_ROH_df, which handles the paths and loading.

   df_rohs: List of individual dataframes
   iids: IIDs of the individuals (filled in column)
   pops: Populations of the individuals (filled in column)
   gap, min_len1 and min_len2 are in cM (!!)


.. py:function:: merge_called_blocks(df, max_gap=0, min_len1=0.02, min_len2=0.04, output=False)

   Merge blocks in dataframe df and return the merged dataframe.
   Gap is given in Morgan.


.. py:function:: merge_called_blocks_custom(df, max_gap=0.005, min_len1=0.02, min_len2=0.04, roh_min_l_final=0.05, output=False)

   Merge blocks in dataframe df and return the merged dataframe.
   Gap is given in Morgan.


.. py:function:: post_process_roh_df(df, min_cm=4, snp_cm=60, output=True)

   Post-process ROH dataframe. Filter to rows that are okay.

   min_cm: Minimum length in centiMorgan
   snp_cm: How many SNPs per centiMorgan


.. py:function:: individual_roh_statistic(df, output=True)

   Return summary statistics of the ROH dataframe df.


.. py:function:: pp_individual_roh(iids, meta_path='./Data/ReichLabEigenstrat/Raw/meta.csv', base_folder='./Empirical/Eigenstrat/Reichall/', suffix='_roh_full.csv', save_path='', min_cm=[4, 8, 12], snp_cm=50, gap=0.5, min_len1=2.0, min_len2=4.0, output=True, meta_info=True)

   Post-process individual ROH .csv files. Combine them into one summary
   ROH.csv, saved in save_path. Takes the individuals in iids, creates the
   paths and runs the combining.

   iids: List of target individuals
   base_folder: Folder where to find the individual result .csvs
   min_cm: Minimum post-processed length of ROH blocks. Array (to allow multiple values)
   snp_cm: Minimum number of SNPs per cM
   gap: Maximum length of gaps to merge
   output: Whether to plot output per individual
   meta_info: Whether to merge in meta information from the original meta file
   save_path: If given, save resulting dataframe there
   min_len1: Minimum length of shorter block to merge [cM]
   min_len2: Minimum length of longer block to merge [cM]


.. py:function:: pp_X_roh(iids=[], base_folder='./Empirical/Eigenstrat/Reichall/', folder_ch='chrX/', suffix='roh.csv', meta_path='', meta_sep=',', clst_col='clst', iid_col='iid', save_path='', min_cm=[4, 8, 12, 20], snp_cm=50, gap=0.5, min_len1=2.0, min_len2=4.0, output=True, sort=False)

   Post-process pairs of X chromosomes.
   Return dataframe of X chromosome IBDs.

   iids: List of pairs of X chromosomes
   Meta data from meta_path is used to set clusters (only if meta_path is given).
   For the other parameters see pp_individual_roh.


.. py:function:: extract_df_geo(df, lat0, lat1, lon0, lon1)

   Extract sub dataframe from df based on coordinates.

   lat0, lat1: Min and max latitude. Equivalently for lon0, lon1


.. py:function:: extract_df_age(df, age0, age1=1000000.0)

   Extract sub dataframe based on age.

   df: Input dataframe
   age0, age1: Min and max age


.. py:function:: give_df_clsts(df, search=[], col='pop')

   Return the sub dataframe of df where col matches one of the
   search strings (list of strings).


.. py:function:: extract_sub_df_geo_kw(df, lat0, lat1, lon0, lon1, keywords=[], output=True)

   Extract sub dataframe from df based on coordinates AND on keywords.

   lat0, lat1: Min and max latitude. Equivalently for lon0, lon1


.. py:function:: get_post_processed_df(iid, base_path='./PATH/', suffix='_roh_full.csv', snp_cm=50, gap=0.5, min_len1=2, min_len2=4, output=False)

   Return fully post-processed dataframe from a standard raw ROH/IBD dataframe.
   Lengths are given in cM.


.. py:function:: give_pair_iids(meta_path='./PATH.tsv', sep_meta='\t', col_sex='sex', sex='M', col_iid='iid', n_cov_snp=400000.0)

   Get list of all pairs of IIDs from the meta file.

   sex: Value in sex column
   n_cov_snp: Minimum number of covered SNPs


.. py:function:: calc_average_roh(df, cms=[4, 8, 12, 20], col_pop='pop', new_pop='Average')

   Calculate the average ROH in df for the columns given by cms [list].
   Return new dataframe.
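
The gap-merging rule documented above (merge two neighbouring blocks when the
gap between them is small, both blocks reach ``min_len1`` and the longer one
reaches ``min_len2``) can be illustrated with a minimal sketch. This is not the
module's implementation: the function name ``merge_blocks_sketch`` and the
column names ``ch``, ``StartM``, ``EndM`` and ``lengthM`` are assumptions made
for illustration, and the choice of strict versus non-strict comparisons is a
guess. All lengths are in Morgan, as in ``merge_called_blocks``.

.. code-block:: python

   import pandas as pd


   def merge_blocks_sketch(df, max_gap=0.005, min_len1=0.02, min_len2=0.04):
       """Hypothetical illustration of gap merging (all lengths in Morgan).

       Assumes columns 'ch', 'StartM' and 'EndM'; these names are not
       confirmed by the documentation above.
       """
       if len(df) == 0:
           return df.copy()

       # Walk through the blocks in genomic order, chromosome by chromosome.
       records = df.sort_values(["ch", "StartM"]).to_dict("records")
       merged = [dict(records[0])]

       for row in records[1:]:
           last = merged[-1]
           len_last = last["EndM"] - last["StartM"]
           len_row = row["EndM"] - row["StartM"]

           same_ch = row["ch"] == last["ch"]
           small_gap = (row["StartM"] - last["EndM"]) < max_gap
           long_enough = (min(len_last, len_row) >= min_len1
                          and max(len_last, len_row) >= min_len2)

           if same_ch and small_gap and long_enough:
               # Extend the previous block so that it covers the gap and this block.
               last["EndM"] = max(last["EndM"], row["EndM"])
           else:
               merged.append(dict(row))

       out = pd.DataFrame(merged)
       out["lengthM"] = out["EndM"] - out["StartM"]  # recompute block lengths
       return out


   # Toy input: two nearby blocks on chromosome 1, one distant short block,
   # and one block on chromosome 2.
   blocks = pd.DataFrame({
       "ch": [1, 1, 1, 2],
       "StartM": [0.10, 0.152, 0.30, 0.10],
       "EndM": [0.15, 0.20, 0.31, 0.15],
   })
   merged_blocks = merge_blocks_sketch(blocks)

In this toy input only the first two blocks on chromosome 1 are merged, so
``merged_blocks`` has three rows. A length filter such as ``min_cm`` would then
be applied after converting ``lengthM`` from Morgan to cM (multiplying by 100).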