hapsburg.PackagesSupport.pp_individual_roh_csvs
===============================================

.. py:module:: hapsburg.PackagesSupport.pp_individual_roh_csvs

.. autoapi-nested-parse::

   Helper functions to post-process the results of a large number of individuals
   (run on a cluster) into summary ROH data tables (pandas dataframes).

   Structure: produce a list of paths to the individual ROH files of a group.
   These files are then combined into one summary dataframe (optionally with
   post-processing such as gap merging), which is saved as a single "summary" .csv.

   Author: Harald Ringbauer, 2019


Functions
---------

.. autoapisummary::

   hapsburg.PackagesSupport.pp_individual_roh_csvs.give_iid_paths
   hapsburg.PackagesSupport.pp_individual_roh_csvs.create_combined_ROH_df
   hapsburg.PackagesSupport.pp_individual_roh_csvs.combine_ROH_df
   hapsburg.PackagesSupport.pp_individual_roh_csvs.merge_called_blocks
   hapsburg.PackagesSupport.pp_individual_roh_csvs.merge_called_blocks_custom
   hapsburg.PackagesSupport.pp_individual_roh_csvs.post_process_roh_df
   hapsburg.PackagesSupport.pp_individual_roh_csvs.individual_roh_statistic
   hapsburg.PackagesSupport.pp_individual_roh_csvs.pp_individual_roh
   hapsburg.PackagesSupport.pp_individual_roh_csvs.pp_X_roh
   hapsburg.PackagesSupport.pp_individual_roh_csvs.extract_df_geo
   hapsburg.PackagesSupport.pp_individual_roh_csvs.extract_df_age
   hapsburg.PackagesSupport.pp_individual_roh_csvs.give_df_clsts
   hapsburg.PackagesSupport.pp_individual_roh_csvs.extract_sub_df_geo_kw
   hapsburg.PackagesSupport.pp_individual_roh_csvs.get_post_processed_df
   hapsburg.PackagesSupport.pp_individual_roh_csvs.give_pair_iids
   hapsburg.PackagesSupport.pp_individual_roh_csvs.calc_average_roh


Module Contents
---------------

.. py:function:: give_iid_paths(iids, base_folder='./Empirical/HO/', suffix='_roh_full.csv')

   Return a list of paths to the ROH .csv file of each individual.
   Each path combines base_folder, the iid and suffix.
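
   A minimal sketch of how such paths could be assembled, assuming each file name is
   the iid followed by ``suffix`` inside ``base_folder``. ``build_paths`` and the
   sample IIDs are hypothetical stand-ins, not the library code:

   .. code-block:: python

      import os

      def build_paths(iids, base_folder="./Empirical/HO/", suffix="_roh_full.csv"):
          """Hypothetical stand-in: join folder, iid and suffix for each individual."""
          return [os.path.join(base_folder, str(iid) + suffix) for iid in iids]

      # Made-up individual IDs, for illustration only
      paths = build_paths(["I0001", "I0002"])

   The resulting list has one entry per iid, in the same order as ``iids``.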


.. py:function:: create_combined_ROH_df(paths, iids, pops, min_cm=[4, 8, 12], snp_cm=100, gap=0.5, min_len1=2, min_len2=4, output=True, sort=True)

   Create a summary ROH dataframe from individual ROH .csv files.
   paths: List of .csv paths to load the data from
   iids: IIDs of the individuals
   pops: Populations of the individuals
   min_cm: Minimum ROH lengths [in cM] used in post-processing (list)
   snp_cm: Minimum SNP density [SNPs per cM]
   gap: Maximum gap length to merge [in cM]
   min_len1: Minimum length of both blocks to merge [in cM]
   min_len2: Minimum length of the longer block to merge [in cM]


.. py:function:: combine_ROH_df(df_rohs, iids=[], pops=[], min_cm=[4, 8, 12], snp_cm=100, gap=0.5, min_len1=2, min_len2=4, output=True, sort=True)

   Take a list of ROH dataframes and create a single
   summary dataframe that is also post-processed.
   Return the single combined dataframe.
   If iids and pops are given, add these to the dataframe.
   This function is wrapped by create_combined_ROH_df,
   which handles the paths and the loading.
   df_rohs: List of individual dataframes
   iids: IIDs of the individuals (filled in column)
   pops: Populations of the individuals (filled in column)
   gap, min_len1 and min_len2 are given in cM.


.. py:function:: merge_called_blocks(df, max_gap=0, min_len1=0.02, min_len2=0.04, output=False)

   Merge blocks in dataframe df and return the merged dataframe.
   max_gap is given in Morgan.
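
   The merging step can be sketched in plain Python. The rule below is an assumption
   based on the parameter names: two consecutive blocks on the same chromosome are
   merged when the gap between them is below ``max_gap``, the shorter block is at
   least ``min_len1`` long and the longer block is at least ``min_len2`` long (all in
   Morgan). ``merge_blocks`` and the tuple layout are hypothetical, not the library
   implementation:

   .. code-block:: python

      def merge_blocks(blocks, max_gap=0.0, min_len1=0.02, min_len2=0.04):
          """Merge neighbouring blocks given as (chromosome, start, end) in Morgan."""
          if not blocks:
              return []
          blocks = sorted(blocks)  # order by chromosome, then start position
          merged = [blocks[0]]
          for ch, start, end in blocks[1:]:
              ch_c, start_c, end_c = merged[-1]
              len_c, len_n = end_c - start_c, end - start
              mergeable = (
                  ch == ch_c
                  and start - end_c < max_gap
                  and min(len_c, len_n) >= min_len1
                  and max(len_c, len_n) >= min_len2
              )
              if mergeable:
                  # Extend the current block to the end of the next one
                  merged[-1] = (ch_c, start_c, end)
              else:
                  merged.append((ch, start, end))
          return merged

      # A gap threshold of 0.5 cM corresponds to 0.5 / 100 Morgan
      blocks = [(1, 0.10, 0.15), (1, 0.152, 0.20), (2, 0.10, 0.13)]
      result = merge_blocks(blocks, max_gap=0.5 / 100)

   In this example the two blocks on chromosome 1 are joined into one, while the block
   on chromosome 2 is left unchanged.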


.. py:function:: merge_called_blocks_custom(df, max_gap=0.005, min_len1=0.02, min_len2=0.04, roh_min_l_final=0.05, output=False)

   Merge blocks in dataframe df and return the merged dataframe.
   max_gap is given in Morgan.


.. py:function:: post_process_roh_df(df, min_cm=4, snp_cm=60, output=True)

   Post-process the ROH dataframe df: keep only the rows that pass the length and SNP density filters.
   min_cm: Minimum length in centiMorgan
   snp_cm: Minimum number of SNPs per centiMorgan
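
   A dependency-free sketch of such a filter, with rows modelled as dicts instead of
   dataframe rows. The keys ``length_cm`` and ``n_snps``, the helper ``filter_roh``
   and the strict comparisons are assumptions made for illustration:

   .. code-block:: python

      def filter_roh(rows, min_cm=4, snp_cm=60):
          """Keep ROH rows that are long enough and dense enough in SNPs.

          Assumes every row has a positive length in cM.
          """
          kept = []
          for row in rows:
              density = row["n_snps"] / row["length_cm"]  # SNPs per cM
              if row["length_cm"] > min_cm and density > snp_cm:
                  kept.append({**row, "snp_density": density})
          return kept

      rows = [
          {"length_cm": 6.0, "n_snps": 600},  # 100 SNPs per cM and long enough
          {"length_cm": 3.0, "n_snps": 600},  # shorter than min_cm
          {"length_cm": 8.0, "n_snps": 240},  # 30 SNPs per cM, too sparse
      ]
      kept = filter_roh(rows)

   Only the first row passes both filters.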


.. py:function:: individual_roh_statistic(df, output=True)

   Compute summary statistics of the ROH dataframe df.


.. py:function:: pp_individual_roh(iids, meta_path='./Data/ReichLabEigenstrat/Raw/meta.csv', base_folder='./Empirical/Eigenstrat/Reichall/', suffix='_roh_full.csv', save_path='', min_cm=[4, 8, 12], snp_cm=50, gap=0.5, min_len1=2.0, min_len2=4.0, output=True, meta_info=True)

   Post-process individual ROH .csv files. Combine them into one summary ROH .csv, saved in save_path.
   Create the paths from the individual iids and run the combining.
   iids: List of target individuals
   base_folder: Folder where to find the individual result .csvs
   min_cm: Minimum post-processed length of ROH blocks. Array (to allow multiple values)
   snp_cm: Minimum number of SNPs per cM
   gap: Maximum length of gaps to merge [cM]
   output: Whether to plot output per Individual.
   meta_info: Whether to merge in meta information from the original meta file
   save_path: If given, save the resulting dataframe there
   min_len1: Minimum length of the shorter block to merge [cM]
   min_len2: Minimum length of the longer block to merge [cM]


.. py:function:: pp_X_roh(iids=[], base_folder='./Empirical/Eigenstrat/Reichall/', folder_ch='chrX/', suffix='roh.csv', meta_path='', meta_sep=',', clst_col='clst', iid_col='iid', save_path='', min_cm=[4, 8, 12, 20], snp_cm=50, gap=0.5, min_len1=2.0, min_len2=4.0, output=True, sort=False)

   Post-process pairs of X chromosomes. Return a dataframe of X chromosome IBDs.
   iids: List of pairs of X chromosomes.
   If meta_path is given, use the meta data from meta_path to set clusters.
   For the other parameters, see pp_individual_roh.


.. py:function:: extract_df_geo(df, lat0, lat1, lon0, lon1)

   Extract a sub-dataframe from dataframe df based on coordinates.
   lat0, lat1: Minimum and maximum latitude; lon0, lon1: equivalently for longitude.
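
   The bounding-box selection can be sketched as follows, again with rows modelled as
   dicts. The keys ``lat`` and ``lon``, the helper ``rows_in_box`` and the inclusive
   bounds are assumptions, not taken from the library:

   .. code-block:: python

      def rows_in_box(rows, lat0, lat1, lon0, lon1):
          """Return rows whose coordinates fall inside the box (bounds inclusive here)."""
          return [
              r for r in rows
              if lat0 <= r["lat"] <= lat1 and lon0 <= r["lon"] <= lon1
          ]

      # Made-up samples, for illustration only
      samples = [
          {"iid": "A", "lat": 47.0, "lon": 11.0},
          {"iid": "B", "lat": 60.0, "lon": 25.0},
      ]
      selected = rows_in_box(samples, lat0=40, lat1=50, lon0=5, lon1=20)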


.. py:function:: extract_df_age(df, age0, age1=1000000.0)

   Extract a sub-dataframe based on age.
   df: Input dataframe; age0, age1: minimum and maximum age


.. py:function:: give_df_clsts(df, search=[], col='pop')

   Return the sub-dataframe of df
   where column col matches one of the search strings (search: list of strings).


.. py:function:: extract_sub_df_geo_kw(df, lat0, lat1, lon0, lon1, keywords=[], output=True)

   Extract a sub-dataframe from dataframe df based on coordinates
   AND on keywords.
   lat0, lat1: Minimum and maximum latitude; lon0, lon1: equivalently for longitude.


.. py:function:: get_post_processed_df(iid, base_path='./PATH/', suffix='_roh_full.csv', snp_cm=50, gap=0.5, min_len1=2, min_len2=4, output=False)

   Return fully post-processed Dataframe from standard raw ROH/IBD dataframe.
   Lengths are given in cM


.. py:function:: give_pair_iids(meta_path='./PATH.tsv', sep_meta='\t', col_sex='sex', sex='M', col_iid='iid', n_cov_snp=400000.0)

   Get a list of all pairs of IIDs from the meta file.
   sex: Value in the sex column
   n_cov_snp: Minimum number of covered SNPs
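
   One way to picture the pairing step, as a sketch: filter the meta table by sex and
   coverage, then enumerate all unordered pairs of the remaining IIDs. The helper
   ``pair_iids``, the coverage column name ``n_cov_snp`` and the strict coverage
   comparison are assumptions; the sketch reads from a file-like object rather than a
   path so that it is self-contained:

   .. code-block:: python

      import csv
      import io
      from itertools import combinations

      def pair_iids(meta_file, sep="\t", col_sex="sex", sex="M", col_iid="iid",
                    col_cov="n_cov_snp", n_cov_snp=400000.0):
          """Return all unordered pairs of IIDs passing the sex and coverage filters."""
          reader = csv.DictReader(meta_file, delimiter=sep)
          iids = [
              row[col_iid] for row in reader
              if row[col_sex] == sex and float(row[col_cov]) > n_cov_snp
          ]
          return list(combinations(iids, 2))

      # Made-up meta table, for illustration only
      meta = io.StringIO(
          "iid\tsex\tn_cov_snp\n"
          "A\tM\t500000\n"
          "B\tF\t900000\n"
          "C\tM\t450000\n"
          "D\tM\t100000\n"
          "E\tM\t800000\n"
      )
      pairs = pair_iids(meta)

   With n individuals passing the filters, this yields n * (n - 1) / 2 pairs, so the
   list grows quadratically with the number of individuals.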


.. py:function:: calc_average_roh(df, cms=[4, 8, 12, 20], col_pop='pop', new_pop='Average')

   Calculate the average ROH in df,
   in the columns given by cms [list].
   Return a new dataframe.