jass_preprocessing package

Submodules

jass_preprocessing.compute_score module

jass_preprocessing.compute_score.compute_sample_size(mgwas, diagnostic_folder, trait, perSS=0.7)[source]
jass_preprocessing.compute_score.compute_z_score(mgwas)[source]

Compute the z-score value and sign, and add the corresponding column to the mgwas dataframe
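
For illustration, a signed z-score can be recovered from a two-sided p-value and the sign of the effect. The sketch below is not the package's code: the helper name `signed_z_score`, its arguments, and the use of the standard library's `NormalDist` are assumptions made for this example.

```python
from statistics import NormalDist


def signed_z_score(p_value, effect):
    """Return a signed z-score from a two-sided p-value and an effect estimate.

    Hypothetical helper: not part of jass_preprocessing.
    """
    # Magnitude: upper quantile of the standard normal for a two-sided test.
    magnitude = NormalDist().inv_cdf(1.0 - p_value / 2.0)
    # Sign: taken from the effect estimate (for example a beta coefficient).
    sign = 1.0 if effect >= 0 else -1.0
    return sign * magnitude


z = signed_z_score(0.05, -0.12)
```

The p-value must lie strictly between 0 and 1 here; a p-value of exactly 0 would need separate handling.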

jass_preprocessing.dna_utils module

A few functions to compute the DNA complement

jass_preprocessing.dna_utils.dna_complement(input)[source]
jass_preprocessing.dna_utils.dna_complement_base(inputbase)[source]
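
As a rough sketch of what computing a DNA complement involves, the example below maps each base to its partner without reversing the sequence. The name `complement` and the lookup table are hypothetical and may not match how `dna_complement` handles case or ambiguous bases.

```python
# Hypothetical lookup table; characters outside ACGT/acgt are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(sequence):
    """Return the base-by-base complement of a DNA string (not reversed)."""
    return sequence.translate(_COMPLEMENT)


paired = complement("ACGT")
```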

jass_preprocessing.map_gwas module

Map GWAS

A set of functions to find GWAS files in subfolder and to map columns

jass_preprocessing.map_gwas.convert_missing_values(df)[source]

Convert all missing value strings to a standard np.nan value

Parameters:

df (pandas dataframe) – GWAS data as a dataframe

Returns:

a pandas dataframe in which all missing values are equal to np.nan
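
The idea of normalising missing-value strings can be sketched on a plain Python list rather than a dataframe. The set of markers below is an assumption; the strings actually recognised by `convert_missing_values` may differ.

```python
import math

# Assumed set of missing-value markers (illustrative only).
MISSING_MARKERS = {"", "NA", "N/A", "NaN", "nan", ".", "-"}


def normalise_missing(values):
    """Replace every recognised missing-value string with a float NaN."""
    return [
        float("nan") if str(value).strip() in MISSING_MARKERS else value
        for value in values
    ]


cleaned = normalise_missing(["0.05", "NA", ".", "1.2"])
```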

Walk the GWAS path to find the GWAS tables

Parameters:
  • GWAS_table (str) – path of the folder to explore

  • findfile (str) – name of the file to find

Returns:

a pandas dataframe with one column for the filename and one column containing the complete path to the file

jass_preprocessing.map_gwas.map_columns_position(gwas_internal_link, column_dict)[source]

Find the column positions for each specific GWAS

Parameters:
  • gwas_internal_link (str) – filename of the GWAS data (with path)

  • GWAS_labels (pd.DataFrame) – corresponding row of the information file

Returns:

a pandas Series of column positions, indexed by column name
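
A minimal sketch of the column-mapping idea, under the assumption that `column_dict` maps internal column names to the names used in the raw file header. The helper `column_positions` is hypothetical and works on a header string instead of a file path.

```python
import csv
import io


def column_positions(header_line, column_dict, delimiter="\t"):
    """Map each internal column name to its position in the raw header.

    Names missing from the header are skipped in this sketch.
    """
    header = next(csv.reader(io.StringIO(header_line), delimiter=delimiter))
    return {
        internal: header.index(raw)
        for internal, raw in column_dict.items()
        if raw in header
    }


positions = column_positions(
    "MarkerName\tAllele1\tAllele2\tP.value",
    {"snpid": "MarkerName", "pval": "P.value"},
)
```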

jass_preprocessing.map_gwas.read_gwas(gwas_internal_link, column_map, imputation_treshold=None)[source]

Read the raw GWAS data, fetch columns using the positions stored in column_map, and rename the columns according to column_map.index

Parameters:
  • gwas_internal_link (str) – filename of the GWAS data (with path)

  • column_map (pandas Series) – Series containing the position of each column in the raw data

Returns:

a pandas dataframe with missing value all equal to np.nan
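
Selecting columns by position and renaming them can be sketched as follows. This is an illustration only: `read_selected_columns` is a hypothetical name, it reads from a string rather than a file, and it uses a plain dict where the package uses a pandas Series.

```python
import csv
import io


def read_selected_columns(text, column_map, delimiter="\t"):
    """Keep the columns listed in `column_map` (name -> position) and rename them."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader)  # skip the raw header; columns are selected by position
    return [
        {name: row[position] for name, position in column_map.items()}
        for row in reader
    ]


raw = "MarkerName\tAllele1\tAllele2\tP.value\nrs1\tA\tG\t0.01\nrs2\tC\tT\t0.5\n"
rows = read_selected_columns(raw, {"snpid": 0, "pval": 3})
```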

jass_preprocessing.map_gwas.walkfs(startdir, findfile)[source]

Go through the folder and its subfolders to find the specified file

Parameters:
  • startdir (str) – path of the folder to explore

  • findfile (str) – name of the file to find
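
A recursive search of this kind can be sketched with `os.walk` from the standard library. The generator `find_file` below is a hypothetical stand-in, not the implementation of `walkfs`, and its return shape is an assumption.

```python
import os
import tempfile


def find_file(startdir, findfile):
    """Yield (filename, full_path) for each file named `findfile` under `startdir`."""
    for dirpath, _dirnames, filenames in os.walk(startdir):
        if findfile in filenames:
            yield findfile, os.path.join(dirpath, findfile)


# Small demonstration on a temporary folder tree.
with tempfile.TemporaryDirectory() as root:
    nested = os.path.join(root, "study_a", "raw")
    os.makedirs(nested)
    open(os.path.join(nested, "gwas.txt"), "w").close()
    matches = list(find_file(root, "gwas.txt"))
```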

jass_preprocessing.map_reference module

Functions to map GWAS data onto a reference panel and check allele alignment

jass_preprocessing.map_reference.compute_is_aligned(mgwas)[source]

Check if the reference panel and the GWAS data have the same reference allele. Return a boolean vector. This function should be the complement of “is_flipped”, but both functions are still computed so that unusual cases (more than two alleles, for instance) can be detected.

jass_preprocessing.map_reference.compute_is_flipped(mgwas)[source]

Check if the reference panel and the GWAS data have the same reference allele. Return a boolean vector.

Parameters:

mgwas (pandas dataframe) – GWAS study dataframe merged with the reference_panel

Returns:

merge studies,

Return type:

is_flipped (pandas dataframe)
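
The reason for computing both checks can be shown on a toy example: a SNP with a third allele is neither aligned nor flipped. The field names `ref`, `alt`, `a1` and `a2` are assumptions made for this sketch, as are the two helper functions.

```python
def is_aligned(row):
    """True when the GWAS alleles match the panel alleles in the same order."""
    return row["a1"] == row["ref"] and row["a2"] == row["alt"]


def is_flipped(row):
    """True when the GWAS alleles match the panel alleles in swapped order."""
    return row["a1"] == row["alt"] and row["a2"] == row["ref"]


snps = [
    {"snp": "rs1", "ref": "A", "alt": "G", "a1": "A", "a2": "G"},  # aligned
    {"snp": "rs2", "ref": "A", "alt": "G", "a1": "G", "a2": "A"},  # flipped
    {"snp": "rs3", "ref": "A", "alt": "G", "a1": "A", "a2": "T"},  # third allele
]
aligned = [is_aligned(snp) for snp in snps]
flipped = [is_flipped(snp) for snp in snps]
```

For rs3 both checks are False, which is the kind of case that a single check and its negation would hide.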

jass_preprocessing.map_reference.compute_snp_alignement(mgwas)[source]

Add a column to mgwas indicating whether the reference and coded alleles are flipped compared to the reference panel. If they are, the sign of the statistic must be flipped.

Parameters:

mgwas (pandas dataframe) – GWAS data merged with the reference panel
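
One way to picture the sign flip is as a multiplier applied to each statistic: +1 when aligned, -1 when flipped, and NaN when neither holds. This encoding and the helper `alignment_sign` are assumptions for illustration; the column actually added by `compute_snp_alignement` may be encoded differently.

```python
import math


def alignment_sign(aligned, flipped):
    """Return +1.0 if aligned, -1.0 if flipped, NaN if the case is ambiguous."""
    if aligned and not flipped:
        return 1.0
    if flipped and not aligned:
        return -1.0
    return float("nan")


z_scores = [2.1, -0.7, 1.3]
signs = [
    alignment_sign(True, False),
    alignment_sign(False, True),
    alignment_sign(False, False),
]
# Orient every statistic with respect to the reference panel allele.
oriented = [z * sign for z, sign in zip(z_scores, signs)]
```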

jass_preprocessing.map_reference.map_on_ref_panel(gw_df, ref_panel, index_type='rs-number')[source]

Merge the GWAS dataframe with the reference panel. Make sure that the same SNPs are in the reference panel and the GWAS.

Parameters:
  • gw_df (pandas dataframe) – GWAS study dataframe

  • ref_panel (pandas dataframe) – reference panel dataframe

Returns:

the merged studies

Return type:

merge_GWAS (pandas dataframe)
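
Keeping only the SNPs shared by both tables amounts to an inner join on the SNP identifier. The sketch below uses plain dicts keyed by rs-number in place of dataframes; `merge_on_reference` and the field names are hypothetical.

```python
def merge_on_reference(gwas, reference):
    """Keep SNPs present in both tables and combine their fields.

    Both arguments are assumed to be dicts keyed by rs-number.
    """
    shared = gwas.keys() & reference.keys()
    return {snp: {**reference[snp], **gwas[snp]} for snp in sorted(shared)}


gwas = {"rs1": {"z": 2.1}, "rs2": {"z": -0.7}, "rs9": {"z": 0.3}}
reference = {
    "rs1": {"chr": 1, "pos": 1000},
    "rs2": {"chr": 1, "pos": 2000},
    "rs5": {"chr": 2, "pos": 50},
}
merged = merge_on_reference(gwas, reference)
```

With pandas dataframes, the equivalent operation would typically be an inner merge on the index or on the SNP identifier column.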

jass_preprocessing.map_reference.read_reference(gwas_reference_panel, mask_MHC=False, minimum_MAF=None, region_to_mask=None)[source]

Helper function to name the columns correctly. Filter the reference panel by minimum allele frequency.

Parameters:
  • gwas_reference_panel (str) – path to the reference panel file

  • mask_MHC (bool) – whether the MHC region (hg19 coordinates) should be masked. Default is False

  • minimum_MAF (float) – minimum allele frequency for a SNP to be retained in the panel

  • region_to_mask (dict) – a list of additional regions to mask

  • type_of_index (str) – ‘rs-number’ or ‘positional’

Returns:

the reference_panel with the specified filter applied

Return type:

ref (pandas dataframe)
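
The two filters described above, a minimum allele frequency and a set of masked regions, can be sketched on a list of SNP records. Everything here is an assumption for illustration: the helper `filter_panel`, the record fields, the (chromosome, start, end) region format, and the region bounds in the example.

```python
def filter_panel(panel, minimum_maf=None, regions_to_mask=()):
    """Drop SNPs below `minimum_maf` or inside any masked region.

    Each region is an assumed (chromosome, start, end) tuple.
    """
    kept = []
    for snp in panel:
        if minimum_maf is not None and snp["maf"] < minimum_maf:
            continue
        in_masked_region = any(
            snp["chr"] == chrom and start <= snp["pos"] <= end
            for chrom, start, end in regions_to_mask
        )
        if in_masked_region:
            continue
        kept.append(snp)
    return kept


panel = [
    {"snp": "rs1", "chr": 1, "pos": 1000, "maf": 0.20},
    {"snp": "rs2", "chr": 1, "pos": 2000, "maf": 0.005},  # below the threshold
    {"snp": "rs3", "chr": 6, "pos": 30_000_000, "maf": 0.30},  # masked region
]
# Illustrative region bounds, not the coordinates used by the package.
kept = filter_panel(
    panel, minimum_maf=0.01, regions_to_mask=[(6, 25_000_000, 35_000_000)]
)
```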

jass_preprocessing.save_output module

jass_preprocessing.save_output.save_output(mgwas, ImpG_output_Folder, my_study)[source]

Write the preprocessed GWAS for ldscore analysis

jass_preprocessing.save_output.save_output_by_chromosome(mgwas, ImpG_output_Folder, my_study)[source]

Write the preprocessed GWAS for imputation
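
Writing one file per chromosome can be sketched as a group-by followed by a tab-separated dump. The helper `write_by_chromosome`, the file name pattern, and the column layout are assumptions; the files actually produced by `save_output_by_chromosome` may be named and formatted differently.

```python
import csv
import os
import tempfile
from collections import defaultdict


def write_by_chromosome(rows, output_folder, study):
    """Write one tab-separated file per chromosome and return the paths."""
    by_chromosome = defaultdict(list)
    for row in rows:
        by_chromosome[row["chr"]].append(row)
    paths = []
    for chrom, chrom_rows in sorted(by_chromosome.items()):
        # Hypothetical file name pattern.
        path = os.path.join(output_folder, f"{study}_chr{chrom}.txt")
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(
                handle, fieldnames=list(chrom_rows[0]), delimiter="\t"
            )
            writer.writeheader()
            writer.writerows(chrom_rows)
        paths.append(path)
    return paths


rows = [
    {"snp": "rs1", "chr": 1, "z": 2.1},
    {"snp": "rs2", "chr": 2, "z": -0.7},
    {"snp": "rs3", "chr": 1, "z": 0.4},
]
with tempfile.TemporaryDirectory() as folder:
    written = [
        os.path.basename(path)
        for path in write_by_chromosome(rows, folder, "my_study")
    ]
```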

Module contents

map_gwas

Map GWAS

dna_utils

A few functions to compute the DNA complement

map_reference

Functions to map GWAS data onto a reference panel and check allele alignment

compute_score

save_output