Data preparation

The first part of this section describes the input data required to run JASS and their format. The second part describes a procedure for generating these data. The final part describes an optional preparation step: an imputation tool compatible with the JASS input format.

JASS input data

JASS data, from which all statistics can be computed, are stored in an HDF5 file. This file can be created with the create-inittable command, which requires the following input files:

GWAS description

This file must be tab-separated and contain the following columns:

Consortium  Outcome  FullName           Type           Reference            ReferenceLink     dataLink     internalDataLink
GIANT       HIP      Hip Circumference  Anthropometry  Shungin et al. 2015  url to reference  url to data  local path to data

The Consortium and Outcome values must correspond to the names of the summary statistic files and to the covariance matrix columns. The last four columns can be left blank if you do not intend to run JASS on a server.
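As an illustration, a description file with the columns listed above could be produced with Python's standard csv module. This is only a minimal sketch: the function name description_to_tsv is hypothetical, the row reuses the example values given above, and the last four columns are left blank, as is allowed when JASS is not run on a server.

```python
import csv
import io

# Column names required in the GWAS description file (taken from the text above).
COLUMNS = [
    "Consortium", "Outcome", "FullName", "Type",
    "Reference", "ReferenceLink", "dataLink", "internalDataLink",
]


def description_to_tsv(rows):
    """Serialize a list of dicts to tab-separated text with the expected header.

    Keys missing from a row are written as empty fields.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Example row from the table above; the last four fields are left blank.
example = [{
    "Consortium": "GIANT",
    "Outcome": "HIP",
    "FullName": "Hip Circumference",
    "Type": "Anthropometry",
}]
tsv_text = description_to_tsv(example)
print(tsv_text)
```

The resulting text can then be saved to the path given to --description-file-path.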

GWAS results files

GWAS results are provided as tab-separated tabular files, one per chromosome, all stored in the same folder. Every file must have the same header, with the following columns:

rsID       pos    A0  A1  Z
rs6548219  30762  A   G   -1.133

File names MUST follow this pattern: "z_{CONSORTIUM}_{TRAIT}_chr{chromosome number}.txt". The consortium and trait names must be in upper case and must NOT contain an underscore (_).
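The naming rule can be checked before running JASS. The sketch below is not part of JASS: the function name parse_gwas_filename is hypothetical, and it assumes that the chromosome is written with digits only.

```python
import re

# Pattern from the text: z_{CONSORTIUM}_{TRAIT}_chr{chromosome number}.txt
# Assumption: the chromosome number consists of digits only.
FILENAME_RE = re.compile(r"^z_([^_]+)_([^_]+)_chr(\d+)\.txt$")


def parse_gwas_filename(filename):
    """Return (consortium, trait, chromosome) or raise ValueError."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not match z_CONSORTIUM_TRAIT_chrN.txt")
    consortium, trait, chromosome = match.groups()
    # Consortium and trait must be upper case; underscores are already
    # excluded by the pattern itself.
    if consortium != consortium.upper() or trait != trait.upper():
        raise ValueError(f"{filename!r}: consortium and trait must be upper case")
    return consortium, trait, int(chromosome)


print(parse_gwas_filename("z_GIANT_HIP_chr1.txt"))
```

A name such as z_Giant_HIP_chr1.txt, or one with an extra underscore in the trait, raises a ValueError.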

Covariance file (OPTIONAL)

The covariance file contains the covariance between traits under the null hypothesis (H0). It is a tab-separated tabular file.

We recommend computing this covariance matrix with LD-score regression. However, this step can be tedious; if no covariance file is provided by the user, a matrix is inferred from low-signal Z-scores.

The trait names (row and column names of the matrix) must correspond to the summary statistic file names: z_{CONSORTIUM}_{TRAIT}. The example subset below illustrates this format:

PHE                C4D_CHD  CARDIOGRAM_CHD  DIAGRAM_T2D  GABRIEL_ASTHMA  GEFOS_BMD-FOREARM  GEFOS_BMD-NECK
C4D_CHD            1.0593   0.0351          0.0548       0.085           -0.0061
CARDIOGRAM_CHD     0.0351   1.0256          0.0631       0.025           -0.0002
DIAGRAM_T2D        0.0548   0.0631          1.0136       0.0382          0.0048
GABRIEL_ASTHMA     0.085    0.025           0.0382       1.0134          -0.0104
GEFOS_BMD-FOREARM  -0.0061  -0.0002         0.0048       -0.0104         1.0123
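Before passing a covariance file to JASS, it can be useful to verify that the matrix is square, symmetric and consistently labelled. The following sketch uses only the Python standard library; the function names are hypothetical, the check that rows and columns appear in the same order is an assumption rather than a documented JASS requirement, and the data are the first three traits of the example above.

```python
import csv
import io


def read_covariance(text):
    """Parse a tab-separated covariance table into (row_labels, col_labels, matrix)."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    header, body = rows[0], rows[1:]
    col_labels = header[1:]  # the first header cell is the corner label (PHE)
    row_labels = [row[0] for row in body]
    matrix = [[float(value) for value in row[1:]] for row in body]
    return row_labels, col_labels, matrix


def check_covariance(row_labels, col_labels, matrix, tol=1e-8):
    """Raise ValueError unless the matrix is square, symmetric and consistently labelled."""
    if row_labels != col_labels:
        raise ValueError("row and column labels differ")
    size = len(row_labels)
    if any(len(row) != size for row in matrix):
        raise ValueError("matrix is not square")
    for i in range(size):
        for j in range(i + 1, size):
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                raise ValueError(f"asymmetric entry for {row_labels[i]} / {row_labels[j]}")


# First three traits of the example above, written with explicit tab characters.
EXAMPLE = (
    "PHE\tC4D_CHD\tCARDIOGRAM_CHD\tDIAGRAM_T2D\n"
    "C4D_CHD\t1.0593\t0.0351\t0.0548\n"
    "CARDIOGRAM_CHD\t0.0351\t1.0256\t0.0631\n"
    "DIAGRAM_T2D\t0.0548\t0.0631\t1.0136\n"
)
row_labels, col_labels, matrix = read_covariance(EXAMPLE)
check_covariance(row_labels, col_labels, matrix)  # no exception: the subset is valid
```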

Region file

A file of approximately independent LD regions in BED format. For European ancestry and GRCh37/hg19, we suggest using the regions defined by [BP15], which are already available in the data folder of the package.

For GRCh38, we computed these regions for the five superpopulations available in 1000G using bigsnpr [Pri21]. The corresponding files are stored at https://gitlab.pasteur.fr/statistical-genetics/jass_suite_pipeline/-/tree/pipeline_ancestry/input_files.

chr   start  stop
chr1  10583  1892607
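A region file in this layout can be sanity-checked with a few lines of Python. This sketch assumes a tab-separated file with the header line shown above; the function name read_regions is hypothetical, and the only rule enforced is that each region starts before it stops.

```python
import csv
import io


def read_regions(text):
    """Parse a tab-separated region table into a list of (chr, start, stop) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    regions = []
    for record in reader:
        start, stop = int(record["start"]), int(record["stop"])
        if start >= stop:
            raise ValueError(f"region on {record['chr']} has start >= stop")
        regions.append((record["chr"], start, stop))
    return regions


# The example row shown above, written with explicit tab characters.
EXAMPLE = "chr\tstart\tstop\nchr1\t10583\t1892607\n"
regions = read_regions(EXAMPLE)
print(regions)
```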

To infer approximately independent LD regions from your own panel, we recommend using bigsnpr (https://privefl.github.io/bigsnpr/). See [Pri21] for details.

Init table generation

The init table is generated by JASS from the previously mentioned input files. The command-line example below illustrates the syntax:

# configure
export JASS_DATA_DIR=/tmp/JASSDATA
# import GWAS data into JASS
jass create-inittable --input-data-path "GWAS/*.txt" --init-covariance-path "COV.txt" --regions-map-path "regions.txt" --description-file-path "description.txt" --init-table-path "inittable.hdf5"

How to generate input data for JASS

Option 1: Nextflow pipeline

Preprocessing steps for JASS (data harmonisation and imputation) have been gathered in one Nextflow pipeline: JASS Pipeline Suite. While this option might have stronger installation requirements, it ensures reproducibility by leveraging Docker containers (fixed versions of JASS and accompanying packages). It will also be much more efficient if you have a large number of heterogeneous datasets to handle and a computing cluster available.

Option 2: prepare input data manually

To standardize the format of the input GWAS datasets, you can use the JASS Pre-processing package. The JASS Pre-processing documentation details the use of this tool.

We think that the best way to compute the covariance under H0 from summary statistics is LD-score regression (https://github.com/bulik/ldsc/wiki/Heritability-and-Genetic-Correlation). From the LD-score genetic correlation output, use the intercepts: the heritability intercept of trait i gives the variance of trait i, and the genetic covariance intercept gives the covariance between the two traits:

Heritability of phenotype 1
---------------------------
Total Observed scale h2: 0.0674 (0.0091)
Lambda GC: 1.105
Mean Chi^2: 1.147
Intercept: 1.0234 (0.0098)
Ratio: 0.159 (0.067)

Heritability of phenotype 2/2
-----------------------------
Total Observed scale h2: 0.1412 (0.0083)
Lambda GC: 1.0466
Mean Chi^2: 1.1987
Intercept: 0.7664 (0.0107)
Ratio < 0 (usually indicates GC correction).

Genetic Covariance
------------------
Total Observed scale gencov: 0.0089 (0.0044)
Mean z1*z2: 0.003
Intercept: -0.0151 (0.0062)
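For the excerpt above, the corresponding 2x2 block of the covariance matrix can be assembled from the three intercepts. The sketch below is an illustration only: the function name covariance_from_ldsc_log is hypothetical, and it assumes that the log contains exactly three "Intercept:" lines in the order shown (trait 1 heritability, trait 2 heritability, genetic covariance).

```python
import re

LDSC_LOG = """\
Heritability of phenotype 1
---------------------------
Total Observed scale h2: 0.0674 (0.0091)
Lambda GC: 1.105
Mean Chi^2: 1.147
Intercept: 1.0234 (0.0098)
Ratio: 0.159 (0.067)

Heritability of phenotype 2/2
-----------------------------
Total Observed scale h2: 0.1412 (0.0083)
Lambda GC: 1.0466
Mean Chi^2: 1.1987
Intercept: 0.7664 (0.0107)
Ratio < 0 (usually indicates GC correction).

Genetic Covariance
------------------
Total Observed scale gencov: 0.0089 (0.0044)
Mean z1*z2: 0.003
Intercept: -0.0151 (0.0062)
"""


def covariance_from_ldsc_log(log_text):
    """Build the 2x2 covariance block under H0 from the LDSC intercepts.

    Assumes three "Intercept:" lines in this order: heritability of trait 1,
    heritability of trait 2, genetic covariance.
    """
    values = re.findall(r"^Intercept:\s*(-?\d+(?:\.\d+)?)", log_text, flags=re.MULTILINE)
    if len(values) != 3:
        raise ValueError(f"expected 3 intercepts, found {len(values)}")
    var1, var2, cov12 = (float(value) for value in values)
    # Variances on the diagonal, covariance off the diagonal.
    return [[var1, cov12], [cov12, var2]]


cov_block = covariance_from_ldsc_log(LDSC_LOG)
print(cov_block)  # [[1.0234, -0.0151], [-0.0151, 0.7664]]
```

Repeating this for every pair of traits fills the full matrix one 2x2 block at a time.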

Data imputation (optional)

Imputation can be performed using RAISS. See [JSPA19] for details of the method.

Creation of the JASS inittable

Once GWAS summary statistics are harmonized, they are integrated into one file using the jass command line (see details in command line usage):

jass create-inittable --input-data-path "harmonized_GWAS_files/*.txt" --init-covariance-path $path1/Covariance_matrix_H0.csv --regions-map-path $path2/Region_file.bed --description-file-path $path3/Data_summary.csv --init-table-path $path4/init_table_EUR_not_imputed.hdf5
[BP15]

Tomaz Berisa and Joseph K. Pickrell. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics, 32(2):283–285, 2015. doi:10.1093/bioinformatics/btv546.

[JLM+21]

Hanna Julienne, Vincent Laville, Zachary R McCaw, Zihuai He, Vincent Guillemot, Carla Lasry, Andrey Ziyatdinov, Cyril Nerin, Amaury Vaysse, Pierre Lechat, and others. Multitrait GWAS to connect disease variants and biological mechanisms. PLoS Genetics, 17(8):e1009713, 2021.

[JLG+20]

Hanna Julienne, Pierre Lechat, Vincent Guillemot, Carla Lasry, Chunzi Yao, Robinson Araud, Vincent Laville, Bjarni Vilhjalmsson, Hervé Ménager, and Hugues Aschard. JASS: command line and web interface for the joint analysis of GWAS results. NAR Genomics and Bioinformatics, 2(1):lqaa003, 2020.

[JSPA19]

Hanna Julienne, Huwenbo Shi, Bogdan Pasaniuc, and Hugues Aschard. RAISS: robust and accurate imputation from summary statistics. Bioinformatics, 35(22):4837–4839, 2019. doi:10.1093/bioinformatics/btz466.

[Pri21]

Florian Privé. Optimal linkage disequilibrium splitting. Bioinformatics, 38(1):255–256, 2021. doi:10.1093/bioinformatics/btab519.

[SMenagerB+23]

Yuka Suzuki, Hervé Ménager, Bryan Brancotte, Raphaël Vernet, Cyril Nerin, Christophe Boetto, Antoine Auvergne, Christophe Linhard, Rachel Torchet, Pierre Lechat, and others. Trait selection strategy in multi-trait GWAS: boosting SNPs discoverability. bioRxiv, 2023.