Prior to data submission, please submit a completed and initialed SGFS Data Submission Form to Julie Sapp. Please ensure that your data does not include any participant identifiers. Please note that files must be submitted in Variant Call Format (.vcf) and should be restricted to the coding/splice regions of the genes included in the ACMG list of genes for return of secondary variants. See below for additional details:
VCF (Variant Call Format) file requirements
- VCF file
- If there is only one sample, a single VCF file for the patient is acceptable.
- If there is more than one sample, please send a single multi-sample VCF containing all samples to be analyzed. Please do not send individual VCF files.
- Bioinformatics pipeline
- We strongly recommend generating the VCF file with the GATK Best Practices pipeline.
- For other pipelines, please apply the QC filters recommended for that pipeline to exclude low-quality variants.
- Reference genome
- GRCh37, hg19, or another variant of the GRCh37/hg19 human reference genome (e.g. b37) is acceptable.
- Please note that hg38 is not accepted and will not be processed. If the VCF file is in hg38 coordinates, 1) lift over the coordinates to GRCh37/hg19, then 2) proceed to the “Pre-processing a VCF file” section.
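As a quick sanity check before submission, the reference build can often be inferred from the `##contig` lines in the VCF header, because chromosome 1 has a different length in GRCh37/hg19 (249,250,621 bp) than in GRCh38/hg38 (248,956,422 bp). The Python sketch below is only an illustration: the function name `guess_build` is hypothetical, and it assumes the header contains a `##contig` line for chromosome 1 with a `length` field, which not every pipeline writes.

```python
import re

# Length of chromosome 1 in each assembly, used here as a fingerprint.
CHR1_LENGTHS = {249250621: "GRCh37/hg19", 248956422: "GRCh38/hg38"}

def guess_build(header_lines):
    """Return the build implied by the chr1 ##contig line, or None if unknown."""
    for line in header_lines:
        if not line.startswith("##contig="):
            continue
        m_id = re.search(r"ID=([^,>]+)", line)
        m_len = re.search(r"length=(\d+)", line)
        # Accept both "1" and "chr1" notations for chromosome 1.
        if m_id and m_len and m_id.group(1) in ("1", "chr1"):
            return CHR1_LENGTHS.get(int(m_len.group(1)))
    return None

# Example with a hypothetical header line:
print(guess_build(["##contig=<ID=chr1,length=248956422>"]))
```

A result of `None` means the header did not carry enough information, so the build should be confirmed from the pipeline configuration instead.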
Pre-processing a VCF file
- The VCF file must be restricted to the ACMG 59 genes. Coordinates are provided as a BED file, ACMG_59_isplice2_esplice2.bed. Please check that the chromosome notations are compatible (e.g. “chr1” vs “1”) when using the BED file to subset the VCF file.
- Filter the VCF file to include only high-quality (“PASS”) variants. If the GATK pipeline was not used, please apply the quality filters recommended for the method used.
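The two pre-processing steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the required procedure: the helper names (`norm_chrom`, `load_bed`, `filter_vcf`) are hypothetical, the input is assumed to be an uncompressed VCF read line by line, only each record's POS is tested against the BED intervals (so a long deletion starting just outside a region is dropped), and differences such as “chrM” vs “MT” are not reconciled. In practice a dedicated tool such as bcftools is usually a better choice.

```python
def norm_chrom(name):
    """Strip a leading 'chr' so that 'chr1' and '1' compare equal."""
    return name[3:] if name.lower().startswith("chr") else name

def load_bed(bed_lines):
    """Map normalized chromosome -> list of (start, end) BED intervals."""
    regions = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.setdefault(norm_chrom(chrom), []).append((int(start), int(end)))
    return regions

def filter_vcf(vcf_lines, regions):
    """Yield header lines plus PASS records whose POS lies in a BED interval."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep all header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, flt = norm_chrom(fields[0]), int(fields[1]), fields[6]
        if flt != "PASS":
            continue
        # VCF POS is 1-based; BED start is 0-based and BED end is exclusive.
        if any(start < pos <= end for start, end in regions.get(chrom, [])):
            yield line
```

A possible usage, with hypothetical input and output file names:

```python
with open("ACMG_59_isplice2_esplice2.bed") as bed, open("input.vcf") as vcf, \
        open("acmg59_pass.vcf", "w") as out:
    out.writelines(filter_vcf(vcf, load_bed(bed)))
```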
Mode of transferring files
- For NIH investigators, please use NIH Secure Email to send the VCF file to henoke.shiferaw@nih.gov.
- For non-NIH investigators, Globus is available to any investigator sending files to NIH (no license required). A Globus endpoint to which the file can be transferred will be shared with you.