===== The GIAB ftp site =====

ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/

==== General layout of the ftp site ====

6 directories and 2 site management files (current.tree and CHANGELOG) are at the top level:

          |--data/
          |--data_indexes/
ftp/ -----|--release/
          |--technical/
          |--tools/
          |--changelog_details/
          |
          |--CHANGELOG
          |--current.tree

current.tree lists all the files on the ftp site and their md5sums.
CHANGELOG describes the changes that have been made to the ftp site.

==== data directory ====

The data directory contains one subdirectory per individual or trio (NA12878, AshkenazimTrio, ChineseTrio) from the GIAB project. Each individual or trio directory contains a series of subdirectories for the data sets generated on different platforms. An "analysis" subdirectory was created for NA12878 and for each trio; it contains analysis results from different analysis groups.

                                      |----platform_specific_sequence_data*
            |----NA12878/ ------------|----analysis/
            |
            |                         |----HG002_NA24385_son/platform_specific_sequence_data*
ftp/data/ --|----AshkenazimTrio/ -----|----HG003_NA24149_father/platform_specific_sequence_data*
            |                         |----HG004_NA24143_mother/platform_specific_sequence_data*
            |                         |----analysis/
            |
            |                         |----HG005_NA24631_son/platform_specific_sequence_data*
            |----ChineseTrio/ --------|----HG006_NA24694-huCA017E_father/platform_specific_sequence_data*
                                      |----HG007_NA24695-hu38168_mother/platform_specific_sequence_data*
                                      |----analysis/

==== data_indexes directory ====

The data_indexes directory contains two types of index files: sequence.index files, which list all sequences from a particular platform for an individual together with their md5 checksums, and alignment.index files, which list all the alignments for an individual together with their md5 checksums.

    ftp/data_indexes/sequence.index.*
    ftp/data_indexes/alignment.index.*

The format of sequence.index is as follows (if there is no paired data, columns 3 and 4 are empty):

For fastqs:
    FASTQ  FASTQ_MD5  PAIRED_FASTQ  PAIRED_FASTQ_MD5  NIST_SAMPLE_NAME

For hdf5:
    HDF5  HDF5_MD5  NIST_SAMPLE_NAME

For SOLiD xsq:
    XSQ  XSQ_MD5  NIST_SAMPLE_NAME

For BioNano bnx:
    BNX  BNX_MD5  NIST_SAMPLE_NAME

The format of alignment.index is as follows:

For BAM:
    BAM  BAM_MD5  BAI  BAI_MD5

For BioNano XMAP or CMAP:
    XMAP_CMAP  XMAP_CMAP_MD5

==== release directory ====

The release directory contains one directory per sample or trio name. Each of these holds analysis result sets labeled by version or date, plus READMEs explaining how those data sets were produced. A "latest" directory under each sample or trio name contains the most recent release.

==== technical directory ====

The technical directory contains subdirectories for other data sets, such as files for method development and interim data sets.

==== tools directory ====

The tools directory contains the tools or programs developed by the group.

==== changelog_details directory ====

The changelog_details directory contains files with details about the site changes.

##################################################################
# Data download
##################################################################

Aspera users can read more about how to set up Aspera downloads in the following documents:

ftp://ftp-trace.ncbi.nlm.nih.gov/giab/AsperaClient_2.6_Linux_UserGuide.pdf
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/README.Aspera_Users
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/aspera_transfer_guide.pdf
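
After a download, the md5 values recorded in the index files can be used to check the local copies. The Python sketch below is illustrative only and is not one of the GIAB tools. It assumes that a FASTQ sequence.index is tab-separated and starts with a header line carrying the column names FASTQ, FASTQ_MD5, PAIRED_FASTQ, PAIRED_FASTQ_MD5 and NIST_SAMPLE_NAME, and that downloaded files keep their original file names. The function names are hypothetical.

```python
import csv
import hashlib
import io
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_fastq_index(handle):
    """Yield (remote_path, md5) pairs from a FASTQ sequence.index.

    Assumption: rows are tab-separated and the first line is a header
    naming the columns as described in the data_indexes section.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        yield row["FASTQ"], row["FASTQ_MD5"]
        # Columns 3 and 4 are empty when there is no paired data.
        if row.get("PAIRED_FASTQ"):
            yield row["PAIRED_FASTQ"], row["PAIRED_FASTQ_MD5"]


def verify_downloads(index_handle, local_dir):
    """Return (file_name, status) for every file listed in the index.

    status is "ok", "mismatch" or "missing".
    """
    results = []
    for remote_path, expected in read_fastq_index(index_handle):
        name = remote_path.rsplit("/", 1)[-1]
        local_path = os.path.join(local_dir, name)
        if not os.path.exists(local_path):
            status = "missing"
        elif md5_of_file(local_path) == expected.strip().lower():
            status = "ok"
        else:
            status = "mismatch"
        results.append((name, status))
    return results
```

To use it, open a downloaded index file in text mode and pass the handle to verify_downloads together with the directory that holds the downloaded FASTQ files. The other index types (hdf5, xsq, bnx, alignment.index) have different columns and would need their own reader.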