===== The GIAB ftp site =====

	ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/ 



==== General layout of the ftp site ====


Six directories and two site-management files (current.tree and CHANGELOG) are at the top level:

     	|--data/
     	|--data_indexes/
ftp/  --|--release/
     	|--technical/
     	|--tools/
     	|--changelog_details/
     	|
     	|--CHANGELOG
     	|--current.tree


current.tree lists every file on the ftp site together with its md5sum.
CHANGELOG records the changes that have been made to the ftp site.
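Because current.tree records an md5sum for every file, a downloaded copy can be checked by recomputing its md5 locally and comparing the two. A minimal Python sketch, assuming you have already looked up the expected md5 string (from current.tree or one of the index files); `file_md5` and `md5_matches` are hypothetical helper names:

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks so large BAM/FASTQ files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a b"" sentinel keeps reading until EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_matches(path, expected_md5):
    """Hypothetical helper: compare a local file with an expected md5 (case and surrounding whitespace ignored)."""
    return file_md5(path) == expected_md5.strip().lower()
```

Reading in fixed-size chunks keeps memory use flat even for multi-gigabyte alignment files.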



==== data directory ====

The data directory contains one subdirectory per individual or trio from the GIAB project (NA12878, AshkenazimTrio, ChineseTrio). Each individual or trio directory contains a series of subdirectories for data sets generated on different platforms. NA12878 and each trio also have an "analysis" subdirectory containing analysis results from different analysis groups.



			          |----platform_specific_sequence_data*
	   |----NA12878/  --------|----analysis/
	   |
	   |
	   |			  |----HG002_NA24385_son/platform_specific_sequence_data*
ftp/data/  |----AshkenazimTrio/ --|----HG003_NA24149_father/platform_specific_sequence_data*
	   |			  |----HG004_NA24143_mother/platform_specific_sequence_data*
	   |			  |----analysis/
 	   |
 	   |
	   |		 	  |----HG005_NA24631_son/platform_specific_sequence_data*
	   |----ChineseTrio/ -----|----HG006_NA24694-huCA017E_father/platform_specific_sequence_data*
				  |----HG007_NA24695-hu38168_mother/platform_specific_sequence_data*
				  |----analysis/
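The tree above can be encoded as a small lookup table. A Python sketch that returns the ftp URL of an individual's data directory; the dictionary copies the directory names shown in the tree (NA12878 has no per-individual subdirectory), while `LAYOUT` and `data_dir_url` are hypothetical names. It only builds URLs and does not check that they exist on the server.

```python
BASE = "ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data"

# Per-individual subdirectories, copied from the layout shown above.
# NA12878 keeps its platform-specific directories directly under NA12878/.
LAYOUT = {
    "NA12878": [],
    "AshkenazimTrio": [
        "HG002_NA24385_son",
        "HG003_NA24149_father",
        "HG004_NA24143_mother",
    ],
    "ChineseTrio": [
        "HG005_NA24631_son",
        "HG006_NA24694-huCA017E_father",
        "HG007_NA24695-hu38168_mother",
    ],
}


def data_dir_url(sample_id):
    """Return the data directory URL for 'NA12878' or an HG ID such as 'HG002'."""
    for group, members in LAYOUT.items():
        if sample_id == group:
            return f"{BASE}/{group}/"
        for member in members:
            # Member directory names start with the HG ID, e.g. HG002_NA24385_son.
            if member.split("_", 1)[0] == sample_id:
                return f"{BASE}/{group}/{member}/"
    raise KeyError(f"unknown sample: {sample_id}")
```

For example, `data_dir_url("HG002")` points at the AshkenazimTrio/HG002_NA24385_son/ directory.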




==== data_indexes directory ====

The data_indexes directory contains two types of index files: sequence.index files, which list all sequence files from a particular platform for an individual along with their md5 checksums, and alignment.index files, which list all alignments for an individual along with their md5 checksums.

ftp/data_indexes/sequence.index.*
ftp/data_indexes/alignment.index.*


The format of sequence.index is as follows (columns 3 and 4 are empty when there is no paired data):
For fastqs:
FASTQ   FASTQ_MD5       PAIRED_FASTQ    PAIRED_FASTQ_MD5        NIST_SAMPLE_NAME

For hdf5:
HDF5    HDF5_MD5                        NIST_SAMPLE_NAME

For SOLiD xsq:
XSQ     XSQ_MD5                 NIST_SAMPLE_NAME

For BioNano bnx:
BNX	BNX_MD5			NIST_SAMPLE_NAME
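A sketch of reading a sequence.index file in Python, assuming the columns are tab-separated in the order listed above and that a header row, if present, starts with one of the column names FASTQ, HDF5, XSQ or BNX. `SequenceEntry` and `parse_sequence_index` are hypothetical names; empty paired-read columns become `None`.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# Assumed first-column names of header rows.
HEADER_TOKENS = {"FASTQ", "HDF5", "XSQ", "BNX"}


@dataclass
class SequenceEntry:
    path: str
    md5: str
    paired_path: Optional[str]
    paired_md5: Optional[str]
    sample: str


def parse_sequence_index(text):
    """Parse tab-separated sequence.index text into SequenceEntry records (assumed 5-column layout)."""
    entries = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0] in HEADER_TOKENS:
            continue  # skip blank lines and header rows
        # Pad short rows to 5 columns, then drop any extras.
        path, md5, paired_path, paired_md5, sample = (row + [""] * 5)[:5]
        entries.append(
            SequenceEntry(path, md5, paired_path or None, paired_md5 or None, sample)
        )
    return entries
```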


The format of alignment.index:
For BAM:
BAM     BAM_MD5 BAI     BAI_MD5

For BioNano XMAP or CMAP:
XMAP_CMAP	XMAP_CMAP_MD5
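alignment.index rows come in two widths: four columns for BAM entries (the BAM, its md5, the BAI index and its md5) and two columns for BioNano XMAP/CMAP entries. A Python sketch that handles both, assuming tab separation and header rows that begin with BAM or XMAP_CMAP; `parse_alignment_index` is a hypothetical name.

```python
def parse_alignment_index(text):
    """Return one dict per data row of tab-separated alignment.index text."""
    records = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        if not line.strip() or fields[0] in ("BAM", "XMAP_CMAP"):
            continue  # skip blank lines and header rows
        record = {"file": fields[0], "md5": fields[1]}
        if len(fields) >= 4 and fields[2]:
            # BAM rows also carry the .bai index and its md5.
            record["index"] = fields[2]
            record["index_md5"] = fields[3]
        records.append(record)
    return records
```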



==== release directory ==== 

The release directory contains one directory per sample or trio. Each holds versioned or dated sets of analysis results, plus READMEs explaining how those data sets were produced. A "latest" directory under each sample or trio directory contains the most recent release.
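To see what the current release for a sample or trio contains, one option is to list its "latest" directory over plain FTP with Python's standard `ftplib`. A sketch assuming anonymous login is accepted and that the release tree sits at /giab/ftp/release on the host from the top of this README. The directory names you pass are whatever you find under release/ (nothing here guesses them), and `latest_dir` and `list_latest` are hypothetical helper names.

```python
from ftplib import FTP

HOST = "ftp-trace.ncbi.nlm.nih.gov"
RELEASE_ROOT = "/giab/ftp/release"  # assumed server-side path of ftp/release/


def latest_dir(*name_parts):
    """Build the server path of a 'latest' directory from one or more directory names under release/."""
    return "/".join([RELEASE_ROOT, *name_parts, "latest"])


def list_latest(*name_parts):
    """List entries in that 'latest' directory over anonymous FTP (requires network access)."""
    with FTP(HOST) as ftp:
        ftp.login()  # no arguments means anonymous login
        return ftp.nlst(latest_dir(*name_parts))
```

Accepting several name parts allows for a trio directory that nests per-individual directories; pass a single name for a sample directory.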


==== technical directory ==== 

The technical directory contains subdirectories for other data sets, such as files for method development and interim data sets.



==== tools directory ====

The tools directory contains tools and programs developed by the group.



==== changelog_details directory ====

The changelog_details directory contains files with details about changes to the site.




##################################################################
# Data download
##################################################################
Aspera users can read more about setting up Aspera downloads in the following documents:

ftp://ftp-trace.ncbi.nlm.nih.gov/giab/AsperaClient_2.6_Linux_UserGuide.pdf  
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/README.Aspera_Users  
ftp://ftp-trace.ncbi.nlm.nih.gov/giab/aspera_transfer_guide.pdf
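The guides above cover Aspera. For a handful of files, plain FTP also works. A Python sketch using the standard `ftplib` module, assuming the server accepts anonymous login; `parse_ftp_url` and `download` are hypothetical helper names, and the md5 it returns should be compared with the value in current.tree or the data_indexes files.

```python
import hashlib
from ftplib import FTP
from urllib.parse import urlparse


def parse_ftp_url(url):
    """Split an ftp:// URL into (host, path)."""
    parts = urlparse(url)
    if parts.scheme != "ftp":
        raise ValueError(f"not an ftp URL: {url}")
    return parts.hostname, parts.path


def download(url, dest):
    """Fetch one file over anonymous FTP and return the md5 of the bytes written."""
    host, path = parse_ftp_url(url)
    digest = hashlib.md5()
    with FTP(host) as ftp, open(dest, "wb") as out:
        ftp.login()  # anonymous login

        def write_chunk(chunk):
            # Write each block to disk and hash it in the same pass.
            out.write(chunk)
            digest.update(chunk)

        ftp.retrbinary(f"RETR {path}", write_chunk)
    return digest.hexdigest()
```

For example, `download("ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/current.tree", "current.tree")` would fetch the site's file listing. Hashing while writing avoids a second pass over large files.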