4. Export files

FASTA and GFF files are required to import data into a GenomeHubs Ensembl, but analyses should be run on files exported from the database to ensure that filenames, headers, etc. are standardised. Additional filetypes can also be exported for use in visualisations, to provide files for bulk download, and to allow submission of an assembly/annotations to ENA.

File export should typically be run twice while setting up a new assembly, initially to export sequences to be used as inputs for analyses and, once the analysis results have been imported, to export a full set of files to provide bulk downloads and for BLAST.

Export sequences

Run the EasyImport Docker container using the -e flag to export sequences:

  • run this step before running analyses

  • sequence files will be written to ~/genomehubs/v1/download/data/sequence

$ docker run --rm \
             -u $UID:$GROUPS \
             --name easy-import-operophtera_brumata_obru1_core_40_93_1 \
             --network genomehubs-network \
             -v ~/genomehubs/v1/import/conf:/import/conf \
             -v ~/genomehubs/v1/import/data:/import/data \
             -v ~/genomehubs/v1/download/data:/import/download \
             -v ~/genomehubs/v1/blast/data:/import/blast \
             -e DATABASE=operophtera_brumata_obru1_core_40_93_1 \
             -e FLAGS="-e" \
             genomehubs/easy-import:19.05

Export all files

Run the EasyImport Docker container with flags to export sequences (-e), gff/embl format features (-f) and json format data for visualisations (-j):

  • run this step after running analyses

  • include the -i flag to index the database in addition to exporting files

  • files will be written to directories under ~/genomehubs/v1/download/data/

  • files ready to format as BLAST databases will be written to ~/genomehubs/v1/blast/data/

  • EMBL format export requires ASSEMBLY.BIOPROJECT and ASSEMBLY.LOCUS_TAG to be defined in the assembly metadata (see Update meta)

$ docker run --rm \
             -u $UID:$GROUPS \
             --name easy-import-operophtera_brumata_v1_core_40_93_1 \
             --network genomehubs-network \
             -v ~/genomehubs/v1/import/conf:/import/conf \
             -v ~/genomehubs/v1/import/data:/import/data \
             -v ~/genomehubs/v1/download/data:/import/download \
             -v ~/genomehubs/v1/blast/data:/import/blast \
             -e DATABASE=operophtera_brumata_obru1_core_40_93_1 \
             -e FLAGS="-e -f -j -i" \
             genomehubs/easy-import:19.05

Last updated