Run analyses

GenomeHubs supports importing the results of analyses such as InterProScan into Ensembl databases to add functional annotations to imported assemblies. Docker container images are provided to allow these analyses to be run with the correct settings, but the analyses can also be run in whichever way is best suited to your compute infrastructure, provided the output files have the required format.

Run blastp against SwissProt

Download and unzip the latest SwissProt database:

$ mkdir -p ~/genomehubs/external_files && cd ~/genomehubs/external_files
$ wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz
$ gunzip uniprot_sprot.fasta.gz
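
As an optional sanity check on the download, the number of sequences in a FASTA file can be counted from its header lines. The sketch below is not part of GenomeHubs: `count_fasta_records` is a hypothetical helper name, and it assumes `gzip` is installed so that compressed files can be read without unpacking them.

```shell
# Hypothetical helper (not part of GenomeHubs): count the records in a
# FASTA file by counting header lines. Files ending in .gz are streamed
# through gzip so they do not need to be unpacked first.
count_fasta_records() {
  case "$1" in
    *.gz) gzip -dc "$1" | grep -c '^>' ;;
    *)    grep -c '^>' "$1" ;;
  esac
}
```

For example, after the download step:

$ count_fasta_records ~/genomehubs/external_files/uniprot_sprot.fasta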

Format the SwissProt BLAST database:

$ docker run --rm \
             -u $UID:$GROUPS \
             --name uniprot_sprot-makeblastdb \
             -v ~/genomehubs/external_files:/in \
             -v ~/genomehubs/external_files:/out \
             genomehubs/ncbi-blast:latest \
             makeblastdb -dbtype prot -in /in/uniprot_sprot.fasta -out /out/uniprot_sprot.fasta -parse_seqids -hash_index
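
Before launching a long blastp run it can be worth confirming that the database files were written. The check below is a minimal sketch with a hypothetical helper name, `blast_protein_db_exists`. It assumes a single-volume protein database, for which makeblastdb writes `.phr`, `.pin` and `.psq` files next to the name given to `-out`; very large databases are split into numbered volumes and would need a different check.

```shell
# Hypothetical helper: succeed only if the core files of a single-volume
# protein BLAST database exist for the given database path.
blast_protein_db_exists() {
  for ext in phr pin psq; do
    [ -f "$1.$ext" ] || return 1
  done
  return 0
}
```

For example:

$ blast_protein_db_exists ~/genomehubs/external_files/uniprot_sprot.fasta && echo "database found"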

Run blastp:

$ mkdir -p ~/genomehubs/v1/download/data/blastp
$ docker run --rm \
             -u $UID:$GROUPS \
             --name Operophtera_brumata_Obru1-blastp \
             -v ~/genomehubs/v1/download/data/sequence:/query \
             -v ~/genomehubs/v1/download/data/blastp:/out \
             -v ~/genomehubs/external_files:/db \
             genomehubs/ncbi-blast:latest \
             blastp -query /query/Operophtera_brumata_Obru1.proteins.fa.gz \
                    -db /db/uniprot_sprot.fasta \
                    -evalue 1e-10 \
                    -num_threads 16 \
                    -outfmt '6 std qlen slen stitle btop' \
                    -out /out/Operophtera_brumata_Obru1.proteins.fa.blastp.uniprot_sprot.1e-10.tsv
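
With `-outfmt '6 std qlen slen stitle btop'` each output row is tab-separated: the twelve standard columns (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, e-value, bit score) are followed by query length, subject length, subject title and the BTOP string. The helpers below are an illustrative sketch for inspecting such a file, not part of the GenomeHubs import; the function names are hypothetical, and `sort -g` is assumed to be available (it is in GNU and BSD sort).

```shell
# Hypothetical helpers for a BLAST tabular file produced with
# -outfmt '6 std qlen slen stitle btop' (tab-separated, bit score in column 12).

# Print the number of distinct query sequences that have at least one hit.
count_queries_with_hits() {
  cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Print one row per query: the hit with the highest bit score.
# Rows are sorted by query id, then by bit score descending, and awk keeps
# only the first row seen for each query id.
best_hits() {
  sort -t "$(printf '\t')" -k1,1 -k12,12gr "$1" | awk -F '\t' '!seen[$1]++'
}
```

For example:

$ best_hits ~/genomehubs/v1/download/data/blastp/Operophtera_brumata_Obru1.proteins.fa.blastp.uniprot_sprot.1e-10.tsv | cut -f1,2,11,12,15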

Run InterProScan

Modify the InterProScan configuration to suit your system. Download interproscan.properties and open it in an editor:

$ mkdir -p ~/genomehubs/external_files && cd ~/genomehubs/external_files
$ wget https://raw.githubusercontent.com/blaxterlab/interproscan-docker/master/interproscan.properties
$ nano interproscan.properties

Set maxnumber.of.embedded.workers to match your number of threads (e.g. 16):

number.of.embedded.workers=1
maxnumber.of.embedded.workers=16
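
For scripted setups, the same setting can be changed without opening an editor. The function below is a minimal sketch with a hypothetical name, `set_property`. It assumes the file uses plain `key=value` lines with no spaces around the `=`, as in the two lines above, and appends the key if it is not already present.

```shell
# Hypothetical helper: set key=value in a Java-style properties file.
# Assumes "key=value" lines without surrounding spaces. The function body
# runs in a subshell, so its variables do not leak into the caller.
set_property() (
  file=$1
  key=$2
  value=$3
  tmp=$(mktemp) || exit 1
  awk -v k="$key" -v v="$value" '
    BEGIN { FS = "="; found = 0 }
    $1 == k { print k "=" v; found = 1; next }  # replace the existing line
    { print }                                   # keep every other line as-is
    END { if (!found) print k "=" v }           # append the key if missing
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  exit $status
)
```

For example, to match a 16-thread machine:

$ set_property ~/genomehubs/external_files/interproscan.properties maxnumber.of.embedded.workers 16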

Run InterProScan:

$ mkdir -p ~/genomehubs/v1/download/data/interproscan
$ docker run --rm \
           -u $UID:$GROUPS \
           --name Operophtera_brumata_Obru1-interproscan \
           -v ~/genomehubs/v1/download/data/interproscan:/dir \
           -v ~/genomehubs/v1/download/data/sequence:/in \
           -v ~/genomehubs/external_files/interproscan.properties:/interproscan-5.22-61.0/interproscan.properties \
           genomehubs/interproscan:latest \
           interproscan.sh -i /in/Operophtera_brumata_Obru1.proteins.fa.gz \
                           -d /dir \
                           -appl PFAM,SignalP_EUK \
                           -goterms \
                           -dp \
                           -pa \
                           -f TSV
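
InterProScan TSV output is tab-separated, with the protein accession in column 1 and the analysis name (for example Pfam) in column 4. As a quick check that both requested analyses produced results, the hits can be tallied per analysis. This is an illustrative sketch only; `interproscan_hits_per_analysis` is a hypothetical name, and the path of the TSV file written to the interproscan directory must be supplied by you.

```shell
# Hypothetical helper: count InterProScan TSV rows per analysis (column 4)
# and print "analysis<TAB>count", sorted by analysis name.
interproscan_hits_per_analysis() {
  awk -F '\t' '{ n[$4]++ } END { for (a in n) print a "\t" n[a] }' "$1" | sort
}
```

An analysis that is missing from the output either found no matches or did not run, so it is worth checking the InterProScan log in that case.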

Run RepeatMasker

Clone the GenomeHubs RepeatMasker Docker repository:

$ mkdir -p ~/genomehubs/external_files && cd ~/genomehubs/external_files
$ git clone https://github.com/genomehubs/repeatmasker-docker.git
$ cd repeatmasker-docker

Download a copy of the latest RepeatMasker libraries from RepBase:

$ wget --user your_username \
       --password your_password \
       -O repeatmaskerlibraries.tar.gz \
       http://www.girinst.org/server/RepBase/protected/repeatmaskerlibraries/RepBaseRepeatMaskerEdition-20170127.tar.gz

Build the Docker image:

$ docker build -t repeatmasker .

Run RepeatMasker:

$ mkdir -p ~/genomehubs/v1/download/data/repeatmasker
$ docker run --rm \
           -u $UID:$GROUPS \
           --name Operophtera_brumata_Obru1-repeatmasker \
           -v ~/genomehubs/v1/download/data/sequence:/in \
           -v ~/genomehubs/v1/download/data/repeatmasker:/out \
           -e ASSEMBLY=Operophtera_brumata_Obru1.scaffolds.fa.gz \
           -e NSLOTS=16 \
           -e SPECIES=arthropoda \
           repeatmasker

Run CEGMA

Run CEGMA:

  • TODO: allow use of multiple threads

$ mkdir -p ~/genomehubs/v1/download/data/cegma
$ docker run --rm \
           -u $UID:$GROUPS \
           --name Operophtera_brumata_Obru1-cegma \
           -v ~/genomehubs/v1/download/data/sequence:/in \
           -v ~/genomehubs/v1/download/data/cegma:/out \
           -e ASSEMBLY=Operophtera_brumata_Obru1.scaffolds.fa.gz \
           genomehubs/cegma:latest

Run BUSCO

Clone the GenomeHubs BUSCO Docker repository:

$ mkdir -p ~/genomehubs/external_files && cd ~/genomehubs/external_files
$ git clone https://github.com/genomehubs/busco-docker.git
$ cd busco-docker

Fetch BUSCO lineages:

$ wget http://busco.ezlab.org/v2/datasets/eukaryota_odb9.tar.gz

Build the Docker image:

$ docker build -t busco .

Run BUSCO:

$ mkdir -p ~/genomehubs/v1/download/data/busco
$ docker run --rm \
           -u $UID:$GROUPS \
           --name Operophtera_brumata_Obru1-busco \
           -v ~/genomehubs/v1/download/data/sequence:/in \
           -v ~/genomehubs/v1/download/data/busco:/out \
           -e ASSEMBLY=Operophtera_brumata_Obru1.scaffolds.fa.gz \
           busco -l eukaryota_odb9 -m genome -c 16 -sp fly
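
BUSCO reports its results in a short summary file containing a one-line score such as `C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023`. The sketch below pulls the percentage of complete BUSCOs out of such a file. It assumes that summary line format, the function name `busco_complete_percent` is hypothetical, and where the container writes the summary file under the busco directory is not covered here.

```shell
# Hypothetical helper: print the "complete" percentage from a BUSCO short
# summary file, assuming a score line of the form
#   C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023
busco_complete_percent() {
  sed -n 's/.*C:\([0-9.]*\)%.*/\1/p' "$1" | head -n 1
}
```

The function prints nothing if no line in the file matches the assumed format.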
