
A tool for assessing transcriptome and genome assembly quality: BUSCO

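BUSCO (Benchmarking Universal Single-Copy Orthologs) uses single-copy orthologous genes to assess the quality of genome and transcriptome assemblies, as well as annotated gene sets.
Method/Steps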
1

Usage:
python BUSCO.py -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]

2

Required arguments:
  -i FASTA FILE, --in FASTA FILE
      Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set.
  -o OUTPUT, --out OUTPUT
      Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. WARNING: do not provide a path
      (This is a name only; it must not contain a path.)
  -m MODE, --mode MODE
      Specify which BUSCO analysis mode to run. There are three valid modes:
      - geno or genome, for genome assemblies (DNA)
      - tran or transcriptome, for transcriptome assemblies (DNA)
      - prot or proteins, for annotated gene sets (protein)
  -l LINEAGE, --lineage LINEAGE
      Specify location of the BUSCO lineage data to be used. Visit http://busco.ezlab.org for available lineages.
      (This is the reference database the assembly is compared against.)
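The required arguments can also be assembled into a command line from a script. The following Python sketch is illustrative only: `build_busco_cmd` is a hypothetical helper, not part of BUSCO, and it checks just the two rules stated in the help text (the mode must be one of the listed names, and -o must be a name rather than a path).

```python
import os

# Mode names accepted by -m/--mode, as listed in the BUSCO help text.
VALID_MODES = {"geno", "genome", "tran", "transcriptome", "prot", "proteins"}


def build_busco_cmd(seq_file, lineage, out_name, mode, busco_script="BUSCO.py"):
    """Build the argument list for a BUSCO run (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(VALID_MODES)}")
    # -o must be a short run name, not a path: BUSCO uses it to label output folders.
    if os.sep in out_name or (os.altsep and os.altsep in out_name):
        raise ValueError("-o takes a run name, not a path")
    return ["python", busco_script,
            "-i", seq_file,
            "-o", out_name,
            "-l", lineage,
            "-m", mode]


cmd = build_busco_cmd("Bridger.fasta", "eukaryota_odb9", "L", "tran")
print(" ".join(cmd))
```

The function only returns a list; nothing is executed, so it can be combined with any way of launching BUSCO.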

3

Optional arguments:
  -c N, --cpu N
      Specify the number (N=integer) of threads/cores to use.
  -e N, --evalue N
      E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
  -f, --force
      Force rewriting of existing files. Must be used when output files with the provided name already exist.
  -r, --restart
      Restart an uncompleted run. Not available for the protein mode
  -sp SPECIES, --species SPECIES
      Name of existing Augustus species gene finding parameters. See Augustus documentation for available options.
  --augustus_parameters AUGUSTUS_PARAMETERS
      Additional parameters for the fine-tuning of Augustus run. For the species, do not use this option. Use single quotes as follow: '--param1=1 --param2=2', see Augustus documentation for available options.
  -t PATH, --tmp PATH
      Where to store temporary files (Default: ./tmp)
  --limit REGION_LIMIT
      How many candidate regions to consider (default: 3)
  --long
      Optimization mode Augustus self-training (Default: Off) adds considerably to the run time, but can improve results for some non-model organisms
  -q, --quiet
      Disable the info logs, displays only errors
  -z, --tarzip
      Tarzip the output folders likely to contain thousands of files
  -v, --version
      Show this version and exit
  -h, --help
      Show this help message and exit
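A few of these optional flags can be mapped to arguments in the same way. The Python sketch below is hypothetical (`optional_args` is an invented name, not BUSCO's own argument handling) and enforces only constraints taken from the help text: -e accepts either 0.001 or 1e-03 style values, -r is not available in protein mode, and the species belongs in -sp rather than in --augustus_parameters.

```python
import shlex


def optional_args(mode, cpu=None, evalue=None, force=False, restart=False,
                  quiet=False, augustus_parameters=None):
    """Map a few BUSCO optional flags to argv items (hypothetical helper)."""
    args = []
    if cpu is not None:
        if int(cpu) < 1:
            raise ValueError("-c needs a positive integer")
        args += ["-c", str(int(cpu))]
    if evalue is not None:
        # BUSCO accepts 0.001 or 1e-03; float() parses both forms for validation.
        if float(evalue) <= 0:
            raise ValueError("-e must be a positive number")
        args += ["-e", str(evalue)]
    if force:
        args.append("-f")  # overwrite an existing run with the same -o name
    if restart:
        if mode in ("prot", "proteins"):
            raise ValueError("-r is not available in protein mode")
        args.append("-r")
    if quiet:
        args.append("-q")
    if augustus_parameters:
        # The species must be given with -sp/--species, not here.
        if "--species" in augustus_parameters:
            raise ValueError("use -sp/--species for the species")
        # A single argv item; the single quotes in the help text are shell quoting.
        args += ["--augustus_parameters", augustus_parameters]
    return args


extra = optional_args("tran", cpu=30, evalue="1e-10", force=True,
                      augustus_parameters="--param1=1 --param2=2")
print(shlex.join(extra))
```

When the list is passed straight to `subprocess`, the Augustus parameter string needs no quotes; `shlex.join` (Python 3.8+) adds the single quotes back only for display as a shell command.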

4

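Example:
/USER/xwf/software/busco/BUSCO.py -i ../bridger_out_dir/Bridger.fasta -o L -l /USER/xwf/database/eukaryota_odb9 -m tran -c 30 -f -e 1e-10
This evaluates a Bridger transcriptome assembly against the eukaryota_odb9 lineage in transcriptome mode, using 30 threads, overwriting any previous run named L, with an E-value cutoff of 1e-10.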
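Since -f must be given whenever output with the same name already exists, a small wrapper can add it only when needed. This Python sketch assumes, as in the example, that BUSCO writes its results to a `run_<name>` folder in the working directory; `run_busco` is a hypothetical wrapper, and `dry_run=True` just returns the argument list without calling BUSCO.

```python
import os
import subprocess


def run_busco(argv, out_name, workdir=".", dry_run=False):
    """Run BUSCO, appending -f only when run_<out_name> already exists (sketch)."""
    argv = list(argv)
    # Assumption: BUSCO names its output folder run_<out_name>, as in the example.
    if os.path.isdir(os.path.join(workdir, "run_" + out_name)) and "-f" not in argv:
        argv.append("-f")
    if not dry_run:
        # check=True raises CalledProcessError if BUSCO exits with an error.
        subprocess.run(argv, cwd=workdir, check=True)
    return argv


# The example command, split into arguments, deliberately without -f.
example = ["/USER/xwf/software/busco/BUSCO.py",
           "-i", "../bridger_out_dir/Bridger.fasta",
           "-o", "L",
           "-l", "/USER/xwf/database/eukaryota_odb9",
           "-m", "tran", "-c", "30", "-e", "1e-10"]
print(run_busco(example, "L", dry_run=True))
```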

5

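The run produces two folders: run_L (named after the output name L set in the example above) and tmp. The main file to check is short_summary_L.txt inside run_L, which reports S: complete and single-copy, D: complete and duplicated, F: fragmented, and M: missing BUSCOs. The value of S+D (the complete BUSCOs) should not be too low: the BUSCO lineage databases consist of proteins conserved across related species, so a good assembly should recover a substantial number of them.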
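The headline numbers can be pulled out of short_summary_L.txt with a regular expression. This Python sketch assumes the one-line summary format `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` printed by BUSCO v3, where C = S + D is the share of complete BUSCOs and n is the number of BUSCOs searched. The sample line uses made-up numbers, and the 90% cut-off in the hypothetical `is_acceptable` function is an arbitrary illustration, not a BUSCO recommendation.

```python
import re

# Assumed BUSCO v3 one-line summary, e.g. C:95.4%[S:94.1%,D:1.3%],F:2.6%,M:2.0%,n:303
SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_short_summary(text):
    """Return the C/S/D/F/M percentages and n (total BUSCOs) from the summary text."""
    m = SUMMARY_RE.search(text)
    if m is None:
        raise ValueError("no BUSCO summary line found")
    values = {k: float(m.group(k)) for k in ("C", "S", "D", "F", "M")}
    values["n"] = int(m.group("n"))
    return values


def is_acceptable(summary, min_complete=90.0):
    """Check that S + D (complete BUSCOs) reaches a chosen, arbitrary threshold."""
    return summary["S"] + summary["D"] >= min_complete


# Illustrative line with made-up numbers, not a real result.
sample = "\tC:95.4%[S:94.1%,D:1.3%],F:2.6%,M:2.0%,n:303\n"
s = parse_short_summary(sample)
print(s, is_acceptable(s))
```

For a real run, pass the file contents instead, for example `parse_short_summary(open("run_L/short_summary_L.txt").read())`.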

Notes

I have not been working in bioinformatics for long, so comments and corrections are welcome.
