Learning Bioinformatics by Writing Code [0]: Bionode
Published: 2019-06-27


Bionode


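Bionode is a collection of modular JavaScript APIs and shell commands for bioinformatics developers. Put simply, it is a bunch of node modules and shell commands (because Node handles data as streams, wrapping the modules as pipe-friendly shell commands is both easy and efficient).

A typical use case looks like this: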

# Parse sequences in a fasta file into one JSON object per line, collect the ones that match chr11 and in fasta
$ cat genome.fasta | bionode-fasta | grep chr11 | bionode-fasta --write
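To make the pipeline above more concrete, here is a minimal sketch of the same idea in plain JavaScript: FASTA text becomes one JSON object per line, the lines are filtered by a substring, and the survivors are written back as FASTA. The `{ id, seq }` record shape and the helper names `parseFasta` and `toFasta` are assumptions made for illustration. They are not taken from bionode-fasta, whose actual output may differ.

```javascript
// Minimal FASTA -> JSON lines -> FASTA round trip.
// ASSUMPTION: the { id, seq } record shape is illustrative only and is
// not necessarily what bionode-fasta emits.
function parseFasta(text) {
  const records = [];
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '') continue;
    if (line.startsWith('>')) {
      // A header line starts a new record.
      current = { id: line.slice(1), seq: '' };
      records.push(current);
    } else if (current) {
      // Sequence lines are concatenated onto the current record.
      current.seq += line;
    }
  }
  return records;
}

function toFasta(records) {
  return records.map((r) => `>${r.id}\n${r.seq}`).join('\n');
}

// Hypothetical two-sequence input standing in for genome.fasta.
const genome = '>chr1\nACGT\nACGT\n>chr11\nGGCC\nTTAA\n';

// One JSON object per line, like the first stage of the pipeline.
const jsonLines = parseFasta(genome).map((r) => JSON.stringify(r));

// The equivalent of `grep chr11`: a plain substring match on each line.
const kept = jsonLines.filter((line) => line.includes('chr11'));

// Back to FASTA, like the final `--write` stage.
const fastaOut = toFasta(kept.map((line) => JSON.parse(line)));
console.log(fastaOut);
```

Because every stage speaks line-delimited JSON, ordinary text tools such as grep can sit in the middle of the pipeline, which seems to be the design point the shell example is making.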

The project is supported by Repositive.io. Repositive is a company that offers a search service for human genomic data and has indexed over a thousand databases.


Examples

The example on the project page left me completely lost:

# Download all bacteria gff files
$ bionode-ncbi download gff bacteria

"bacteria gff files" looks like data about some kind of bacteria, but what on earth is bacteria data supposed to be...

GFF File

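GFF is a format defined by the Sanger Institute: a simple, convenient way to describe features of DNA, RNA, and protein sequences, for example which stretch of a sequence is a gene. It has become a common format for sequence annotation, such as gene predictions on a genome, and many tools accept or emit GFF. The latest version of the format definition is version 3.

After some searching, I found that people in this field really like GFF, and that there are v2 and v3 versions of it. But all of them still represent data as tab-separated text plus a spec, and I am starting to sense where Bionode's value lies =.=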
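To show what "tab-separated text plus a spec" means in practice, here is a small sketch that turns one GFF3 feature line into a JavaScript object. The nine column names follow the GFF3 specification. The sample line and the helper names `parseGff3Line` and `parseAttributes` are made up for illustration, and this is not how any Bionode module is implemented.

```javascript
// The nine columns defined by GFF3, in order.
const GFF3_COLUMNS = [
  'seqid', 'source', 'type', 'start', 'end',
  'score', 'strand', 'phase', 'attributes',
];

// Column 9 is a ';'-separated list of key=value pairs with URL-style escapes.
function parseAttributes(column) {
  const attrs = {};
  if (column === '.') return attrs;
  for (const pair of column.split(';')) {
    const eq = pair.indexOf('=');
    if (eq === -1) continue; // skip empty or malformed pairs
    attrs[pair.slice(0, eq)] = decodeURIComponent(pair.slice(eq + 1));
  }
  return attrs;
}

// Returns null for comments, directives, and blank lines.
function parseGff3Line(line) {
  if (line.startsWith('#') || line.trim() === '') return null;
  const fields = line.split('\t');
  if (fields.length !== 9) {
    throw new Error(`expected 9 columns, got ${fields.length}`);
  }
  const record = {};
  GFF3_COLUMNS.forEach((name, i) => { record[name] = fields[i]; });
  record.start = Number(record.start); // 1-based, inclusive
  record.end = Number(record.end);
  record.attributes = parseAttributes(record.attributes);
  return record;
}

// Hypothetical feature line, not taken from any real annotation file.
const sample = [
  'chr11', 'example', 'gene', '1000', '2000', '.', '+', '.',
  'ID=gene0001;Name=demo',
].join('\t');

const feature = parseGff3Line(sample);
console.log(JSON.stringify(feature));
```

Once a feature is an object like this, it can be serialized one per line as JSON and filtered or piped like any other record.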

Installation

Install Node.js (obviously) and install bionode globally. It is best to install bionode-ncbi as well, since it can talk to the NCBI API (e-utils).

npm install bionode -g
npm install bionode-ncbi -g

NCBI API (e-utils)

NCBI stands for The National Center for Biotechnology Information, a US institution. Its website is as "low-key" as ever, and the data looks very comprehensive.

There is a counterpart in China as well.

While searching, I also found people very cautiously scraping NCBI. In fact, NCBI has long published API documentation.
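For a rough idea of what calling e-utils directly looks like, the sketch below builds an ESearch request URL using Node's built-in `URL` class. The base address and the `db`, `term`, and `retmode` parameters come from NCBI's E-utilities documentation. The helper name `buildEsearchUrl` is hypothetical, and nothing here is a claim about how bionode-ncbi itself issues requests.

```javascript
// Base address of the NCBI E-utilities.
const EUTILS_BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

// Build an ESearch URL for a given Entrez database and search term.
// ASSUMPTION: buildEsearchUrl is an illustrative helper, not a Bionode API.
function buildEsearchUrl(db, term) {
  const url = new URL('esearch.fcgi', EUTILS_BASE);
  url.searchParams.set('db', db);          // Entrez database to search
  url.searchParams.set('term', term);      // free-text query
  url.searchParams.set('retmode', 'json'); // ask for JSON instead of XML
  return url.toString();
}

const searchUrl = buildEsearchUrl('genome', 'solenopsis invicta');
console.log(searchUrl);
```

The resulting URL can be handed to any HTTP client. The sketch stops short of sending the request so that it runs without network access.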

ClinVar

ClinVar aggregates information about genomic variation and its relationship to human health.

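ClinVar is a star-based database (records carry a star rating for their review status) that documents the relationship between genomic variation and human health. In other words: if a certain position in one of your genes is mutated, there is, say, an 80% chance that you are red-green colorblind. The data is submitted voluntarily, though, so it is quite sparse.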

bionode xxx

You have bionode installed by now, right? Then you can use it as a command-line tool and run the example:

$ bionode ncbi search genome solenopsis invicta
{"uid":"2938","organism_name":"Solenopsis invicta","organism_kingdom":"Eukaryota","organism_group":"","organism_subgroup":"Insects","defline":"Solenopsis invicta overview","projectid":49663,"project_accession":"PRJNA49663","status":"Draft","number_of_chromosomes":"0","number_of_plasmids":"0","number_of_organelles":"1","assembly_name":"Si_gnG","assembly_accession":"GCA_000188075.1","assemblyid":244018,"create_date":"2011/02/03 00:00","options":"","weight":"","chromosome_assemblies":"0","scaffold_assemblies":"1","sra_genomes":"0","taxid":13686}

The return value is plain JSON. Nice for JS developers, right? Wait, what does this even mean... solenopsis invicta???
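Since each result is a single JSON line, picking it apart from JavaScript takes only a few lines. The sketch below parses an abridged copy of the record above and converts the count fields, which arrive as strings, into numbers. The `summarize` helper and the shape of its return value are invented for this example.

```javascript
// Abridged copy of the Solenopsis invicta record shown above.
const line = JSON.stringify({
  uid: '2938',
  organism_name: 'Solenopsis invicta',
  status: 'Draft',
  number_of_chromosomes: '0',
  scaffold_assemblies: '1',
  assembly_accession: 'GCA_000188075.1',
  taxid: 13686,
});

// ASSUMPTION: summarize is an illustrative helper, not part of Bionode.
function summarize(jsonLine) {
  const r = JSON.parse(jsonLine);
  return {
    organism: r.organism_name,
    status: r.status,
    // Counts are strings in the output, so convert them explicitly.
    chromosomes: Number(r.number_of_chromosomes),
    scaffolds: Number(r.scaffold_assemblies),
    accession: r.assembly_accession,
  };
}

const summary = summarize(line);
console.log(summary);
```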


So it looks like we just GET the genome information of this critter, the red imported fire ant.


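What is going on with it having no chromosomes... Let me search for humans instead... bionode ncbi search genome human...

And then tragedy struck... it... is... a full-text search, and then some... even anthrax turned up: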

{"organism_name":"Bacillus anthracis","projectid":12333,"project_accession":"PRJNA12333","status":"Complete","number_of_chromosomes":"1",...}
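One way to cope with such loose matching is to filter the JSON lines on the client side by exact organism name. The sketch below does that over two abridged records of the kind this post shows. The helper name `filterByOrganism` is an assumption for illustration, and narrowing the search term itself is of course the other option.

```javascript
// Abridged records: one unwanted hit and one wanted hit.
const results = [
  '{"organism_name":"Bacillus anthracis","project_accession":"PRJNA12333","status":"Complete"}',
  '{"organism_name":"Homo sapiens","project_accession":"PRJNA9558","status":"Complete"}',
];

// Keep only the records whose organism_name matches exactly.
// ASSUMPTION: filterByOrganism is an illustrative helper, not a Bionode API.
function filterByOrganism(jsonLines, organismName) {
  return jsonLines
    .map((jsonLine) => JSON.parse(jsonLine))
    .filter((record) => record.organism_name === organismName);
}

const humans = filterByOrganism(results, 'Homo sapiens');
console.log(humans.map((record) => record.project_accession));
```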

Fine, let's look only for the species we can actually fall in love with: Homo sapiens.


{"uid":"51","organism_name":"Homo sapiens","organism_kingdom":"Eukaryota","organism_group":"","organism_subgroup":"Mammals","defline":"Human genome projects have generated an unprecedented amount of knowledge about human genetics and health.","projectid":9558,"project_accession":"PRJNA9558","status":"Complete","number_of_chromosomes":"48","number_of_plasmids":"0","number_of_organelles":"1","assembly_name":"GRCh38.p8","assembly_accession":"GCA_000001405.23","assemblyid":763971,"create_date":"2001/02/15 00:00","options":"","weight":1000,"chromosome_assemblies":"10","scaffold_assemblies":"31","sra_genomes":"0","taxid":9606}

Argh, call me an idiot. Why is number_of_chromosomes 48? Isn't it supposed to be 46?!

Let's leave it at that, I need a moment to calm down...

Reposted from: http://mbsnl.baihongyu.com/
