Bioinfo Intro
Bioinformatic Data Introduction
Here I provide data relevant to bioinformatics and biological sequence analysis in JSON format. These data conform to http://schema.bravais.net/bioJSON.schema, which is synonymous with http://schema.biojson.org/bioJSON.schema.
Goals
The goal of bioJSON is to assemble pragmatic biological data in a single place. The idea is to provide pure data in a format that is easily parsed by computers and easily assembled by human beings. By requiring that the data conform entirely to the bioJSON schema, bioJSON hopes to provide a mechanism for verifying the data, to make the data available to as many people as possible, and to make the data easy to navigate.
The success of wikis comes from their enabling many people to each make small contributions to a given branch of knowledge. Currently, no widespread mechanism exists for people to contribute pure data in a manner analogous to a wiki. Some attempts, such as IMEx, limit the potential for widespread contribution because (1) the knowledge domain is highly specific, and (2) the data format is exceedingly rigid and complex.
bioJSON hopes to encourage the submission and centralization of pragmatic and usable data by providing a means to contribute a wide range of data in a flexible and self-describing format.
JSON
The reasons for choosing JSON as the serialization format should be obvious. But, if you want an opinionated rant: XML is too verbose, complicated, and redundant, and is overkill for most types of data. At the other extreme is the ini file, which has its merits but is too limited for these purposes. JSON encoders and decoders are available for just about every platform and programming language. See the JSON page for details.
Note that I have ensured that these data structures are also valid YAML, which is another killer serialization format.
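As a rough illustration of how little code it takes to decode JSON, here is a minimal Python sketch using only the standard-library json module. The field names mirror the bioJSON schema on this page, but the record itself (its name, accession, and data values) is invented purely for illustration and is not real bioJSON data.

```python
import json

# Hypothetical record: field names follow the bioJSON schema,
# but every value here is made up for illustration.
text = """
{
  "name": "example_record",
  "description": "A made-up record used only for illustration",
  "accession": "EX0001",
  "version": 1,
  "revision": 0,
  "license": "CC0",
  "data": {"A": 1, "C": 2, "G": 3, "T": 4}
}
"""

# Decode the JSON text into ordinary Python dicts, strings, and integers.
record = json.loads(text)
print(record["name"], record["data"]["G"])

# Encode back to JSON text and decode again; the content is unchanged.
round_tripped = json.loads(json.dumps(record, sort_keys=True))
```

Most other languages offer an equivalent decode/encode pair, which is the practical point behind choosing JSON.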
bioJSON
The bioJSON schema is:
{ "description" : "A bioJSON data object",
  "type" : {
    "name" : { "type" : "string",
               "unique" : true },
    "description" : { "type" : "string" },
    "accession" : { "type" : "string" },
    "version" : { "type" : "integer" },
    "revision" : { "type" : "integer" },
    "license" : { "type" : "string" },
    "author" : { "type" : {"$ref":"person_identity"},
                 "optional" : true },
    "date" : { "type" : ["integer", "string", {"$ref":"date"}],
               "optional" : true },
    "copyright" : { "type" : "string",
                    "optional" : true },
    "references" : { "type" : [ [{"type":"string"},
                                 {"type": {"$ref":"reference"}}] ],
                     "optional" : true },
    "data" : { "type" : {} }
  }
}
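To make the required and optional fields concrete, here is a minimal Python sketch of a structural check. It is not a real schema validator: the function name check_biojson, the mapping of schema types to Python types, and the reading that every field without "optional" : true is required are all assumptions made for illustration, and the example record is invented.

```python
# Field names and optional flags are taken from the schema above.
# The Python type mapping and the function name are assumptions.
REQUIRED = {
    "name": str,
    "description": str,
    "accession": str,
    "version": int,
    "revision": int,
    "license": str,
}
OPTIONAL = ("author", "date", "copyright", "references")


def check_biojson(obj):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for field, expected in REQUIRED.items():
        if field not in obj:
            problems.append(f"missing required field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(obj[field], bool) or not isinstance(obj[field], expected):
            problems.append(f"wrong type for field: {field}")
    # "data" may hold any structure, so only its presence is checked.
    if "data" not in obj:
        problems.append("missing required field: data")
    return problems


# Hypothetical record with invented values.
example = {
    "name": "example_record",
    "description": "A made-up record used only for illustration",
    "accession": "EX0001",
    "version": 1,
    "revision": 0,
    "license": "CC0",
    "data": {"A": 1, "C": 2, "G": 3, "T": 4},
}
print(check_biojson(example))
```

The optional fields are listed but not checked here, since their types depend on referenced definitions (person_identity, date, reference) that this sketch does not model.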
In the above schema,