A more detailed description of genome coordinate systems can be found in my earlier post. Note that in this table, I consider zero-based, half-open coordinate conventions to be equivalent to space-counted, zero-start and so do not distinguish them in the table.
Name | Resource Type | Chromosome Naming | 0- vs 1-Based | Space vs Base | Notes |
---|---|---|---|---|---|
UCSC Genome Browser | Genome Browser | chr1, chr2, .. chrX, chrY, chrM | 1 | Base | Note that when zooming in on the genome browser, the positioning of the tick marks appears to use a space-counted, 0-start convention. As discussed here, this is not the intention. To read the 1-based position of a base, use the tick mark to its immediate right. |
NCBI Map Viewer | Genome Browser | 1, 2, .. X, Y, MT | 1 | Base | |
Ensembl Location Viewer | Genome Browser | 1, 2, .. X, Y, MT | 1 | Base | |
UCSC BLAT | Web Tool | chr1, chr2, .. chrX, chrY, chrM | 1 | Base | |
NCBI BLAST | Web Tool | chr1, chr2, .. chrX, chrY, chrMT | 1 | Base | |
UCSC Table Browser | Web Tool | chr1, chr2, .. chrX, chrY, chrM | 0 | Space | All output formats use 0-space coordinates. However, the region field used to specify a position expects 1-base coordinates. |
BED Format | File Format | chr1, chr2, .. chrX, chrY, chrM | 0 | Space | |
WIG Format | File Format | chr1, chr2, .. chrX, chrY, chrM | 1 | Base | UCSC's Wiggle Track Format |
Galaxy Interval Format | File Format | chr1, chr2, .. chrX, chrY, chrM | 0 | Space | |
GFF/GTF/GFF3 Format | File Format | Depends on context | 1 | Base | Chromosome names depend on resource using the file format |
VCF Format | File Format | Depends on context | 1 | Base | The chromosome field only needs to match an identifier in the reference genome, or it can be a contig ID. Positions 0 and N+1 (where N = chromosome length) are used to refer to telomeres. |
UCSC Annotation Files | Data File | chr1, chr2, .. chrX, chrY, chrM | 0 | Space | |
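To make the table concrete, here is a minimal sketch of moving an interval and a chromosome name between the two conventions listed above. The function names are my own hypothetical helpers, not part of any UCSC, NCBI, or Ensembl tool. The name mapping only covers the main chromosomes shown in the table and assumes that the only differences are the `chr` prefix and `M` vs `MT`.

```python
from typing import Tuple


def zero_space_to_one_base(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open interval [start, end) to 1-based, closed."""
    if start < 0 or end <= start:
        # An empty interval (start == end) has no 1-based, closed equivalent.
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


def one_base_to_zero_space(start: int, end: int) -> Tuple[int, int]:
    """Convert a 1-based, closed interval [start, end] to 0-based, half-open."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def _bare_name(name: str) -> str:
    """Strip a leading 'chr' prefix if present (hypothetical helper)."""
    return name[3:] if name.startswith("chr") else name


def to_ucsc_chrom(name: str) -> str:
    """Map '1', 'X', 'MT' style names to 'chr1', 'chrX', 'chrM'."""
    bare = _bare_name(name)
    return "chrM" if bare in ("M", "MT") else "chr" + bare


def to_ensembl_chrom(name: str) -> str:
    """Map 'chr1', 'chrX', 'chrM' style names to '1', 'X', 'MT'."""
    bare = _bare_name(name)
    return "MT" if bare in ("M", "MT") else bare
```

For example, a BED-style record on `chr1` with start 0 and end 10 describes the same ten bases as positions 1 through 10 on chromosome `1` in a 1-base resource. Only the start changes between the two systems; the end value stays the same.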
5 comments:
It's important to clarify that UCSC uses 0-based coordinates for the underlying data in their databases (e.g. downloads, MySQL, Table Browser) as well as data you submit via custom tracks. It is only the genome browser on the web that uses (visually) 1-based coordinates.
Hi Casey,
Thanks for the feedback! Definitely worth calling this out as UCSC juggles between the two systems for BLAT results, the browser, BED files, underlying data, etc. I've tried to mention each of these cases here and I added one for the annotation files because of your suggestion.
Regarding your last point, I'm not sure I agree. My take is that UCSC, NCBI and Ensembl are all using a 1-based visualization. For example, this zoom in at Ensembl appears to use a 1-based, base-numbered system. Indeed, the number of bases shown corresponds to the stop - start + 1 you'd expect for this system.
Also, kudos again on your great post on genome coordinate systems!
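The length check mentioned in this reply can be sketched as follows. The function names and the example coordinates are hypothetical and chosen only for illustration. The point is that the same stretch of bases gives the same length under either convention, as long as the matching formula is used.

```python
def length_one_base(start: int, stop: int) -> int:
    """Number of bases in a 1-based, closed interval [start, stop]."""
    return stop - start + 1


def length_zero_space(start: int, end: int) -> int:
    """Number of bases in a 0-based, half-open interval [start, end)."""
    return end - start


# The same ten bases written in each convention (made-up coordinates).
one_base_region = (101, 110)
zero_space_region = (100, 110)

same_length = (
    length_one_base(*one_base_region) == length_zero_space(*zero_space_region)
)
```

If a viewer shows stop - start + 1 bases for the region you requested, it is behaving like a 1-based, base-numbered system.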
To make things not too easy, UCSC wiggle files are 1-based:
http://genome.ucsc.edu/goldenPath/help/wiggle.html
Nice catch, Max. I've added it to the table.
Good morning
I think that your blog is very nice! The content is quite useful.
Keep up the outstanding posts.
regards,