anzaika/parse_flybase_annotation

Purpose
=======

Parses the FlyBase gene annotation format.

Source Data
===========

The input is a multiline text file in which each line has 11 tab-separated columns.

 0  | unknown      | 585             |
 1  | mrna_id      | CG11023-RA      |
 2  | chromosome   | chr2L           |
 3  | strand       | +               |
 4  | mrna_start   | 7528            | \ -- with UTRs
 5  | mrna_stop    | 9491            | /
 6  | mrna_c_start | 7679            | \ -- without UTRs
 7  | mrna_c_stop  | 9276            | /
 8  | seg_count    | 3               |
 9  | seg_starts   | 7528,8228,8667, |
 10 | seg_stops    | 8116,8589,9491, |

gene_id can be inferred from mrna_id: CG11023-RA --> 11023.
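
As a rough illustration of the column layout above, one line could be split into named fields like this. This is a minimal sketch and not necessarily how the library does it: the helper `parse_annotation_line`, the constants and the symbol keys are hypothetical names, and the sample line is assembled from the example values in the table.

```ruby
# Column names in file order (index 0 is documented as "unknown").
COLUMNS = %i[
  unknown mrna_id chromosome strand
  mrna_start mrna_stop mrna_c_start mrna_c_stop
  seg_count seg_starts seg_stops
].freeze

INTEGER_COLUMNS = %i[mrna_start mrna_stop mrna_c_start mrna_c_stop seg_count].freeze
LIST_COLUMNS    = %i[seg_starts seg_stops].freeze

# Hypothetical helper: turns one tab-separated line into a Hash.
def parse_annotation_line(line)
  fields = line.chomp.split("\t")
  unless fields.size == COLUMNS.size
    raise ArgumentError, "expected #{COLUMNS.size} columns, got #{fields.size}"
  end

  record = COLUMNS.zip(fields).to_h
  INTEGER_COLUMNS.each { |key| record[key] = Integer(record[key], 10) }
  # "7528,8228,8667," ends with a comma; String#split drops the empty tail.
  LIST_COLUMNS.each do |key|
    record[key] = record[key].split(',').map { |value| Integer(value, 10) }
  end
  # CG11023-RA --> "11023": the digits that follow the leading letters.
  record[:gene_id] = record[:mrna_id][/\A[A-Za-z]+(\d+)/, 1]
  record
end

line = "585\tCG11023-RA\tchr2L\t+\t7528\t9491\t7679\t9276\t3\t" \
       "7528,8228,8667,\t8116,8589,9491,"
record = parse_annotation_line(line)
puts record[:gene_id] # 11023
```

The gene_id is kept as a String here to match the mrna blueprint in the Output section.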


Output
======

The parser returns two collections: mrnas and segments. Segments have their UTRs cut out.
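
One way to picture "UTRs cut out" is to clip each segment to the range between mrna_c_start and mrna_c_stop. The sketch below assumes exactly that, and also assumes that segments lying entirely inside a UTR are dropped. `trim_utrs` is a hypothetical helper, not part of the documented API, and the numbers are the example values from the Source Data table.

```ruby
# Hypothetical helper: clip [start, stop] pairs to the coding region.
# Assumption: a segment that lies entirely inside a UTR is discarded.
def trim_utrs(seg_starts, seg_stops, c_start, c_stop)
  clipped = seg_starts.zip(seg_stops).map do |start, stop|
    new_start = [start, c_start].max
    new_stop  = [stop, c_stop].min
    [new_start, new_stop] if new_start < new_stop
  end
  clipped.compact
end

trimmed = trim_utrs([7528, 8228, 8667], [8116, 8589, 9491], 7679, 9276)
p trimmed # [[7679, 8116], [8228, 8589], [8667, 9276]]
```

Only the first and last segments change in this example, because only they overlap the UTRs.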

Blueprints:

mrna:

* mrna_id     (String)
* gene_id     (String)
* strand      (String)  -- '+' or '-'
* chromosome  (String)
* segments    (Array[Array])  -- coordinates of segments, e.g.: [[1,2],[3,4]]
                                 sorted 5' --> 3'

segment:

* start      (Fixnum)
* stop       (Fixnum)
* chromosome (String)
* mrnas      (Array[String]) -- array of mrna_ids
* splicing   (String)        -- 'const' or 'alt'
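
This README does not say how 'const' and 'alt' are assigned. One plausible reading, used in the sketch below purely as an assumption, is that a segment is 'const' when every mRNA of its gene contains it and 'alt' otherwise. `build_segments` is a hypothetical helper, and the two mRNA records are made-up toy data shaped like the mrna blueprint.

```ruby
# Toy input (not real annotation data), shaped like the mrna blueprint.
mrnas = [
  { 'mrna_id' => 'CG1-RA', 'gene_id' => '1', 'strand' => '+',
    'chromosome' => 'chr2L', 'segments' => [[10, 20], [30, 40]] },
  { 'mrna_id' => 'CG1-RB', 'gene_id' => '1', 'strand' => '+',
    'chromosome' => 'chr2L', 'segments' => [[10, 20], [50, 60]] }
]

# Hypothetical helper: merge identical segments and label their splicing.
# Assumption: 'const' means "present in every mRNA of the gene".
def build_segments(mrnas)
  mrna_count_per_gene = Hash.new(0)
  gene_of = {}
  mrnas.each do |mrna|
    mrna_count_per_gene[mrna['gene_id']] += 1
    gene_of[mrna['mrna_id']] = mrna['gene_id']
  end

  # Segments with the same chromosome, start and stop are one entry.
  segments = {}
  mrnas.each do |mrna|
    mrna['segments'].each do |start, stop|
      key = [mrna['chromosome'], start, stop]
      segments[key] ||= { 'start' => start, 'stop' => stop,
                          'chromosome' => mrna['chromosome'], 'mrnas' => [] }
      segments[key]['mrnas'] << mrna['mrna_id']
    end
  end

  segments.values.each do |segment|
    gene_id = gene_of[segment['mrnas'].first]
    shared_by_all = segment['mrnas'].uniq.size == mrna_count_per_gene[gene_id]
    segment['splicing'] = shared_by_all ? 'const' : 'alt'
  end
end

segments = build_segments(mrnas)
segments.each { |segment| p segment }
```

With the toy input, the segment 10..20 is shared by both mRNAs and is labelled 'const', while 30..40 and 50..60 are each labelled 'alt'.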

Usage
=====

  require 'parse_flybase_annotation'

  result = ParseFlybaseAnnotation.parse('path_to_annotation_file')

  segments_collection = result['segments']
  mrnas_collection = result['mrnas']
