Andrew Olson edited this page Apr 17, 2018 · 1 revision

The maps collection holds the top-level assembly metadata for each genome. The information is extracted from the Ensembl Core databases. For example, this is the entry for Arabidopsis thaliana as of Gramene release 56:

  {
    "_id": "GCA_000001735.1",                   # the INSDC identifier
    "db": "arabidopsis_thaliana_core_56_91_11", # name of the mysql database
    "taxon_id": 3702,                           # NCBI taxonomy id
    "system_name": "arabidopsis_thaliana",      # name used in the ensembl browser
    "display_name": "Arabidopsis thaliana",     # nicely formatted name
    "type": "genome",                           # the type of map
    "length": 119667750,                        # total length of the map (doesn't include UNANCHORED pieces)
    "regions": {
      "names": [                                # the order of the top level pseudomolecules is given in the karyotype table
        "1",
        "2",
        "3",
        "4",
        "5",
        "Mt",
        "Pt"
      ],
      "lengths": [
        30427671,
        19698289,
        23459830,
        18585056,
        26975502,
        366924,
        154478
      ]
    },
    "num_genes": 34262                          # total number of genes annotated on this genome
  }
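As a minimal sketch of how an entry like this can be read, the Python below pairs each region name with its length and totals them. The `region_lengths` helper is hypothetical and the dictionary is a trimmed copy of the entry above, not something fetched from the database. For this entry the region lengths add up to `length`; whether that holds for every genome is an assumption not confirmed here.

```python
# Trimmed copy of the Arabidopsis thaliana entry shown above.
entry = {
    "_id": "GCA_000001735.1",
    "length": 119667750,
    "regions": {
        "names": ["1", "2", "3", "4", "5", "Mt", "Pt"],
        "lengths": [30427671, 19698289, 23459830, 18585056,
                    26975502, 366924, 154478],
    },
}


def region_lengths(doc):
    """Map each region name to its length (hypothetical helper).

    Relies on "names" and "lengths" being parallel arrays, as in the entry above.
    """
    regions = doc["regions"]
    return dict(zip(regions["names"], regions["lengths"]))


by_name = region_lengths(entry)
total = sum(by_name.values())
```

Here `by_name["Mt"]` gives the length of the mitochondrial region, and `total` can be compared against `entry["length"]` as a consistency check.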

The map information is used by gramene-bins-client (https://github.com/warelab/gramene-bins-client) to define bins for indexing annotations.
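To illustrate the idea of defining bins from a map, here is a rough sketch that cuts each region into fixed-width bins and numbers them in region order. It is not the gramene-bins-client API: the `uniform_bins` function, the 1 Mb bin size, and the 1-based inclusive coordinates are all assumptions made for this example.

```python
def uniform_bins(names, lengths, bin_size):
    """Yield (bin_index, region, start, end) tuples.

    Hypothetical helper: coordinates are assumed 1-based and inclusive,
    and the last bin of each region is truncated to the region length.
    """
    idx = 0
    for name, length in zip(names, lengths):
        for start in range(1, length + 1, bin_size):
            end = min(start + bin_size - 1, length)
            yield idx, name, start, end
            idx += 1


# Region names and lengths copied from the Arabidopsis thaliana entry.
names = ["1", "2", "3", "4", "5", "Mt", "Pt"]
lengths = [30427671, 19698289, 23459830, 18585056,
           26975502, 366924, 154478]

bins = list(uniform_bins(names, lengths, 1_000_000))
```

Because the bin index runs continuously across regions, an annotation's region and position are enough to assign it a single integer bin, which is one plausible way such bins could serve as an index key.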
