Skip to content

Latest commit

 

History

History
68 lines (48 loc) · 4.92 KB

database.md

File metadata and controls

68 lines (48 loc) · 4.92 KB

Database

The structure of the database looks like this:

Test

General structure

The database is structured in a way so the data is normalized as much as possible. Indexes are used to speed up queries on large tables like the Concepts. GORM is used to access the Database so the definition of the tables as well as the relations between them are defined in the GORM Models.

The CodeSystem table defines the different codeSystems that can be used in the project. The Concept table holds the individual codes of a CodeSystem. The Project table defines a central entity in the database. A project can have multiple ProjectPermissions which define the access rights of a User to a project. A Project contains Mappings which describe an n:m relationship of codes from one codeSystem to others. The n:m Mappings are internally represented in the database using the Element table which maps a concept to a specific code_system_role within a Mapping. Which kind of Concepts of a CodeSystem are in a relationship to other Concepts of another CodeSystems is described in the CodeSystemRole table. They are defined individually for each project and each can have a type of source or destination and an association to one specific CodeSystem.

Indexes

Indexes are created to allow efficient searches on the database. In most cases default B-Tree indexes are used.

  • codeSystemID in Concept: When providing a suggestion Concept for the user while mapping, only concepts in the current codeSystem are relevant so it has to be searched for them
  • projectID in Mapping: To show all mappings for a project this index is used
  • projectID in code_system_role: When requesting project details the CodeSystemRoles for the project need to be queried
  • projectID in project_permission: When requesting project details, the Project Permission of all users for the project need to be queried
  • mappingID in element: To get all mappings of a project, all elements of a mapping have to be queried

The Concepts table contains a huge amount of elements as CodeSystems like Loinc can have over 50.000 elements. Therefore efficiently searching in this table by utilizing indexes is crucial. Concepts have the fields code and display which are searchable.

code is usually a number or a lexical identifier. When a user searches, the values must start with the specified code and can end with anything (--> Prefix search). It should also be case insensitive so it doesn't matter if lexical parts are upper- or lowercase. The search is implemented in Go as follows (see conceptQuery.go):

query.Where("LOWER(code) ILIKE ?", strings.ToLower(code)+"%")

For the concept table, the following index is created (see concept.go). To get more information why this index was chosen, see Postgres: Operator Classes and Operator Families.

func CreateConceptIndex(db *gorm.DB) error {
	return db.Exec("CREATE INDEX IF NOT EXISTS code_idx ON concepts (LOWER(code) text_pattern_ops);").Error
}

For searching the meaning, the tsvector is used. The implementation can be found in concept.go.

type Concept struct {
	ID                  uint64 `gorm:"primarykey"`
	Code                string
	Display             string
	CodeSystemID        uint32 `gorm:"index"`
	Elements            []Element
	CodeSystem          CodeSystem
	DisplaySearchVector string `gorm:"type:tsvector GENERATED ALWAYS AS (to_tsvector('english', display)) STORED ;index:,type:gin"`
}

// DisplaySearchVector is AutoGenerated and should not be updated
func (c *Concept) BeforeCreate(tx *gorm.DB) (err error) {
	tx.Statement.Omit("DisplaySearchVector")
	return
}

The DisplaySearchVector is important here. It is a autogenerated column and it saves the tsvector of the display field. The tsvector is used to search for partial matches in the meaning field. It is important to add the Omit("DisplaySearchVector") in the BeforeCreate function to prevent the DisplaySearchVector from being updated by Gorm (Postgres itself handles updating this field). More information on how Full-Text Search works in Postgres can be found in this article.

The search is implemented in go as follows (see conceptQuery.go):

formattedMeaning := strings.Join(strings.Fields(meaning), ":* & ") + ":*" // Adjust for partial matches
query = query.Where("display_search_vector @@ to_tsquery('english', ?)", formattedMeaning)

Currently, the language of the tsvector is hardcoded to english. This should be improved in the future so for each CodeSystem the language of the tsvector can be set.