The structure of the database looks like this:
The database is structured in a way so the data is normalized as much as possible. Indexes are used to speed up queries on large tables like the Concepts. GORM
is used to access the Database so the definition of the tables as well as the relations between them are defined in the GORM
Models.
The CodeSystem
table defines the different codeSystems that can be used in the project. The Concept
table holds the individual codes of a CodeSystem
. The Project
table defines a central entity in the database. A project can have multiple ProjectPermissions
which define the access rights of a User
to a project. A Project
contains Mappings
which describe an n:m
relationship of codes from one codeSystem to others. The n:m
Mappings are internally represented in the database using the Element
table which maps a concept to a specific code_system_role within a Mapping
. Which kind of Concepts
of a CodeSystem
are in a relationship to other Concepts
of another CodeSystems
is described in the CodeSystemRole
table. They are defined individually for each project and each can have a type of source
or destination
and an association to one specific CodeSystem
.
Indexes are created to allow efficient searches on the database. In most cases default B-Tree
indexes are used.
codeSystemID
inConcept
: When providing a suggestionConcept
for the user while mapping, only concepts in the current codeSystem are relevant so it has to be searched for themprojectID
inMapping
: To show all mappings for a project this index is usedprojectID
incode_system_role
: When requesting project details the CodeSystemRoles for the project need to be queriedprojectID
inproject_permission
: When requesting project details, the Project Permission of all users for the project need to be queriedmappingID
inelement
: To get all mappings of a project, all elements of a mapping have to be queried
The Concepts
table contains a huge amount of elements as CodeSystems like Loinc can have over 50.000
elements. Therefore efficiently searching in this table by utilizing indexes is crucial. Concepts have the fields code
and display
which are searchable.
code
is usually a number or a lexical identifier. When a user searches, the values must start with the specified code and can end with anything (--> Prefix search). It should also be case insensitive so it doesn't matter if lexical parts are upper- or lowercase. The search is implemented in Go
as follows (see conceptQuery.go):
query.Where("LOWER(code) ILIKE ?", strings.ToLower(code)+"%")
For the concept table, the following index is created (see concept.go). To get more information why this index was chosen, see Postgres: Operator Classes and Operator Families.
func CreateConceptIndex(db *gorm.DB) error {
return db.Exec("CREATE INDEX IF NOT EXISTS code_idx ON concepts (LOWER(code) text_pattern_ops);").Error
}
For searching the meaning
, the tsvector
is used. The implementation can be found in concept.go.
type Concept struct {
ID uint64 `gorm:"primarykey"`
Code string
Display string
CodeSystemID uint32 `gorm:"index"`
Elements []Element
CodeSystem CodeSystem
DisplaySearchVector string `gorm:"type:tsvector GENERATED ALWAYS AS (to_tsvector('english', display)) STORED ;index:,type:gin"`
}
// DisplaySearchVector is AutoGenerated and should not be updated
func (c *Concept) BeforeCreate(tx *gorm.DB) (err error) {
tx.Statement.Omit("DisplaySearchVector")
return
}
The DisplaySearchVector
is important here. It is a autogenerated column and it saves the tsvector
of the display
field. The tsvector
is used to search for partial matches in the meaning
field. It is important to add the Omit("DisplaySearchVector")
in the BeforeCreate
function to prevent the DisplaySearchVector
from being updated by Gorm (Postgres itself handles updating this field). More information on how Full-Text Search works in Postgres can be found in this article.
The search is implemented in go as follows (see conceptQuery.go):
formattedMeaning := strings.Join(strings.Fields(meaning), ":* & ") + ":*" // Adjust for partial matches
query = query.Where("display_search_vector @@ to_tsquery('english', ?)", formattedMeaning)
Currently, the language of the tsvector
is hardcoded to english
. This should be improved in the future so for each CodeSystem
the language of the tsvector
can be set.