export createConceptCountsTable() #1067

cebarboza · 2023-08-08T14:38:48Z

At Darwin EU we are trying to run CohortDiagnostics in a more efficient way when running it for multiple studies. We believe it would be a good enhancement to export the function createConceptCountsTable() to generate the concept_counts before executing the diagnostics.

By doing this, we can perform this calculation just once for a specific vocabulary_version, instead of repeating this process for each study.

In our fork darwin-eu-dev/CohortDiagnostics, the user can createConceptCountsTable(). This table is saved in the cohortDatabaseSchema.

CohortDiagnostics::createConceptCountsTable(connectionDetails = connectionDetails,
                                            cdmDatabaseSchema = cdmDatabaseSchema,
                                            conceptCountsDatabaseSchema = cohortDatabaseSchema,
                                            conceptCountsTable = "concept_counts",
                                            removeCurrentTable = TRUE)

Then we use the parameter useExternalConceptCountsTable in executeDiagnostics(). If TRUE, executeDiagnostics() uses the concept_counts created previously in the cohortDatabaseSchema. The user should specify the name of the external concept counts table, generally concept_counts

CohortDiagnostics::executeDiagnostics(cohortDefinitionSet =  cohortDefinitionSet,
                                      connectionDetails = connectionDetails,
                                      cohortTable = cohortTable,
                                      cohortDatabaseSchema = cohortDatabaseSchema,
                                      cdmDatabaseSchema = cdmDatabaseSchema,
                                      conceptCountsTable = "concept_counts",
                                      exportFolder = exportFolder,
                                      databaseId = "Eunomia",
                                      minCellCount = 5,
                                      useExternalConceptCountsTable = FALSE)

We also modified the CreateConceptCountTable.sql file, to add a new column with the vocabulary_version.

https://github.com/darwin-eu-dev/CohortDiagnostics/blob/ca6d9074bb097b9ce60b7bc6bf72e68a84f650fe/inst/sql/sql_server/CreateConceptCountTable.sql#L102C1-L106C2

{@table_is_temp} ? {} : { 
ALTER TABLE @work_database_schema.@concept_counts_table
ADD vocabulary_version VARCHAR(20) NULL;
UPDATE @work_database_schema.@concept_counts_table SET vocabulary_version = (SELECT vocabulary_version FROM @cdm_database_schema.vocabulary WHERE vocabulary_id = 'None');
}

Then, there are checks in place that evaluate if the vocabulary_version in the concept_counts table is equal to the version of the database the user is running the diagnostics.

https://github.com/darwin-eu-dev/CohortDiagnostics/blob/ca6d9074bb097b9ce60b7bc6bf72e68a84f650fe/R/RunDiagnostics.R#L679C1-L708C4

 # Defines variables and checks version of external concept counts table -----
  if (useExternalConceptCountsTable == FALSE) {
    conceptCountsTableIsTemp <- TRUE
    if (conceptCountsTable != "#concept_counts") {
      conceptCountsTable <- "#concept_counts"
    }
  } else {
    if (conceptCountsTable == "#concept_counts") {
      stop("Temporary conceptCountsTable name. Please provide a valid external ConceptCountsTable name")
    }
    conceptCountsTableIsTemp <- FALSE
    conceptCountsTable <- conceptCountsTable
    dataSourceInfo <- getCdmDataSourceInformation(connection = connection, cdmDatabaseSchema = cdmDatabaseSchema)
    vocabVersion <- dataSourceInfo$vocabularyVersion
    vocabVersionExternalConceptCountsTable <- DatabaseConnector::renderTranslateQuerySql(
      connection = connection,
      sql = "SELECT DISTINCT vocabulary_version FROM @work_database_schema.@concept_counts_table;",
      work_database_schema = cohortDatabaseSchema,
      concept_counts_table = conceptCountsTable,
      snakeCaseToCamelCase = TRUE,
      tempEmulationSchema = getOption("sqlRenderTempEmulationSchena")
    )
    if (!identical(vocabVersion, vocabVersionExternalConceptCountsTable[1,1])) {
      stop(paste0("External concept counts table (", 
                 vocabVersionExternalConceptCountsTable, 
                 ") does not match database (", 
                 vocabVersion, 
                 "). Update concept_counts with createConceptCountsTable()"))
    }
  }

There's also a vignette to explain how to run this functions UseExternalConceptTable.Rmd. We have been testing this approach but we wanted to discuss this before sending a pull request.

The text was updated successfully, but these errors were encountered:

azimov · 2023-09-19T19:43:14Z

Sorry for the lateness in reply I missed this issue when you first raised it.

This approach seems reasonable, but I wonder if the solution would be better included inside CohortGenerator as an extra step following cohort creation? It seems like this solution might be generally more useful than just for cohort diagnostics - tagging @anthonysena for visibility.

However, I'm happy to look at adopting this change into the main code. Just submit a PR and I will gladly test it out.

cebarboza · 2023-09-21T13:23:32Z

No worries and thanks for the response. I'll send you the PR as soon as possible.

I would argue that the function createConceptCountsTable() is already part of CohortDiagnostics' workflow.

We also used the variable useExternalConceptCountsTable inside CD, so it was more natural for us to develop this, avoiding any major modification.

katy-sadowski · 2024-04-26T20:08:49Z

Did anything ever come out of this? If not, I'd be willing to help out.

I'm curious why useExternalConceptCountsTable is no longer supported? Also wondering if we ever considered providing the option to leverage ACHILLES concept counts table here?

azimov · 2024-05-02T17:24:39Z

Did anything ever come out of this? If not, I'd be willing to help out.

I'm curious why useExternalConceptCountsTable is no longer supported? Also wondering if we ever considered providing the option to leverage ACHILLES concept counts table here?

This was a fork that I never got a PR for. I expect that this could be useful, especially in contexts where the Achilles counts are there. If you're interested in following through I will gladly work to get the PR included.

cebarboza · 2024-06-12T09:34:09Z

Hi @azimov I sent the PR for exporting the function to useExternalConceptCountsTable.

ablack3 · 2024-06-12T09:54:06Z

Also wondering if we ever considered providing the option to leverage ACHILLES concept counts table here?

Good idea @katy-sadowski!

@cebarboza and I started working on the idea to speed up CohortDiagnstics by using the pre-computed concept counts in Achilles.

So instead of building the concepts counts table CohortDiagnostics would have the option to use the data that is already available in Achilles.

Here is a SQL query that would pull the data for

-- note this is postgresql sql

-- Achilles analysis ids used for the concept counts table
-- condition: 400 (persons), 401 (standard concepts), 425 (source concepts)
-- drug: 700 (persons), 701 (standard concepts), 725 (source concepts)
-- procedure: 600 (persons), 601 (standard concepts), 625 (source concepts)
-- measurement: 1800 (persons), 1801 (standard concept), 1825 (source concepts)
-- observation: 800 (persons), 801 (standard concepts), 825 (source concepts)

SELECT
	q1.concept_id, 
	concept_count, 
	concept_subjects
INTO @concept_counts_table
FROM (
	SELECT 
		CAST(stratum_1 as int) as concept_id,
		count_value as concept_count
	FROM results.achilles_results
	WHERE analysis_id IN (401,601,701,801,1801,425,625,725,825,1825) AND stratum_1 != '0'
) q1
LEFT JOIN 
(
	SELECT 
		CAST(stratum_1 as INT) as concept_id,
		count_value as concept_subjects
	FROM results.achilles_results
	WHERE analysis_id IN (400,600,700,800,1800) AND stratum_1 != '0'
) q2
ON q1.concept_id = q2.concept_id

I ran this on IPCI and the CohortDiagnostics sql script on IPCI and compared the results.

df is created by the sql script in cohort diagnostics and df2 is coming from the sql pasted above.

Note there is a huge difference in the number of rows. This is likely because a lot of data in IPCI falls outside the observation period. Achilles does not count concepts outside observation period. Cohort diagnostics does include concepts outside observation period.

Another difference is the inclusion of concept id 0. this concept id occurs in multiple cdm tables and so is repeated in the concept table created by cohort diagnostics. I removed these rows. In the table created by cohort diagnostics there is no way to tell which domain the rows with concept id 0 came from.

@azimov, @katy-sadowski what do you think about adding achilles as an option for getting concept counts in cohort diagnostics?

Also what do you think about making the counts consistent by excluding concepts outside observation period and removing concept id 0?

ablack3 · 2024-06-12T13:25:39Z

I created a pr for use of the achilles tables. Not tested yet but wanted to get some feedback on the approach first. #1120

ablack3 · 2024-09-16T12:56:49Z

Achilles has an optional table that has concept counts in a format that we could possibly use out of the box without any reformmating or copying.

The table is called achilles_result_concept_count and if the optimizeAchillesCache = TRUE then it will be created.

https://github.com/OHDSI/Achilles/blob/ea478a34179f52c13bb798abd41d7ad96d1b1fc2/R/Achilles.R#L145

https://github.com/OHDSI/Achilles/blob/main/inst/sql/sql_server/analyses/create_result_concept_table.sql

I'm not sure yet if this is exactly what we want because I would need to better understand the sql script and the analyses that it is pulling from but it seems like we could possibly use this table as is for concept counts.

cebarboza mentioned this issue Jun 7, 2024

Use external concept counts #1117

Open

ablack3 mentioned this issue Jun 12, 2024

add option to use Achilles for concept counts #1120

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

export createConceptCountsTable() #1067

export createConceptCountsTable() #1067

cebarboza commented Aug 8, 2023

azimov commented Sep 19, 2023

cebarboza commented Sep 21, 2023

katy-sadowski commented Apr 26, 2024

azimov commented May 2, 2024

cebarboza commented Jun 12, 2024

ablack3 commented Jun 12, 2024 •

edited

Loading

ablack3 commented Jun 12, 2024 •

edited

Loading

ablack3 commented Sep 16, 2024

export createConceptCountsTable() #1067

export createConceptCountsTable() #1067

Comments

cebarboza commented Aug 8, 2023

azimov commented Sep 19, 2023

cebarboza commented Sep 21, 2023

katy-sadowski commented Apr 26, 2024

azimov commented May 2, 2024

cebarboza commented Jun 12, 2024

ablack3 commented Jun 12, 2024 • edited Loading

ablack3 commented Jun 12, 2024 • edited Loading

ablack3 commented Sep 16, 2024

ablack3 commented Jun 12, 2024 •

edited

Loading

ablack3 commented Jun 12, 2024 •

edited

Loading