eva-seqcol

This is a Java implementation of the sequence collection specification to represent INSDC assemblies. To learn more about the sequence collection specification, please refer to seqCol, seqcol-spec and/or the specification.

Briefly, the main issue that the seqcol-spec addresses is that central genome providers such as INSDC (e.g. NCBI, ENA), Ensembl or UCSC may agree on the sequences being used, but they often differ on how those sequences are named.

Project Goals

The main goals of this API are to provide:

  • A mechanism to ingest a sequence collection object into the database.
  • A mechanism to fetch/resolve a sequence collection object given its level 0 digest.
  • A mechanism to compare two sequence collection objects to understand their compatibility.
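
To give a feel for what a digest looks like: the seqCol specification builds its digests with the GA4GH sha512t24u function (SHA-512, truncated to the first 24 bytes, base64url-encoded). The sketch below is a minimal illustration of that function only. The class and method names are hypothetical and not taken from this repository, and the canonical JSON serialisation that the specification applies before hashing is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of eva-seqcol: illustrates the GA4GH sha512t24u digest.
class Sha512t24uSketch {

    // SHA-512 the UTF-8 bytes, keep the first 24 bytes, base64url-encode without padding.
    static String sha512t24u(String text) {
        try {
            MessageDigest sha512 = MessageDigest.getInstance("SHA-512");
            byte[] full = sha512.digest(text.getBytes(StandardCharsets.UTF_8));
            byte[] truncated = Arrays.copyOf(full, 24);
            return Base64.getUrlEncoder().withoutPadding().encodeToString(truncated);
        } catch (NoSuchAlgorithmException e) {
            // SHA-512 is a mandatory JDK algorithm, so this should not happen in practice.
            throw new IllegalStateException("SHA-512 is unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Digest of the empty string: z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
        System.out.println(sha512t24u(""));
    }
}
```

The result is always 32 URL-safe characters, which is why a digest can be placed directly in a request path.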

Important Workflows

SeqCol Ingestion Workflow

[Figure: SeqCol ingestion workflow diagram]

Data Model

After evaluating several data models, we agreed to use the following one:

[Figure: data model diagram]

Endpoints

Note: the seqCol service is currently deployed on server 45.88.81.158, on port 8081.

Exposed endpoints

  • PUT - SERVER_IP:PORT/eva/webservices/seqcol/admin/seqcols/{asm_accession}
  • GET - SERVER_IP:PORT/eva/webservices/seqcol/collection/{seqCol_digest}?level={level}
  • GET - SERVER_IP:PORT/eva/webservices/seqcol/comparison/{seqColA_digest}/{seqColB_digest}
  • POST - SERVER_IP:PORT/eva/webservices/seqcol/comparison/{seqColA_digest}; body = {level 2 JSON representation of another seqCol}
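
As a rough illustration of how a client might address the read-only endpoints above, the following sketch builds the corresponding requests with the JDK's java.net.http API (Java 11+). The class and method names are hypothetical, the http:// scheme is an assumption, and the authenticated PUT endpoint is deliberately left out because this README does not describe the authentication scheme.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side helper, not part of eva-seqcol.
class SeqColRequestsSketch {

    // Base path copied from the endpoint list; the scheme (http) is an assumption.
    static String baseUrl(String host, int port) {
        return "http://" + host + ":" + port + "/eva/webservices/seqcol";
    }

    // GET .../collection/{seqCol_digest}?level={level}
    static HttpRequest fetchCollection(String base, String digest, int level) {
        URI uri = URI.create(base + "/collection/" + digest + "?level=" + level);
        return HttpRequest.newBuilder(uri).GET().build();
    }

    // GET .../comparison/{seqColA_digest}/{seqColB_digest}
    static HttpRequest compareStored(String base, String digestA, String digestB) {
        URI uri = URI.create(base + "/comparison/" + digestA + "/" + digestB);
        return HttpRequest.newBuilder(uri).GET().build();
    }

    // POST .../comparison/{seqColA_digest} with a level 2 JSON seqCol as the body
    static HttpRequest compareWithBody(String base, String digestA, String level2Json) {
        URI uri = URI.create(base + "/comparison/" + digestA);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(level2Json))
                .build();
    }
}
```

A request built this way can be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). The Swagger page remains the reference for the exact parameters and response shapes.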

Usage and description

For detailed, user-friendly documentation of the API's endpoints, please visit the seqCol Swagger page.

Compile

This web service has some authenticated endpoints. The current approach to securing them is to provide the credentials in the src/main/resources/application.properties file at compilation time, using Maven profiles.

The application also needs a connection to an external database (PostgreSQL by default) to function. The credentials for this database must be provided at compilation time using the same Maven profiles.

You can edit the Maven profile values in pom.xml by locating the section below and either changing the values manually or setting the corresponding environment variables. Alternatively, you can make the changes directly in the application.properties file.

Use <ftp.proxy.host> and <ftp.proxy.port> to configure proxy settings for accessing FTP servers (such as NCBI's). Set them to null and 0 to avoid overriding the default proxy configuration.

Set a boolean flag using <contig-alias.scaffolds-enabled> to enable or disable parsing and storing of scaffolds in the database.

 <profiles>
	<profile>
		<id>seqcol</id>
		<properties>
			<spring.profiles.active>seqcol</spring.profiles.active>
			<seqcol.db-url>jdbc:postgresql://${env.SERVER_IP}:${env.POSTGRES_PORT}/seqcol_db</seqcol.db-url>
			<seqcol.db-username>${env.POSTGRES_USER}</seqcol.db-username>
			<seqcol.db-password>${env.POSTGRES_PASS}</seqcol.db-password>
			<seqcol.ddl-behaviour>${env.DDL_BEHAVIOUR}</seqcol.ddl-behaviour>
			<seqcol.admin-user>${env.ADMIN_USER}</seqcol.admin-user>
			<seqcol.admin-password>${env.ADMIN_PASSWORD}</seqcol.admin-password>
			<ftp.proxy.host>${optional default=null}</ftp.proxy.host>
			<ftp.proxy.port>${optional default=0}</ftp.proxy.port>
			<contig-alias.scaffolds-enabled>${optional default=false}</contig-alias.scaffolds-enabled>
		</properties>
		<activation>
			<activeByDefault>true</activeByDefault>
		</activation>
	</profile>
 </profiles>

Once that's done, you can trigger the variable replacement with Maven's -P option. For example, run mvn clean install -Pseqcol to compile the service and run its tests, or mvn clean install -Pseqcol -DskipTests to skip the tests.

You can then start the service with mvn spring-boot:run.

Technologies used

  • Spring Boot v2.7.13
  • PostgreSQL Database v15.2
  • Swagger v3 (springdoc-openapi implementation)

Useful Links