GitHubReader

Matus Backor edited this page Nov 25, 2019 · 4 revisions

Kentico Kontent Docs - GitHub Reader

Overview

GitHub Reader is a microservice that processes the code samples stored in GitHub and displayed on the Kentico Kontent Docs website. The service reacts to webhooks that GitHub sends when a code sample is added to, modified in, or removed from the Kentico Kontent Documentation - Code Samples repository. Its main responsibility is to extract the samples from the GitHub files and store them in Azure Blob storage, where they are picked up by another service, Code Samples Manager, which sends the code samples to Kentico Kontent.

Specification

Triggers

HTTP
Initialize

The initialize endpoint fetches and processes all of the code samples from GitHub. It is intended to be called by a user to make sure the code samples in the GitHub repository and the code samples in Kentico Kontent are consistent.

Update

The GitHub repository is set to send webhooks to the update endpoint when there is a change in the code samples. The GitHub Reader service then processes only the files which are included in the webhook's message body.

How it works

GitHub repository

The process begins when one or more commits are pushed to the Kentico Kontent Documentation - Code Samples repository. All code samples that are meant to be processed by the service need to be in a specific format. A code sample starts with a comment containing the section mark DocSection: followed by an identifier, which is used as part of a content item's codename in Kentico Kontent. The sample ends with a comment containing the section mark EndDocSection. An example code sample in JavaScript can look like this:

// DocSection: language_fallbacks_ignore
const KenticoKontent = require('kentico-kontent-delivery');

const deliveryClient = new KenticoKontent.DeliveryClient({
    projectId: '975bf280-fd91-488c-994c-2f04416e5ee3',
});

deliveryClient.items()
    .languageParameter('es-ES')
    .equalsFilter('system.language', 'es-ES')
    .getObservable()
    .subscribe(response => console.log(response.items));
// EndDocSection

Any content outside the DocSection and EndDocSection markers is ignored by the GitHub Reader service.
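The extraction described above can be sketched as follows. This is a minimal illustration, not the service's actual FileParser: the function name extractCodeSamples is hypothetical, and it assumes each marker sits on its own comment line and that sections are not nested.

```javascript
// Hypothetical helper: extracts labeled code samples from a file's text.
// Assumption: markers are on their own lines and sections are never nested.
function extractCodeSamples(fileContent) {
  const startMarker = /DocSection:\s*(\w+)/; // captures the sample identifier
  const endMarker = /EndDocSection/;
  const samples = [];
  let current = null; // the sample being collected, or null outside a section

  for (const line of fileContent.split(/\r?\n/)) {
    // Check the end marker first; it has no colon, so it never matches startMarker.
    if (endMarker.test(line)) {
      if (current) {
        samples.push({
          identifier: current.identifier,
          content: current.lines.join('\n'),
        });
        current = null;
      }
      continue;
    }
    const start = line.match(startMarker);
    if (start) {
      current = { identifier: start[1], lines: [] };
      continue;
    }
    if (current) {
      current.lines.push(line); // lines outside any section are ignored
    }
  }
  return samples;
}
```

Because the markers are matched anywhere on the line, the same sketch works for comment styles other than //, such as # in Ruby.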

GitHub Reader service

The GitHub Reader service receives the GitHub webhook through the update endpoint. The webhook contains the names of the files that were added, modified, or removed. The service first consolidates the file lists from all the commits included in the webhook. For example, when a file is added in the first commit and removed in the last commit, the service does not process it, and when a file is added and then modified, the service treats it as added only. After the filenames are grouped by operation, each group is processed separately.
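The consolidation step might look roughly like the sketch below. It relies on the shape of GitHub's push webhook payload, where each entry in commits carries added, modified, and removed arrays of file paths. The function name collapseFileChanges is hypothetical, and the handling of transitions the text does not mention (modified then removed, removed then re-added) is an assumption.

```javascript
// Hypothetical helper: folds the per-commit file lists of one push into a
// single net operation per file, then groups the paths by operation.
function collapseFileChanges(commits) {
  const state = new Map(); // file path -> 'added' | 'modified' | 'removed'

  for (const commit of commits) {
    for (const path of commit.added || []) {
      // Assumption: re-adding a file removed earlier in the push counts as modified.
      state.set(path, state.get(path) === 'removed' ? 'modified' : 'added');
    }
    for (const path of commit.modified || []) {
      // A file added earlier in the same push stays "added".
      if (state.get(path) !== 'added') state.set(path, 'modified');
    }
    for (const path of commit.removed || []) {
      if (state.get(path) === 'added') {
        state.delete(path); // added and removed in the same push: nothing to process
      } else {
        state.set(path, 'removed');
      }
    }
  }

  const grouped = { added: [], modified: [], removed: [] };
  for (const [path, operation] of state) grouped[operation].push(path);
  return grouped;
}
```

Tracking one state per path keeps the result independent of how many commits the push contains.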

Added files are fetched using GitHubClient and parsed by FileParser, which extracts all of the code samples marked with the special labels. A new CodeFile entity containing the file path and the code samples is then stored in Azure Table storage by the CodeFileRepository. The entity keeps a history of the file's content, so that when a later commit modifies the file, the code samples can be compared and only the ones that changed are processed further.

Modified and deleted files are similarly fetched by GitHubClient, parsed by FileParser, and stored in Azure Table storage as new entities. The new files are then compared with the old files in the storage to find out which code samples were modified or deleted, and only those samples are processed.
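The comparison of a file's stored samples with its newly parsed samples could be sketched like this. The function name diffCodeSamples and the sample shape ({ identifier, content }) are hypothetical, as are the exact status strings 'added' and 'deleted'. A deleted file can be modeled by passing an empty list of new samples.

```javascript
// Hypothetical helper: compares the previously stored samples of a file with
// the newly parsed ones and returns only the samples that changed.
function diffCodeSamples(oldSamples, newSamples) {
  const oldByIdentifier = new Map(oldSamples.map((s) => [s.identifier, s]));
  const newIdentifiers = new Set(newSamples.map((s) => s.identifier));
  const changes = [];

  for (const sample of newSamples) {
    const previous = oldByIdentifier.get(sample.identifier);
    if (!previous) {
      changes.push({ ...sample, status: 'added' });
    } else if (previous.content !== sample.content) {
      changes.push({ ...sample, status: 'modified' });
    }
    // Samples with unchanged content are skipped.
  }

  for (const sample of oldSamples) {
    if (!newIdentifiers.has(sample.identifier)) {
      changes.push({ ...sample, status: 'deleted' });
    }
  }
  return changes;
}
```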

Every code sample also includes additional information:

  • Language - indicates the language the code sample is written in. It is retrieved from the file extension and used on the website for syntax highlighting.

  • Platform - indicates the platform the code sample is suited for. It is retrieved from the root folder on GitHub in which the code sample is located and used for the platform picker on the website.

  • Codename - constructed from the identifier of the code sample and the platform. It is used as the codename of a content item in Kentico Kontent.
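As a rough sketch, the three properties could be derived from a file path and a sample identifier as shown below. The function name describeCodeSample and the extension-to-language table are hypothetical. Joining the identifier and platform with an underscore is inferred from the codename structure_in_rte_retrieve_article_ruby in the example blob, not from a documented rule.

```javascript
// Hypothetical mapping from file extensions to language names; the real
// service may use a different or larger table.
const LANGUAGE_BY_EXTENSION = { js: 'javascript', rb: 'ruby', java: 'java' };

// Hypothetical helper: derives language, platform, and codename for a sample.
function describeCodeSample(filePath, identifier) {
  const segments = filePath.split('/');
  const platform = segments[0]; // root folder in the repository
  const fileName = segments[segments.length - 1];
  const dotIndex = fileName.lastIndexOf('.');
  const extension = dotIndex === -1 ? '' : fileName.slice(dotIndex + 1).toLowerCase();
  // Fall back to the raw extension when the table has no entry for it.
  const language = LANGUAGE_BY_EXTENSION[extension] || extension;

  return {
    identifier,
    language,
    platform,
    // Assumption: identifier and platform are joined with an underscore.
    codename: `${identifier}_${platform}`,
  };
}
```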

When all of the added, modified, and deleted code samples are gathered, the EventDataRepository stores them in a single blob in Azure Blob storage. The blob contains the code samples and a mode property that indicates which endpoint, initialize or update, was triggered. Each code sample also gets a status property that specifies whether the sample was added, modified, or deleted. After the blob is stored, an Event Grid event is created automatically and the processing of the samples continues in another Azure function, CodeSamplesManager.

Example blob stored by GitHub Reader and consumed by CodeSamplesManager:

{
  "codeFragments": [
    {
      "identifier": "structure_in_rte_retrieve_article",
      "content": "require 'delivery-sdk-ruby'\n\ndelivery_client.item('coffee_beverages_explained').execute do |response|\n  item = response.item\n  text = item.get_string('body')\n  puts text\nend",
      "language": "ruby",
      "platform": "ruby",
      "status": "modified",
      "codename": "structure_in_rte_retrieve_article_ruby"
    }
  ],
  "mode": "update"
}
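
A blob body with this shape could be assembled as in the sketch below. The function name buildEventBlob is hypothetical, and the mode value 'initialize' is assumed by analogy with the 'update' value shown above. Uploading the result to Azure Blob storage is left out.

```javascript
// Hypothetical helper: serializes the gathered code samples into the blob body.
function buildEventBlob(mode, codeFragments) {
  // Assumption: only the two endpoint names are valid modes.
  if (mode !== 'initialize' && mode !== 'update') {
    throw new Error(`Unknown mode: ${mode}`);
  }
  // Key order mirrors the example blob: codeFragments first, then mode.
  return JSON.stringify({ codeFragments, mode }, null, 2);
}
```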