Updates to documentation
isaidnocookies committed Nov 8, 2021
1 parent f993558 commit f45a753
Showing 4 changed files with 59 additions and 19 deletions.
18 changes: 18 additions & 0 deletions Primitive-Documentation-PDF.md
@@ -0,0 +1,18 @@
# Primitive.io Documentation

- For all the information, visit documentation.primitive.io

:[Docker-Setup](./docs/private/docker-setup.md)

:[Home](./home.md)
:[Setup](./docs/primitive-setup.md)

*:[Requirements](./docs/private/requirements.md)
*:[Setup](./docs/private/private-setup.md)
*:[Docker Setup](./docs/private/docker-setup.md)
*:[Running Scraper in Docker](./docs/private/docker-usage.md)
*:[Environment File](./docs/private/environment-file.md)
*:[Volume](./docs/private/volume.md)

*:[Usage](./docs/private/scraper-usage.md)
*:[Troubleshooting](./docs/private/scraper-troubleshooting.md)
1 change: 1 addition & 0 deletions README.md
@@ -2,3 +2,4 @@

- For all the information, visit documentation.primitive.io
- For previewing the documentation locally, run `docsify serve ./` while in the root directory.
- For exporting to PDF, you can use the Yzane Markdown PDF extension in Visual Studio Code. Once installed, use `Primitive-Documentation-PDF.md` as the source and export to PDF.
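
As a quick sketch of that preview flow, assuming Node.js and npm are available and that the `docsify-cli` package provides the `docsify` command:

```
# Install the docsify CLI once (assumed global npm install)
npm install -g docsify-cli

# Preview the documentation locally from the repository root
docsify serve ./
```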
6 changes: 3 additions & 3 deletions docs/private/docker-usage.md
@@ -4,7 +4,7 @@ The private scraper is best run inside of Docker. This allows for consistency ac

# Required Configuration

The private scraper relies on two main elements to run properly: a location to store the internal database and an environment file.
The private scraper relies on two main elements to run properly: a location to store the internal database and a configuration file (config.json). The configuration file lives in the data folder, which is used to store data associated with the scraper.
- See information related to the environment file [here](/docs/private/environment-file.md)
- See information for the required volume [here](/docs/private/volume.md)
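
As a minimal sketch of that layout, assuming the host-side data directory is `/home/scraper/data` (matching the run example below):

```
# Create the data directory that will be mounted into the container
mkdir -p /home/scraper/data

# The scraper reads its configuration file from inside that folder
cp config.json /home/scraper/data/config.json
```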

@@ -23,7 +23,7 @@ Once loaded, the image should be viewable with 'docker images.' Once imported wi
Once the scraper container is pulled and available to the local machine, the container can be run. Below is an example of the command required for the scraper to execute properly.

```
docker run -d -p 3000:3000 --env-file="env.file" -v /tmp/primitive/data:/srv/primitive/data --name primitive_2.0 primitive:latest
docker run -d -p 443:443 -v /home/scraper/data:/srv/primitive/data -v /var/run/docker.sock:/var/run/docker.sock -v /tmp:/tmp --name primitive_1.0 --restart unless-stopped private_scraper:1.0
```

Important information:
@@ -34,9 +34,9 @@ Important information:
See below for additional explanation of command flags and options:
- **'-d'** - Run the process as a daemon. Without this flag, the container will execute and bind to the console. It can be removed for testing if something is failing with the run command.
- **'-p'** - Publish or expose a specific port.
- **--env-file** - This is an important one. This injects the variables found in the env file into the environment within the container. This will include environment specific information and git service credentials.
- **-v** - Mount a volume to the container. Because the scraper pulls data from your git service and saves assets to be served to the Primitive client, data persistence is important. In the example above, we are mounting a local directory (/tmp/primitive/data) to a folder inside the container (/srv/primitive/data). By doing this, the database and scraped assets will be stored outside the container and available on the local machine. Make sure that the local directory is present when running the container.
- **--name** - This will be the name of the running container. It is best to name this something that represents the function and includes the versioning.
- **--restart** - This is the restart policy. We can set this to `unless-stopped` to have the container restart if the dependent application or process crashes.
- The last value is the image name. This will be the name of the image pulled from the registry and includes the specific version being used (the example shows :latest, but this may very well be a specific version).

<em> ** NOTE: Due to the websocket calls from the admin panel, port 3000 should be used for versions below 4.0 ** </em>
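
For those older versions, a hedged sketch of the same run command published on port 3000 (image name, tag, and container name are placeholders), followed by the standard checks to confirm the container came up:

```
# Variant for scraper versions below 4.0, which expect port 3000 (see note above)
docker run -d -p 3000:3000 -v /home/scraper/data:/srv/primitive/data \
  --name primitive_legacy --restart unless-stopped private_scraper:1.0

# Verify the container is running and inspect its startup output
docker ps --filter "name=primitive_legacy"
docker logs primitive_legacy
```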
53 changes: 37 additions & 16 deletions docs/private/environment-file.md
@@ -1,22 +1,37 @@
# Environment File
# Environment File (config.json)

The environment file is used to inject variables into the container during the run process. This will be used to authenticate to the private git services and ensure the client can properly display the content.
The config file is used to set the required variables and values for the scraper to run. This is now a JSON file that is added to the `data` folder, which is mounted to the host. This will be used to authenticate to the private git services and ensure the client can properly display the content.

Below is an example of the required environment file:
Below is an example of the required config file:

```
PORT=3000
HOST=localhost
PROTOCOL=http
VERBOSE=false
SQLITE_DATABASE=true
IS_PRIVATE=true
PROVIDER=bitbucket
GITHUB_USER=[ GITHUB USER ASSOCIATED WITH ACCESS TOKEN ]
GITHUB_ACCESS_TOKEN=[ GITHUB ACCESS TOKEN]
BITBUCKET_USER=[ BITBUCKET USER ASSOCIATED WITH APP PASSWORD]
BITBUCKET_PASSWORD=[ BITBUCKET APP PASSWORD]
{
  "PORT": 443,
  "HOST": "{{ scraper_host }}",
  "PROTOCOL": "https",
  "VERBOSE": true,
  "SQLITE_DATABASE": true,
  "IS_PRIVATE": true,
  "PROVIDER": "{{ provider }}",
  "BASE_URL": "{{ base_url }}",
  "GITHUB_USER": "{{ github_user }}",
  "GITHUB_ACCESS_TOKEN": "{{ github_token }}",
  "BITBUCKET_USER": "{{ bitbucket_user }}",
  "BITBUCKET_PASSWORD": "{{ bitbucket_token }}",
  "GERRIT_USER": "{{ gerrit_user }}",
  "GERRIT_PASSWORD": "{{ gerrit_token }}",
  "GENERATE_DATABASE": true,
  "TEMP_DIR": "/tmp",
  "OUTPUT_DIR": "/home/scraper/data",
  "ANALYZERS": [
    {
      "name": "dotnet-parser",
      "image_name": "xxxxxxxxx.dkr.ecr.us-east-2.amazonaws.com/dotnet-parser:prod-0.1.23",
      "extensions": [".cs", ".h", ".hxx", ".hpp", ".cpp", ".c", ".cc", ".m", ".py", ".py3", ".js", ".jsx", ".kt", ".ts"]
    }
  ],
  "UPDATE_PERMS": false,
  "ROOT_PASS": "",
  "SSL_KEY": "{{ ssl_key }}",
  "SSL_CERT": "{{ ssl_cert }}",
  "SSL_CA": "{{ ssl_ca }}"
}
```
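
Since a malformed config.json would likely keep the scraper from starting, it can be worth validating the file before running the container. A small sketch using the common `jq` tool (assuming it is installed on the host):

```
# Exit status is non-zero if config.json is not valid JSON
jq empty /home/scraper/data/config.json && echo "config.json parses cleanly"
```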

- **PORT** - The port that the service will bind to. If used with Docker, this is internal to the Docker container. If the service is run through Node.js or PM2, this will be the port used to connect to the running scraper. If this is left out, the port will default to 3000.
@@ -31,4 +46,10 @@ BITBUCKET_PASSWORD=[ BITBUCKET APP PASSWORD]
- **GITHUB_USER** - User account associated with the generated access token.
- **GITHUB_ACCESS_TOKEN** - The Github access token for api calls and authentication. Only needs read access for the repositories.
- **BITBUCKET_USER** - User account associated with the generated app password.
- **BITBUCKET_PASSWORD** - App password with read access for the repositories.
- **TEMP_DIR** - The temporary directory used during various processes in the scraper container. We usually use '/tmp', but this can be configured to something different.
- **OUTPUT_DIR** - This is the local folder mounted into the container from the host file system. This will be shared between containers and therefore must be defined externally via the configuration file.
- **ANALYZERS** - An array of JSON objects representing which parsers are available to the scraper. Each entry includes the name, the image_name, and the extensions. The name is used internally and is usually just the parser name without the repo URL. The image_name is the actual name of the image in the local Docker environment. The extensions array specifies which file types should be associated with that specific parser. If the extensions array is empty, no file types will be associated with that parser and it will NOT be run on project files.
- **UPDATE_PERMS** - This can update the permissions of the scraper DBs if the scraper is run with different permissions than the parsers. This can be used during debugging or during development if the debugger being leveraged does not run as root.
- **ROOT_PASS** - Used primarily during testing, this is used if the above variable (UPDATE_PERMS) is true. This should be empty under normal deployments.
- **SSL_KEY and the other SSL variables** - Absolute file paths of the SSL certificates associated with the scraper. These are the files within the certs directory mounted to the container within the data directory.
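
For local testing only, a self-signed pair can stand in for real certificates. A sketch using openssl (file names and paths are assumptions that follow the mounted data directory described above):

```
# Generate a self-signed key/certificate pair for testing (not for production)
mkdir -p /home/scraper/data/certs
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout /home/scraper/data/certs/scraper.key \
  -out /home/scraper/data/certs/scraper.crt \
  -subj "/CN=scraper.example.local"
```

SSL_KEY and SSL_CERT would then point at the in-container paths of these files (e.g. `/srv/primitive/data/certs/scraper.key`).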
