Static Corpus Setup¶
The graph database is far too bloated for practical use. Instead, we can export the relevant information to a small JSON file and use that directly.
This file is needed for many of the tasks that worker nodes perform, so it must be retrieved or generated.
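As a quick illustration of how a worker-side task might consume this export, here is a minimal sketch that simply loads the file with the standard library; the file name static_corpus.json is a hypothetical stand-in for the actual install location:

```python
import json

# Hypothetical path; substitute the actual install location of the static corpus.
corpus_path = "static_corpus.json"

with open(corpus_path, "r", encoding="utf-8") as corpus_file:
    corpus = json.load(corpus_file)  # exported corpus data as plain Python objects
```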
Option 1: Use the static corpus provided in the repo (Recommended)¶
This static corpus was generated from all_schema.graphml on July 15th, 2022.
Prerequisites¶
- General Setup has been completed.
Install Static Corpus¶
- Once configured, open an admin PowerShell at <repo-root>.
- Install the corpus provided with this repo:
Option 2: Use a user-provided static corpus¶
This simply loads the user-provided file into the install location.
Prerequisites¶
- General Setup has been completed.
- The corpus to be installed is available at <static-corpus-loc>.
Install Static Corpus¶
- Once configured, open an admin PowerShell at <repo-root>.
- Install a user-provided corpus from <static-corpus-loc>.
Option 3: Generate Static Corpus from a Corpus Database¶
This will generate a static corpus from a running graph server, using whatever corpus it was configured with.
Prerequisites¶
- General Setup has been completed.
Configure Server Settings¶
The config file at <config-dir>/craidl.conf.yaml stores information for connecting to a corpus DB server.
- Open an admin PowerShell at <repo-root>.
- (Optional) View the currently loaded config file:
  The fields server_host and server_port determine which corpus database this tool will connect to when generating a static corpus. The default options connect to a stub server running on the same machine.
- (Optional) If you're connecting to a corpus database that is not running on the current machine, create or update <config-dir>/craidl.conf.yaml (see the example excerpt after this list):
  - Set server_host to <corpus-db.ip>.
  - Set server_port to <corpus-db.port>.
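The following is an illustrative excerpt of what these settings might look like in <config-dir>/craidl.conf.yaml; the values shown are placeholders, not the shipped defaults:

```yaml
# Excerpt of <config-dir>/craidl.conf.yaml (illustrative only).
# server_host and server_port select the corpus DB used when generating a static corpus.
server_host: 127.0.0.1   # replace with <corpus-db.ip> for a remote corpus database
server_port: 8182        # replace with <corpus-db.port>; this value is a placeholder
```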
Generate Static Corpus¶
- Ensure the server at <corpus-db.ip> is currently running.
- Generate the static corpus from that server.
The stub server seems to hang when it runs out of memory or CPU, halting the serialization process.
To mitigate this, the generation process periodically saves the serialized component data and skips re-downloading data that has already been saved. By default, every cluster of 50 components is saved to disk.
If generation halts, just rerun the command (with the same settings) and it will resume generating the corpus from the last saved cluster of components.
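The resume behavior described above can be pictured as a simple checkpointing loop. The sketch below is not the tool's actual implementation, only an illustration of the described logic; the helper fetch_component, the component list, and the checkpoint directory are hypothetical:

```python
import json
from pathlib import Path

CLUSTER_SIZE = 50  # the docs note clusters of 50 components are saved by default
SAVE_DIR = Path("corpus_clusters")  # hypothetical checkpoint directory

def generate_corpus(component_names, fetch_component):
    """Fetch components from the corpus DB, saving every CLUSTER_SIZE to disk.

    Already-saved clusters are skipped, so rerunning after a hang resumes
    from the last saved cluster instead of starting over.
    """
    SAVE_DIR.mkdir(exist_ok=True)
    for start in range(0, len(component_names), CLUSTER_SIZE):
        cluster_file = SAVE_DIR / f"cluster_{start // CLUSTER_SIZE:04d}.json"
        if cluster_file.exists():
            continue  # skip re-downloading data that was already serialized
        cluster = [fetch_component(name)
                   for name in component_names[start:start + CLUSTER_SIZE]]
        cluster_file.write_text(json.dumps(cluster))
    # Merge all saved clusters into the final static corpus.
    corpus = []
    for cluster_file in sorted(SAVE_DIR.glob("cluster_*.json")):
        corpus.extend(json.loads(cluster_file.read_text()))
    return corpus
```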
See the section on using Craidl for information on how to use the generated static corpus.