This repo holds the exchange and prediction subgraphs from PancakeSwap, which make use of the sparkle indexing framework. They are developer previews for The Graph Network.
Build with:
go build -o pancakeswap-exchange ./cmd/exchange
Call the graph-node to create the deployment:
./pancakeswap-exchange create namespace/target_name # create a row in `subgraph` table (current_version = nil, previous_version = nil)
Deploy it, as you would with normal subgraphs:
./pancakeswap-exchange deploy namespace/target_name # create rows in `subgraph_deployment`, `subgraph_version` & `deployment_schemas`, upload to IPFS & update `subgraph` table current_version, previous_version (MAYBE)
Start the linear indexer:
./pancakeswap-exchange index namespace/target_name@VERSION
You will find the VERSION printed when you deploy the subgraph.
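Putting those steps together, a minimal end-to-end run could look like the following sketch (the pancakeswap/exchange-v2 name is only an example borrowed from later in this README; substitute your own namespace/target_name and copy the VERSION printed by the deploy step):

# Sketch only: build, create and deploy the subgraph, then index it linearly.
go build -o pancakeswap-exchange ./cmd/exchange
./pancakeswap-exchange create pancakeswap/exchange-v2
./pancakeswap-exchange deploy pancakeswap/exchange-v2            # prints the VERSION
./pancakeswap-exchange index pancakeswap/exchange-v2@VERSION     # substitute the printed VERSION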
NOTE: As of August 17th 2021, `step` is in the process of being renamed to `stage` everywhere, so adjust accordingly depending on your revision.
These are approximate commands to run the full flow:
./pancakeswap-exchange parallel step \
--step=1 \
--blocks-store-url ./path/to/blocks \
--start-block 6809700 \
--stop-block 6829699 \
--rpc-endpoint http://some-full-node:8545 \
--output-path ./path/to/stage1
The ./path/to/blocks represents the firehose merged blocks files. They will be consumed directly and do not require a running firehose service.
Paths for --blocks-store-url, --input-path and --output-path use the dstore library, which supports S3, Azure, GCP and filesystems.
You will then adjust --start-block and --stop-block and repeat the process until you cover the desired chain segment (as close to the chain tip as possible). Both parameters are inclusive, so make sure your segments do not overlap.
These processes can be run in parallel.
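As an illustration, here is a hedged sketch of launching several contiguous Stage 1 segments of 20,000 blocks at once from a shell; the segment count, block numbers and RPC endpoint are just the example values from above, and you will likely want real orchestration for hundreds of jobs:

# Sketch only: 5 contiguous 20k-block segments of stage 1, run in parallel.
for i in 0 1 2 3 4; do
  START=$((6809700 + i * 20000))
  STOP=$((START + 19999))   # --start-block and --stop-block are both inclusive
  ./pancakeswap-exchange parallel step \
    --step=1 \
    --blocks-store-url ./path/to/blocks \
    --start-block $START \
    --stop-block $STOP \
    --rpc-endpoint http://some-full-node:8545 \
    --output-path ./path/to/stage1 &
done
wait   # every stage 1 segment must be complete before stage 2 starts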
./pancakeswap-exchange parallel step \
--step=2 \
--blocks-store-url ./path/to/blocks \
--start-block 6809700 \
--stop-block 6829699 \
--rpc-endpoint http://some-full-node:8545 \
--input-path ./path/to/stage1 \
--output-path ./path/to/stage2
Again, repeat with adjusted start/stop blocks. Ensure Stage 1 is fully complete before starting Stage 2.
Simply adjust --step, --input-path and --output-path:
./pancakeswap-exchange parallel step \
--step=4 \
--blocks-store-url ./path/to/blocks \
--start-block 6809700 \
--stop-block 6829699 \
--rpc-endpoint http://some-full-node:8545 \
--input-path ./path/to/stage3 \
--output-path ./path/to/entities \
--flush-entities \
--store-snapshot=false
This one writes entities for every single block, not only a snapshot at the end like the previous stages do (their snapshots feed the latest Entities state into the next stage).
At this point, you can inspect the files in ./path/to/entities for data quality. These are the values that will end up in PostgreSQL in the next steps, and you can iterate on your process faster if you discover data quality issues at this stage. You can also be selective as to which stages and which segments you really need to re-run after tweaking the handler code.
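For a quick spot check, assuming the entity files are JSON and sit on a local filesystem (the exact layout and naming under ./path/to/entities are assumptions; adjust to what you actually find there):

# Sketch only: list a few produced entity files, then pretty-print the first one with jq.
find ./path/to/entities -type f | head
jq . "$(find ./path/to/entities -type f | head -1)" | head -50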
The next steps split the workload on a different axis: one process per entity collection. So the worst-case timing of this operation is determined by the largest table (the one with the most rows or the most data).
./pancakeswap-exchange parallel to-csv \
--input-path ./path/to/entities \
--output-path ./path/to/csvs \
--only-tables=swap \
--chunk-size=20000 \
--stop-block=0
Notice the --only-tables flag here, which allows you to limit the tables generated by this process. If you don't specify it, it will run for all tables: you will only need to run it once (no parallel processing), but you won't benefit from the increased performance of parallelization. If you split the work, use ./pancakeswap-exchange --help to list all the entities managed by this subgraph and run one process for each entity, as in the sketch below.
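A hedged sketch of that per-entity fan-out follows; the table names below are only examples (swap comes from the command above, the others are typical PancakeSwap exchange entities), so replace the list with whatever ./pancakeswap-exchange --help reports:

# Sketch only: one to-csv process per entity table, all running in parallel.
TABLES="swap pair token transaction"   # example names; take the real list from --help
for table in $TABLES; do
  ./pancakeswap-exchange parallel to-csv \
    --input-path ./path/to/entities \
    --output-path ./path/to/csvs \
    --only-tables=$table \
    --chunk-size=20000 \
    --stop-block=0 &
done
wait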
The --chunk-size specifies where to chunk the .csv files. This only scopes the size of the PostgreSQL transaction when injecting into postgres, as each file can be added atomically in the RDBMS. If one fails, you'll know sooner, and you can retry it without retrying one huge .csv file.
Truncating the target tables is required to avoid duplicate data.
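For illustration, the truncation can be done with psql, assuming graph-node keeps the deployment's tables in an sgdN schema (look the actual schema name up in `deployment_schemas`; the schema and table names below are placeholders):

# Sketch only: truncate a target table before injection (adjust schema and table names).
psql "postgresql://user:[email protected]:5432/graph?sslmode=disable" \
  -c 'TRUNCATE TABLE sgd1.swap;'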
Dropping the indexes will ensure the fastest injection time possible:
./pancakeswap-exchange drop-indexes \
pancakeswap/exchange-v2@[version] \
--psql-dsn postgresql://user:[email protected]:5432/graph?sslmode=disable
This method also supports --only-tables, but it's quite fast so you shouldn't need to parallelize it.
Once the csv files are all generated, the next step is to inject them into PostgreSQL:
./pancakeswap-exchange parallel inject \
pancakeswap/exchange-v2@[version] \
postgresql://user:[email protected]:5432/graph?sslmode=disable \
--only-tables=swap \
--input-path ./path/to/csvs \
--stop-block=0
Here again, you can parallelize the injection to maximize the throughput into PostgreSQL and use all the available cores. The inject method uses the fastest injection method available: COPY FROM, piping the CSV through.
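For reference, each CSV chunk ends up being loaded roughly like the hedged psql sketch below (this only illustrates the COPY mechanism, not the tool's actual internals; schema, table and file names are placeholders):

# Illustration only: one CSV chunk loaded atomically via COPY (\copy streams it client-side).
psql "postgresql://user:[email protected]:5432/graph?sslmode=disable" \
  -c "\copy sgd1.swap FROM './path/to/csvs/swap/0001.csv' WITH (FORMAT csv)"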
Now re-create the indexes:
./pancakeswap-exchange create-indexes \
pancakeswap/exchange-v2@[version] \
--psql-dsn postgresql://user:[email protected]:5432/graph?sslmode=disable
You can also use --only-tables here, but the process itself sleeps pretty much all the time, as PostgreSQL is doing the hard work. Indexes will be created in parallel where possible directly from within the process, and linearly when it is known that postgres would lock otherwise.
Here are a few durations, reported as worst-case timings (the worst case of each parallel job), and storage size estimates for a sample run of this Subgraph on GCP, from block 6,809,700 to 9,000,000, split into 160 parallel jobs (in stages 1-4).
The following durations are the worst case from all the 160 parallel jobs at each stage:
- Stage 1 duration: 5 minutes, produced: 412.0 MiB of data (pairs)
- Stage 2 duration: 8 minutes 33 seconds, produced: 21.61 GiB of data (pair reserves)
- Stage 3 duration: 34 minutes, produced: 31.21 GiB of data (price computations)
- Stage 4 duration: 42 minutes, produced: 512.22 GiB of data (output of all entities to JSON)
The following duration is for the worst case of the Entities (Swap):
- to-csv duration: 157 minutes (2h37m), produced: 242.27 GiB of data (CSV files ready to be injected in postgres)
The following duration is for the worst case of the Entities during injection (Swap), with all indexes dropped before injection:
- inject duration: 40 minutes
The following duration is the worst case of the Entities, provided you have more than 640GB of RAM. Otherwise, you might want to split the load in two (taking twice the time):
- create-indexes duration: 4 hours
Yes: for comparison, the original linear sync was only half-way through after 2 months.
Generate using sparkle (see https://github.com/streamingfast/sparkle):
sparkle codegen ./subgraph/exchange.yaml github.com/streamingfast/sparkle-pancakeswap
go mod tidy
To init a database:
go run ./cmd/exchange -- deploy \
--psql-dsn="postgresql://postgres:@localhost:5432/YOUR_DATABASE?enable_incremental_sort=off&sslmode=disable" \
project/subgraph
Issues and PRs in this repo should relate strictly to the Pancake generated subgraphs.
Report any protocol-specific issues in their respective repositories.
Please first refer to the general StreamingFast contribution guide if you wish to contribute to this code base.