ADR 067: Simulator v2
Changelog​
- June 01, 2023: Initial Draft (@alexanderbez)
Status​
DRAFT
Abstract​
The Cosmos SDK simulator is a tool that allows developers to test the entirety of their application's state machine through the use of pseudo-randomized "operations", which represent transactions. The simulator also provides primitives that ensures there are no non-determinism issues and that the application's state machine can be successfully exported and imported using randomized state.
The simulator has played an absolutely critical role in the development and testing of the Cosmos Hub and all the releases of the Cosmos SDK after the launch of the Cosmos Hub. Since the Hub, the simulator has relatively not changed much, so it's overdue for a revamp.
Context​
The current simulator, x/simulation
, acts as a semi-fuzz testing suite that takes
in an integer that represents a seed into a PRNG. The PRNG is used to generate a
sequence of "operations" that are meant to reflect transactions that an application's
state machine can process. Through the use of the PRNG, all aspects of block production
and consumption are randomized. This includes a block's proposer, the validators
who both sign and miss the block, along with the transaction operations themselves.
Each Cosmos SDK module defines a set of simulation operations that attempt to
produce valid transactions, e.g. x/distribution/simulation/operations.go
. These
operations can sometimes fail depending on the accumulated state of the application
within that simulation run. The simulator will continue to generate operations
until it has reached a certain number of operations or until it has reached a
fatal state, reporting results. This gives the ability for application developers
to reliably execute full range application simulation and fuzz testing against
their application.
However, there are a few major drawbacks. Namely, with the advent of ABCI++, specifically
FinalizeBlock
, the internal workings of the simulator no longer comply with how
an application would actually perform. Specifically, operations are executed
after FinalizeBlock
, whereas they should be executed within FinalizeBlock
.
Additionally, the simulator is not very extensible. Developers should be able to easily define and extend the following:
- Consistency or validity predicates (what are known as invariants today)
- Property tests of state before and after a block is simulated
In addition, we also want to achieve the following:
- Consolidated weight management, i.e. define weights within the simulator itself via a config and not defined in each module
- Observability of the simulator's execution, i.e. have easy to understand output/logs with the ability to pipe those logs into some external sink
- Smart replay, i.e. the ability to not only rerun a simulation from a seed, but also the ability to replay from an arbitrary breakpoint
- Run a simulation based off of real network state
Decision​
Instead of refactoring the existing simulator, x/simulation
, we propose to create
a new package in the root of the Cosmos SDK, simulator
, that will be the new
simulation framework. The simulator will more accurately reflect the complete
lifecycle of an ABCI application.
Specifically, we propose a similar implementation and use of a simulator.Manager
,
that exists today, that is responsible for managing the execution of a simulation.
The manager will wrap an ABCI application and will be responsible for the following:
- Populating the application's mempool with a set of pseudo-random transactions before each block, some of which may contain invalid messages.
- Selecting transactions and a random proposer to execute
PrepareProposal
. - Executing
ProcessProposal
,FinalizeBlock
andCommit
. - Executing a set of validity predicates before and after each block.
- Maintaining a CPU and RAM profile of the simulation execution.
- Allowing a simulation to stop and resume from a given block height.
- Simulation liveness of each validator per-block.
From an application developer's perspective, they will only need to provide the modules to be used in the simulator and the manager will take care of the rest. In addition, they will not need to write their own simulation test(s), e.g. non-determinism, multi-seed, etc..., as the manager will provide these as well.
type Manager struct {
app sdk.Application
mempool sdk.Mempool
rng rand.Rand
// ...
}
Configuration​
The simulator's testing input will be driven by a configuration file, as opposed to CLI arguments. This will allow for more extensibility and ease of use along with the ability to have shared configuration files across multiple simulations.
Execution​
As alluded to previously, after the execution of each block, the manager will
generate a series of pseudo-random transactions and attempt to insert them into
the mempool via BaseApp#CheckTx
. During the ABCI lifecycle of a block, this
mempool will be used to seed the transactions into a block proposal as it would
in a real network. This allows us to not only test the state machine, but also
test the ABCI lifecycle of a block.
Statistics, such as total blocks and total failed proposals, will be collected, logged and written to output after the full or partial execution of a simulation. The output destination of these statistics will be configurable.
func (s *Simulator) SimulateBlock() {
rProposer := s.SelectRandomProposer()
rTxs := s.SelectTxs()
prepareResp, err := s.app.PrepareProposal(&abci.PrepareProposalRequest{Txs: rTxs})
// handle error
processResp, err := s.app.ProcessProposal(&abci.ProcessProposalRequest{
Txs: prepareResp.Txs,
// ...
})
// handle error
// execute liveness matrix...
_, err = s.app.FinalizeBlock(...)
// handle error
_, err = s.app.Commit(...)
// handle error
}
Note, some applications do not define or need their own app-side mempool, so we
propose that SelectTxs
mimic CometBFT and just return FIFO-ordered transactions
from an ad-hoc simulator mempool. In the case where an application does define
its own mempool, it will simply ignore what is provided in RequestPrepareProposal
.
Profiling​
The manager will be responsible for collecting CPU and RAM profiles of the simulation execution. We propose to use Pyroscope to capture profiles and export them to a local file and via an HTTP endpoint. This can be disabled or enabled by configuration.
Breakpoints​
Via configuration, a caller can express a height-based breakpoint that will allow the simulation to stop and resume from a given height. This will allow for debugging of CPU, RAM, and state.
Validity Predicates​
We propose to provide the ability for an application to provide the simulator a set of validity predicates, i.e. invariant checkers, that will be executed before and after each block. This will allow for the application to assert that certain state invariants are held before and after each block. Note, as a consequence of this, we propose to remove the existing notion of invariants from module production execution paths and deprecate their usage altogether.
type Manager struct {
// ...
preBlockValidator func(sdk.Context) error
postBlockValidator func(sdk.Context) error
}
Consequences​
Backwards Compatibility​
The new simulator package will not naturally not be backwards compatible with the
existing x/simulation
module. However, modules will still be responsible for
providing pseudo-random transactions to the simulator.
Positive​
- Providing more intuitive and cleaner APIs for application developers
- More closely resembling the true lifecycle of an ABCI application
Negative​
- Breaking current Cosmos SDK module APIs for transaction generation