Chaos Testing
LoomCache ships a self-contained Java chaos-testing framework in the server test suite. There is no external dependency or separate runtime — histories, checkers, and nemeses are all pure Java.
Methodology
A run:
- Generates operations: concurrent clients drive reads/writes/CAS against the cluster (ChaosWorkload produces Register / Counter / Queue / Set / Mixed workloads; ChaosClient / ChaosRealClient drive them).
- Injects faults: ChaosNemesis is a sealed fault family covering symmetric partition, node isolation, message reorder, kill node / kill leader / pause node, clock skew, slow disk (message-send delay), memory pressure, and CPU contention, composable via ChaosNemesis.Combined.
- Records a history: ChaosHistory captures invocation and completion timestamps.
- Verifies consistency: a linearizability checker or the per-model ChaosChecker checkers validate the history.
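The "records a history" step can be pictured with a minimal sketch. It assumes nothing about the real ChaosHistory API: `HistorySketch`, `Event`, and `capture` are hypothetical names, and an `AtomicLong` stands in for the cluster. The only idea taken from the text is that each operation is stamped once at invocation and once at completion.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch; none of these types are LoomCache's actual classes.
final class HistorySketch {

    /** One completed operation with its invocation and completion timestamps. */
    record Event(int clientId, String op, Object result, long invokedNanos, long completedNanos) {}

    // Thread-safe list so concurrent clients can append without extra locking.
    private final List<Event> events = new CopyOnWriteArrayList<>();

    /** Runs the call, stamping the time just before and just after it. */
    <T> T capture(int clientId, String op, Supplier<T> call) {
        long invoked = System.nanoTime();
        T result = call.get();
        long completed = System.nanoTime();
        events.add(new Event(clientId, op, result, invoked, completed));
        return result;
    }

    /** Immutable snapshot of everything recorded so far. */
    List<Event> events() {
        return List.copyOf(events);
    }

    public static void main(String[] args) throws InterruptedException {
        HistorySketch history = new HistorySketch();
        AtomicLong register = new AtomicLong(); // stand-in for the cluster under test
        ExecutorService clients = Executors.newFixedThreadPool(4);
        for (int c = 0; c < 4; c++) {
            final int id = c;
            clients.submit(() -> {
                for (int i = 0; i < 5; i++) {
                    history.capture(id, "incrementAndGet", register::incrementAndGet);
                }
            });
        }
        clients.shutdown();
        clients.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println(history.events().size() + " operations recorded");
    }
}
```

Keeping both timestamps is what lets a checker tell overlapping (concurrent) operations apart from strictly ordered ones.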
Framework
- ChaosTestHarness: wires workload + nemesis + cluster into a single executable test.
- ChaosCluster / ChaosRealCluster: both start real LoomCache clusters (Raft + TCP) with fault hooks.
- ChaosClient / ChaosRealClient: clients that drive the generated operations against the cluster.
- ChaosWorkload: op generators for Register / Counter / Queue / Set / Mixed workloads.
- ChaosNemesis: sealed fault family (see Methodology for the fault list).
- ChaosHistory / ChaosReport: history and reporting.
- ChaosChecker and the linearizability checker: ChaosChecker exposes the per-model checkers; the linearizability checker runs a linearizability search over register / counter / queue / set.
- model/: Operation, CounterModel, QueueModel, LockModel, SetModel.
Test coverage
There is no per-scenario file tree under tests/. The harness is exercised by exactly two test classes, both in loom-server test sources:
- ChaosFrameworkEnhancedTest (no chaos tag) drives the framework primitives: linearizability register/counter/queue/set checks (positive, violation, and malformed-input cases), the per-model ChaosChecker checkers, every ChaosWorkload generator, ChaosHistory recording/concurrency, ChaosReport summary/latency/fault-timeline output, the ChaosNemesis fault types against a recording cluster double, and one end-to-end ChaosTestHarness run that starts a real 3-node LoomCache cluster.
- tests/RealClusterLinearizabilityTest (@Tag("chaos")) starts a real 3-node ChaosRealCluster and asserts register linearizability via the linearizability checker over concurrent leader-local / per-node map operations.
Fault types
ChaosNemesis is a sealed interface; all faults compose via ChaosNemesis.Combined:
- Network: SymmetricPartition, IsolateNode, MessageReorder.
- Process: KillNode, KillLeader, PauseNode.
- Resource / timing: ClockSkew (simulates clock-skew effects by calling cluster.pauseNode(); no actual system clock manipulation), SlowDisk (message-send delay), MemoryPressure, CpuContention.
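To illustrate how a sealed fault family can compose, here is a minimal sketch. The record names mirror three of the faults listed above, but the `Nemesis` and `FaultTarget` interfaces, every method signature, and the `RecordingTarget` double are assumptions made for illustration, not the actual ChaosNemesis API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: signatures and FaultTarget are assumptions, not LoomCache's API.
final class NemesisSketch {

    /** Assumed minimal view of the cluster hooks a fault needs. */
    interface FaultTarget {
        void partition(List<String> sideA, List<String> sideB);
        void kill(String node);
        void pause(String node);
    }

    /** Sealed: the compiler knows the complete set of fault types. */
    sealed interface Nemesis permits SymmetricPartition, KillNode, PauseNode, Combined {
        void inject(FaultTarget target);
    }

    record SymmetricPartition(List<String> sideA, List<String> sideB) implements Nemesis {
        public void inject(FaultTarget target) { target.partition(sideA, sideB); }
    }

    record KillNode(String node) implements Nemesis {
        public void inject(FaultTarget target) { target.kill(node); }
    }

    record PauseNode(String node) implements Nemesis {
        public void inject(FaultTarget target) { target.pause(node); }
    }

    /** Composition: injects each child fault in list order. */
    record Combined(List<Nemesis> faults) implements Nemesis {
        public void inject(FaultTarget target) { faults.forEach(f -> f.inject(target)); }
    }

    /** Recording double that logs calls instead of touching a cluster. */
    static final class RecordingTarget implements FaultTarget {
        final List<String> calls = new ArrayList<>();
        public void partition(List<String> a, List<String> b) { calls.add("partition " + a + " | " + b); }
        public void kill(String node) { calls.add("kill " + node); }
        public void pause(String node) { calls.add("pause " + node); }
    }

    public static void main(String[] args) {
        RecordingTarget target = new RecordingTarget();
        Nemesis nemesis = new Combined(List.of(
                new SymmetricPartition(List.of("n1"), List.of("n2", "n3")),
                new PauseNode("n2")));
        nemesis.inject(target);
        target.calls.forEach(System.out::println);
    }
}
```

Because `Combined` is itself a `Nemesis`, combinations nest, and a recording double like the one above is enough to verify fault ordering without booting a cluster.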
Scope of Evidence
- The real-cluster runs (ChaosTestHarness / ChaosRealCluster) exercise actual Raft and TCP via real LoomCache instances, which run the WAL and snapshot machinery as part of normal startup.
- The framework-primitive tests feed hand-built histories straight into the checkers; they verify checker/history logic without touching the network stack.
- The linearizability checker is a linearization search supporting the register, counter, queue, and set data types only (it rejects any other type) and carries its own internal model state. The search is bounded by MAX_SEARCH_STATES = 500_000; when that limit is reached the checker conservatively reports a violation rather than returning a heuristic pass.
- The per-model ChaosChecker checkers cover the rest: register, counter, queue, set, and mutual-exclusion models (including fence-token and double-lock checks).
- RealClusterLinearizabilityTest lives in loom-server test sources (not loom-integration-tests); it boots three real LoomCache nodes per test, reserving fork-scoped TCP ports to stay parallel-safe under the repository Maven defaults.
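The bounded search described above can be sketched for the register case. This is an illustrative depth-first linearization search, not the project's checker: only the `MAX_SEARCH_STATES` value comes from the text, while `BoundedRegisterCheck`, `Op`, `Kind`, the initial register value of 0, and the 62-operation bitmask limit are all assumptions of the sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: only the MAX_SEARCH_STATES value comes from the text.
final class BoundedRegisterCheck {

    static final int MAX_SEARCH_STATES = 500_000;

    enum Kind { READ, WRITE }

    /** A completed operation; for READ, value is the value the client observed. */
    record Op(Kind kind, int value, long invoked, long completed) {}

    /** Search state: which ops are already linearized, plus the register value. */
    private record State(long done, int value) {}

    private final List<Op> ops;
    private final Set<State> visited = new HashSet<>();

    private BoundedRegisterCheck(List<Op> ops) { this.ops = ops; }

    /** True only if a linearization was found within the state budget. */
    static boolean isLinearizable(List<Op> ops) {
        if (ops.size() > 62) throw new IllegalArgumentException("sketch handles at most 62 ops");
        return new BoundedRegisterCheck(ops).search(0L, 0); // register assumed to start at 0
    }

    private boolean search(long done, int value) {
        if (done == (1L << ops.size()) - 1) return true;        // every op has been placed
        if (!visited.add(new State(done, value))) return false; // state already explored
        if (visited.size() > MAX_SEARCH_STATES) return false;   // budget spent: report violation

        long earliestCompletion = Long.MAX_VALUE;
        for (int i = 0; i < ops.size(); i++) {
            if ((done & (1L << i)) == 0) {
                earliestCompletion = Math.min(earliestCompletion, ops.get(i).completed());
            }
        }
        for (int i = 0; i < ops.size(); i++) {
            if ((done & (1L << i)) != 0) continue;
            Op op = ops.get(i);
            // Real-time order: op cannot go next if a pending op finished before it began.
            if (op.invoked() > earliestCompletion) continue;
            // A read is only legal if it observed the current model value.
            if (op.kind() == Kind.READ && op.value() != value) continue;
            int next = op.kind() == Kind.WRITE ? op.value() : value;
            if (search(done | (1L << i), next)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Op> fresh = List.of(new Op(Kind.WRITE, 1, 0, 10), new Op(Kind.READ, 1, 20, 30));
        List<Op> stale = List.of(new Op(Kind.WRITE, 1, 0, 10), new Op(Kind.READ, 0, 20, 30));
        System.out.println("fresh read linearizable: " + isLinearizable(fresh));
        System.out.println("stale read linearizable: " + isLinearizable(stale));
    }
}
```

Returning `false` when the budget runs out is the conservative choice: an unfinished search is never reported as a pass.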
Running
Run one Maven clean/test/verify command per checkout at a time. clean mutates shared target/ directories; use a separate worktree for parallel chaos or evidence lanes.

    # Framework-primitive + one end-to-end harness run (no chaos tag, runs by default):
    ./mvnw -pl loom-server -am test -Dtest=ChaosFrameworkEnhancedTest

    # Real 3-node cluster register-linearizability run. RealClusterLinearizabilityTest is @Tag("chaos"),
    # which the default unit lane excludes (ut.excludedGroups=benchmark,chaos,stress,slow), so opt in
    # with -Dgroups=chaos while keeping benchmark, stress, and slow isolated:
    ./mvnw -pl loom-server -am test -Dgroups=chaos -Dut.excludedGroups=benchmark,stress,slow -Dtest=RealClusterLinearizabilityTest

The framework-primitive checks complete in seconds. The real-cluster runs (ChaosTestHarness inside ChaosFrameworkEnhancedTest, and RealClusterLinearizabilityTest) boot real LoomCache nodes and elect a Raft leader, so they take longer; per-client op joins use a 30 s ceiling.
Reporting
ChaosReport emits a human-readable summary plus a full history trace for failing runs. Replay the trace through the linearizability checker to reproduce and debug locally.
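As a rough picture of what a human-readable summary might contain, the sketch below derives an operation count, a failure count, and nearest-rank latency percentiles from recorded timestamps. The `ReportSketch` class, the `Completed` record, and the output format are all hypothetical; the actual ChaosReport output may look quite different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; the Completed record and the output format are assumptions.
final class ReportSketch {

    /** One finished operation: timestamps in nanoseconds plus whether it succeeded. */
    record Completed(String op, long invokedNanos, long completedNanos, boolean ok) {}

    /** Nearest-rank percentile over a non-empty ascending list; percent is 1..100. */
    static long percentile(List<Long> sortedAscending, int percent) {
        int rank = (percent * sortedAscending.size() + 99) / 100; // integer ceiling
        return sortedAscending.get(Math.max(rank, 1) - 1);
    }

    /** Builds a one-line summary: op count, failures, and p50/p99 latency. */
    static String summarize(List<Completed> history) {
        if (history.isEmpty()) return "ops=0";
        List<Long> latencies = new ArrayList<>();
        long failures = 0;
        for (Completed c : history) {
            latencies.add(c.completedNanos() - c.invokedNanos());
            if (!c.ok()) failures++;
        }
        Collections.sort(latencies);
        return "ops=" + history.size()
                + " failed=" + failures
                + " p50=" + percentile(latencies, 50) + "ns"
                + " p99=" + percentile(latencies, 99) + "ns";
    }

    public static void main(String[] args) {
        List<Completed> history = List.of(
                new Completed("write", 0, 400, true),
                new Completed("read", 100, 300, true),
                new Completed("cas", 200, 1_200, false));
        System.out.println(summarize(history)); // prints "ops=3 failed=1 p50=400ns p99=1000ns"
    }
}
```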
For the broader correctness story (Raft invariants, durability, near-cache coherence), see the architecture overview.
LoomCache is an independent open-source project. It is not affiliated with, endorsed by, or sponsored by Hazelcast, Inc. or by any other company whose products are named in this documentation. “Hazelcast” is a trademark of Hazelcast, Inc.; references to it are nominative and describe only migration and comparison. All other product and company names are trademarks of their respective owners and are used for identification purposes only.