Flow Mainnet Network Issue

Incident Report for Flow

Postmortem

Flow mainnet incident on 18th Feb and 19th Feb

Hello,

Flow mainnet experienced two incidents during which chain liveness was severely impacted: the first on 18th Feb between 08:43 UTC and 11:36 UTC, and the second on 19th Feb between 21:30 UTC and 22:00 UTC.

The root cause of both issues was that nodes run by the Flow Foundation could not make progress because of backpressure created by a third-party logging subsystem that was down. The chain’s consensus process, which is responsible for building blocks, tried to recover. However, with 18% of the consensus nodes down, the time the remaining consensus nodes needed to agree on a block increased to 30 seconds (with spikes of up to 90 seconds), far beyond the 0.8-second block time on the happy path. As a result, transactions timed out and were dropped.

To prevent a recurrence, logging has been reduced on most of the Foundation nodes in the interim, so that even if their logging subsystem fails, it will not create backpressure on the node. The Flow Foundation team is also working on removing logging backpressure as a failure scenario from the node software.
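As an illustration of how logging backpressure can be removed as a failure scenario, the sketch below shows one common pattern: a log writer that hands lines to a bounded queue and drops them when the queue is full, so the caller never waits on a stalled sink. This is a minimal sketch and not the Flow node implementation; the names DroppingWriter and stalledSink, and the queue capacity of 4, are hypothetical and chosen only for the example.

```go
package main

import (
	"fmt"
	"io"
	"sync/atomic"
)

// DroppingWriter is a hypothetical io.Writer that never blocks its caller.
// Lines are queued for a background goroutine; if the queue is full
// (for example because the sink is down), the line is dropped and counted.
type DroppingWriter struct {
	dropped uint64 // accessed atomically; kept first for alignment
	queue   chan []byte
	done    chan struct{}
}

// NewDroppingWriter starts a goroutine that forwards queued lines to sink.
func NewDroppingWriter(sink io.Writer, capacity int) *DroppingWriter {
	w := &DroppingWriter{
		queue: make(chan []byte, capacity),
		done:  make(chan struct{}),
	}
	go func() {
		defer close(w.done)
		for line := range w.queue {
			_, _ = sink.Write(line) // may block; only this goroutine waits
		}
	}()
	return w
}

// Write enqueues a copy of p, or drops it if the queue is full.
// It always returns immediately and must not be called after Close.
func (w *DroppingWriter) Write(p []byte) (int, error) {
	line := append([]byte(nil), p...) // copy: the caller may reuse p
	select {
	case w.queue <- line:
	default:
		atomic.AddUint64(&w.dropped, 1)
	}
	return len(p), nil
}

// Dropped reports how many lines were discarded so far.
func (w *DroppingWriter) Dropped() uint64 { return atomic.LoadUint64(&w.dropped) }

// Close stops accepting lines and waits for the queue to drain.
func (w *DroppingWriter) Close() {
	close(w.queue)
	<-w.done
}

// stalledSink simulates a logging backend that is down: every Write
// blocks until release is closed.
type stalledSink struct{ release chan struct{} }

func (s stalledSink) Write(p []byte) (int, error) {
	<-s.release
	return len(p), nil
}

func main() {
	sink := stalledSink{release: make(chan struct{})}
	w := NewDroppingWriter(sink, 4)

	// The loop finishes even though the sink accepts nothing.
	for i := 0; i < 100; i++ {
		fmt.Fprintf(w, "block %d finalized\n", i)
	}
	fmt.Println("lines dropped while sink was down:", w.Dropped())

	close(sink.release) // sink recovers
	w.Close()
}
```

The trade-off in this design is that log lines are lost while the sink is unavailable; the dropped counter makes that loss visible without letting it slow down block production.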

Another problem during the incident on the 18th was that it took the Flow Foundation team some time to diagnose the issue, because alerting the incident response team also relied on the failed logs and metrics subsystem.

Therefore, the team has started investigating an independent monitoring and alerting solution as a backup to the primary system.

Thank you,

Flow Team

Posted Feb 20, 2025 - 01:11 UTC

Resolved

The incident has been resolved.
Posted Feb 19, 2025 - 22:26 UTC

Investigating

Flow Mainnet and Testnet are currently under maintenance due to performance issues, but should be back online soon. We will provide more updates shortly.
Posted Feb 19, 2025 - 21:50 UTC
This incident affected: Flow Testnet, Flow mainnet core components (Collection Finalization, Block Finalization, Transaction Execution, Block Sealing), and Flow mainnet Access APIs (GRPC API, GRPC Web API, REST API, EVM Gateway).