pushr - Sonic partial unavailability – Incident details

PUSHR's global system status is updated automatically when our monitoring systems detect an issue with any of our services. If you are aware of an ongoing issue that is not listed here, please report it to us by clicking on the link above, or by opening a ticket from your account's dashboard.

Sonic partial unavailability

Resolved
Operational
Started over 1 year ago
Lasted about 1 month

Affected

Sonic Object Storage

Operational from 2:34 PM to 1:07 PM

Updates
  • Resolved

    This incident has been resolved.

  • Update

    We've managed to resolve the temporary HTTP 5xx errors on newly uploaded content. Step 2 of the patch will now be applied. It should provide a permanent fix for the drive unavailability issue, but not for the HTTP 5xx issue. We will keep this incident open, but Sonic's state is now back to fully operational.

  • Update

    We are continuing to work on a fix for this incident. A 2-step patch is being applied. In step 1, we reconnected the offline drives, and content that was previously unavailable is now online. In step 2, we are addressing the root cause of the issue, which so far is believed to be related to a temporary loss of connectivity between some servers in the cluster. This connectivity loss is also believed to be the cause of another issue that we've received reports of today: some newly uploaded files may return HTTP 5xx errors on download. We are still attempting to confirm the link between the two issues and will hold back step 2 until we have a better understanding of the second issue.

  • Identified

    We’ve identified the issue, and a patch that should fix it is being prepared. We’ll be working on this non-stop until it is fully resolved. The next update will come in a few hours.

  • Investigating

    Our Sonic cluster has lost connectivity to 11 disk drives. Fresh content stored on these drives that has not yet been erasure coded may be unavailable. We are currently investigating this incident.