PUSHR's global system status is updated automatically whenever our monitoring systems detect an issue with any of our services. If you are aware of an ongoing issue that is not listed here, please report it to us by clicking the link above or by opening a ticket from your account's dashboard.
Sonic object storage - Master nodes reboot
Completed
Scheduled for March 25, 2023 at 6:00 PM – 6:31 PM
Affects
Sonic Object Storage
Under maintenance from 6:00 PM to 6:31 PM
Updates
Completed
March 25, 2023 at 6:31 PM
The reboot with the new kernel flag has completed successfully and all services have been re-enabled. We will now continue monitoring. If the issue persists, we will initiate the switch to different hardware, as mentioned in the description of this maintenance window. This is the last known issue keeping Sonic from exiting the beta stage.
In progress
March 25, 2023 at 6:00 PM
Maintenance is now in progress
Planned
March 25, 2023 at 6:00 PM
The team is preparing the master nodes in the cluster for a hardware reset as part of our continuing efforts to resolve the issue that causes random, temporary unavailability of file uploads and the S3 API. During the last maintenance window on March 22, the hardware of a single master node was replaced completely, which ruled out faulty hardware component(s) as the root cause. Based on the information collected from testing, the available system logs and known bugs specific to the AMD platform (on which Sonic is built), we will perform this reset to boot the systems' kernel with the "iommu=pt" flag. This puts the IOMMU (AMD-Vi, AMD's technology for virtualising I/O resources) into pass-through mode. Should this attempt fail to resolve the issue, a decision has been made to switch all master nodes to a different type of server powered by Intel.
The expected unavailability during this maintenance window is 15 minutes. It could be extended if we hit a boot issue with the kernel flag enabled and need to revert the configuration.
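For readers who want to confirm on their own Linux machines whether a kernel was booted with the "iommu=pt" flag, the following is a minimal sketch, not the tooling used by our team. It assumes the standard /proc/cmdline file, which exposes the parameters of the running kernel. The function names kernel_params and iommu_passthrough_enabled are hypothetical and were chosen for illustration, and the simple whitespace split does not handle quoted parameter values that contain spaces.

```python
from pathlib import Path


def kernel_params(cmdline):
    """Split a kernel command line into {name: value}.

    Bare flags such as "quiet" map to None. If a parameter is
    repeated, the last occurrence wins in this simplified parser.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def iommu_passthrough_enabled(cmdline):
    """Return True if the command line contains iommu=pt."""
    return kernel_params(cmdline).get("iommu") == "pt"


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    cmdline = Path("/proc/cmdline").read_text()
    print("iommu=pt active:", iommu_passthrough_enabled(cmdline))
```

Running the script after a reboot is one quick way to verify that a changed boot configuration was actually picked up, or that a reverted configuration no longer carries the flag.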