Industry: FinTech
Company Size: 51-200 employees
Founded: 2018
Headquarters: New York, NY

Installation:
“The installation process was very easy. groundcover’s Helm chart just worked, and the platform was ready to provide value right away. We did end up building our own OTel logging pipeline to make the setup more production-ready, reducing our metric cardinality by performing aggregation on our custom metrics. The groundcover team was particularly helpful during this process.”

- ShemTov Fisher, DevOps team lead, Solidus Labs
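
The quote above mentions reducing metric cardinality by aggregating custom metrics. The source does not describe how Solidus Labs implemented this, so the following is only a minimal sketch of the general idea, assuming counter-style samples and a hypothetical high-cardinality `user_id` label. The function name, label names, and sample values are all illustrative.

```python
from collections import defaultdict


def aggregate_samples(samples, keep_labels):
    """Sum samples after dropping every label that is not in keep_labels.

    Summing is appropriate for counter-style metrics; gauges would
    usually need a different aggregation (for example max or average).
    """
    totals = defaultdict(float)
    for labels, value in samples:
        # The remaining labels identify the aggregated series.
        key = tuple(sorted((k, v) for k, v in labels.items() if k in keep_labels))
        totals[key] += value
    return dict(totals)


# Hypothetical samples: one series per (service, user_id) pair.
samples = [
    ({"service": "api", "user_id": "u1"}, 3.0),
    ({"service": "api", "user_id": "u2"}, 5.0),
    ({"service": "worker", "user_id": "u1"}, 2.0),
]

aggregated = aggregate_samples(samples, keep_labels={"service"})
print(len(samples), "series before,", len(aggregated), "series after")
```

Dropping the per-user label collapses three series into two here; with many distinct label values the reduction grows accordingly.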

About Solidus Labs

Solidus Labs, founded in 2018 by former Goldman Sachs engineers and cybersecurity professionals, is a leading provider of crypto-native security and compliance solutions. Its mission is to enable safer crypto trading across centralized and decentralized markets.

To support the safety of live trading, the company builds on an elastic Kubernetes infrastructure layer hosted on AWS Elastic Kubernetes Service (EKS) and relies on multiple AWS managed services for maximal reliability and availability.

Solidus Labs’ HALO service supports both batch and real-time trading governance, processing millions of events per day under a strict four-nines (99.99%) high-availability requirement.
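
To put the four-nines requirement in perspective, the arithmetic below converts an availability target into an allowed-downtime budget. This is a rough sketch that assumes the target means 99.99% uptime measured over a fixed window; the source does not state the measurement window, and the function name is illustrative.

```python
def downtime_budget_minutes(availability, period_days):
    """Return the downtime allowed, in minutes, for a given availability target."""
    minutes_in_period = period_days * 24 * 60
    return (1.0 - availability) * minutes_in_period


# Assumed windows: a 365-day year and a 30-day month.
yearly = downtime_budget_minutes(0.9999, 365)
monthly = downtime_budget_minutes(0.9999, 30)

print(f"per year:  {yearly:.2f} minutes")
print(f"per month: {monthly:.2f} minutes")
```

Under these assumptions the budget is roughly 52.6 minutes per year, or about 4.3 minutes per 30-day month, which leaves little room for slow detection or manual root cause analysis.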

The Problem: Detecting issues before users encounter them.

Solidus Labs’ Kubernetes stack gives them more flexibility and a streamlined approach to CI/CD, but it also brings more complexity and an inter-dependency between the infrastructure and application layers.

As a result, they rely heavily on proactive alerting that allows them to track leading indicators of an emerging issue and notify users before things escalate. Platform downtime has a direct impact on their customers’ compliance and may even result in regulatory fines. It is therefore crucial for them to be able to identify and troubleshoot live issues as quickly as possible.

Solidus Labs was relying on a hybrid observability stack based on:

  • Logz.io for log management
  • Self-hosted Prometheus for infrastructure and custom metrics
  • Jaeger and OpenTelemetry stack for tracing

Their existing stack created multiple challenges:

  • K8s monitoring was technically complex: Although a single pod OOMing, a noisy node or a misplaced resource limit could have a big impact, only a few power users who knew how to operate the Kubernetes CLI effectively were able to take part in monitoring.
  • Tracing was a critical part of Solidus Labs’ stack: Throughput fluctuations, latencies and error rates were important indicators of the system’s health and performance. However, the Jaeger stack and its underlying Cassandra DB did not support their scale and required constant maintenance. In addition, although valuable, instrumenting their stack with OpenTelemetry required ongoing attention from developers and the DevOps team.
  • No single pane of glass: Solidus Labs lacked the ability to easily move between and correlate logs, metrics, traces and infrastructure events in one place, resulting in a longer and more complex root cause analysis (RCA) process.
“One of the problems in a microservices architecture is the ability to track the root cause of issues… Before we started using groundcover, it was hard to detect Kubernetes infrastructure issues and connect them to the problem we’re seeing.”

- Idan Lavy, VP R&D, Solidus Labs

Why groundcover?

  • Consolidated, Kubernetes-native observability: The ability to gain deep insight into their K8s infrastructure, together with all of their other needs, all in one place, was a huge advantage. This allowed Solidus Labs to investigate logs and metrics, while adding mission-critical context on what is happening inside their clusters, in real time.
  • Improved platform reliability and customer experience: Solidus Labs can now monitor all container crashes inside their clusters at a much more granular level, which has a direct and immediate impact on the reliability of their platform and the positive experiences of their customers.
  • eBPF traces and a modern backend for OpenTelemetry: groundcover’s sensor collects traces using eBPF. These traces gave Solidus Labs’ team faster and better insights, with instant access to traces for their entire stack and zero effort required for instrumentation or maintenance. groundcover’s eBPF sensor also provides unique insight into traces, including their full payloads, which in many cases has been priceless for the team.
  • The technology behind groundcover: groundcover’s backend includes ClickHouse and is built on top of a Bring Your Own Cloud (BYOC) architecture. As Solidus Labs’ engineers got familiar with ClickHouse and groundcover’s architecture, they started using them for their internal platform needs, moving their storage stack to rely on ClickHouse and pioneering a BYOC offering inside the company.
“The out-of-the-box ability of eBPF to look into the actual payload of our traces and transactions is what makes groundcover’s technology and architecture superior to other tools”

- Idan Lavy, VP R&D, Solidus Labs

The Impact

Thanks to groundcover, Solidus Labs was able to fully migrate from their legacy stack, replacing their Jaeger and Cassandra OpenTelemetry stack with a single, unified platform. Operating on top of ClickHouse inside groundcover’s BYOC backend, the new platform promised reliability and scalability.

  1. Better alignment on investigation processes: The entire team is now fully aligned on when and where to start an investigation. Even less experienced colleagues are now able to investigate complex issues that were previously handed off to the DevOps team, with increased accountability.
  2. Low maintenance, increased reliability: groundcover’s fully managed BYOC backend freed the DevOps team from maintaining the data ingestion pipelines and databases that were part of their self-hosted observability stack. It also improved the performance of their tracing stack at scale, with a reliable interface that developers could easily query when it mattered most.
  3. Increased cost-effectiveness: Solidus Labs has a strong DevOps and engineering culture, which had previously allowed them to maintain OSS solutions. Moving to groundcover’s fully managed backend, which still runs on their own infrastructure, eliminated the need to invest time in maintaining their observability stack while keeping, or even lowering, their total cost of ownership. It also removed hidden costs, such as those arising from maintenance and downtime recovery.
“Observability used to be unreliable, because it was scattered across many tools and required constant maintenance. Having an unreliable observability platform is like having no observability at all. groundcover changed that for us by providing a consolidated, no-touch observability solution.”

- ShemTov Fisher, DevOps team lead, Solidus Labs
