Senior Site Reliability Engineer / Paris or Full Remote

Kaiko is a rapidly growing fintech startup in the digital assets industry with an international presence. Our mission is to be the foundation of the new digital finance economy, which promises to expand financial opportunity and inclusion globally. We do this by empowering market participants with accurate, transparent, and actionable crypto data to be leveraged for a range of market activities including strategy backtesting, in-depth research, valuation, analytics, and integrations.

About the Job

Overview

You will be joining a fast-paced engineering team made up of people with significant experience working with terabytes of data. We believe that everybody has something to bring to the table, and therefore put collaborative effort and teamwork above all else (and not just on the engineering side). You will be able to work autonomously as an equally trusted member of the team, and participate in efforts such as:

+ Addressing high availability problems: load balancing, disaster recovery, replication, sharding, etc.
+ Addressing “big data” problems: 200+ million messages/day, 160B data points since 2010 (currently growing at a rate of 10B per month)
+ Improving our development workflow, continuous integration, continuous delivery and, in a broader sense, our team practices
+ Expanding our platform’s observability through monitoring, logging, alerting and tracing

Our Stack

+ Monitoring: VictoriaMetrics, Grafana
+ Alerting: AlertManager, Karma, PagerDuty
+ Logging: Vector, Loki
+ Caching: FoundationDB
+ Secrets management and PKI: Vault
+ Configuration management and provisioning: Terraform, Ansible
+ Service discovery: Consul
+ Messaging: Kafka
+ Proxying: HAProxy, Traefik
+ Service deployment: Terraform, Nomad (integrated with Consul and Vault)
+ Database systems: ClickHouse (main datastore), FoundationDB (caching, deduplication), replicated PostgreSQL
+ Operating System: Ubuntu 20.04
+ Protocols: gRPC, HTTP (phasing out in favor of gRPC), WebSocket (phasing out in favor of gRPC)
+ Platform: OCI containers

Your Missions

+ Deploy, maintain, and evolve our infrastructure for optimal data consistency and availability while keeping costs down (we have 2 autonomous regions)
+ Automate what is not, fix what's needed
+ Innovate and bring your ideas to the table
+ Adapt fast

Who We Are Looking For

+ Significant experience as a DevOps/Systems Administrator
+ Experience with Linux system administration and automation (Ansible at a minimum)
+ Experience with, in no particular order: troubleshooting crashes & performance issues, load balancing, VIPs/fail-over IPs, RAID

Please note that we don’t have any “hard” requirements in terms of development platforms or technologies: we are primarily interested in people capable of adapting to an ever-changing landscape of technical requirements, who learn fast and are not afraid to constantly push our technical boundaries. It is not uncommon for us to benchmark new technologies for a specific feature, or to change our infrastructure in a big way to better suit our needs. The most important skills for us revolve around two things:

+ What we like to call “core” knowledge: what a software process is, how it interacts with a machine’s or the network’s resources, what kind of constraints we can expect for certain workloads, etc.
+ How fast you can adapt to a technology you didn’t know existed 10 minutes ago

Nice to Have

+ Experience with HashiCorp tools (Terraform, Vault, Consul, Nomad)
+ Experience with container orchestration and microservices
+ Experience with recent Ubuntu releases and systemd
+ Knowledge of networking, routing (BGP, static, …), and tunneling
+ Knowledge of encryption (PGP/TLS/SSH/WireGuard/…)
+ Basic knowledge of cryptocurrencies

Personal Skills

+ Honest, getting and giving feedback is very important to you
+ Humble, making new errors is an essential part of your journey
+ Empathetic, you feel a sense of responsibility for all the team’s endeavors and don’t pay attention to the individual level of involvement
+ Committed, as an equally important member of the team, you want to make yourself heard while respecting everybody’s point of view
+ Fluent in written and spoken English (we have 5 different nationalities in the team!)
+ You have the utmost respect for legacy code and infrastructure, with some occasional and perfectly understandable respectful complaints

We do our best to select people who lead by example and experience rather than by position or seniority.

Company Perks

+ Hardware of your choice
+ Paid vacations (and French RTT)
+ Health insurance (Alan Blue, 75% subsidized by Kaiko)
+ Meal vouchers (Swile, 10€/day)
+ Multiple team events (annual retreat, casual events, etc.)

Recruitment Process

+ Introduction call (30 min)
+ Meeting with a senior Site Reliability Engineer as well as another senior member of the Engineering team for a technical/product RPG: you read that right, no written test, no whiteboard quicksort implementation (1h30)
+ Informal discussions with other members of the company working in sales, product, research, marketing, etc. (45 min)
+ Meeting with our Head of Engineering

Each step is generally held on a different day. We do our best to follow up within 24 hours, and we always provide candidates with a thorough explanation of our decision.

Additional Information

+ Contract type: Full Time (French CDI)
+ Location: On-site at our Paris office or Full Remote (CEST +/-1h)

Interested? Reach out to us at engineering@kaiko.com.