Site Reliability Engineer

  • BitMEX
  • Hong Kong, Singapore, San Francisco
  • Aug 22, 2021
Full time Engineering - Backend

Job Description

BitMEX is the world’s leading cryptocurrency derivatives trading platform, which has pioneered cryptocurrency trading through relentless commitment to change, and continues to set benchmarks for innovation, liquidity, and security today.

As the world's most advanced peer-to-peer crypto-products trading platform and API, BitMEX gives knowledge, confidence, and precision to hundreds of thousands of traders, transacting billions of USD a day.

Join us as we build a thriving cryptocurrency ecosystem through strategic investments in emerging cryptocurrency technology and create the future of digital financial services.

Responsibilities:

  • Design, build and maintain core infrastructure components that allow BitMEX to support billions of dollars' worth of trades daily, using Chef, Terraform and Kubernetes,

  • Develop BitMEX’s disaster recovery capabilities by designing and implementing near real-time data transfer & failover hardware/software solutions between AWS and our datacenter in Singapore,

  • Participate in worldwide follow-the-sun on-call rotations and investigations related to the trading platform’s availability, in close collaboration with the Trading Technology, Security and Operation teams; plan and execute short- and long-term curative and preventive solutions across services and various levels of the stack.

 

About You: 

  • 6+ years of professional experience, with a proven track record of designing, implementing, managing, and testing infrastructure at scale on AWS and in on-prem data centers for high-value environments,

  • Have experience designing, planning and carrying out data center hardware/software buildouts to match core software requirements as part of cross-provider and cross-continental deployments for disaster recovery purposes,

  • Have good experience with low-latency, high-throughput and highly available networks that span regions,

  • Have experience with Chef, Terraform, ZFS, Ceph, kdb, or similar technologies,

  • Have a detail-oriented mindset that considers edge cases, failure modes and behavioral patterns above all,

  • Strong engineering skill set with a firm grasp of fundamental Computer Science principles and a modular, maintainable, agile & test-driven approach to software development,

  • Strong technical troubleshooting, diagnosing and problem solving skills,

  • Capacity to multitask and give equal attention to a variety of functions while under pressure,

  • Ability to adapt to changing priorities within a fast moving industry and startup culture.