Site Reliability Engineer

  • 100x Group
  • Hong Kong, Singapore, San Francisco
  • May 17, 2021
Full time Engineering - Backend

Job Description

100x Group explores, incubates and pursues opportunities and investments, as part of its mission to reshape the modern digital financial system into one which is inclusive and empowering. 100x Group is behind the cryptocurrency derivatives trading platform, BitMEX.

The 100x Group infrastructure team sits at the core of the business and is responsible for the reliability and scalability of all the services that power the BitMEX platform and its developers. The BitMEX trading platform handles tens of thousands of low-latency transactions per second, representing several billion dollars traded every day.


  • Design, build and maintain core infrastructure components that allow BitMEX to support billions of dollars' worth of trades daily, using Chef, Terraform and Kubernetes,

  • Develop BitMEX’s disaster recovery capabilities by designing and implementing near real-time data transfer & failover hardware/software solutions between AWS and our datacenter in Singapore,

  • Participate in worldwide follow-the-sun on-call rotations & investigations related to the trading platform’s availability, in close collaboration with the Trading Technology, Security and Operation teams; plan & execute short- and long-term curative and preventive solutions across services and various levels of the stack.


About You: 

  • 6+ years of professional experience, with a proven track record of designing, implementing, managing, and testing infrastructure at scale on AWS and in on-prem data centers for high-value environments,

  • Have experience designing, planning and carrying out data center hardware/software buildouts to match core software requirements in the framework of cross-provider & cross-continental deployments for disaster recovery purposes,

  • Have good experience with low-latency, high-throughput & highly available networks spanning regions,

  • Have experience with Chef, Terraform, ZFS, Ceph, kdb, or similar technologies,

  • Have a detail-oriented mindset, considering edge cases, failure modes and behavioral patterns above all,

  • Strong engineering skill set with a firm grasp of fundamental Computer Science principles and a modular, maintainable, agile & test-driven approach to software development,

  • Strong technical troubleshooting, diagnostic and problem-solving skills,

  • Capacity to multitask and give equal attention to a variety of functions while under pressure,

  • Ability to adapt to changing priorities within a fast moving industry and startup culture.