Design, scale & automate a global network supporting millions of devices | Autonomous culture | Strong support for professional development
- Global cloud infrastructure serving more than 4 billion HTTP requests per day
- Autonomous environment with high ownership and impact on customers & product
- Combine strong network engineering skills with a passion for automating manual tasks
The company & team
Our client is a cloud-managed IT company headquartered in San Francisco and expanding rapidly in Sydney. They provide a full suite of cloud-controlled products powering critical infrastructure: network switches, security appliances, wireless APs and security cameras. Backed by the resources and branding of a stable industry giant, they operate as an autonomous unit with a great engineering culture.
The Infrastructure SRE team is responsible for shaping reliable and secure network connections to and within their private cloud, and is passionate about automating manual tasks with the right tools.
What's with the title?
Our client is constantly innovating, and this is a newly conceived role reflecting a lesser-known title.
Putting a spin on Google's "Site Reliability Engineering", this role combines the principles of operations and software engineering, with a dedicated focus on networking.
You will lead the design, development and operational aspects of the global network, including automating network systems and working closely with existing vendors to coordinate all hands-on work.
You will make crucial decisions with direct impact on their product and customers, and be leaned on for thought leadership on best practices as the team constantly innovates and improves internal processes with the right tools. Surrounded by innately curious and passionate engineers who customarily drop all else for a teaching moment, you can also be assured of strong mentorship and support for your professional development.
- You have 6+ years' experience designing, deploying and operating mid- to large-scale network environments.
- You have 2+ years' experience scripting or coding in languages like Ruby, Scala, Python or Bash.
- You know your way around *nix systems. They run Debian.
- Your interest spans beyond routers and switches: you enjoy solving end-to-end problems and have solid experience with protocols at all layers of the OSI model (ARP, DNS, HTTP, etc.).
- You are interested in scripting or coding, and in digging into other people's source code in search of the root cause of a problem.
- You automate all the things.
- You care about, and empathize with, the customer experience. You have experience supporting an externally facing production environment, ideally in a team that follows the sun.
The ideal candidate will have strong skills across networking (troubleshooting, architecture), operations and scripting/programming, and a passion for automating manual tasks with the right tools.
*Bonus points for experience with:* network security, BGP, OSPF, IPv6, TCP BBR, DMVPN, IPsec, MACsec, NMS, advanced Unix/Linux, system administration, scripting/Bash/Ruby/Python, project management, AWS/Azure, Docker, K8s, SDN/OpenStack/OpenFlow, Ansible/Puppet/Chef, REST/SOAP APIs, TLS or Cloud/ISP/Telco exposure.
Exciting challenges you'll be tackling:
- Designing and leading the deployment of a global DMVPN WAN.
- Developing comprehensive monitoring tools that provide visibility into the performance and reliability of our network infrastructure.
- Building automated testing infrastructure to increase the velocity at which we can deploy changes.
- Troubleshooting production issues across the full stack (application, system and network), through to root cause analysis and implementation of preventative measures.
- Designing, implementing and managing an overlay network to support thousands of containers.