Site Reliability Engineer
Job Description
We are one of the world’s leading cloud platforms for integrating services and enabling app developers in the media space, offering fast time-to-market and a rich array of features on a variety of platforms. Using Applicaster’s SaaS cloud platform, broadcasters and content owners can develop and launch apps and OTT services, or integrate selected modules and functionality, such as engagement and interactivity, into third-party apps.
Apps powered by our platform are used by millions of viewers worldwide every day. Applicaster proudly counts ProSiebenSat.1, Television Academy, Viacom, Baby First, and many more among its customers!
What you will do:
· Focus on developing programmatic solutions for our growing infrastructure
· Utilize your skills in automation, replication, and scaling to manage our data centers
· Write scripts in Node.js, Ruby, Python, etc. to build custom tools for automation, replication, and scaling
· Build tools to monitor and provide metrics on our systems
· Perform Linux system administration (DNS, NFS, RPM, Apache, RAID, etc.)
· Own our Kubernetes clusters and deployment strategy
· Support an always-available cloud-based SaaS platform
· Support application deployments, build new systems, and upgrade and patch existing ones
· Develop automation to rapidly deploy instances from hardened images
· Use monitoring tools to find problems, resolve them or escalate them to development, and ensure that we exceed our SLAs
· Build and manage development and testing environments, and assist developers in debugging application issues
· Participate in the building of tools and processes to support the infrastructure
· Contribute to the creation of system support documents, such as runbooks used by the NOC
· Leverage scripting to build required automation and tools on an ad hoc basis
· Operate the platform within our security and privacy guidelines
· Develop back-end services for media-related applications
· Participate in defining and implementing large-scale back-end services to support millions of active users in a distributed environment
What we require:
· Ability to use a wide variety of open source technologies and tools
· Extensive Kubernetes experience
· Extensive AWS experience
· Experience with infrastructure as code (Terraform)
· Comfort with frequent, incremental code testing and deployment
· Strong grasp of automation tools
· Data management skills
· A strong focus on business outcomes
· Comfort with collaboration, open communication, and reaching across functional borders
· Experience with scripting languages (Ruby, Shell, Python)
· Must have experience with Linux system administration
· Experience with SaaS technologies
· Familiarity with GCP is an advantage