Site Reliability Engineer
A Site Reliability Engineer is responsible for maintaining and monitoring systems, automating processes, handling emergencies and incidents, troubleshooting, managing risk, and building scalable systems across all our software products for Device Management, IoT, and WiFi VAS & Indoor Location. Our main customers are Tier 1 telecom players and large enterprises around the world.
As a Site Reliability Engineer, you will have the chance to create and implement leading technology and to work in varied environments for clients from all parts of the globe, covering many types of implementation and maintenance scenarios.
This is a high-complexity role and we take it seriously. Are you up for the challenge?
Scope of responsibilities/tasks:
- Automate installation, configuration and maintenance tasks.
- Develop and maintain Linux installations, network infrastructures and configuration procedures.
- Participate in research leading to recommendations for system and process improvements.
- Perform system monitoring to verify availability and integrity of application and server resources.
- Perform backup operations as needed.
- Handle on-going maintenance, e.g. hot fixes, software upgrades, OS patching, etc.
- Design and configure Linux security and server hardening.
- Respond to alerts from monitoring services and investigate if necessary.
Typical working day:
- Install, upgrade, or reconfigure networks, operating systems, packages, or our services; create documentation or guides, or update them to reflect changes
- Configure and improve our monitoring metrics and tools
- Find workarounds and solutions to problems in monitored installations (for example, based on the monitoring system)
- Discuss and analyse solutions for client requirements (for example, with the Developers Team) to ensure the reliability of the solution being introduced
- Join postmortem or similar meetings to keep the same errors from recurring
- Develop and integrate tools that improve our internal procedures, make our work more efficient and comfortable, and ensure reliability for our company and clients
- Plan and apply changes to our internal system as well as to clients’ systems
The technologies we use:
Docker, Kubernetes, and a lot of virtualization on the infrastructure side. We have deployments on bare metal, too.
Scala, Java, MongoDB, Redis, SBT, Kafka, and many more are used in our products,
...and how we test: code review, unit tests, Selenium, performance tests – run automatically. Of course, we also have a QA Team :)
What we are looking for:
- 2-4 years of experience in systems administration.
- Strong Unix (Ubuntu, Red Hat) and network knowledge.
- Strong scripting skills: Python, Bash.
- Strong knowledge of system tools (wireshark, nslookup, nc, openssl, etc.).
- Expertise with container orchestration and/or virtualization (Kubernetes, Docker, KVM, oVirt).
- Experience with NoSQL databases.
- Experience with Load Balancers (physical and/or software).
- Experience maintaining Java applications.
- Knowledge of Software Development Lifecycle.
- Experience with Continuous Integration and Delivery Environments.
- Fluent Polish and English (B2+).
Nice to have:
- Expertise with monitoring systems like Zabbix, Prometheus, Elasticsearch.
- Experience with AWS, Azure Cloud or Google Cloud.
- Experience with Ansible, Salt, Helm.
- Experience with MongoDB, Redis.
- Knowledge of other foreign languages: Russian, French, Spanish, German.
What we offer:
- Flexible working hours.
- Participation in the creation of a modern product, which will keep you up to date with emerging technologies.
- Work in a team of professionals.
- Expanding your skills in working with developers, testers and people responsible for business.
- Multisport card.
- Company’s own parking and bike room.
- A kitchen full of snacks.
- Working in a leading Polish technology company on software products used by companies around the world.
- A relaxed work atmosphere – no dress code, no open space.