Posted 1 day ago
Linux Site Reliability Engineer
Networking People Limited
📍 Anderston
Information Technology, Remote, Hybrid, Contract
Job description
Contract: Site Reliability Engineer (Linux Administration & Server Hardware)
Location: Glasgow (hybrid - 3 days onsite)
Duration: 6 months
Day Rate: Negotiable (Inside IR35 via umbrella solution)
Reference: 20460

We are looking for an experienced Linux Site Reliability Engineer (SRE) to join a high-performing infrastructure support team focused on maintaining and improving critical platform reliability within a large-scale enterprise environment.

This position will focus on resolving hardware and platform-related incidents escalated from the L3 support team.

The successful candidate will have strong Linux systems expertise, hands-on physical server troubleshooting experience, and a proactive approach to operational improvement, automation, and incident reduction.

Essential Skills / Requirements

Strong Linux administration and troubleshooting skills (processes, networking basics, logs, package/service management).

Solid understanding of server hardware and peripherals (disks, RAID/HBA, NICs, firmware) and how failures present at OS level.

Experience with out-of-band management / lights-out technologies (e.g., iDRAC, iLO, IPMI/Redfish) for remote troubleshooting and recovery.

Proven ability to own incidents end-to-end: triage, identify mitigations/workarounds, coordinate with L3/engineering, communicate status, and drive to resolution.

Understanding of SRE operational practices and metrics (e.g., SLO/SLI concepts, error budgets, MTTD/MTTR) and a continuous-improvement mindset.

Strong communication skills (written and verbal): clear incident updates, customer/stakeholder management, and effective escalation and handoffs.

Strong documentation skills: writing clear runbooks/procedures, contributing to knowledge bases, and participating in post-incident reviews/root cause analysis.

Nice to Have / Desired Skills

Scripting and automation skills (e.g., Bash, Python) to build small tools, checks, and workflow automation that reduce toil.

Familiarity with virtualization and containerization concepts/operations (e.g., VMware/KVM, Docker, Kubernetes) and using automation to support these environments.

Experience with monitoring/observability and alerting workflows (dashboards, log analysis, alert tuning) and translating signals into actionable response steps.

Networking People (UK) is acting as an Employment Business in relation to this vacancy.