



About the Role
Senior DevOps Engineer / DevOps Manager needed at one of our Investment Technology clients! Hybrid, with 2-3 days per week in the office (NYC only). Base salary $200k-$275k plus bonus. Must have 8+ years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Infrastructure roles and 2+ years of team leadership experience. MUST also have experience in Financial Services or, preferably, at Investment Technology firms. No contractors and no C2C: this is a full-time W2 role only. No visa sponsorship available. The second interview will be in person, so please don't apply if you cannot meet in person!
Our client is seeking a Senior DevOps Engineer / DevOps Manager to own and evolve the CI/CD, infrastructure automation, and reliability backbone that supports a large-scale financial services platform. You’ll play a critical role in designing and managing highly available Azure environments, ensuring continuous delivery, operational efficiency, and proactive monitoring across a globally distributed engineering organization.
This is a hands-on leadership role — balancing deep technical work with team guidance, process optimization, and collaboration. You’ll partner closely with the Tech Lead – Platform Engineering, feature leads, and engineering managers to ensure the platform delivers on uptime, security, and performance — all while maintaining speed and precision.
Key Responsibilities
1. Infrastructure & Cloud Operations
• Design, implement, and manage scalable and resilient Azure cloud infrastructure across multiple environments (dev, staging, production).
• Oversee Infrastructure as Code (IaC) deployments using Terraform, ARM templates, or similar frameworks.
• Drive cost optimization and resource utilization efficiency without compromising performance or security.
• Ensure platform reliability through proactive capacity planning, monitoring, and incident management.
• Partner with the Platform team to maintain 24x7 uptime, disaster recovery, and high availability standards.
2. CI/CD & Automation
• Architect, implement, and optimize CI/CD pipelines using Azure DevOps, GitHub Actions, or equivalent.
• Automate build, deployment, and testing workflows to accelerate delivery with consistency and quality.
• Maintain a Zero-Downtime Deployment philosophy, ensuring robust rollback and validation mechanisms.
• Integrate static analysis, security scanning, and automated testing into the deployment process.
• Continuously improve developer productivity through tooling, scripts, and automation.
3. Monitoring, Observability & Reliability
• Implement monitoring and alerting systems (Azure Monitor, Grafana, Prometheus, or equivalent).
• Define and track SLOs, SLAs, and SLIs for system performance, reliability, and response times.
• Lead incident management — from detection to resolution and post-incident reviews.
• Build observability frameworks that provide actionable insights across the stack.
• Drive a culture of proactive issue detection, root-cause analysis, and continuous improvement.
4. Security, Governance & Compliance
• Partner with InfoSec to maintain a secure infrastructure posture aligned with SOC 2, ISO 27001, and financial data regulations.
• Implement and maintain access controls, secrets management, and encryption policies.
• Integrate security checks and compliance validation into the CI/CD process.
• Ensure data protection, compliance readiness, and secure configurations across environments.
5. Collaboration & Delivery Enablement
• Work closely with feature and platform teams to support smooth releases, scalable deployments, and rapid issue resolution.
• Drive clear communication and alignment on operational priorities, risks, and timelines.
• Serve as the go-to technical lead for infrastructure and operational troubleshooting.
• Balance structure with agility — accelerating delivery without adding unnecessary overhead.
• Collaborate with the Tech Lead and Engineering leadership on the DevOps roadmap and strategy.
6. AI & Innovation
• Explore and implement AI-driven automation and monitoring solutions to enhance system reliability and developer productivity.
• Create proofs of concept (POCs) for self-healing pipelines, intelligent alerting, and predictive performance tuning.
• Champion adoption of AI-assisted engineering tools for smarter, faster DevOps operations.
Qualifications
Required:
• Bachelor’s degree in Computer Science, Engineering, or a related field.
• 8+ years of experience in DevOps, Site Reliability Engineering (SRE), or Cloud Infrastructure roles.
• 2+ years of team leadership experience.
• Strong expertise with Microsoft Azure services (compute, networking, storage, and Azure SQL).
• Hands-on experience with CI/CD tools (Azure DevOps, GitHub Actions, Jenkins, etc.).
• Proficiency in scripting and automation (PowerShell, Python, Bash, or similar).
• Deep understanding of monitoring, observability, and incident management.
• Proven ability to drive high productivity with attention to detail and timeliness.
• Exceptional organizational skills, balancing multiple priorities effectively.
• Strong collaboration and communication abilities across global, distributed teams.
• Experience in Financial Services or FinTech environments (highly regulated systems).
Preferred:
• Experience with microservices, Kubernetes, and container orchestration.
• Familiarity with API management, Redis, and event-driven architectures.
• Knowledge of AI/ML for DevOps (predictive alerts, automated anomaly detection).
• Azure certifications (e.g., Azure Solutions Architect Expert, DevOps Engineer Expert).
Key Traits for Success
• Hands-on Leadership: Stays close to systems and code, guiding the team through practical, real-world problem solving.
• Organizational Discipline: Structured, methodical, and proactive in managing deployments, risks, and timelines.
• Delivery Mindset: Accelerates engineering delivery through automation and clarity — not bureaucracy.
• Detail-Oriented Execution: Ensures accuracy and completeness in every deployment and configuration.
• Strong Communicator: Provides crisp, transparent updates to engineers, leaders, and stakeholders.
• Proactive & Predictive: Foresees issues before they escalate; plans ahead to avoid downtime and failures.
• Collaborative Partner: Works seamlessly with platform, product, and security teams toward shared goals.
• Calm Under Pressure: Maintains composure and control during production incidents or tight timelines.
• Innovative Technologist: Leverages AI, automation, and modern tooling to simplify complexity.
• Continuous Learner: Constantly improves systems, processes, and personal mastery.