Resume

Jeremy Martinez

Senior Site Reliability Engineer · Incident Commander · Platform Automation

Las Vegas, NV · (720) 310-5673 · mrhits777@gmail.com

Professional summary

Senior Site Reliability Engineer with 20+ years designing, operating, and scaling mission-critical infrastructure across cloud and hybrid environments. Proven leader in automation, observability, incident command, and reliability engineering with a consistent record of eliminating toil, improving uptime, reducing cloud spend, and strengthening production resilience. Deep experience serving as Incident Commander for high-traffic platforms and enterprise operations organizations.

Core skills

Observability, Incident Response & ITSM: Datadog, Sysdig, Prometheus, Grafana, Sumo Logic, PagerDuty, Rootly, ServiceNow, Jira
Cloud & Platform Engineering: AWS, Azure, GCP, Kubernetes, OpenShift
Infrastructure as Code & Automation: Terraform, Helm, Ansible, Jenkins, Argo CD
Reliability, Networking & Resilience: Load Balancing, Traffic Engineering, DDoS Mitigation, High Availability, Disaster Recovery
Storage & Distributed Data: MySQL, PostgreSQL, Ceph, NFS, iSCSI, Veritas VCS
Operating Systems: Linux (RHEL, CentOS, Ubuntu), Solaris
Programming & Scripting: Python, Bash, Shell, Perl, PHP, Java

Experience

Dynascale Inc.

03/2024 – Present

Senior Site Reliability Engineer · Incident Commander & Responder

▹Architect and operate highly available cloud platforms across AWS, Azure, and GCP supporting multiple client production environments.
▹Serve as senior Incident Commander for customer and platform incidents — triage, mitigation, escalation, and post-incident remediation.
▹Improved observability through custom alerting pipelines and real-time telemetry integration.
▹Reduced cloud spend through reserved instances, autoscaling optimization, and rightsizing.
▹Automated infrastructure lifecycle with Terraform, CloudFormation, and Ansible — cutting deployment lead time and operational risk.
▹Reduced manual operational intervention by 30% through automation, self-healing workflows, and standardization.
▹Lead disaster recovery strategy: backup validation, failover testing, and incident response playbooks.
▹Develop agentic AI automation pipelines for system administration and self-healing remediation across Hyper-V, AWS, and Azure.
▹Mentor engineers on reliability engineering, automation practices, and production ownership.

Upstart Inc.

07/2022 – 02/2024

Senior Site Reliability Engineer / Incident Commander

▹Served as Incident Commander for enterprise production incidents, coordinating engineering, operations, and executive stakeholders during major outages.
▹Owned Rootly configuration and operational workflows for incident lifecycle management.
▹Implemented standardized incident response processes, improving consistency and reducing MTTR.
▹Delivered weekly reliability metrics and incident analytics to executive leadership.
▹Built runbooks, playbooks, and incident simulation exercises to improve organizational readiness.
▹Led blameless postmortems and translated incident insights into durable corrective actions.

eBay Inc.

10/2011 – 07/2022

Production Unix Systems Engineer / MTS / Incident Responder

▹Recognized with a Critical Talent Bonus for high-impact contributions to incident management and operational automation.
▹Served as Senior Incident Responder and escalation owner for large-scale production incidents impacting global e-commerce platforms.
▹Designed automation eliminating 90% of manual operational toil for a 12-person team.
▹Maintained uptime SLA of 99.997% across critical services.
▹Supported availability of a 10,000+ node Hadoop cluster.
▹Administered Veritas VCS clusters supporting high-availability Oracle environments.

New Frontier Media Inc.

08/2008 – 10/2011

Systems Engineer

▹Designed and operated high-traffic streaming platforms supporting national broadcast distribution.
▹Deployed MongoDB, Redis, Nginx, Node.js, VMware, and Xen environments.
▹Implemented clustered MySQL architectures and high-availability outbound e-mail systems.

Hit Director Domains, Comtech, ISSG, The Internet Web Hosting Company

Pre-2008

Earlier Career (Condensed)

▹Built and operated high-traffic Linux server farms and web hosting platforms.
▹Designed secure streaming systems, PCI-compliant infrastructure, and hardened production environments.
▹Led migrations, performance tuning, vulnerability remediation, and infrastructure modernization.

US Army

04/1993 – 08/1997

Communications Center Operator

▹Operated secure telecommunications systems within classified environments (TS/SCI).
▹Maintained satellite and fiber communication systems and cryptographic equipment.
▹Awarded multiple Army Achievement Medals and leadership honors.

Education

Management / Computer Information Systems
Park University, Parkville, Missouri
162 credit hours completed

Certifications

Incident Response Certification, PagerDuty (2024)
Monitoring AWS Certification, Datadog (2023)
Sumo Logic Fundamentals and Search Mastery (2022)
Cisco CCNA (expired)
Security+