PlanoRecruiter Since 2001
the smart solution for Plano jobs

Director - Site Reliability Engineering (SRE) - Observability Platform & Tools

Company: Toyota Deutschland GmbH
Location: Plano
Posted on: March 2, 2025

Job Description:

Overview

Who we are

Collaborative. Respectful. A place to dream and do. These are just a few words that describe what life is like at Toyota. As one of the world's most admired brands, Toyota is growing and leading the future of mobility through innovative, high-quality solutions designed to enhance lives and delight those we serve. We're looking for diverse, talented team members who want to Dream. Do. Grow. with us.

An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America. While TFS is a separate business entity, it is an essential part of this world-changing company - delivering on Toyota's vision to move people beyond what's possible. At TFS, you will help create best-in-class customer experience in an innovative, collaborative environment.

To save time applying, Toyota does not offer sponsorship of job applicants for employment-based visas or any other work authorization for this position at this time.

This position is an onsite role based in Plano, TX.

Who we're looking for

Toyota Financial Services is launching a new Site Reliability Engineering (SRE) team, and we are seeking a director to spearhead this initiative. As the director, you will be responsible for building the SRE team from the ground up and establishing robust processes to ensure the reliability, performance, and scalability of our systems and applications.

What you'll be doing

  • Supporting engineers with hands-on coding, debugging, and implementation of automation to support a more stable and robust application environment.
  • Fostering a collaborative team culture and supporting professional development.
  • Defining and implementing strategies for system reliability, performance, and scalability.
  • Developing Service Level Objectives (SLOs) and Service Level Agreements (SLAs) aligned with business goals.
  • Designing and deploying monitoring, alerting, and incident management systems.
  • Implementing and refining disaster recovery and business continuity plans.
  • Leading major incident responses and coordinating with stakeholders for resolution.
  • Conducting post-incident reviews and driving continuous improvement.
  • Identifying and implementing automation opportunities to streamline operations.
  • Overseeing the development and implementation of monitoring and incident management tools.
  • Working with engineering, product, and infrastructure teams on reliability goals.
  • Participating in architectural reviews, providing input on reliability and scalability.
  • Recruiting, building, and leading the new SRE team with clear objectives and metrics.

What you bring

  • 7+ years of experience in Site Reliability Engineering, DevOps, or a related field, with at least 3 years in a leadership role.
  • Demonstrated experience in building and managing teams, with a proven track record of achieving high system reliability and performance.
  • Deep understanding of cloud platforms (e.g., AWS, GCP, Azure) and container orchestration technologies (e.g., Kubernetes).
  • Proficiency in scripting and automation (e.g., Python, Bash) and familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack).
  • Strong leadership capabilities, with excellent problem-solving and decision-making skills.
  • Effective communication skills, with the ability to convey complex technical concepts to diverse audiences.

What we'll bring

During your interview process, our team can fill you in on all the details of our industry-leading benefits and career development opportunities. A few highlights include:
  • A work environment built on teamwork, flexibility, and respect.
  • Professional growth and development programs to help advance your career, as well as tuition reimbursement.
  • Team Member Vehicle Purchase Discount.
  • Toyota Team Member Lease Vehicle Program (if applicable).
  • Comprehensive health care and wellness plans for your entire family.
  • Toyota 401(k) Savings Plan featuring a company match, as well as an annual retirement contribution from Toyota regardless of whether you contribute.
  • Paid holidays and paid time off.
  • Referral services related to prenatal services, adoption, childcare, schools, and more.
  • Flexible spending accounts.
  • Relocation assistance (if applicable).

Belonging at Toyota

Our success begins and ends with our people. We embrace diverse perspectives and value unique human experiences. Respect for all is our North Star. Toyota is proud to have 10+ different Business Partnering Groups across 100 different North American chapter locations that support team members' efforts to dream, do and grow without questioning that they belong.

Applicants for our positions are considered without regard to race, ethnicity, national origin, sex, sexual orientation, gender identity or expression, age, disability, religion, military or veteran status, or any other characteristics protected by law.

Have a question, need assistance with your application or do you require any special accommodations? Please send an email to talent.acquisition@toyota.com.

