Director - Site Reliability Engineering (SRE) - Observability Platform & Tools
Company: Toyota Deutschland GmbH
Location: Plano
Posted on: March 2, 2025
Job Description:
Overview
Who we are
Collaborative. Respectful. A place to dream
and do. These are just a few words that describe what life is like
at Toyota. As one of the world's most admired brands, Toyota is
growing and leading the future of mobility through innovative,
high-quality solutions designed to enhance lives and delight those
we serve. We're looking for diverse, talented team members who want
to Dream. Do. Grow. with us.
An important part of the Toyota family
is Toyota Financial Services (TFS), the finance and insurance brand
for Toyota and Lexus in North America. While TFS is a separate
business entity, it is an essential part of this world-changing
company, delivering on Toyota's vision to move people beyond what's
possible. At TFS, you will help create best-in-class customer
experience in an innovative, collaborative environment.
To save you time when applying, please note that Toyota does not offer
sponsorship of job applicants for employment-based visas or any other
work authorization for this position at this time.
This position is an onsite role based in Plano, TX.
Who we're looking for
Toyota Financial Services is
launching a new Site Reliability Engineering (SRE) team, and we are
seeking a director to spearhead this initiative. As the director,
you will be responsible for building the SRE team from the ground
up and establishing robust processes to ensure the reliability,
performance, and scalability of our systems and applications.
What you'll be doing
- Supporting engineers with hands-on coding, debugging, and
implementation of automation to support a more stable and robust
application environment.
- Fostering a collaborative team culture and supporting
professional development.
- Defining and implementing strategies for system reliability,
performance, and scalability.
- Developing Service Level Objectives (SLOs) and Service Level
Agreements (SLAs) aligned with business goals.
- Designing and deploying monitoring, alerting, and incident
management systems.
- Implementing and refining disaster recovery and business
continuity plans.
- Leading major incident responses and coordinating with
stakeholders for resolution.
- Conducting post-incident reviews and driving continuous
improvement.
- Identifying and implementing automation opportunities to
streamline operations.
- Overseeing the development and implementation of monitoring and
incident management tools.
- Working with engineering, product, and infrastructure teams on
reliability goals.
- Participating in architectural reviews, providing input on
reliability and scalability.
- Recruiting, building, and leading the new SRE team with clear
objectives and metrics.
What you bring
- 7+ years of experience in Site Reliability Engineering, DevOps,
or a related field, with at least 3 years in a leadership
role.
- Demonstrated experience in building and managing teams, with a
proven track record of achieving high system reliability and
performance.
- Deep understanding of cloud platforms (e.g., AWS, GCP, Azure)
and container orchestration technologies (e.g., Kubernetes).
- Proficiency in scripting and automation (e.g., Python, Bash)
and familiarity with monitoring and logging tools (e.g.,
Prometheus, Grafana, ELK Stack).
- Strong leadership capabilities, with excellent problem-solving
and decision-making skills.
- Effective communication skills, with the ability to convey
complex technical concepts to diverse audiences.
What we'll bring
During your interview process, our team can fill you in on all
the details of our industry-leading benefits and career development
opportunities. A few highlights include:
- A work environment built on teamwork, flexibility, and
respect.
- Professional growth and development programs to help advance
your career, as well as tuition reimbursement.
- Team Member Vehicle Purchase Discount.
- Toyota Team Member Lease Vehicle Program (if applicable).
- Comprehensive health care and wellness plans for your entire
family.
- Toyota 401(k) Savings Plan featuring a company match, as well
as an annual retirement contribution from Toyota regardless of
whether you contribute.
- Paid holidays and paid time off.
- Referral services related to prenatal services, adoption,
childcare, schools, and more.
- Flexible spending accounts.
- Relocation assistance (if applicable).
Belonging at Toyota
Our
success begins and ends with our people. We embrace diverse
perspectives and value unique human experiences. Respect for all is
our North Star. Toyota is proud to have 10+ different Business
Partnering Groups across 100 different North American chapter
locations that support team members' efforts to dream, do and grow
without questioning that they belong.
Applicants for our positions
are considered without regard to race, ethnicity, national origin,
sex, sexual orientation, gender identity or expression, age,
disability, religion, military or veteran status, or any other
characteristics protected by law.
Have a question, need assistance with your application, or require any
special accommodations?
Please send an email to talent.acquisition@toyota.com.