At Cadence, we hire and develop leaders and innovators who want to make an impact on the world of technology.
Cadence is a leading player in the Electronic Design Automation (EDA) and IP creation industry, providing software, tools, and IP that help build nearly all of today's newest and most advanced electronics. Core to the development of our products is a large-scale distributed HPC farm: a hybrid cloud spanning on-premises data centers and AWS (Amazon Web Services). Hundreds of thousands of jobs run in this environment daily to develop and test our products, so maintaining, enhancing, monitoring, and improving its efficiency is central to our business. The successful candidate will have proven DevOps experience with compute farms, job submission/management technologies, cloud, hybrid cloud, and associated monitoring, reporting, and management tools.
- Work within the operational strategy for internal High Performance Computing (HPC) farms at all Cadence locations to support R&D users.
- Work as part of the global server farm team and meet operations SLAs.
- Maintain, enhance, monitor, report on, and improve farm efficiency.
- Undertake complex projects in a constantly changing environment and complete them on or before their deadlines.
- 10+ years of technical experience architecting, maintaining, and managing a compute farm environment running Linux.
- Proven experience working directly with R&D software development teams to collaboratively develop solutions that optimize their working environment (direct EDA experience desired).
- Proven experience in capacity and performance management: optimizing performance, ensuring adequate capacity, working with R&D to optimize their workloads, and developing and maintaining key performance indicators (KPIs).
- 5-7 years of experience managing IBM LSF in a farm environment. Sun Grid Engine/PBS Pro experience is also desirable.
- Solid understanding of, and proven operational experience with, compute farms, job submission/management technologies, cloud, hybrid cloud, and associated management tools.
- Experience maintaining and managing a compute farm environment running Linux.
- Proven experience with Linux system performance tuning and issue troubleshooting.
- Extensive technical knowledge of shell scripting and of build and management automation using tools such as JumpStart, Kickstart, Chef, or Puppet.
- Experience working with storage, network, and data center management teams to provide client solutions, resolve incidents and problems, and actively address capacity and performance management issues.
- At least 5 years of experience working in a global group, coordinating support, strategies, projects, and operations across multiple geographies with a team-oriented approach.
- A proven process focus, demonstrated through documentation, change management, incident management, and problem resolution activities.
- Excellent verbal and written communication skills, including presentations and documentation.
- BS/MS in computer science or an IT-related field.
We’re doing work that matters. Help us solve what others can’t.