High Performance Computing Operations Engineer

Location: US-IL-Evanston
Posted Date: 10/19/2023 12:45 PM
Job ID: 2023-34068
# of Openings: 1
Job Category: Information Technology
Level 2: NFP Research

 

NFP UL Research Institutes

At UL, we know why we come to work.

We have an exciting opportunity for a High-Performance Computing (HPC) Operations Engineer at UL Research Institutes, based in our Evanston, IL office. The HPC Operations Engineer is responsible for the day-to-day operation, management, and optimization of HPC infrastructure. The engineer works with researchers, scientists, and other technical teams to understand their requirements, and with the IT team to implement and augment the infrastructure to suit the needs of the organization.

 

Underwriters Laboratories

At UL Research Institutes (UL), we wake up every day with a common purpose: to make the world a safer, more secure, and sustainable place. Science is in our DNA; we are endlessly curious and passionate about seeking and speaking the truth. We take delight in knowing that our work makes a meaningful contribution to society, and we are proud that our culture is centered on integrity, collaboration, inclusion, and excellence. UL stands at the forefront of technological advancement, and we are continually challenged to find new ways to foster innovation and positive change. Satisfying? Yes. Exciting? Absolutely!

What you’ll learn & achieve:

As the High-Performance Computing Operations Engineer, you will play a key role in the rapid growth of UL Operations as you:

  • Implement and manage job scheduling systems that allocate computing resources efficiently across user groups and projects and maximize utilization.
  • Collaborate with researchers and developers to address application-related challenges and optimizations, and provide technical support to HPC users, assisting them with job submission, troubleshooting, and optimization.
  • Respond to and resolve technical incidents and service requests related to the HPC environment. Diagnose and troubleshoot complex issues, coordinating with relevant teams for efficient resolution.
  • Manage job queues, prioritize tasks, and allocate computing resources according to user requirements, optimizing job scheduling parameters to ensure efficient utilization of available resources.
  • Create and maintain documentation for system configurations, troubleshooting procedures, and best practices. Conduct training sessions for users on HPC usage, software tools, and best practices.
  • Provide support and assistance to users regarding code optimization, debugging, and parallelization techniques, collaborating with researchers and application developers to optimize codes for parallel execution on HPC systems.
  • Analyze system performance metrics, identify bottlenecks, and recommend optimization strategies, collaborating with the HPC Infrastructure team to enhance overall system performance.
  • Stay current with advancements in HPC hardware, software, and methodologies. Evaluate new technologies and tools that could enhance the organization's HPC capabilities.
  • Perform other duties as directed.

What makes you a great fit:

While no one candidate will embody every quality, the successful candidate will bring many of the following professional competencies and personal attributes:

  • Strong knowledge and experience in managing and supporting HPC systems in a production environment.
  • Proficient in Linux/Unix system administration, shell scripting, and basic programming concepts.
  • Familiar with job scheduling systems (e.g., Slurm, Torque/PBS) and resource management.
  • Strong problem-solving and troubleshooting skills, with the ability to diagnose and resolve technical issues efficiently.
  • Excellent communication and interpersonal skills and the ability to interact effectively with both technical and non-technical audiences.
  • Knowledge of parallel programming concepts and optimization techniques.
  • Strong attention to detail and ability to create and maintain accurate documentation.
  • Ability to work collaboratively in a multidisciplinary research environment.

Professional education and experience requirements for the role include:

  • Bachelor’s degree in computer science, computer engineering, or a related field. Master’s degree preferred.
  • Minimum 6 years of experience as a System Engineer or Administrator in an HPC environment.
  • Experience with virtualization and containerization technologies (e.g., Docker, Kubernetes) is preferred.

What you’ll experience working at UL:

  • Mission: For UL, corporate and social responsibility isn’t new. Making the world a safer, more secure, and sustainable place has been our business model for the last 128 years and is deeply ingrained in everything we do.
  • People: Ask any UL employee what they love most about working here, and you’ll almost always hear, “the people.” Going beyond what is possible is the standard at UL. We’re able to deliver the best because we employ the best.
  • Interesting work: Every day is different for us here as we eagerly anticipate the next innovation that our customers create. We’re inspired to take on the challenge that will transform how people live, work and play. And as a global company, in many roles, you will get international experience working with colleagues around the world.
  • Grow & achieve: We learn, work, and grow together with targeted development, reward, and recognition programs as well as our very own UL University that offers extensive training programs for employees at all stages, including a technical training track for applicable roles.
  • Total Rewards: All employees at UL Research Institutes and UL Standards & Engagement are eligible for bonus compensation. We offer comprehensive medical, dental, vision, and life insurance plans, as well as a generous 401k matching structure of up to 5% of eligible pay. We also invest an additional 4% into your retirement savings fund after your first year of continuous employment. Depending on your role, you can work with your manager on flexible working arrangements. We also provide employees with paid time off, including vacation, holiday, sick, and volunteer time off.

Learn More:

UL Research Institutes is a nonprofit organization dedicated to advancing safety science research through the discovery and application of scientific knowledge. We conduct rigorous independent research and analyze safety data, convene experts worldwide to address risks, share knowledge through safety education and public outreach initiatives, and develop standards to guide safe commercialization of evolving technologies. We foster communities of safety, from grassroots initiatives for neighborhoods to summits of world leaders. Our organization employs collaborative and scientific approaches with partners and stakeholders to drive innovation and progress toward improving safety, security, and sustainability, ultimately enhancing societal well-being.

 

Our wholly owned subsidiary, UL Solutions, advances our shared public safety mission. We fund our work through grants, the licensing of standards documents and the business activities of UL Solutions, which conducts testing, verification, and certification, and provides training and advisory services, along with data-driven reporting and decision-making tools for customers around the world. To learn more, visit our websites UL.org and ULSE.org.
