Systems Engineer III or IV

Posting Details

Posting Number S05354P
Position Title Systems Engineer III or IV
Functional Title HPC Platform Engineer
Department Information Technology-Cyber Infrastructure & Research
Salary Range Up to $115,000 Annually
Pay Basis Monthly
Position Status Regular full-time
Location Richardson
Position End Date (if temporary)
Posting Open Date 05/08/2024
Posting Close Date
Open Until Filled Yes
Desired Start Date 06/10/2024
Job Summary
This position is responsible for designing, provisioning, deploying, administering, monitoring, maintaining, troubleshooting, upgrading, and patching University high performance computational (HPC) resources and related research services. The engineer will demonstrate a customer service mindset and adapt with agility to different work styles, conflict resolution techniques, and personal office etiquette. The engineer will interact with employees and stakeholders in a positive, productive manner, at a level of technical detail appropriate to the audience. The applicant must be self-motivated and stay abreast of applicable new technologies and technical methodologies to advance their productivity and career path. The engineer will lead project teams with junior engineers and produce effective, timely results equally well on individual projects and in team projects of all sizes. The engineer will have a comprehensive understanding of HPC solutions, architecture, and life cycles in order to design reliable, best-practice-based, enterprise-class solutions. The engineer will also be responsible for documenting processes, procedures, system configurations, and services, and for placing configuration information within our configuration management systems.

Systems Engineer III – Minimum Qualifications:
Bachelor’s Degree with four (4) years related experience OR Associate degree with six (6) years related experience OR High School or equivalent with eight (8) years related experience.
  • Salary Grade 016
    • Up to $100,000

Systems Engineer IV – Minimum Qualifications:
Bachelor’s Degree in related field with six (6) years direct work experience in information technology environments. An equivalent combination of education & experience may be acceptable.
  • Salary Grade 018
    • Up to $115,000

The vacancy will be filled at level III or level IV, depending on the qualifications of the candidate. The ideal candidate must be able to build collaborative working relationships with various internal and external stakeholders and have knowledge of DevOps practices and tools.
Minimum Education and Experience
• Bachelor’s Degree in related field
• Six years direct work experience in information technology environments
• Ability to build collaborative working relationships with various internal and external stakeholders
• Knowledge of DevOps practices and tools
• An equivalent combination of education & experience will be accepted
Preferred Education and Experience
  • Master’s degree in Computer Science or equivalent with four years of experience in corresponding research services, support efforts, products and technologies.
  • Current knowledge of HPC best practices, systems deployment, and maintenance.
  • Troubleshooting methodology and awareness of industry standards.
  • Excellent interpersonal, written, and verbal communication skills are a must.
  • Demonstrate strong technical documentation, architecture diagramming, and organizational skills.
  • Ability to multitask at high volume and with high detail and prioritize considering varied scope, scale, and technical requirements.
  • Solid understanding of data center operations fundamentals in networking and power.
  • Extensive Linux administration and networking skills are essential.
  • Experience supporting on-premises and code storage platforms, supporting and administering operating systems (multiple Linux versions), applying security policies to platforms, and integrating new hardware into our HPC framework.
  • Ability to package scientific software into RPMs and containers and integrate it with Lmod so users can `module load <software>`.
  • Experience with at least two high performance cluster operating systems such as OpenHPC, ROCKS, Bright/NVIDIA Cluster Manager.
  • Experience with large scale high performance parallel file storage systems such as WEKA, VAST, GPFS, BeeGFS, Ceph.
  • Experience in supporting and operating 1 Gbps – 100 Gbps Ethernet and 56 Gbps – 200 Gbps InfiniBand HPC network interconnects.
  • Experience with open source and commercial research related software: Python, R, MATLAB (MathWorks), Julia, Ansys, and the Intel, NVIDIA CUDA, and GCC compilers.
  • Experience with related DevOps tools such as GitHub, GitLab, Ansible, and package management tools for building rpm and/or deb packages.
  • Deep experience with the Slurm job scheduler.
  • Familiarity with architecture and operations of HPC systems in the cloud (e.g., AWS, Azure).
  • Familiarity with Apptainer/Singularity HPC containers.
  • Familiarity with national level academic HPC resources such as those found at TACC, SDSC, NCSA and PSC.
  • Familiarity with national level HPC Research Computing organizations such as XSEDE, ACCESS et al.
Essential Duties and Responsibilities
Expected areas of expertise and duties will include current proficiency in the following:
  • Lead project teams with junior engineers and produce effective, timely results equally well on individual projects and in team projects of all sizes.
  • Respond to user tickets from faculty and students. Level 3 support experience is expected (on a scale of 1 to 3, with 3 being a senior specialist).
  • Act as a role model in demonstrating integrity and ethical behavior in working with confidential and university information.
  • Assist in the development and implementation of internal policies, rules, and operating procedures for Research Computing and Cyberinfrastructure to meet the requirements of assurance frameworks such as NIST 800-53 and NIST 800-171, under which assured research is conducted.
  • Perform annual updates and expert-level software coding in at least two languages (Python and Linux shell preferred).
  • Support network technologies (routing, switching, firewalls, etc.) in the HPC environment.
  • Independent installation, configuration, updating, networking, performance monitoring and troubleshooting of HPC Systems.
  • Ability to develop, troubleshoot, modify, catalog, document, and update scripts.
  • Ability to package scientific software into RPMs and integrate it with Lmod so users can `module load <software>`.
  • Able to compile, test and install many related open source scientific software packages as requested by research faculty, staff and students.
Physical Activities
Working Conditions
Additional Information
  • On-call availability for quickly responding to and resolving system emergencies, during both regular hours and off-hours.
  • Emergency on-call rotation availability for 24×7×365 coverage.
  • Hybrid Remote Work Available for Texas Residents with further discussion and agreement.
  • Sitting for extended periods of time. Dexterity of hands and fingers to operate a computer keyboard, mouse, power tools, and to handle other computer components. Lifting and transporting of moderately heavy objects, such as servers, switches, computers, and peripherals.
  • Visa sponsorship is not available.
Special Instructions Summary
Important Message
1) All employees serve as a representative of the University and are expected to display respect, civility, professional courtesy, consideration of others and discretion in all interactions with members of the UT Dallas community and the general public.

2) The University of Texas at Dallas is committed to providing an educational, living, and working environment that is welcoming, respectful, and inclusive of all members of the university community. UT Dallas does not discriminate on the basis of race, color, religion, sex (including pregnancy), sexual orientation, gender identity, gender expression, age, national origin, disability, genetic information, or veteran status in its services, programs, activities, employment, and education, including in admission and enrollment. EOE, including disability/veterans. The University is committed to providing access, equal opportunity, and reasonable accommodation for individuals with disabilities. To request reasonable accommodation in the employment application and interview process, contact the ADA Coordinator. For inquiries regarding nondiscrimination policies, contact the Title IX Coordinator.

Supplemental Questions

  1. What is your experience level with High Performance Computational resources & services?
    • No Response
    • Beginner (0-2 years)
    • Intermediate (3-5 years)
    • Advanced (5+ years)
  2. What is your experience with process documentation?
    • Beginner (0-2 years)
    • Intermediate (2-5 years)
    • Advanced (5+ years)
  3. Describe your experience as it relates to your capability in providing written and oral communication to Information Technology resource consumers and stakeholders.

    (Open Ended Question)

  4. Describe your experience working with research PIs on computational cyberinfrastructure needs.

    (Open Ended Question)

  5. Describe your experience with high performance networking as it relates to high performance computing.

    (Open Ended Question)

Applicant Documents

Required Documents
  1. Resume
  2. Cover Letter/Letter of Application
Optional Documents
  1. Veteran Employment Preference - Form DD-214