Job Responsibilities
Architecture and Reference Designs
• Own ZB reference architectures for liquid-cooled AI data centers, including direct-to-chip and hybrid approaches, and provide clear decision frameworks for when each architecture is appropriate.
• Define rack-level and row-level cooling topology (CDUs, manifolds, supply and return routing, redundancy philosophy) and produce buildable specifications.
• Partner with electrical engineering counterparts to define end-to-end power architecture (medium voltage interface where applicable, transformers, switchgear, UPS, PDUs, busway, grounding, and redundancy).
• Translate AI workload and GPU platform requirements into thermal and electrical design targets, including transient behavior, ramp profiles, and failure modes.
Engineering Validation and Testing
• Develop test plans for cooling loop performance, leak integrity, pressure and flow stability, and heat rejection efficiency under representative AI loads.
• Define acceptance criteria for component suppliers and system integrators, including FAT and SAT procedures, instrumentation requirements, and documentation standards.
• Establish reliability and maintainability standards: isolation valves, bypass loops, drain and fill procedures, service clearances, and spare strategy.
• Create incident playbooks for thermal excursions, pump failures, flow alarms, and power events. Ensure procedures are realistic and operator-friendly.
Deployment, Commissioning, and Operations Support
• Support commissioning and handover for deployments. Validate that thermal and electrical systems meet design intent before scaling workloads.
• Troubleshoot cross-discipline issues: hot spots, uneven flow distribution, unstable differential pressure, air entrainment, sensor drift, and control loop tuning.
• Work with operations teams to define preventive maintenance schedules, calibration routines, water quality management, and filter strategy to protect IT equipment.
• Build operational dashboards and telemetry requirements in partnership with software teams to ensure early detection and fast root-cause analysis.
Vendor Management and Cost-Performance Optimization
• Evaluate suppliers on performance, reliability, lead time, serviceability, and total cost of ownership. Maintain an approved vendor list and qualification criteria.
• Negotiate and enforce engineering deliverables: drawings, BOM transparency, testing evidence, warranty terms, and service response SLAs.
• Drive cost and efficiency improvements through design simplification, standardization, and repeatable modules without compromising uptime or safety.
• Ensure compliance with applicable standards and industry best practices for data centers and liquid cooling systems, including safety, labeling, and documentation.
Documentation and Knowledge Building
• Produce clear, version-controlled engineering documentation: specifications, one-line diagrams, P&IDs, commissioning checklists, and SOPs.
• Train internal teams and partners on ZB standards, including installation best practices and common failure modes.
• Contribute to ZB’s customer-facing technical collateral where appropriate, ensuring accuracy and credibility.