
HTEC

AI Performance Engineer

Docker, Linux, Python

Experience
Junior, Mid-level, Senior
Engagement
Full-time
Application deadline
30 more days

About the company

HTEC Group is a global consulting, software engineering, and digital product development company that empowers the world's best high-tech companies, disruptive startups, and global enterprises with innovative product design and sophisticated engineering services.
HTEC Group was founded in 2008 in Belgrade, Serbia and today has its global headquarters in San Francisco. The company has consultancy, innovation, and product design offices in Silicon Valley, New York, and London, with its technological heart spread across development centers in Central and Southeast Europe. Overall, HTEC employs more than 2,000 highly skilled professionals in 29 locations in 12 countries.
HTEC combines Silicon Valley-based design thinking with the best engineering talent to support global clients with complete digital product development, from strategy and conceptualization to digital product design and agile engineering at scale. The company possesses vast expertise across a multitude of domains, including Healthcare, Retail, Transportation and Smart Mobility, Logistics, FinTech, Green Energy, Media, and Deep Technology.

Job description

We are looking for an AI Performance Engineer to work on the latest large AI models, deep learning performance optimization, and benchmarking on modern GPU-based systems, with a strong focus on MLPerf Training and Inference workloads.
The primary models we work on include Llama 2, Llama 3, DeepSeek, and open-source GPT-style models (GPT-OSS).
This is a hands-on engineering role involving performance profiling, PyTorch optimization, large-scale distributed training, and building reproducible benchmarking environments, in close collaboration with other performance- and systems-focused engineers.
This role requires US working hours; Europe‑based candidates must start their shift no earlier than 2 PM CET, ideally after 4 PM CET to align with the West Coast.

What You Will Do

  • Optimize training and inference pipelines for large language models such as Llama 2, Llama 3, DeepSeek, and GPT-OSS
  • Work on MLPerf Training and/or Inference benchmarks for LLM workloads
  • Profile GPU workloads to identify compute, memory, and communication bottlenecks (see the profiling sketch after this list)
  • Improve scaling efficiency across multi-GPU and multi-node setups
  • Tune distributed training strategies (DDP, FSDP, ZeRO, tensor/pipeline parallelism)
  • Build and maintain reproducible benchmark environments (Docker / Singularity)
  • Collaborate with engineers on performance, stability, and scalability improvements
  • Document findings and contribute to benchmark submissions and internal reports
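As a rough illustration of the profiling work described above, here is a minimal sketch using torch.profiler on a placeholder model; the model, batch size, and step count are assumptions for illustration only, not the team's actual setup.

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Placeholder model and input; real workloads would be LLM training or inference steps.
    model = torch.nn.Linear(4096, 4096).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 4096, device="cuda")

    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        record_shapes=True,
        profile_memory=True,
    ) as prof:
        for _ in range(5):
            loss = model(x).pow(2).mean()
            loss.backward()
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)

    # Sort by GPU time to surface the kernels that dominate each step.
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=15))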

Qualifications

  • 1-2 years of experience in AI engineering, deep learning, GPU, or HPC-related roles
  • Strong Python skills and solid experience with PyTorch
  • Hands-on experience with LLM training or inference (Llama, GPT-style models, or similar)
  • Experience with distributed training (DDP, FSDP, ZeRO, DeepSpeed, or equivalent; see the FSDP sketch after this list)
  • Good understanding of GPU performance fundamentals (compute vs memory, profiling, optimization)
  • Experience working in Linux-based environments
  • Familiarity with container technologies (Docker or similar)
  • Good level of spoken and written English
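As context for the distributed training requirement above, here is a minimal FSDP sketch, assuming a torchrun-style launch (e.g. torchrun --nproc_per_node=N script.py); the toy model and hyperparameters are placeholders, not the project's actual configuration.

    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

    def main():
        # One process per GPU, launched by torchrun; NCCL handles GPU communication.
        dist.init_process_group(backend="nccl")
        local_rank = dist.get_rank() % torch.cuda.device_count()
        torch.cuda.set_device(local_rank)

        # Placeholder model; a real run would wrap an LLM with an auto-wrap policy.
        model = torch.nn.Sequential(
            torch.nn.Linear(4096, 4096),
            torch.nn.GELU(),
            torch.nn.Linear(4096, 4096),
        ).cuda()

        # Shard parameters, gradients, and optimizer state across ranks (ZeRO-3 style),
        # computing in bf16 while keeping gradient reductions in fp32.
        fsdp_model = FSDP(
            model,
            mixed_precision=MixedPrecision(param_dtype=torch.bfloat16,
                                           reduce_dtype=torch.float32),
            device_id=local_rank,
        )

        optimizer = torch.optim.AdamW(fsdp_model.parameters(), lr=1e-4)
        x = torch.randn(8, 4096, device="cuda")
        loss = fsdp_model(x).float().pow(2).mean()
        loss.backward()
        optimizer.step()
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()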

Additional information

Nice to Have (Strong Plus)

  • Experience working with MLPerf or other standardized benchmarking frameworks
  • Exposure to LLM optimization techniques (activation checkpointing, KV-cache optimization, sequence parallelism)
  • Experience with GPU profiling tools (torch.profiler, Nsight, or equivalent)
  • Knowledge of GPU kernel optimization (CUDA, HIP, Triton, or similar)
  • Experience working with job schedulers (Slurm or equivalent)
  • Familiarity with quantization or mixed precision (FP16, BF16, FP8; see the sketch below)
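A minimal mixed-precision sketch (single GPU, bf16 autocast, placeholder model); note that bf16, unlike fp16, needs no gradient scaler, and FP8 typically goes through a dedicated library such as NVIDIA Transformer Engine rather than plain autocast.

    import torch

    # Placeholder model and data, for illustration only.
    model = torch.nn.Linear(4096, 4096).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 4096, device="cuda")

    for _ in range(3):
        # Run the forward pass in bf16; parameters and their gradients stay in fp32.
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            loss = model(x).float().pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)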
