Internship in Reliable Large Scale AI Infrastructures...
Huawei’s TTE RAMS Lab is a corporate competence center responsible for researching high-reliability and high-safety architectures and technologies for complex intelligent systems. Our goal is to provide Huawei products with cutting-edge research and advanced technical solutions for intelligent reliability and safety in carrier-grade ICT and safety-critical systems such as autonomous driving, so that our products give our customers the best user experience and performance.
Position Overview:
We are seeking a highly motivated and talented student intern to join our cutting-edge research team focused on large-scale reliable AI infrastructures. This position will emphasize ensuring the robustness and reliability of training and inferencing for large language models (LLMs). The ideal candidate will engage in both theoretical and practical research aimed at overcoming challenges related to scaling AI systems while maintaining reliability, resilience, and efficiency across various AI workflows.
As part of the team, you will have the opportunity to work at the forefront of AI infrastructure, addressing critical issues like fault tolerance, data and model consistency, distributed AI system infrastructure, and the optimization of machine learning pipelines.
Key Responsibilities:
- Conduct advanced research on scaling and ensuring reliability in the training and inferencing of large language models (LLMs);
- Assist in developing innovative methodologies for enhancing the fault tolerance and resilience of AI infrastructures;
- Collaborate with internal teams to design and implement algorithms that ensure the robustness of AI systems in production environments;
- Publish research results at relevant conferences and workshops.
Required Qualifications:
- Strong academic background in Computer Science, Engineering, or a related field (PhD or Master's students; Bachelor's students with exceptional backgrounds);
- Demonstrated interest or experience in large-scale distributed systems for AI;
- Solid understanding of deep learning principles and techniques, particularly as applied to large language models (LLMs);
- Proficiency in programming languages such as Python, C/C++, or equivalent, and experience in system scripting with tools such as Bash, Perl, sed, and awk;
- Familiarity with deep learning frameworks (TensorFlow, PyTorch, etc.), where an understanding of their low-level technical details is a big plus;
- Knowledge of Linux-based operating systems, distributed computing, and cloud infrastructure;
- Excellent problem-solving skills, analytical thinking, and attention to detail;
- Ability to work collaboratively in a multidisciplinary team environment and communicate complex technical concepts effectively.
Preferred Qualifications:
- Previous internship experience in industry would be a plus;
- Familiarity with the challenges and best practices in training very large neural networks;
- Background in systems engineering, cloud architectures, or high-performance computing (HPC);
- Knowledge of tools and technologies for distributed training (e.g., Horovod, DeepSpeed, Slurm, Ray);
- Prior research or industry experience in AI model reliability, system fault tolerance, or similar areas.
Huawei is a leading global information and communications technology (ICT) solutions provider. Driven by a commitment to operations, ongoing innovation, and open collaboration, we have established a competitive ICT portfolio of end-to-end solutions in telecom and enterprise networks, devices, and cloud technology and services. Our ICT solutions, products, and services are used in more than 170 countries and regions, serving over one-third of the world's population. With 197,000 employees, Huawei is committed to developing the future information society and building a Better Connected World.
Please send your application and CV (incl. cover letter and reference letters) in English.