The TARA Program: An Innovative Model for the AI Safety Talent Pipeline

TARA (Technical Alignment Research Accelerator) is a 14-week, part-time training program aimed at addressing the talent-pipeline bottleneck in AI safety. It is designed for professionals and students who cannot pause full-time work or study, or spend long periods away from their families. The inaugural TARA cohort reported very high satisfaction, with a recommendation score of 9.43/10 and a 90% completion rate. Of the 19 graduates, 15 said they were more motivated to work in AI safety, and several have already secured relevant roles or produced research. The program is highly cost-effective at just 899 AUD per participant, giving talented people who are excluded from other full-time programs by time or location constraints a viable way to build technical skills, and a stepping stone toward further study or employment.

💡 **Addressing the "missing middle" of the AI safety talent pipeline:** By offering a 14-week, part-time program, TARA fills the capability gap between entry-level AI safety knowledge and advanced research roles. It removes the barrier faced by talented people who cannot attend traditional full-time programs for time, location, or family reasons, giving them a practical path to build technical skills.

🚀 **High satisfaction and clear career impact:** The inaugural TARA cohort achieved an average recommendation score of 9.43/10 and a 90% completion rate. More importantly, 15 graduates reported being more motivated to pursue AI safety careers after the program, and several have already secured relevant roles or begun research, demonstrating the model's effectiveness in motivating and directing talent.

💰 **Cost-effective and scalable:** At just 899 AUD per participant, TARA costs far less than comparable full-time programs. Its mixed delivery model (weekly in-person sessions combined with remote mentorship) eliminates international travel and accommodation costs while retaining more than 70% of the teaching quality, laying the groundwork for training technical AI safety talent at scale.

🤝 **Flexible participation and community support:** TARA's part-time design lets full-time professionals and students take part without disrupting their existing commitments. The program emphasises peer learning and mentor support: over the 14 weeks, participants receive guidance from the organisers and teaching assistant and complete a research project in the final phase, which deepens and sustains their learning while building close connections and knowledge sharing within the cohort.

Published on September 25, 2025 8:50 PM GMT

Summary

The AI safety field has a pipeline problem: many skilled engineers and researchers are locked out of full-time overseas fellowships. Our answer is the Technical Alignment Research Accelerator (TARA) — a 14-week, part-time program designed for talented professionals and students who can't put their careers or studies on hold or leave their families for months at a time. Our inaugural TARA cohort provided a cost-effective, flexible model that achieved:

- an average recommendation score of 9.43/10 and a 90% completion rate (19 of 21 participants);
- 15 of 19 graduates more motivated to pursue AI safety careers, with several already securing roles or starting research;
- an average cost of $899 AUD per participant.

In this report, we share key learnings, results and operational details so that others can replicate this success. We are seeking funding partners for another TARA cohort.

Theory of Change

The Problem

The AI safety field faces a critical talent pipeline bottleneck. While millions are becoming aware of AI risks (the 80,000 Hours documentary has almost reached 6 million views) and organizations like BlueDot plan to train 100,000 people in AI safety fundamentals over the next 4.5 years, there's a massive gap between awareness-level training and the technical expertise required to qualify for selective research fellowships.

BlueDot's online courses can't provide hands-on training in critical areas like reinforcement learning, evaluations, AI x cybersecurity, and mechanistic interpretability. Meanwhile, advanced programs like MATS - which has accelerated 450+ researchers over 3.5 years - require participants to already possess substantial technical expertise. 

AI safety needs scalable programs that bridge the gap between introductory courses and elite research fellowships.

Why Current Solutions Fail

Existing technical training programs face fundamental constraints: they typically demand a full-time commitment, require relocating (often overseas) for months at a time, and carry high per-participant costs for travel and accommodation.

These constraints mean we're losing talented engineers and researchers who could contribute to AI safety but can't access existing training pathways.

To be clear: TARA isn't meant to replace these models. Intensive residential programs remain invaluable for those who can access them. Instead, TARA serves a different cohort - talented professionals and students earlier in the funnel who need a flexible pathway to build technical skills before potentially pursuing more intensive opportunities.

TARA's Solution

TARA's mixed model - weekly in-person cohorts with remote expert support - partly relieves these constraints while maintaining 70%+ of the quality of an intensive residential program.

We believe TARA could scale to train 1,000+ technical AI safety practitioners annually, partly solving the middle-of-funnel bottleneck by preparing talent for advanced programs like MATS and LASR at the scale the field urgently needs.

Program Overview

This first iteration was run in Sydney and Melbourne from 1st March to 31st May 2025. Participants committed to 14 weeks of full-day Saturday sessions, as well as independent learning during the week. Mentorship was available across all 14 weeks. 19 out of 21 participants completed the program. We employed Dan Wilhelm as our remote Teaching Assistant. Each week, participants pair-programmed with a different member of their city's cohort.

The curriculum followed existing materials used for the ARENA programme, with permission from ARENA. We covered the following topics:

| Topic | Weeks | Description |
|---|---|---|
| Prep | 0 | Optional prerequisites study and an ice-breaker session. |
| Foundations | 1-3 | Build essential technical foundations, including optimisation, neural networks, and core ML concepts. |
| Transformer Fundamentals and Mechanistic Interpretability | 4-8 | Dive deep into transformer architectures, mechanistic interpretability, and sparse autoencoders. |
| Evaluations | 9-11 | Focus on model evaluation techniques and design. |
| Project Phase | 12-14 | Applying the program to create research projects, with support from the teaching assistant. |


The course did not cover all of the ARENA material: at roughly one ARENA day per part-time week, ARENA's 5-week full-time curriculum (about 25 teaching days) would have required 25 weeks in our format. Instead, we focused on core areas of the material. We address this decision further in the improvements section.

Each Saturday session ran from 9:30 AM to 5:00 PM and typically combined an opening lecture with pair programming through the week's curriculum material.

Participant Selection

There was a ~7-week turnaround between the announcement of the program and the week 0 start date. We had a total of 46 applicants, from which we selected the initial cohort of 21 participants. We used their applications as the primary source for selection and did not conduct interviews.

We required a $150 AUD bond from each participant to reduce attrition, returning $165 to program completers as a sign of good faith. To complete the program and have the bond returned, participants had to attend at least 10 of the 14 in-person sessions and finish a project.

Participants' backgrounds included full-time software engineers, students up to master's level, researchers in academic ML positions, and more.

Survey Results Overview

We conducted short weekly surveys and a longer final survey.

We summarised and produced plots from the multiple-choice questions, and analysed key themes in the open-ended responses.
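As a rough illustration of how these weekly summaries can be produced, here is a minimal sketch (with hypothetical column names and toy values, not our actual survey data) that aggregates multiple-choice responses per week with pandas:

```python
# A minimal sketch, with hypothetical column names and toy values rather
# than the actual TARA survey data, of how weekly multiple-choice
# responses could be aggregated before plotting.
import pandas as pd

# Assumed long format: one row per participant per weekly survey.
responses = pd.DataFrame(
    {
        "week": [1, 1, 2, 2, 8, 8],
        "confidence": [4, 5, 3, 4, 2, 3],  # e.g. a 1-5 confidence rating
    }
)

# Mean confidence per program week, ready for a simple line or bar plot.
weekly_confidence = responses.groupby("week")["confidence"].mean()
print(weekly_confidence)
```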

We had an average completion rate of ~90% for the weekly surveys, and all program completers (19 out of 19) completed the final survey.

The surveys covered four main areas: Program Engagement, Course Material Engagement, Impact of Program, and Program Improvements.

Program Engagement

We are excited that TARA was rated highly by participants, receiving an average recommendation score of 9.43 out of 10.

Participants were asked what they most enjoyed about TARA:

- Their respective peers and cohort experience
- Having accountability and support from the cohort and team
- The content quality

Participants expressed unanimous satisfaction with program management and communications. When asked to rate their satisfaction with program management and communications from the organisers, 100% of respondents reported being either very satisfied or extremely satisfied, with 68% rating their experience as extremely satisfied and 32% as very satisfied. 

68% of the participants reported being “Extremely Satisfied” with the in-person locations.

TARA participants received project support from their peers, co-organisers, and the teaching assistant throughout the 3-week project phase. This support worked well, with 84% of participants reporting being very or extremely satisfied with the help they received.

Participants were also surveyed about their biggest challenges during the program:

- The difficulty of the material.
- Adjusting to pair programming.
- Constantly working through the curriculum outside of the Saturday sessions.

We rotated pairs each week to maximize networking and expose participants to different learning styles. We expected this mix of experience levels would enhance learning. However, participants were not always fully engaged in their pairs and sometimes felt the mismatch in skill levels hindered their progress.

Participant feedback on TARA's 14-week duration was mixed. When asked about the program length, 37% felt it was about right, 37% considered it a little too long, and 26% thought it was a little too short. With views split fairly evenly in both directions, the 14-week format appears to be a reasonable middle ground.

Course Material Engagement

We used the ARENA curriculum, with a 1-hour lecture at the start of each Saturday session to provide explanations and context. We adapted ARENA's daily modules into weekly sessions, covering approximately one ARENA day per TARA week.

Each week, we surveyed TARA participants on how confident they were with the previous week’s material. We asked, “How confident are you that you could explain this week's key concepts to a new TARA participant?”

Participants' weekly ratings show they were moderately to very confident that they could explain concepts to a new participant. They had the most difficulty with week 8, the OthelloGPT section.

Outside of the Saturday sessions and during non-project weeks, participants spent an average of 3 hours 25 minutes per week studying, though this varied across participants and over the course of the program.

Impact of Program

Participants were asked to share the impact that completing TARA had on them.

The structured format proved essential: participants estimated only a 41.8% average likelihood of completing the ARENA curriculum independently, with 9 out of 19 respondents rating their chances at 30% or lower without TARA's weekly sessions and peer support.

We also asked participants how motivated they were to pursue a career in AI safety after TARA, compared with before the program. 15 out of 19 participants were more motivated, with 6 much more motivated. One student was less motivated, citing curriculum difficulty as the main reason for their reduced interest.

The 3-week project phase was a substantial part of TARA: we wanted participants to apply their new skills to produce a piece of work that built AI safety skills and credibility. Completing a project presentation was a program requirement. We encouraged participants to produce a blog post, GitHub repo, or other public-facing work as a final output.


Participants chose a range of projects: Inspect Evals benchmark implementations, Neuronpedia inference improvements, exploring planning in LLMs, model interventions using steering and probes, OthelloGPT questions, investigating emergent misalignment refusals, developing an evaluation of model metacognition, and many more.

TARA has already helped participants advance their careers. One participant has accepted an offer for a research programme. Three others have secured part-time AI safety roles directly based on experience gained from their projects. Additionally, one participant is converting their project into a research paper about using linear probes to remove bioweapons knowledge from language models.

Possible Improvements

Recruitment & Selection

The primary opportunity for improving participant quality lies in attracting more applicants through broader marketing efforts. With a larger applicant pool, we could be more selective while simultaneously increasing the number of course completers. We could also implement a more rigorous selection process to better assess participants' coding abilities and commitment level. In future iterations, we would consider adding coding assessments and conducting interviews before acceptance. 

Curriculum & Delivery

Participants requested more comprehensive onboarding resources and clearer direction before the core program began. While we provided a week 0 with an informal online meetup and prerequisite materials, this preparation phase was rushed and would benefit from greater formalisation.

We excluded the Reinforcement Learning chapter due to time constraints, though several participants specifically requested this content in their feedback. Improved curriculum structure could allow us to incorporate Reinforcement Learning without extending the overall program length.

Providing pre-scoped project proposals could increase the impact and quality of final project outcomes, making this approach worth testing in future iterations.

Participant Support

Adding more structure and coworking times during the week may help participants engage better with the material outside the Saturday sessions. Clearer expectations of exercise timings would also help, as would removing the least useful programming problems in the ARENA curriculum.

We didn't provide enough support and connections after the program. While we provided resources and informal guidance, we should have guided participants toward firmer next steps to increase overall program success and better connect TARA as a pathway into other formal options like MATS, LASR and full-time roles. 

Community Integration

Greater integration with the broader AI safety community could significantly enhance the program. This could include guest research talks, connections to ongoing projects, and introductions to researchers and organisations. Such engagement would help participants better understand career pathways and build professional networks within the field.

Expense Breakdown

The program cost $18,880 AUD, coming in under our $22,856 AUD budget. Future iterations will likely cost more, as volunteers currently provide key services, including cohort organisation and operations work that would need to become paid positions. However, securing free venues - as we did for this iteration - remains feasible and provides significant cost savings for programs of this type.

Here is the specific cost breakdown:

The average cost was $899.05 AUD per participant. Because our biggest costs are fixed, the marginal cost of adding another student would be only $381.52 AUD, up to the point at which another teaching assistant would need to be hired.
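As a rough illustration of how these numbers scale, here is a minimal sketch; the clean fixed-plus-marginal split below is inferred from the figures above rather than taken directly from our budget spreadsheet:

```python
# A minimal sketch, not the organisers' budget model: it checks how the
# reported total, average, and marginal cost relate, assuming costs split
# into a fixed component plus a constant per-student cost, with no extra
# teaching assistant hire.

TOTAL_COST_AUD = 18_880      # reported total spend for the first cohort
PARTICIPANTS = 21            # initial cohort size
MARGINAL_COST_AUD = 381.52   # reported cost of adding one more student

average_cost = TOTAL_COST_AUD / PARTICIPANTS                       # ~899.05 AUD
implied_fixed = TOTAL_COST_AUD - PARTICIPANTS * MARGINAL_COST_AUD  # assumption

def projected_total(n: int) -> float:
    """Projected spend for a cohort of n under the assumptions above."""
    return implied_fixed + n * MARGINAL_COST_AUD

print(f"average per participant: {average_cost:.2f} AUD")
print(f"implied fixed costs:     {implied_fixed:.2f} AUD")
print(f"projected total for 30:  {projected_total(30):.2f} AUD")
```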

For details on what we requested for funding, view our Manifund application: https://manifund.org/projects/help-launch-the-technical-alignment-research-accelerator-tara

Future Programs and Expansion

We're actively considering running future iterations of TARA in Australia and New Zealand, and potentially scaling the program to other regions. Our primary constraint is securing adequate funding for these expansions. If you are interested in supporting future TARA programs, please get in contact.

We also see value in others replicating TARA’s structure in different regions around the world. We encourage anyone looking to run a similar program to reach out to Ruben Castaing and Nelson Gardner-Challis, who can provide advice and resources.

Acknowledgements

This program was greatly supported by many different groups, funders and organisations. 

Thanks to Ryan Kidd for funding this initial version of the program. Thanks to ARENA for giving us permission to use their curriculum for the program. Thank you to EA Melbourne for connecting us with Bellroy, who provided our Melbourne office space, and to The Sydney Knowledge Hub for providing our Sydney venue.

This report was produced by Nelson Gardner-Challis and Ruben Castaing, with assistance from Yanni Kyriacos and Dan Wilhelm.



