Alex McKenzie Arrrlex

I'm an AI Safety engineer with a Maths and Philosophy background.

10 followers · 1 following

Berlin
alexmck.com

Arrrlex/README.md

Hi there 👋

I'm Alex McKenzie. I care a lot about mitigating risks from Artificial Intelligence. I also care about music, Effective Altruism, and cats.

I work at AE Studio on alignment.

I blog occasionally at alexmck.com.

Research

Detecting High-Stakes Interactions with Activation Probes — a paper on using activation probes to efficiently detect high-stakes LLM interactions, with accompanying codebase
Endogenous Resistance to Activation Steering in Language Models — a paper on how LLMs can internally resist task-misaligned activation steering, with accompanying codebase

Popular repositories

models-under-pressure

Code for the paper "Detecting High-Stakes Interactions using Activation Probes"

Python · 10 stars · 2 forks
bdqm-hyperparam-tuning

Code for Hyperparameter Optimization project for "Big Data & Quantum Mechanics" (Andrew Medford)

Jupyter Notebook · 1 star · 1 fork
alexwebsite

My personal website

Python
SymbolicMathematics

Forked from facebookresearch/SymbolicMathematics

Deep Learning for Symbolic Mathematics

Python
poetry

Forked from python-poetry/poetry

Python dependency management and packaging made easy.

Python
at-python-template

Forked from at-gmbh/at-python-template

The official Python Project Template of Alexander Thamm GmbH

Python