A new benchmark for frontier AI
Intelligence is not one-dimensional. AGI will be multi-dimensional. New research is needed.
LMIQ comprises a diverse set of novel challenges designed to exercise the multi-dimensional capabilities of human intelligence. The gap between human performance and that of leading AI systems remains significant.
These are preliminary beta results with wide confidence intervals. AI scores were produced on a test set of 50 challenges ranging in difficulty. Humans were tested on the same challenge set.
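The wide confidence intervals follow directly from the small test set: with only 50 pass/fail challenges, even a mid-range score carries substantial statistical uncertainty. A minimal sketch, using the standard Wilson score interval for a binomial proportion (the 30/50 score below is purely illustrative, not an actual LMIQ result):

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical score: 30 of 50 challenges solved. The 50-challenge set
# size comes from the text; the score itself is an assumption.
lo, hi = wilson_ci(30, 50)
print(f"observed 60%, 95% CI: {lo:.0%}-{hi:.0%}")
```

Under these assumptions the interval spans roughly 46% to 72%, a width of about 26 percentage points, which is why score differences between systems on a set this small should be read cautiously.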
Scaling alone will not produce AGI. New research is needed.