A suite of open-ended, non-imitative tasks involving generalizable skills for large language model chatbots and agents, intended to enable bootstrapped recursive self-improvement and, ultimately, an unambiguous AGI.
The current generation of LLMs is trained in an imitative fashion: the main task is auto-regressive text prediction on data written by humans. Under this objective, the model is effectively penalized if it behaves more intelligently than the behavior present in the training data. The hypothesis is that current large language models use only a small part of their capacity for intelligent behavior, because human-level performance cannot be significantly surpassed with imitative tasks. This is why most quantitative benchmarks show the current generation of LLMs asymptotically approaching the human level, but not significantly exceeding it.
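For concreteness, the imitative objective amounts to token-level cross-entropy against the human-written continuation, so the loss is minimized by reproducing the data exactly, never by improving on it. The PyTorch-style sketch below is only illustrative; the function name and tensor shapes are assumptions, not taken from any particular codebase.

```python
import torch
import torch.nn.functional as F

def imitative_loss(logits, target_tokens):
    """Standard autoregressive cross-entropy: the model is rewarded only for
    reproducing the human-written continuation token by token. Any deviation,
    even a 'smarter' one, increases the loss."""
    # logits: (batch, seq_len, vocab_size) predictions; target_tokens: (batch, seq_len)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        target_tokens.reshape(-1),
    )

# Toy usage: a batch of 2 sequences, length 4, vocabulary of 10 tokens.
logits = torch.randn(2, 4, 10)
targets = torch.randint(0, 10, (2, 4))
loss = imitative_loss(logits, targets)
```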
We have swapped out imitative objectives before in narrow AI deep learning models, most notably in AlphaGo, but also in countless other models. AlphaGo was first trained imitatively on grandmaster games, and only after the objective was swapped to a self-competitive (self-play) objective did it significantly surpass the human level.
What sorts of tasks do we need?
Any task that involves a large volume of generalizable skills, and whose solutions can be evaluated as better or worse than other reference solutions. Programming is such a task. So is playing chess.
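As a toy illustration of what "evaluable against reference solutions" means for programming, candidate solutions can be scored by the fraction of shared test cases they pass and then ranked against a reference. The helper below is hypothetical and only meant to show the comparison structure, not a piece of this suite.

```python
def score_solution(candidate_fn, test_cases):
    """Score a candidate programming solution by the fraction of test cases it passes.
    Scores make any two solutions directly comparable: better or worse."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing solution simply fails that case
    return passed / len(test_cases)

# Two candidate solutions to the same addition task, ranked against a shared test suite.
test_cases = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
reference = score_solution(lambda a, b: a + b, test_cases)       # 1.0
candidate = score_solution(lambda a, b: a + b + 1, test_cases)   # 0.0
print(candidate > reference)  # False: the candidate is worse than the reference
```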
Now that we have LLM chatbots which are able to evaluate solutions to very complex natural-language tasks from different perspectives, acting as a panel of LLM judges, the pool of tasks available to us is vast. We can in effect bootstrap recursive self-improvement by closing the loop and treating the act of evaluation itself as just another task to be evaluated.
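A minimal sketch of how such a loop could be closed is given below, assuming each judge and meta-judge is a callable wrapping an LLM chatbot call. All function names and the stand-in random judges are illustrative assumptions, not part of any existing implementation.

```python
import random
import statistics

def judge_panel(task, solution, judges):
    """Ask a panel of LLM judges to score a solution; return the panel consensus
    and the individual verdicts so the judging itself can later be judged."""
    verdicts = [judge(task, solution) for judge in judges]
    return statistics.mean(verdicts), verdicts

def evaluate_the_evaluation(task, solution, verdicts, meta_judges):
    """Close the loop: treat 'produce a fair verdict for this solution' as another
    open-ended task, and let a second panel score each judge's verdict."""
    return [
        statistics.mean(meta_judge(task, solution, verdict) for meta_judge in meta_judges)
        for verdict in verdicts
    ]

# Stand-in judges: in a real system each would be a call to an LLM chatbot.
def noisy_judge(task, solution):
    return random.random()

def noisy_meta_judge(task, solution, verdict):
    return random.random()

task, solution = "write a sorting function", "def sort(xs): ..."
consensus, verdicts = judge_panel(task, solution, [noisy_judge] * 3)
meta_scores = evaluate_the_evaluation(task, solution, verdicts, [noisy_meta_judge] * 3)
```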
The tasks can be roughly categorized into groups:
These tasks should be used to fine-tune a pre-trained LLM chatbot that has already been instruct-tuned.
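One possible, hedged sketch of such a fine-tuning step is rejection sampling against the judge panel: sample several solutions, keep the one the judges prefer, and take a supervised step on it, so that training data quality is no longer capped by human demonstrations. The `StubChatbot`, `stub_judge`, and `stub_finetune` names below are placeholders invented for illustration.

```python
def self_improvement_step(model, task, n_samples, judge, finetune_on):
    """One rejection-sampling fine-tuning step: sample several candidate solutions,
    keep the one the judge panel scores highest, and fine-tune the model on it."""
    candidates = [model.generate(task) for _ in range(n_samples)]
    best = max(candidates, key=lambda solution: judge(task, solution))
    finetune_on(model, task, best)  # e.g. one supervised gradient step on (task, best)
    return best

class StubChatbot:
    """Stand-in for an instruct-tuned LLM chatbot; generate() would normally sample a completion."""
    def generate(self, task):
        return f"candidate solution for: {task}"

def stub_judge(task, solution):
    return len(solution)  # placeholder score; a real judge would be an LLM panel

def stub_finetune(model, task, solution):
    pass  # a real implementation would update the model weights on (task, solution)

best = self_improvement_step(StubChatbot(), "write a sorting function", 4, stub_judge, stub_finetune)
```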
Some notes about the fine-tuning process:
Recursive Self-improvement Suite
@article{keskival2023recursive,
title={Recursive Self-improvement Suite},
author={Keski-Valkama, Tero},
year={2023}
}
Just make a PR. Making a PR is an acknowledgement that the contribution can be added as-is or in a modified form to the codebase. There is no transfer of copyright, but making a PR is an acknowledgement of granting a general MIT licence to the contributed code. Add yourself to the `LICENCE`.