An old solution to modern OpenAI GPTs problems

2024-01-28

Ever since the introduction of ChatGPT, OpenAI has had a compute shortage. This might explain their current focus on GPTs, formerly known as Plugins. Simply put, you can see a GPT as a wrapper around the base language model. In the wrapper, you can provide instructions (a prompt) and up to 20 files, and enable Web Browsing, DALL·E Image Generation, and/or Code Interpreter. You can also define an Action, which allows the GPT to call an API on your own server.
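As an aside, an Action boils down to a plain HTTP endpoint on your server. As a minimal sketch, such an endpoint could look like the Python snippet below (the route, port, and choice of Flask are my assumptions; a real Action also needs an OpenAPI schema in the GPT editor so the model knows how to call it):

```python
# Minimal sketch of a server endpoint that a GPT Action could call.
# The route and port are hypothetical; a real Action also requires an
# OpenAPI schema describing this endpoint.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/square")
def square():
    # Read the query parameter, e.g. GET /square?x=12.
    x = int(request.args.get("x", "0"))
    return jsonify({"result": x**2})

if __name__ == "__main__":
    app.run(port=8080)
```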

At first sight, the possibilities seem limited for developers. The code interpreter will only run Python code inside OpenAI's sandbox. Furthermore, the interpreter has no internet access, so installing extra tools is not possible. You could interact with your own server via Actions (API calls), but that adds latency and requires setting up and maintaining a server. Without a server, you could define a CLI script in Python and describe in the instructions how to interact with that script. Unfortunately, this limits the capabilities: not all Python packages are installed in the sandbox, and there is only so much that can be expressed in the instructions.

Surprisingly, we're actually not limited to the instructions for expressing code. You can upload Python files under "Knowledge". From the description, it looks like these files are only used for Retrieval-Augmented Generation (RAG). This is not true: the code interpreter will happily run those files too.

For example, create a Python script that raises the first input argument to the second power (x^2):

```python
import sys

def pow(x):
    # Raise x to the second power.
    return x**2

if __name__ == '__main__':
    x = int(sys.argv[1])
    print(pow(x))
```

and upload this file as my_script.py under "Knowledge". Next, write something like this in the instructions:

Run the number that the user gives through my_script.py

```python
!python /mnt/data/my_script.py <USER INPUT>
```

This exclamation mark syntax is how IPython runs shell commands, and the code interpreter runs in a Jupyter-like environment. When I then said "my number is 12" to the GPT, it responded with "The output of the script with the number 12 is 144." And there was a blue [>_] link showing that ChatGPT actually ran my code (and didn't just guess the answer from reading it). This is already quite expressive.
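For intuition, the `!` escape is roughly equivalent to the plain-Python call below (a sketch; the /mnt/data path is where uploaded Knowledge files land in my experience):

```python
# Roughly what the `!` shell escape does, expressed in plain Python.
import subprocess

result = subprocess.run(
    ["python", "/mnt/data/my_script.py", "12"],
    capture_output=True,
    text=True,
)
print(result.stdout)  # prints: 144
```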

But this still seems a bit restrictive: we're limited to the preinstalled Python packages. Or so I thought. We can run binaries too.

To do so, just upload an x86 binary and teach the GPT how to interact with it. For example, I've uploaded an x86 binary (specifically, typst) and added the following instructions:

Run typst via the code interpreter:

```python
!chmod +x /mnt/data/typst
!cd /mnt/data && ./typst --help
```

Then I just typed "run" as a user, and the GPT printed the --help output from the binary.
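Going one step further, the GPT can compile documents too. As a sketch, a cell like the following should work inside the interpreter (the file names are my own; compile is typst's actual subcommand):

```python
# Write a tiny typst source file from Python, then compile it with the
# uploaded binary (file names are hypothetical).
with open('/mnt/data/hello.typ', 'w') as f:
    f.write('= Hello from typst\n')

!cd /mnt/data && ./typst compile hello.typ hello.pdf
```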

What this allows is making binaries available to users via the GPT store. The interaction with these binaries can then be described in the instructions, just like API instructions are provided. For example, you could build a CLI tool in Rust and make interacting with it very easy for users: you can just share a public GPT link. Here is a link to my typst-based GPT. The tool can even take user inputs such as PDFs or images and process these. All this without the data having to leave the server, and without you having to run your own server. Furthermore, users can interact with the binary via text or speech. Pretty cool if you ask me!

Unfortunately, although I'm quite optimistic about the possibilities, I do have to mention some downsides too. One downside is that the sandbox takes about 10 seconds to run. The Python script itself takes less than a second, so most of that time must go to spawning the sandbox. Also, GPT-4 is still rate-limited even for paying users; I had to pause development a few times because I was making too many requests. Finally, there is the half-baked compliance with European privacy regulations. GPTs and voice interactions are only available when "Chat History & Training" is enabled. If I were to upload personal data or sensitive data from our research to a GPT, that data would end up in the training data! Currently, the only solution is to switch to ChatGPT Team, which is a few dollars more expensive than Plus (that is okay), but only available for a minimum of two users (more difficult).