Generate "Verified" Python Code Using AutoGen Conversable Agents | by Shahzeb Naveed | April 2024

It's April 2024, and it has been about 17 months since we started using LLMs like ChatGPT to help with code generation and debugging. While this has added a great deal of productivity, there are still times when the generated code is full of bugs and sends us back down the good old StackOverflow route.

In this article, I'll give a quick demonstration of how we can address this lack of "verification" using the conversable agents offered by AutoGen.

What is AutoGen?

"AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks."

Presenting the LeetCode Problem Solver:

Start by quietly installing autogen:

!pip install pyautogen -q --progress-bar off

I'm using Google Colab, so I entered my OPENAI_API_KEY in the Secrets tab and loaded it securely along with the other modules:

import os
import csv
import autogen
from autogen import Cache
from google.colab import userdata
userdata.get('OPENAI_API_KEY')

I'm using gpt-3.5-turbo only because it's cheaper than gpt-4. If you can afford more expensive experimentation and/or you're doing things more "seriously", you should obviously use a stronger model.

llm_config = {
    # config_list must be a list of model configurations
    "config_list": [{"model": "gpt-3.5-turbo", "api_key": userdata.get('OPENAI_API_KEY')}],
    "cache_seed": 0,  # seed for reproducibility
    "temperature": 0,  # temperature to control randomness
}

Now, I'll copy the problem statement of my favorite LeetCode problem, Two Sum. It's one of the most frequently asked questions in LeetCode-style interviews and covers basic concepts such as caching with hashmaps and basic equation manipulation.
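The "equation manipulation" here amounts to rewriting a + b = target as b = target - a, so that for each a we can look b up in a hash map instead of scanning for it. A minimal sketch contrasting the two approaches (the function names `two_sum_brute_force` and `two_sum_hashmap` are my own, not from LeetCode or AutoGen):

```python
def two_sum_brute_force(nums, target):
    """Check every pair: O(n^2) time, O(1) extra space."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return None


def two_sum_hashmap(nums, target):
    """Single pass with a value -> index map: O(n) time, O(n) extra space."""
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num  # b = target - a
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return None


if __name__ == "__main__":
    # The three examples from the problem statement.
    for nums, target in [([2, 7, 11, 15], 9), ([3, 2, 4], 6), ([3, 3], 6)]:
        assert two_sum_brute_force(nums, target) == two_sum_hashmap(nums, target)
    print("both approaches agree on the three examples")
```

The hash map trades O(n) extra memory for dropping the inner loop, which is the trade-off the follow-up question in the problem statement is hinting at.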

LEETCODE_QUESTION = """
Title: Two Sum
Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target. You may assume that each input would have exactly one solution, and you may not use the same element twice. You can return the answer in any order.
Example 1:
Input: nums = [2,7,11,15], target = 9
Output: [0,1]
Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].
Example 2:
Input: nums = [3,2,4], target = 6
Output: [1,2]
Example 3:
Input: nums = [3,3], target = 6
Output: [0,1]
Constraints:
2 <= nums.length <= 10^4
-10^9 <= nums[i] <= 10^9
-10^9 <= target <= 10^9
Only one valid answer exists.
Follow-up: Can you come up with an algorithm that is less than O(n^2) time complexity?
"""

Now we can define both of our agents. One agent acts as the "assistant" that suggests the solution, and the other serves as a proxy for us, the user, and is also responsible for executing the suggested Python code.

# create an AssistantAgent named "assistant"
SYSTEM_MESSAGE = """You are a helpful AI assistant.
Solve tasks using your coding and language skills.
In the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.
1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.
2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.
Solve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.
When using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.
If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.
When you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.
Additional requirements:
1. Within the code, add functionality to measure the total run-time of the algorithm in python function using "time" library.
2. Only when the user proxy agent confirms that the Python script ran successfully and the total run-time (printed on stdout console) is less than 50 ms, only then return a concluding message with the word "TERMINATE". Otherwise, repeat the above process with a more optimal solution if it exists.
"""
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config=llm_config,
    system_message=SYSTEM_MESSAGE
)
# create a UserProxyAgent instance named "user_proxy"
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=4,
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False,
    },
)

I've set human_input_mode to "NEVER" because I don't intend to provide any input myself, and max_consecutive_auto_reply to 4 to limit the back-and-forth turns in the conversation. The Assistant agent has been instructed to respond with the word "TERMINATE", which tells the UserProxyAgent when to conclude the conversation.
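As a rough sketch of what the is_termination_msg lambda checks, here is the same test written as a named function. The name `ends_with_terminate` is hypothetical, and the extra guard for a missing or `None` content field is my own assumption about messages that carry no text; the lambda in the article does not include it:

```python
def ends_with_terminate(message: dict) -> bool:
    """Return True when the message text ends with the word TERMINATE."""
    # `or ""` covers both a missing key and an explicit None value.
    content = message.get("content") or ""
    return content.rstrip().endswith("TERMINATE")


if __name__ == "__main__":
    print(ends_with_terminate({"content": "All done. TERMINATE\n"}))          # True
    print(ends_with_terminate({"content": 'Reply with "TERMINATE". Bye.'}))   # False
    print(ends_with_terminate({"content": None}))                             # False
```

Note that the check only fires when TERMINATE is the last word of the message: a reply that merely mentions the word mid-sentence does not match, and the conversation would then run until max_consecutive_auto_reply is reached.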

Now, the fun part! We'll start the conversation by sending a message from our UserProxyAgent to our Assistant.

An added benefit of using AutoGen (even for non-agentic workflows) is that it provides explicit caching to help you save on API costs during development. Here, I'm caching responses on disk, but you can also integrate Redis for this purpose.

# Use DiskCache as cache
with Cache.disk(cache_seed=7) as cache:
    # the assistant receives a message from the user_proxy, which contains the task description
    chat_res = user_proxy.initiate_chat(
        assistant,
        message="""Solve the following leetcode problem and also comment on it's time and space complexity:\n\n""" + LEETCODE_QUESTION,
        cache=cache,  # pass the cache explicitly so this chat actually uses it
    )

Output:

user_proxy (to assistant):
Solve the following leetcode problem and also comment on it's time and space complexity:
Title: Two Sum
...
Follow-up: Can you come up with an algorithm that is less than O(n^2) time complexity?
--------------------------------------------------------------------------------
assistant (to user_proxy):
To solve this problem efficiently, we can use a dictionary to store the elements we have seen so far along with their indices. While iterating through the array, we can check if the complement of the current element (target - current element) is already in the dictionary. If it is, we have found the pair that sums up to the target.
Here's the Python code to solve the Two Sum problem with a time complexity less than O(n^2):
```python
# filename: two_sum.py
def two_sum(nums, target):
    num_dict = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in num_dict:
            return [num_dict[complement], i]
        num_dict[num] = i

# Test the function with the given examples
nums1 = [2, 7, 11, 15]
target1 = 9
print(two_sum(nums1, target1))  # Output: [0, 1]
nums2 = [3, 2, 4]
target2 = 6
print(two_sum(nums2, target2))  # Output: [1, 2]
nums3 = [3, 3]
target3 = 6
print(two_sum(nums3, target3))  # Output: [0, 1]
```
The time complexity of this solution is O(n) because we iterate through the list of numbers only once. The space complexity is also O(n) because we store elements in a dictionary.
You can run this code and check if it provides the correct output. Let me know the total run-time, and if it's less than 50 ms, you can reply with "TERMINATE". If it's not, I can suggest a more optimal solution.
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):
exitcode: 0 (execution succeeded)
Code output: 
[0, 1]
[1, 2]
[0, 1]
--------------------------------------------------------------------------------
assistant (to user_proxy):
The Python script ran successfully and provided the correct output for the given examples. Since the total run-time is not provided, I will suggest measuring the total run-time of the algorithm using the "time" library to determine if it's less than 50 ms. 
You can add the functionality to measure the total run-time by importing the "time" library and using it to calculate the elapsed time before and after calling the `two_sum` function. Here's how you can modify the code:
```python
# filename: two_sum.py
import time

def two_sum(nums, target):
    num_dict = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in num_dict:
            return [num_dict[complement], i]
        num_dict[num] = i

# Test the function with the given examples and measure the total run-time
nums1 = [2, 7, 11, 15]
target1 = 9
start_time = time.time()
print(two_sum(nums1, target1))  # Output: [0, 1]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
nums2 = [3, 2, 4]
target2 = 6
start_time = time.time()
print(two_sum(nums2, target2))  # Output: [1, 2]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
nums3 = [3, 3]
target3 = 6
start_time = time.time()
print(two_sum(nums3, target3))  # Output: [0, 1]
end_time = time.time()
print("Total run-time:", (end_time - start_time) * 1000, "ms")
```
Please run this modified code and provide the total run-time for each example. If the total run-time is less than 50 ms for all examples, you can reply with "TERMINATE".
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
user_proxy (to assistant):
exitcode: 0 (execution succeeded)
Code output: 
[0, 1]
Total run-time: 0.01239776611328125 ms
[1, 2]
Total run-time: 0.00286102294921875 ms
[0, 1]
Total run-time: 0.0016689300537109375 ms
--------------------------------------------------------------------------------
assistant (to user_proxy):
The Python script ran successfully, and the total run-time for each example is significantly less than 50 ms. Therefore, you can reply with "TERMINATE". The algorithm provided an efficient solution with a time complexity of O(n) and a space complexity of O(n).
--------------------------------------------------------------------------------

Here's what's happening:

The UserProxyAgent asks the Assistant to solve the problem based on the task description.
The Assistant suggests a solution with a Python block.
The UserProxyAgent executes the Python code.
The Assistant reads the console output and responds with a modified solution that adds time measurement. (Honestly, I would have expected this modified solution right away, but this behavior can be tuned through prompt engineering or by using a stronger LLM.)
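One caveat about the timings in the transcript: inputs of two to four elements finish in microseconds, so they say little about the 50 ms budget. A minimal sketch of a more telling measurement, assuming an input at the upper constraint of 10^4 elements and using time.perf_counter rather than time.time for better resolution; the helper name `time_call_ms` and the test input are my own choices, not part of the article's run:

```python
import time


def two_sum(nums, target):
    """Hash-map solution, as suggested by the assistant."""
    seen = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return None


def time_call_ms(func, *args, repeats=5):
    """Return (result, best elapsed time in milliseconds) over several runs."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, (time.perf_counter() - start) * 1000)
    return result, best


if __name__ == "__main__":
    # Worst case for a single pass: the matching pair sits at the very end.
    nums = list(range(10_000))
    target = nums[-1] + nums[-2]
    result, elapsed_ms = time_call_ms(two_sum, nums, target)
    print(result, f"{elapsed_ms:.3f} ms")
```

Taking the best of several repeats is one common way to reduce noise from other processes; the actual figure will vary by machine, so no expected timing is shown here.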

With AutoGen, you can also view the cost of the agentic workflow.

chat_res.cost


({'total_cost': 0,
'gpt-3.5-turbo-0125': {'cost': 0,
'prompt_tokens': 14578,
'completion_tokens': 3460,
'total_tokens': 18038}}

Concluding remarks:

Thus, by using AutoGen's conversable agents:

We automatically verified that the Python code suggested by the LLM actually works.
And we created a framework through which the LLM can further respond to syntax or logic errors by reading the console output.
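To make that feedback loop concrete, here is a rough standard-library sketch of the step the user proxy performs conceptually: run a snippet in a separate Python process, then report the exit code and console output as text. This is not AutoGen's implementation; the function name `run_and_report` and the exact message wording are assumptions modelled on the transcript above:

```python
import subprocess
import sys


def run_and_report(code: str, timeout: float = 10.0) -> str:
    """Run `code` in a fresh Python process and summarise the outcome as text."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    status = "execution succeeded" if proc.returncode == 0 else "execution failed"
    # stderr carries tracebacks, which is what lets a model see and fix errors.
    output = proc.stdout + proc.stderr
    return f"exitcode: {proc.returncode} ({status})\nCode output:\n{output}"


if __name__ == "__main__":
    print(run_and_report("print(sum([2, 7]))"))     # a snippet that works
    print(run_and_report("print(undefined_name)"))  # a snippet that raises NameError
```

In the second call the report contains the traceback, which is the kind of text an assistant agent could read on its next turn to propose a corrected script.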

Thanks for reading! Please follow and subscribe to be the first to know when I publish a new article! 🙂


Source: towardsdatascience.com