Attempting to improve the code generation tool
In previous articles (1, 2), we found that LLMs can generate and execute chains of coding instructions, but they often get stuck on errors, especially those related to package installation.
I wanted something similar to the Langchain Python REPL, but one that:
- Saves the generated source code to a file.
- Allows the agent to install pip packages.
- Executes the source code with full output/traceback capture.
- Uses linters and samples several code generations appropriately.
- Makes it possible to fully delegate code generation to a tool, so the master agent doesn't necessarily need to be good at coding (although the model used inside the tool still needs to be good at coding).
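To make the requirements concrete, here is a minimal sketch of such a tool: save the generated source to a file, run it in a subprocess, and capture the full output. The function name and return shape are illustrative only, not code-it's actual API.

```python
import subprocess
import sys
import tempfile


def run_generated_code(source: str, timeout: int = 10) -> dict:
    """Save generated source to a file, execute it, and capture full output.

    Hypothetical helper sketching the requirements above; not code-it's API.
    """
    # Persist the code to disk so it can be inspected (or re-run) later
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name

    # Execute with the current interpreter, capturing stdout/stderr
    proc = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=timeout
    )
    return {
        "succeeded": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

A real implementation would additionally run the code inside a dedicated virtualenv so package installation doesn't pollute the host environment.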
To meet these requirements, I open-sourced my own library, code-it. It is still in early development and missing some documentation and testing. However, the basic functionality works, and I would like to demonstrate it.
In this article we discuss:
- General architecture/idea of the library
- The inner workings of packaging
- Summary and next steps
Remark: this library doesn't support the OpenAI API, because I only used local LLMs, but it can easily be extended to support those models.
You can find the library at this link.
Basically, it's a very simple library. It uses a collection of prompts to query the LLM for completions and executes basic control logic to build a coding solution.
Task executor
This makes up almost the entire library. It is simple code that calls each LLM prompt in turn and cleans up the output so it can be used for a given purpose.
It also initializes the code editor class, which contains the virtualenv manager. The code editor provides simple methods like updating code, running code, etc.
For example, here is the convenience method that overwrites the source code:
def overwrite_code(self, new_source: str):
    new_lines_of_code = [line for line in new_source.split("\n")]
    self.source_code = new_lines_of_code
    self.save_code()
    return self.display_code()
And code execution:
def run_code(self, *args, **kwargs):
    completed_process = subprocess.run(
        [self.interpreter, self.filename], capture_output=True, timeout=10
    )
    print(completed_process, completed_process.stderr)
    succeeded = "Success" if completed_process.returncode == 0 else "Failed"
    stdout = completed_process.stdout
    stderr = completed_process.stderr
    return f"Program {succeeded}\nStdout:{stdout}\nStderr:{stderr}"
Take a look at the source code of this class if you want to understand it better.
Planner prompt
The first step of execution is creating an action plan. This is the most important part of the planner prompt:
You are an AI master at planning and breaking down coding work into smaller, manageable chunks.
You have been given a task, help us think step by step. First, let's see an example of what we expect:
Task: Download the content of the endpoint "https://example.com/api" and save it to a file
Steps:
1. I should import the requests library
2. I should use the requests library to fetch the results from the endpoint "https://example.com/api" and store them in a variable called response
3. I should access the response variable and parse the content by decoding the JSON content
4. I should open a local file in write mode and use the json library to dump the results
In short, we take a user request and break it down into smaller tasks.
Let's see it in action with an example:
Your task is to make jokes about cats and save them in a file called "cat_jokes.txt". Be creative!
This is the planner output:
[
    "1. Import the necessary libraries.",
    "2. Define a function to generate a random cat joke.",
    "3. Define a function that writes a cat joke to a file.",
    "4. Call the save function after generating the joke."
]
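A plan like this comes back from the model as raw text, so it has to be parsed into a list of steps. A minimal sketch of such parsing (a hypothetical helper, not necessarily how code-it parses its planner output):

```python
import re


def parse_plan(llm_output: str) -> list:
    """Extract numbered plan steps from a raw LLM completion.

    Hypothetical helper: keeps only lines that start with a step
    number like '1.', discarding headers and trailing chatter.
    """
    steps = []
    for line in llm_output.splitlines():
        line = line.strip()
        if re.match(r"^\d+\.", line):
            steps.append(line)
    return steps
```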
This helps the LLM actually define the functions and the dependencies. Let's look at another example, which I copied and pasted from one of my master langchain agents (hence the strange wording).
Task
Your job is to plot an example chart using matplotlib. Create your own random data.
Only run this code once you're done.
DO NOT add any extra code or skip a single step.
Generated plan
[
    "1. I should import the necessary libraries: matplotlib and random.",
    "2. I should define the data points for the chart.",
    "3. I should create a function that generates random data.",
    "4. I should create a function to plot the graph.",
    "5. Finally, I should display the plot."
]
Dependency tracker
Now that we have the plan, we can feed it into the next LLM prompt so that it can extract the necessary dependencies from the plan. This is what the prompt looks like:
You are an AI master at understanding code.
You will receive a task plan, help us find the necessary Python packages to install. First, let's see an example of what we expect:
Plan:
1. Import the requests library
2. Use the requests library to fetch the content of the endpoint
3. Parse the results into a dictionary
4. Save the dictionary to a file
Requirements:
requests
END OF REQUIREMENTS
Looking at the example matplotlib plan, we can see that the libraries are defined in the plan:
"1. I should import the required libraries: matplotlib and random."
Since the plan mentions the random library, the dependency tracker will think it's also a Python package to install.
So we need some sanity checks to reject badly extracted requirements:
deps = self.dependency_tracker.execute_task(plan="\n".join(plan))
for d in deps:
    d = d.replace("-", "").strip()
    if " " in d:
        d = d.split(" ")[0]
    if self.config.check_package_is_in_pypi:
        url = f"https://pypi.org/project/{d}"
        res = requests.get(url)
        if res.status_code != 200:
            continue
    if len(d) < 2 or d in DEPENDENCY_BLACKLIST:
        continue
    dependencies.add(d)
This code tries to fix the output format, checks that the package exists on PyPI, and checks that it is not blacklisted (e.g. the random or json stdlib modules).
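The same filtering logic can be sketched as a standalone, testable function. Here `package_exists` is an injected stand-in for the PyPI HTTP check (an HTTP GET against https://pypi.org/project/&lt;name&gt;), so the logic can be tested offline; the names are illustrative, not code-it's API.

```python
# Illustrative blacklist: stdlib modules the planner often mistakes for packages
DEPENDENCY_BLACKLIST = {"random", "json", "os", "sys"}


def filter_dependencies(candidates, package_exists):
    """Reject badly extracted requirements.

    `package_exists(name)` should return True when the name is a real
    PyPI package (e.g. by checking the project page returns HTTP 200).
    """
    kept = set()
    for d in candidates:
        # Normalize the raw extracted string
        d = d.replace("-", "").strip()
        if " " in d:
            d = d.split(" ")[0]
        # Drop junk and stdlib modules
        if len(d) < 2 or d in DEPENDENCY_BLACKLIST:
            continue
        # Drop names that don't exist on PyPI
        if not package_exists(d):
            continue
        kept.add(d)
    return kept
```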
Virtualenv manager
After parsing the requirements, we can install them with our virtualenv manager, which also creates a virtual environment for our code editor/executor:
for dependency in dependencies:
    self.code_editor.add_dependency(dependency)

self.code_editor.create_env()
process = self.code_editor.install_dependencies()
if process.returncode != 0:
    logger.error("Failed to install dependencies: %s", "\n".join(dependencies))
    attempt += 1
The VirtualenvManager class itself is very simple and leverages the virtualenv package:
import logging
import os
import random
import string
import subprocess

from virtualenv import cli_run

logger = logging.getLogger(__name__)

RANDOM_NAME_LENGTH = 16

class VirtualenvManager:
    def __init__(self, name: str = "", base_path="/tmp") -> None:
        if not name:
            name = ""
            for _ in range(RANDOM_NAME_LENGTH):
                population = string.ascii_letters + string.digits
                char = random.sample(population, k=1)
                name += char[0]
        self.name = name
        self.path = os.path.join(base_path, name)
        self.python_interpreter = os.path.join(self.path, "bin/python3")
        self.dependencies = []

    def add_dependency(self, dependency):
        logger.info("Adding dependency '%s'", dependency)
        self.dependencies.append(dependency)

    def create_env(self):
        logger.info("Creating virtualenv at '%s'", self.path)
        cli_run([self.path], setup_logging=False)

    def install_dependencies(self):
        logger.info("Installing dependencies")
        process = subprocess.run(
            [self.python_interpreter, "-m", "pip", "install"] + self.dependencies,
            capture_output=True,
        )
        return process
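As a design note, the character-sampling loop used to name the environment could be replaced with a uuid-based name. A minimal sketch of the alternative (not part of code-it):

```python
import os
import uuid


def make_env_path(base_path: str = "/tmp") -> str:
    """Derive a unique virtualenv path from a uuid.

    Hypothetical alternative to sampling 16 random characters:
    uuid4's hex form is already collision-resistant and filesystem-safe.
    """
    return os.path.join(base_path, uuid.uuid4().hex[:16])
```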
Code generation
We are now ready to run the coder prompt to generate the code. Please note that this prompt is of pretty low quality! However, it seems to work quite well in practice.
You are an expert Python programming AI agent. You solve problems using Python code,
and you are capable of providing code snippets, debugging and much more, upon request. You are usually given
existing source code that is badly written and contains many duplicates. You should improve it by refactoring and debugging. You should fulfill your role as in the following example:
Goal: Write code that prints "Hello World".
Plan: 1. Call the print function with the parameter "Hello World".
Source code:
import os
import os
import os
print('Hello World')
Thought: The code contains duplicate and unused imports. Here is the improved version.
New code:
print('Hello World')
Target:
Note that after completing the subtask, you need to add the word "Target:" on a new line,
as in the example above.
You should ALWAYS output the full code.
Note that we insert the original task as the "Goal" and the "Plan" generated by the planner. This helps ensure that the coder isn't attempting something completely different that would require different packages.
With the result, we can save it to a local file using our code editor, and we try to strip the markdown, because the model sometimes wraps the code in "```python" tags:
self.code_editor.overwrite_code(new_code)
_trim_md(self.code_editor)
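For illustration, a helper like _trim_md might simply drop the fence lines. This is a hypothetical sketch, not code-it's actual implementation:

```python
def trim_markdown_fences(code: str) -> str:
    """Remove ``` fence lines that models sometimes wrap around code.

    Hypothetical helper: drops any line that starts with a code fence
    (e.g. '```python' or a bare '```'), keeping everything else intact.
    """
    lines = [line for line in code.splitlines() if not line.strip().startswith("```")]
    return "\n".join(lines)
```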
Linting and sampling
Now that we have the source code, we can use a linter to see how bad our code is. Here is the relevant excerpt:
from pylint import epylint as lint
(...)
# Run pylint
(pylint_stdout, _) = lint.py_run(self.code_editor.filename, return_std=True)
pylint_stdout = pylint_stdout.getvalue()
pylint_lines = pylint_stdout.splitlines()

# Extract the score from the pylint output
linting_score_str = None
for line in pylint_lines:
    if PYLINT_SCORE_SUBSTRING in line:
        split_1 = line.split(PYLINT_SCORE_SUBSTRING)[1]
        linting_score_str = split_1.split("/")[0]

# If we can't extract the score, we probably have a syntax error, so we set the score to -1
if not linting_score_str:
    logger.warn(f"Could not parse pylint stdout: {pylint_stdout}")
    score = -1  # Code likely doesn't even compile
# Otherwise, parse the score into a float
else:
    score = float(linting_score_str)
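The score extraction can be isolated into a small, testable function. pylint's report contains a line like "Your code has been rated at 8.33/10", which is presumably what PYLINT_SCORE_SUBSTRING matches; the sketch below makes that assumption and is not code-it's exact code:

```python
# Assumed marker: pylint's summary line reads "Your code has been rated at X.XX/10"
PYLINT_SCORE_SUBSTRING = "Your code has been rated at "


def extract_pylint_score(pylint_stdout: str) -> float:
    """Pull the numeric score out of pylint's textual report.

    Returns -1.0 when no score line is found, which usually means
    the code has a syntax error and won't even compile.
    """
    for line in pylint_stdout.splitlines():
        if PYLINT_SCORE_SUBSTRING in line:
            tail = line.split(PYLINT_SCORE_SUBSTRING)[1]
            return float(tail.split("/")[0])
    return -1.0
```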
As a result, we see in the logs:
2023-05-18 14:34:35,523 [INFO] Coding sample: 0 (Temperature: 0.0)
2023-05-18 14:34:40,259 [INFO]
import matplotlib.pyplot as plt
import random

def generate_data():
    return [random.randint(1, 10) for _ in range(10)]

def plot_chart(data):
    plt.plot(data)
    plt.xlabel('X-axis')
    plt.ylabel('Y-axis')
    plt.title('Example Chart')
    plt.show()

data = generate_data()
plot_chart(data)
2023-05-18 14:34:40,259 [INFO] Applying linter...
2023-05-18 14:34:40,558 [INFO] Sample score: 0.83
2023-05-18 14:34:44,189 [INFO]
By default we take three code samples, each time increasing the temperature and regenerating a new plan. Note that, for simplicity, we don't try to reinstall dependencies, even though that might result in errors (the sample would then be filtered out with a score of -1).
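The sampling loop just described can be sketched as follows, with `generate(temperature)` as an injected stand-in for the plan + coder LLM calls and the pylint scoring (hypothetical names, not code-it's API):

```python
def sample_codes(generate, n_samples: int = 3, temperature_step: float = 0.1):
    """Sample several code candidates at increasing temperatures.

    `generate(temperature)` must return a (code, score) tuple; in code-it
    the score would come from pylint. Returns samples sorted best-first.
    """
    samples = []
    for i in range(n_samples):
        temperature = i * temperature_step  # 0.0, 0.1, 0.2, ...
        code, score = generate(temperature)
        samples.append({"temperature": temperature, "code": code, "score": score})
    # Best (highest-scoring) sample first
    samples.sort(key=lambda s: s["score"], reverse=True)
    return samples
```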
This is our second generated plan/sample:
2023-05-18 14:34:55,841 [INFO] Parsed plan: ['1. Import the required libraries: matplotlib and random.', '2. Define the data points as a list of tuples. Each tuple contains x and y values.', '3. Define the chart type as a scatter plot.', '4. Create a figure and a subplot.', '5. Use the scatter function to plot the data points.', '6. Use the xlabel and ylabel functions to label the axes.', '7. Use the title function to give the chart a title.', '8. Display the chart with the show function.', '9. Clean up the figure and subplot.']
2023-05-18 14:34:55,842 [INFO] Coding sample: 2 (Temperature: 0.2)
2023-05-18 14:35:01,783 [INFO]
import matplotlib.pyplot as plt
import random

def generate_data():
    return [random.randint(1, 10) for _ in range(10)]

def plot_chart(data):
    plt.plot(data)
    plt.xlabel('X-axis')
    plt.ylabel('Y-axis')
    plt.title('Example Chart')
    plt.show()
    plt.text(50, 90, "Random", fontsize=16)

data = generate_data()
plot_chart(data)
2023-05-18 14:35:01,783 [INFO] Applying linter...
2023-05-18 14:35:02,096 [INFO] Sample score: 1.54
2023-05-18 14:35:02,096 [INFO] Sample highest score: 1.54
So the code is mostly the same, but for some reason pylint decided to give it a higher score. The third sample simply repeats with the same score.
Code execution
Okay, now that we have the samples, let's sort them and pick the one with the highest score.
Then we write it back to the file and run it!
coding_samples.sort(key=lambda x: x["score"], reverse=True)
highest_score = coding_samples[0]
logger.info("Sample highest score: %s", highest_score["score"])
self.code_editor.overwrite_code(highest_score["code"])
if not self.config.execute_code:
    return self.code_editor.display_code()

result = self.code_editor.run_code()
In this example, the execution was successful (which is not always the case)!
CompletedProcess(args=['/tmp/2Gh5EfN6MNxahZcI/bin/python3', 'persistent_source.py'], returncode=0, stdout=b'', stderr=b'/home/paolo/code-it/persistent_source.py:10: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.\n  plt.show()\n')
2023-05-18 14:35:02,439 [INFO] Code generation complete!
2023-05-18 14:35:02,439 [INFO] Source code works!
So, this library sketch shows that successfully using LLMs for complex tasks is not as easy as it seems at first glance. That's why libraries like guidance were developed: https://github.com/microsoft/guidance
I think code-it went in a similar direction, but it is not as well structured and flexible as Microsoft's library.
A good next step would be integrating guidance into code-it, or trying a whole new approach built on top of it.
It would also be very interesting to index local code with embeddings, so that this package could reuse local code snippets to help generate code.
One thing that didn't work as well as I expected was sampling different candidates based on linter scores, so maybe there are better options:
- Increase the diversity of the samples
- Use a more effective code evaluation tool
- Feed the lint output back into the code generation instead of sampling
However, I think there is a lot of potential in what can be achieved with these models - time will tell how good these autocoding tools will be :).