Build AI Code Generation Tools For Large Scale Project in Python? Part 3 - Development Diary and Discussion

Building AI Code Generation Tools from Scratch in Python 3: A Journey from Zero to Everything (Part 3)

Welcome back! If you missed the previous parts, you can find them here:

  • Part 1: Covered the basic workflow, fundamental code generation, directory creation, file generation, and folder automation.
  • Part 2: Addressed the limitations of the prompts and Large Language Models (LLMs) used in Part 1.

In this installment, we'll focus on the compilation pipeline.

Disclaimer:

I won't be sharing the complete project code directly. My goal is to encourage discussion and collaboration.

  • For Beginners: Use this as a guide to build your own project and join the conversation. I'm happy to help with any learning challenges.
  • For Experienced Developers: Please share your insights and discuss potential improvements.

Recap

In Part 1, we established the foundation for the compilation phase. This phase is crucial for identifying syntax errors, warnings, and coding inconsistencies. Our workflow will be as follows:

  1. Obtain the compilation report.
  2. Address issues from the report, one by one.

Compilation Report

Ideally, the compilation report is generated as output from the compile command. Examples include:

  • C++: g++ ...
  • Flutter: flutter analyze
  • JavaScript: eslint
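
To make this concrete, here is a minimal sketch of how the report could be captured in Python. The command lists and the `COMPILE_COMMANDS` mapping are assumptions for illustration; substitute your own project's toolchain.

```python
import subprocess

# Hypothetical mapping from language to its compile/analyze command.
# These exact flags are assumptions, not a fixed spec.
COMPILE_COMMANDS = {
    "cpp": ["g++", "-fsyntax-only", "main.cpp"],
    "flutter": ["flutter", "analyze"],
    "javascript": ["npx", "eslint", "."],
}

def get_compilation_report(language: str, cwd: str = ".") -> str:
    """Run the compile/analyze command and return its combined output."""
    result = subprocess.run(
        COMPILE_COMMANDS[language],
        cwd=cwd,
        capture_output=True,
        text=True,
    )
    # Compilers usually write diagnostics to stderr; linters to stdout.
    return result.stdout + result.stderr
```

The raw text returned here is what we later hand to the LLM for grouping.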

We can leverage LLMs to group problems and generate a JSON report. Potential errors include:

  1. Syntax errors: Missing semicolons or other violations of coding rules.
  2. Inconsistencies: Function names differing between files (e.g., "abc" in file A and "Abc" in file B) or undefined variables.

The prompt will aim to generate a report for each file, detailing the errors that need to be addressed. This will allow us to tackle them systematically in subsequent prompts.
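A sketch of what that prompt and its parsing step might look like is below. The exact prompt wording and the JSON field names (`file`, `errors`, `type`, `line`, `message`) are assumptions chosen for illustration, not a fixed schema.

```python
import json

# Hypothetical prompt template asking the LLM for a per-file JSON report.
REPORT_PROMPT = """You are reviewing compiler output for one file.
Return ONLY a JSON object of the form:
{{"file": "<path>", "errors": [{{"type": "syntax|inconsistency",
  "line": <int or null>, "message": "<description>"}}]}}

Compiler output:
{compiler_output}
"""

def parse_report(raw_llm_output: str) -> dict:
    """Parse the JSON report emitted by the LLM; fail loudly if malformed."""
    report = json.loads(raw_llm_output)
    if "file" not in report or "errors" not in report:
        raise ValueError("LLM report missing required fields")
    return report
```

Validating the shape immediately makes it easy to retry the prompt when the LLM returns something unparseable.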

File Handling

Given the error descriptions and file content, we can employ two primary handling methods:

  1. Immediate code regeneration: Suitable for resolving syntax errors.
  2. Contextual correction: Involves reading other files to understand and fix inconsistencies. After reading the relevant context, the LLM should have sufficient information to resolve the error.

The code generation process is essentially the same as the one described in Part 2, so we can reuse that code here.


The Iterative Process

This process should loop until the compilation report no longer identifies any errors. At this point, the code should be runnable and free of syntax errors.
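
The loop can be sketched generically, with the report-fetching and fixing steps passed in as functions (the names here are illustrative). A hard iteration cap is a worthwhile safeguard, since the LLM is not guaranteed to converge.

```python
def fix_until_clean(get_errors, fix_error, max_iterations: int = 10) -> bool:
    """Loop: fetch the error list, fix each error, stop when the report is clean.

    get_errors: returns the current list of errors (empty means clean).
    fix_error: attempts to fix a single error.
    Returns True if the report came back clean within the budget.
    """
    for _ in range(max_iterations):
        errors = get_errors()
        if not errors:
            return True  # the code compiles without reported errors
        for error in errors:
            fix_error(error)
    return False  # did not converge within the iteration budget
```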

Current Status

Parts 1 and 2 successfully established a basic code generation framework and project structure. The generated code is now usable. However, limitations remain:

  1. Missing assets: Asset folders are not generated.
  2. Backend dependencies: Database setup and SaaS API integration are absent.
  3. User acceptance: The generated output may not meet user expectations.
  4. Runtime errors: The current process does not detect runtime problems.

Next Steps

I'm exploring potential next steps. Initial thoughts include:

  • Providing clearer requirements.
  • Generating test code.
  • Automating infrastructure code generation.

These areas are somewhat beyond my current knowledge, so I welcome your feedback and expertise.

Stay Tuned!

I encourage you to share your comments and thoughts on this implementation. Please leave a comment to continue the discussion.

I'm unsure how many parts this blog series will have. If you're interested, please show your support by sharing or commenting!
