What AI App Builders Actually Generate (A Code Review)
We reviewed the actual code output from multiple AI app builders. Here is what the generated code looks like, where it shines, and where it breaks.
Everyone talks about what AI app builders can do. Nobody shows what they actually produce. We took the same app concept (a habit tracker with streaks, reminders, and analytics) and ran it through multiple AI building tools. Then we reviewed the generated code the way a senior developer would review a pull request.
The results were mixed. Some tools produced code we would ship. Others produced code we would rewrite from scratch.
## What we tested
We gave each tool the same brief: a mobile habit tracker with user authentication, daily check-ins, streak tracking, push notification reminders, and a simple analytics dashboard. This is a common app pattern that tests a range of capabilities.
We evaluated the output on five dimensions: code structure, type safety, error handling, security practices, and maintainability.
## Code structure: how the files are organized
The best-generated codebases follow established conventions. React components in a components directory. Screens in a screens or app directory. Services and API calls separated from UI code. Shared types in a types file.
Some tools get this right. The output follows React Native conventions that any developer would recognize. You open the project and immediately understand the structure.
Other tools dump everything into a few large files. A single 800-line file containing multiple screens, inline API calls, and hardcoded values is technically functional but painful to maintain. When you need to fix a bug in the settings screen, you are scrolling through 800 lines of unrelated code to find it.
At [Goodspeed](/features/building), we start every app from a 68-file template that establishes the structure before any AI-generated code is added. The generated screens, services, and types slot into a predefined architecture. This means the project structure is always consistent, regardless of the app type.
## Type safety: TypeScript done right vs. TypeScript as decoration
TypeScript is only valuable if the types are meaningful. We saw a wide range here.
The best output defined proper interfaces for API responses, screen props, and state objects. When you hover over a variable in VS Code, you see its actual shape, not "any" or a generic "object."
The worst output used TypeScript syntax but typed everything as "any" or used loose types that do not catch errors. This gives you the overhead of TypeScript (the compilation step, the type annotations) without the benefit (catching bugs before runtime).
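The difference is easy to see in a small sketch. The `Habit` interface and function names below are illustrative stand-ins, not output from any specific tool:

```typescript
// Hypothetical habit-tracker shape -- names are illustrative.
interface Habit {
  id: string;
  name: string;
  streak: number;
  lastCheckIn: string | null; // database values can be null
}

// TypeScript as decoration: the compiler cannot catch a typo
// like habit.strek, or a null misuse -- everything is "any".
function streakLabelLoose(habit: any): string {
  return `${habit.streak} days`;
}

// TypeScript done right: typos and wrong shapes fail at compile time,
// and hovering the parameter in VS Code shows its actual fields.
function streakLabel(habit: Habit): string {
  return `${habit.streak} day${habit.streak === 1 ? "" : "s"}`;
}
```

Both versions run; only the second one lets the compiler do its job.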
We also checked for null safety. Database values can be null. API calls can fail. The best-generated code handles these cases explicitly with null checks, optional chaining, and fallback values. The worst-generated code assumes everything succeeds, which produces apps that crash the first time a network request fails or a database field is empty.
### A real example
Here is a pattern we see in poorly typed generated code:
```typescript
const score = data.score;
return <Text>{score.toFixed(1)}</Text>;
```
If `data.score` is null (and database values often are), this crashes. Well-typed generated code handles it:
```typescript
const score = data.score ?? 0;
return <Text>{score.toFixed(1)}</Text>;
```
Small difference. Huge impact in production.
## Error handling: the gap between demo and production
Every AI builder can produce code that works in the happy path. The difference shows up when things go wrong: network failures, invalid user input, expired auth sessions, and empty states all happen in production.
The best output includes:
- Try/catch blocks around API calls with user-facing error messages
- Loading states while data is being fetched
- Empty states when lists have no items
- Retry mechanisms for transient failures
- Graceful degradation when optional features are unavailable
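These states can be modeled explicitly. Here is a minimal sketch of the pattern; the `UiState` shape and `loadHabits` function are illustrative, not code from any reviewed tool:

```typescript
// Model every state the UI can be in, not just success.
type UiState<T> =
  | { kind: "loading" }
  | { kind: "error"; message: string }
  | { kind: "empty" }
  | { kind: "ready"; data: T };

// fetchHabits is injected so the pattern is independent of any API client.
async function loadHabits(
  fetchHabits: () => Promise<string[]>
): Promise<UiState<string[]>> {
  try {
    const habits = await fetchHabits();
    if (habits.length === 0) return { kind: "empty" };
    return { kind: "ready", data: habits };
  } catch {
    // A user-facing message instead of a white screen.
    return { kind: "error", message: "Could not load habits. Pull to retry." };
  }
}
```

Because each state is a distinct variant, the rendering layer is forced to handle all of them, which is exactly the discipline missing from the worst output.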
The worst output has none of these. The app works perfectly on a fast connection with perfect data. The first time something unexpected happens, users see a white screen or a cryptic error message.
## Security: the non-negotiable
Security issues in generated code are the most concerning finding. Some patterns we flagged:
**API keys in source code.** Multiple tools embedded API keys directly in the client code. These keys are visible to anyone who downloads your app. They should be in environment variables, never in the source.
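A small helper makes the safe pattern hard to skip. This is a sketch, and the variable name is hypothetical:

```typescript
// Bad: shipped inside the bundle, extractable by anyone who downloads the app.
// const ANALYTICS_API_KEY = "sk_live_abc123";

// Better: fail fast if the environment is misconfigured instead of
// hardcoding the value in source.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

One caveat worth knowing: any value a client bundle reads at build time still ends up embedded in the shipped app, so truly secret keys belong on a server, with the app calling your backend instead of the third-party API directly.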
**Missing row-level security.** When the tool generates a database schema, does it include access control policies? Without RLS (Row-Level Security), any authenticated user can read or modify any other user's data. This is not a nice-to-have. It is a requirement.
**Client-side auth checks only.** Some tools check whether a user is logged in on the client side but do not verify authentication on the server. A determined user can bypass client-side checks by modifying the app or calling the API directly.
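Server-side enforcement looks like this in sketch form. `verifyToken` and the request/response shapes are illustrative stand-ins, not a real framework API:

```typescript
type Session = { userId: string } | null;

// The handler verifies the token itself instead of trusting a
// client-side "isLoggedIn" flag; a modified client cannot bypass this.
function handleGetHabits(
  authHeader: string | undefined,
  verifyToken: (token: string) => Session
): { status: number; body: string } {
  const token = authHeader?.replace(/^Bearer /, "");
  const session = token ? verifyToken(token) : null;
  if (!session) {
    return { status: 401, body: "Unauthorized" };
  }
  // Scope the query to the authenticated user, never to a client-supplied id.
  return { status: 200, body: `habits for ${session.userId}` };
}
```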
The best tools generate proper security patterns by default. Auth tokens are stored securely. Database access is restricted by user. API keys are loaded from environment variables. This is table stakes for production code, and it is where template-based generation has a clear advantage over from-scratch generation. Our [template](/features/building) includes RLS policies, secure auth flows, and environment variable management out of the box.
## Maintainability: can you actually work with this code?
You will need to modify the generated code. Features need updating. Bugs need fixing. New requirements emerge. The question is whether the codebase makes this easy or hard.
Maintainable generated code has:
- Clear component boundaries (one component per file, single responsibility)
- Consistent naming conventions (camelCase for variables, PascalCase for components)
- Comments on non-obvious business logic
- Separation of concerns (UI separate from data fetching separate from business logic)
- Standard library choices that a new developer would recognize
Unmaintainable generated code has:
- Deeply nested ternaries instead of early returns
- Copy-pasted code blocks instead of shared utilities
- Inconsistent patterns across different screens
- Dependencies on obscure or abandoned packages
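The first item is worth seeing side by side. Both functions below behave identically; the names and the `StreakStatus` type are illustrative:

```typescript
type StreakStatus = "none" | "active" | "at-risk" | "record";

// Hard to review: the nested ternary compiles, but tracing a single
// input through it takes real effort.
function statusNested(streak: number, best: number, missedYesterday: boolean): StreakStatus {
  return streak === 0 ? "none" : missedYesterday ? "at-risk" : streak >= best ? "record" : "active";
}

// Same logic with early returns: each rule reads on its own line.
function status(streak: number, best: number, missedYesterday: boolean): StreakStatus {
  if (streak === 0) return "none";
  if (missedYesterday) return "at-risk";
  if (streak >= best) return "record";
  return "active";
}
```

The refactor changes nothing about behavior, only about how quickly the next developer can verify it.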
## Our recommendation
Before choosing an AI app builder, ask to see the generated code. Download a sample project. Open it in VS Code. Read it the way you would read code from a new team member.
Check the five dimensions: structure, types, error handling, security, and maintainability. If the code passes your review, the tool is worth using. If you find yourself rewriting more than 30% of the output, the tool is slowing you down instead of speeding you up.
At Goodspeed, 76% of every app comes from our [production-tested template](/features/building). The AI generates only the app-specific 24%: screens, business logic, and data models. This approach minimizes the surface area for generated code bugs while still delivering custom apps for every idea.
Want to see how different tools compare? Check our [comparison page](/compare) for side-by-side evaluations of popular AI app builders.