Risk Level: 🟢 Essential

Guide Entry
TESTING (n.): The practice of verifying that code works. With AI-generated code, testing is your primary defense against confident wrongness. The AI will never tell you something is broken. Tests will.

The Testing Paradox
AI can generate tests. But:
- AI-generated tests test AI-generated assumptions
- If both are wrong in the same way, you won't know
Solution: Generate tests, but verify they test what matters.
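For example, here is how both can be wrong in the same way. This is a sketch using the average function specified in the workflow below: if the AI assumes an empty list should yield None, the implementation and the test agree with each other and the suite passes, even though the spec says return 0.
# AI-generated implementation: assumes empty list -> None
def average(values):
    if not values:
        return None  # wrong: the spec says return 0
    return sum(values) / len(values)

# AI-generated test: encodes the same wrong assumption
def test_average_empty():
    assert average([]) is None  # passes, so the bug goes unnoticed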

Testing AI Code Workflow
Step 1: Specify First
Before generating code:
"I need a function that:
- Takes a list of numbers
- Returns the average
- Returns 0 for empty lists
- Ignores non-numeric values"
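The rest of the workflow assumes this spec maps to a function named average. A signature sketch (the name and stub are placeholders until you generate the real code):
def average(values):
    """Return the mean of the numeric items in values.

    Returns 0 for an empty list; non-numeric items are ignored.
    """
    ...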
Step 2: Generate Tests First
"Write tests for that function before implementing it.
Use pytest.
Include edge cases."
Step 3: Review Tests
Do the tests match your specification?
- Test for empty list?
- Test for non-numeric values?
- Test for normal case?
Do the tests make sense?
# Good test
def test_average_ignores_strings():
    assert average([1, "two", 3]) == 2.0

# Suspicious test - why would this be expected?
def test_average_returns_none_for_empty():
    assert average([]) is None
# Wait, spec said return 0, not None...
Step 4: Generate Implementation
"Now implement the function to pass these tests."
Step 5: Run Tests
pytest
If tests fail, you learned something. Fix and repeat.
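When the tests pass, you should have something like the sketch below - one implementation consistent with the spec and the reviewed tests; your generated version may differ.
from numbers import Number

def average(values):
    """Mean of the numeric items in values; 0 for an empty list."""
    nums = [v for v in values if isinstance(v, Number)]
    if not nums:
        return 0
    return sum(nums) / len(nums)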

What to Test
The Happy Path
def test_average_normal_case():
    assert average([1, 2, 3, 4, 5]) == 3.0
Edge Cases
def test_average_empty_list():
    assert average([]) == 0

def test_average_single_element():
    assert average([42]) == 42
Error Cases
def test_average_with_invalid_input():
    assert average([1, "two", 3]) == 2.0
Boundaries
def test_average_very_large_numbers():
    # Compare against the float value: an exact comparison with the
    # int 10**100 would fail, since averaging divides and returns a float
    assert average([10**100, 10**100]) == float(10**100)

Testing AI-Specific Concerns
Hallucination Testing
def test_uses_real_library():
    # If this import fails, the AI hallucinated the library
    from actual_library import actual_function
Scope Testing
def test_function_only_does_what_asked():
    result = process_data(input_data)
    # Verify it didn't add extra fields
    assert set(result.keys()) == {"expected", "fields", "only"}
Integration Testing
def test_works_with_existing_code():
    # Test that AI code integrates with your codebase
    existing_result = existing_function()
    ai_result = ai_generated_function(existing_result)
    assert ai_result is not None

AI Test Generation Prompts
Good Prompt
"Generate pytest tests for this function.
Include:
- 3 happy path tests
- 3 edge case tests
- 2 error case tests
Follow this format:
def test_descriptive_name():
    '''What this tests.'''
    # Arrange
    data = ...
    # Act
    result = function(data)
    # Assert
    assert result == expected"
Review the Output
AI might generate:
- Tests that always pass (useless)
- Tests that test implementation, not behavior
- Tests missing critical edge cases
You catch these by reading the tests.
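For instance, the first two patterns can look like this (a sketch; _filter_numeric stands in for a hypothetical private helper):
# Always passes: exercises the code but asserts nothing about behavior
def test_average_runs():
    average([1, 2, 3])
    assert True

# Tests implementation, not behavior: pins a private helper, so a
# harmless refactor breaks this test even though average() still works
def test_filter_numeric_directly():
    assert _filter_numeric([1, "two", 3]) == [1, 3]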

The Testing Safety Net
Level 0: No tests
AI says it works, you hope it works
🔴 Dangerous
Level 1: AI-generated tests, unreviewed
Better than nothing
🟡 Risky
Level 2: AI-generated tests, reviewed
You verified they test the right things
🟢 Good
Level 3: Spec-first tests, then implementation
Tests define correctness, code follows
🟢 Better
Level 4: Test + review + CI/CD
Automated verification on every commit
🟢 Best

The Street Rule
"The AI cannot test its own correctness. Tests are how you test the AI."

Move to Make
For your next AI-generated function:
- Write the spec
- Generate tests from spec
- Review tests manually
- Generate implementation
- Run tests
- Note what the tests caught
Build the muscle memory of test-first AI development.