Fix Claude validation response format parsing

The Claude validator was receiving detailed explanations with markdown formatting (e.g., '**PASS**') instead of the expected plain verdict. The validation prompt now explicitly requires the response to start with either 'PASS' or 'FAIL: <reason>', with no explanation, markdown, or other text before the verdict. This fixes the 'Warning: Unexpected Claude response format' warning, which was causing valid test results to be incorrectly marked as unclear.
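
This commit changes only the prompt. As a complementary illustration, here is a minimal sketch of how a verdict parser could tolerate decoration such as '**PASS**' if the model still adds it. The Verdict type and the parseVerdict and cutWord helpers are hypothetical names for this sketch; they are not taken from validate.go, and the real parsing code may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Verdict is a hypothetical result type used only in this sketch.
type Verdict struct {
	Passed bool
	Reason string // filled in for FAIL verdicts when a reason is given
}

// cutWord reports whether line starts with word as a whole word
// (so "PASSWORD" does not count as "PASS") and returns what follows it.
func cutWord(line, word string) (rest string, ok bool) {
	if !strings.HasPrefix(line, word) {
		return "", false
	}
	rest = line[len(word):]
	if rest != "" {
		c := rest[0]
		if (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') {
			return "", false
		}
	}
	return rest, true
}

// parseVerdict inspects the first non-empty line of a response.
// It strips surrounding whitespace, quotes, and markdown markers
// before looking for PASS or FAIL. ok is false when that line does
// not start with a verdict.
func parseVerdict(response string) (v Verdict, ok bool) {
	for _, line := range strings.Split(response, "\n") {
		line = strings.Trim(line, " \t\r*_`#>\"'")
		if line == "" {
			continue
		}
		if _, found := cutWord(line, "PASS"); found {
			return Verdict{Passed: true}, true
		}
		if rest, found := cutWord(line, "FAIL"); found {
			// Drop the colon and any leftover emphasis markers.
			reason := strings.Trim(rest, " \t:*_`")
			return Verdict{Passed: false, Reason: reason}, true
		}
		// The first non-empty line is not a verdict.
		return Verdict{}, false
	}
	return Verdict{}, false
}

func main() {
	samples := []string{
		"PASS",
		"**PASS**",
		"FAIL: Response is gibberish",
		"The response looks reasonable overall.",
	}
	for _, s := range samples {
		v, ok := parseVerdict(s)
		fmt.Printf("%q -> passed=%t reason=%q recognized=%t\n", s, v.Passed, v.Reason, ok)
	}
}
```

Checking only the first non-empty line matches the prompt's requirement that the response start with the verdict, so an explanation that merely mentions PASS or FAIL later on is still reported as unrecognized.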
2025-12-09 23:37:06 +00:00 · 2025-10-30 12:34:02 +08:00
parent c8b7015a2c
commit 46f1038724
1 changed files with 7 additions and 2 deletions
--- a/cmd/test-runner/validate.go
+++ b/cmd/test-runner/validate.go
@@ -228,11 +228,16 @@ func (v *Validator) validateWithClaude(prompt *PromptTest, simpleCheckPassed boo
 4. Appears to be from a working LLM model (not system errors or failures)
 5. Has reasonable quality for a 4B parameter model

-Respond with ONLY one of these formats:
+CRITICAL: Your response MUST start with exactly one of these two words:
 - "PASS" if the response is valid and acceptable
 - "FAIL: <brief reason>" if the response has issues

-Be concise. One line only.`)
+Example valid responses:
+- "PASS"
+- "FAIL: Response is gibberish"
+- "FAIL: Contains error messages instead of proper response"
+
+Do NOT include explanations, markdown formatting, or additional text before the verdict. Start your response with PASS or FAIL: only.`)

 	// Write to temp file
 	promptFile := filepath.Join(v.claudeTempDir, fmt.Sprintf("prompt_%d.txt", os.Getpid()))