Score Comparison
Modus Ponens (Valid)
Tests the ability to follow a basic valid inference (If A then B; A is true; therefore B is true). All models should score 1.0.
Fallacy Trap (Affirming Consequent)
A trap! This invalid inference looks like modus ponens but runs the conditional backwards (If A then B; B is true; therefore A?). The correct answer is "Unknown" or "Cannot Determine". Simple models incorrectly say "Yes".
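Why the trap is a trap can be checked with a two-variable truth table. The sketch below is illustrative only and is not part of the benchmark; the helper names `implies` and `counterexamples` are hypothetical. It searches for assignments where every premise holds but the conclusion fails: a valid form has none, an invalid form has at least one.

```python
from itertools import product


def implies(p, q):
    """Material conditional: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


def counterexamples(premises, conclusion):
    """Return every (A, B) assignment where all premises hold but the conclusion fails."""
    return [
        (a, b)
        for a, b in product([True, False], repeat=2)
        if all(p(a, b) for p in premises) and not conclusion(a, b)
    ]


# Modus ponens: If A then B; A is true; therefore B.
modus_ponens = counterexamples(
    premises=[lambda a, b: implies(a, b), lambda a, b: a],
    conclusion=lambda a, b: b,
)

# Affirming the consequent: If A then B; B is true; therefore A?
affirming_consequent = counterexamples(
    premises=[lambda a, b: implies(a, b), lambda a, b: b],
    conclusion=lambda a, b: a,
)

print(modus_ponens)          # [] (no counterexample, so the form is valid)
print(affirming_consequent)  # [(False, True)] (A false, B true breaks it)
```

The single counterexample (A false, B true) satisfies both premises, which is why "Yes" is the wrong answer here.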
Multi-Hop Reasoning
Evaluates the ability to chain multiple logical steps (If A then B; If B then C; A is true; therefore C).
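Chaining of this kind can be sketched as forward chaining over "if X then Y" rules. This is a minimal illustration under assumed names (`forward_chain`, the `rules` list of pairs), not the benchmark's own implementation.

```python
def forward_chain(rules, facts):
    """Apply 'if X then Y' rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in known and consequent not in known:
                known.add(consequent)
                changed = True
    return known


# If A then B; If B then C.
rules = [("A", "B"), ("B", "C")]

from_a = forward_chain(rules, {"A"})
from_c = forward_chain(rules, {"C"})

print(sorted(from_a))  # ['A', 'B', 'C'] (C is reached in two hops)
print(sorted(from_c))  # ['C'] (rules only fire forwards; C tells us nothing about A)
```

Reaching C from A takes two rule applications, which is the "multi-hop" part. Starting from C derives nothing new, because the rules are never applied in reverse.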
Modus Tollens
Tests contrapositive reasoning, also known as denying the consequent (If A then B; B is false; therefore A is false).
Disjunctive Syllogism
Assesses "either/or" logic (Either A or B; A is false; therefore B is true).
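The expected answer for a form like this can be derived mechanically by enumerating every truth assignment consistent with the premises. The sketch below is an assumption-laden illustration: the function name `expected_answer` is hypothetical, and the "No" label is assumed by analogy with the "Yes" and "Unknown" labels used in the Fallacy Trap description.

```python
from itertools import product


def expected_answer(premises, query):
    """Classify a query as 'Yes', 'No', or 'Unknown' given the premises.

    Keeps every (A, B) assignment that satisfies all premises, then checks
    whether the query is true in all of them, in none, or only in some.
    """
    models = [
        (a, b)
        for a, b in product([True, False], repeat=2)
        if all(p(a, b) for p in premises)
    ]
    if not models:
        raise ValueError("premises are contradictory")
    values = {query(a, b) for a, b in models}
    if values == {True}:
        return "Yes"
    if values == {False}:
        return "No"
    return "Unknown"


# Disjunctive syllogism: Either A or B; A is false. Is B true?
# "Or" is modelled as inclusive here; the result is the same for exclusive or.
disjunctive = expected_answer(
    premises=[lambda a, b: a or b, lambda a, b: not a],
    query=lambda a, b: b,
)

# Modus tollens: If A then B; B is false. Is A true?
modus_tollens = expected_answer(
    premises=[lambda a, b: (not a) or b, lambda a, b: not b],
    query=lambda a, b: a,
)

print(disjunctive)    # Yes
print(modus_tollens)  # No
```

Each set of premises leaves exactly one consistent assignment, so the answer is definite in both cases.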
ICE Score (Avg)
The combined "Reasoning Integrity" score. Penalizes models that are decisive but logically wrong.