Global Dataset Metrics
| Metric | Value |
|---|---|
| Total Problems | 1335 |
| Text Uniqueness | 100.0% |
| Signature Uniqueness | 99.25% |
| Duplicate Texts | 0 |
| Avg Formulas/Problem | 3.05 |
| Valid Answers | 100.0% |
| Unrealistic Values | 2 |
| Avg Code Length | 2752.8 |
| Diversity (TTR) | 5.94% |
| Unique Formulas | 101 |
| Unique Unknowns | 452 |
| Avg Word Count | 82.3 |
Global Visualizations
Chapter-wise Details
| Chapter | Size | Difficulty | Avg Formulas | Text Unique % | Diversity (TTR) |
|---|---|---|---|---|---|
| 10.Rigid Body Dynamics | 355 | Hard (n>3) | 3.38 | 100.0% | 8.91% |
| 2-4.Kinematics | 185 | Medium (n=2) | 2.88 | 100.0% | 13.38% |
| 5.Newton's Laws of Motion | 149 | Medium (n=2) | 2.85 | 100.0% | 12.27% |
| 6.Friction | 87 | Medium (n=2) | 2.7 | 100.0% | 13.19% |
| 7.Work, Power & Energy | 200 | Hard (n>3) | 3.02 | 100.0% | 11.74% |
| 8.Circular Motion | 178 | Hard (n>3) | 3.01 | 100.0% | 11.76% |
| 9.Centre of Mass | 181 | Medium (n=2) | 2.97 | 100.0% | 13.21% |

Detailed Metrics per Chapter
| Chapter | Problems | Valid Results | Unrealistic Values | Duplicate Texts | Sig Unique % | Avg Code Length | Unique Formulas | Unique Unknowns | Avg Word Count |
|---|---|---|---|---|---|---|---|---|---|
| 10.Rigid Body Dynamics | 355 | 100.0% | 0 | 0 | 100.0% | 2902.2 | 53 | 130 | 80.1 |
| 2-4.Kinematics | 185 | 100.0% | 0 | 0 | 100.0% | 2347.5 | 32 | 75 | 69.2 |
| 5.Newton's Laws of Motion | 149 | 100.0% | 1 | 0 | 100.0% | 2718.0 | 17 | 52 | 91.5 |
| 6.Friction | 87 | 100.0% | 0 | 0 | 100.0% | 2882.6 | 9 | 40 | 94.6 |
| 7.Work, Power & Energy | 200 | 100.0% | 1 | 0 | 100.0% | 2977.2 | 42 | 77 | 77.2 |
| 8.Circular Motion | 178 | 100.0% | 0 | 0 | 100.0% | 2284.8 | 25 | 68 | 74.0 |
| 9.Centre of Mass | 181 | 100.0% | 0 | 0 | 100.0% | 3052.7 | 46 | 114 | 100.3 |
Metric Explanations
Total Problems: Total number of physics problems in the dataset or chapter.
Text Uniqueness: Percentage of problems with unique wording computed as
(unique_texts / total_texts) × 100. Higher values indicate less repetition in problem statements.
Signature Uniqueness: Percentage of distinct problem signatures computed as
(unique_signatures / total_signatures) × 100, where each signature represents a unique combination of formulas and the unknown variable. Higher values indicate more diverse problem structures.
Duplicate Texts: Number of problems with non-unique wording, calculated as
total_texts - unique_texts. Lower is better for dataset diversity.
Avg Formulas/Problem: Mean number of physics formulas used per problem, computed as
mean(formula_counts_per_problem). Indicates problem complexity.
Valid Answers: Percentage of problems with non-null numerical results, computed as
(problems_with_results / total_problems) × 100. Should ideally be 100%.
Unrealistic Values: Count of numerical results that are either extremely large
(|value| > 10¹⁵) or extremely small (|value| < 10⁻¹⁵), which may indicate computational errors or physically implausible scenarios.
Avg Code Length: Mean character count of solution code snippets, computed as
mean(len(code)). Provides insight into solution complexity.
Diversity (Type-Token Ratio / TTR): Vocabulary richness measured as
(unique_words / total_words) × 100 across all problem texts. Higher values indicate more varied language and less repetitive wording.
Unique Formulas: Total count of distinct formula identifiers used across all problems, computed as
len(set(all_formula_ids)). Indicates breadth of physics concepts covered.
Unique Unknowns: Total count of distinct unknown variables being solved for, computed as
len(set(all_unknown_vars)). Shows variety in problem objectives.
Avg Word Count: Mean number of words per problem statement, computed as
mean(word_counts). Indicates average problem description length.
Difficulty Level: Categorical assessment based on average formulas per problem: Easy (< 2), Medium (2-3), or Hard (> 3).
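The definitions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code that produced this report: the record fields (`text`, `formula_ids`, `unknown_var`, `result`, `code`), the helper names, and the regex tokenizer are all hypothetical, as is the choice to sort formula IDs so that a signature does not depend on formula order. The sample records are invented purely for demonstration.

```python
import re
from statistics import mean


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; the report does not state
    # how words are split.
    return re.findall(r"[a-z0-9']+", text.lower())


def difficulty_level(avg_formulas):
    # Thresholds as described: Easy (< 2), Medium (2-3), Hard (> 3).
    if avg_formulas < 2:
        return "Easy"
    if avg_formulas <= 3:
        return "Medium"
    return "Hard"


def is_unrealistic(value, big=1e15, small=1e-15):
    # Taken literally, an exact zero also satisfies |value| < 1e-15.
    # Whether the original tool excludes zeros is not stated.
    return abs(value) > big or abs(value) < small


def compute_metrics(problems):
    texts = [p["text"] for p in problems]
    # Assumption: a signature is (sorted formula IDs, unknown variable).
    signatures = [(tuple(sorted(p["formula_ids"])), p["unknown_var"])
                  for p in problems]
    results = [p["result"] for p in problems if p["result"] is not None]
    tokens = [tok for t in texts for tok in tokenize(t)]
    avg_formulas = mean(len(p["formula_ids"]) for p in problems)
    return {
        "size": len(problems),
        "text_uniqueness_pct": 100 * len(set(texts)) / len(texts),
        "signature_uniqueness_pct": 100 * len(set(signatures)) / len(signatures),
        "duplicate_texts": len(texts) - len(set(texts)),
        "avg_formulas_per_prob": avg_formulas,
        "valid_results_pct": 100 * len(results) / len(problems),
        "unrealistic_values": sum(is_unrealistic(r) for r in results),
        "avg_char_length": mean(len(p["code"]) for p in problems),
        "type_token_ratio": 100 * len(set(tokens)) / len(tokens),
        "unique_formulas_used": len({f for p in problems for f in p["formula_ids"]}),
        "unique_unknowns_used": len({p["unknown_var"] for p in problems}),
        "avg_word_count": mean(len(tokenize(t)) for t in texts),
        "level": difficulty_level(avg_formulas),
    }


# Invented sample records (not taken from the dataset).
sample = [
    {"text": "A block slides down a ramp", "formula_ids": ["5_A", "5_B"],
     "unknown_var": "acceleration", "result": 2.5, "code": "a = 2.5"},
    {"text": "A car turns on a flat road", "formula_ids": ["8_C", "5_A", "5_B"],
     "unknown_var": "v", "result": 1e20, "code": "v = 1e20"},
    {"text": "A block slides down a ramp", "formula_ids": ["5_B", "5_A"],
     "unknown_var": "acceleration", "result": None, "code": "pass"},
]
m = compute_metrics(sample)
```

On this sample, the repeated statement counts as one duplicate text, the `1e20` result is flagged as unrealistic, and the missing result lowers the valid-answer percentage to two thirds.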
Raw Global Data
{
  "size": 1335,
  "numerical_quality": {
    "valid_results_pct": 100.0,
    "unrealistic_values": 2
  },
  "distinctness": {
    "text_uniqueness_pct": 100.0,
    "signature_uniqueness_pct": 99.25,
    "duplicate_texts": 0
  },
  "code_stats": {
    "avg_char_length": 2752.8
  },
  "difficulty": {
    "avg_formulas_per_prob": 3.05,
    "level": "Hard (n>3)"
  },
  "content_balance": {
    "unique_formulas_used": 101,
    "unique_unknowns_used": 452,
    "top_15_formulas": {
      "5_A": 467,
      "8_C": 180,
      "8_E": 167,
      "2_D": 148,
      "7_A": 146,
      "2_E": 143,
      "10_N": 126,
      "10_O": 121,
      "5_B": 103,
      "9_K": 96,
      "7_G": 92,
      "7_F": 85,
      "8_R": 84,
      "10_R": 80,
      "5_J": 80
    },
    "top_10_unknowns": {
      "acceleration": 33,
      "displacement": 27,
      "mass": 23,
      "normal_force": 22,
      "angular_acceleration": 21,
      "work_done": 21,
      "v": 20,
      "a": 20,
      "total_acceleration": 20,
      "average_speed": 18
    }
  },
  "generation_quality": {
    "type_token_ratio": 5.94,
    "avg_word_count": 82.3
  },
  "formula_analysis": {
    "formula_count_distribution": {
      "2": 261,
      "3": 814,
      "4": 198,
      "5": 60,
      "6": 2
    },
    "avg_code_length_by_formula_count": {
      "2": 2419.9,
      "3": 2635.0,
      "4": 3277.8,
      "5": 4011.5,
      "6": 4420.5
    }
  }
}
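As a consistency check, the headline averages can be recomputed from the `formula_analysis` block. The sketch below copies the two distributions from the raw data and derives the weighted means. The variable names are illustrative, and the per-bucket code lengths are themselves rounded, so the recomputed code length is expected to agree with the reported figure only approximately.

```python
# Values copied from "formula_analysis" in the raw data.
formula_count_distribution = {"2": 261, "3": 814, "4": 198, "5": 60, "6": 2}
avg_code_length_by_formula_count = {
    "2": 2419.9, "3": 2635.0, "4": 3277.8, "5": 4011.5, "6": 4420.5,
}

# Total number of problems across all formula-count buckets.
total = sum(formula_count_distribution.values())

# Weighted mean of formulas per problem.
avg_formulas = sum(int(k) * n for k, n in formula_count_distribution.items()) / total

# Weighted mean of code length, using the per-bucket averages.
avg_code_length = sum(
    avg_code_length_by_formula_count[k] * n
    for k, n in formula_count_distribution.items()
) / total

print(total, round(avg_formulas, 2), round(avg_code_length, 1))
```

The bucket counts sum to the reported `size` of 1335, and the weighted mean of formulas per problem rounds to the reported 3.05, which falls in the Hard (> 3) band.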