Appendix A: Supplementary Data
This appendix provides supplementary data for the educational evaluation in the BattleInTheSky project, including team structure, evaluation instruments, raw data summaries, calculation methods, and validation rationale, as referenced in Section 6 of the article.
A.1 Participants and Team Structure
The study involved 15 Mechatronics Engineering students organized into 8 teams (7 pairs and 1 solo participant).
| Team | Members |
|---|---|
| Team 1 | Student A, Student B |
| Team 2 | Student C, Student D |
| Team 3 | Student E, Student F |
| Team 4 | Student G, Student H |
| Team 5 | Student I, Student J |
| Team 6 | Student K, Student L |
| Team 7 | Student M, Student N |
| Team 8 | Student O (solo) |
Note: Student names are anonymized to protect privacy, in accordance with the ethical guidelines outlined in the article (Section 7.4).
A.2 Evaluation Instruments Overview
- Pre/Post-Test: Knowledge assessment on expert systems and embedded AI (20 items).
- Motivation Survey: 15 Likert-scale items across three subscales (Intrinsic Motivation, Perceived Competence, Autonomy).
- Usefulness Survey: 10 Likert-scale items + 2 open-ended questions.
- Game Artifact Analysis: Rubric evaluating logic, code clarity, functionality, and design.
A.3 Summary of Key Results
- Pre-Test Mean ± SD: 68.0 ± 4.8 (95% CI [65.4, 70.6])
- Post-Test Mean ± SD: 83.2 ± 4.2 (95% CI [80.9, 85.5])
- Average Learning Gain: 22.4% (relative gain: (83.2 − 68.0) / 68.0 ≈ 22.4%)
- Motivation Mean: 4.0 (95% CI [3.9, 4.1])
- Perceived Usefulness Mean: 4.05 (95% CI [3.97, 4.13])
- Artifact Quality Mean: 15.9/20 (95% CI [14.3, 17.5])
A.3.1 Raw Data Summaries
The following table summarizes team-averaged raw data for Pre/Post-Test, Motivation, Usefulness, and Artifact scores.
| Team | Pre-Test Mean (/100) | Post-Test Mean (/100) | Motivation Mean (1–5) | Usefulness Mean (1–5) | Artifact Score (/20) |
|---|---|---|---|---|---|
| Team 1 | 66.5 | 81.0 | 4.0 | 4.05 | 14 |
| Team 2 | 71.0 | 86.5 | 3.8 | 3.95 | 15 |
| Team 3 | 61.5 | 77.5 | 4.3 | 4.25 | 18 |
| Team 4 | 77.0 | 89.5 | 3.8 | 3.95 | 14 |
| Team 5 | 68.0 | 82.0 | 4.0 | 4.15 | 16 |
| Team 6 | 69.5 | 82.5 | 3.9 | 4.05 | 12 |
| Team 7 | 65.0 | 80.5 | 3.6 | 3.85 | 16 |
| Team 8 | 73.0 | 87.0 | 4.2 | 4.2 | 19 |
Note: Individual student scores are aggregated to team averages to protect privacy. Full raw data are available upon request via pablo_alcaraz@ucol.mx.
A.4 Calculation Methods
- Means, standard deviations (SD), and 95% confidence intervals (CI) were computed with standard formulas; SD uses the sample (n − 1) denominator, as in the worked example below.
- A paired t-test confirmed a significant pre-to-post learning gain (p < 0.001).
- Cohen's d effect size for the learning gain: 1.3 (large).
- Pearson correlation between Post-Test and Artifact Scores: r = 0.62 (p = 0.015).
Example Calculation (Pre-Test Mean): Individual scores (consistent with the team averages in A.3.1): 65, 68, 72, 70, 60, 63, 79, 75, 67, 69, 71, 68, 64, 66, 73. Mean = (65 + 68 + ... + 73) / 15 = 1030 / 15 ≈ 68.7. SD = √[Σ(score − mean)² / 14] ≈ 4.9.
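For transparency, the sketch below reproduces these calculations with NumPy/SciPy. It runs on the team-averaged data from A.3.1 for illustration; the reported figures were computed from individual student scores, so the printed values will differ. The t-based CI and the pooled-SD form of Cohen's d are assumptions about the conventions used, not confirmed by the article.

```python
import numpy as np
from scipy import stats

# Team-averaged data from Table A.3.1 (illustrative; the published
# statistics were computed on individual scores).
pre      = np.array([66.5, 71.0, 61.5, 77.0, 68.0, 69.5, 65.0, 73.0])
post     = np.array([81.0, 86.5, 77.5, 89.5, 82.0, 82.5, 80.5, 87.0])
artifact = np.array([14, 15, 18, 14, 16, 12, 16, 19])

def mean_ci(x, confidence=0.95):
    """Sample mean with a t-distribution confidence interval."""
    m, se = x.mean(), stats.sem(x)  # mean and standard error
    h = se * stats.t.ppf((1 + confidence) / 2, len(x) - 1)
    return m, (m - h, m + h)

print("pre :", mean_ci(pre))
print("post:", mean_ci(post))

# Paired t-test: the same teams are measured before and after.
t_stat, p_val = stats.ttest_rel(post, pre)
print(f"paired t = {t_stat:.2f}, p = {p_val:.2e}")

# Cohen's d with a pooled SD (one common convention for pre/post designs).
pooled_sd = np.sqrt((pre.var(ddof=1) + post.var(ddof=1)) / 2)
print(f"Cohen's d = {(post.mean() - pre.mean()) / pooled_sd:.2f}")

# Pearson correlation between post-test performance and artifact quality.
r, p_r = stats.pearsonr(post, artifact)
print(f"r = {r:.2f}, p = {p_r:.3f}")
```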
A.5 Validation Rationale
- Content Validity: Alignment with Bloom’s taxonomy for cognitive skills.
- Reliability: Standardized scoring and piloted rubric.
- Construct Validity: Instruments measure the intended constructs (learning, motivation, perceived usefulness).
- Statistical Rigor: Proper treatment of interval and ordinal data.
A.6 Full Instruments Used
Sample questions and structures for each instrument are provided below. Complete question sets for Pre/Post-Test, Motivation Survey, and Usefulness Survey are available in this PDF or upon request via pablo_alcaraz@ucol.mx.
A.6.1 Pre-Test and Post-Test (Individual)
Purpose: Measure students’ understanding of expert systems and embedded AI concepts.
Format: 20 items (10 multiple choice, 5 short answer, 5 problem-solving); 100 points total.
Key Topics: Expert system components, IF-THEN logic, embedded AI, ethical issues.
Example Questions:
- Which component contains domain knowledge? (Knowledge Base)
- Describe the role of the inference engine.
- Write an IF-THEN rule to control an LED based on temperature.
- Design a knowledge base for "raining → wet ground → plant growth" (a sample answer is sketched after this subsection).
- Explain how sensors and actuators can be combined in IF-THEN logic.
Scoring: Full points for complete and accurate answers; partial points for incomplete or partially correct answers.
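To make the expected depth of a problem-solving answer concrete, here is a sketch of what a full-credit response to the knowledge-base item might look like: a minimal forward-chaining inference engine over IF-THEN rules, mirroring the "raining → wet ground → plant growth" chain. It is an illustration, not part of the course materials.

```python
# Knowledge base: each rule is (set of conditions, conclusion).
RULES = [
    ({"raining"}, "wet_ground"),       # IF raining THEN wet ground
    ({"wet_ground"}, "plant_growth"),  # IF wet ground THEN plant growth
]

def forward_chain(facts):
    """Inference engine: apply rules until no new facts are derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(forward_chain({"raining"}))
# -> derives wet_ground, then plant_growth
```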
A.6.2 Motivation Survey (Individual)
Purpose: Assess intrinsic motivation, perceived competence, and autonomy related to the project.
Format: 15 Likert-scale items (1 = Strongly Disagree to 5 = Strongly Agree).
Subscales and Sample Items (a scoring sketch follows this list):
- Intrinsic Motivation: "I found the BattleInTheSky project enjoyable."
- Perceived Competence: "I feel confident in designing expert systems."
- Autonomy: "I had the freedom to design the game my way."
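As an illustration of how the three subscale means could be derived, the sketch below scores one hypothetical student's responses. The item-to-subscale mapping (five consecutive items per subscale) is an assumption; the appendix does not specify it.

```python
# One hypothetical student's 15 Likert responses (1-5).
responses = [4, 5, 4, 4, 5,  3, 4, 4, 3, 4,  5, 4, 4, 5, 4]

# Assumed mapping: items 1-5 Intrinsic Motivation, 6-10 Perceived
# Competence, 11-15 Autonomy.
subscales = {
    "Intrinsic Motivation": responses[0:5],
    "Perceived Competence": responses[5:10],
    "Autonomy":             responses[10:15],
}
for name, items in subscales.items():
    print(f"{name}: {sum(items) / len(items):.2f}")
```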
A.6.3 Usefulness Survey (Individual)
Purpose: Measure perceived usefulness of the project as a learning tool.
Format: 10 Likert-scale items + 2 open-ended questions.
Sample Items:
- "The BattleInTheSky project helped me understand the practical applications of expert systems."
- "The project increased my interest in AI."
- Open-ended: "What were the most helpful aspects?" and "Suggestions for improvement?"
A.6.4 Game Artifact Analysis (Team)
Purpose: Evaluate the final expert system game created by each team.
Format: Rubric with four criteria, each scored 1–5 points.
Criteria:
- Rule Logic: Correctness, completeness, efficiency.
- Code Clarity: Readability, organization, documentation.
- Functionality: Implementation of game mechanics.
- Design: Creativity and effectiveness.
Scoring Example: 5 = Excellent; 4 = Good; 3 = Fair; 2 = Poor; 1 = Very Poor. The four criterion scores are summed to the /20 artifact total reported in A.3 (aggregation sketched below).
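A minimal sketch of that aggregation, using a hypothetical per-criterion breakdown (the individual values are illustrative, not a real team's scores):

```python
# Four rubric criteria, each scored 1-5, summed to a /20 total.
rubric = {
    "rule_logic":    5,  # correctness, completeness, efficiency
    "code_clarity":  4,  # readability, organization, documentation
    "functionality": 5,  # implementation of game mechanics
    "design":        5,  # creativity and effectiveness
}
assert all(1 <= v <= 5 for v in rubric.values())
print(f"Artifact score: {sum(rubric.values())}/20")  # -> 19/20
```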