Here we provide supplementary data supporting the results for RQ1.


Requirements Quality Results:

Initial Submissions: The table below shows the average score for each requirement attribute for initial submissions (denoted by H) in processes A+ and A-.

Req. Quality Metric       A+     A-
Complete                  3.1    3.3
Consistent                3.5    3.3
Unambiguous               3.5    3.3
Focused                   3.4    3.3
Relevant                  3.2    3.1
Feasible                  4.8    5.0
Verifiable/Measurable     4.8    4.1
Correctly classified      4.7    4.2
Well-formatted            3.9    2.6
Total Quality (H)         77%    72%
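The Total Quality percentages appear to be the mean attribute score expressed as a fraction of the maximum score; a minimal sketch, assuming a 0-5 scale per attribute (the helper name `total_quality` is ours, not an artifact of the study):

```python
def total_quality(scores, max_score=5):
    """Mean attribute score expressed as a percentage of the maximum.

    Assumes each attribute is scored on a 0-5 scale; this scale and the
    rounding to a whole percentage are inferred from the reported totals.
    """
    return round(100 * sum(scores) / (len(scores) * max_score))

# A- column of the initial-submissions (H) table above:
h_a_minus = [3.3, 3.3, 3.3, 3.3, 3.1, 5.0, 4.1, 4.2, 2.6]
print(total_quality(h_a_minus))  # 72, matching the reported 72%
```

The same computation reproduces most of the other reported totals (e.g. 68% for ChatGPT's outputs in process B), which is what suggests this derivation.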

ChatGPT’s Outputs: The table below shows the average score for each requirement attribute for ChatGPT’s outputs (denoted by G) in processes A+, A-, and B.

Req. Quality Metric       A+     A-     B
Complete                  4.1    4.6    4.2
Consistent                2.6    1.7    3.2
Unambiguous               2.9    2.9    3.6
Focused                   3.9    3.7    3.6
Relevant                  3.2    3.0    2.8
Feasible                  2.7    2.3    2.3
Verifiable/Measurable     2.9    1.4    3.0
Correctly classified      4.6    4.43   4.5
Well-formatted            1.6    0.3    3.5
Total Quality (G)         63%    54%    68%

Final Submissions: The table below shows the average score for each requirement attribute for final submissions (denoted by F) in processes A+, A-, and B.

Req. Quality Metric       A+     A-     B
Complete                  4.7    4.7    4.6
Consistent                4.2    3.2    4.3
Unambiguous               3.5    3.1    3.3
Focused                   4.2    3.7    4.1
Relevant                  3.3    3.0    2.8
Feasible                  4.5    4.14   3.5
Verifiable/Measurable     4.5    3.8    3.6
Correctly classified      4.6    4.5    4.5
Well-formatted            3.6    3.5    3.8
Total Quality (F)         82%    75%    77%

Prompt Quality Results:

The table below shows the average score for each prompt quality metric in processes A+, A-, and B.

Prompt Quality Metric     A+     A-     B
Course Setup              1.1    0.6    1.9
Project Setup             2.8    1.0    3.0
Explicit Requests         3.8    3.3    3.9
Expected Content          3.0    2.3    4.4
Expected Format           1.9    0.9    3.3
Personas                  0.5    0.3    0.5
Examples                  0.2    0.0    0.3
Avg. “How to ask”         1.9    1.2    2.4
What to ask               4.0    3.6    3.9
Total Quality             30%    17%    38%
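The “Avg. ‘How to ask’” row appears to be the arithmetic mean of the seven metrics above it (Course Setup through Examples); a minimal sketch of that check, where the grouping follows the table but the variable names are ours:

```python
# "How to ask" metrics for process A+, copied from the table above.
how_to_ask_a_plus = {
    "Course Setup": 1.1, "Project Setup": 2.8, "Explicit Requests": 3.8,
    "Expected Content": 3.0, "Expected Format": 1.9,
    "Personas": 0.5, "Examples": 0.2,
}

# Mean of the seven metrics, rounded to one decimal as in the table.
avg_how_to_ask = sum(how_to_ask_a_plus.values()) / len(how_to_ask_a_plus)
print(round(avg_how_to_ask, 1))  # 1.9, matching the reported row for A+
```

The A- column reproduces in the same way (mean 1.2); the B column comes out at roughly 2.47, so the reported 2.4 suggests the row was truncated rather than rounded there.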

The raw data obtained in this study cannot be shared due to privacy concerns.