Usage¶
Assuming that you’ve followed the installation steps, you’re now ready to use this package.
The AI Essay Evaluator provides two main command-line interfaces:
Evaluator: Grade student essays using OpenAI models
Trainer: Generate, validate, and fine-tune custom grading models
Command Structure¶
uv run python -m ai_essay_evaluator <command> <subcommand> [OPTIONS]
Available commands:
evaluator - CLI for grading student responses
trainer - Generate, validate, merge, upload, and fine-tune JSONL files
Evaluator: Grading Student Essays¶
Quick Start with Project Folder¶
The simplest way to use the evaluator is with a project folder structure:
project_folder/
├── input.csv # Student responses
├── question.txt # The essay question/prompt
├── story/ # Folder containing story text files
│ ├── story1.txt
│ └── story2.txt
├── rubric/ # Folder containing rubric text files
│ ├── rubric1.txt
│ └── rubric2.txt
└── output/ # Results will be saved here (auto-created)
Basic command:
uv run python -m ai_essay_evaluator evaluator grader \
--project-folder ./project_folder \
--scoring-format extended \
--api-key YOUR_OPENAI_API_KEY
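Before running the basic command, it can help to confirm that the folder matches the layout above. The following is a minimal pre-flight sketch, not part of the package: the function name `missing_project_items` is hypothetical, and it assumes the evaluator looks for any `.csv` file at the top level and `.txt` files inside `story/` and `rubric/`. The tool's actual discovery rules may differ.

```python
from pathlib import Path


def missing_project_items(root: Path) -> list[str]:
    """Return a description of each expected item that is absent from root."""
    problems = []
    # Assumption: any top-level .csv file counts as the input file.
    if not any(root.glob("*.csv")):
        problems.append("no .csv file in project folder")
    if not (root / "question.txt").is_file():
        problems.append("question.txt is missing")
    # Assumption: story and rubric texts are plain .txt files.
    for sub in ("story", "rubric"):
        folder = root / sub
        if not (folder.is_dir() and any(folder.glob("*.txt"))):
            problems.append(f"{sub}/ is missing or has no .txt files")
    return problems


if __name__ == "__main__":
    for problem in missing_project_items(Path("./project_folder")):
        print("Problem:", problem)
```

An empty result means the layout looks complete. The output/ folder is not checked because the evaluator creates it.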
Input CSV Format¶
Your input CSV file must contain these required columns:
Local Student ID - Unique student identifier
Enrolled Grade Level - Student’s grade level
Tested Language - Language of the test (e.g., “English”, “Spanish”)
Student Constructed Response - The student’s essay text
Optional column:
Passes- Number of times to process each essay (for consistency checking)
Example CSV:
Local Student ID,Enrolled Grade Level,Tested Language,Student Constructed Response,Passes
12345,5,English,"The story demonstrates courage...",3
12346,5,English,"In the passage, the author shows...",3
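The required columns can be checked before spending API calls on a malformed file. This is a small sketch using only the standard library. The helper `missing_columns` is a hypothetical name and is not the evaluator's own validation code.

```python
import csv
import io

REQUIRED_COLUMNS = [
    "Local Student ID",
    "Enrolled Grade Level",
    "Tested Language",
    "Student Constructed Response",
]


def missing_columns(csv_text: str) -> list[str]:
    """Return the required column names absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    # Exact match: spelling and capitalization both matter.
    return [name for name in REQUIRED_COLUMNS if name not in header]


sample = (
    "Local Student ID,Enrolled Grade Level,Tested Language,"
    "Student Constructed Response,Passes\n"
    '12345,5,English,"The story demonstrates courage...",3\n'
)
print(missing_columns(sample))  # prints []
```

For a file on disk, pass the result of `Path(...).read_text(encoding="utf-8")` as `csv_text`.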
Scoring Formats¶
The tool supports three scoring formats, each with its own default fine-tuned model:
1. Extended Format (extended)¶
Provides detailed scoring across multiple dimensions:
Idea Development Score (0-4)
Idea Development Feedback (detailed comments)
Language Conventions Score (0-4)
Language Conventions Feedback (detailed comments)
Default model: ft:gpt-4o-mini-2024-07-18:securehst::B6YDFKyO
2. Item-Specific Format (item-specific)¶
Provides a single score and feedback:
Score (0-4)
Feedback (targeted comments)
Default model: ft:gpt-4o-mini-2024-07-18:securehst::B72LJHWZ
3. Short Format (short)¶
Provides concise scoring:
Score (0-4)
Feedback (brief comments)
Default model: ft:gpt-4o-mini-2024-07-18:securehst::B79Kzt5H
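The relationship between scoring format and model can be summarized as a lookup with an optional override. The sketch below only restates the defaults listed above. The function `resolve_model` is a hypothetical illustration of how `--ai-model` takes precedence, not the package's internal code.

```python
from typing import Optional

# Default fine-tuned models, copied from the format descriptions above.
DEFAULT_MODELS = {
    "extended": "ft:gpt-4o-mini-2024-07-18:securehst::B6YDFKyO",
    "item-specific": "ft:gpt-4o-mini-2024-07-18:securehst::B72LJHWZ",
    "short": "ft:gpt-4o-mini-2024-07-18:securehst::B79Kzt5H",
}


def resolve_model(scoring_format: str, ai_model: Optional[str] = None) -> str:
    """Pick the model for a run: an explicit override wins over the default."""
    if scoring_format not in DEFAULT_MODELS:
        valid = ", ".join(DEFAULT_MODELS)
        raise ValueError(f"unknown scoring format {scoring_format!r}; use one of: {valid}")
    return ai_model or DEFAULT_MODELS[scoring_format]
```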
Full Parameter Reference¶
Required Parameters¶
--scoring-format - Scoring format: extended, item-specific, or short (required)
--api-key - Your OpenAI API key (required)
Project Folder Mode (Simplified)¶
--project-folder - Path to folder containing all required files
Automatically discovers CSV file, story/, rubric/, and question.txt
Creates output/ folder for results
Manual Mode (Advanced)¶
When not using --project-folder, you must specify:
--input-file - Path to input CSV file
--export-folder - Where to save results
--export-file-name - Base name for output files
--story-folder - Folder containing story text files
--rubric-folder - Folder containing rubric text files
--question-file - Path to question text file
Optional Parameters¶
--openai-project - OpenAI project ID for organization
--ai-model - Override the default fine-tuned model
--log / --no-log - Enable/disable logging (default: enabled)
--cost-analysis / --no-cost-analysis - Track token usage and costs (default: enabled)
--passes - Number of times to process each essay (overrides CSV column)
--merge-results / --no-merge-results - Merge multiple pass results (default: enabled)
--show-progress / --no-show-progress - Display progress bar (default: enabled)
--calculate-totals / --no-calculate-totals - Calculate total scores (default: enabled)
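When the grader is driven from a script, the two modes map naturally onto one argument builder. The sketch below assembles the command line from the flags documented above and rejects a manual-mode call that omits any of the six manual options. The function `grader_argv` is a hypothetical helper, not something the package provides.

```python
from typing import Optional

# The six options that manual mode requires, as listed above.
MANUAL_OPTIONS = (
    "input-file",
    "export-folder",
    "export-file-name",
    "story-folder",
    "rubric-folder",
    "question-file",
)


def grader_argv(
    scoring_format: str,
    api_key: str,
    project_folder: Optional[str] = None,
    manual: Optional[dict] = None,
) -> list[str]:
    """Build the grader command as an argument list suitable for subprocess.run."""
    argv = [
        "uv", "run", "python", "-m", "ai_essay_evaluator",
        "evaluator", "grader",
        "--scoring-format", scoring_format,
        "--api-key", api_key,
    ]
    if project_folder is not None:
        # Project folder mode: the tool discovers the remaining inputs itself.
        return argv + ["--project-folder", project_folder]
    manual = manual or {}
    missing = [name for name in MANUAL_OPTIONS if name not in manual]
    if missing:
        flags = ", ".join(f"--{name}" for name in missing)
        raise ValueError(f"manual mode is missing: {flags}")
    for name in MANUAL_OPTIONS:
        argv += [f"--{name}", manual[name]]
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)`. Building a list rather than a shell string avoids quoting problems with paths that contain spaces.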
Output Files¶
The evaluator generates several output files in the export folder:
{filename}_pass_1.csv - Results from the first pass
{filename}_pass_2.csv - Results from subsequent passes (if --passes > 1)
{filename}_merged.csv - Merged results (if --merge-results enabled)
{filename}_cost_analysis.csv - Token usage and cost breakdown (if --cost-analysis enabled)
{filename}.log - Detailed processing log (if --log enabled)
Output CSV columns include:
All original input columns
Score columns (based on scoring format)
Feedback columns (based on scoring format)
Processing metadata (if enabled)
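The naming scheme above makes it possible to predict which files a run should leave behind, which is useful when checking a batch job. The sketch below rests on two assumptions not confirmed by this page: that pass N is written to `{filename}_pass_N.csv`, and that the merged file is written whenever merging is enabled. The function `expected_outputs` is a hypothetical name.

```python
def expected_outputs(
    filename: str,
    passes: int = 1,
    merge: bool = True,
    cost_analysis: bool = True,
    log: bool = True,
) -> list[str]:
    """List the output file names a run is expected to produce."""
    # Assumption: one CSV per pass, numbered from 1.
    names = [f"{filename}_pass_{n}.csv" for n in range(1, passes + 1)]
    if merge:
        names.append(f"{filename}_merged.csv")
    if cost_analysis:
        names.append(f"{filename}_cost_analysis.csv")
    if log:
        names.append(f"{filename}.log")
    return names


print(expected_outputs("final_grades", passes=2, log=False))
# prints ['final_grades_pass_1.csv', 'final_grades_pass_2.csv',
#         'final_grades_merged.csv', 'final_grades_cost_analysis.csv']
```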
Usage Examples¶
Example 1: Basic Evaluation with Project Folder¶
uv run python -m ai_essay_evaluator evaluator grader \
--project-folder ./my_essays \
--scoring-format extended \
--api-key sk-...
Example 2: Multiple Passes for Consistency¶
uv run python -m ai_essay_evaluator evaluator grader \
--project-folder ./my_essays \
--scoring-format item-specific \
--api-key sk-... \
--passes 3 \
--merge-results
Example 3: Manual Mode with Custom Settings¶
uv run python -m ai_essay_evaluator evaluator grader \
--input-file ./data/responses.csv \
--export-folder ./results \
--export-file-name final_grades \
--scoring-format short \
--story-folder ./data/stories \
--rubric-folder ./data/rubrics \
--question-file ./data/question.txt \
--api-key sk-... \
--no-show-progress
Example 4: Using Custom Fine-Tuned Model¶
uv run python -m ai_essay_evaluator evaluator grader \
--project-folder ./my_essays \
--scoring-format extended \
--api-key sk-... \
--ai-model ft:gpt-4o-mini-2024-07-18:org:YOUR_MODEL_ID \
--openai-project proj_...
Trainer: Fine-Tuning Custom Models¶
The trainer component helps you create and fine-tune custom grading models using your own data.
Workflow Overview¶
Generate JSONL training data from your graded examples
Validate the JSONL file format
Merge multiple JSONL files (optional)
Upload to OpenAI
Fine-tune a custom model
1. Generate Training Data¶
Create a JSONL training dataset from your existing graded essays.
Command:
uv run python -m ai_essay_evaluator trainer generate \
--story-folder ./training_data/story \
--question ./training_data/question.txt \
--rubric ./training_data/rubric.txt \
--csv ./training_data/graded_essays.csv \
--output training_dataset.jsonl \
--scoring-format extended
Parameters:
--story-folder - Folder containing story text files (required)
--question - Path to question text file (required)
--rubric - Path to rubric text file (required)
--csv - Path to CSV with graded examples (required)
--output - Output JSONL filename (default: fine_tuning.jsonl)
--scoring-format - Output format: extended, item-specific, or short (required)
Input CSV for training must include:
Local Student ID
Enrolled Grade Level
Tested Language
Student Constructed Response
Score columns (matching your chosen scoring format)
Feedback columns (matching your chosen scoring format)
For extended format:
Idea_Development_Score
Idea_Development_Feedback
Language_Conventions_Score
Language_Conventions_Feedback
For item-specific/short formats:
Score
Feedback
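Graded rows can be screened for missing or out-of-range values before generating JSONL. The sketch below uses the score and feedback column names listed above and the 0-4 score range from the scoring formats. It assumes scores are whole numbers, which this page does not state explicitly, and `row_problems` is a hypothetical helper name.

```python
SCORE_COLUMNS = {
    "extended": ["Idea_Development_Score", "Language_Conventions_Score"],
    "item-specific": ["Score"],
    "short": ["Score"],
}
FEEDBACK_COLUMNS = {
    "extended": ["Idea_Development_Feedback", "Language_Conventions_Feedback"],
    "item-specific": ["Feedback"],
    "short": ["Feedback"],
}
VALID_SCORES = {"0", "1", "2", "3", "4"}  # assumption: whole numbers only


def row_problems(row: dict, scoring_format: str) -> list[str]:
    """Check one graded CSV row (as produced by csv.DictReader)."""
    problems = []
    for column in SCORE_COLUMNS[scoring_format] + FEEDBACK_COLUMNS[scoring_format]:
        if not (row.get(column) or "").strip():
            problems.append(f"{column}: missing or empty")
    for column in SCORE_COLUMNS[scoring_format]:
        value = (row.get(column) or "").strip()
        if value and value not in VALID_SCORES:
            problems.append(f"{column}: expected 0-4, got {value!r}")
    return problems
```

Running this over every row of the graded CSV and printing the row number alongside each problem gives a quick report before `trainer generate` is invoked.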
2. Validate Training Data¶
Validate that your JSONL file is properly formatted for OpenAI fine-tuning.
Command:
uv run python -m ai_essay_evaluator trainer validate \
--file training_dataset.jsonl \
--scoring-format extended
Parameters:
--file - Path to JSONL file to validate (required)
--scoring-format - Scoring format used: extended, item-specific, or short (default: extended)
The validator checks:
JSON structure validity
Required fields presence
Message format compliance
Response structure matching scoring format
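The first three checks can be approximated with a few lines of standard-library Python. The sketch below assumes the chat-style fine-tuning layout in which each line is a JSON object with a `messages` list whose entries carry `role` and `content`. It is not the trainer's validator and does not cover the format-specific response check, since this page does not describe how responses are encoded.

```python
import json


def jsonl_problems(lines) -> list[str]:
    """Return one message per malformed line; an empty list means no issues found."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {number}: missing 'messages' list")
            continue
        for message in messages:
            if not isinstance(message, dict) or "role" not in message or "content" not in message:
                problems.append(f"line {number}: message lacks role or content")
                break
    return problems
```

To check a file, pass an open file handle: `jsonl_problems(open("training_dataset.jsonl", encoding="utf-8"))`.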
3. Merge Multiple Datasets¶
Combine multiple JSONL training files into a single dataset.
Command:
uv run python -m ai_essay_evaluator trainer merge \
--folder ./training_datasets \
--output merged_training.jsonl
Parameters:
--folder - Folder containing JSONL files to merge (required)
--output - Output merged JSONL filename (default: merged_fine_tuning.jsonl)
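Conceptually, merging JSONL files is line-wise concatenation. The sketch below shows that idea with the standard library. It is an illustration under that assumption, not the trainer's implementation, which may also deduplicate or validate records. The function name `merge_jsonl` is hypothetical.

```python
from pathlib import Path


def merge_jsonl(folder: Path, output: Path) -> int:
    """Concatenate every .jsonl file in folder into output; return the record count."""
    count = 0
    with output.open("w", encoding="utf-8") as out:
        for path in sorted(folder.glob("*.jsonl")):
            # Skip the output file itself if it lives in the same folder.
            if path.resolve() == output.resolve():
                continue
            with path.open(encoding="utf-8") as src:
                for line in src:
                    if line.strip():
                        # Normalize line endings so records never run together.
                        out.write(line.rstrip("\r\n") + "\n")
                        count += 1
    return count
```

Sorting the input paths makes the merged file reproducible across runs.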
4. Upload to OpenAI¶
Upload your validated JSONL file to OpenAI for fine-tuning.
Command:
uv run python -m ai_essay_evaluator trainer upload \
--file training_dataset.jsonl \
--api-key YOUR_OPENAI_API_KEY
Parameters:
--file - Path to JSONL file to upload (required)
--api-key - OpenAI API key (optional, can use environment variable)
Output: Returns a file ID (e.g., file-abc123...) needed for fine-tuning.
5. Start Fine-Tuning Job¶
Create a fine-tuning job with your uploaded dataset.
Option A: Upload and fine-tune in one step
uv run python -m ai_essay_evaluator trainer fine-tune \
--file training_dataset.jsonl \
--scoring-format extended \
--api-key YOUR_OPENAI_API_KEY
Option B: Use existing file ID
uv run python -m ai_essay_evaluator trainer fine-tune \
--file-id file-abc123... \
--api-key YOUR_OPENAI_API_KEY
Parameters:
--file - Path to JSONL file (validates, uploads, then fine-tunes)
--file-id - Existing OpenAI file ID (skips validation and upload)
--scoring-format - Required if using --file
--api-key - OpenAI API key (optional, can use environment variable)
Output: Returns a fine-tuning job ID (e.g., ftjob-xyz789...)
Complete Training Workflow Example¶
# Step 1: Generate JSONL from graded essays
uv run python -m ai_essay_evaluator trainer generate \
--story-folder ./data/stories \
--question ./data/question.txt \
--rubric ./data/rubric.txt \
--csv ./data/graded_samples.csv \
--output my_training_data.jsonl \
--scoring-format extended
# Step 2: Validate the generated file
uv run python -m ai_essay_evaluator trainer validate \
--file my_training_data.jsonl \
--scoring-format extended
# Step 3: Upload and start fine-tuning
uv run python -m ai_essay_evaluator trainer fine-tune \
--file my_training_data.jsonl \
--scoring-format extended \
--api-key sk-...
# Step 4: Monitor your fine-tuning job on OpenAI dashboard
# Once complete, use the model ID with the evaluator:
uv run python -m ai_essay_evaluator evaluator grader \
--project-folder ./new_essays \
--scoring-format extended \
--api-key sk-... \
--ai-model ft:gpt-4o-mini-2024-07-18:org:YOUR_NEW_MODEL_ID
Features¶
Cost Analysis¶
When enabled (default), the tool tracks:
Token usage (prompt tokens, completion tokens, total)
Estimated costs based on model pricing
Per-essay cost breakdown
Summary statistics
Output saved to {filename}_cost_analysis.csv
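The cost estimate is token counts multiplied by per-token prices. The sketch below shows that arithmetic with placeholder prices. The figures of 1.00 and 2.00 per million tokens are invented for illustration and are not actual OpenAI pricing, and the function `estimate_cost` is hypothetical rather than the tool's own calculation.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one request from its token usage."""
    input_cost = prompt_tokens / 1_000_000 * input_price_per_million
    output_cost = completion_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder prices; substitute the current rates for your model.
INPUT_PRICE = 1.00
OUTPUT_PRICE = 2.00

# (prompt tokens, completion tokens) for each essay in a small batch.
usage = [(1200, 300), (1500, 280)]
per_essay = [estimate_cost(p, c, INPUT_PRICE, OUTPUT_PRICE) for p, c in usage]
print(f"total: {sum(per_essay):.6f}, mean per essay: {sum(per_essay) / len(per_essay):.6f}")
```

With multiple passes, each essay is sent once per pass, so the total scales roughly linearly with `--passes`.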
Logging¶
Comprehensive async logging tracks:
Processing progress
API call success/failures
Error messages and retry attempts
Performance metrics
Output saved to {filename}.log
Rate Limiting¶
Built-in adaptive rate limiting:
Respects OpenAI API rate limits (5000 RPM, 4M TPM)
Automatic backoff on rate limit errors
Configurable retry logic with exponential backoff
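Exponential backoff doubles the wait after each rate-limited attempt, up to a cap. The following generic sketch shows the pattern with a small random jitter. The exception class `RateLimited`, the function `with_backoff`, and all default values are assumptions chosen for illustration and do not reflect the tool's actual retry settings.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for whatever error signals a rate limit in a real client."""


def with_backoff(call, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Run call(), retrying on RateLimited with exponentially growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # 1x, 2x, 4x, ... the base delay, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter spreads out retries from concurrent workers.
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the behavior can be exercised in tests without real waiting.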
Progress Tracking¶
Real-time progress bar shows:
Number of essays processed
Current processing speed
Estimated time remaining
Troubleshooting¶
Common Issues¶
1. “Missing required columns” error
Ensure your CSV has: Local Student ID, Enrolled Grade Level, Tested Language, Student Constructed Response
Check column names for exact spelling and capitalization
2. “No CSV input file found in project folder”
Ensure you have at least one .csv file in your project folder
Use --input-file to specify the file explicitly
3. Rate limit errors
The tool handles these automatically with retries
For large batches, processing may slow down temporarily
Consider using --passes 1 for faster initial runs
4. OpenAI API authentication errors
Verify your API key is correct
Check that your organization has access to the model
Use --openai-project if you have multiple projects
5. Fine-tuning validation errors
Ensure your graded CSV has all required score/feedback columns
Check that scoring format matches your data structure
Run trainer validate to get detailed error messages
Getting Help¶
For additional support:
Check the API Reference for module documentation
Review the Contributing Guide for development setup
Open an issue on GitHub