Usage¶

Assuming that you’ve followed the installations steps, you’re now ready to use this package.

The AI Essay Evaluator provides two main command-line interfaces:

Evaluator: Grade student essays using OpenAI models
Trainer: Generate, validate, and fine-tune custom grading models

Command Structure¶

uv run python -m ai_essay_evaluator <command> <subcommand> [OPTIONS]

Available commands:

evaluator - CLI for grading student responses
trainer - Generate, validate, merge, upload, and fine-tune JSONL files

Evaluator: Grading Student Essays¶

Quick Start with Project Folder¶

The simplest way to use the evaluator is with a project folder structure:

project_folder/
├── input.csv              # Student responses
├── question.txt           # The essay question/prompt
├── story/                 # Folder containing story text files
│   ├── story1.txt
│   └── story2.txt
├── rubric/                # Folder containing rubric text files
│   ├── rubric1.txt
│   └── rubric2.txt
└── output/                # Results will be saved here (auto-created)

Basic command:

uv run python -m ai_essay_evaluator evaluator grader \
  --project-folder ./project_folder \
  --scoring-format extended \
  --api-key YOUR_OPENAI_API_KEY

Input CSV Format¶

Your input CSV file must contain these required columns:

Local Student ID - Unique student identifier
Enrolled Grade Level - Student’s grade level
Tested Language - Language of the test (e.g., “English”, “Spanish”)
Student Constructed Response - The student’s essay text

Optional column:

Passes - Number of times to process each essay (for consistency checking)

Example CSV:

Local Student ID,Enrolled Grade Level,Tested Language,Student Constructed Response,Passes
12345,5,English,"The story demonstrates courage...",3
12346,5,English,"In the passage, the author shows...",3

Scoring Formats¶

The tool supports three scoring formats, each with different fine-tuned models:

1. Extended Format (`extended`)¶

Provides detailed scoring across multiple dimensions:

Idea Development Score (0-4)
Idea Development Feedback (detailed comments)
Language Conventions Score (0-4)
Language Conventions Feedback (detailed comments)

Default model: ft:gpt-4o-mini-2024-07-18:securehst::B6YDFKyO

2. Item-Specific Format (`item-specific`)¶

Provides a single score and feedback:

Score (0-4)
Feedback (targeted comments)

Default model: ft:gpt-4o-mini-2024-07-18:securehst::B72LJHWZ

3. Short Format (`short`)¶

Provides concise scoring:

Score (0-4)
Feedback (brief comments)

Default model: ft:gpt-4o-mini-2024-07-18:securehst::B79Kzt5H

Full Parameter Reference¶

Required Parameters¶

--scoring-format - Scoring format: extended, item-specific, or short (required)
--api-key - Your OpenAI API key (required)

Project Folder Mode (Simplified)¶

--project-folder - Path to folder containing all required files
- Automatically discovers CSV file, story/, rubric/, and question.txt
- Creates output/ folder for results

Manual Mode (Advanced)¶

When not using --project-folder, you must specify:

--input-file - Path to input CSV file
--export-folder - Where to save results
--export-file-name - Base name for output files
--story-folder - Folder containing story text files
--rubric-folder - Folder containing rubric text files
--question-file - Path to question text file

Optional Parameters¶

--openai-project - OpenAI project ID for organization
--ai-model - Override the default fine-tuned model
--log / --no-log - Enable/disable logging (default: enabled)
--cost-analysis / --no-cost-analysis - Track token usage and costs (default: enabled)
--passes - Number of times to process each essay (overrides CSV column)
--merge-results / --no-merge-results - Merge multiple pass results (default: enabled)
--show-progress / --no-show-progress - Display progress bar (default: enabled)
--calculate-totals / --no-calculate-totals - Calculate total scores (default: enabled)

Output Files¶

The evaluator generates several output files in the export folder:

{filename}_pass_1.csv - Results from the first pass
{filename}_pass_2.csv - Results from subsequent passes (if --passes > 1)
{filename}_merged.csv - Merged results (if --merge-results enabled)
{filename}_cost_analysis.csv - Token usage and cost breakdown (if --cost-analysis enabled)
{filename}.log - Detailed processing log (if --log enabled)

Output CSV columns include:

All original input columns
Score columns (based on scoring format)
Feedback columns (based on scoring format)
Processing metadata (if enabled)

Usage Examples¶

Example 1: Basic Evaluation with Project Folder¶

uv run python -m ai_essay_evaluator evaluator grader \
  --project-folder ./my_essays \
  --scoring-format extended \
  --api-key sk-...

Example 2: Multiple Passes for Consistency¶

uv run python -m ai_essay_evaluator evaluator grader \
  --project-folder ./my_essays \
  --scoring-format item-specific \
  --api-key sk-... \
  --passes 3 \
  --merge-results

Example 3: Manual Mode with Custom Settings¶

uv run python -m ai_essay_evaluator evaluator grader \
  --input-file ./data/responses.csv \
  --export-folder ./results \
  --export-file-name final_grades \
  --scoring-format short \
  --story-folder ./data/stories \
  --rubric-folder ./data/rubrics \
  --question-file ./data/question.txt \
  --api-key sk-... \
  --no-progress

Example 4: Using Custom Fine-Tuned Model¶

uv run python -m ai_essay_evaluator evaluator grader \
  --project-folder ./my_essays \
  --scoring-format extended \
  --api-key sk-... \
  --ai-model ft:gpt-4o-mini-2024-07-18:org:YOUR_MODEL_ID \
  --openai-project proj_...

Trainer: Fine-Tuning Custom Models¶

The trainer component helps you create and fine-tune custom grading models using your own data.

Workflow Overview¶

Generate JSONL training data from your graded examples
Validate the JSONL file format
Merge multiple JSONL files (optional)
Upload to OpenAI
Fine-tune a custom model

1. Generate Training Data¶

Create a JSONL training dataset from your existing graded essays.

Command:

uv run python -m ai_essay_evaluator trainer generate \
  --story-folder ./training_data/story \
  --question ./training_data/question.txt \
  --rubric ./training_data/rubric.txt \
  --csv ./training_data/graded_essays.csv \
  --output training_dataset.jsonl \
  --scoring-format extended

Parameters:

--story-folder - Folder containing story text files (required)
--question - Path to question text file (required)
--rubric - Path to rubric text file (required)
--csv - Path to CSV with graded examples (required)
--output - Output JSONL filename (default: fine_tuning.jsonl)
--scoring-format - Output format: extended, item-specific, or short (required)

Input CSV for training must include:

Local Student ID
Enrolled Grade Level
Tested Language
Student Constructed Response
Score columns (matching your chosen scoring format)
Feedback columns (matching your chosen scoring format)

For extended format:

Idea_Development_Score
Idea_Development_Feedback
Language_Conventions_Score
Language_Conventions_Feedback

For item-specific/short formats:

Score
Feedback

2. Validate Training Data¶

Validate that your JSONL file is properly formatted for OpenAI fine-tuning.

Command:

uv run python -m ai_essay_evaluator trainer validate \
  --file training_dataset.jsonl \
  --scoring-format extended

Parameters:

--file - Path to JSONL file to validate (required)
--scoring-format - Scoring format used: extended, item-specific, or short (default: extended)

The validator checks:

JSON structure validity
Required fields presence
Message format compliance
Response structure matching scoring format

3. Merge Multiple Datasets¶

Combine multiple JSONL training files into a single dataset.

Command:

uv run python -m ai_essay_evaluator trainer merge \
  --folder ./training_datasets \
  --output merged_training.jsonl

Parameters:

--folder - Folder containing JSONL files to merge (required)
--output - Output merged JSONL filename (default: merged_fine_tuning.jsonl)

4. Upload to OpenAI¶

Upload your validated JSONL file to OpenAI for fine-tuning.

Command:

uv run python -m ai_essay_evaluator trainer upload \
  --file training_dataset.jsonl \
  --api-key YOUR_OPENAI_API_KEY

Parameters:

--file - Path to JSONL file to upload (required)
--api-key - OpenAI API key (optional, can use environment variable)

Output: Returns a file ID (e.g., file-abc123...) needed for fine-tuning.

5. Start Fine-Tuning Job¶

Create a fine-tuning job with your uploaded dataset.

Option A: Upload and fine-tune in one step

uv run python -m ai_essay_evaluator trainer fine-tune \
  --file training_dataset.jsonl \
  --scoring-format extended \
  --api-key YOUR_OPENAI_API_KEY

Option B: Use existing file ID

uv run python -m ai_essay_evaluator trainer fine-tune \
  --file-id file-abc123... \
  --api-key YOUR_OPENAI_API_KEY

Parameters:

--file - Path to JSONL file (validates, uploads, then fine-tunes)
--file-id - Existing OpenAI file ID (skips validation and upload)
--scoring-format - Required if using --file
--api-key - OpenAI API key (optional, can use environment variable)

Output: Returns a fine-tuning job ID (e.g., ftjob-xyz789...)

Complete Training Workflow Example¶

# Step 1: Generate JSONL from graded essays
uv run python -m ai_essay_evaluator trainer generate \
  --story-folder ./data/stories \
  --question ./data/question.txt \
  --rubric ./data/rubric.txt \
  --csv ./data/graded_samples.csv \
  --output my_training_data.jsonl \
  --scoring-format extended

# Step 2: Validate the generated file
uv run python -m ai_essay_evaluator trainer validate \
  --file my_training_data.jsonl \
  --scoring-format extended

# Step 3: Upload and start fine-tuning
uv run python -m ai_essay_evaluator trainer fine-tune \
  --file my_training_data.jsonl \
  --scoring-format extended \
  --api-key sk-...

# Step 4: Monitor your fine-tuning job on OpenAI dashboard
# Once complete, use the model ID with the evaluator:
uv run python -m ai_essay_evaluator evaluator grader \
  --project-folder ./new_essays \
  --scoring-format extended \
  --api-key sk-... \
  --ai-model ft:gpt-4o-mini-2024-07-18:org:YOUR_NEW_MODEL_ID

Features¶

Cost Analysis¶

When enabled (default), the tool tracks:

Token usage (prompt tokens, completion tokens, total)
Estimated costs based on model pricing
Per-essay cost breakdown
Summary statistics

Output saved to {filename}_cost_analysis.csv

Logging¶

Comprehensive async logging tracks:

Processing progress
API call success/failures
Error messages and retry attempts
Performance metrics

Output saved to {filename}.log

Rate Limiting¶

Built-in adaptive rate limiting:

Respects OpenAI API rate limits (5000 RPM, 4M TPM)
Automatic backoff on rate limit errors
Configurable retry logic with exponential backoff

Progress Tracking¶

Real-time progress bar shows:

Number of essays processed
Current processing speed
Estimated time remaining

Troubleshooting¶

Common Issues¶

1. “Missing required columns” error

Ensure your CSV has: Local Student ID, Enrolled Grade Level, Tested Language, Student Constructed Response
Check column names for exact spelling and capitalization

2. “No CSV input file found in project folder”

Ensure you have at least one .csv file in your project folder
Use --input-file to specify the file explicitly

3. Rate limit errors

The tool handles these automatically with retries
For large batches, processing may slow down temporarily
Consider using --passes 1 for faster initial runs

4. OpenAI API authentication errors

Verify your API key is correct
Check that your organization has access to the model
Use --openai-project if you have multiple projects

5. Fine-tuning validation errors

Ensure your graded CSV has all required score/feedback columns
Check that scoring format matches your data structure
Run trainer validate to get detailed error messages

Getting Help¶

For additional support:

Check the API Reference for module documentation
Review the Contributing Guide for development setup
Open an issue on GitHub

Usage¶

Command Structure¶

Evaluator: Grading Student Essays¶

Quick Start with Project Folder¶

Input CSV Format¶

Scoring Formats¶

1. Extended Format (extended)¶

2. Item-Specific Format (item-specific)¶

3. Short Format (short)¶

Full Parameter Reference¶

Required Parameters¶

Project Folder Mode (Simplified)¶

Manual Mode (Advanced)¶

Optional Parameters¶

Output Files¶

Usage Examples¶

Example 1: Basic Evaluation with Project Folder¶

Example 2: Multiple Passes for Consistency¶

Example 3: Manual Mode with Custom Settings¶

Example 4: Using Custom Fine-Tuned Model¶

Trainer: Fine-Tuning Custom Models¶

Workflow Overview¶

1. Generate Training Data¶

2. Validate Training Data¶

3. Merge Multiple Datasets¶

4. Upload to OpenAI¶

5. Start Fine-Tuning Job¶

Complete Training Workflow Example¶

Features¶

Cost Analysis¶

Logging¶

Rate Limiting¶

Progress Tracking¶

Troubleshooting¶

Common Issues¶

Getting Help¶

1. Extended Format (`extended`)¶

2. Item-Specific Format (`item-specific`)¶

3. Short Format (`short`)¶