Score with Ocsai

Ocsai uses fine-tuned LLMs to score divergent thinking responses, achieving up to r = .81 correlation with human raters (Organisciak et al., 2023). Large files are processed in chunks with live progress.
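The chunked processing mentioned above can be sketched as a simple batching step. This is an illustrative sketch only, not Ocsai's actual implementation; the chunk size of 50 and the function name are assumptions:

```python
from typing import Iterator, List, Tuple

def chunked(rows: List[Tuple[str, str]], size: int = 50) -> Iterator[List[Tuple[str, str]]]:
    """Yield successive fixed-size batches of (prompt, response) rows.

    Hypothetical helper: batch size 50 is an assumption, not Ocsai's setting.
    """
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

# 120 rows scored in batches of 50 yields chunks of 50, 50, and 20,
# letting a UI report progress after each batch completes.
rows = [("uses for a brick", f"response {i}") for i in range(120)]
batches = list(chunked(rows))
print([len(b) for b in batches])  # -> [50, 50, 20]
```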



Input

Enter your prompt/response data, one pair per line, with a comma after the prompt.
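As a minimal sketch of the expected format, the snippet below builds input lines from (prompt, response) pairs; the prompts and responses are made-up examples:

```python
# Format (prompt, response) pairs into the one-pair-per-line,
# comma-after-the-prompt layout the input box expects.
pairs = [
    ("brick", "use it as a doorstop"),
    ("brick", "grind it into pigment"),
]
lines = [f"{prompt}, {response}" for prompt, response in pairs]
print("\n".join(lines))
# brick, use it as a doorstop
# brick, grind it into pigment
```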

Options

Model
Originality Scoring Method
Prompt Label Style
Language
Task Type
API Key (optional)
Elaboration Scoring Methods

Results

Originality ranges from 1 to 5, where 1 is minimally original and 5 is maximally original.



Cite

Organisciak, P., Acar, S., Dumas, D., & Berthiaume, K. (2023). Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models. Thinking Skills and Creativity, 49, 101356. https://doi.org/10.1016/j.tsc.2023.101356

Large Language Model Scoring Details

Models

  • ocsai-1.6: Update to the multi-lingual, multi-task 1.5 model, trained on GPT 4o instead of 3.5.
  • ocsai1-4o: GPT-4o-based model, trained with more data and supporting multiple tasks. Last update to the Ocsai 1 models (i.e. the original ones).
  • ocsai-chatgpt2: GPT-3.5-size chat-based model, trained with more data and supporting multiple tasks. Scoring is slower, with slightly better performance than ocsai-davinci3.
  • ocsai-davinci3: GPT-3 Davinci-size model, trained with the method from Organisciak et al. (2023) but with the additional tasks (uses, consequences, instances, complete the sentence) from Acar et al. (2023), and trained on more data.
  • ocsai-1.5: Beta version of new multi-lingual, multi-task model, trained on GPT 3.5.
  • ocsai-chatgpt: GPT-3.5-size chat-based model, trained with the same format and data as the original models. Scoring is slower, with slightly better performance than ocsai-davinci2. For more tasks and more training data, use ocsai-davinci3.
  • ocsai-babbage2: GPT-3 Babbage-size model from the paper, retrained with new model API. Deprecated, mainly because other models work better.
  • ocsai-davinci2: GPT-3 Davinci-size model from the paper, retrained with a new model API.

How does it work?

Ocsai uses supervised learning: models are fine-tuned on thousands of human-judged divergent thinking responses so they learn what originality looks like. Scores use a 1–5 scale, where 1 is very unoriginal, 5 is very original, and 3 is the median.
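The supervised setup above can be illustrated with a single training record. This is a hypothetical sketch of what one human-judged example might look like; the field names and values are assumptions, not Ocsai's actual training schema:

```python
# Hypothetical training record: a prompt/response pair labeled with a
# human originality rating on the 1-5 scale described above.
record = {
    "prompt": "Alternate uses for a brick",          # made-up example prompt
    "response": "Carve it into a chess piece",        # made-up example response
    "originality": 4.0,  # human-rated: 1 = very unoriginal, 5 = very original
}

# Fine-tuning on thousands of such records teaches the model to map
# new responses onto the same 1-5 scale.
assert 1.0 <= record["originality"] <= 5.0
```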

LLM-based scoring is a newer approach, and research into its strengths and edge cases is ongoing. Contact us with questions at peter.organisciak@du.edu.

Learn more about the research and methodology behind Ocsai