Score with Ocsai

Ocsai uses fine-tuned LLMs to score divergent thinking responses, achieving up to r = .81 correlation with human raters (Organisciak et al., 2023). Large files are processed in chunks with live progress.
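The chunk-and-progress behavior described above can be sketched as a simple generator. This is a minimal illustration under assumed defaults (`size=100` and the `score_fn` callback are inventions for this sketch, not Ocsai internals):

```python
def chunked(rows, size=100):
    """Yield successive chunks of `size` rows (size is an assumed default)."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def score_in_chunks(rows, score_fn, size=100):
    """Score rows chunk by chunk, reporting live progress as each chunk finishes."""
    scored = []
    for i, chunk in enumerate(chunked(rows, size), start=1):
        scored.extend(score_fn(chunk))
        done = min(i * size, len(rows))
        print(f"Scored {done}/{len(rows)} rows")
    return scored
```

Chunking keeps memory bounded for large files and lets progress be reported after every batch rather than only at the end.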


Input

Enter your prompt/response data, one pair per line, with a comma separating the prompt from the response
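Because a response can itself contain commas, splitting each line on the first comma only is the safe way to read this layout. The parser below is a sketch of the expected input format, not Ocsai's internal parser:

```python
def parse_lines(text):
    """Parse 'prompt,response' lines, splitting on the first comma only."""
    pairs = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        prompt, _, response = line.partition(",")
        pairs.append((prompt.strip(), response.strip()))
    return pairs

sample = """brick,use it as a doorstop
brick,grind it up, and make paint"""
print(parse_lines(sample))
```

Note that the second sample line parses as the prompt `brick` with the full response `grind it up, and make paint` intact, commas included.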

Options

Model

Tip: this model supports custom task types and languages — type your own into the dropdowns below.

Originality Scoring Method
Prompt Label Style
Language
Task Type
API Key (optional)
Elaboration Scoring Methods

Results

Originality ranges from 1 to 5, where 1 is minimally original and 5 is maximally original.

Ready. No rows queued.

No results yet.


Cite

Organisciak, P., Acar, S., Dumas, D., & Berthiaume, K. (2023). Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models. Thinking Skills and Creativity, 49, 101356. https://doi.org/10.1016/j.tsc.2023.101356

Large Language Model Scoring Details

Models

  • ocsai2: Cross-lingual originality scoring model, trained on all available data. Uses cluster-based deduplication to score semantically equivalent responses the same across languages. GPT-4.1-mini base.
  • ocsai2-paper: Ocsai 2 model as reported in the paper (train split only, GPT-4.1-mini). Available for replicability of published results.
  • ocsai2-paper-xs: Ocsai 2 model as reported in the paper (train split only, GPT-4.1-nano). Available for replicability of published results.
  • ocsai-1.6: Ocsai 1.5 multilingual model retrained on GPT-4o-mini base. Available for replicability of published results.
  • ocsai1-4o: Ocsai 1 English-focused model on GPT-4o-mini base. Available for replicability of published results.
  • ocsai-1.5: Beta version of a new multilingual, multi-task model, trained on a GPT-3.5 base.
  • ocsai-davinci3: GPT-3 Davinci-size model. Trained with the method from Organisciak et al. (2023), but with the additional tasks (uses, consequences, instances, complete the sentence) from Acar et al. (2023), and trained on more data.
  • ocsai-chatgpt2: GPT-3.5-size chat-based model, trained with more data and supporting multiple tasks. Scoring is slower, with slightly better performance than ocsai-davinci.
  • ocsai-chatgpt: GPT-3.5-size chat-based model, trained with the same format and data as the original models. Scoring is slower, with slightly better performance than ocsai-davinci2. For more tasks and more training data, use davinci-ocsai2.
  • ocsai-babbage2: GPT-3 Babbage-size model from the paper, retrained with new model API. Deprecated, mainly because other models work better.
  • ocsai-davinci2: GPT-3 Davinci-size model from the paper, retrained with a new model API.
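Programmatic access involves the same choices as the form above: a model, a language, a task type, and the prompt/response pairs. The snippet below only assembles a request payload; the endpoint URL and every field name (`model`, `language`, `task`, `items`, `prompt`, `response`) are illustrative assumptions for this sketch, not Ocsai's documented API schema.

```python
import json

# Hypothetical endpoint; consult the Ocsai site for the real API.
OCSAI_URL = "https://example.org/score"

def build_request(pairs, model="ocsai2", language="English", task="uses"):
    """Assemble a scoring request. All field names here are illustrative
    assumptions, not Ocsai's documented schema."""
    return {
        "model": model,          # e.g. "ocsai2", the cross-lingual model above
        "language": language,
        "task": task,            # e.g. "uses" for Alternate Uses responses
        "items": [{"prompt": p, "response": r} for p, r in pairs],
    }

payload = build_request([("brick", "use it as a doorstop")])
print(json.dumps(payload, indent=2))
```

Defaulting to the newest general-purpose model (`ocsai2`) while allowing an override mirrors the model list above, where older models are kept mainly for replicating published results.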

How does it work?

Ocsai uses supervised learning: models are fine-tuned on thousands of human-judged divergent thinking responses so they learn what originality looks like. Scores use a 1–5 scale, where 1 is very unoriginal, 5 is very original, and 3 is the median.
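Given the 1–5 scale with 3 as the median, a common post-processing step is to bucket scores into coarse labels for reporting. The cutoffs below are arbitrary choices for illustration, not thresholds recommended by Ocsai:

```python
def label_score(score):
    """Map a 1-5 originality score to a coarse label (cutoffs are illustrative)."""
    if not 1 <= score <= 5:
        raise ValueError("Ocsai originality scores fall in [1, 5]")
    if score < 2.5:
        return "low originality"
    if score <= 3.5:
        return "average originality"
    return "high originality"

print(label_score(4.2))  # high originality
```

Because 3 is the median rather than the midpoint of observed scores, any bucketing scheme should be validated against the distribution of your own sample before use in analysis.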

LLM-based scoring is a newer approach, and research into its strengths and edge cases is ongoing. Contact us with questions at peter.organisciak@du.edu.

Learn more about the research and methodology behind Ocsai