Score with Ocsai
Ocsai uses fine-tuned LLMs to score divergent thinking responses, achieving up to r = .81 correlation with human raters (Organisciak et al., 2023). Large files are processed in chunks with live progress.
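Chunked processing of a large file can be sketched as below. This is a minimal illustration, not Ocsai's actual implementation: the chunk size, the scoring callback, and the progress callback are all assumed names.

```python
def score_in_chunks(responses, score_fn, chunk_size=50, on_progress=None):
    """Score a large list of responses in fixed-size chunks.

    Illustrative sketch only: chunk_size, score_fn, and on_progress
    are assumptions, not Ocsai's real internals.
    """
    scores = []
    for start in range(0, len(responses), chunk_size):
        chunk = responses[start:start + chunk_size]
        scores.extend(score_fn(chunk))  # e.g. one model call per chunk
        if on_progress:
            done = min(start + chunk_size, len(responses))
            on_progress(done, len(responses))  # drives a live progress bar
    return scores
```

A stand-in scorer (here, one that returns 3.0 for everything) shows the flow: `score_in_chunks(items, lambda c: [3.0] * len(c), on_progress=print)`.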
Results
Originality ranges from 1 to 5, where 1 is minimally original and 5 is maximally original.
Cite
Organisciak, P., Acar, S., Dumas, D., & Berthiaume, K. (2023). Beyond semantic distance: Automated scoring of divergent thinking greatly improves with large language models. Thinking Skills and Creativity, 49, 101356. https://doi.org/10.1016/j.tsc.2023.101356
Large Language Model Scoring Details
Models
- ocsai-1.6: An update to the multilingual, multi-task 1.5 model, trained on GPT-4o instead of GPT-3.5.
- ocsai1-4o: GPT-4o-based model, trained with more data and supporting multiple tasks. Last update to the Ocsai 1 models (i.e. the original ones).
- ocsai-chatgpt2: GPT-3.5-size chat-based model, trained with more data and supporting multiple tasks. Scoring is slower, with slightly better performance than ocsai-davinci3.
- ocsai-davinci3: GPT-3 Davinci-size model, trained with the method from Organisciak et al. (2023) but with the additional tasks (uses, consequences, instances, complete the sentence) from Acar et al. (2023), and trained on more data.
- ocsai-1.5: Beta version of new multi-lingual, multi-task model, trained on GPT 3.5.
- ocsai-chatgpt: GPT-3.5-size chat-based model, trained with the same format and data as the original models. Scoring is slower, with slightly better performance than ocsai-davinci2. For more tasks and more training data, use ocsai-davinci3.
- ocsai-babbage2: GPT-3 Babbage-size model from the paper, retrained with the new model API. Deprecated, mainly because other models work better.
- ocsai-davinci2: GPT-3 Davinci-size model from the paper, retrained with a new model API.
How does it work?
Ocsai uses supervised learning: models are fine-tuned on thousands of human-judged divergent thinking responses so they learn what originality looks like. Scores use a 1–5 scale, where 1 is very unoriginal, 5 is very original, and 3 is the median.
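As an illustration of that supervised setup, one human-rated response could be turned into a chat-style fine-tuning record like the sketch below. The field names and prompt wording are hypothetical, for illustration only, and are not Ocsai's actual training format.

```python
import json

def to_finetune_record(task, prompt, response, human_score):
    """Convert one human-rated divergent-thinking response into a
    chat-format fine-tuning example (one JSONL line per record).
    Message wording and structure are hypothetical."""
    return {
        "messages": [
            {"role": "user",
             "content": f"Task: {task}\nPrompt: {prompt}\nResponse: {response}\n"
                        "Rate originality from 1 to 5."},
            {"role": "assistant", "content": str(human_score)},
        ]
    }

record = to_finetune_record("uses", "brick", "grind it into pigment", 4)
line = json.dumps(record)  # one line of a JSONL training file
```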
LLM-based scoring is a newer approach, and research into its strengths and edge cases is ongoing. Contact us with questions at peter.organisciak@du.edu.
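For programmatic use, a scoring request might be assembled as in this sketch. The endpoint URL and the field names (`model`, `items`, `prompt`, `response`) are assumptions made for illustration, not a documented API contract; check the live service documentation for the real interface.

```python
import json
from urllib import request

API_URL = "https://example.org/score"  # placeholder, not a real endpoint

def build_scoring_request(prompt, responses, model="ocsai-1.6"):
    """Assemble a JSON scoring request. All field names here are
    assumed for illustration only."""
    payload = {
        "model": model,
        "items": [{"prompt": prompt, "response": r} for r in responses],
    }
    return request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_scoring_request("brick", ["use as a doorstop", "grind into pigment"])
# req could then be sent with urllib.request.urlopen(req) -- not executed here
```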