SCORM Shenanigans - PART DEUX

KnowBe4 Commissioned

Because of my work described in Brain Games - SCORM/suspend_data and xAPI/state, I gained the attention of another company looking to divine some wisdom from data they had collected. It has been a number of years since I did this work, but I just got another email thanking me for my suspend data work and I thought this might be helpful to someone.

KnowBe4 had commissioned a SCORM module with a quiz, and the module was created in Articulate Storyline. It delivered the quiz as intended and stored the results in its internal suspend data storage, but those results were not accessible to anyone except Articulate. In other words, they paid for a quiz and could not read the scores. How Articulate manages to keep convincing people to buy their product escapes me.

Gathering SCORM Test Data

My process began by requesting a series of very specific test results. I needed known results so I could map them to the data structures present within the suspend data. The datasets requested were as follows:

All correct answers
All incorrect answers
1st half incorrect, 2nd half correct
Alternating correct/incorrect starting with correct
Alternating correct/incorrect starting with incorrect
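
A minimal sketch of what those five runs should look like once decoded, so each decoded result has something known to be compared against. The function name expectedPatterns() and the array keys are hypothetical, and how an odd question count is split into "halves" is an assumption (here the shorter half comes first).

```php
<?php
// Hypothetical helper: builds the expected grade pattern (1 = correct, 0 = incorrect)
// for each of the five requested test runs, for a quiz with $n questions.
function expectedPatterns(int $n): array {
    // Assumption: with an odd question count, the first "half" is the shorter one.
    $half = intdiv($n, 2);
    $patterns = [
        'all_correct'         => array_fill(0, $n, 1),
        'all_incorrect'       => array_fill(0, $n, 0),
        'half_split'          => [],
        'alternate_correct'   => [],
        'alternate_incorrect' => [],
    ];
    for ($i = 0; $i < $n; $i++) {
        $patterns['half_split'][]          = $i < $half ? 0 : 1;
        $patterns['alternate_correct'][]   = $i % 2 === 0 ? 1 : 0;
        $patterns['alternate_incorrect'][] = $i % 2 === 0 ? 0 : 1;
    }
    return $patterns;
}

// 19 questions, matching the quiz given to the employees.
$patterns = expectedPatterns(19);
echo implode('', $patterns['alternate_incorrect']), PHP_EOL;
```

Because every run has a known answer pattern, any position where the decoded data disagrees with the pattern points at either a decoding mistake or a quirk in how the module stored that question.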

From these datasets I was able to tease out the data structure, but I learned some valuable lessons along the way. Along with the quiz and the requested result datasets, I also requested a 65-question SCORM module so I could learn the base numbering system. That module, however, stored its data completely differently from the 19-question quiz they gave to their employees.

This Isn't Consistent - at All

How the data is stored depends directly on how the quiz is built and on the type of each question. I was able to write software to decode the data for THIS PARTICULAR quiz, but it won't work on any other quiz data...sorry. The two different SCORM quizzes taught me that each one uses its own delimiter. I present this code as inspiration, or possibly a starting point to solve YOUR version of this problem. Make no assumptions: what you think is a delimiter could be data, and what you think is data could be a delimiter.

One thing did become clear: the quiz data is stored separately from the progress data. Articulate stores it at the end.

After processing, this is an example of the SCORM quiz data extracted from a single test dataset:

Array
(
    [0] => g_default_Visited340034003400340034000000~20232103391Y34003400
    [1] => g_default_Visited0000021000~20132102olQ340034003400z7w0801k1
    [2] => g_default_Visited0000212000~26232102Uk~201340034003400q70020141
    [3] => g_default_Visited000021100~2s132102mc~2d1340034003400z7w0801k1
    [4] => g_default_Visited0000212000~26232102il~20134003400g600101
    [5] => g_default_Visited000021200~2j132102sf~2413400340034003400z7w0801k1
    [6] => g_default_Visited00000211000~27232103fW0~2013400q70020141
    [7] => g_default_Visited0000212000~28232103uj1~20134003400q70020141
    [8] => g_default_Visited000021200~2r132102vq~2c1340034003400z7w0801k1
    [9] => g_default_Visited000021200~2g132103HF0~2013400340034003400z7w0801k1
    [10] => g_default_Visited0000212000~2W132103Q31S34003400g600101
    [11] => g_default_Visited000021100~27232103ZY3~20134003400q70020141
    [12] => g_default_Visited00000210000~27232103Vw1~2013400g600101
    [13] => g_default_Visited000021100~24132102noU34003400z7w0801k1
    [14] => g_default_Visited0000211000~27232103WF1~20134003400q70020141
    [15] => g_default_Visited000021200~28132102DmY3400340034003400z7w0801k1
    [16] => g_default_Visited00000211000~2c132103eA0$340034003400340034003400340034003400q70020141
    [17] => g_default_Visited0000~26232102$g~20134003400q70020141
    [18] => g_default_Visited000021100~2f132102ii~201340034003400z7w0801k1
    [19] => g_default_Visited00000210000~2b332103XE0~242340034003400g600101
    [20] => g_default_Visited0000021000~2j132102Ms~241340034003400z7w0801k1
)
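
A minimal sketch of how entries like these can be pulled out of a longer string. It assumes `^` separates entries in this particular quiz, which is what the lookahead in the class further down relies on. The three entries are copied from the output above; the leading `progress_data_placeholder` is invented filler standing in for the slide progress data that comes before the quiz.

```php
<?php
// Three entries copied from the sample output, joined with '^'.
// Assumption: '^' is the delimiter for THIS quiz; another module may use something else.
// 'progress_data_placeholder' is made-up filler, not real suspend data.
$blob = 'progress_data_placeholder^'
      . 'g_default_Visited0000021000~20132102olQ340034003400z7w0801k1^'
      . 'g_default_Visited0000212000~26232102Uk~201340034003400q70020141^'
      . 'g_default_Visited000021100~2s132102mc~2d1340034003400z7w0801k1';

// Lazy match from each marker up to (but not including) the next '^' or the end of the string.
preg_match_all('/(?<data>g_default_Visited0000.*?(?=\^|$))/', $blob, $matches);
$entries = $matches['data'];

print_r($entries);
```

The lookahead keeps the delimiter out of the captured entry, so each element of $entries is exactly one question's worth of data.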

An Amusing Anomaly

(That had me scratching my head for hours)

Array items 6&7 (questions 7&8) were anomalous in the dataset. My theory is that they were originally in the opposite order and were swapped in Storyline, but the data was still stored in the original order. So if you look at the ->grade property of the class for this particular dataset, items 6&7 break the alternating correct/incorrect pattern. Something happened to the module to transpose those two answers, and it was very confusing until I noticed that it occurred identically in both of the alternating datasets.

Array
(
    [0] => 0
    [1] => 1
    [2] => 0
    [3] => 1
    [4] => 0
    [5] => 1
    [6] => 1
    [7] => 0
    [8] => 0
    [9] => 1
    [10] => 0
    [11] => 1
    [12] => 0
    [13] => 1
    [14] => 0
    [15] => 1
    [16] => 0
    [17] => 1
    [18] => 0
)
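
One way to spot a swap like this is to compare the decoded grades against the pattern the test run was supposed to produce. A minimal sketch, using the grade array above and assuming it came from the "alternating, starting with incorrect" run; the variable names are illustrative only.

```php
<?php
// Grades decoded from the alternating run (copied from the array above).
$decoded = [0,1,0,1,0,1,1,0,0,1,0,1,0,1,0,1,0,1,0];

// What that run should look like if storage order matched display order.
$expected = [];
foreach (array_keys($decoded) as $i) {
    $expected[] = $i % 2; // 0, 1, 0, 1, ...
}

// array_diff_assoc keeps the entries of $decoded whose value differs at the same index.
$mismatched = array_keys(array_diff_assoc($decoded, $expected));

echo implode(',', $mismatched), PHP_EOL; // prints 6,7
```

Two adjacent mismatches with opposite values are the signature of a transposed pair, as opposed to a single question being decoded incorrectly.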

My Apologies

I really wanted to create a class that answered the big question, but sadly I could not. It turns out that the data structures change module-to-module. Figuring out the general problem would take a long time, and that's not what I was hired to do. I would need many different examples of every kind of quiz question and combinations of each. Maybe someone would like to pay me to completely ruin Articulate's storage algorithm.

The Class

KnowBe4 was interested in one thing - getting the scores. They deployed their SCORM quiz, made every employee take it - and then were forced to sit on unusable data until they contacted me to request assistance. In case you're wondering - I love this kind of stuff. This was a great puzzle to solve, and it took me the better part of a week to do it.

I made the properties public for debugging. The only public method besides the constructor is score(), which returns the fraction of questions answered correctly. If only Articulate would be the good guys and release a spec on their encoder/decoder, this wouldn't be necessary. Everyone else stores SCORM data in easily accessible formats.

<?php

/* KnowBe4 iterativeParser Class
 * author: Michael Richey
 * license: GPLv3
 */

class iterativeParser {
    public $raw = '';
    public $lines = array();
    public $grade = array();
    private $regex = array(
        // q1: identifies the quiz start marker (the first quiz entry)
        'q1'=>'/(?<data>g_default_Visited3[340]+~(.*?)(3210|1541)(.*?)(\^8_default(.*?)){0,1}(?=(\^|r7|q7)))/',
        // qa: captures each following question entry, up to the next ^ or the end of the string
        'qa'=>'/(?<data>g_default_Visited0000(.*?)(?=(\^|$)))/',
    );

    public function __construct($string) {
        // save the string for later processing
        $this->raw = $string;
        // first, grab all of the relevant data items
        $this->capture();
        // iterate through them to determine status
        $this->crawl();
    }

    private function capture() {
        $matches = array();
        // grab the quiz start marker so we can ignore the slides/questions preceding the quiz
        // PREG_OFFSET_CAPTURE gives us the starting character position of the quiz
        if(!preg_match($this->regex['q1'],$this->raw,$matches,PREG_OFFSET_CAPTURE)) {
            // without the marker there is nothing to decode, and $matches['data'] would not exist
            throw new RuntimeException('Quiz start marker not found in suspend data');
        }
        $this->lines[] = $matches['data'][0];

        $offset = $matches['data'][1];

        // start capturing at $offset to prevent capturing the opening slides and video data
        preg_match_all($this->regex['qa'],$this->raw,$matches,0,$offset);
        // append the found data to the lines property
        $this->lines = array_merge($this->lines,$matches['data']);
    }

    private function crawl() {
        $first = false;
        foreach($this->lines as $key=>$line) {
            // question 16 has a 2nd page that always displays and records data, so we skip it
            if($key == 17) continue;
            switch($key) {
                case 0:
                    // question 1 records incorrect answers on an entry that does not exist when answered correctly
                    $grade = (int)preg_match('/8_default/',$line);
                    if($grade) {
                        $this->grade[] = $grade;
                        $first = true;
                    }
                    break;
                case 1:
                    // only grade this entry if case 0 did not already record question 1
                    if(!$first) {
                        $this->grade[] = $this->zTest($line);
                    }
                    break;
                case 11:
                    // this entry is graded on its own marker instead of z7
                    $this->grade[] = (int)!preg_match('/(1|2)00~/',$line);
                    break;
                default:
                    // if the question line contains z7 it's incorrect, so we send the inverse integer of true/false to the grade array
                    $this->grade[] = $this->zTest($line);
                    break;
            }
        }
    }

    private function zTest($line) {
        // 1 when the line does NOT contain z7 (correct), 0 when it does (incorrect)
        return (int)!preg_match('/(z7)/',$line);
    }

    public function score() {
        // fraction of questions answered correctly (0 to 1); 0 if nothing was graded
        if(count($this->grade) === 0) return 0;
        return array_sum($this->grade)/count($this->grade);
    }
}
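
To see how the grading rules line up with the sample data shown earlier, here is a condensed, standalone restatement of the rules in crawl(), applied to the 21 entries from the example output. It is only a sketch for following along, not a replacement for the class, and it skips the capture step entirely by starting from the already extracted entries.

```php
<?php
// The 21 entries from the sample output; single quotes keep the '$' characters literal.
$lines = [
    'g_default_Visited340034003400340034000000~20232103391Y34003400',
    'g_default_Visited0000021000~20132102olQ340034003400z7w0801k1',
    'g_default_Visited0000212000~26232102Uk~201340034003400q70020141',
    'g_default_Visited000021100~2s132102mc~2d1340034003400z7w0801k1',
    'g_default_Visited0000212000~26232102il~20134003400g600101',
    'g_default_Visited000021200~2j132102sf~2413400340034003400z7w0801k1',
    'g_default_Visited00000211000~27232103fW0~2013400q70020141',
    'g_default_Visited0000212000~28232103uj1~20134003400q70020141',
    'g_default_Visited000021200~2r132102vq~2c1340034003400z7w0801k1',
    'g_default_Visited000021200~2g132103HF0~2013400340034003400z7w0801k1',
    'g_default_Visited0000212000~2W132103Q31S34003400g600101',
    'g_default_Visited000021100~27232103ZY3~20134003400q70020141',
    'g_default_Visited00000210000~27232103Vw1~2013400g600101',
    'g_default_Visited000021100~24132102noU34003400z7w0801k1',
    'g_default_Visited0000211000~27232103WF1~20134003400q70020141',
    'g_default_Visited000021200~28132102DmY3400340034003400z7w0801k1',
    'g_default_Visited00000211000~2c132103eA0$340034003400340034003400340034003400q70020141',
    'g_default_Visited0000~26232102$g~20134003400q70020141',
    'g_default_Visited000021100~2f132102ii~201340034003400z7w0801k1',
    'g_default_Visited00000210000~2b332103XE0~242340034003400g600101',
    'g_default_Visited0000021000~2j132102Ms~241340034003400z7w0801k1',
];

$grade = [];
$first = false;
foreach ($lines as $key => $line) {
    if ($key === 17) {
        continue; // second page of question 16, always recorded, never graded
    }
    if ($key === 0) {
        // quiz start marker: only contributes a grade when the 8_default entry is present
        if (preg_match('/8_default/', $line)) {
            $grade[] = 1;
            $first = true;
        }
    } elseif ($key === 1) {
        if (!$first) {
            $grade[] = (int)!preg_match('/z7/', $line);
        }
    } elseif ($key === 11) {
        $grade[] = (int)!preg_match('/(1|2)00~/', $line);
    } else {
        $grade[] = (int)!preg_match('/z7/', $line); // z7 marks an incorrect answer
    }
}

echo implode('', $grade), PHP_EOL;                                   // prints 0101011001010101010
printf("%d of %d correct\n", array_sum($grade), count($grade));      // prints 9 of 19 correct
```

The printed string matches the grade array shown in the anomaly section, including the swapped pair at positions 6 and 7.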

Development