Forum: Author Hangout

Some Interesting Statistics / Observations

DF 🚫

As a coding exercise, I looked at various relationships in the data from my stories (n=28). I used standard readability measures such as Flesch Reading Ease (FRE) and Flesch Kincaid Grade Level (FKGL). I also looked at the story's score (from SOL), the number of words in the story's description, the number of "dirty words" (a brain dump of ~55 sexual or curse words), and the count of poly- and mono-syllable words.

The ease-to-read factors indicate that the easier a text is to read the higher the score. Note: The higher the FRE the easier the text. The higher the FKGL the more difficult the text (so a negative correlation).
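For anyone who wants to see why the two measures point in opposite directions, here is a minimal sketch of the standard published formulas. The function names and the sample counts are made up for illustration and are not from the script used for the numbers in this thread; counting syllables is the hard part and is left out here.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: higher means harder to read."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for one passage (not taken from any story in the thread).
words, sentences, syllables = 100, 5, 150
fre = flesch_reading_ease(words, sentences, syllables)
fkgl = flesch_kincaid_grade(words, sentences, syllables)
print(round(fre, 2), round(fkgl, 2))
```

Both formulas use the same two ratios (words per sentence and syllables per word). Longer sentences and longer words push FRE down and FKGL up, which is why the two correlations with score have opposite signs.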

FRE | score : correlation = 0.29, p>0.05, not significant (p=0.14)
FKGL | score : correlation = -0.17, p>0.05, not significant (p=0.38)
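A rough sketch of how a correlation and a p-value like the two above can be computed with nothing but the standard library. This assumes a Pearson correlation, and it swaps in a permutation test for the p-value, which may not be the test actually used here. The data lists are invented placeholders, not the real story data.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, trials=10_000, seed=0):
    """Two-sided p-value: share of shuffles whose |r| is at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any real pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never exactly 0.
    return (hits + 1) / (trials + 1)


# Invented placeholder data: one readability value and one score per story.
fre_values = [62.0, 70.5, 68.1, 75.3, 59.8, 80.2, 66.4, 72.9]
story_scores = [6.8, 7.1, 6.5, 7.4, 6.9, 7.6, 7.0, 6.7]
print(pearson_r(fre_values, story_scores))
print(permutation_p_value(fre_values, story_scores))
```

The idea behind the shuffle: if score had nothing to do with readability, any pairing of the two lists would be as likely as the real one, so the p-value is just how often a random pairing looks at least as correlated as the real data.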

The number of "dirty words" that appear in the copy has a positive correlation with score.

dirty words | score : correlation = 0.11, p>0.05, not significant (p=0.56)

REP questioned the relationship between a story's score and the number of words in the description so I added that variable to the mix. It turns out there is a strong positive correlation.

number words in desc | score : correlation = 0.40, p

Replies: Switch Blayde DF

Switch Blayde 🚫

@DF

The ease-to-read factors indicate that the easier a text is to read the higher the score.

I would say that is true for reading comprehension (as in showing vs telling) in addition to things like vocabulary and sentence structure.

Of course the why is unknown. It may be reading level. It may simply be English not being the reader's primary language.

Statistics can create false truths. I got drunk drinking scotch and water. I also got drunk drinking bourbon and water. And also vodka and water. Scotch was only involved 33% of the time. Bourbon only 33% of the time. Vodka, 33%. But water was involved 100% of the time so, statistically, it must have been the water (that was tongue-in-cheek).

For those of us who are not statisticians, can you give your findings in layman's English?

Replies: DF

DF 🚫

@Switch Blayde

In general, the easier a text is to read (based off the reading ease scores), the higher the SOL score. This is not a significant relationship so we cannot infer too much on this one. However, it is consistent with early readability research. Magazine and newspaper editors started to use these formulae to measure the ease of reading their content. They found that when the content was easier to read, they sold more product.

More interesting is the correlation between the number of words in the description and the score. For my stories, this is a strong positive correlation -- the longer a story's description the higher the score. I think this is being driven (mostly) by a couple of very short descriptions that are also my lowest-rated stories.
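To show what "driven by a couple of points" can look like, here is a toy sketch. The numbers are invented for illustration only (they are not my stories), and the 10-word cutoff is an arbitrary assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented data: two very short descriptions that also have the lowest scores.
desc_words = [3, 4, 18, 20, 22, 25, 27, 30]
scores = [5.1, 5.4, 7.3, 7.0, 7.4, 7.1, 7.2, 7.0]

r_all = pearson_r(desc_words, scores)

# Drop the very short descriptions (arbitrary cutoff) and recompute.
kept = [(w, s) for w, s in zip(desc_words, scores) if w >= 10]
r_trimmed = pearson_r([w for w, _ in kept], [s for _, s in kept])

print(f"all stories: r = {r_all:.2f}")
print(f"without the short descriptions: r = {r_trimmed:.2f}")
```

In this made-up set, the strong positive correlation comes almost entirely from the two short, low-scoring entries. Rerunning a correlation with and without suspected outliers is a quick way to check how much they are steering the result.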

If I get motivated, then I may try and do an analysis with a larger sample size and across multiple authors to see if these patterns hold or something new emerges.

Hope that helps ... a little maybe?

Replies: Dominions Son Switch Blayde

Dominions Son 🚫

@DF

More interesting is the correlation between the number of words in the description and the score. For my stories, this is a strong positive correlation

You left off the p value so we can't judge how significant this is. The p value is the probability of seeing a correlation at least this strong by random chance if there were no real relationship. No matter how strong the correlation, if it lacks significance, it's meaningless.

However, I will say that I think there's a plausible causal mechanism for a relationship between length of description and score.

Specifically, a longer description is more likely to attract the intended audience and less likely to attract people outside the intended audience.

Replies: DF

DF 🚫

@Dominions Son

You left off the p value so we can't judge how significant this is.

No I didn't. Go back to the first post. All the p-values are reported. The third post is a follow-up to Switch Blayde to describe it in more layman's terms.

Switch Blayde 🚫

@DF

Hope that helps ... a little maybe?

Yeah, thanks.

Maybe the author who spends the effort writing the description also spends the effort writing the story. Maybe that's the correlation.

Longer stories seem to get better scores. Maybe longer stories have longer descriptions.

Or maybe it's what DS said.

DF 🚫

@DF

number words in desc | score : correlation = 0.40, p

Ahh, sorry DS - I have p-values on everything but this one! Don't know how/why this one got dropped.

number of words in desc | score: correlation = 0.40, p

Replies: DF

DF 🚫

@DF

OK - There is a bug here ... I definitely just entered the p-value in my follow-up post which means I probably had it in the original, too. The system is automatically dropping the last line.

The p-value for number of words | score is less than 0.05 (I think it is ~0.03 but that is from memory so not 100% sure).
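As a sanity check on that recollection, here is a small sketch. It assumes the correlations are Pearson r values tested with the usual two-sided t-test, and it uses n=28 from the first post. The function name is just for illustration.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson r differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))


N_STORIES = 28      # sample size from the first post
T_CRITICAL = 2.056  # two-sided 5% critical value of t for 26 degrees of freedom

reported = [
    ("FRE", 0.29),
    ("FKGL", -0.17),
    ("dirty words", 0.11),
    ("words in desc", 0.40),
]
for label, r in reported:
    t = t_statistic(r, N_STORIES)
    print(f"{label}: t = {t:.2f}, significant at 5%: {abs(t) > T_CRITICAL}")
```

Under those assumptions only the description-length correlation clears the 5% threshold, which fits with a p-value below 0.05 for that one and not for the others.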

One more line to make sure if my post is cropped nothing important is lost.

Replies: awnlee jawking

awnlee jawking 🚫

@DF

One more line to make sure if my post is cropped nothing important is lost.

You may be a victim of SOL's tendency to assume a less than sign is an opening angle bracket.

AJ

Replies: helmut_meukel

helmut_meukel 🚫

@awnlee jawking

You may be a victim of SOL's tendency to assume a less than sign is an opening angle bracket.

To avoid this, enclose the < in blanks.

HM.
