Help — The Bookalyser

General questions

What is The Bookalyser?

The Bookalyser is a text analysis tool which combs through your text – which might be an entire book (fiction or non-fiction, up to about 200,000 words), a report or any other kind of long-form document – and provides information about the words used in it, including helping to identify errors and inconsistencies. It is not a grammar or spelling checker – Microsoft Word and many other tools do those adequately. It does things that those grammar and spelling checkers don’t or can’t do!

Who is this for?

Anyone working with long-form text can benefit from The Bookalyser. It contains numerous reports which authors will find useful for improving or streamlining their writing; it is of great use to copy editors for ironing out inconsistencies of usage and identifying the text’s house style; for fiction, there are features currently in development to help structural editors get an overview of a narrative; and publishers will benefit from all of the above. No editor is perfect, and anyone can miss details – this software helps find many of those instances.

Will this tool mean I don’t need an editor?

The short answer is ‘no’! Any text will be improved by a human editor, and this tool can’t replace that – but what it can do is help polish a text before it goes to an editor (which can save self-publishing authors or publishers money) and spot things that even the best of editors might have missed (which therefore helps editors too). It was created by a copy editor (available for hire!), who uses it in his own work regularly, so it has been developed from a wealth of experience in the world of publishing.

What are the different packages? How much does this cost?

You can try a cut-down version of The Bookalyser for free. But the paid versions don’t cost much! It was created to help people and to pay for its own future development, not to fleece anyone. There are no ongoing subscription costs – what you pay is a one-off fee, and you will benefit from any new features added in future.

You can find out more details, and pricing, here – for many users, the Author/Editor package will be sufficient; the Publisher option stores all of your reports for future reference (you might want to run a report before and after a new draft of a book, for example), and will have some exclusive extra features in the near future. Of course, you don’t have to be a publishing company – anyone can sign up to that option and get the extra benefits.

Who is behind this?

Hello! I’m Andrew Chapman, and I’ve been a professional editor for almost 30 years, working across a wide variety of books and magazines. I started writing this because I wanted a tool to help spot things like hyphenation inconsistencies which are very easy for even eagle-eyed editors to miss, and because I’m fascinated by text analysis generally. I also, rather inevitably, love books. You can find out more about my work here, and I’m available to hire – you’re welcome to contact me.

How do I use The Bookalyser? Does it work with Word?

Simply copy and paste your text into the form field and click the ‘Analyse’ button! The free version supports up to 10,000 words at once, but the paid version can analyse up to 200,000 words at a time in about 20–30 seconds. For details of the individual reports, see below.

In future, we hope to allow users to upload Word files. Not everyone uses Word anyway (many writers use Google Docs, OpenOffice or Scrivener, for example), but everything works with copy and paste!

How you then make changes to your text is up to you – The Bookalyser flags up a mixture of definite errors/inconsistencies and potential ones, and no software can automate the judgement required to tell which is which. If you go through your report section by section, you can decide which things to change in your text; many of these changes are easily made using your software’s find and replace feature.

If you are a Word user, we’re working on new features here that will speed up the correction process.

How to use the reports

Key statistics

These stats give a quick overview of your document. You may want to pay particular attention to the number of long sentences and the average sentence length – a large number of long sentences can compromise readability. Short sentences are a typical hallmark of commercial popular fiction.
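
As a rough illustration of how figures like these can be computed, here is a minimal Python sketch. It is not The Bookalyser’s own code: the sentence splitting is deliberately naive (it breaks after ., ! or ? followed by whitespace, so abbreviations and closing quotation marks will confuse it), and the 30-word threshold for a ‘long’ sentence is an assumption chosen purely for illustration.

```python
import re

# A "word" is a run of letters/digits, optionally joined by hyphens or apostrophes
WORD = re.compile(r"\w+(?:[-'’]\w+)*")


def sentence_stats(text, long_threshold=30):
    """Return sentence count, average sentence length and number of long sentences.

    long_threshold is an assumed cut-off (in words) for a 'long' sentence.
    """
    # Naive split: a sentence ends at ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(WORD.findall(s)) for s in sentences]
    if not lengths:
        return {"sentences": 0, "average_length": 0.0, "long_sentences": 0}
    return {
        "sentences": len(lengths),
        "average_length": sum(lengths) / len(lengths),
        "long_sentences": sum(1 for n in lengths if n > long_threshold),
    }


sample = "She ran. He followed her down the long, dark street. Why?"
print(sentence_stats(sample, long_threshold=5))
```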

Punctuation

This report gives further raw statistics and draws attention to potential inconsistencies that might need fixing. Exclamation marks are best used sparingly (and multiple exclamation marks should generally be avoided altogether). Heavy use of colons and/or semicolons might indicate that the text is complex, which is best avoided in commercial genre fiction, for example (but might be fine in other genres). Hyphens shouldn’t be spaced – when a dash is used for parenthesis, it should be an en or em dash. Some uses of hyphens and dashes also come down to house style.
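
Some of these checks are simple to automate. The Python sketch below counts exclamation marks, runs of multiple exclamation marks, colons, semicolons and spaced hyphens. It is an illustration under simple assumptions (for example, that a spaced hyphen sits between two word characters), not the report’s actual logic.

```python
import re


def punctuation_checks(text):
    """Count a few punctuation features of the kind the Punctuation report covers."""
    return {
        "exclamation_marks": text.count("!"),
        # Runs of two or more exclamation marks, e.g. 'No!!'
        "multiple_exclamations": len(re.findall(r"!{2,}", text)),
        "colons": text.count(":"),
        "semicolons": text.count(";"),
        # A hyphen with a space either side, between two words: 'was - as'
        "spaced_hyphens": len(re.findall(r"(?<=\w) - (?=\w)", text)),
    }


print(punctuation_checks("Wait!! It was - as usual - too late; again: no. Stop!"))
```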

Serial commas are a controversial subject. Also known as Oxford or Harvard commas, they are loved by some writers and editors and hated by others! Even if your house style is not to use them, they can still be useful to give a breathing space between long clauses. The statistics here at least give an indication of whether they are widely used in the text or not.

Word choices

This section opens with a list of potentially overused words (if any are found): these are all common adjectives and adverbs which are used at least 10 times more often than one would expect from their frequency in written English in general. This is a good way – in fiction or non-fiction – to spot habits which might weaken your writing.
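
The ‘10 times more than expected’ comparison can be sketched as a ratio of rates. In the Python below, the reference frequencies are invented placeholder numbers, not real figures for written English; a genuine version would need frequencies taken from a large reference corpus. The word list and the exact calculation are assumptions for illustration only.

```python
import re
from collections import Counter

# Placeholder reference rates (occurrences per million words). These numbers
# are invented for illustration; real values would come from a reference corpus.
REFERENCE_PER_MILLION = {"suddenly": 50.0, "quietly": 20.0, "very": 800.0}


def overused_words(text, reference=REFERENCE_PER_MILLION, factor=10.0):
    """Return words used at least `factor` times more often than the reference rate."""
    words = re.findall(r"\w+(?:[-'’]\w+)*", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    flagged = {}
    for word, reference_rate in reference.items():
        # Observed rate in this text, scaled to occurrences per million words
        observed_rate = counts[word] * 1_000_000 / len(words)
        ratio = observed_rate / reference_rate
        if ratio >= factor:
            flagged[word] = round(ratio, 1)
    return flagged


# One 'suddenly' in 1,000 words is 1,000 per million: 20 times the placeholder rate
sample = "suddenly " + "word " * 999
print(overused_words(sample))
```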

The rest of this section is particularly aimed at writers of fiction (although adverbs and ‘filler words’ can equally be overused in non-fiction). There are five areas to consider:

1. Most creative writing experts advise against too many ‘-ly’ adverbs, favouring more creative use of verbs: eg ‘she ran quickly down the street’ becomes, perhaps, ‘she thundered down the street’.
2. It’s easy to reach for ‘filler words’ such as ‘really’ or ‘very’ – these can often simply be removed.
3. Dialogue tags: here, writing experts favour simplicity. A simple ‘said’, ‘asked’ or ‘replied’ is typically sufficient, and allows readers to imagine more of the scene for themselves. A particular bête noire of editors is the use of dialogue tags which don’t actually refer to speech at all, eg ‘That’s great!’ he grinned.
4. And then there’s the combination of (1) and (3): using adverbs with dialogue tags (he added smugly).
5. Writers are also generally advised to avoid overusing the passive voice. In fiction in particular, it can rob a scene of immediacy and agency – although it often has its place.

‘Overuse’ is hard to define, of course – the purpose of these reports is simply to show the data and allow an author or editor to exercise their own judgement. It’s easy to miss the fact that you’ve used the word ‘really’ 78 times, for example.

Frequencies

The frequency analyses offer various ways of seeing which words are used most, or are most ‘important’ (when compared to written English texts as a whole). Every text will tend to use words such as ‘the’, ‘a’, ‘and’, ‘of’ and ‘to’ most of all (and in fact research has shown that these can actually provide a ‘fingerprint’ of an individual writer’s style across their oeuvre). But the other reports here will reveal the most referred-to characters in a novel, for example, the main ‘subject keywords’ of a piece of non-fiction, phrases that are perhaps overused, and so on. The report on the opening words of sentences can be a useful way of spotting your habits as a writer, which you might want to vary. Everyone has them!

The information here might not reveal any cause for concern – but it will also show patterns you might not have noticed in your writing.
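
The opening-words analysis can be approximated by taking the first word of each sentence and counting how often each one occurs. This Python sketch assumes a naive sentence split (after ., ! or ? followed by whitespace); it is not a description of how The Bookalyser itself works.

```python
import re
from collections import Counter


def opening_words(text, top=5):
    """Count the first word of each sentence and return the most common ones."""
    openers = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # re.search skips any leading punctuation such as opening quotation marks
        match = re.search(r"\w+(?:[-'’]\w+)*", sentence)
        if match:
            openers[match.group().lower()] += 1
    return openers.most_common(top)


print(opening_words("He sat. He stood. Then he left. “He sighed,” she said."))
```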

Parts of speech

This report provides a basic overview of the use of different parts of speech in the text. Full parsing of a natural language text would take too long, so this is an approximation for speed. It is mostly for general information, though it could also highlight overuse of adjectives and adverbs (which here includes forms other than the ‘-ly’ adverbs reported on elsewhere), and possibly inconsistencies of tense in a fictional text which is meant to be narrated entirely in the past or the present tense.

‘Glue words’ (a list of them is in the report footnote) are those little words such as articles, conjunctions and prepositions which hold the text together – but too many of them mean the text lacks information-bearing content. Lexical density is one way of measuring the proportion of information-bearing words, by totalling nouns, verbs, adjectives and adverbs (we include pronouns too); the 50 ‘glue words’ counted here are the most common of the other parts of speech (articles, conjunctions, prepositions etc).
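
Lexical density, as defined above, is a simple proportion: content words (here nouns, verbs, adjectives, adverbs and pronouns) divided by all words. The Python sketch below assumes the words have already been tagged. The tag names are borrowed from the Universal Dependencies part-of-speech set purely for illustration, and the glue-word set is a small hypothetical sample, not the 50-word list used in the report.

```python
# Tag names follow the Universal Dependencies POS set; tagging itself is assumed
# to have been done already (this sketch does not tag text).
CONTENT_TAGS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV", "PRON"}

# A tiny hypothetical sample of glue words, not the report's list of 50
GLUE_SAMPLE = {"the", "a", "an", "and", "but", "of", "to", "in", "on", "at"}


def lexical_density(tagged_words):
    """Percentage of (word, tag) pairs whose tag marks a content word."""
    if not tagged_words:
        return 0.0
    content = sum(1 for _, tag in tagged_words if tag in CONTENT_TAGS)
    return 100 * content / len(tagged_words)


def glue_share(words):
    """Percentage of words that appear in the glue-word sample."""
    if not words:
        return 0.0
    return 100 * sum(1 for w in words if w.lower() in GLUE_SAMPLE) / len(words)


tagged = [("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
          ("on", "ADP"), ("the", "DET"), ("mat", "NOUN")]
print(lexical_density(tagged))
print(glue_share([word for word, _ in tagged]))
```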

Hyphenation

Hyphenation inconsistencies are among the hardest errors for authors and editors to spot across a large text. Testing this software has revealed plenty of cases where they have crept into published bestsellers! This report finds all cases where a hyphenated phrase has also been used elsewhere without hyphenation (either with spaces or with the words run together).

Note that not every case indicated will be an error – a human being still has to go through the list and check which ones are indeed mistakes and which are ‘false positives’. The latter are common simply because of the grammatical rules of English. For example, these sentences all use the same two words, but are all correct, despite using a hyphen in some and not in others:

That was a grown-up thing to do.
She was a grown-up now.
He had grown up since the incident.
Now they felt much more grown up.
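
One simple way to implement this kind of check is sketched below in Python: collect every two-part hyphenated compound, then count how often the same two words appear spaced or run together. This is an assumption about the approach rather than The Bookalyser’s code, and it ignores compounds with more than two parts (such as ‘well-to-do’). Run on the four example sentences above, it flags ‘grown-up’, which shows exactly the kind of false positive described.

```python
import re


def hyphenation_inconsistencies(text):
    """Find hyphenated pairs that also appear spaced or closed up elsewhere."""
    lower = text.lower()
    # Two-part hyphenated compounds, e.g. 'grown-up' -> ('grown', 'up')
    compounds = set(re.findall(r"\b([a-z]+)-([a-z]+)\b", lower))
    report = {}
    for first, second in sorted(compounds):
        spaced = len(re.findall(rf"\b{first}\s+{second}\b", lower))
        closed = len(re.findall(rf"\b{first}{second}\b", lower))
        if spaced or closed:
            report[f"{first}-{second}"] = {"spaced": spaced, "closed": closed}
    return report


examples = ("That was a grown-up thing to do. She was a grown-up now. "
            "He had grown up since the incident. Now they felt much more grown up.")
print(hyphenation_inconsistencies(examples))
```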

Whether noun phrases should be hyphenated is often a question of house style – should it be ‘seatbelt’, ‘seat-belt’ or ‘seat belt’? If you are using a particular style book (the New Oxford Style Manual, for example, favours ‘seat belt’), this report will flag potential deviations from it; if not, it will help you decide what your own house style should be.

Spacing inconsistencies

This report is very similar to the one for hyphenation, except that it finds inconsistencies between word pairs which are spaced in some places and run together in others. As with hyphenation, some of these will be false positives (‘Look out! There’s a lookout post over there.’) – but this tool can also help find inconsistencies which are extremely hard to spot by eye. As with the hyphenation tool, simply go through the list and check each case using your word processor’s ‘find’ feature.
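
The spacing check can be approached in a similar spirit. The Python sketch below (an illustration under simple assumptions, not the tool’s own method) looks at each pair of adjacent words with no punctuation between them, and reports the pair if its run-together form also occurs as a single word. Applied to the example sentence above, it flags ‘look out’/‘lookout’, which is a false positive.

```python
import string
from collections import Counter

# Punctuation to strip from word edges, including curly quotation marks
EDGE_PUNCTUATION = string.punctuation + "‘’“”"


def spacing_inconsistencies(text):
    """Report word pairs that also appear run together as a single word."""
    tokens = text.lower().split()
    vocabulary = Counter(token.strip(EDGE_PUNCTUATION) for token in tokens)
    pairs = Counter()
    for first, second in zip(tokens, tokens[1:]):
        second = second.rstrip(EDGE_PUNCTUATION)
        # Skip pairs split by punctuation, such as 'out! there'
        if first.isalpha() and second.isalpha():
            pairs[(first, second)] += 1
    return {
        f"{a} {b} / {a}{b}": (spaced, vocabulary[a + b])
        for (a, b), spaced in pairs.items()
        if vocabulary[a + b]
    }


print(spacing_inconsistencies("Look out! There’s a lookout post over there."))
```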

US/UK English

These reports look for cases where the US spelling has been used in one place and the UK spelling in another, as well as looking for thousands of instances of British or American spelling and terminology. Once again, these tools are extremely helpful for spotting inconsistencies. Language evolves, of course – some ‘Americanisms’ are increasingly common in British usage, and other words and phrases may be down to house style. Note that the ‘-ize’ verb endings associated with US English are actually perfectly permissible in the UK too, and are favoured (or favored) by Oxford style (but not Cambridge). But the ‘-ise’ versions are only used in non-US English.
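
At its simplest, a mixed-spelling check pairs each UK form with its US equivalent and reports the pairs where both turn up. The four pairs in this Python sketch are a tiny illustrative sample (the real reports draw on thousands of entries), and only exact word forms are matched. As noted above, finding ‘organize’ alongside ‘organise’ signals an inconsistency rather than proof of American spelling.

```python
import re
from collections import Counter

# A tiny illustrative sample of (UK, US) spelling pairs
UK_US_PAIRS = [
    ("colour", "color"),
    ("centre", "center"),
    ("organise", "organize"),
    ("travelled", "traveled"),
]


def mixed_spellings(text, pairs=UK_US_PAIRS):
    """Return pairs where both the UK and the US form occur, with their counts."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        f"{uk}/{us}": (counts[uk], counts[us])
        for uk, us in pairs
        if counts[uk] and counts[us]
    }


print(mixed_spellings("The colour faded. Its color was gone. We travelled far."))
```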

(Tools for spotting hallmarks of Canadian, Australian and New Zealand English are in the pipeline, by the way.)

The creator of this tool (hello!) is British, but has done his best to make it useful to all writers and editors working in English.

Numbers

Should numbers be expressed in words or figures? It’s often down to house style – typically, non-fiction uses words from one to nine or ten, and then figures; but fiction often works better with words up to a hundred. Every publisher has its own rules. This tool will spot inconsistencies, as well as potential errors such as the year 1603 being used in one place and 1604 in another.
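
A basic version of the words-versus-figures check might map number words to values and look for any value written both ways. This Python sketch only covers one to ten and whole-number figures, an assumption made for brevity. Note that ‘one’ is often a pronoun (‘no one’), so it would produce false positives.

```python
import re

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def mixed_number_styles(text):
    """Return the numbers (one to ten) that appear both as words and as figures."""
    lower = text.lower()
    as_words = {NUMBER_WORDS[w] for w in re.findall(r"[a-z]+", lower) if w in NUMBER_WORDS}
    as_figures = {int(d) for d in re.findall(r"\b\d+\b", lower)}
    return sorted(as_words & as_figures)


print(mixed_number_styles("She had seven cats and 7 dogs. Ten years later, 12 remained."))
```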

There are further number-related tests in the ‘house style’ section.

House style

‘House style’ has been mentioned a lot in these FAQs. That’s simply a set of rules that a publisher – or indeed an individual author – has adopted for the way the text is presented. It typically encompasses issues such as how times of day and dates are written, whether certain phrases should be hyphenated, whether you should use St. James’s Church or St James’ Church and so on. Many editors work to particular style manuals such as the Chicago Manual of Style or the New Oxford Style Manual.

The reports here identify many common areas where house style is applied, and help you check whether it has been applied consistently. Note: if a report hasn’t found anything, it won’t be shown, so the number of reports shown in this section varies from text to text.

Variant spellings

This report is another of the ‘house style’ type – it finds cases where two different variations of the same word, such as ‘while’ and ‘whilst’, have been used. You may tolerate these, or your house style may demand that they are consistent. Once again, these can be easy to miss, and this tool will spare you the effort of looking for them.

Common confusions

Even the best writers occasionally use ‘affect’ when they mean ‘effect’, and so on. You may have used everything indicated by this tool correctly, but it checks the text for these sorts of words just in case. It’s up to you whether you go through the text and check!

Misused phrases

This test will flag up common mistakes such as ‘could of’ instead of ‘could have’ and the like, so you can fix them using your word processor’s find and replace feature.
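
A check like this can be as simple as a list of known mistakes, each matched as a whole phrase regardless of capitalisation. The three entries in this Python sketch are illustrative assumptions; the report’s own list is not reproduced here.

```python
import re

# Illustrative sample of common mistakes and their usual corrections
MISUSED = {
    "could of": "could have",
    "should of": "should have",
    "would of": "would have",
}


def find_misused(text, phrases=MISUSED):
    """Return (position, phrase found, suggested fix) for each match."""
    hits = []
    for wrong, right in phrases.items():
        # Match the words as a whole phrase, allowing any whitespace between them
        pattern = r"\b" + r"\s+".join(map(re.escape, wrong.split())) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(), right))
    return sorted(hits)


print(find_misused("You could of told me. I should have known."))
```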

Redundancies

This report is largely a matter of taste – does it matter that you used ‘exact same’ instead of just ‘same’? Writing tutors advise avoiding superfluous words where you can, and keeping your language tight. This tool reveals potentially ‘slack’ usage where you could tighten things up, and suggests the usual way of doing so.

Clichéd similes

There are thousands of clichés in common use. At the moment, The Bookalyser doesn’t look for general clichés in the interests of efficiency – this is something we may revisit later. For now, though, it will analyse your text for more than 360 common clichés of comparison – enough to make you as happy as a pig in clover.

Names and abbreviations

This tool typically generates a very long report. It combs through your text and lists (in alphabetical order) anything that looks like it’s a name or abbreviation. These may all be spot on, but casting an eye through the list can help to identify a spelling error, an inconsistency or some other inaccuracy. You can also see the frequency of each name or abbreviation, which might help identify where something has been overused, too.
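
A rough approximation of this list can be built by collecting capitalised words that don’t start a sentence, plus words written entirely in capitals. The Python sketch below makes exactly those assumptions, which means it misses a name that only ever opens a sentence. It is an illustration, not the report’s own logic.

```python
import re
from collections import Counter


def names_and_abbreviations(text):
    """List capitalised words (not sentence-initial) and all-capital words, sorted."""
    found = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z]+", sentence)
        for position, word in enumerate(words):
            is_abbreviation = len(word) > 1 and word.isupper()
            # A capital on the first word of a sentence says nothing about names,
            # and the pronoun 'I' is always capitalised
            is_name = position > 0 and word[0].isupper() and word != "I"
            if is_abbreviation or is_name:
                found[word] += 1
    return dict(sorted(found.items()))


print(names_and_abbreviations("Emma met Mr Knightley at the BBC. Later, Emma left."))
```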

Similar capitalised words

This is another report where it may turn out that you have got everything correct already – but it’s worth looking through. It finds every capitalised word in the text (apart from obvious ‘stop words’ such as ‘The’, etc) and matches it up with others that sound phonetically similar. There’s no way to avoid this listing a lot of ‘false positives’ – but when it does spot a genuine error, it can be incredibly useful. For example, it can help spot where the name of a character in a novel has been changed slightly (this is a real-life example) or spelling errors in names which many spellcheckers would not be able to identify.
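
Phonetic matching is a well-established technique. The Python sketch below uses Soundex, a classic phonetic code, to group capitalised words that sound alike; The Bookalyser’s own method may well differ, so treat this as a swapped-in illustration. One known limitation: Soundex keeps the first letter, so ‘Catherine’ and ‘Katherine’ would not be grouped.

```python
from collections import defaultdict

# Standard Soundex digit groups for consonants; vowels, h, w and y get no digit
_GROUPS = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6"}
CODES = {letter: digit for letters, digit in _GROUPS.items() for letter in letters}


def soundex(word):
    """Return the four-character American Soundex code for a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        digit = CODES.get(letter, "")
        if digit and digit != previous:
            code += digit
        # h and w do not separate identical digits; vowels do (they reset previous)
        if letter not in "hw":
            previous = digit
    return (code + "000")[:4]


def similar_names(words):
    """Group distinct words that share a Soundex code."""
    groups = defaultdict(set)
    for word in words:
        groups[soundex(word)].add(word)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(soundex("Robert"), soundex("Rupert"))
print(similar_names(["Katherine", "Kathryn", "Emma"]))
```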
