100 Posts on OnlyOnesAndZeros – Behind the Scenes

I recently celebrated 100 posts on OnlyOnesAndZeros and marked the occasion with this post analyzing everything I have written so far and sharing some statistics about it. Here’s how I did it.

First, I used a WordPress plugin called WordPress Exporter to perform a custom XML export of the 100 posts I have published so far. If you’re wondering, the archive was about 492 kilobytes and contained 4,708 lines and 382,657 total characters. Next, I created a JavaScript program called XMLtoJSON¹ to convert the XML document into a JSON-formatted array, which made the data much easier to manipulate in JavaScript.
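To give a rough idea of what that conversion looks like, here’s a sketch I put together for this post (it isn’t XMLtoJSON itself; the element and namespace names are the standard ones from a WordPress WXR export, and wxrToJson is just an illustrative name):

var WXR_CONTENT_NS = "http://purl.org/rss/1.0/modules/content/";

function wxrToJson(xmlString) {
	// Parse the export in the browser and pull each post's title and body
	// out of the <item> elements.
	var doc = new DOMParser().parseFromString(xmlString, "text/xml");
	var items = doc.getElementsByTagName("item");
	var posts = [];
	for (var i = 0; i < items.length; i++) {
		var title = items[i].getElementsByTagName("title")[0];
		var content = items[i].getElementsByTagNameNS(WXR_CONTENT_NS, "encoded")[0];
		posts.push({
			title: title ? title.textContent : "",
			content: content ? content.textContent : ""
		});
	}
	return posts;
}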

Next, I had to analyze the text data from each post. Existing text analysis tools are mostly designed for a single text corpus, while I wanted to compare the content of each post against the others. For this purpose, I created another JavaScript program called WordData².

First, I stripped the HTML markup from each post so that tags wouldn’t interfere with the text analysis. In the future, I may have the program account for images, links, and specially formatted content as well.

function removeHtml(html) {
	// Let the browser parse the markup in a detached element,
	// then return only the visible text.
	var temp = document.createElement("div");
	temp.innerHTML = html;
	return temp.textContent || temp.innerText || "";
}
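For example (a quick illustration of my own, not output captured from the program):

// Only the visible text survives.
removeHtml("<p>Hello, <strong>world</strong>!</p>"); // "Hello, world!"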

A JSON tree is created to store the data as it is generated:

statistics = {
	// Per-post data (word lists and word counts).
	items: [

	],
	// Totals across all of the posts.
	total: {
		words: [

		],
		totalWordCount: 0,
		averageWordCount: 0,
		mostWords: {
			item: undefined,
			wordCount: 0
		},
		leastWords: {
			item: undefined,
			wordCount: 0
		}
	},
	// How many times each word appears across every post.
	words: {}
};
[Screenshot: The most intricate UI I have ever designed. Featuring this post.]

Finally, the program loops through the content of each post to compute statistics such as the average word count, the most commonly used words, and the posts with the most words³.

for (var i = 0; i < input.length; i++) {
	statistics.items.push({
		words: [],
		wordCount: 0
	});
	statistics.items[i].words = input[i].split(" ");
	statistics.items[i].wordCount = statistics.items[i].words.length;

	// Tally each word into the running list and the global frequency map.
	for (var j = 0; j < statistics.items[i].words.length; j++) {
		var word = statistics.items[i].words[j];
		statistics.total.words.push(word);
		if (statistics.words[word]) {
			statistics.words[word]++;
		}
		else {
			statistics.words[word] = 1;
		}
	}
	statistics.total.totalWordCount += statistics.items[i].wordCount;

	// Track the longest and shortest posts.
	if (statistics.items[i].wordCount > statistics.total.mostWords.wordCount) {
		statistics.total.mostWords.wordCount = statistics.items[i].wordCount;
		statistics.total.mostWords.item = statistics.items[i];
	}
	if (statistics.total.leastWords.item === undefined || statistics.items[i].wordCount < statistics.total.leastWords.wordCount) {
		statistics.total.leastWords.wordCount = statistics.items[i].wordCount;
		statistics.total.leastWords.item = statistics.items[i];
	}
}
statistics.total.averageWordCount = statistics.total.totalWordCount / statistics.items.length;

The output is a JSON object containing the data from the analysis of the posts.
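For instance, the most commonly used words come straight out of the words map once it’s sorted. The snippet below is just an illustration of that step rather than part of the code shown above, and topWords is a name I made up for this post:

function topWords(statistics, n) {
	// Sort the word → count map by count, descending, and keep the top n.
	return Object.keys(statistics.words)
		.sort(function (a, b) {
			return statistics.words[b] - statistics.words[a];
		})
		.slice(0, n)
		.map(function (word) {
			return { word: word, count: statistics.words[word] };
		});
}

// e.g. topWords(statistics, 10) → the ten most frequent words across all posts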

The source code for both programs is available on GitHub, and I may do more with them in the future.

I actually first started working on the 100 posts post in early May 2018, and not finishing it was the main reason I fell out of the rhythm of posting for a few months. It feels good to finally be finished with it and moving forward. Happy Holidays!

  1. Originally called “XMLclean.”
  2. I was originally going to call the program “WordStats,” but “WordStat” was already taken, and I don’t really feel like dealing with a lawsuit.
  3. Apologies for not being more specific, but most of the work I did on this program was over 7 months ago and I’m not really sure what I was doing.