Weapons, a poem by KevBot:

When the reticle turns red,
The reticle turns red,
The guy is dead.

WARNING: This post assumes you have at least some idea of what conditional probability is, at least in the context of Markov chains. For those who do not know, here is my admittedly rough explanation: let's say someone has split a deck of cards into two 26-card piles. If I pick a card from either pile, I have a 25% chance of getting a diamond. But if I knew that the pile on the left only has red cards, and the pile on the right only has black cards, I would know that the probability of picking a diamond from the left pile is 50%, and from the right pile 0%. That's conditional probability: the change in the probability of an outcome given some extra information. Probability is fun stuff, guys.
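If it helps to see that as code, here is a tiny Python sketch of the same card example (the piles and counts are exactly the ones above):

```python
# 26 red cards (13 diamonds + 13 hearts) and 26 black cards (13 clubs + 13 spades)
left_pile = ["diamond"] * 13 + ["heart"] * 13    # the all-red pile
right_pile = ["club"] * 13 + ["spade"] * 13      # the all-black pile
deck = left_pile + right_pile

def p_diamond(cards):
    """Probability of drawing a diamond, given only this set of cards."""
    return cards.count("diamond") / len(cards)

print(p_diamond(deck))        # 0.25 -- no extra information
print(p_diamond(left_pile))   # 0.5  -- given: the card comes from the left pile
print(p_diamond(right_pile))  # 0.0  -- given: the card comes from the right pile
```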

As you'll recall, I have something of a passion for creating things whose primary purpose is to cause general aggravation among my peers. The last time I updated my IRC bot, I added a Markov chaining algorithm whose output resembles English statistically rather than syntactically or grammatically (words that end in -tically are fun to say aloud).

Since my primary instance of KevBot ("living" in #keveX on irc.slashnet.org) has a database of nearly 100,000 sentences, it is infeasible to search through every single one of them to generate a Markov chain. Because of this, I wrote a little heuristic that randomly picks a subset of sentences from the database and checks only that subset every time it needs to find the next word. Basically, it would search 4000 sentences for the current word, and whenever it found the word, it would record the word that followed it. Once it had collected some candidate next words, it would pick one of them at random, then continue, using the word it just picked as the new current word. It would stop when it reached 20 words or when it found a period, exclamation mark, or question mark. It's not truly a Markov chain because it doesn't work with the full set of sentences.
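KevBot's actual source isn't in this post, so here's a rough Python sketch of what the old heuristic did; the function and variable names are mine, and only the 4000-sentence sample, the 20-word cap, and the stopping punctuation come from the description above:

```python
import random

SAMPLE_SIZE = 4000            # sentences scanned per word
MAX_WORDS = 20                # hard cap on output length
TERMINATORS = (".", "!", "?")

def old_wisdom(database, current_word):
    """Old heuristic: resample the database for every single word."""
    output = [current_word]
    while len(output) < MAX_WORDS:
        # A fresh random subset for each step -- this is why the old version
        # could fail to find the current word at all.
        sample = random.sample(database, min(SAMPLE_SIZE, len(database)))
        candidates = []
        for sentence in sample:
            words = sentence.split()
            for i, word in enumerate(words[:-1]):
                if word == current_word:
                    candidates.append(words[i + 1])
        if not candidates:
            break             # current word not in this sample: dead end
        current_word = random.choice(candidates)
        output.append(current_word)
        if current_word.endswith(TERMINATORS):
            break             # found a period, exclamation mark, or question mark
    return " ".join(output)
```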

[Aside: While I could generate a true conditional probability table (you know, an actual Markov chain), it would take a very, very long time to calculate and would have to be recalculated at least partially every time a new sentence is added to the database. New sentences are added often enough that I never gave this option more than a few seconds' thought.]
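For comparison, a true transition table over the whole database would look something like the sketch below (again my own Python, not KevBot's actual code). The expensive part is exactly what the aside says: every new sentence changes some of the counts, so the probabilities have to be redone:

```python
from collections import defaultdict

def build_transition_table(database):
    """Count every word -> next-word transition across the full database."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in database:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1
    # Normalize the counts into conditional probabilities P(next | current).
    table = {}
    for word, nexts in counts.items():
        total = sum(nexts.values())
        table[word] = {nxt: n / total for nxt, n in nexts.items()}
    return table
```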

Because the number of sentences is so large, KevBot has a very wide vocabulary, and is therefore rather unlikely to create a phrase that makes any sense at all. And because it resamples a different set of sentences for each word, there is a chance that the current word won't appear in the new sample at all, in which case it can't find a next word and the chain simply dead-ends.

These things make the old Markov algorithm both incorrect and unnecessarily incoherent. The guys from #clone-army (on irc.rizon.net) asked me why their instance of the bot was babbling such nonsense at them, and they made a few suggestions for modifying the algorithm.

I took a few of the ideas and got to work. An hour and two tacos later, I had the new algorithm figured out. It was still Markov-chain-based, but it worked on a fundamentally different principle than the previous version: instead of resampling the full database of sentences for each word, it picks a set of sentences once at the very beginning and builds its chain from just those sentences.

This method has the advantage of being much faster than the previous version, since it works with far fewer sentences. It also can't miss a word: by definition, any word it finds is in a sentence it will be searching on the next iteration, so even if the word "zebra" is used only once in KevBot's entire database, the fact that KevBot found "zebra" once means it will find it again. Since it's faster, I was able to increase the maximum output length from 20 to 30 words. And since it generates the sample list of sentences just once, I can parameterize how big that list is to tune how random KevBot's responses are.
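In the same sketch style, here's my guess at the shape of the new algorithm; the 30-word cap and the tunable sample size come from above, while the choice of a random starting word (when no seed is given) is my assumption:

```python
import random

def new_wisdom(database, sample_size=100, max_words=30):
    """New algorithm: pick the source sentences once, then chain within them."""
    source = random.sample(database, min(sample_size, len(database)))
    # Assumption: with no seed supplied, start from a random word in the sample.
    current_word = random.choice(random.choice(source).split())
    output = [current_word]
    while len(output) < max_words:
        candidates = []
        for sentence in source:
            words = sentence.split()
            for i, word in enumerate(words[:-1]):
                if word == current_word:
                    candidates.append(words[i + 1])
        # Every emitted word came from a sentence in `source`, so the only way
        # to run out of candidates is if the word only ever ends sentences.
        if not candidates:
            break
        current_word = random.choice(candidates)
        output.append(current_word)
        if current_word.endswith((".", "!", "?")):
            break
    return " ".join(output)
```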

The fewer sentences KevBot uses to create output, the more likely it is to make sense (the poem above was generated from only two source sentences). The more it uses, the bigger the vocabulary and the longer the output.

The new method's biggest disadvantage is its small vocabulary: the chain can only ever use words that appear in the initial sample. The wisdom command takes an optional parameter which, if supplied, is used as the first word in the chain. Since KevBot would only search, say, 100 sentences for that word, it would be very unlikely to find it. I solved this problem by picking 5000 sentences at random and scanning them for the first word. If it finds the word, it uses the last 100 sentences it found as its source; if it doesn't, it gives up (to avoid an infinite loop).
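Here's my reading of that seed-word fix, in the same sketch style. The 5000-sentence scan budget and the 100-sentence source list come from above; keeping a sliding window of the most recently scanned sentences is my interpretation of "the last 100 sentences it found":

```python
import random

def find_seeded_source(database, seed_word, scan_budget=5000, source_size=100):
    """Scan random sentences for the seed word; keep the most recent ones."""
    recent = []
    for sentence in random.sample(database, min(scan_budget, len(database))):
        recent.append(sentence)
        if len(recent) > source_size:
            recent.pop(0)     # keep only the last `source_size` sentences
        if seed_word in sentence.split():
            return recent     # the seed word is guaranteed to be in this set
    return None               # give up rather than loop forever
```

The chain would then run as before over the returned set, starting from the seed word.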

The end result is very interesting. It's a true Markov chain because it uses every sentence in its sample (the conditional probabilities are correct for the sample, if not for the whole database), so the output tends to be a lot more coherent than the previous version's. Here are some examples:

<KevBot> a slow push through little plot, blam blam, action, go somehwere else retarted.
<KevBot> drum solo is also in the antenna
<KevBot> mine is to a brilliant idea, where you take into account a) how many dfeaths you have
<KevBot> an IQ of respects is stronger
<KevBot> a very thing is it more like a case-by-case statistical thing is C, 80-90 gets C, one standard deviation better is A, case-by-case statistical thing is supposed to the letters related
<KevBot> stpeeed on you are filled with cabinets, a lifeguard at math.

As always, you are welcome to drop by #keveX to enjoy my bot's silly output as much as you'd like. It is indeed a lifeguard at math.