Saturday, September 20, 2008

Why gambling on online Poker sites is a bad idea

On top of being forbidden by law in many countries, gambling online is not the smartest idea one may have.

Poker is meant to be played in real life: you're supposed to see, even feel, your opponents. This process implies close surveillance: would you allow your real-life Poker friends to have their 'Poker maths' and 'Poker stats for dummies' at arm's reach. I'd certainly not.

When you're playing online, you may or may not be cheated in such ways. How would you? For instance, I know two people playing Hold'em together online, exchanging their hands over IM and sharing their profit. OK, it's far from organized crime... but we could easily imagine programs analyzing the board, eventually exchanging information with other programs playing at the same table, and reaching a decision together in an automated fashion.

Simpler, a bot could easily analyze the board, compute winning odds, and give the human player a decision hint.

For fun (and NOT profit), I coded a proof-of-concept program. It's meant to analyze tables of a well- known online Texas Hold'em program, compute and display odds. Inoffensive, but this would be the first and most important step of a "poker bot".

I prefer not to give names here... Anybody familiar with online poker will easily figure them out anyways. Here's the table (with many elements blurred out or removed):



The analyzer program output is:



Here, I'm triggering an analysis manually. Primitive OCR routines read table information, such as player positions, who's got the button, private cards, common cards, etc. Then, a separate component simulates thousands of games based on the information fed by the OCR component, and finally displays more or less useful statistics.

In our example, we have As,6h, the stage is (River;9s,Ad,4s,Tc,Jc) and we're competing against 3 other players. The estimated odds are 24.6% (for an average 25% since we're 4 players). This ratio may be used as a hint by a human player, or, if the program is extended (pot and players' stacks analysis, etc.), could be used as a building block for a fully automated poker-playing program...

This is a basic PoC code, nothing exceptionnal here. We could easily imagine that bad guys have poker bots in place; now, how would you feel being the sucker competing against 9 bots exchanging cards... play money's fine, real money's a different story.

Wednesday, September 17, 2008

Getting free Velib' w/o getting tired

A cool thing about Paris is its public bicycle system, called Velib'. The concept is fairly simple: for 30 euro a year, you can take and use any bicycle available around the ~1000 stations spread across the city.

The rules: You may take your Velib' at point A and give it back at point B. You may use it up to 30 minutes for free; if you use it more than that, you will pay (1 euro for the next half-hour slice, and so on).

The downside of the system is that a few stations, mainly these located at higher altitudes, were usually empty, whereas stations close to them, but at lower altitudes, were packed very quickly (for obvious reasons).

This problem was solved a few months back, when about 10% of the stations (*mainly*, the highest-located ones) were converted to V+ stations. The concept is simple: When you borrow a bicycle in a regular station and bring it back to a V+ one, you get extra credit. This virtual money can then be used to pay your extra time when you borrow a Velib' for more than 30 minutes.

Now, you see where I'm going! Some V+ stations are really close to regular stations, as you can see on this page. But how close exactly? And which ones are the best spots, ie, V+ super close to a V stations? Filling up an account with extra credit might not be that exhausting after all...

The GMap map used on the above page is fed with Velib' stations data from the following XML feed. It contains the station names, addresses, geo-coordinates, as well as flags (is the station in operation? is it a bonus station?).

If we calculate primitive euclidean distances between 2 stations using their (longitude; lattitude) coordinates, omitting the altitude, we could easily figure out, for each of the 90 operating V+ stations, which regular station is the closest...

Now, I'll spare you the code (very easy to write, check it out here), and will go straight to the best results. The format is (V+, V, distance):


09018 - PLACE PIGALLE > 09019 - VICTOR MASSE (0.000919)
09004 - ROCHECHOUART GERANDO > 09006 - TRUDAINRE ROCHECHOUART (0.000945)
18043 - BLANCHE > 09027 - FONTAINE DOUAI (0.001091)
14030 - LOSSERAND - PERNETY > 14104 - LOSSERAND BOYER-BARRET (0.001106)
18041 - MARTYRS 2 > 09017 - TRUDAINE MARTYRS (0.001140)
...


Let's visually check our best couple on the map, which are stations 09018 and 09019:



Yep, seems OK to me. One might go to Pigalle for *different* reasons after all...

Sunday, September 14, 2008

Rar(e) juicy bits

I recently spent time working on the Rar algorithm. Though the compression routines are not disclosed, Eugene Roshal freely provides the sources for the command line version of its decompressor. It’s the only real pieces of information available out there – the other documents you may find concern the archive file format itself.

A few context elements first: Several versions of the Rar program exist, along with different major versions of the algorithm behind it. The one we're interested in is version 2.9, the algorithm in use in Rar version 3. Version 2.9 compresses blocks of data using either a Lempel/Ziv or a PPM (Prediction by Partial Matching) algorithm.

The interesting thing is that special symbols can be found in the compressed data stream to notify the presence of VM bytecode. Yes, you read right: v29 makes use of a virtual machine. The VM code is interpreted after decompression, and is used for unfiltering the data.

To achieve better compression ratios, data is usually filtered/modified by a compressor program - ie, a filter's goal is to decrease the entropy of the data to be compressed. A very common example is when compressing x86 binary files: relative jumps and calls (0xE8, 0xE9) can be replaced by their fake, absolute equivalents. A jmp $+5 (E8 00 00 00 00 ) at offset 0x100 would be replaced by E8 05 01 00 00. This filter is used by various executable packers (ex: AsProtect), as well as compressors (ex: SevenZip, Rar) or installers (ex: InnoSetup).

Version 2.9 makes use of several filters, for images, videos, audio files, binary files, etc. However, in this very case, the unfiltering routine is encoded in the compressed stream itself, with the use of the VM bytecode. That’s where v29 potentially allows more flexibility than other compressor programs.

Available VM opcodes are the following (check out vmrar.hpp):
VM_MOV, VM_CMP, VM_ADD, VM_SUB, VM_JZ, VM_JNZ, VM_INC, VM_DEC, VM_JMP, VM_XOR, VM_AND, VM_OR, VM_TEST, VM_JS, VM_JNS, VM_JB, VM_JBE, VM_JA, VM_JAE, VM_PUSH, VM_POP, VM_CALL, VM_RET, VM_NOT, VM_SHL, VM_SHR, VM_SAR, VM_NEG, VM_PUSHA,VM_POPA, VM_PUSHF,VM_POPF, VM_MOVZX,VM_MOVSX,VM_XCHG, VM_MUL, VM_DIV, VM_ADC, VM_SBB, VM_PRINT, VM_MOVB, VM_MOVD, VM_CMPB, VM_CMPD, VM_ADDB, VM_ADDD, VM_SUBB, VM_SUBD, VM_INCB, VM_INCD, VM_DECB, VM_DECD, VM_NEGB, VM_NEGD, VM_STANDARD

Looks very similar to x86 opcodes, isn't it? If you go through this whole list, you may wonder what VM_STANDARD stands for? Well, without this opcode, reversing standard filters would be incredibly slooooow! VM_STANDARD instructs the VM that a standard filter is used, and that a hard-coded decoding routine can be executed to speed the process.

VM_STANDARD is not found in the bytecode stream per se, but is generated by the program when the bytecode stream exhibits characteristics such as particular length and crc:

VM_StandardFilters RarVM::IsStandardFilter(byte *Code,int CodeSize)
{
struct StandardFilterSignature
{
int Length;
uint CRC;
VM_StandardFilters Type;
} StdList[]={
53, 0xad576887, VMSF_E8,
57, 0x3cd7e57e, VMSF_E8E9,
120, 0x3769893f, VMSF_ITANIUM,
29, 0x0e06077d, VMSF_DELTA,
149, 0x1c2c5dc8, VMSF_RGB,
216, 0xbc85e701, VMSF_AUDIO,
40, 0x46b9c560, VMSF_UPCASE
};
uint CodeCRC=CRC(0xffffffff,Code,CodeSize)...


Here’s an example: The stream is being decompressed; the current symbol is 0x101, which instructs the decompressor that VM bytecode is to be read (check out unpack.cpp). The bytecode is processed; it contains VM initialization values, as well as a VM opcode sequence. Now, when the filter is to be applied, the opcodes could be processed one by one. Emulating hundreds of VM_MOV, VM_CMP and others is a computationally expensive task. Therefore, what the decompressor does first is to check the opcode sequence length and CRC against a list of known VM routines. These routines are for commonly used filters. If one of these standard routines is matched, then a VM_STANDARD opcode is generated, its operand being the VMSF_Xxx constant referring to the hardcoded routine executed instead of bytecode emulation.

So far, 6 standard filters routines are available:
- x86 jumps and/or call filters (VMSF_E8, VMSF_E8E9)
- Intel Itanium filter (VMSF_ITANIUM)
- RGB filter (VMSF_RGB)
- Audio files filters (VMSF_AUDIO)
- Image filters (VMSF_DELTA)
- Case filters (VMSF_UPCASE)

This choice of having a VM to run the filters offers extension possibilities yet to be seen, maybe via the use of plugins in later versions of Rar? One could then create a special filter, designed to achieve very good compression ratios with particular files. Not to mention that the filters can be organized in a filter stack, reused, and applied one after each other, etc. Quite powerful.

Last thing: how do you decide which filters are applied to your compressed data? Well, you quite don’t. The compressor will decide for you (let me remind you that they’re all closed source). Hmm... going back to Unrar source code, it appears cmddata.cpp contains the code responsible for handling command line parameters of Unrar. In fact, cmddata.cpp contains command line options for Rar and Unrar (it’s very likely that the sources are a subset of Rar+Unrar sources, and the compilation of a specific program is decided via a pre-processor #define).

You may notice that the switch –mc allows the user to force-enable/disable the use of a particular filter:

case 'M':
switch(etoupper(Switch[1]))
{
case 'C':
// ...cut...
switch(etoupper(*(Str++)))
{
case 'T': Type=FILTER_PPM; break;
case 'E': Type=FILTER_E8; break;
case 'D': Type=FILTER_DELTA; break;
...

PPM is a compression mode; it has the role of a dummy filter here. Let’s fire Rar command-line now, to see if these options are advertised:

$ rar -h
Usage: rar command -switch1 -switchN archive files ...
(cut)
mcX Set advanced compression parameters

What does "advanced compression parameters" mean? It seems RarLab is not quite ready to let their users decide what filters they would like to use, let alone code their own filters via the use of the VM.

If you want to know more, well, check out the Unrar source code, more specifically unpack.cpp/hpp, rarvm.cpp/hpp and cmddata.cpp.