Portfolio status: ONLINE

So, I’ve just created a portfolio page listing some of the things I’ve done; just click the link named “Portfolio” (duh) at the top of the page. New stuff will be added eventually, some day.

Tricky Python bugs: comparison madness

Python is a great language for a lot of tasks which are not performance sensitive, and even for performance-sensitive ones it can be reasonably efficient (think Cython or NumPy). In this post I’ll talk about a problem that is easily overlooked and can turn into a debugging nightmare if you’re not aware of it: comparison errors.

For the sake of argument, let’s say you need to implement a function to compare two integers. This can be easily achieved:

def compare(n1, n2):
    return n1 < n2

This function returns True if n1 < n2, and False otherwise. Simple enough, right? There is a crucial detail, though: it will only work properly if both parameters are integers. What if one of them is a string?

In [1]: compare(1, 2)
Out[1]: True

In [2]: compare(2, 1)
Out[2]: False

In [3]: compare('1', 2)
Out[3]: False

In [4]: compare(1, '2')
Out[4]: True

In [5]: compare(3, '2')
Out[5]: True

That may seem an absurd use of the function, but think about a larger problem or system: you will be passing a lot of things around inside variables, and Python just won’t complain if the variables are not of the same type, resulting in a silent and barely visible bug in your code. In fact, anything can be compared, and the function will happily run and return with no error:

In [6]: compare(list(), 3)
Out[6]: False

In [7]: compare(3, dict())
Out[7]: True

In [8]: compare(list(), dict())
Out[8]: False

In [9]: compare(str(), dict())
Out[9]: False

In [10]: compare('str', dict())
Out[10]: False

This seemingly innocent function will work regardless of what you pass to it, and it takes only one unnoticed wrong parameter to skew the result. The best solution is the obvious one, which is passing the proper types, but if you cannot have this guarantee you can detect this kind of error using the type() builtin function. An example follows (not the most efficient one, but it will suffice):

def compare(n1, n2):
    if type(n1) != int or type(n2) != int:
        # handle error, e.g. throw an exception
        raise RuntimeError("one of the params is not an integer")
    return n1 < n2

In [23]: compare(1, 2)
Out[23]: True

In [24]: compare(1, '2')
--------------------------------------------------
RuntimeError
/tmp/ in ()
----> 1 compare(1, '2')

/tmp/ in compare(n1, n2)
      1 def compare(n1, n2):
      2     if type(n1) != int or type(n2) != int:
----> 3         raise RuntimeError("one of the params is not an integer")
      4     return n1 < n2

RuntimeError: one of the params is not an integer
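
The same guard can also be written with isinstance(), which is the more common idiom for type checks. The version below is only a sketch of that alternative, not the code I actually used: raising TypeError instead of RuntimeError and rejecting booleans are my own assumptions about what is wanted here.

```python
def compare(n1, n2):
    """Return True if n1 < n2, rejecting anything that is not an integer."""
    for value in (n1, n2):
        # bool is a subclass of int, so it is excluded explicitly here;
        # drop that clause if True/False should be accepted as 1/0.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("expected an integer, got %s" % type(value).__name__)
    return n1 < n2
```

With this version, compare(1, '2') fails immediately with a TypeError that names the offending type, instead of quietly returning True.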

Always remember to use comparisons with the proper care in Python, or you may end up losing a lot of time tracking a nearly invisible problem (true story).

UPDATE: msi provides this helpful bit in the comments:

BTW: This is fixed in Python 3.X:

The ordering comparison operators (<, <=, >=, >) raise a TypeError exception when the operands don’t have a meaningful natural ordering. Thus, expressions like 1 < '', 0 > None or len <= len are no longer valid, and e.g. None < None raises TypeError instead of returning False.
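
To see that behaviour for yourself, here is a small sketch meant to be run under Python 3. The try_compare() helper is hypothetical, added only so the script keeps going after the first error; the exact wording of the error message varies between Python versions, so no output is shown.

```python
def compare(n1, n2):
    return n1 < n2

def try_compare(n1, n2):
    """Hypothetical helper: return the comparison result, or the error text."""
    try:
        return compare(n1, n2)
    except TypeError as exc:
        return "TypeError: %s" % exc

print(try_compare(1, 2))      # both integers: an ordinary boolean result
print(try_compare(1, '2'))    # int vs str: rejected under Python 3
print(try_compare([], {}))    # list vs dict: rejected under Python 3
```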

Unity: from zero to Pong

In my last post I talked about some game APIs, and I managed to make some progress with MonoGame: an animated sprite, controllable with an X360 controller. It took some effort, but I learned a lot of things and also regained some knowledge of C# (I worked as a .Net developer a long time ago). But from there, I realized that writing a game practically from scratch is no walk in the park, and even if I were able to spend my whole day coding I’d still need an immense amount of time and effort to do something even remotely playable.

So, why not stand on the shoulders of giants? Enter Unity, which in their words is a “game development ecosystem”. Although it is meant for 3D games, it is still possible with a bit of effort to make 2D or 2.5D ones, and it uses C# (also JS and Boo). A perfect fit for my next step of doing, er, something.

After searching for some tutorials and watching these videos I was able to make a basic Pong clone in the space of three days. I’ve put the project online at GitHub (https://github.com/baraujo/pong-clone), and this is what it looks like:

[Screenshot: pong-clone]

You can try it for yourself at http://baraujo.net/pong. I was able to run it on Windows, Linux, Android and the Web player, where I took the screenshot above. Running it on Android was tricky because of the touchscreen, which makes the paddle control not so trivial, but with a bit of math and patience it became somewhat playable. I intend to keep improving it, adding a big score display just like the classic Pong, an improved CPU player (and the option to have a second player on the same keyboard/mobile) and various adjustments. I’ll update this post when these improvements are finished.
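
For the curious, the touch math boils down to something like the sketch below. It is written in Python rather than C#, only to show the arithmetic; it is not the actual code from the repository, and every name and unit in it (touch_to_paddle_y, step_towards, a field centred on zero) is an assumption made for illustration.

```python
def touch_to_paddle_y(touch_y, screen_height, field_height, paddle_height):
    """Map a touch position in pixels to a paddle centre in field units."""
    # Normalise to [0, 1], then centre on 0 so the field spans
    # [-field_height / 2, +field_height / 2].
    target = (touch_y / float(screen_height) - 0.5) * field_height
    # Keep the whole paddle inside the field.
    limit = (field_height - paddle_height) / 2.0
    return max(-limit, min(limit, target))

def step_towards(current_y, target_y, max_speed, dt):
    """Move the paddle towards the target without exceeding max_speed."""
    max_step = max_speed * dt
    delta = target_y - current_y
    return current_y + max(-max_step, min(max_step, delta))
```

Calling step_towards() once per frame, instead of jumping straight to the touched position, keeps the paddle from teleporting under the finger.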

short story: a single shot

So, while I’m still gathering material for the technical post, I’ll leave here a short story written in a very idle moment. I wrote it in Portuguese (my native language, BTW); it is given here in English translation.

*.*.*

a single shot

This was his chance. His last chance. If he got it right, he would put an end to this whole situation, save many lives, perhaps earn several decorations, and come out as the hero of the story for the first time in his entire life.

Except that it would be not only the first time but also the last, and he would not be a hero, but a martyr.

This war is lost. Hidden in the shadows with what was left of his companions, his thoughts race and an idea begins to gain strength. He could permanently cripple the enemy camp, but at what cost? A suicide mission was the only alternative: sneak out during the night and plant explosives in the weapons depot. The confusion would be such that a surprise attack would destroy the camp of tired, undernourished men, not very different from himself, and secure passage across the border, only a few kilometers in that direction, an objective that could be accomplished before dawn. As for himself, he would be in the middle of a crowd of soldiers just waiting to pull the trigger at someone in a slightly different uniform. And it would not take long for that to happen; he would be lucky to die quickly.

Even with a great chance of success, his plan had met fierce resistance from his compatriots. “It is not worth risking yourself for something that has very little chance of working”, one of them said. Indeed, should the plan fail, he was fully aware that his enemies did not usually take prisoners of war.

What to do? He would have to act doubly hidden, from the enemy and from his compatriots. He could count only on himself to plan and execute his own plan in defiance of everyone. He needed to think fast, because day was approaching and then it would be too late.

Almost everyone is on the ground, some sleeping and others exhausted. A few are standing, keeping watch. And he passes wide of them all without alerting a single soul to his presence. Never had the jungle warfare training been so useful; he was practically a shadow moving from one tree to the next. He carried the secretly prepared backpack, with enough explosives to bring down a small building, and headed toward his objective. In the enemy camp the situation is similar to that of his own camp, with few men standing, and he easily reaches the large tent sheltering various kinds of ammunition and grenades. He throws the backpack inside, goes to the maximum distance at which the transmitter works, a few dozen meters, gets behind a tree, and closes his eyes. Then he closes the mechanism he was carrying in his right hand.

It was as bright as the sun itself, and as loud as a volcano. His companions leap up without knowing what to do in that confusion, but within a few seconds everything becomes clear. The suicide plan had been carried out. Nothing more could be done about the future martyr, except to gather what strength remained and charge with everything at the completely disordered camp. In a matter of minutes, the few soldiers acting as lookouts were easily shot, and the rest surrendered easily or were swatted down like flies. At last, a victory, a small victory in the middle of that war. And practically all of them managed to get past the trenches and cross the border, heavily defended by their allies.

Practically. Only one soldier did not cross the border, and he was reported missing. Instead of decorations, a conviction in absentia for gross insubordination and desertion, crimes that would have taken him before a court-martial had he been present. Firing squad. Instead of a hero or a martyr, an insubordinate and a deserter. He had become a pariah, rejected in his own country and hunted like an animal by the enemy country.

The flash of the explosion showed him a little of the jungle beyond, where he could see a stream and a small boat tied to the bank. He untied the boat and went on, aware of all the crimes he had committed in the eyes of his compatriots. With nowhere to go, he spent days in the forest, using everything he had learned in the jungle survival course to stay alive. When he was already on the verge of physical and mental collapse, he came to a small village, isolated and oblivious to the events of the previous days, where he was received simply as a man lost in the forest and nothing more. He was given a place to sleep and something to eat, and for the first time in many years he was treated not as a subordinate or an enemy, but as an equal. And he decided, in that moment, to start a new life in that distant place where the horror of war had not yet arrived.

Several years later he was just a name in a dead file of his country’s military court, and the enemy had given up looking for him. But it is known that he lived for a good while after that and that perhaps he even started a family, according to accounts from the inhabitants of the old village, today a city, where he spent the rest of his days. And finally he was recognized as a war hero because of the number of lives he helped save. Only with time did he have his effort recognized, like so many others.

In the brief moment in which his hand closed the mechanism, there was only one thought in his head: “This is my only chance to save, instead of destroying”.

Developing games: the first steps

So, after playing great indie games like Fez, Super Meat Boy and Braid, and also seeing a documentary about the people behind them, I finally decided to start making games, instead of just playing them. In fact, my real motivation for becoming a Computer Scientist was just that: to make games. But as time passes, so do your goals, and sometimes you’re too distracted by the other aspects of your life to remember the ideals of the past. And so, I studied for many years to become a Scientist and worked for many more accumulating experience as a software developer, working on projects big and small, open source and proprietary.

I’ve played hundreds, maybe a thousand games over the course of my life, on various videogames and PCs, and when I was around 9 years old I was copying BASIC games from small books into my Hotbit computer, watching in awe how those hours of typing code could become something I could control and interact with. Fast forward 21 years and I’m here again, with modern computers, languages and algorithms, but the same purpose: to transform code into something I can play. But with a fundamental difference: I’ll make my own games now.

After searching for some tools, these are the ones I’m trying to learn and code right now:

  • cocos2d-x: A multiplatform library for making 2D games. As a port from cocos2d-iphone, its focus seems to be mobile platforms, and most of the examples and tutorials around the Web talk about just that. Its C++ API looks very complete but also very complex, with classes to do almost anything for you; lots of examples show the power of this framework for the ones willing to learn.
  • Polycode: According to the official Web page, “a free open-source framework for creating crossplatform games & interactive applications”. It has C++ and Lua APIs, based on various open source libraries, along with an IDE for managing game content and developing Lua programs. Although it is not quite finished yet, its API looks very nice, with less verbosity than cocos2d-x for the same tasks; while I wait for a more proper release, I’m learning to do some simple examples. Has 2D and 3D capabilities.
  • MonoGame: An open source implementation of Microsoft XNA, which itself is an API for game development in C#. Most of my life I’ve worked with Linux and open source software and platforms, and although technically I could use MonoGame on Linux, it is more easily done using VS2010/2012 on Windows, which in itself is a departure from these many years. But, as I started to learn how it works, I’m trying to be more open-minded and not dismiss it right away because it is a port of a Microsoft technology; instead, I’m approaching MonoGame with a positive attitude and trying to do something concrete with it. To put things in perspective, Fez and Bastion were made with XNA and later ported to other platforms with MonoGame.

I’m approaching each one of them in turn, trying to make a simple movable sprite in each API. So far, I’ve managed to do it with MonoGame, which is where I got the furthest, and hit some hurdles in cocos2d-x and Polycode. Still, I think these are all powerful tools for easing the process of making a game, and their creators have invested (or are still investing) a lot of time and work. Maybe I could use other tools in the near future or join skilled people with the same goal, allowing me to focus on the programming itself. But for now, back to the drawing board and to the code editor.

Parsing a very large XML file with Python

In the process of doing some experiments, I was looking for a suitable text corpus to use; the ideal candidate would be a few GB in size and would not cost me anything. Stumbling around, I found out that Wikipedia publishes various dumps of its pages, and decided that the 2013 dump would be a suitable bag of text for my experiments: a single XML file of 9GB, which by the way is the compressed size. The actual size is 42GB. I had on my computer the largest XML file I’ve seen in my life, and I didn’t have enough disk space left to even extract it; parsing it would be a hell of a job.

Fortunately I already knew how to parse XML with the nice lxml Python module, but how to proceed when I just cannot extract the file? It turns out that there is a module called bz2 which exposes an interface for opening .bz2 files in the same fashion the open() function does. After some hours of struggling, searching around several sites, blogs and forums and filling all of memory + swap more than once, I managed to parse the file without blowing up the RAM with this Python script:

from lxml import etree
import sys
import bz2
import unicodedata

TAG = '{http://www.mediawiki.org/xml/export-0.8/}text'

def fast_iter(context, func, *args, **kwargs):
    # http://www.ibm.com/developerworks/xml/library/x-hiperfparse/
    # Author: Liza Daly
    # modified to call func() only in the event and elem needed
    for event, elem in context:
        if event == 'end' and elem.tag == TAG:
            func(elem, *args, **kwargs)
        elem.clear()
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    del context

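def process_element(elem, fout):
    global counter
    # Empty <text> elements have elem.text == None; treat them as empty
    # strings so the literal word "None" is not written to the output.
    text = unicode(elem.text or u'')
    # Decompose accented characters, drop anything non-ASCII, lowercase.
    normalized = unicodedata.normalize('NFKD', text)
    normalized = normalized.encode('ASCII', 'ignore').lower()
    print >>fout, normalized.replace('\n', ' ')
    if counter % 10000 == 0:
        print "Doc " + str(counter)
    counter += 1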

def main():
    fin = bz2.BZ2File(sys.argv[1], 'r')
    fout = open('2013_wikipedia_en_pages_articles.txt', 'w')
    context = etree.iterparse(fin)
    global counter
    counter = 0
    fast_iter(context, process_element, fout)

if __name__ == "__main__":
    main()

Update (06/07/2013): This code is now available as a Github gist.

The output format is a single file with one document per line. It would probably be better to put all these lines in a DBMS or a Berkeley DB file, as the lines can be really, really big, but for now the plain text format will suffice. Every document is normalized to contain only ASCII characters, avoiding potential encoding problems, but besides that no further processing is done. I parsed only the tag “text” from the XML, but if you need to parse more than one tag, change TAG into a set and test membership with in:

TAG = {
    '{http://www.mediawiki.org/xml/export-0.8/}text',
    '{http://www.mediawiki.org/xml/export-0.8/}title'
}
(...)
        if event == 'end' and elem.tag in TAG:

The URL enclosed in braces is the namespace of the tags, which lxml annoyingly insists on using, so I just ended up using the “full name” of the tag I needed. Also, I know that globals are evil; I just wanted to easily count the number of documents processed and I was too lazy to properly write a class. I killed the process when the output file reached around 20GB, which I considered big enough for my experimental needs. Now all that is left is to actually run the benchmarks and see what happens.

And just for the record: I’m not a Python specialist, so feel free to point out any problems or bottlenecks in this code.
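
The script above is Python 2 (print >>fout and unicode() no longer exist in Python 3). For anyone on Python 3, the same streaming idea can be sketched roughly as below. Treat it as a sketch under assumptions rather than a drop-in port: it uses the standard library’s xml.etree.ElementTree instead of lxml, so the memory cleanup is done by clearing the root element rather than with getprevious()/getparent(), and the names normalize and iter_texts are made up for this example. A generator plus enumerate() also replaces the global counter.

```python
import bz2
import io
import unicodedata
import xml.etree.ElementTree as ET

TAG = '{http://www.mediawiki.org/xml/export-0.8/}text'

def normalize(text):
    """Strip accents, drop non-ASCII characters, lowercase, flatten newlines."""
    ascii_bytes = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore')
    return ascii_bytes.decode('ascii').lower().replace('\n', ' ')

def iter_texts(stream):
    """Yield the normalized content of each <text> element, one at a time."""
    context = ET.iterparse(stream, events=('start', 'end'))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == 'end' and elem.tag == TAG:
            yield normalize(elem.text or '')
            root.clear()  # drop what was parsed so far to keep memory flat

if __name__ == "__main__":
    # Tiny in-memory stand-in for the real dump, compressed like the original.
    sample = (
        '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">'
        '<page><revision><text>First\ndocument</text></revision></page>'
        '<page><revision><text>Second</text></revision></page>'
        '</mediawiki>'
    ).encode('utf-8')
    stream = bz2.BZ2File(io.BytesIO(bz2.compress(sample)))
    for counter, line in enumerate(iter_texts(stream)):
        print(counter, line)
```

For a real dump, bz2.BZ2File(sys.argv[1]) would take the place of the in-memory stream, and each yielded line would be written to the output file instead of printed.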

Welcome! Welcome to City 17…

Yeah, I know I’ve used this little Half-Life joke before, but it never gets old. Anyway, this will be here for some time, while everything is being calibrated. Ciao!

 
