Sunday, November 20, 2011

cps (count per second)

I just published a small Linux utility named "cps" that lets you count the number of lines and bytes per second piped into it. I wanted to count the number of lines appended to a file (like a log) but couldn't find a handy tool for it, so here it is: https://github.com/appaquet/cps
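
To give an idea of what such a counter does, here is a rough Python sketch. It is not the code of cps itself: the function names, the one-second interval and the injectable clock are my own assumptions for illustration.

```python
import sys
import time


def count_rates(stream, interval=1.0, clock=time.monotonic):
    """Yield (lines_per_second, bytes_per_second) once per elapsed interval."""
    lines = 0
    size = 0
    start = clock()
    for chunk in stream:
        lines += 1
        size += len(chunk)
        elapsed = clock() - start
        if elapsed >= interval:
            # Report the rates for this window, then start a new window.
            yield lines / elapsed, size / elapsed
            lines, size, start = 0, 0, clock()


def main():
    # Read raw bytes so the byte count is not affected by text decoding.
    for lps, bps in count_rates(sys.stdin.buffer):
        print("%.1f lines/s, %.1f bytes/s" % (lps, bps))
```

Calling main() at the end of a script and piping something like tail -f on a log file into it prints one line per interval. Note that this sketch only reports when a new line arrives, so it stays silent while the input is idle.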

Monday, February 21, 2011

GoStore going OpenSource

This post pretty much signals the end of my software engineering degree: finally! In a couple of weeks, I'll be done with school and will be working full time :) !!!

Anyway, this post isn't really about the end of my degree, but about my end-of-degree project. My work at Wajam made me discover the world of distributed and scalable systems, and I decided to dive into it by creating my own distributed system. As I like learning new technologies, I decided to write a distributed file system in Go, a new programming language announced by Google at the end of 2009. Go combines the simplicity of a garbage-collected language, the speed of a natively compiled language and new concurrency mechanisms. It's not an object-oriented language in the classical sense since it doesn't support inheritance, but it supports polymorphism through interfaces.

Originally, GoStore was only a distributed file system with basic features (read, write, delete, etc.), but as I was programming it, I decided to make its communication core generic. Using this generic core, it's possible to easily create distributed systems in a clustered environment. GoStore nodes talk to each other through remote procedure calls (RPC) that are sent over UDP or TCP, depending on the call's payload, using a custom binary protocol.
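
The transport choice can be pictured with a small Python sketch. This is not GoStore code (GoStore is written in Go): it assumes the decision is based on payload size, and the 1024-byte cutoff and function name are hypothetical.

```python
# Hypothetical cutoff: payloads up to this size are sent as a single datagram.
MAX_UDP_PAYLOAD = 1024  # bytes; an assumed value, not taken from GoStore


def choose_transport(payload, max_udp_payload=MAX_UDP_PAYLOAD):
    """Return "udp" for small payloads and "tcp" for everything else."""
    if len(payload) <= max_udp_payload:
        return "udp"
    return "tcp"
```

The idea behind such a split is that small calls avoid the cost of a TCP connection, while large payloads get TCP's ordering and retransmission.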

You can browse GoStore code on GitHub, but as the disclaimer says, it's a toy project and really not mature.

Tuesday, January 5, 2010

Blocking Django admin for non-admins

Django is a fantastic web framework that makes web programming a breeze. One of its key features is the admin site, which lets you create an admin panel almost without a line of code. One problem I had with it is that the admin login page can be accessed by everyone, even non-admins. Since I don't want people to see that page, I was looking for a way to hide the admin site and show it only to users with the superuser flag, but didn't find anything useful. The solution is quite simple: create a middleware that disables every admin view unless you are logged in as a user with the superuser flag. Here is what the middleware should look like:

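from django.http import Http404

class AdminDisableMiddleware(object):
    """Hide every admin view from users who are not superusers."""

    def process_view(self, request, view_func, view_args, view_kwargs):
        # Dotted path of the view (module plus function name).
        # getattr guards against callables that lack __module__ or __name__.
        full_view_name = '%s.%s' % (getattr(view_func, '__module__', ''),
                                    getattr(view_func, '__name__', ''))
        # request.user is set by AuthenticationMiddleware, which must run first.
        if full_view_name.startswith('django.contrib.admin') and not request.user.is_superuser:
            raise Http404()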

Note that I'm raising an HTTP 404 error, which should fool people trying to find the admin pages.
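
Since the middleware reads request.user, it has to be listed after Django's authentication middleware. A settings fragment could look like the sketch below; the myproject.middleware module path is a hypothetical placeholder, so adjust it to wherever you put the class.

```python
# settings.py fragment (old-style MIDDLEWARE_CLASSES, matching the class above).
MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # Hypothetical module path; point it at wherever the class actually lives.
    'myproject.middleware.AdminDisableMiddleware',
)
```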

Thursday, December 3, 2009

Midealy

Finally, the project I have been working on for months now is online! Ok, it’s a closed alpha release, but it’s a major step for us! So, what is this all about?

The idea behind Midealy is to connect innovators, resources and organizations together and help innovators develop their ideas collectively. First, it uses the power of social networking to match innovators with the people they need to develop their ideas. It also provides tools for those innovators to develop their ideas (wiki, checklists, etc.) and a way to tell the community what they need (human, technical or financial resources). Those needs are shown in a dedicated section of Midealy where people can find needs they can fulfill and then join the idea. Finally, organizations can use Midealy to solve their problems by submitting challenges that people answer with ideas. This concept is called Open Innovation.

Maybe it’s hard to see what Midealy is all about from my description (maybe because I’m not that good at explaining it in English), but giving it a try should help you find out. You can get an invitation by submitting your email address on Midealy, and you will receive an email as soon as we are ready for more users!

Thursday, July 16, 2009

Firefox + FirePHP

After a couple of hours of debugging, I have finally found a solution to a problem I was having with Firefox not rendering the result of one of my form submissions (it was working fine in Chromium on Ubuntu). When I submitted the form, Firefox simply didn't handle the response and therefore didn't render anything. The server was getting the request and generating a response, but Firefox seemed to have a problem handling it. The same problem appeared when I tried to submit the form through AJAX using the jQuery form plugin, except that the XHR simply returned with a status code of "0" (zero).

The problem was in fact the special headers sent by the server to be displayed in Firebug through the FirePHP addon. Since the form was generating many SQL queries and my framework logs those queries to Firebug (through firepy), Firefox was failing to handle the headers correctly. I don't know if it was because one of them was invalid or simply because there were too many, but after I disabled this logging, Firefox was back to normal.

Monday, March 9, 2009

Hbase: Too many open files

While running a job that was inserting a lot of records into HBase (~7,000,000 records), I was getting this exception (in the datanode log file):

2009-03-09 20:26:21,072 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(127.0.0.1:50010, storageID=DS-1758498865-127.0.1.1-50010-1236539641254, infoPort=50075, ipcPort=50020):DataXceiveServer: java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:145)
    at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:130)
    at java.lang.Thread.run(Thread.java:619)


The error is explicit, but I had trouble raising this limit until I found this post. It seems you have to modify two files to raise the limit on Ubuntu. First, add the following line to /etc/pam.d/common-session (as root):
  • session required pam_limits.so
After enabling the PAM limits module, edit /etc/security/limits.conf to set an open-file limit for the user that runs Hadoop (in my case, hadoop). Add the following lines, adjusted to your installation:
  • hadoop hard nofile 65000
  • hadoop soft nofile 30000
Log out and log back into your hadoop user account, and your limit should be raised. To check, run the following command: ulimit -n. It should return 30000 (the soft limit we set).
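
If you prefer to check from inside a process rather than from the shell, Python's standard resource module exposes the same limits on Unix-like systems. A small sketch (the helper name is my own):

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
print("soft=%d hard=%d" % (soft, hard))
```

Run as the hadoop user after logging back in, the soft value should match what ulimit -n reports.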

Saturday, November 29, 2008

Thrift + HBase + PHP

Thrift is a framework initially developed by Facebook that allows RPC communication between a client and a server written in C++, Java, Python, PHP or Ruby. From a definition file, it generates data types and service interfaces in the programming language you specify.

HBase has a Thrift interface, so it can be called from any language supported by Thrift. It's the easiest way right now to call HBase from PHP. Here is a little tutorial on how to install Thrift on Ubuntu and how to generate the PHP client files used to access HBase.

First of all, for any information about Thrift, you can have a look at the Thrift wiki. This tutorial also assumes that you have Apache and PHP 5 installed and working on your Ubuntu system.

Installing requirements
(From: http://wiki.apache.org/thrift/GettingUbuntuPackages)
You need to install the following packages in order to compile Thrift:
  • Automake
  • LibTool
  • Flex
  • Bison
  • Boost Libraries
In a shell, type:
sudo apt-get install build-essential automake libtool flex bison libboost*
Installing Thrift
(From: http://wiki.apache.org/thrift/ThriftInstallation)
Get the latest Thrift sources, extract them and go into the directory:
wget -O thrift.tgz "http://gitweb.thrift-rpc.org/?p=thrift.git;a=snapshot;h=HEAD;sf=tgz"
tar -xzf thrift.tgz
cd thrift
Note that the above source didn't compile for me on Ubuntu 8.10. I had to use a specific snapshot from http://gitweb.thrift-rpc.org/?p=thrift.git;a=snapshot;h=1c8c4bb279578cb76bfcaa419d5b06fb7a187614;sf=tgz

Let's now configure, compile and install Thrift:
./bootstrap.sh
./configure
make
sudo make install
Generating Thrift client libraries
You should now have a fresh Thrift installation. We now need to generate the PHP files that your application will include in order to access HBase. The HBase definition file is included in the HBase sources, so we don't have to write it. If you have installed HBase into /usr/local/hbase/, you can copy the Thrift definition file into your home directory:
cp -r /usr/local/hbase/src/java/org/apache/hadoop/hbase/thrift ~/thrift_src
cd ~/thrift_src
Otherwise, modify the above command to match your HBase installation path. We can now generate the PHP files:
thrift -php Hbase.thrift
If you have followed all the above steps correctly, Thrift should have generated a directory named gen_php which contains two PHP files. Those two files contain the classes you will use to access HBase. They also depend on the Thrift base files, which you can find in the Thrift source directory. The following steps assume that your Apache document root is /var/www. Let's copy the Thrift base files and create a "packages" directory which will contain the previously generated files.
cp -r ~/thrift/lib/php/src /var/www/thrift
mkdir /var/www/thrift/packages
cp -r ~/thrift_src/gen_php /var/www/thrift/packages/Hbase

Let's now start the HBase Thrift server.
/usr/local/hbase/bin/hbase thrift start
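
Once the server is started, a quick way to confirm that it accepts connections is to try opening a TCP socket to it. Here is a small Python sketch; the helper name is my own, and port 9090 in the usage note is an assumption (I believe it is the Thrift server's default, but check your configuration).

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable or timed out: nothing usable is listening.
        return False
    conn.close()
    return True
```

For example, is_listening("localhost", 9090) should return True while the Thrift server is running. It only tells you the port is open, not that the service behind it is healthy.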
Everything you need to access HBase from PHP is now in place. To test the installation, let's use the demo client from the HBase sources.
cp /usr/local/hbase/src/examples/thrift/DemoClient.php /var/www/DemoClient.php
You need to modify the above file (/var/www/DemoClient.php) to change the Thrift root path. Simply change the value of $GLOBALS['THRIFT_ROOT'] to /var/www/thrift. You should now be able to access the file through Apache. The script simply tests your HBase installation by creating a table, inserting data into it, etc.

That's it. If you have any questions, contact me!