Sunday, January 27, 2013

M101: MongoDB for Developers

I am pleased to announce that I completed the M101: MongoDB for Developers course with distinction. My final grade of 95% puts me in the top 5% of all 21,116 students who registered for the course.

You can find other course statistics at the link.

And this is my certificate:

If you intend to get started with MongoDB, this course is the right place to begin.

Wednesday, October 3, 2012

Before you start coding in Node.js

Riding the Node.js hype, a lot of people are talking about Node.js, but do they really know how it works?
Before you start coding in Node.js, you need to understand how it works internally and grasp a new programming paradigm. So, in my talk I'll begin with the internal architecture of Node.js and the core libraries it uses on Windows and Linux. It is very important to understand the concept of event-driven programming, as well as how the event loop works internally, in order to avoid writing bad code or applying Node.js to a job it was not designed for. The fundamental concepts behind Node.js are very simple, but first you have to learn them.
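The ordering guarantee at the heart of the event loop can be seen in a few lines (a minimal sketch, not from the talk):

```javascript
// Callbacks never interrupt running code: they are queued and only
// executed once the current call stack has unwound.
const order = [];

order.push('sync 1');

// setImmediate schedules the callback for a later event-loop iteration.
setImmediate(() => {
  order.push('callback');
  console.log(order.join(' -> ')); // sync 1 -> sync 2 -> callback
});

order.push('sync 2');
```

No matter how the timing works out, the callback always runs after both synchronous pushes; that is the event loop at work.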

Finally, I will talk about how to do profiling in Node.js, share some performance tips, and present some real cases where Node.js is being used.
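As a small taste of the performance side, Node.js ships simple timing helpers that are often enough before reaching for a real profiler (an illustrative snippet, not from the talk; the label and workload are my own):

```javascript
// Quick-and-dirty timing with console.time / console.timeEnd.
console.time('build');
const arr = [];
for (let i = 0; i < 1e6; i++) arr.push(i * 2);
console.timeEnd('build'); // prints something like "build: 15ms"
```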

Presentation Topics

  • Internal Architecture
  • Asynchronous I/O
  • Event Driven Programming
  • EventLoop
  • Is Node.js single-threaded?
  • Flow control
  • Performance tips
  • WebSockets
  • Who is using Node.js, and how

If you are interested in this talk, please contact me.

Friday, May 25, 2012

Hash read performance (perl,ruby,python,node.js)

After having read the chapter "How Hashes Scale From One To One Million Elements" from the book "Ruby Under A Microscope" (this post does not replace reading the respective chapter), I decided to play around with the subject, run some tests with other languages, and do a small analysis of the performance of each. The chosen languages were Perl, Ruby, Python and JavaScript (node.js).

The tests focused on measuring the time, in ms, taken to retrieve an element from a hash with N elements 10,000 times, and were based on the tests performed in the book.

The tests were run on an Amazon EC2 server instance (free tier) with the following characteristics:
  • Ubuntu Server 12.04 LTS
  • Instance type micro
  • Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
  • 604,364 KB of memory
  • 64-bit platform

Test description

Programs were implemented for each of the languages to be tested, keeping them as similar as possible. The implementation creates hashes with sizes equal to powers of 2, with exponents ranging from 1 to 20.

For each of the hashes created, 10,000 gets of the value for a key (key=target_index) are performed, and the time is measured in milliseconds.

The measured values are written to an output file, to be consumed later by a program that generates the graphs.
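The procedure above can be sketched in JavaScript (a sketch of the described setup, not the original scripts; the key naming is my own):

```javascript
// Build hashes of 2^1 .. 2^20 keys and time 10,000 reads of one key
// in each, reporting the elapsed time in milliseconds.
function timeReads(size, reads) {
  const hash = {};
  for (let i = 0; i < size; i++) hash['key' + i] = i;
  const target = 'key' + (size - 1);   // the key we will repeatedly fetch
  const start = process.hrtime.bigint();
  let v;
  for (let i = 0; i < reads; i++) v = hash[target];
  const end = process.hrtime.bigint();
  return Number(end - start) / 1e6;    // nanoseconds -> milliseconds
}

for (let exp = 1; exp <= 20; exp++) {
  const ms = timeReads(2 ** exp, 10000);
  console.log(`2^${exp}\t${ms.toFixed(3)} ms`);
}
```

Each line of output corresponds to one hash size, which is exactly the shape of data the graphs below are drawn from.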

Comparing Ruby Versions

From the graphs we can see big improvements from Ruby 1.8 to Ruby 1.9, so 1.9 is the way to go.

Time taken to retrieve 10,000 values (ms)

Comparing Perl Versions

Perl is a language with quite a few years behind it, and its performance has been stable at least since version 5.8; the results show it. Still, I had hoped for more commitment from the community in trying to optimize these values.

Time taken to retrieve 10,000 values (ms)

Comparing Python Versions

The Python versions show few variations and some stability across versions. However, Python 3.2.3 shows a slight degradation in performance.
I may also say that its performance is quite interesting.

Time taken to retrieve 10,000 values (ms)

Comparing Languages

Represented in the graph below are the tests for all languages and their respective versions.

Time taken to retrieve 10,000 values (ms)

Python was the winner among the interpreted languages. node.js shows great performance; however, the timing library used in node.js is not accurate enough to give us reliable values.
You can easily run the tests in your environment using my scripts; the source code used in these tests can be found on GitHub. I used rvm, perlbrew and pythonbrew to switch between versions, and you can do the same if you want. Nevertheless, you can use your installed versions.

Run the tests:

$ ./experiment1.rb
$ ./
$ ./

The execution of these scripts will create an output file named "values.#{lang}-#{version}" (e.g. values.perl-5.8.8) for each script.

To generate the graph, just run the Ruby script (requires the googlecharts gem):

$ ./chart.rb

This script will output the link to the googlechart image.

In the near future I might do the same for hash writes, which should be even more interesting.

Friday, April 13, 2012

Count the number of 1's

I've participated in a quiz where I had to implement a function that counts the total number of 1 digits that appear in all integers from 1 up to a positive integer passed as a parameter.

countOfOnes(12) = 5
1  - one 1 so far
10 - two 1's so far
11 - four 1's so far
12 - five 1's so far

Since countOfOnes(1) = 1, they asked me to find the next number for which the parameter equals the return value of countOfOnes (countOfOnes(n) = n).

As a first attempt, I implemented a Perl one-liner that returns the number of 1's for an input number. Here it is:

$ perl -e '$sum += () = /1/g for 1..shift; print "NumberOfOnes=$sum\n"' 111

However, this solution was inefficient due to the high number of invocations needed to find the solution to the quiz.

So, after analyzing how the value of numOfOnes evolves mathematically, I outlined the following recursive algorithm:

# NumOfOnes: number of '1' digits appearing in 1..$n
use Memoize;
memoize('num_of_ones');

sub num_of_ones{
  my $n = shift;

  return 0 if $n < 1;
  return 1 if $n <= 9;

  my @a = split('',$n);

  my $len = length($n);
  my $delta = 0;

  if($a[0] > 1){
    # leading digit > 1: a full block of 10^(len-1) numbers has a leading 1
    $delta = 10**($len-1);
  }
  else{
    # leading digit is 1: only substr($n,1)+1 numbers have the leading 1
    $delta = substr($n,1) + 1;
  }

  my $sum = $delta + $a[0] * num_of_ones(9 x ($len-1)) + num_of_ones(substr($n,1));
  return $sum;
}
Since the recursive function is invoked repeatedly with the same parameters, I used the Memoize module, which caches the results of calls made with the same arguments.

Although the previous function is more efficient for calculating numberOfOnes for a single number, it is not efficient when you want to increment all the way up to the next number that satisfies the requested condition. Once we have numOfOnes for the previous number, we only need to count the 1's that exist in the current number and add them to the previous total.

The following Perl one-liner is a better approach to achieve that:

perl -e 'while(1){$sum += () = ++$i =~ /1/g; print "Found:$i\n"  if $i == $sum;}'


The next number after 1 that satisfies the condition countOfOnes(n) = n is 199981.
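To sanity-check the recursive decomposition, it can be ported to JavaScript and compared against a brute-force count (my own port, not the original Perl; the memo Map mirrors what the Memoize module does above):

```javascript
// Brute force: count the '1' digits in every integer from 1 to n.
function bruteForce(n) {
  let sum = 0;
  for (let i = 1; i <= n; i++) {
    sum += String(i).split('').filter(c => c === '1').length;
  }
  return sum;
}

// Recursive leading-digit decomposition, memoized.
const memo = new Map();
function countOfOnes(n) {
  if (n < 1) return 0;
  if (n <= 9) return 1;
  if (memo.has(n)) return memo.get(n);

  const s = String(n);
  const lead = Number(s[0]);
  const rest = Number(s.slice(1));     // n without its leading digit
  const power = 10 ** (s.length - 1);

  // ones contributed by the leading position
  const delta = lead > 1 ? power : rest + 1;
  const result = delta
    + lead * countOfOnes(power - 1)    // ones in the lower positions
    + countOfOnes(rest);

  memo.set(n, result);
  return result;
}

console.log(countOfOnes(12));     // 5
console.log(countOfOnes(199981)); // 199981
```

The brute-force version does O(n) work per call, while the decomposition only recurses once per digit, which is what makes the quiz search feasible.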

The source code is here.

Thursday, March 10, 2011

How to collect CPU values from WMI. #perl #win32 #wmi

This program demonstrates how easy it is to obtain data from Windows Management Instrumentation (WMI) using a Perl script. By analogy, other information may be collected from WMI.

To obtain these data, I used the WMI performance counter class "Win32_PerfRawData_PerfOS_Processor", which provides the counters that measure aspects of processor activity.

Since the data collected via WMI are raw counters, I just needed to calculate the percentage of time for each counter over the time interval between measurements.

use strict;
use warnings;

use Win32::OLE qw(in);

my $interval = 1;
my $key = 'Name';
my @properties = qw(PercentIdleTime PercentProcessorTime PercentPrivilegedTime PercentUserTime PercentInterruptTime TimeStamp_Sys100NS);

my $hash1 = {};

my $wmi = Win32::OLE->GetObject("winmgmts://./root/cimv2")
  or die "Failed to get object\n";

my $list = $wmi->InstancesOf('Win32_PerfRawData_PerfOS_Processor')
  or die "Failed to get instance object\n";

# first sample: remember the raw counters per processor instance
foreach my $v (in $list) {
  map { $hash1->{$v->{$key}}->{$_} = $v->{$_} } @properties;
}

while (sleep $interval) {

  $list = $wmi->InstancesOf('Win32_PerfRawData_PerfOS_Processor')
    or die "Failed to get instance object\n";

  my $hash = {};
  foreach my $v (in $list) {
    map { $hash->{$v->{$key}}->{$_} = $v->{$_} } @properties;
  }

  my $cpu_time = sprintf("%.2f", (1 - get_value($hash1->{'_Total'}, $hash->{'_Total'}, 'PercentProcessorTime')) * 100);
  my $cpu_idle = sprintf("%.2f", 100 - $cpu_time);
  my $cpu_user = sprintf("%.2f", get_value($hash1->{'_Total'}, $hash->{'_Total'}, 'PercentUserTime') * 100);
  my $cpu_priv = sprintf("%.2f", get_value($hash1->{'_Total'}, $hash->{'_Total'}, 'PercentPrivilegedTime') * 100);
  my $cpu_int  = sprintf("%.2f", get_value($hash1->{'_Total'}, $hash->{'_Total'}, 'PercentInterruptTime') * 100);
  printf "CPU Time %s %% , privileged %s %% , user %s %%, interrupt %s %%\n", $cpu_time, $cpu_priv, $cpu_user, $cpu_int;

  $hash1 = $hash;
}

# convert a pair of raw counter samples into a fraction of the
# elapsed time between them
sub get_value {
  my ($h1, $h2, $property) = @_;
  return ($h2->{$property} - $h1->{$property}) / ($h2->{'TimeStamp_Sys100NS'} - $h1->{'TimeStamp_Sys100NS'});
}

Output sample:

CPU Time 2.03 % , privileged 1.96 % , user 0.00 %, interrupt 0.00 %
CPU Time 1.87 % , privileged 1.96 % , user 0.00 %, interrupt 0.00 %
CPU Time 2.16 % , privileged 1.96 % , user 0.00 %, interrupt 0.00 %
CPU Time 1.76 % , privileged 1.96 % , user 0.00 %, interrupt 0.00 %
CPU Time 2.19 % , privileged 1.96 % , user 0.00 %, interrupt 0.00 %
CPU Time 1.77 % , privileged 1.96 % , user 0.00 %, interrupt 0.00 %
CPU Time 1.98 % , privileged 1.96 % , user 0.00 %, interrupt 0.00 %
CPU Time 1.93 % , privileged 1.96 % , user 0.00 %, interrupt 0.00 %
CPU Time 2.08 % , privileged 1.96 % , user 0.00 %, interrupt 0.00 %
CPU Time 2.84 % , privileged 2.94 % , user 0.00 %, interrupt 0.00 %

Wednesday, September 8, 2010

Diff between two files (unordered and with different number of lines)

This program finds the differences between two files, and is meant for the following situations:
  • The files have different numbers of lines
  • The files are unordered
  • `diff` fails to get the differences
  • The files are large (the reason for using the shell `sort` command)
  • There is a single record per line
For example: imagine you have two files containing lists of users, taken from a database at different moments in time. These listings will differ because users have been added to and removed from the database. This script will help you find the differences when `diff` can't resolve them.

#!/usr/bin/perl -w

use strict;

my $file1 = $ARGV[0];
my $file2 = $ARGV[1];

unless ($file1 and $file2) {
  print "Usage: $0 <file1> <file2>\n\n";
  exit 1;
}
unless (-f $file1) {
  print "File 1 does not exist: [$file1]\n\n";
  exit 1;
}
unless (-f $file2) {
  print "File 2 does not exist: [$file2]\n\n";
  exit 1;
}

my $tmp_file1 = '/tmp/f1.tmp';
my $tmp_file2 = '/tmp/f2.tmp';

# sort the files on disk so we can walk them line by line
`sort $file1 > $tmp_file1`;
`sort $file2 > $tmp_file2`;

open(F1, $tmp_file1) or die "$!";
open(F2, $tmp_file2) or die "$!";

my $read_f1 = 1;
my $read_f2 = 1;

my $s1;
my $s2;

while (1) {

  # one file ran out: dump the remainder of the other
  if (eof(F1)) { print ">>$_" while <F2>; }
  if (eof(F2)) { print "<<$_" while <F1>; }

  if ($read_f1) { $s1 = <F1>; }
  if ($read_f2) { $s2 = <F2>; }

  last unless $s1 and $s2;

  $read_f1 = 1;
  $read_f2 = 1;

  next if ( lc($s1) eq lc($s2) );

  # advance only the file holding the smaller line
  if (lc($s1) gt lc($s2)) {
    print ">$s2";
    $read_f1 = 0;
  }
  else {
    print "<$s1";
    $read_f2 = 0;
  }
}

unlink $tmp_file1 or die "$!";
unlink $tmp_file2 or die "$!";
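The same two-pointer walk over the sorted lists can be sketched in JavaScript (my own port for illustration, operating on in-memory arrays instead of temp files; names are invented):

```javascript
// Walk two sorted arrays in lockstep: lines present only in `a` are
// reported with '<', lines present only in `b` with '>'.
function diffSorted(a, b) {
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    const x = a[i].toLowerCase(), y = b[j].toLowerCase();
    if (x === y)    { i++; j++; }            // same record in both files
    else if (x > y) { out.push('>' + b[j]); j++; }
    else            { out.push('<' + a[i]); i++; }
  }
  // one side ran out: everything left is a difference
  while (i < a.length) out.push('<' + a[i++]);
  while (j < b.length) out.push('>' + b[j++]);
  return out;
}

const oldUsers = ['alice', 'bob', 'carol'].sort();
const newUsers = ['alice', 'carol', 'dave'].sort();
console.log(diffSorted(oldUsers, newUsers)); // [ '<bob', '>dave' ]
```

Sorting first is what makes the single linear pass possible, which is why the script above shells out to `sort` before comparing.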

Wednesday, May 12, 2010

How to kill TCP client connections

Several times we are faced with a large number of TCP connections from the same source IP address.
How do we kill those connections?

This is an easy way, using `lsof -t` to list only the PIDs of the matching connections:

$ for i in `lsof -t -i tcp@client_ip_address`; do kill -9 $i; done

See also : tcpkill