Software and Design: October 2009

Friday, October 16, 2009

Build farm using Hudson

Hudson can be used to create a build farm, which is convenient if your software has to be built and tested on multiple platforms.

Hudson works in a master/slave mode. It supports several ways to launch and control slaves: SSH, Java Web Start, custom scripts, and so on.

The Squid project built its farm from volunteer machines. Each volunteer runs:
java -jar slave.jar -jnlpUrl http://build.squid-cache.org/computer/$NODENAME/slave-agent.jnlp
from a startup script such as rc.local. Volunteers can then create a Hudson account on the Squid web site and grant access to it. They can also create jobs to build and test.

Wednesday, October 14, 2009

C++ STL revisit

I have only used a small subset of the C++ STL. Here is a revisit of some of it:

Container

slist vs. list: list uses bidirectional iterators, while slist uses forward iterators. For both, insertion and splicing do not invalidate iterators to existing elements.
deque vs. vector: deque supports constant-time insertion and removal of elements at the beginning of the sequence as well as at the end; vector only does so (amortized) at the end.
map vs. hash_map: map is in practice always implemented as a self-balancing binary search tree, while hash_map is implemented as a hash table. map is the better choice when the elements must be kept in a particular (sorted) order.
set vs. hash_set: similarly, set keeps its elements sorted in a self-balancing search tree, while hash_set uses a hash table.
bitset vs. bit_vector: a bitset's size is fixed at compile time, and it is not a container at all (for example, it has no iterators).
rope: a scalable string implementation with efficient assignment, concatenation, and substring operations; replacing a single character is expensive.
queue and stack are implemented on top of deque by default, and priority_queue on top of vector. All three are container adapters.
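A minimal C++ sketch of some of these differences, assuming a C++11 compiler. slist, hash_map, hash_set, rope, and bit_vector are SGI STL extensions rather than standard C++, so the sketch uses std::forward_list and std::unordered_map as the closest standard stand-ins; the function names are made up for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <deque>
#include <forward_list>
#include <iterator>
#include <list>
#include <map>
#include <queue>
#include <stack>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <vector>

// Container adapters and their default underlying containers, checked at compile time.
static_assert(std::is_same<std::stack<int>::container_type, std::deque<int>>::value,
              "stack defaults to deque");
static_assert(std::is_same<std::queue<int>::container_type, std::deque<int>>::value,
              "queue defaults to deque");
static_assert(std::is_same<std::priority_queue<int>::container_type, std::vector<int>>::value,
              "priority_queue defaults to vector");

// list: bidirectional iterators, and insertion does not invalidate them.
int list_iterator_survives_insert() {
    std::list<int> l = {1, 2, 3};
    std::list<int>::iterator last = std::prev(l.end());  // stepping back needs a bidirectional iterator
    l.insert(l.begin(), 0);                               // 'last' is still valid
    return *last;                                         // 3
}

// forward_list (standard successor of slist): forward-only, so inserts go "after" a position.
std::vector<int> forward_list_insert_after() {
    std::forward_list<int> fl = {1, 2, 3};
    fl.insert_after(fl.begin(), 9);                       // 1 9 2 3
    return std::vector<int>(fl.begin(), fl.end());
}

// deque: constant-time insertion at both ends.
std::vector<int> deque_both_ends() {
    std::deque<int> d = {2, 3};
    d.push_front(1);
    d.push_back(4);
    return std::vector<int>(d.begin(), d.end());          // 1 2 3 4
}

// map iterates in key order; unordered_map (hash_map's successor) makes no order promise.
std::string first_key_in_map() {
    std::map<std::string, int> m = {{"pear", 3}, {"apple", 1}, {"fig", 2}};
    return m.begin()->first;                              // "apple"
}

int hash_lookup() {
    std::unordered_map<std::string, int> h = {{"pear", 3}, {"apple", 1}};
    return h.at("pear");                                  // average constant-time lookup
}

// bitset: the size (8) is a template argument, fixed at compile time.
std::size_t bits_set() {
    std::bitset<8> b(std::string("10100001"));
    return b.count();                                     // 3
}

// stack is LIFO, queue is FIFO, priority_queue always yields the largest element.
std::vector<int> adapter_tops() {
    std::stack<int> s;
    std::queue<int> q;
    std::priority_queue<int> pq;
    for (int x : {3, 1, 2}) { s.push(x); q.push(x); pq.push(x); }
    return {s.top(), q.front(), pq.top()};                // {2, 3, 3}
}
```

Iterating the unordered_map, by contrast, could visit "apple" and "pear" in either order.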

Algorithm

Non-mutating

for_each: apply a function to every element in a range.
find_if: find the first element for which a unary predicate returns true.
adjacent_find: find the first pair of adjacent equal elements; you can also pass a binary predicate, for example to find the first element that is greater than its successor.
count_if: count the elements for which a unary predicate returns true.
mismatch: find the first positions where two ranges differ.
equal: return true if two ranges are element-wise equal.
search: find the first occurrence of a subsequence in a range.
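The non-mutating algorithms above in a short C++11 sketch. The wrapper function names are illustrative only, and each position-returning wrapper returns -1 when nothing is found.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// for_each: apply a function to every element (here it accumulates a sum).
int sum_with_for_each(const std::vector<int>& v) {
    int sum = 0;
    std::for_each(v.begin(), v.end(), [&sum](int x) { sum += x; });
    return sum;
}

// find_if with a unary predicate: index of the first even element.
long first_even_index(const std::vector<int>& v) {
    auto it = std::find_if(v.begin(), v.end(), [](int x) { return x % 2 == 0; });
    return it == v.end() ? -1 : static_cast<long>(it - v.begin());
}

// adjacent_find with a binary predicate: first element greater than its successor.
long first_descent_index(const std::vector<int>& v) {
    auto it = std::adjacent_find(v.begin(), v.end(), std::greater<int>());
    return it == v.end() ? -1 : static_cast<long>(it - v.begin());
}

// count_if with a unary predicate.
long count_negatives(const std::vector<int>& v) {
    return std::count_if(v.begin(), v.end(), [](int x) { return x < 0; });
}

// mismatch: first position where two ranges differ; this form reads
// a.size() elements from b, so b must be at least as long as a.
long first_difference(const std::vector<int>& a, const std::vector<int>& b) {
    assert(b.size() >= a.size());
    auto p = std::mismatch(a.begin(), a.end(), b.begin());
    return p.first == a.end() ? -1 : static_cast<long>(p.first - a.begin());
}

// equal: the three-iterator form does not compare sizes, so check them first.
bool same_elements(const std::vector<int>& a, const std::vector<int>& b) {
    return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin());
}

// search: index of the first occurrence of a subsequence.
long find_subsequence(const std::vector<int>& hay, const std::vector<int>& needle) {
    auto it = std::search(hay.begin(), hay.end(), needle.begin(), needle.end());
    return it == hay.end() ? -1 : static_cast<long>(it - hay.begin());
}
```

With v = {1, 3, 2, 4, -5}, first_descent_index(v) is 1, because 3 is the first element greater than the one after it.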

Friday, October 9, 2009

Google Secure Data Connector -2

An update on this one: I think the tunnel server can use a small program to attach to the already-established HTTPS connection, via ssh's ProxyCommand option:
ssh -o ProxyCommand="small program" -L port:localhost:1080 SDCip -N
The small program relays its stdin/stdout over the established HTTPS connection: anything ssh writes to it goes into the tunnel, and anything coming back from the tunnel goes to ssh.

ProxyCommand is a really useful option. For example, I can connect from a client box (192.168.122.1) to a service box (192.168.122.2) through an HTTPS proxy (192.168.122.3), provided the proxy settings allow SSL (CONNECT) connections to port 22:
ssh -o ProxyCommand="nc -X connect -x 192.168.122.3:8001 %h %p" -L 8888:localhost:1344 192.168.122.2 -N

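(1) The client 192.168.122.1 connects to the proxy 192.168.122.3 (port 8001) and sends an HTTP CONNECT request for 192.168.122.2:22.
(2) The proxy 192.168.122.3 establishes a TCP connection to 192.168.122.2:22.
(3) Once that connection is established, the proxy tells 192.168.122.1 by answering the CONNECT request with a success (2xx) status.
(4) From then on the proxy just relays bytes, and the two ends speak the SSH protocol over the tunnel.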
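Steps (1) to (3) are the HTTP CONNECT method. The C++ sketch below only builds the kind of request a client such as nc -X connect sends to the proxy and checks the proxy's status line; make_connect_request and connect_succeeded are hypothetical helper names, no sockets are opened, and the exact headers a given client sends may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Step (1): ask the proxy to open a raw TCP connection to host:port (step 2).
std::string make_connect_request(const std::string& host, int port) {
    std::ostringstream req;
    req << "CONNECT " << host << ':' << port << " HTTP/1.1\r\n"
        << "Host: " << host << ':' << port << "\r\n"
        << "\r\n";                                   // blank line ends the headers
    return req.str();
}

// Step (3): any 2xx status in the proxy's reply means the tunnel is up; after
// that the socket carries raw bytes, which in this example are SSH (step 4).
bool connect_succeeded(const std::string& status_line) {
    std::istringstream in(status_line);
    std::string version;
    int code = 0;
    if (!(in >> version >> code)) return false;      // not a parsable status line
    return version.compare(0, 5, "HTTP/") == 0 && code >= 200 && code < 300;
}
```

For the example above, make_connect_request("192.168.122.2", 22) produces a request whose first line is "CONNECT 192.168.122.2:22 HTTP/1.1".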

Thursday, October 8, 2009

Google Secure Data Connector

I am looking at the Google Secure Data Connector, which can be used to retrieve data stored on an intranet. Before starting, you have to install the Secure Data Connector (SDC) inside your firewall.
(1) The SDC opens an HTTPS connection to the tunnel server on port 443.
(2) HTTPS certificate authentication and then registration take place, after which the secure channel is established.
(3) The SDC spawns an sshd process in inetd mode (the -i option). In this mode, sshd expects a connection that is already established on its standard input, so the tunnel socket is bound to sshd's stdin/stdout. This is similar to a reverse SSH shell.
(4) There is also a local SOCKS proxy listening on port 1080 by default.
(5) I am not sure what runs on the tunnel server side to forward ports, but there are well-known techniques for tunneling SSH through SSL.
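Step (3) relies on the classic inetd convention: the server process is handed an already-connected socket as its stdin/stdout. The POSIX-only C++ sketch below shows that wiring with socketpair, fork, and dup2. In the real setup the child would exec sshd -i; here a hypothetical in-process upper-casing service stands in for it so the sketch is self-contained, and run_over_socket is likewise an illustrative name.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical inetd-style service: it knows nothing about sockets and just
// reads fd 0 and writes fd 1 (here it upper-cases whatever it receives).
static void uppercase_service() {
    char buf[256];
    ssize_t n;
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
        for (ssize_t i = 0; i < n; ++i)
            buf[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(buf[i])));
        if (write(STDOUT_FILENO, buf, n) != n) break;
    }
}

// Bind one end of a connected socket to a child's stdin/stdout (the way
// sshd -i expects to be started), then talk to the child over the other end.
std::string run_over_socket(const std::string& request) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return "";
    pid_t pid = fork();
    if (pid < 0) { close(sv[0]); close(sv[1]); return ""; }
    if (pid == 0) {                       // child
        close(sv[0]);
        dup2(sv[1], STDIN_FILENO);        // the socket becomes stdin...
        dup2(sv[1], STDOUT_FILENO);       // ...and stdout
        close(sv[1]);
        uppercase_service();              // a real setup would exec sshd -i here
        _exit(0);
    }
    close(sv[1]);                         // parent keeps only its own end
    // Small payload: a single write fits in the socket buffer for this sketch.
    if (write(sv[0], request.data(), request.size()) < 0) { /* child will just see EOF */ }
    shutdown(sv[0], SHUT_WR);             // tell the child there is no more input
    std::string reply;
    char buf[256];
    ssize_t n;
    while ((n = read(sv[0], buf, sizeof buf)) > 0)
        reply.append(buf, static_cast<std::size_t>(n));
    close(sv[0]);
    waitpid(pid, nullptr, 0);
    return reply;
}
```

The service code never calls accept() or touches a socket API; everything it knows about the connection comes through file descriptors 0 and 1, which is why sshd can be attached to an existing tunnel this way.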

REST vs. SOAP

Copied from the web:

What is a REST Web Service

The acronym REST stands for Representational State Transfer, which basically means that each unique URL is a representation of some object. You can get the contents of that object with an HTTP GET, delete it with DELETE, and modify it with POST or PUT (in practice, most services use POST for this).
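A toy C++ sketch of that resource model: the URL picks the object and the HTTP method picks the operation. ResourceStore, Response, and handle are hypothetical names, there is no real HTTP here, and the sketch simplifies POST to "create or replace at this URL", whereas many real services POST to a collection URL to create a new member.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Response {
    int status;        // HTTP status code
    std::string body;
};

// Each URL names one resource; the method says what to do with it.
class ResourceStore {
public:
    Response handle(const std::string& method, const std::string& url,
                    const std::string& body = "") {
        if (method == "GET") {                       // read
            auto it = resources_.find(url);
            if (it == resources_.end()) return {404, ""};
            return {200, it->second};
        }
        if (method == "PUT" || method == "POST") {   // create or modify
            bool existed = resources_.count(url) > 0;
            resources_[url] = body;
            return {existed ? 200 : 201, ""};        // 201 Created for a new resource
        }
        if (method == "DELETE") {                    // delete
            return resources_.erase(url) ? Response{204, ""} : Response{404, ""};
        }
        return {405, ""};                            // method not allowed
    }

private:
    std::map<std::string, std::string> resources_;   // URL -> representation
};
```

For example, a PUT to /photos/1 followed by a GET of /photos/1 returns the stored representation, and a DELETE makes later GETs return 404.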

Who's using REST?

All of Yahoo's web services use REST, including Flickr; the del.icio.us API uses it, as do pubsub, bloglines, and technorati; and both eBay and Amazon have web services for both REST and SOAP.

Who's using SOAP?

Google seems to be consistent in using SOAP for its web services, with the exception of Blogger, which uses XML-RPC. You will find SOAP web services in lots of enterprise software as well.

REST vs SOAP

As you may have noticed, the companies I mentioned that use REST APIs haven't been around very long, and their APIs mostly came out this year. So REST is definitely the trendy way to create a web service, if creating web services could ever be trendy (let's face it, you use soap to wash, and you rest when you're tired). The main advantages of REST web services are:

* Lightweight - not a lot of extra xml markup
* Human Readable Results
* Easy to build - no toolkits required

SOAP also has some advantages:

* Easy to consume - sometimes
* Rigid - type checking, adheres to a contract
* Development tools

For consuming web services, it's sometimes a toss-up which is easier. For instance, Google's AdWords web service is really hard to consume (in CF, anyway): it uses SOAP headers and a number of other things that make it kind of difficult. Conversely, Amazon's REST web service can sometimes be tricky to parse because it can be highly nested, and the result schema can vary quite a bit depending on what you search for.

Whichever architecture you choose, make sure it's easy for developers to access and well documented.

Web service identity technology

I have been looking at Google App Engine recently. The Google Data APIs support OAuth, an open standard for secure API authorization. Here is a comparison of popular web service identity technologies:

OAuth: This technology lets one web site access user data stored on another web site. For example, a photo printing site may need to access the site where the user's photos are stored. It also lets the user approve or deny that site's requests. It started in the consumer-centric world, with sites such as Twitter, Flickr, and Pownce.

WS-Trust: It provides for API interaction between web servers. It is typically used with SOAP-based APIs in the enterprise; most REST-based APIs find it hard to leverage the WS-* stack.

SAML (Security Assertion Markup Language) Web SSO (Single Sign-On): an XML-based framework for identity information and the protocols that exchange it. The user logs on only once and can then be authenticated to other systems in the organization. SAML does not define how the authentication itself is implemented. Google, and recently Salesforce, have announced support for SAML.

OpenID: This technology is commonly used for SSO. Compared with SAML, it defines the user-side behavior. It is a lightweight protocol and, like OAuth, comes from the consumer-centric world: the blog/consumer/social networking space (MySpace and Orange recently announced support for it).

Microsoft Geneva: It is based on WS-* and SAML.

Strong/2nd-factor authentication: the general concept of authenticating a user with more information than just a password, such as a cookie, a biometric device, a phone call, and so on.

See also: Overlap of identity technologies

Linux buffer cache

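In Linux, when you write a file to disk, the data actually stays in the buffer (page) cache for a while: the kernel's flusher threads wake up every 5 seconds by default (/proc/sys/vm/dirty_writeback_centisecs = 500) and write back dirty data older than /proc/sys/vm/dirty_expire_centisecs (30 seconds by default). The sync command asks for the cache to be flushed, but POSIX only requires it to schedule the writes, so when the data actually reaches the disk can depend on the OS. On 2.6 kernels (2.6.16 and later, such as Red Hat's), you can write to /proc/sys/vm/drop_caches to drop the clean page cache, dentries, and inodes; it does not write dirty pages back, so run sync first, and it is a system-wide setting.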

If an application such as a database needs to know that a file has actually been written to storage, it can use the fsync system call, which blocks until the device reports that the write has completed. Another option is to open the file with the O_DIRECT flag, which bypasses the buffer cache (although O_DIRECT on its own does not guarantee that file metadata has reached the disk).
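A minimal POSIX sketch of the fsync approach in C++; write_durably is a hypothetical helper name. O_DIRECT is left out because it requires buffers, offsets, and lengths aligned to the device's block size, which would obscure the idea.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Write 'data' to 'path' and return true only after fsync() reports that the
// file's data and metadata have been handed to the storage device.
bool write_durably(const std::string& path, const std::string& data) {
    int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    const char* p = data.data();
    std::size_t left = data.size();
    while (left > 0) {                        // write() may write fewer bytes than asked
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR) continue;     // interrupted by a signal: retry
            close(fd);
            return false;
        }
        p += n;
        left -= static_cast<std::size_t>(n);
    }
    // Without this call the bytes may sit in the page cache until the
    // kernel's flusher threads write them back.
    if (fsync(fd) != 0) {
        close(fd);
        return false;
    }
    return close(fd) == 0;
}
```

For a newly created file, the directory containing it also has to be fsync'ed for the new name itself to survive a crash.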

Wednesday, October 7, 2009

Install Python on Apache in CentOS 5.4

First, you need mod_python:

sudo yum install mod_python

After that, configure Apache by adding the following to its configuration file:

LoadModule python_module modules/mod_python.so

AddHandler mod_python .py .psp
PythonHandler mptest
PythonDebug On

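where mptest is the Python module (mptest.py) containing the handler you are trying to run. The .psp extension is for Python Server Pages; note that with a single PythonHandler, .psp files are also passed to mptest, so to have them processed as PSP they need PythonHandler mod_python.psp instead.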

Restart Apache, and you can now create your web pages in Python.

Qt3 menu in CentOS 5.4

I installed Qt3 on my CentOS box:

sudo yum install qt-designer

Qt3 Designer was automatically added to my Applications->Programming menu, but when I clicked it, nothing happened. Maybe the path was wrong? So I modified the desktop files:

sudo vi /usr/share/applications/qt-designer3.desktop
sudo vi /usr/share/applications/qt-assistant3.desktop
sudo vi /usr/share/applications/qt-linguist3.desktop

by adding the path to the Exec line in each. Now it works like a charm.
