I am trying to understand how anti-virus software creates and manages virus signatures.
This article gives some idea of how a signature is created. When a new virus is found, you can run a 'decoy' program to collect as many variants of the virus as possible, and then compare them to get candidate signatures. To avoid false positives, the candidate signatures are run against a large volume of known-good data to obtain a signature that is unique to the virus.
Here is another blog post about how to create ClamAV signatures, with a detailed explanation of the EICAR virus test file.
A nice web site about virus analysis.
Tuesday, December 28, 2010
Friday, December 17, 2010
Linux FAQ
(1) What is the maximum memory for 32 bit Linux?
Normally, Linux splits the 4G virtual address space into two parts: 3G for user-space virtual addresses and 1G for kernel virtual addresses. Here are some solutions for using more physical memory than the kernel part can map directly:
a). HIGHMEM solution for up to 4G
Use kmap to temporarily map pages from ZONE_HIGHMEM into the kernel's virtual address space
b). HIGHMEM solution for using 64G memory (36-bit physical address)
This is enabled via PAE (Physical Address Extension), introduced with the Pentium Pro processor. Pages in ZONE_HIGHMEM are then mapped into the kernel's address space in the same way
(2) What is the maximum memory for 64-bit Linux?
Based on this, it is 16T (2^44 bytes), which is controlled by:
arch/x86/include/asm/sparsemem.h:
# define MAX_PHYSMEM_BITS 44
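The limits above all follow from the number of physical address bits. A minimal sketch of the arithmetic (the helper name addressable_bytes is made up for illustration and is not a kernel function):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with `bits` physical address bits (bits must be < 64). */
static uint64_t addressable_bytes(unsigned bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}
```

With 32 bits this gives 4G, with 36 bits (PAE) 64G, and with MAX_PHYSMEM_BITS set to 44 it gives 16T.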
Wednesday, December 15, 2010
Obtain the current logon user name in Windows
We had an application that needs to retrieve the current logon user name in Windows. Since our application is an HTTP proxy, there are two approaches:
(1) Obtain the user name using the TCP connection
a). tcptable=GetExtendedTcpTable(...)
b). Get the process id that owns the socket
foreach(tcptable->dwNumEntries)
if (tcptable->table[i].dwLocalPort==htons(port)) //dwLocalPort is in network byte order
processid=tcptable->table[i].dwOwningPid;
c). Open the process to get the process token
hProcess=OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, processid)
OpenProcessToken(hProcess, TOKEN_QUERY, &hToken)
d) Then use the token to get the user info
GetTokenInformation(hToken, TokenUser, pUserInfo, dwSize, &dwSize)
e) Look up the user name from the user info
LookupAccountSidW(NULL, pUserInfo->User.Sid, Name, &dwNameSize, lpDomain, &dwDomainSize, &SidType)
(2) Obtain the user name using the active session
a) Enumerate the sessions
WTSEnumerateSessions(WTS_CURRENT_SERVER_HANDLE, 0, 1, &pSessionInfo, &dwCount)
b) foreach dwCount
if (pSessionInfo[i].State==WTSActive)
dwSessionId=pSessionInfo[i].SessionId;
c). Then query the user token using the session id
WTSQueryUserToken(dwSessionId, &hToken)
After that, follow d) and e) in approach (1)
Tuesday, December 14, 2010
large file cache using mmap
We had a 3G file in which the position of each field can be identified from a hash value. We use mmap to manage the cache.
(1) open the file
(2) calculate the position; if the data is not in memory, then call:
mmap2(NULL, 16384, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0x210e8)
which maps that 16K of the file into memory (the last argument of mmap2 is the file offset in 4096-byte units).
(3) for the data in the next request, call mmap2 again (mmap can be called multiple times).
(4) When you run out of cache (let's say 300M), find the least used fields, then run
munmap(....)
to evict those fields from the cache.
Perl provides a module, Cache::FastMmap, to share memory across processes.
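The steps above can be sketched in C roughly as follows. This is only an illustration, not our production code: the helper names are made up, it uses the portable mmap() wrapper instead of the raw mmap2 system call, it assumes the page size divides the 16K window, and the least-used bookkeeping of step (4) is left out.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define WINDOW_SIZE 16384  /* 16K window, as in the mmap2 call above */

/* Step (1): open the big file for shared read/write mapping. */
static int open_cache_file(const char *path)
{
    return open(path, O_RDWR);
}

/* Steps (2) and (3): map the 16K-aligned window that contains byte `pos`.
 * Returns MAP_FAILED on error. Can be called many times on the same fd. */
static void *map_window(int fd, off_t pos)
{
    assert(pos >= 0);
    off_t start = pos - (pos % WINDOW_SIZE);
    return mmap(NULL, WINDOW_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, start);
}

/* Step (4): evict one window from the cache. */
static int unmap_window(void *window)
{
    return munmap(window, WINDOW_SIZE);
}
```

The field itself is then found at offset pos % WINDOW_SIZE inside the returned window.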
Wednesday, December 8, 2010
process virtual memory layout in 32 bit and 64 bit Linux
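I did some experiments to compare the process virtual memory layout in 32-bit Linux (2.6.9) and 64-bit Linux (2.6.18).
This program is used:
#include <stdio.h>
#include <stdlib.h>
const size_t arraySize=0x1000000; //16M
char global[arraySize*2];
int main(int argc, char** argv)
{
size_t i=0;
char local[arraySize];
char* heap=(char*)malloc(arraySize*3);
//touch every byte so the pages are really allocated (loop bodies are assumed)
for (i=0; i<arraySize; i++) local[i]=0;
for (i=0; i<arraySize*2; i++) global[i]=0;
for (i=0; i<arraySize*3; i++) heap[i]=0;
printf("local %lx, global %lx, heap %lx\r\n",(unsigned long)local,(unsigned long)global,(unsigned long)heap);
return 0;
}
(1) In 32-bit Linux, the output is: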
local bee40ef0, global 08049800, heap 4000a008
The memory map is:
00734000-00749000 r-xp 00000000 03:03 32066 /lib/ld-2.3.4.so
00749000-0074a000 r--p 00015000 03:03 32066 /lib/ld-2.3.4.so
0074a000-0074b000 rw-p 00016000 03:03 32066 /lib/ld-2.3.4.so
00753000-0075c000 r-xp 00000000 03:03 38761 /lib/libgcc_s-3.4.6-20060404.so.1
0075c000-0075d000 rw-p 00009000 03:03 38761 /lib/libgcc_s-3.4.6-20060404.so.1
00774000-00898000 r-xp 00000000 03:03 32151 /lib/tls/libc-2.3.4.so
00898000-00899000 r--p 00124000 03:03 32151 /lib/tls/libc-2.3.4.so
00899000-0089c000 rw-p 00125000 03:03 32151 /lib/tls/libc-2.3.4.so
0089c000-0089e000 rw-p 0089c000 00:00 0
008a0000-008c1000 r-xp 00000000 03:03 38754 /lib/tls/libm-2.3.4.so
008c1000-008c3000 rw-p 00020000 03:03 38754 /lib/tls/libm-2.3.4.so
00af9000-00bb9000 r-xp 00000000 03:03 195327 /usr/lib/libstdc++.so.6.0.3
00bb9000-00bbe000 rw-p 000bf000 03:03 195327 /usr/lib/libstdc++.so.6.0.3
00bbe000-00bc4000 rw-p 00bbe000 00:00 0
08048000-08049000 r-xp 00000000 00:14 34996225 /mnt/centos/ut/a.out
08049000-0804a000 rw-p 00000000 00:14 34996225 /mnt/centos/ut/a.out
0804a000-0a04a000 rw-p 0804a000 00:00 0
40008000-4300d000 rw-p 40008000 00:00 0
bee40000-c0000000 rw-p bee40000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
The executable starts at 0x08048000, followed by the data and bss sections holding the global variable (32M):
0804a000-0a04a000 rw-p 0804a000 00:00 0
Then the heap block starts at one third of the user space (0x40000000) and is 48M; an allocation this large is served by mmap rather than by growing the heap with brk:
40008000-4300d000 rw-p 40008000 00:00 0
The stack begins at 0xc0000000 and grows toward lower addresses; it is about 18M:
bee40000-c0000000 rw-p bee40000 00:00 0
(2) In 64-bit Linux:
The output is:
local 7fffa88b2e40, global 00600b20, heap 2ab801210010
The memory map is:
00400000-00401000 r-xp 00000000 fd:00 34996225 /home/bliu/ut/a.out
00600000-00601000 rw-p 00000000 fd:00 34996225 /home/bliu/ut/a.out
00601000-02601000 rw-p 00601000 00:00 0
314ec00000-314ec1c000 r-xp 00000000 fd:00 11796527 /lib64/ld-2.5.so
314ee1b000-314ee1c000 r--p 0001b000 fd:00 11796527 /lib64/ld-2.5.so
314ee1c000-314ee1d000 rw-p 0001c000 fd:00 11796527 /lib64/ld-2.5.so
314f000000-314f14d000 r-xp 00000000 fd:00 11796545 /lib64/libc-2.5.so
314f14d000-314f34d000 ---p 0014d000 fd:00 11796545 /lib64/libc-2.5.so
314f34d000-314f351000 r--p 0014d000 fd:00 11796545 /lib64/libc-2.5.so
314f351000-314f352000 rw-p 00151000 fd:00 11796545 /lib64/libc-2.5.so
314f352000-314f357000 rw-p 314f352000 00:00 0
315ae00000-315ae0d000 r-xp 00000000 fd:00 11796784 /lib64/libgcc_s-4.1.2-20080825.so.1
315ae0d000-315b00d000 ---p 0000d000 fd:00 11796784 /lib64/libgcc_s-4.1.2-20080825.so.1
315b00d000-315b00e000 rw-p 0000d000 fd:00 11796784 /lib64/libgcc_s-4.1.2-20080825.so.1
3cb2600000-3cb26e6000 r-xp 00000000 fd:00 18000239 /usr/lib64/libstdc++.so.6.0.8
3cb26e6000-3cb28e5000 ---p 000e6000 fd:00 18000239 /usr/lib64/libstdc++.so.6.0.8
3cb28e5000-3cb28eb000 r--p 000e5000 fd:00 18000239 /usr/lib64/libstdc++.so.6.0.8
3cb28eb000-3cb28ee000 rw-p 000eb000 fd:00 18000239 /usr/lib64/libstdc++.so.6.0.8
3cb28ee000-3cb2900000 rw-p 3cb28ee000 00:00 0
3ecfe00000-3ecfe82000 r-xp 00000000 fd:00 35454980 /lib64/libm-2.5.so
3ecfe82000-3ed0081000 ---p 00082000 fd:00 35454980 /lib64/libm-2.5.so
3ed0081000-3ed0082000 r--p 00081000 fd:00 35454980 /lib64/libm-2.5.so
3ed0082000-3ed0083000 rw-p 00082000 fd:00 35454980 /lib64/libm-2.5.so
2ab8011f6000-2ab8011f7000 rw-p 2ab8011f6000 00:00 0
2ab80120d000-2ab804213000 rw-p 2ab80120d000 00:00 0
7fffa88b2000-7fffa98b4000 rw-p 7ffffeffd000 00:00 0 [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso]
Here the executable starts at 0x00400000, then 32M is allocated for the global bss section:
00601000-02601000 rw-p 00601000 00:00 0
After the libraries, which are loaded at lower addresses, the heap block starts at 0x2ab80120d000 and is 48M:
2ab80120d000-2ab804213000
The stack ends at 7fffa98b4000, grows toward lower addresses, and is 16M:
7fffa88b2000-7fffa98b4000
Then the VDSO (virtual dynamic shared object):
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso]
The vdso is a virtual shared object exposed by the kernel (on 32-bit systems it shows up as linux-gate.so.1). It is used to make fast system calls, for example through sysenter/sysexit on 32-bit x86, which is faster than a regular software interrupt.
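The region sizes quoted above come from subtracting the two addresses on each map line. A small sketch of that arithmetic (region_bytes is a made-up helper name; it only parses the leading "start-end" part of a line):

```c
#include <assert.h>
#include <stdio.h>

/* Size in bytes of a "start-end" range as printed in /proc/<pid>/maps,
 * or 0 if the text does not parse. */
static unsigned long long region_bytes(const char *range)
{
    unsigned long long start, end;
    assert(range != NULL);
    if (sscanf(range, "%llx-%llx", &start, &end) != 2 || end < start)
        return 0;
    return end - start;
}
```

For example, 0804a000-0a04a000 is exactly 32M, 40008000-4300d000 is a little over 48M, and bee40000-c0000000 is 17.75M, which is rounded to 18M above.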
Tuesday, December 7, 2010
virtual memory vs. RSS in Linux
I wrote a small program to understand the difference between VSZ and RSS. A malloc only means that you can use the memory addresses; no real memory is used until a page is accessed.
First I find out my page size using:
%getconf PAGE_SIZE
4096
(1)
size_t length=0x10000000; //256M
const size_t pagesize=4096;
char* x=(char*)malloc(length);
After starting the program:
%ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
user 25125 0.0 0.0 273552 912 pts/1 S+ 10:54 0:00 a.out
Here the VSZ just means the maximum address space you can use; the real memory (RSS) is still 912K although the VSZ is about 267M.
(2) Then execute
for (int i=0; i<1000; i++)
x[i]='a';
%ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
user 25395 0.0 0.0 273552 916 pts/1 S+ 10:56 0:00 a.out
Only one page is written, so the RSS grows by just one page (916K now).
(3) Then write one character in each of 1000 pages
for (int i=0; i<1000; i++)
x[i*pagesize]='a';
%ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
user 25285 0.0 0.2 273552 4912 pts/1 S+ 10:55 0:00 a.out
Now every page that was written counts toward the RSS (1000*4K+912K=4912K).
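The same experiment can be checked from inside the process instead of with ps. This is a sketch under the assumption that /proc/self/statm is available (its second field is the resident size in pages); the helper names are made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Resident set size of this process in kB, read from /proc/self/statm.
 * Returns -1 if the file cannot be read. */
static long rss_kb(void)
{
    long size_pages, resident_pages;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "%ld %ld", &size_pages, &resident_pages);
    fclose(f);
    if (n != 2)
        return -1;
    return resident_pages * (sysconf(_SC_PAGESIZE) / 1024);
}

/* Write one byte into each of the first `pages` pages of buf. */
static void touch_pages(char *buf, size_t pages)
{
    assert(buf != NULL);
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    for (size_t i = 0; i < pages; i++)
        ((volatile char *)buf)[i * pagesize] = 'a';
}
```

Calling rss_kb() before and after touch_pages(x, 1000) on a freshly malloc'ed 256M block should show the RSS growing by roughly 1000 pages, while the VSZ stays the same. The exact growth can be larger if the kernel backs the block with huge pages.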
Monday, November 15, 2010
understand memory usage in Linux
Linux optimizes memory management in several ways, which makes it hard to predict the real memory usage:
(1) Binaries (including the libraries) are demand paged: only the parts of a library actually used by a process are loaded from disk.
(2) Two different processes may share the same loaded library.
(3) Writable pages use COW (copy on write). So if you spawn a process, no memory is allocated for the sub-process until it tries to write a page; only then is a private copy of that page allocated.
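The visible effect of point (3) can be shown with fork(). This sketch only demonstrates that the child's writes land in private copies and never reach the parent; it does not measure when the pages are allocated. The function name is made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks, lets the child overwrite buf, and returns the first byte the
 * parent still sees after the child has exited, or -1 on error. */
static int first_byte_after_child_write(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        memset(buf, 'c', len);          /* write faults give the child private pages */
        _exit(buf[0] == 'c' ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (unsigned char)buf[0];       /* the parent's pages are untouched */
}
```

If the parent fills the buffer with 'p' before the call, it still reads 'p' afterwards, even though the child saw 'c'.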
If you run top in Linux:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3528 xxx 15 0 584m 233m 8752 S 0.7 11.6 32:40.23 gnome-terminal
Here the virtual size is 584M (which includes all code, data, shared libraries and swapped-out pages) and the resident set size is 233M (the non-swapped physical memory used). That figure also includes some libraries. You can factor them out by printing the memory map:
%pmap -d 3528
... ...
000000000d046000 272004 rw--- 000000000d046000 000:00000 [ anon ]
... ....
mapped: 606660K writeable/private: 284196K shared: 620K
The biggest one is 272M, which is anonymous memory. What exactly is anonymous memory?
Let's look at smaps instead.
% cat /proc/3528/smaps
...
0d046000-1d9e7000 rw-p 0d046000 00:00 0 [heap]
Size: 272004 kB
Rss: 229712 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 4928 kB
Private_Dirty: 224784 kB
Swap: 40520 kB
So the heap is using about 270M. If you are using a 2.6.25 or later kernel, you can also use the more detailed /proc/$PID/pagemap.
A few other things can contribute to anonymous memory, like thread stacks and mmap.
linux file soft link vs. hard link
A hard link is another name for the same inode and cannot cross devices. A soft link has its own inode and stores the target file name as its data.
So if you have two hard-linked files, even if you delete the original file, you can still access the data using the linked file name.
On the other hand, if you have a soft link to a file and delete the original file, you cannot access the data through the soft link, even if the original file still has other hard links.
One more thing: if the file is opened by another process and you delete it, the file is kept on the hard drive until that process closes it.
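The behavior described above can be reproduced with link(), symlink() and unlink(). A minimal sketch, with made-up helper names and paths supplied by the caller:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if path can be opened for reading, 0 otherwise. */
static int can_open(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    close(fd);
    return 1;
}

/* Creates `original`, hard-links and soft-links it, checks that the hard
 * link shares the inode, then deletes the original name. Returns 0 on success. */
static int link_demo(const char *original, const char *hard, const char *soft)
{
    struct stat so, sh;
    assert(original != NULL && hard != NULL && soft != NULL);
    int fd = open(original, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "data", 4) != 4) {
        close(fd);
        return -1;
    }
    close(fd);
    if (link(original, hard) < 0 || symlink(original, soft) < 0)
        return -1;
    if (stat(original, &so) < 0 || stat(hard, &sh) < 0 || so.st_ino != sh.st_ino)
        return -1;                      /* a hard link must share the inode */
    return unlink(original);
}
```

After link_demo() succeeds, can_open(hard) still returns 1, while can_open(soft) returns 0 because the name stored in the soft link no longer exists.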
Thursday, November 11, 2010
jump through jump box
We had a network setup in which you always have to log on to the jump box first and then log on to the other machine (tmachine). Here is the trick to do it in one step:
ssh tmachine -o ProxyCommand="netcat-proxy-command jumpbox tmachine"
Copy a folder via the jump box:
tar cvzf - . | ssh tmachine -o ProxyCommand="netcat-proxy-command jumpbox tmachine" cat ">" folder.tar.gz
The netcat-proxy-command:
#!/bin/sh
#http://www.hackinglinuxexposed.com/articles/20040830.html
bouncehost=$1
target=$2
ssh $bouncehost nc -w 1 $target 22
Monday, October 25, 2010
firefox 3.6.x proxy ntlm authentication issue
If you are using NTLM proxy authentication, Firefox 3.6.x keeps popping up the password dialog box. This is because Firefox switched from its internal NTLM implementation to the native Windows NTLM API. The workaround is to set network.auth.force-generic-ntlm to true to fall back to the old behavior.
Friday, October 22, 2010
Create a vista Gadget
I got Windows 7 boxes both at home and in the office. It is time to play with the gadget sidebar!
As a starter project, I want to create a gadget to monitor a web site. Whenever the site changes, the gadget displays a different icon. So you need an HTML file, an XML configuration file, and a JavaScript file.
(1) How to access the Internet?
There are two ways: XMLHttpRequest, or create a DLL to wrap the logic inside. Here I use the first approach.
(2) The regular request seems never to be sent out
It is because of the cache. Add a header like:
xmlhttp.setRequestHeader("If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT");
(3) How to detect a network connection error
You can hook a timeout. Or, if the web server simply has not started, you will get the status code 12029 (the WinInet error for a failed attempt to connect to the server).
(4) I use:
<body onload="start();window.setInterval('refresh()',5000)" ....>
to request the web site every 5 seconds.
Saturday, October 9, 2010
decrypt the web page encrypted using HTML Guardian
After opening the page, instead of using View Source, enter this in the address bar:
javascript:var sorc=document.documentElement.outerHTML;document.open("text/plain");document.write(sorc);
Thursday, September 30, 2010
self pipe trick
Our old Red Hat Linux kernel does not support pselect(), so we have to use the self-pipe trick instead. In a select loop, we have to monitor both the file descriptors and signals, and the loop must not stay blocked when either of them fires. For example,
while (1)
{
if (shutdownSignal)
//do shutdown
<===== The shutdown signal may come here
select (socketfd);
}
If the signal arrives at the marked point and no traffic is coming through, select() blocks until somebody sends some data, so the shutdown is delayed. With the self-pipe trick, the signal handler writes a byte to a pipe whose read end is also watched by select(), so select() returns right away.
Or you can use siglongjmp, which jumps out of the signal handler even if you are inside the select() call.
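A minimal sketch of the self-pipe trick in C. The function names and the choice of signal are illustrative assumptions, not our proxy's code; error handling is reduced to return codes.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static int self_pipe[2] = { -1, -1 };

/* Signal handler: write one byte so that a pending or future select() wakes up. */
static void on_shutdown_signal(int signo)
{
    int saved_errno = errno;
    char byte = (char)signo;
    if (write(self_pipe[1], &byte, 1) < 0) {
        /* pipe full: a wake-up is already pending, nothing more to do */
    }
    errno = saved_errno;
}

/* Create the pipe, make both ends non-blocking and install the handler. */
static int setup_self_pipe(int signo)
{
    struct sigaction sa;
    if (pipe(self_pipe) < 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(self_pipe[i], F_GETFL);
        if (flags < 0 || fcntl(self_pipe[i], F_SETFL, flags | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_shutdown_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Block until socketfd is readable or a signal arrived.
 * Returns 1 for a signal, 0 for socket data, -1 on error. */
static int wait_for_event(int socketfd)
{
    assert(self_pipe[0] >= 0);
    int maxfd = socketfd > self_pipe[0] ? socketfd : self_pipe[0];
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(socketfd, &readfds);
        FD_SET(self_pipe[0], &readfds);
        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;               /* handler ran; its byte is now in the pipe */
            return -1;
        }
        if (FD_ISSET(self_pipe[0], &readfds)) {
            char buf[16];
            while (read(self_pipe[0], buf, sizeof buf) > 0)
                ;                       /* drain pending wake-ups */
            return 1;
        }
        return 0;
    }
}
```

Because the byte stays in the pipe until it is read, it does not matter whether the signal arrives before select() is entered or while it is blocked; the race marked in the loop above goes away.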
Thursday, September 23, 2010
Write safe code
These days I came across a few vulnerabilities in Squid. Here are some lessons learned:
(1) strcmp
If you pass a NULL pointer to strcmp, the behavior is undefined and the program may crash. Handling NULL, '\0' and regular strings together can be tricky. This assumes that str1 and str2 end with '\0':
int compare(char* str1, char* str2)
{
if (str1==NULL || str2==NULL)
{
if (str1==str2)
return 0;
if (str1==NULL)
return -1;
if (str2==NULL)
return 1;
}
return strcmp(str1,str2);
}
(2) Check the major/minor version in the HTTP header.
Squid was using this code to get the numeric version number from the HTTP header (HTTP/1.1...):
//assume the data is stored in buffer, and that we only care about the major digit for now
int maj=-1;
if (buffer see line end)
maj=1;
else
return;
for (pos=verStart; isdigit(buffer[pos]); pos++)
{
maj = maj * 10;
maj = maj + (buffer[pos]) - '0';
}
//maj should never be -1 unless it overflows at 65536
assert(maj!=-1);
(3) Recently, there was a vulnerability in the bzip2 code
int N, result=0;
while (buffer not end)
{
//read N from buffer;
result+=N*2;
}
Here result is a signed integer and it may overflow, which causes undefined behavior.
Wednesday, September 15, 2010
From Red Hat 8.0 to Centos 5.3
This page is reserved for notes on porting from Red Hat 8.0 to CentOS 5.3:
1). What has changed?
a). Native POSIX thread support, with a special synchronization primitive: futex
b). O(1) scheduler and SMP scalability.
c). Preemptive kernel.
d). Latency improvement: scheduling latency <0.5 microseconds
e). Redesigned block layer
f). Improved VM subsystem
Tuesday, September 7, 2010
set up a user on centos
Here are some notes to add a user and set up sudo in CentOS:
(1) Create a user, for example, alice:
useradd alice
passwd alice
(2) Add the user to the sudoers file
visudo
Then add this as the last line:
alice ALL=(ALL) ALL
(3) Start ssh-agent
/usr/bin/ssh-agent $SHELL
ssh-add
(4) Start ssh-agent automatically
sudo yum install openssh-askpass
Main Menu Button (on the Panel) => Preferences => More Preferences => Sessions, then click on the Startup Programs tab. Click Add and enter /usr/bin/ssh-add in the Startup Command text area.
Thursday, July 22, 2010
Redirect https via proxy
It is hard to redirect https via a proxy, for example to generate a block page for https traffic, because IE stops you from doing that.
The idea is to send a 500 page back with a javascript and/or html redirect in place. After the page is loaded, the browser will try to access the redirected https web site.
Array is not a const pointer
It looks like you can use an array and a pointer interchangeably:
char a[]="abcdef";
char* b=a;
b[0]='e';
This changes a[0] to 'e'.
However, sizeof(a) and sizeof(b) are different. The former is 7 (six characters plus the terminating '\0') and the latter is 4 (on a 32-bit system).
Wednesday, June 16, 2010
suffix/prefix expressions in google safe browsing
Google uses host-suffix/path-prefix expressions to hash the blacklist and malware-list URLs for Google Safe Browsing.
When you try to match against a URL such as http://www.google.com/header/x.html, you try all the combinations, for example:
google.com/
google.com/header/
google.com/header/x.html
The original design downloads only a 4-byte hash prefix; when it matches, the client contacts the Google server again to download the full 32-byte hash.
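Generating the combinations can be sketched as below. This is a simplified illustration with made-up names: it assumes the URL is already split into host and path, and it skips the canonicalization, the query-string handling and the limits on the number of hosts and paths that the real protocol applies. Hashing each expression is also left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_EXPR 32
#define EXPR_LEN 256

/* Fills out[] with host-suffix/path-prefix expressions.
 * host: e.g. "www.google.com"; path: e.g. "/header/x.html" (must start with '/').
 * Returns the number of expressions written. */
static int build_expressions(const char *host, const char *path,
                             char out[][EXPR_LEN], int max_out)
{
    int count = 0;
    size_t path_len = strlen(path);
    assert(path[0] == '/');
    /* Host suffixes: the full host, then drop leading labels while
     * at least two labels remain. */
    for (const char *h = host; strchr(h, '.') != NULL; h = strchr(h, '.') + 1) {
        /* Path prefixes: every prefix that ends in '/' ... */
        for (const char *p = path; *p != '\0'; p++) {
            if (*p == '/' && count < max_out)
                snprintf(out[count++], EXPR_LEN, "%s%.*s",
                         h, (int)(p - path + 1), path);
        }
        /* ... and the full path, unless it was already emitted. */
        if (path[path_len - 1] != '/' && count < max_out)
            snprintf(out[count++], EXPR_LEN, "%s%s", h, path);
    }
    return count;
}
```

For www.google.com and /header/x.html this produces six expressions: the three listed above for google.com, and the same three paths for www.google.com.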
Thursday, June 3, 2010
URL categorization
Some resources to help collect URL categorization information:
Google top 1000
Alexa 1,000,000 top sites
URLblacklist
K9 Web Protection
Squid Guard
Dans Guardian
Dans Guardian keeps its regular expressions for content filtering under the folder:
configs/lists
Friday, May 28, 2010
SSL ciphers difference
I am looking at the differences between these ciphers after running
openssl ciphers -v
DHE-RSA-AES256-SHA SSLv3 Kx=DH Au=RSA Enc=AES(256) Mac=SHA1
DHE-DSS-AES256-SHA SSLv3 Kx=DH Au=DSS Enc=AES(256) Mac=SHA1
AES256-SHA SSLv3 Kx=RSA Au=RSA Enc=AES(256) Mac=SHA1
As the output states, they are all used in SSLv3, the encryption is AES(256), and the message authentication code is SHA1.
The DHE-xxx suites use ephemeral Diffie-Hellman key exchange (which needs separate authentication), while AES256-SHA uses RSA key exchange (RSA can do both digital signing and data encryption). The first two differ in authentication: RSA versus DSS (Digital Signature Standard). Verifying a DSA signature is a little slower.
Tuesday, May 25, 2010
C++ template tricks
Here are a few tricks for future reference:
1. Explicitly specify which version of a function template you want to call. This is especially useful when a template parameter appears only in the return type.
template <class RET, class T>
RET f(T t)
{
    return RET(); //placeholder body
}
char ch=f<char>("abc"); //specifying only the first argument is enough, as the second one can be deduced
2. The export keyword was supposed to allow a template definition to be separated from its declaration. Since almost no compiler supported it, it was removed in C++0x.
To separate the template declaration from the non-inline member function definitions, you may have to put an explicit instantiation in the cpp file:
template class Foo<int>;
3. Friend function link error
If you declare a friend function in a class template, the compiler normally treats it as a declaration of a non-template function:
Foo<int> operator+(Foo<int>&, Foo<int>&);
To tell the compiler you want the template version:
template<class T> class Foo; //forward-declare Foo
template<class T> Foo<T> operator+(Foo<T>&, Foo<T>&); //template declaration for the friend
template <class T> class Foo{
//...
friend Foo<T> operator+<> (Foo<T>&, Foo<T>&); //refers to the template declared above
};
4. The compiler does not look into a dependent base class when it looks up a non-dependent name (a name that does not depend on a template parameter). So a type defined in the template base class cannot be referred to directly in the derived class. You can give a hint with typename, like:
typename B<T>::xyz x;
Similarly, if you want to call a function defined in the template base class, you have to use:
this->f();
because this is always a dependent name, so the lookup will search the base class.
5. The template keyword is required before a dependent name that accesses a member template via ., ->, or ::
template <class T> void f(T& x)
{
int n=x.template convert<3>(pi); //tells the compiler the dependent name is a template; otherwise it can be parsed as "(x.convert < 3) > pi"
}
6. Default template parameter list, as shown in our previous tuple example:
template <class T0=null_type, class T1=null_type, class T2=null_type> class tuple;
7. Template partial specialization to absorb some parameters
template <int n, class T> struct element; //This is the real template interface I want to use
template <int n, class HH, class TT> struct element<n, cons<HH, TT> > {
}; //The specialization gives the parameters a different meaning.
Friday, May 21, 2010
Use lambda calculus to analyze the C++ template
I am looking at the template expansion of Boost C++ _1 in the context of:
foreach(vector, cout<<_1;)
The statement is expanded to a template:
boost::lambda::lambda_functor<
boost::lambda::lambda_functor_base<
boost::lambda::bitwise_action<boost::lambda::leftshift_action>,
boost::tuples::tuple<
boost::lambda::lambda_functor<
boost::lambda::lambda_functor_base<
boost::lambda::bitwise_action<boost::lambda::leftshift_action>,
boost::tuples::tuple<
std::ostream&,boost::lambda::lambda_functor<
boost::lambda::placeholder<1>
>,
boost::tuples::null_type,boost::tuples::null_type,boost::tuples::null_type,boost::tuples::null_type,boost::tuples::null_type,boost::tuples::null_type,boost::tuples::null_type,boost::tuples::null_type
>
>
>,
const char,boost::tuples::null_type,boost::tuples::null_type,boost::tuples::null_type,boost::tuples::null_type,boost::tuples::null_type,boost::tuples::null_type,boost::tuples::null_type,boost::tuples::null_type
>
>
>::operator ()
It is very tedious to analyze it. The idea is to use lambda calculus to help:
- Extended Rules for better reading
- One uppercase letter represents a function
- Three lowercase letters represent a variable name
- the , [ { may be used to distinguish multiple parameters
- Function and variables definition
- F.arg the lambda functor
- B.ret,arg the lambda base functor which takes return value and argument
- A.act The bitwise action, which helps to select left,right type as return type
- T.t0,t1..t9 The tuple can take up to 10 different types
- lsa left shift action
- ofs ofstream
- pl1 place holder one
- nut null type
- The whole template can be defined as:
- Parse this function:
- T(ofs,F(pl1)) associate the ofstream and _1
- B(A(lsa), T(ofs,F(pl1))) Add the return value of the ofstream to the base functor
- TF[...] wrap it as a function and put it in a tuple
- F(B(A{...})) Use the tuple as an action and then pass it to the Base functor and Functor.
How to create a C++ tuple to take any amount of type
Boost C++ has a tuple library whose tuple can take any number of types (maximum 10):
tuple<int>
tuple<double&, const double&, const double, double*, const double*>
tuple<A, int(*)(char, int), B(A::*)(C&), C>
The key is to use default template parameters in a separate declaration:
namespace tuples{
//(1) A null type, referred to by the default template parameters
struct null_type {};
//(2) A class that can take up to 3 types
template<class T0, class T1, class T2>
class tuple{
public:
    //(3) Constructors for these types
    tuple(T0 t0){}
    tuple(T0 t0, T1 t1){}
    tuple(T0 t0, T1 t1, T2 t2){}
};
//(4) The default parameters take effect from here on
template <class T0=null_type, class T1=null_type, class T2=null_type> class tuple;
}
using namespace tuples;
int main()
{
    tuple<int> b(1);
    tuple<int, char> c(2,'a');
    tuple<int, float, char> d(3,12.5,'c');
}
Wednesday, May 12, 2010
STL/Boost template based meta programming
Here is an experiment with arbitrary overloading and concept checking using C++ templates.
#include <iostream>
//is_integral
template <class T> struct is_integral{
    enum {value=false};
};
template<> struct is_integral<int>{
    enum {value=true};
};
//Enable if
template <bool B, class T=void> struct enable_if { }; //default
template <class T> struct enable_if<true, T>{
    typedef T type;
};
//Arbitrary overload: only allow Foo for integers. Substitution failure is not an error
template <class T>
typename enable_if<is_integral<T>::value, T>::type Foo(T x){return x*x;}
//Concept check
template <class T> void ignore_unused_variable_warning(const T&) { }
template <class T> void require_boolean(const T& t)
{
    bool x=t;
    ignore_unused_variable_warning(x);
}
template <class T> struct Comparable{
    void constraints(){
        require_boolean(a==b);
        require_boolean(a!=b);
    }
    T a,b;
};
template <class Concept> void function_requires(Concept t){
    void (Concept::*x)()=&Concept::constraints; //Force the compiler to compile the constraints code
}
template <class T> void NeedCompare(T t){
    function_requires(Comparable<T>());
}
struct Data{
public:
    Data operator*(const Data& data){return Data();}
private:
    Data& operator=(const Data &);
};
int main()
{
    is_integral<int> a;
    is_integral<double> b;
    Data data;
    std::cout<<"A:"<<a.value<<" B:"<<b.value<<std::endl;
    int x=Foo(12);
    //Foo(12.3); //Error: no matching function for call to Foo(double)
    //NeedCompare(data); //no match for 'operator==' in
    NeedCompare(12);
}
Monday, May 10, 2010
How the google ratproxy works
Google's ratproxy is used to find potential web security risks, especially XSS.
(1) Get the request and response
(2) Check the Referer request header: whether its parameters contain session tokens ("token", "once", "secret", "secid", "auth", "=tok", "=sig"), to detect token leakage.
(3) When the response MIME type is an active content type, it may warn "external code inclusion":
"text/html", /* HTML */
"application/xhtml+xml", /* XHTML */
"application/java-vm", /* Java class */
"application/java-archive", /* Java JAR */
"application/x-shockwave-flash", /* Flash */
"video/flv", /* Flash */
"video/x-flv", /* Flash */
(4) If it detects the POST request, it may warn "Cross-domain POST requests"
(5) Now check the URL and response
- Is there any echoed query parameter in response body?
- Is there any echoed query parameter in response headers?
- Check whether the URL contains authentication fields: "login", "user", "sess", "account", "pass"
- Re-send the request without cookies to double check whether it requires authentication
- Sniff the charset in the response body. The valid charsets are:
"iso8859-1", /* Valid Western */
"iso-8859-1", /* Invalid but recognized */
"iso8859-2", /* Valid European */
"iso-8859-2", /* Invalid but recognized */
"iso8859-15", /* ISO-8859-1, new and improved */
"iso-8859-15", /* ISO-8859-1, new and improved */
"windows-1252", /* Microsoft's Western */
"windows-1250", /* Microsoft's European */
"us-ascii", /* Old school but generally safe */
WARNING: Please note that "harmless" misspellings such as
'utf8' or 'utf_8' are *not* harmless, and may trigger utf-7
XSSes. Do not add these to the list unless thoroughly
validated.
- try_replay_xsrf: set all the session tokens in the request to a clobbered value, send it to the server again, and then compare the MD5 of the results.
- Header-based checks: for example, an authentication header but not a 40x response.
- HTTP redirect: detect a 302 response with a Location header: is the host name in the request query parameters or payload?
- Check for a redirect in the payload: HTTP-EQUIV=\"Refresh\"
- Handle Content-Type: multipart/form-data ??
- If the response and request cookies are the same: cookie issuer with no XSRF protection
- POST requests that do not require authentication are interesting
- Multiple "Content-Type" or "Content-Disposition" headers
- Misstated Content-Length: payload larger than the Content-Length header
- Check cross-domain POST requests: the request host and the Referer host are different.
- Cacheable Set-Cookie: check whether the web page can be cached while carrying a cookie/auth
- Missing charsets and typos lead to UTF-7 cross-site scripting.
- Content sniffing and Content-Type mismatch
- Echoed markup in a query is bad.
- File path in query parameters: non-echoed paths in a query are often bad
- Java method names in a query are bad.
- JavaScript code in a query is bad; ignore alert(...) though, as this is almost always a sign of manual XSS testing, not legitimate functionality.
- SQL statement in a query is bad.
- Check for OGNL-style parameter names.
- Check for what looks like JSON with inline HTML (we skip standalone scripts, as they often contain static HTML to be rendered). We do some basic quote state tracking not to get confused by regular arithmetic. No comment tracking, but that shouldn't break easily.
- Response with directory index: ">[To Parent Directory]<", "Index of /"
- JavaScript: .write(, .writeln(, .innerHtml, .outerHtml, document.referrer, document.domain
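Two of the checks above, token leakage through the Referer and echoed query parameters, can be approximated in a few lines of Python. This is only a sketch of the idea, not ratproxy's actual logic: the function names, the plain substring matching and the min_length cutoff are assumptions made for illustration.

```python
from urllib.parse import parse_qsl, urlsplit

# Keywords quoted from the list above.
TOKEN_HINTS = ("token", "once", "secret", "secid", "auth", "=tok", "=sig")


def leaked_token_hints(referer_url):
    """Return the token keywords that appear in the Referer's query string."""
    query = "?" + urlsplit(referer_url).query.lower()
    return [hint for hint in TOKEN_HINTS if hint in query]


def echoed_parameters(url, body, min_length=4):
    """Return the names of query parameters whose value is echoed in the body.

    Very short values are skipped (an assumed cutoff) to avoid noise.
    """
    pairs = parse_qsl(urlsplit(url).query)
    return [name for name, value in pairs
            if len(value) >= min_length and value in body]


if __name__ == "__main__":
    print(leaked_token_hints("http://a.example/page?auth=abc&x=1"))
    print(echoed_parameters("http://a.example/s?q=<b>hello</b>&p=1",
                            "You searched for <b>hello</b>"))
```

An echoed value that still contains markup, as in the second call, is the interesting case for XSS.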
Monday, April 26, 2010
summary of the assembly code map in gcc
The gcc function call stack layout looks like this (from high address to low address):
Grand Parent ebp ----> Parameters ----> parent return address ---> parent ebp ---> Local variables
Assume that you call
m=foo("abcdef")
It will be disassembled as
a) push $0x422dfsf ; this is the address that stores "abcdef"
b) call foo; when this is called, the address of the parent's next instruction, c), is pushed on the stack
c) add $0x4, %esp; pop the parameter that was pushed before the call
d) mov %eax, 0xfffffffc(%ebp); the return value is in %eax, and it is assigned to the local variable
In the foo() function
a) push %ebp; save the parent ebp
b) mov %esp, %ebp; ebp now points to the new frame
c) sub $0x4, %esp; move esp to allocate space for local variables
...
d) mov 0xfffffffc(%ebp), %eax; store the function return result in %eax
e) leave; copy ebp back to esp and pop the saved parent ebp into ebp
f) ret; pop the parent return address (which was at 4(%ebp)) into eip (the instruction pointer), leaving esp pointing at the parameters (which were at 8(%ebp)).
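The layout is easier to see with a toy model. The Python sketch below imitates the pushes with a dictionary that maps addresses to 32-bit words; the starting esp and all the values are made-up numbers, and only the relative offsets matter.

```python
def simulate_call(arg_address, return_address, parent_ebp, esp=0x1000):
    """Imitate the pushes done by the caller and by foo()'s prologue.

    Returns the stack contents plus ebp and esp as they are inside foo().
    """
    stack = {}

    def push(value):
        nonlocal esp
        esp -= 4              # the stack grows toward lower addresses
        stack[esp] = value

    push(arg_address)         # caller: push the parameter
    push(return_address)      # call foo: push the address of the next instruction
    push(parent_ebp)          # foo: push %ebp
    ebp = esp                 # foo: mov %esp, %ebp
    esp -= 4                  # foo: sub $0x4, %esp (room for one local variable)
    return stack, ebp, esp


if __name__ == "__main__":
    stack, ebp, esp = simulate_call(0x0804A000, 0x08048123, 0xBFFF0000)
    print(hex(stack[ebp]))      # saved parent ebp, at 0(%ebp)
    print(hex(stack[ebp + 4]))  # parent return address, at 4(%ebp)
    print(hex(stack[ebp + 8]))  # first parameter, at 8(%ebp)
    print(ebp - esp)            # bytes reserved for local variables
```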
General program bugs
- Buffer overflow
- Use After free:
For example, return a pointer on a stack.
- Memory leak:
- Double free
- Free unallocated memory
free string;
- Heap overflow
strncpy(p,mystr,MAX_LENGTH)
use mcheck(), mprobe() and the MALLOC_CHECK_ environment variable to detect it
- Race condition
count++;
(1) Read into register
(2) increase it by one
(3) write back the new value into memory
- Deadlock
- Compiler optimization
memory barrier
asm volatile ("" : : : "memory")
write back all the data from register back to the memory.
- CPU optimization
- Signal Handler
- Tips for troubleshooting
(bash) logger -p err "test"
syslog()
change syslog level by modify the /etc/syslog.conf
disable assert() using NDEBUG
Function to print backtrace
#include <execinfo.h>
int backtrace(void **buffer, int size)
POSIX threads trace toolkit
use ar to create a static library and ld (or gcc -shared) to create a dynamic library
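For the race condition item, the three steps of count++ (read, increase, write back) have to happen as one unit. The Python sketch below shows the usual fix, a lock around the read-modify-write; the function name and the worker counts are arbitrary choices for the example.

```python
import threading


def count_with_lock(workers=4, increments=10000):
    """Increment a shared counter from several threads under a lock."""
    count = 0
    lock = threading.Lock()

    def work():
        nonlocal count
        for _ in range(increments):
            with lock:        # read, increase and write back as one unit
                count += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return count


if __name__ == "__main__":
    print(count_with_lock())
```

Without the lock, two threads can both read the same old value and one of the increments is lost.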
Thursday, April 22, 2010
Rebuild backtrace of GDB
Sometimes GDB shows a corrupt backtrace, which has to be rebuilt by hand. Here is an example:
Find out the stack base pointer:
(gdb) info reg ebp
ebp 0xbf9ef358 0xbf9ef358
Examine 2048 bytes of memory in hex format starting at that address:
(gdb) x/512xw 0xbf9ef358
0xbf9ef358: 0xbf9ef388 0x00d94cf7 0x0a85e988 0x00000024
0xbf9ef368: 0xbf9ef388 0x00d94d8a 0x00000000 0x05000000
0xbf9ef378: 0x00000000 0x0a85e994 0x00000000 0x0a86dacc
0xbf9ef388: 0xbf9ef418 0x00c331ab 0x0a85e994 0x00000015
0xbf9ef398: 0x00000000 0xb7800000 0x0992f188 0xbf9ef40c
So you have the stack:
Stack frame pointer | Instruction pointer
------------------------
0xbf9ef388 | 0x00d94cf7
0xbf9ef418 | 0x00c331ab
Dump the symbol information:
objdump -Dt /usr/sbin/squid > squid.dis
Then look for 0x00c331ab inside squid.dis.
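Walking the chain by hand gets tedious for deep stacks. Here is a small Python sketch that follows the saved ebp values through a pasted dump; it assumes 32-bit words and frames that keep the saved ebp at 0(%ebp) and the return address at 4(%ebp), and the helper names are made up.

```python
DUMP = """
0xbf9ef358: 0xbf9ef388 0x00d94cf7 0x0a85e988 0x00000024
0xbf9ef368: 0xbf9ef388 0x00d94d8a 0x00000000 0x05000000
0xbf9ef378: 0x00000000 0x0a85e994 0x00000000 0x0a86dacc
0xbf9ef388: 0xbf9ef418 0x00c331ab 0x0a85e994 0x00000015
0xbf9ef398: 0x00000000 0xb7800000 0x0992f188 0xbf9ef40c
"""


def parse_dump(text):
    """Turn gdb `x` output into a dict that maps address to 32-bit word."""
    memory = {}
    for line in text.strip().splitlines():
        address, words = line.split(":")
        base = int(address, 16)
        for index, word in enumerate(words.split()):
            memory[base + 4 * index] = int(word, 16)
    return memory


def walk_frames(memory, ebp, max_frames=64):
    """Follow the saved-ebp chain; return (saved ebp, return address) pairs."""
    frames = []
    while ebp in memory and ebp + 4 in memory and len(frames) < max_frames:
        saved_ebp, return_address = memory[ebp], memory[ebp + 4]
        frames.append((saved_ebp, return_address))
        if saved_ebp <= ebp:   # the chain must move toward higher addresses
            break
        ebp = saved_ebp
    return frames


if __name__ == "__main__":
    for saved_ebp, return_address in walk_frames(parse_dump(DUMP), 0xbf9ef358):
        print(hex(saved_ebp), hex(return_address))
```

The walk stops at the first frame pointer that falls outside the dumped memory, so a larger dump gives a longer backtrace.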
Tuesday, April 20, 2010
GDB cheat sheet
Here is the cheat sheet for GDB in Linux
- Enable the core file
% ulimit -c 500000
Set the maximum core file size to 500000 blocks (bash counts in 1024-byte units, so roughly 500 MB).
- List all proc information and function list
(gdb) info functions
- Disassemble
- Display and clear breakpoints
(gdb) delete breakpoints 1
- Watching it live
(gdb) commands 1
print variable
continue
end
Note: break at line 188 and then attach a group of commands to breakpoint 1.
- Break when a specific address is accessed. Refer to this article.
- Define macros and aliases in ~/.gdbinit. A good article about it.
define cls
shell clear
end
document cls
clears screen for gdb
end
Example Macros for STL debugging.
- Handle Signal
(gdb) i signal
(gdb) handle SIG32 nostop print pass
- Multiple threads
- Remote debugging
Client side: gdb program
(gdb) handle SIGTRAP nostop noprint pass
(gdb) target remote gdbserverip:9999
- Store history
(gdb) set $table=*table_ptr
(gdb) show conv
Checkpoint
(gdb) checkpoint
(gdb) i checkpoint
(gdb) restart checkpoint-id
- See Macros in program
- Why the program is stopped
- exit a loop
- print, info, show
info: show things about the program being debugged
show: show things about the debugger
Monday, April 19, 2010
Aho-Corasick algorithm
Thursday, April 15, 2010
code review lesson learned
Just had a code review and got a lot of comments:
- misspellings in comments. I should use a spell-check add-in
- copying and pasting others' code without changing the style
- use const member functions whenever possible
- use std::string::empty() instead of comparing with an empty string
- include all dependent headers in the header file itself instead of depending on the include order in the cpp file.
- pay more attention to releasing resources on every return true/false path, even if most did not matter.
Tuesday, April 13, 2010
Detect OpenSSL errors
I want to detect the detailed OpenSSL client certificate error after calling SSL_accept(). It can be found from the reason code:
Unknown Client CA: SSL_R_TLSV1_ALERT_UNKNOWN_CA
No Client Certificate: SSL_R_PEER_DID_NOT_RETURN_A_CERTIFICATE
Certificate expired: SSL_R_NO_CERTIFICATE_RETURNED
Interestingly, when the client certificate is expired, it does not return SSL_R_SSLV3_ALERT_CERTIFICATE_EXPIRED; instead, it returns SSL_R_NO_CERTIFICATE_RETURNED.
Monday, April 5, 2010
A simple self replicate code (quine)
Here is a simple self-replicating C program:
main(){char* s="main(){char* s=%c%s%c;printf(s,34,s,34);}";printf(s,34,s,34);}
Wednesday, March 31, 2010
Deploy signed msi file to multiple machines using group policy
Windows Active Directory has two ways to deploy an application:
- Publish to a user
- Assign application to a user or a computer
More things:
The Group Policy can be used to deploy three kinds of files: msi (windows installer package), mst(transform files) and msp(patch files)
If you want to modify signed msi file,
Wednesday, March 24, 2010
create a certificate and sign it using signtool.exe: the windows way
It is assumed that you have installed Visual Studio 2005 and opened a command prompt for it:
(1) Create the certificate:
makecert.exe -sv mykey.pvk -n "CN=Mycompany Inc." mycert.cer
Now you have the private key in mykey.pvk and the certificate in mycert.cer
(2) Convert the certificate to the software publisher certificate (.spc) format
cert2spc.exe mycert.cer mycert.spc
This generates mycert.spc, which is used together with mykey.pvk to sign your executable.
(3) Before signing, you have to combine these two files into a single PFX file
pvk2pfx.exe -pvk mykey.pvk -pi <pvk password> -spc mycert.spc -pfx mycert.pfx -po <pfx password>
You always have to specify a password for -po.
(4) Now you can sign your code using
signtool.exe sign /f mycert.pfx /p <pfx password> /t <timestamp url> /v filetobesigned
The url can be one of the following:
http://timestamp.verisign.com/scripts/timestamp.dll
http://timestamp.globalsign.com/scripts/timestamp.dll
http://timestamp.comodoca.com/authenticode
Monday, March 22, 2010
Add a customize data to an msi installer
The idea is to append the data to the certificate section at the end of the file. A little background first:
- How the signature is generated/verified: the executable is hashed and the hash is used to make a digital certificate that is authenticated by some authority. This certificate is attached to the end of the PE executable in the certificate table. When the executable is loaded, Windows computes the hash value and compares it with the value in the certificate table.
- Three areas of a PE executable are excluded from the hash computation:
- the checksum in the optional Windows-specific header, 4 bytes
- the certificate table entry in the optional Windows-specific header, 8 bytes
- the digital certificate section at the end of the file, variable length
- The PE header offset is located at 0x3c; read that offset as pe_offset
- The data at pe_offset starts with "PE\0\0", which is 4 bytes
- From pe_offset, find the Certificate Table entry (after the 4-byte signature, the 20-byte COFF header and the first 128 bytes of the PE32 optional header), so its offset relative to pe_offset should be 0x98 (152 bytes)
- First read the certificate table offset (4 bytes), and then the size of the certificate table (4 bytes)
- Modify the size if you want to append data.
- Now seek to the certificate table itself (its absolute location is the offset read in the previous step) and change the certificate size there as well if you modified it.
- Then go to the end of the file and add the new payload.
- Possibly calculate the new checksum of the file.
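The steps above can be sketched in Python with the struct module. This is an illustration under several assumptions: a 32-bit (PE32) image, a single certificate that sits at the very end of the file, and payload padding to 8 bytes; the function names are made up, and the header checksum is not recalculated.

```python
import struct

CERT_ENTRY_OFFSET = 0x98  # certificate table entry, relative to pe_offset (PE32)


def read_cert_table_entry(image):
    """Return (entry position, certificate offset, certificate size)."""
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image")
    entry_pos = pe_offset + CERT_ENTRY_OFFSET
    cert_offset, cert_size = struct.unpack_from("<II", image, entry_pos)
    return entry_pos, cert_offset, cert_size


def append_payload(image, payload):
    """Append payload after the certificate and patch both size fields."""
    buf = bytearray(image)
    entry_pos, cert_offset, cert_size = read_cert_table_entry(buf)
    if cert_offset == 0 or cert_offset + cert_size != len(buf):
        raise ValueError("expected a certificate table at the end of the file")
    padded = payload + b"\0" * (-len(payload) % 8)      # keep 8-byte alignment
    new_size = cert_size + len(padded)
    struct.pack_into("<I", buf, entry_pos + 4, new_size)  # size in the header entry
    struct.pack_into("<I", buf, cert_offset, new_size)    # length field of the certificate
    return bytes(buf) + padded
```

A typical use would read the signed file with open(path, "rb"), call append_payload, and write the result to a new file so the original stays untouched.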
Thursday, March 18, 2010
Troubleshoot applications on Windows
I am looking at ways to troubleshoot Windows application crashes. When a program error occurs in Windows, the system tries to find a program error handler. If the error is not handled, the system processes the unhandled error by looking at the registry key HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\AeDebug. The Auto entry controls whether a message box pops up and asks the user to confirm before debugging. The Debugger entry specifies which debug application will be used.
- Dr. Watson (drwtsn32): the debug tool that can create a system log and core dump files. You can enable it by running "drwtsn32 -i"
- userdump.exe : This will work in Windows 7.
- In Windows 7, Dr. Watson is replaced with the "Action Center", so you have to do the following:
- Run Task Manager
- Go to Processes tab and right-click on the crashed process
- Select the Create Dump File item;
- Pick the .DMP file created;
- Open Start->Run, type %USERPROFILE%\AppData\Local\Microsoft\Windows\WER\ReportArchive string and hit Enter;
- After that you will see a folder (or a few folders), the name of which starts with "Report"
- Collect the Report.wer file from the last created folder.
Thursday, March 11, 2010
shutdown vs. close in the socket
I was using Boost.Asio on Windows XP to develop a simple proxy. The proxy relays the traffic between the browser and the upstream server. Every time some data is received from the server side, Boost.Asio's async_write is used to send the whole response to the client socket, and then the proxy tries to read more from the server-side socket. If the server side closes the connection, the proxy should already have flushed the received HTTP response to the client, so it should be safe to close the client-side socket.
However, this did not always work. For example, http://sourceforge.net/projects/tailforwin32/ works in Firefox but not in IE8. Even though the log shows that all data has been written to the client socket, the page just cannot be displayed.
IE8 may detect the connection close before it has read all the data on the wire: when stepping through in a debugger, IE8 handles it properly. The idea is to shut down both read and write on the client socket instead of closing it, so that IE8 can read the data and then detect that the socket is closing. Here is a good article on socket close vs. shutdown. close releases the socket descriptor for the calling process, but another process that shares the socket may still use it, and it stays open for read and write. shutdown closes the read/write direction for every process that shares the socket; any further read/write results in EOF.
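The effect of shutdown is easy to try with a local socket pair. The Python sketch below is not the proxy code: it uses socket.socketpair() as a stand-in for the proxy and browser ends, keeps the payload small enough to fit in the socket buffer, and shuts down only the write half, which is enough for the reader to see EOF after the data.

```python
import socket


def send_then_shutdown(payload):
    """Send payload, shut down the write half, and read until EOF on the peer."""
    proxy_side, browser_side = socket.socketpair()
    try:
        proxy_side.sendall(payload)
        proxy_side.shutdown(socket.SHUT_WR)   # signal "no more data" without closing
        chunks = []
        while True:
            chunk = browser_side.recv(4096)
            if not chunk:                     # empty read means EOF
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        proxy_side.close()
        browser_side.close()


if __name__ == "__main__":
    print(send_then_shutdown(b"HTTP/1.1 200 OK\r\n\r\nbody"))
```

The reader gets every byte first and only then the EOF, which is the ordering the proxy needs.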
Friday, March 5, 2010
Stream based filter
I am thinking of creating a filter that can parse/decompress/scan partially read HTTP data. The idea is to use three buffers and implement a customized istream and ostream, both of which use a customized streambuf. In the decompression process:
a). The buffer that stores the compressed data in the customized streambuf.
b). When overflow is called, the data in the first buffer is sent to the decompressor, which may also buffer data internally. When the decompressor outputs data, it is sent to the third buffer.
c). The user-defined stream buffer.
The streambuf has a few functions that have to be overridden:
a). overflow() in the output stream: there is no space left in the buffer, so write it out. The data between ibegin and iend is sent to the gzip library to be compressed to a file.
b). underflow() in the input stream: there are no more characters in the buffer, so read more. The data read is decompressed by calling the gzip library and then placed in the input buffer.
Other options are to use:
boost::iostreams::gzip
http://www.codeproject.com/KB/stl/zipstream.aspx
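To make the overflow() mechanics concrete, here is a minimal sketch of an output-side filtering streambuf. It is not the gzip filter itself: the class name FilterBuf, the Transform callback, and the to_upper_chunk stand-in are all invented for this example, and a real version would call a compression library where the callback is invoked.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <ios>
#include <sstream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the real codec (assumption): upper-cases a chunk.
std::string to_upper_chunk(const char* data, std::size_t n) {
    std::string out(data, n);
    for (char& c : out) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return out;
}

// Hypothetical filter: buffers written bytes and, when the buffer fills,
// passes the chunk through `transform` and forwards the result to `sink`.
class FilterBuf : public std::streambuf {
public:
    using Transform = std::function<std::string(const char*, std::size_t)>;

    FilterBuf(std::streambuf* sink, Transform transform, std::size_t size = 8)
        : sink_(sink), transform_(std::move(transform)), buffer_(size ? size : 1) {
        setp(buffer_.data(), buffer_.data() + buffer_.size());
    }
    ~FilterBuf() override { sync(); }

protected:
    // Called when the put area is full: flush it, then store the new character.
    int_type overflow(int_type ch) override {
        if (!flush_chunk()) return traits_type::eof();
        if (!traits_type::eq_int_type(ch, traits_type::eof())) {
            *pptr() = traits_type::to_char_type(ch);
            pbump(1);
        }
        return traits_type::not_eof(ch);
    }

    // Called on flush: push out whatever is still buffered.
    int sync() override { return flush_chunk() ? sink_->pubsync() : -1; }

private:
    bool flush_chunk() {
        const std::size_t n = static_cast<std::size_t>(pptr() - pbase());
        if (n != 0) {
            const std::string out = transform_(pbase(), n);
            const std::streamsize len = static_cast<std::streamsize>(out.size());
            if (sink_->sputn(out.data(), len) != len) return false;
        }
        setp(buffer_.data(), buffer_.data() + buffer_.size());  // reset put area
        return true;
    }

    std::streambuf* sink_;
    Transform transform_;
    std::vector<char> buffer_;
};
```

Wrapping it as std::ostream out(&filter_buf) and writing to out sends every full chunk through the callback; the input side would override underflow() in the same spirit.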
Thursday, February 18, 2010
rpm strip out debug symbol
When I build an rpm, the debug symbols are always stripped out by the post-install scripts, which you can list by running:
rpm --eval %__spec_install_post
/usr/lib/rpm/brp-compress
/usr/lib/rpm/brp-strip
/usr/lib/rpm/brp-strip-static-archive
/usr/lib/rpm/brp-strip-comment-note
This is very frustrating. You can disable it by putting this in the spec file:
%define __spec_install_post /usr/lib/rpm/brp-compress
%define debug_package %{nil}
However, I already had a version installed in production.
I got a core file from it but could not get a useful backtrace. So this time,
I have all these debug options enabled:
-ggdb -g3 -O0
Friday, February 5, 2010
Send https request via a proxy which is chained to another proxy using ssl
I am trying to send an https request via a proxy that is chained to another proxy, by setting cache_peer ssl in the first proxy and https_port in the second proxy. The upstream proxy gives me this error:
2010/02/05 16:31:39| clientNegotiateSSL: Error negotiating SSL connection on FD 21: error:1407609B:SSL routines:SSL23_GET_CLIENT_HELLO:https proxy request (1/-1)
This is the error defined in the OpenSSL library as SSL_R_HTTPS_PROXY_REQUEST.
Some background on how https via a proxy works:
(1) The browser sends a CONNECT request to the proxy.
(2) The proxy opens a connection to port 443 on the https web server.
(3) The proxy then sends "HTTP/1.0 200 Connection established" to the browser.
(4) From this point on, all messages are relayed over this connection.
In our case, the upstream is also a proxy, so the downstream proxy forwards "CONNECT xxx:443 HTTP/1.0 ...." to the upstream proxy directly. This is plain HTTP rather than SSL traffic, so the upstream's https_port rejects it.
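The CONNECT handshake in steps (1) to (3) can be sketched as two small helpers: one builds the request the client sends, the other checks the proxy's status line. The function names are invented for this example, and real code would also need to read the full response headers before relaying data.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: the plain-text request a client sends to the proxy
// before any SSL bytes are exchanged.
std::string build_connect_request(const std::string& host, int port) {
    const std::string target = host + ":" + std::to_string(port);
    return "CONNECT " + target + " HTTP/1.0\r\nHost: " + target + "\r\n\r\n";
}

// Hypothetical helper: true if the status line carries a 2xx code,
// e.g. "HTTP/1.0 200 Connection established".
bool tunnel_established(const std::string& status_line) {
    if (status_line.compare(0, 5, "HTTP/") != 0) return false;
    const std::size_t space = status_line.find(' ');
    if (space == std::string::npos || status_line.size() < space + 4) return false;
    return status_line[space + 1] == '2' &&
           std::isdigit(static_cast<unsigned char>(status_line[space + 2])) &&
           std::isdigit(static_cast<unsigned char>(status_line[space + 3]));
}
```

The request is ordinary text starting with "CONNECT", which is why a port that expects an SSL handshake as the first bytes reports it as a proxy request instead of negotiating.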
Thursday, February 4, 2010
Configure polygraph to test https reverse proxy
You can configure Polygraph to test an https reverse proxy:
(1) Set up the https_port for the reverse proxy. (You have to disable client certificate authentication, as Polygraph does not support it.)
(2) In the workload file, add these tags:
SslWrap wrap = {
    protocols = [ "SSLv2":40%, "SSLv3", "TLSv1" ];
    root_certificate = "/opt/exampleca/cacert.pem";
    ciphers = [ "ALL:HIGH": 100% ];
    rsa_key_sizes = [ 1024bit ];
    session_resumption = 40%;
    session_cache = 100;
};
Proxy pxySsl = {
    addresses = [ '10.191.237.4:8888' ];
    server.ssl_wraps = [ wrap ];
};
use(pxySsl);
Also add this to Robot:
ssl_wraps = [ wrap ];
Monday, January 25, 2010
ssl vpn
Yesterday I got a chance to take a look at an SSL VPN. After logging on to a secure web site, the endpoint can download an ActiveX control from it. This ActiveX control creates a virtual adapter and assigns the user a virtual private IP. Traffic to the intranet is intercepted and forwarded to this virtual adapter (by changing the routing table?), and then an SSL tunnel is used to establish connections to the web servers.
The main advantages of the SSL VPN are:
- No VPN client to install; you only need a browser.
- Provides fine-grained access control.
- Uses port 443, which most firewalls leave open.
Monday, January 18, 2010
Squid ssl cache_peer
Squid supports cache_peer over ssl:
In downstream squid:
cache_peer parentip parent parentport 0000 default no-query no-digest ssl sslcert=/opt/exampleca/certs/client2.pem sslkey=/opt/exampleca/client2private.pem sslcafile=/opt/exampleca/cacert.pem name=https-local
In upstream squid:
https_port parentport cert=/opt/exampleca/certs/server.pem key=/opt/exampleca/serverprivate.pem clientca=/opt/exampleca/cacert.pem capath=/opt/exampleca crlfile=/opt/exampleca/my_crl.pem sslflags=VERIFY_CRL sslcontext=mlroaming
Wednesday, January 13, 2010
Coroutine in C
A coroutine provides more points of entry and exit than an ordinary routine. Duff's Device provides a way to simulate one in C code. Boost C++ also has an experimental implementation.
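Here is a minimal sketch of the switch-based trick, in the style of Duff's Device: a case label placed inside the loop lets the function jump back to where it last returned. The Counter struct and next_value function are made up for this example; it is written as C++, but the switch technique is the same in plain C.

```cpp
// Hypothetical coroutine state: where to resume, plus the loop variable,
// which must live outside the function to survive between calls.
struct Counter {
    int resume_at = 0;
    int i = 0;
};

// Yields 0, 1, ..., limit-1 on successive calls, then -1 once finished.
int next_value(Counter& c, int limit) {
    switch (c.resume_at) {
    case 0:                              // first call: start the loop
        for (c.i = 0; c.i < limit; ++c.i) {
            c.resume_at = 1;             // remember the re-entry point
            return c.i;                  // "yield" the current value
    case 1:;                             // later calls re-enter the loop here
        }
    }
    c.resume_at = 2;                     // no matching case: stay finished
    return -1;
}
```

Each call picks up right after the previous return, so the loop keeps its place without a separate thread or stack.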
Monday, January 4, 2010
Wt: A C++ web toolkit
Wt (pronounced "witty") is a toolkit for building web sites using C++. You can use it to create high-performance web applications. It uses the C++ library to generate the JavaScript code. Potentially, it can help avoid XSS security problems, as it has full control over the generated JavaScript.