Tuesday, December 28, 2010

computer virus

I am trying to understand how anti-virus software creates and manages virus signatures.

This article gives me some idea of how a signature is created. When a new virus is found, you can run a 'decoy' program to collect as many variants of the virus as possible, and then compare them to extract candidate signatures. To avoid false positives, each candidate signature is then run against a large volume of known-good data to make sure it is unique to the virus.
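
The validation step described above can be sketched in C as follows. This is only a minimal illustration, assuming a signature is a plain byte string; the names contains, struct sample and signature_is_usable are made up for this example and do not come from the article or from any anti-virus product.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One file image held in memory (hypothetical helper type). */
struct sample { const unsigned char *data; size_t len; };

/* Return 1 if the byte pattern sig occurs anywhere in buf, else 0. */
static int contains(const unsigned char *buf, size_t len,
                    const unsigned char *sig, size_t siglen)
{
    if (siglen == 0 || siglen > len)
        return 0;
    for (size_t i = 0; i + siglen <= len; i++)
        if (memcmp(buf + i, sig, siglen) == 0)
            return 1;
    return 0;
}

/* A candidate is usable only if every infected variant contains it
   and no known-good file does (the false positive check). */
static int signature_is_usable(const unsigned char *sig, size_t siglen,
                               const struct sample *infected, size_t ninfected,
                               const struct sample *clean, size_t nclean)
{
    assert(sig != NULL);
    for (size_t i = 0; i < ninfected; i++)
        if (!contains(infected[i].data, infected[i].len, sig, siglen))
            return 0;   /* misses a variant */
    for (size_t i = 0; i < nclean; i++)
        if (contains(clean[i].data, clean[i].len, sig, siglen))
            return 0;   /* false positive on good data */
    return 1;
}
```

A real scanner would use a multi-pattern search rather than this naive loop; the sketch only shows the accept/reject logic for one candidate.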

Here is another blog post about how to create ClamAV signatures. It gives a detailed explanation using the EICAR virus test file.

A nice web site about virus analysis.

Friday, December 17, 2010

Linux FAQ

(1) What is the maximum memory for 32 bit Linux?
  Normally, Linux splits the 4G address space into two parts: 3G for user space virtual addresses and 1G for kernel virtual addresses. The kernel can map only part of physical memory directly into its 1G, so here are some solutions for using more:
  a). HIGHMEM solution for up to 4G
  Use kmap to temporarily map pages from ZONE_HIGHMEM into the kernel virtual address space.
  b). HIGHMEM solution for using 64G memory (36 bit physical address)
  This is enabled via the PAE (Physical Address Extension) feature introduced with the Pentium Pro processor. Pages in ZONE_HIGHMEM are then mapped with kmap in the same way.

(2) What is the maximum memory for 64 bit Linux?
Based on this, it is 16T (2^44 bytes), which is controlled by:
  arch/x86/include/asm/sparsemem.h:
  # define MAX_PHYSMEM_BITS     44
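
The limits quoted in this FAQ all come from the same arithmetic: n physical address bits can address 2^n bytes. A small C sketch of that calculation, where addressable_bytes is a name made up for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with the given number of physical address bits. */
static uint64_t addressable_bytes(unsigned bits)
{
    assert(bits < 64);              /* a shift by 64 or more is undefined */
    return (uint64_t)1 << bits;
}
```

With 32 bits this gives 4G, with 36 bits (PAE) 64G, and with MAX_PHYSMEM_BITS set to 44 it gives 16T.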

Wednesday, December 15, 2010

Obtaining the current logon user name in Windows

We had an application that needed to retrieve the current logon user name in Windows. Since our application is an HTTP proxy, there are two approaches:

(1) Obtain the user name using the TCP connection
a). Get the TCP table, which includes the owning process ids
    tcptable=GetExtendedTcpTable(...)
b). Get the process id that owns the socket (dwLocalPort is in network byte order, so port must be too, e.g. htons(...))
    for (i=0; i<tcptable->dwNumEntries; i++)
            if (tcptable->table[i].dwLocalPort==port)
                processid=tcptable->table[i].dwOwningPid;
c). Open the process to get the process token
    hProcess=OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, processid)
    OpenProcessToken(hProcess, TOKEN_QUERY, &hToken)
d) Then use the token to get the user info
    GetTokenInformation(hToken, TokenUser, pUserInfo, dwSize, &dwSize)
e) Look up the user name from the user info (the name and domain buffers each need their own size variable)
    LookupAccountSidW(NULL, pUserInfo->User.Sid, Name, &dwNameSize, lpDomain, &dwDomainSize, &SidType)


(2) Obtain the user name using the active session
a) Enumerate the sessions
   WTSEnumerateSessions(WTS_CURRENT_SERVER_HANDLE, 0, 1, &pSessionInfo, &dwCount)
b) Find the active session
   for (i=0; i<dwCount; i++)
        if (pSessionInfo[i].State==WTSActive)
              dwSessionId=pSessionInfo[i].SessionId;
c). Then query the user token using the session id
      WTSQueryUserToken(dwSessionId, &hToken)
After that, follow steps d) and e) of approach (1).

Tuesday, December 14, 2010

large file cache using mmap

We had a 3G file in which the position of each field can be identified from a hash value. We use mmap to build the cache.
(1) Open the file.
(2) Calculate the position. If the data is not in memory, then call:
mmap2(NULL, 16384, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0x210e8)
which maps 16K of the file into memory.
(3) For the data in the next request, call mmap2 again (mmap can be called multiple times on the same file).
(4) When you run out of cache (let's say 300M), find the least used fields, then run
munmap(....)
to evict those fields from the cache.
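
Steps (1), (2) and (4) above can be sketched in C as follows. This is a minimal illustration, not our production code: the names cache_open, struct window, window_map and window_unmap are made up for this example, and the bookkeeping for finding the least used fields is left out. The one detail worth showing is that mmap needs a page-aligned file offset, so the requested position is rounded down and the returned pointer is adjusted.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct window {
    void  *base;    /* address returned by mmap, needed for munmap */
    size_t maplen;  /* number of bytes actually mapped */
    char  *data;    /* pointer to the requested file offset */
};

/* Step (1): open the big file for reading and writing. */
static int cache_open(const char *path)
{
    return open(path, O_RDWR);
}

/* Step (2): map len bytes of fd starting at file offset off.
   Returns 0 on success, -1 on failure. */
static int window_map(int fd, off_t off, size_t len, struct window *w)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && off >= 0);
    off_t aligned = off - (off % page);        /* round down to a page */
    size_t slack = (size_t)(off - aligned);    /* bytes before the field */
    void *p = mmap(NULL, len + slack, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, aligned);
    if (p == MAP_FAILED)
        return -1;
    w->base = p;
    w->maplen = len + slack;
    w->data = (char *)p + slack;
    return 0;
}

/* Step (4): evict one window from the cache. */
static int window_unmap(struct window *w)
{
    return munmap(w->base, w->maplen);
}
```

Because the mapping is MAP_SHARED, writes through data go back to the file, and the kernel page cache decides when the pages are actually read from or written to disk.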

Perl provides a module, Cache::FastMmap, to share memory across processes.

Wednesday, December 8, 2010

process virtual memory layout in 32 bit and 64 bit Linux

I did some experiments to compare the process virtual memory layout in 32 bit Linux (2.6.9) and 64 bit Linux (2.6.18).

This program is used:

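#include <stdio.h>
#include <stdlib.h>

const size_t arraySize=0x1000000; //16M
char global[arraySize*2];         //32M in the bss section
int main(int argc, char** argv)
{
        size_t i=0;
        char local[arraySize];                 //16M on the stack
        char* heap=(char*)malloc(arraySize*3); //48M from malloc
        //touch each buffer so its pages are really allocated (loop bodies assumed)
        for (i=0; i<arraySize; i++) local[i]='a';
        for (i=0; i<arraySize*2; i++) global[i]='a';
        for (i=0; i<arraySize*3; i++) heap[i]='a';
        printf("local %lx, global %lx, heap %lx\n",(unsigned long)local,(unsigned long)global,(unsigned long)heap);
        return 0;
}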


(1) In 32 bit Linux, the output is:

local bee40ef0, global 08049800, heap 4000a008

The memory map is:
00734000-00749000 r-xp 00000000 03:03 32066      /lib/ld-2.3.4.so
00749000-0074a000 r--p 00015000 03:03 32066      /lib/ld-2.3.4.so
0074a000-0074b000 rw-p 00016000 03:03 32066      /lib/ld-2.3.4.so
00753000-0075c000 r-xp 00000000 03:03 38761      /lib/libgcc_s-3.4.6-20060404.so.1
0075c000-0075d000 rw-p 00009000 03:03 38761      /lib/libgcc_s-3.4.6-20060404.so.1
00774000-00898000 r-xp 00000000 03:03 32151      /lib/tls/libc-2.3.4.so
00898000-00899000 r--p 00124000 03:03 32151      /lib/tls/libc-2.3.4.so
00899000-0089c000 rw-p 00125000 03:03 32151      /lib/tls/libc-2.3.4.so
0089c000-0089e000 rw-p 0089c000 00:00 0
008a0000-008c1000 r-xp 00000000 03:03 38754      /lib/tls/libm-2.3.4.so
008c1000-008c3000 rw-p 00020000 03:03 38754      /lib/tls/libm-2.3.4.so
00af9000-00bb9000 r-xp 00000000 03:03 195327     /usr/lib/libstdc++.so.6.0.3
00bb9000-00bbe000 rw-p 000bf000 03:03 195327     /usr/lib/libstdc++.so.6.0.3
00bbe000-00bc4000 rw-p 00bbe000 00:00 0
08048000-08049000 r-xp 00000000 00:14 34996225   /mnt/centos/ut/a.out
08049000-0804a000 rw-p 00000000 00:14 34996225   /mnt/centos/ut/a.out
0804a000-0a04a000 rw-p 0804a000 00:00 0
40008000-4300d000 rw-p 40008000 00:00 0
bee40000-c0000000 rw-p bee40000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0

The executable starts at 0x08048000, followed by the data and bss sections for the global variable (32M):
0804a000-0a04a000 rw-p 0804a000 00:00 0

Then the heap block starts at 1/3 of the user space (0x40000000) and is 48M. An allocation this large is served by mmap rather than brk, which is why it lands in the mmap area:
40008000-4300d000 rw-p 40008000 00:00 0

The stack begins at 0xc0000000 and grows toward lower addresses; it is about 18M:
bee40000-c0000000 rw-p bee40000 00:00 0

(2) In 64 bit Linux:

The output is:
local 7fffa88b2e40, global 00600b20, heap 2ab801210010

The memory map is:
00400000-00401000 r-xp 00000000 fd:00 34996225                           /home/bliu/ut/a.out
00600000-00601000 rw-p 00000000 fd:00 34996225                           /home/bliu/ut/a.out
00601000-02601000 rw-p 00601000 00:00 0
314ec00000-314ec1c000 r-xp 00000000 fd:00 11796527                       /lib64/ld-2.5.so
314ee1b000-314ee1c000 r--p 0001b000 fd:00 11796527                       /lib64/ld-2.5.so
314ee1c000-314ee1d000 rw-p 0001c000 fd:00 11796527                       /lib64/ld-2.5.so
314f000000-314f14d000 r-xp 00000000 fd:00 11796545                       /lib64/libc-2.5.so
314f14d000-314f34d000 ---p 0014d000 fd:00 11796545                       /lib64/libc-2.5.so
314f34d000-314f351000 r--p 0014d000 fd:00 11796545                       /lib64/libc-2.5.so
314f351000-314f352000 rw-p 00151000 fd:00 11796545                       /lib64/libc-2.5.so
314f352000-314f357000 rw-p 314f352000 00:00 0
315ae00000-315ae0d000 r-xp 00000000 fd:00 11796784                       /lib64/libgcc_s-4.1.2-20080825.so.1
315ae0d000-315b00d000 ---p 0000d000 fd:00 11796784                       /lib64/libgcc_s-4.1.2-20080825.so.1
315b00d000-315b00e000 rw-p 0000d000 fd:00 11796784                       /lib64/libgcc_s-4.1.2-20080825.so.1
3cb2600000-3cb26e6000 r-xp 00000000 fd:00 18000239                       /usr/lib64/libstdc++.so.6.0.8
3cb26e6000-3cb28e5000 ---p 000e6000 fd:00 18000239                       /usr/lib64/libstdc++.so.6.0.8
3cb28e5000-3cb28eb000 r--p 000e5000 fd:00 18000239                       /usr/lib64/libstdc++.so.6.0.8
3cb28eb000-3cb28ee000 rw-p 000eb000 fd:00 18000239                       /usr/lib64/libstdc++.so.6.0.8
3cb28ee000-3cb2900000 rw-p 3cb28ee000 00:00 0
3ecfe00000-3ecfe82000 r-xp 00000000 fd:00 35454980                       /lib64/libm-2.5.so
3ecfe82000-3ed0081000 ---p 00082000 fd:00 35454980                       /lib64/libm-2.5.so
3ed0081000-3ed0082000 r--p 00081000 fd:00 35454980                       /lib64/libm-2.5.so
3ed0082000-3ed0083000 rw-p 00082000 fd:00 35454980                       /lib64/libm-2.5.so
2ab8011f6000-2ab8011f7000 rw-p 2ab8011f6000 00:00 0
2ab80120d000-2ab804213000 rw-p 2ab80120d000 00:00 0
7fffa88b2000-7fffa98b4000 rw-p 7ffffeffd000 00:00 0                      [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0                  [vdso]


Here the executable starts at 0x00400000, then 32M is allocated for the global bss section:
00601000-02601000 rw-p 00601000 00:00 0

After the libraries, which are loaded at low addresses, the heap block starts at 0x2ab80120d000 and is 48M:
2ab80120d000-2ab804213000

The stack begins at 0x7fffa98b4000 and grows toward lower addresses; it is 16M:
7fffa88b2000-7fffa98b4000
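
The sizes quoted for each region are simply end minus start of the address range at the beginning of the maps line. A small C sketch of that calculation, where region_bytes is a name made up for this example:

```c
#include <assert.h>
#include <stdio.h>

/* Parse the "start-end" prefix of a /proc/<pid>/maps line and return
   the size of the region in bytes, or 0 if the line does not parse. */
static unsigned long long region_bytes(const char *maps_line)
{
    unsigned long long start, end;
    assert(maps_line != NULL);
    if (sscanf(maps_line, "%llx-%llx", &start, &end) != 2 || end < start)
        return 0;
    return end - start;
}
```

For example, 0804a000-0a04a000 gives 0x2000000 bytes (exactly 32M), 40008000-4300d000 gives 0x3005000 bytes (48M plus 20K of allocator overhead), and 7fffa88b2000-7fffa98b4000 gives 0x1002000 bytes (16M plus 8K).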


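Then the VDSO (virtual dynamic shared object):
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0                  [vdso]
On 32 bit x86 the vdso shows up as linux-gate.so.1, a virtual shared object exposed by the kernel. It lets libc make system calls using sysenter/sysexit, which is faster than the regular int 0x80 interrupt. On 64 bit, the region at 0xffffffffff600000 is, as far as I can tell, the fixed vsyscall page, which serves calls such as gettimeofday without a full kernel entry.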

Tuesday, December 7, 2010

virtual memory vs. RSS in Linux

I wrote a small program to understand the difference between VSZ and RSS. If you malloc, it only means that you can use the memory addresses. No real memory is used until a page is accessed.


First I found out my page size using:
%getconf PAGE_SIZE
4096

(1)
size_t length=0x10000000; //256M
const size_t pagesize=4096;
char* x=(char*)malloc(length);

After starting the program:
%ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
user     25125  0.0  0.0 273552   912 pts/1    S+   10:54   0:00 a.out

Here the VSZ just means the maximum address space you can use, and the real memory (RSS) is still 912K although the VSZ is about 267M.
(2) and then execute
for (int i=0; i<1000; i++)
      x[i]='a';

%ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
user     25395  0.0  0.0 273552   916 pts/1    S+   10:56   0:00 a.out

Only one page is written, so the RSS grows by just one extra page (916K now).

(3) and then write one character in each of 1000 pages:
   for (int i=0; i<1000; i++)
      x[i*pagesize]='a';


%ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
user     25285  0.0  0.2 273552  4912 pts/1    S+   10:55   0:00 a.out

Now every page that was written or read becomes part of the RSS (1000*4K+912K=4912K).
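
The same experiment can be done without ps by reading the resident page count from /proc/self/statm before and after touching the pages. This is a minimal sketch, assuming a Linux /proc file system; the names resident_pages and rss_growth_after_touch are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Resident set size of this process in pages (second field of
   /proc/self/statm), or -1 on failure. */
static long resident_pages(void)
{
    long size, resident;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return -1;
    int n = fscanf(f, "%ld %ld", &size, &resident);
    fclose(f);
    return n == 2 ? resident : -1;
}

/* malloc length bytes, write one byte in each of the first npages
   pages, and return how many pages the RSS grew by (-1 on failure). */
static long rss_growth_after_touch(size_t length, size_t npages)
{
    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    assert(npages * pagesize <= length);
    char *x = malloc(length);
    if (!x)
        return -1;
    volatile char *p = x;   /* volatile so the writes are not optimized away */
    long before = resident_pages();
    for (size_t i = 0; i < npages; i++)
        p[i * pagesize] = 'a';
    long after = resident_pages();
    free(x);
    return (before < 0 || after < 0) ? -1 : after - before;
}
```

Calling rss_growth_after_touch(0x10000000, 1000) should report a growth of roughly 1000 pages, matching the ps output above. The exact number can differ slightly, for example when the kernel uses transparent huge pages.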