User:Syaghmour/Text Processing Instructions
Text Processing Instructions
SSE 4.2 adds four string text processing instructions: PCMPISTRI, PCMPISTRM, PCMPESTRI and PCMPESTRM. These instructions take three parameters: arg1, an xmm register; arg2, an xmm register or a 128-bit memory location; and IMM8, an 8-bit immediate control byte. The instructions perform arithmetic comparisons between the packed contents of arg1 and arg2. IMM8 specifies the format of the input/output as well as the operation of two intermediate stages of processing. The results of stage 1 and stage 2 of intermediate processing will be referred to as IntRes1 and IntRes2 respectively. These instructions also provide additional information about the result through overloaded use of the arithmetic flags (AF, CF, OF, PF, SF and ZF).
The instructions proceed in multiple steps:
- arg1 and arg2 are compared
- An aggregation operation is applied to the result of the comparison with the result flowing into IntRes1
- An optional negation is performed with the result flowing into IntRes2
- An output in the form of an index (in ECX) or a mask (in XMM0) is produced
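The four steps can be modelled in a few lines of Python. This is only an illustrative sketch, not a description of the hardware: the names `pcmpstr_model` and `in_set` and their parameters are made up for this example, arg2 is assumed to be a full block of valid characters, and the handling of elements past a terminator is left out.

```python
def pcmpstr_model(arg1, arg2, aggregate, negate=False, most_significant=False):
    """Toy model of the compare / aggregate / negate / output steps."""
    # Steps 1 and 2: compare the elements and aggregate the comparisons
    # into IntRes1, one bit per element of arg2 (bit 0 first).
    int_res1 = [1 if aggregate(arg1, arg2, i) else 0 for i in range(len(arg2))]

    # Step 3: the optional negation produces IntRes2.
    int_res2 = [bit ^ 1 for bit in int_res1] if negate else list(int_res1)

    # Step 4a: index output (what ECX would hold).
    set_bits = [i for i, bit in enumerate(int_res2) if bit]
    if not set_bits:
        index = len(arg2)            # no bit set: index = number of elements
    elif most_significant:
        index = set_bits[-1]
    else:
        index = set_bits[0]

    # Step 4b: mask output, bit i of the mask is IntRes2[i].
    mask = sum(bit << i for i, bit in enumerate(int_res2))
    return int_res1, int_res2, index, mask


def in_set(arg1, arg2, i):
    # Example aggregation: is element i of arg2 one of the characters in arg1?
    return arg2[i] in arg1


res1, res2, index, mask = pcmpstr_model("aeiou", "Example string 1", in_set)
print(index, hex(mask))   # prints: 2 0x844
```

The aggregation is passed in as a function so that the same skeleton covers every comparison mode; only the rule that produces each bit of IntRes1 changes.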
IMM8 control byte description
The IMM8 control byte is split into four groups of bit fields that control the following settings:
- IMM8[1:0] specifies the format of the 128-bit source data (arg1 and arg2):
IMM8[1:0] | Description |
00b | unsigned bytes (16 packed unsigned bytes) |
01b | unsigned words (8 packed unsigned words) |
10b | signed bytes (16 packed signed bytes) |
11b | signed words (8 packed signed words) |
- IMM8[3:2] specifies the aggregation operation whose result will be placed in intermediate result 1, which we will refer to as IntRes1. The size of IntRes1 depends on the format of the source data: 16 bits for packed bytes and 8 bits for packed words:
IMM8[3:2] | Description |
00b | Equal Any: arg1 is a character set, arg2 is the string to search in. IntRes1[i] is set to 1 if arg2[i] is in the set represented by arg1: |
    arg1    = "aeiou"
    arg2    = "Example string 1"
    IntRes1 =  0010001000010000
01b | Ranges: arg1 is a set of character ranges, i.e. "09az" means all characters from 0 to 9 and from a to z; arg2 is the string to search over. IntRes1[i] is set to 1 if arg2[i] is in any of the ranges represented by arg1: |
    arg1    = "09az"
    arg2    = "Testing 1 2 3, T"
    IntRes1 =  0111111010101000
10b | Equal Each: arg1 is string one and arg2 is string two. IntRes1[i] is set to 1 if arg1[i] == arg2[i]: |
    arg1    = "The quick brown "
    arg2    = "The quack green "
    IntRes1 =  1111110111010011
11b | Equal Ordered: arg1 is a substring to search for, arg2 is the string to search within. IntRes1[i] is set to 1 if the substring arg1 can be found at position arg2[i]: |
    arg1    = "he"
    arg2    = ", he helped her "
    IntRes1 =  0010010000001000
- IMM8[5:4] specifies the polarity, or the processing of IntRes1 into intermediate result 2, which will be referred to as IntRes2:
IMM8[5:4] | Description |
00b | Positive Polarity: IntRes2 = IntRes1 |
01b | Negative Polarity: IntRes2 = -1 XOR IntRes1 |
10b | Masked Positive: IntRes2 = IntRes1 |
11b | Masked Negative: IntRes2[i] = IntRes1[i] if reg/mem[i] is invalid, else ~IntRes1[i] |
- IMM8[6] specifies the output selection, or how IntRes2 will be processed into the output.
  - For PCMPESTRI and PCMPISTRI, the output is an index into the data currently referenced by arg2:
IMM8[6] | Description |
0b | Least Significant Index: ECX contains the index of the least significant set bit in IntRes2 |
1b | Most Significant Index: ECX contains the index of the most significant set bit in IntRes2 |
  - For PCMPESTRM and PCMPISTRM, the output is a mask reflecting all the set bits in IntRes2:
IMM8[6] | Description |
0b | Bit Mask: the least significant bits of XMM0 contain the IntRes2 16(8)-bit mask. XMM0 is zero extended to 128 bits. |
1b | Byte/Word Mask: XMM0 contains IntRes2 expanded into a byte/word mask |
- IMM8[7] should be set to zero since it has no defined meaning.
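To make the bit fields concrete, the following Python sketch decodes a control byte and models the four aggregation operations on character data. It is a simplified model under stated assumptions, not a reference implementation: `decode_imm8` and `int_res1` are hypothetical helper names, strings are passed already cut to their valid length, elements are compared as characters (the signed/unsigned distinction is ignored), and IntRes1 is printed with bit 0 on the left, as in the examples above.

```python
FORMATS = ["unsigned bytes", "unsigned words", "signed bytes", "signed words"]
AGGREGATIONS = ["equal any", "ranges", "equal each", "equal ordered"]
POLARITIES = ["positive", "negative", "masked positive", "masked negative"]


def decode_imm8(imm8):
    """Split a control byte into the four fields described above."""
    return {
        "format": FORMATS[imm8 & 0b11],
        "aggregation": AGGREGATIONS[(imm8 >> 2) & 0b11],
        "polarity": POLARITIES[(imm8 >> 4) & 0b11],
        "output": (imm8 >> 6) & 1,
    }


def int_res1(imm8, arg1, arg2, width=16):
    """Model IntRes1 for 16 byte-sized elements; returns bit 0 first."""
    mode = (imm8 >> 2) & 0b11
    bits = []
    for i in range(width):
        c = arg2[i] if i < len(arg2) else None     # None marks an invalid element
        if mode == 0:        # Equal Any: is arg2[i] in the set arg1?
            bit = c is not None and c in arg1
        elif mode == 1:      # Ranges: arg1 holds (low, high) pairs
            pairs = zip(arg1[0::2], arg1[1::2])
            bit = c is not None and any(lo <= c <= hi for lo, hi in pairs)
        elif mode == 2:      # Equal Each: two invalid elements also count as equal
            a = arg1[i] if i < len(arg1) else None
            bit = a == c
        else:                # Equal Ordered: does arg1 start at arg2[i]?
            bit = all(i + j < len(arg2) and arg2[i + j] == ch
                      for j, ch in enumerate(arg1) if i + j < width)
        bits.append("1" if bit else "0")
    return "".join(bits)


print(decode_imm8(0b0011000))
print(int_res1(0b0000, "aeiou", "Example string 1"))   # 0010001000010000
print(int_res1(0b0100, "09az", "Testing 1 2 3, T"))    # 0111111010101000
print(int_res1(0b1000, "The quick brown ", "The quack green "))  # 1111110111010011
print(int_res1(0b1100, "he", ", he helped her "))      # 0010010000001000
```

The control byte 0011000b decodes to unsigned bytes, Equal Each, Negative Polarity and least significant index.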
The Four Instructions
pcmpistri IMM8, arg2, arg1 | GAS Syntax |
pcmpistri arg1, arg2, IMM8 | Intel Syntax |
PCMPISTRI, Packed Compare Implicit Length Strings, Return Index. Compares strings of implicit length and generates an index in ECX.
Operands
arg1
- XMM Register
arg2
- XMM Register
- Memory
IMM8
- 8-bit Immediate value
Modified flags
- CF is reset if IntRes2 is zero, set otherwise
- ZF is set if a null terminating character is found in arg2, reset otherwise
- SF is set if a null terminating character is found in arg1, reset otherwise
- OF is set to IntRes2[0]
- AF is reset
- PF is reset
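The way the flags and ECX interact with implicit lengths is easiest to see on a concrete block. The Python sketch below models `pcmpistri arg1, arg2, 0001000b` (unsigned bytes, Equal Each, positive polarity, least significant index) on two 16-byte blocks. The function name `pcmpistri_equal_each` is hypothetical, and the rule for invalid elements (two invalid elements compare as equal, exactly one invalid element compares as unequal) is taken from Intel's description of Equal Each rather than from the text above.

```python
def pcmpistri_equal_each(arg1, arg2):
    """Model of pcmpistri arg1, arg2, 0001000b on two 16-byte blocks."""
    assert len(arg1) == 16 and len(arg2) == 16
    # Implicit length: elements at and after the first null are invalid.
    len1 = arg1.index(0) if 0 in arg1 else 16
    len2 = arg2.index(0) if 0 in arg2 else 16

    int_res2 = []
    for i in range(16):
        valid1, valid2 = i < len1, i < len2
        if valid1 and valid2:
            bit = arg1[i] == arg2[i]
        else:
            # Both invalid counts as a match, exactly one invalid does not.
            bit = valid1 == valid2
        int_res2.append(int(bit))     # positive polarity: IntRes2 = IntRes1

    ecx = int_res2.index(1) if 1 in int_res2 else 16
    flags = {
        "CF": int(any(int_res2)),     # IntRes2 != 0
        "ZF": int(len2 < 16),         # null found in arg2
        "SF": int(len1 < 16),         # null found in arg1
        "OF": int_res2[0],
        "AF": 0,
        "PF": 0,
    }
    return ecx, flags


zero = bytes(16)                                   # arg1: an all-zero register
print(pcmpistri_equal_each(zero, b"ABCDABCDABCDABCD"))
print(pcmpistri_equal_each(zero, b"ABCDABC" + bytes(9)))
```

With an all-zero arg1, a block without a null gives ZF = 0 and ECX = 16, while a block whose first null is at offset 7 gives ZF = 1 and ECX = 7, so ECX is the position of the terminator within the block.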
Example
;
; nasm -felf32 -g sse4_2StrPcmpistri.asm -l sse4_2StrPcmpistri.lst
; gcc -o sse4_2StrPcmpistri sse4_2StrPcmpistri.o
;
global main
extern printf
extern strlen
extern strcmp
section .data
align 4
;
; Fill buf1 with a repeating pattern of ABCD
;
buf1: times 10 dd 0x44434241
s1: db "This is a string", 0
s2: db "This is a string slightly different string", 0
s3: db "This is a str", 0
fmtStr1: db "String: %s len: %d", 0x0A, 0
fmtStr1b: db "strlen(3): String: %s len: %d", 0x0A, 0
fmtStr2: db "s1: =%s= and s2: =%s= compare: %d", 0x0A, 0
fmtStr2b: db "strcmp(3): s1: =%s= and s2: =%s= compare: %d", 0x0A, 0
;
; Functions will follow the cdecl call convention
;
section .text
main: ; Using main since we are using gcc to link
and esp, -16 ; 16 byte align the stack
sub esp, 16 ; space for four 4 byte parameters
;
; Null terminate buf1, make it proper C string, length is now 39
;
mov [buf1+39], byte 0x00
lea eax, [buf1]
mov [esp], eax ; Arg1: pointer of string to calculate the length of
mov ebx, eax ; Save pointer in ebx since we will use it again
call strlenSSE42
mov edx, eax ; Copy length of arg1 into edx
mov [esp+8], edx ; Arg3: length of string
mov [esp+4], ebx ; Arg2: pointer to string
lea eax, [fmtStr1]
mov [esp], eax ; Arg1: pointer to format string
call printf ; Call printf(3):
; int printf(const char *format, ...);
lea eax, [buf1]
mov [esp], eax ; Arg1: pointer of string to calculate the length of
mov ebx, eax ; Save pointer in ebx since we will use it again
call strlen ; Call strlen(3):
; size_t strlen(const char *s);
mov edx, eax ; Copy length of arg1 into edx
mov [esp+8], edx ; Arg3: length of string
mov [esp+4], ebx ; Arg2: pointer to string
lea eax, [fmtStr1b]
mov [esp], eax ; Arg1: pointer to format string
call printf ; Call printf(3):
; int printf(const char *format, ...);
lea eax, [s2]
mov [esp+4], eax ; Arg2: pointer to second string to compare
lea eax, [s1]
mov [esp], eax ; Arg1: pointer to first string to compare
call strcmpSSE42
mov [esp+12], eax ; Arg4: result from strcmpSSE42
lea eax, [s2]
mov [esp+8], eax ; Arg3: pointer to second string
lea eax, [s1]
mov [esp+4], eax ; Arg2: pointer to first string
lea eax, [fmtStr2]
mov [esp], eax ; Arg1: pointer to format string
call printf
lea eax, [s2]
mov [esp+4], eax ; Arg2: pointer to second string to compare
lea eax, [s1]
mov [esp], eax ; Arg1: pointer to first string to compare
call strcmp ; Call strcmp(3):
; int strcmp(const char *s1, const char *s2);
mov [esp+12], eax ; Arg4: result from strcmp
lea eax, [s2]
mov [esp+8], eax ; Arg3: pointer to second string
lea eax, [s1]
mov [esp+4], eax ; Arg2: pointer to first string
lea eax, [fmtStr2b]
mov [esp], eax ; Arg1: pointer to format string
call printf
lea eax, [s3]
mov [esp+4], eax ; Arg2: pointer to second string to compare
lea eax, [s1]
mov [esp], eax ; Arg1: pointer to first string to compare
call strcmpSSE42
mov [esp+12], eax ; Arg4: result from strcmpSSE42
lea eax, [s3]
mov [esp+8], eax ; Arg3: pointer to second string
lea eax, [s1]
mov [esp+4], eax ; Arg2: pointer to first string
lea eax, [fmtStr2]
mov [esp], eax ; Arg1: pointer to format string
call printf
lea eax, [s3]
mov [esp+4], eax ; Arg2: pointer to second string to compare
lea eax, [s1]
mov [esp], eax ; Arg1: pointer to first string to compare
call strcmp ; Call strcmp(3):
; int strcmp(const char *s1, const char *s2);
mov [esp+12], eax ; Arg4: result from strcmp
lea eax, [s3]
mov [esp+8], eax ; Arg3: pointer to second string
lea eax, [s1]
mov [esp+4], eax ; Arg2: pointer to first string
lea eax, [fmtStr2b]
mov [esp], eax ; Arg1: pointer to format string
call printf
call exit
;
; size_t strlen(const char *s);
;
strlenSSE42:
push ebp
mov ebp, esp
mov edx, [ebp+8] ; Arg1: copy s(pointer to string) to edx
;
; We are looking for null terminating char, so set xmm0 to zero
;
pxor xmm0, xmm0
mov eax, -16 ; Avoid extra jump in main loop
strlenLoop:
add eax, 16
;
; IMM8[1:0] = 00b
; Src data is unsigned bytes(16 packed unsigned bytes)
; IMM8[3:2] = 10b
; We are using Equal Each aggregation
; IMM8[5:4] = 00b
; Positive Polarity, IntRes2 = IntRes1
; IMM8[6] = 0b
; ECX contains the least significant set bit in IntRes2
;
pcmpistri xmm0,[edx+eax], 0001000b
;
; Loop while ZF = 0, which means none of the bytes pointed to by edx+eax
; are zero.
;
jnz strlenLoop
;
; ecx will contain the offset from edx+eax where the first null
; terminating character was found.
;
add eax, ecx
pop ebp
ret
;
; int strcmp(const char *s1, const char *s2);
;
strcmpSSE42:
push ebp
mov ebp, esp
mov eax, [ebp+8] ; Arg1: copy s1(pointer to string) to eax
mov edx, [ebp+12] ; Arg2: copy s2(pointer to string) to edx
;
; Subtract s2(edx) from s1(eax). This admittedly looks odd, but we
; can now use edx to index into s1 and s2. As we adjust edx to move
; forward into s2, we can then add edx to eax and this will give us
; the comparable offset into s1 i.e. if we take edx + 16 then:
;
; edx = edx + 16 = edx + 16
; eax+edx = eax -edx + edx + 16 = eax + 16
;
; therefore edx points to s2 + 16 and eax + edx points to s1 + 16.
; We thus only need one index, convoluted but effective.
;
sub eax, edx
sub edx, 16 ; Avoid extra jump in main loop
strcmpLoop:
add edx, 16
movdqu xmm0, [edx]
;
; IMM8[1:0] = 00b
; Src data is unsigned bytes(16 packed unsigned bytes)
; IMM8[3:2] = 10b
; We are using Equal Each aggregation
; IMM8[5:4] = 01b
; Negative Polarity, IntRes2 = -1 XOR IntRes1
; IMM8[6] = 0b
; ECX contains the least significant set bit in IntRes2
;
pcmpistri xmm0, [edx+eax], 0011000b
;
; Loop while ZF=0 and CF=0, i.e. leave the loop as soon as either:
;
; 1) We find a null in s1(edx+eax) ZF=1
; 2) We find a char that does not match CF=1
;
ja strcmpLoop
;
; Jump if CF=1, we found a mismatched char
;
jc strcmpDiff
;
; We terminated loop due to a null character i.e. CF=0 and ZF=1
;
xor eax, eax ; They are equal so return zero
jmp exitStrcmp
strcmpDiff:
add eax, edx ; Set offset into s1 to match s2
;
; ecx is offset from current position where two strings do not match,
; so copy the respective non-matching byte into eax and edx and fill
; in remaining bits w/ zero.
;
movzx eax, byte[eax+ecx]
movzx edx, byte[edx+ecx]
;
; If s1 is less than s2 return integer less than zero, otherwise return
; integer greater than zero.
;
sub eax, edx
exitStrcmp:
pop ebp
ret
exit:
;
; Call the exit system call (number 1) directly
; void exit(int status)
;
mov ebx, 0 ; Arg one: the status
mov eax, 1 ; Syscall number:
int 0x80
Expected output:
String: ABCDABCDABCDABCDABCDABCDABCDABCDABCDABC len: 39
strlen(3): String: ABCDABCDABCDABCDABCDABCDABCDABCDABCDABC len: 39
s1: =This is a string= and s2: =This is a string slightly different string= compare: -32
strcmp(3): s1: =This is a string= and s2: =This is a string slightly different string= compare: -32
s1: =This is a string= and s2: =This is a str= compare: 105
strcmp(3): s1: =This is a string= and s2: =This is a str= compare: 105
pcmpistrm IMM8, arg2, arg1 | GAS Syntax |
pcmpistrm arg1, arg2, IMM8 | Intel Syntax |
PCMPISTRM, Packed Compare Implicit Length Strings, Return Mask. Compares strings of implicit length and generates a mask stored in XMM0.
Operands
arg1
- XMM Register
arg2
- XMM Register
- Memory
IMM8
- 8-bit Immediate value
Modified flags
- CF is reset if IntRes2 is zero, set otherwise
- ZF is set if a null terminating character is found in arg2, reset otherwise
- SF is set if a null terminating character is found in arg1, reset otherwise
- OF is set to IntRes2[0]
- AF is reset
- PF is reset
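For the mask-returning instructions, the two output formats selected by IMM8[6] can be sketched as follows. This is a minimal Python model for the 16-element byte case only; the helper name `xmm0_mask` is an assumption for illustration, and XMM0 is represented as 16 bytes with byte 0 first.

```python
def xmm0_mask(int_res2, expand):
    """Return the 16 bytes of XMM0 (byte 0 first) for 16 one-bit results."""
    if expand:
        # IMM8[6] = 1: byte mask, each element becomes 0xFF or 0x00.
        return bytes(0xFF if bit else 0x00 for bit in int_res2)
    # IMM8[6] = 0: bit mask, packed into the low 16 bits and zero extended.
    value = sum(bit << i for i, bit in enumerate(int_res2))
    return value.to_bytes(16, "little")


# IntRes2 with bits 2, 6 and 11 set (bit 0 is the first list element).
bits = [int(ch) for ch in "0010001000010000"]
print(xmm0_mask(bits, expand=False).hex())
print(xmm0_mask(bits, expand=True).hex())
```

The bit mask is compact and suits a follow-up scalar test, while the byte mask can be used directly as an operand for further packed operations.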
pcmpestri IMM8, arg2, arg1 | GAS Syntax |
pcmpestri arg1, arg2, IMM8 | Intel Syntax |
PCMPESTRI, Packed Compare Explicit Length Strings, Return Index. Compares strings of explicit length and generates an index in ECX.
Operands
arg1
- XMM Register
arg2
- XMM Register
- Memory
IMM8
- 8-bit Immediate value
Implicit Operands
- EAX holds the length of arg1
- EDX holds the length of arg2
Modified flags
- CF is reset if IntRes2 is zero, set otherwise
- ZF is set if EDX is < 16 (for bytes) or 8 (for words), reset otherwise
- SF is set if EAX is < 16 (for bytes) or 8 (for words), reset otherwise
- OF is set to IntRes2[0]
- AF is reset
- PF is reset
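The role of the implicit operands can be sketched in Python as below. The helper name `explicit_lengths` is hypothetical. The sketch also assumes, following Intel's description of the explicit-length forms rather than the text above, that the absolute value of each length register is used and that it saturates at 16 (bytes) or 8 (words).

```python
def explicit_lengths(eax, edx, words=False):
    """Model how EAX/EDX give the valid lengths and the SF/ZF flags."""
    limit = 8 if words else 16
    return {
        "len1": min(abs(eax), limit),    # valid elements in arg1
        "len2": min(abs(edx), limit),    # valid elements in arg2
        "SF": int(abs(eax) < limit),     # arg1 ends inside this block
        "ZF": int(abs(edx) < limit),     # arg2 ends inside this block
    }


print(explicit_lengths(5, 40))
print(explicit_lengths(8, 3, words=True))
```

Because the lengths are passed explicitly, the data may contain embedded zero bytes, which the implicit-length forms would treat as terminators.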
pcmpestrm IMM8, arg2, arg1 | GAS Syntax |
pcmpestrm arg1, arg2, IMM8 | Intel Syntax |
PCMPESTRM, Packed Compare Explicit Length Strings, Return Mask. Compares strings of explicit length and generates a mask stored in XMM0.
Operands
arg1
- XMM Register
arg2
- XMM Register
- Memory
IMM8
- 8-bit Immediate value
Implicit Operands
- EAX holds the length of arg1
- EDX holds the length of arg2
Modified flags
- CF is reset if IntRes2 is zero, set otherwise
- ZF is set if EDX is < 16 (for bytes) or 8 (for words), reset otherwise
- SF is set if EAX is < 16 (for bytes) or 8 (for words), reset otherwise
- OF is set to IntRes2[0]
- AF is reset
- PF is reset