User:Syaghmour/Text Processing Instructions

Text Processing Instructions

SSE 4.2 adds four text processing instructions: PCMPISTRI, PCMPISTRM, PCMPESTRI and PCMPESTRM. Each instruction takes three operands: arg1, an XMM register; arg2, an XMM register or a 128-bit memory location; and IMM8, an 8-bit immediate control byte. The instructions compare the packed contents of arg1 and arg2. IMM8 specifies the format of the input and output as well as the operation of two intermediate stages of processing. The results of stage 1 and stage 2 of intermediate processing will be referred to as IntRes1 and IntRes2 respectively. The instructions also report additional information about the result by overloading the arithmetic flags (AF, CF, OF, PF, SF and ZF).

The instructions proceed in multiple steps:

  1. The elements of arg1 and arg2 are compared with each other
  2. An aggregation operation is applied to the result of the comparison, with the result flowing into IntRes1
  3. An optional negation is performed, with the result flowing into IntRes2
  4. An output in the form of an index (in ECX) or a mask (in XMM0) is produced
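
The four steps can be traced with a small scalar model. The sketch below is written in portable C rather than assembly so that it runs without SSE 4.2 hardware. It covers only one configuration (unsigned bytes, Equal Any aggregation, implicit length, least significant index), and the names model_equal_any_index and implicit_len are hypothetical helpers, not part of any library.

```c
#include <stdint.h>

/* Number of valid bytes before the first null, at most 16.
 * Hypothetical helper modelling the "implicit length" rule. */
static int implicit_len(const uint8_t *s)
{
    int n = 0;
    while (n < 16 && s[n] != 0)
        n++;
    return n;
}

/* Scalar model of PCMPISTRI for unsigned bytes with Equal Any
 * aggregation and least significant index output.
 * set plays the role of arg1, text the role of arg2. Both must be
 * null terminated or at least 16 bytes long.
 * negate = 0 models positive polarity, negate = 1 negative polarity.
 * Returns the modelled ECX. */
int model_equal_any_index(const uint8_t *set, const uint8_t *text, int negate)
{
    int set_len = implicit_len(set);
    int text_len = implicit_len(text);
    uint16_t int_res1 = 0;

    /* Steps 1 and 2: compare every valid pair, then OR the results
     * together for each position of text. */
    for (int i = 0; i < text_len; i++)
        for (int j = 0; j < set_len; j++)
            if (text[i] == set[j])
                int_res1 |= (uint16_t)(1u << i);

    /* Step 3: optional negation of all 16 bits. */
    uint16_t int_res2 = negate ? (uint16_t)~int_res1 : int_res1;

    /* Step 4: index of the least significant set bit, 16 if none. */
    for (int i = 0; i < 16; i++)
        if (int_res2 & (1u << i))
            return i;
    return 16;
}
```

Calling model_equal_any_index with the set "aeiou" and the text "Example string 1" yields 2, the position of the first lower case vowel. With negate set to 1 it yields 0, because 'E' is not in the set.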

IMM8 control byte description

The IMM8 control byte is split into four groups of bit fields that control the following settings:

  1. IMM8[1:0] specifies the format of the 128-bit source data(arg1 and arg2):
    IMM8[1:0] Description
    00b unsigned bytes(16 packed unsigned bytes)
    01b unsigned words(8 packed unsigned words)
    10b signed bytes(16 packed signed bytes)
    11b signed words(8 packed signed words)
  2. IMM8[3:2] specifies the aggregation operation whose result will be placed in intermediate result 1, which we will refer to as IntRes1. The size of IntRes1 will depend on the format of the source data, 16-bit for packed bytes and 8-bit for packed words:
    IMM8[3:2] Description
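    00b Equal Any, arg1 is a character set, arg2 is the string to search in. IntRes1[i] is set to 1 if arg2[i] is in the set represented by arg1. The comparison is case sensitive, so the 'E' below does not match 'e'. In this and the following examples IntRes1 is written with bit 0 leftmost, so that each bit lines up with its character in arg2:
                  arg1    = "aeiou"
                  arg2    = "Example string 1"
                  IntRes1 =  0010001000010000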
    
    01b Ranges, arg1 is a set of character ranges, i.e. "09az" means all characters from 0 to 9 and from a to z; arg2 is the string to search over. IntRes1[i] is set to 1 if arg2[i] is in any of the ranges represented by arg1:
                  arg1    = "09az"
                  arg2    = "Testing 1 2 3, T"
                  IntRes1 =  0111111010101000
    
    10b Equal Each, arg1 is string one and arg2 is string two. IntRes1[i] is set to 1 if arg1[i] == arg2[i]:
                  arg1    = "The quick brown "
                  arg2    = "The quack green "
                  IntRes1 =  1111110111010011
    
    11b Equal Ordered, arg1 is a substring to search for, arg2 is the string to search within. IntRes1[i] is set to 1 if the substring arg1 can be found at position arg2[i]:
                  arg1    = "he"
                  arg2    = ", he helped her "
                  IntRes1 =  0010010000001000
    
  3. IMM8[5:4] specifies the polarity, i.e. how IntRes1 is processed into intermediate result 2, which will be referred to as IntRes2:
    IMM8[5:4] Description
    00b Positive Polarity IntRes2 = IntRes1
    01b Negative Polarity IntRes2 = -1 XOR IntRes1
    10b Masked Positive IntRes2 = IntRes1
    11b Masked Negative IntRes2[i] = IntRes1[i] if arg2[i] is invalid (at or past the end of the string), else ~IntRes1[i]
  4. IMM8[6] specifies the output selection, or how IntRes2 will be processed into the output. For PCMPESTRI and PCMPISTRI, the output is an index into the data currently referenced by arg2. If no bit is set in IntRes2, ECX is set to 16 (for bytes) or 8 (for words):
    IMM8[6] Description
    0b Least Significant Index ECX contains the index of the least significant set bit in IntRes2
    1b Most Significant Index ECX contains the index of the most significant set bit in IntRes2
  5. For PCMPESTRM and PCMPISTRM, the output is a mask reflecting all the set bits in IntRes2:
    IMM8[6] Description
    0b Bit Mask, the least significant bits of XMM0 contain the IntRes2 16(8) bit mask. XMM0 is zero extended to 128 bits.
    1b Byte/Word Mask, XMM0 contains IntRes2 expanded into a byte/word mask: each byte (word) is all ones if the corresponding bit of IntRes2 is set and all zeros otherwise.
  6. IMM8[7] should be set to zero since it has no defined meaning.
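
The bit fields listed above can be pulled apart with shifts and masks. The following is a minimal sketch in portable C. The struct imm8_fields and the functions decode_imm8 and apply_polarity are hypothetical names introduced for illustration, and apply_polarity takes the number of valid elements in arg2 as an explicit parameter instead of deriving it from the data.

```c
#include <stdint.h>

/* Hypothetical record holding the decoded IMM8 fields. */
struct imm8_fields {
    unsigned format;      /* IMM8[1:0]: 0 ubytes, 1 uwords, 2 sbytes, 3 swords */
    unsigned aggregation; /* IMM8[3:2]: 0 Equal Any, 1 Ranges,
                                        2 Equal Each, 3 Equal Ordered */
    unsigned polarity;    /* IMM8[5:4] */
    unsigned output;      /* IMM8[6] */
    unsigned elements;    /* 16 for byte formats, 8 for word formats */
};

struct imm8_fields decode_imm8(uint8_t imm8)
{
    struct imm8_fields f;
    f.format      = imm8 & 0x3u;
    f.aggregation = (imm8 >> 2) & 0x3u;
    f.polarity    = (imm8 >> 4) & 0x3u;
    f.output      = (imm8 >> 6) & 0x1u;
    f.elements    = (f.format & 1u) ? 8u : 16u;  /* bit 0 selects words */
    return f;
}

/* Turns IntRes1 into IntRes2. arg2_len is the number of valid
 * elements in arg2 and is only consulted for Masked Negative. */
uint16_t apply_polarity(uint16_t int_res1, unsigned polarity,
                        unsigned arg2_len, unsigned elements)
{
    uint16_t all = (elements == 16u) ? 0xFFFFu : 0x00FFu;
    uint16_t valid = (arg2_len >= elements)
                   ? all
                   : (uint16_t)((1u << arg2_len) - 1u);

    switch (polarity & 0x3u) {
    case 1u: return (uint16_t)(int_res1 ^ all);   /* Negative Polarity */
    case 3u: return (uint16_t)(int_res1 ^ valid); /* Masked Negative */
    default: return int_res1;                     /* Positive, Masked Positive */
    }
}
```

For instance, decode_imm8(0x18) reports unsigned bytes, Equal Each and Negative Polarity, which is the configuration 0011000b.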

The Four Instructions

pcmpistri IMM8, arg2, arg1 GAS Syntax
pcmpistri arg1, arg2, IMM8 Intel Syntax

PCMPISTRI, Packed Compare Implicit Length Strings, Return Index. Compares strings of implicit length and generates index in ECX.

Operands

arg1

  • XMM Register

arg2

  • XMM Register
  • Memory

IMM8

  • 8-bit Immediate value

Modified flags

  1. CF is reset if IntRes2 is zero, set otherwise
  2. ZF is set if a null terminating character is found in arg2, reset otherwise
  3. SF is set if a null terminating character is found in arg1, reset otherwise
  4. OF is set to IntRes2[0]
  5. AF is reset
  6. PF is reset

Example

;
; nasm -felf32 -g sse4_2StrPcmpistri.asm -l sse4_2StrPcmpistri.lst
; gcc -m32 -o sse4_2StrPcmpistri sse4_2StrPcmpistri.o
; (-m32 is only needed when building on a 64-bit host)
;
global main 

extern printf
extern strlen
extern strcmp

section .data
	align 4
	;
	; Fill buf1 with a repeating pattern of ABCD
	;
	buf1:		times 10 dd 0x44434241
	s1:		db "This is a string", 0
	s2:		db "This is a string slightly different string", 0
	s3:		db "This is a str", 0
	fmtStr1:	db "String: %s len: %d", 0x0A, 0
	fmtStr1b:	db "strlen(3): String: %s len: %d", 0x0A, 0
	fmtStr2:	db "s1: =%s= and s2: =%s= compare: %d", 0x0A, 0
	fmtStr2b:	db "strcmp(3): s1: =%s= and s2: =%s= compare: %d", 0x0A, 0

;
; Functions will follow the cdecl call convention
;
section .text
	main:			; Using main since we are using gcc to link

	and	esp, -16	; 16 byte align the stack
	sub	esp, 16		; space for four 4 byte parameters

	;
	; Null terminate buf1, make it proper C string, length is now 39
	;
	mov	[buf1+39], byte 0x00

	lea	eax, [buf1]
	mov	[esp], eax	; Arg1: pointer of string to calculate the length of
	mov	ebx, eax	; Save pointer in ebx since we will use it again
	call	strlenSSE42
	mov	edx, eax	; Copy length of arg1 into edx
	
	mov	[esp+8], edx	; Arg3: length of string
	mov	[esp+4], ebx	; Arg2: pointer to string
	lea	eax, [fmtStr1]
	mov	[esp], eax	; Arg1: pointer to format string
	call	printf		; Call printf(3):
				;	int printf(const char *format, ...);

	lea	eax, [buf1]
	mov	[esp], eax	; Arg1: pointer of string to calculate the length of
	mov	ebx, eax	; Save pointer in ebx since we will use it again
	call	strlen		; Call strlen(3):
				;	size_t strlen(const char *s);
	mov	edx, eax	; Copy length of arg1 into edx
	
	mov	[esp+8], edx	; Arg3: length of string
	mov	[esp+4], ebx	; Arg2: pointer to string
	lea	eax, [fmtStr1b]
	mov	[esp], eax	; Arg1: pointer to format string
	call	printf		; Call printf(3):
				;	int printf(const char *format, ...);

	lea	eax, [s2]
	mov	[esp+4], eax	; Arg2: pointer to second string to compare
	lea	eax, [s1]
	mov	[esp], eax	; Arg1: pointer to first string to compare
	call	strcmpSSE42

	mov	[esp+12], eax	; Arg4: result from strcmpSSE42  
	lea	eax, [s2]
	mov	[esp+8], eax	; Arg3: pointer to second string
	lea	eax, [s1]
	mov	[esp+4], eax	; Arg2: pointer to first string
	lea	eax, [fmtStr2]
	mov	[esp], eax	; Arg1: pointer to format string
	call	printf

	lea	eax, [s2]
	mov	[esp+4], eax	; Arg2: pointer to second string to compare
	lea	eax, [s1]
	mov	[esp], eax	; Arg1: pointer to first string to compare
	call	strcmp		; Call strcmp(3):
				;	int strcmp(const char *s1, const char *s2);

	mov	[esp+12], eax	; Arg4: result from strcmp(3)
	lea	eax, [s2]
	mov	[esp+8], eax	; Arg3: pointer to second string
	lea	eax, [s1]
	mov	[esp+4], eax	; Arg2: pointer to first string
	lea	eax, [fmtStr2b]
	mov	[esp], eax	; Arg1: pointer to format string
	call	printf

	lea	eax, [s3]
	mov	[esp+4], eax	; Arg2: pointer to second string to compare
	lea	eax, [s1]
	mov	[esp], eax	; Arg1: pointer to first string to compare
	call	strcmpSSE42

	mov	[esp+12], eax	; Arg4: result from strcmpSSE42  
	lea	eax, [s3]
	mov	[esp+8], eax	; Arg3: pointer to second string
	lea	eax, [s1]
	mov	[esp+4], eax	; Arg2: pointer to first string
	lea	eax, [fmtStr2]
	mov	[esp], eax	; Arg1: pointer to format string
	call	printf

	lea	eax, [s3]
	mov	[esp+4], eax	; Arg2: pointer to second string to compare
	lea	eax, [s1]
	mov	[esp], eax	; Arg1: pointer to first string to compare
	call	strcmp		; Call strcmp(3):
				;	int strcmp(const char *s1, const char *s2);

	mov	[esp+12], eax	; Arg4: result from strcmp(3)
	lea	eax, [s3]
	mov	[esp+8], eax	; Arg3: pointer to second string
	lea	eax, [s1]
	mov	[esp+4], eax	; Arg2: pointer to first string
	lea	eax, [fmtStr2b]
	mov	[esp], eax	; Arg1: pointer to format string
	call	printf

	call	exit


;
; size_t strlen(const char *s);
;
strlenSSE42:
	push	ebp
	mov	ebp, esp

	mov	edx, [ebp+8]	; Arg1: copy s(pointer to string) to edx 
	;
	; We are looking for null terminating char, so set xmm0 to zero
	;
	pxor	xmm0, xmm0
	mov	eax, -16	; Avoid extra jump in main loop

strlenLoop:
	add	eax, 16
	;
	; IMM8[1:0]	= 00b
	;	Src data is unsigned bytes(16 packed unsigned bytes)
	; IMM8[3:2]	= 10b
	; 	We are using Equal Each aggregation
	; IMM8[5:4]	= 00b
	;	Positive Polarity, IntRes2	= IntRes1
	; IMM8[6]	= 0b
	;	ECX contains the least significant set bit in IntRes2
	;
	pcmpistri	xmm0,[edx+eax], 0001000b
	;
	; Loop while ZF = 0, which means none of the bytes pointed to by
	; edx+eax are zero.
	;
	jnz	strlenLoop
	
	;
	; ecx will contain the offset from edx+eax where the first null
	; terminating character was found.
	;
	add	eax, ecx
	pop	ebp
	ret

;
; int strcmp(const char *s1, const char *s2);
;
strcmpSSE42:
	push	ebp
	mov	ebp, esp

	mov	eax, [ebp+8]	; Arg1: copy s1(pointer to string) to eax
	mov	edx, [ebp+12]	; Arg2: copy s2(pointer to string) to edx
	;
	; Subtract s2(edx) from s1(eax). This admittedly looks odd, but we
	; can now use edx alone to index into both strings: eax holds the
	; constant distance s1 - s2, so whenever edx points to s2 + n,
	; eax + edx points to s1 + n. For example, after advancing by 16:
	;
	;	edx	= s2 + 16
	;	eax+edx	= (s1 - s2) + (s2 + 16)	= s1 + 16
	;
	; We thus only need one index, convoluted but effective.
	;
	sub	eax, edx
	sub	edx, 16		; Avoid extra jump in main loop

strcmpLoop:
	add	edx, 16
	movdqu	xmm0, [edx]
	;
	; IMM8[1:0]	= 00b
	;	Src data is unsigned bytes(16 packed unsigned bytes)
	; IMM8[3:2]	= 10b
	; 	We are using Equal Each aggregation
	; IMM8[5:4]	= 01b
	;	Negative Polarity, IntRes2	= -1 XOR IntRes1
	; IMM8[6]	= 0b
	;	ECX contains the least significant set bit in IntRes2
	;
	pcmpistri	xmm0, [edx+eax], 0011000b
	;
	; Loop while ZF=0 and CF=0. The loop ends when either:
	;
	;	1) We find a null in s1(edx+eax), ZF=1
	;	2) We find a char that does not match, CF=1
	;
	ja	strcmpLoop

	;
	; Jump if CF=1, we found a mismatched char
	;
	jc	strcmpDiff

	;
	; We terminated loop due to a null character i.e. CF=0 and ZF=1
	;
	xor	eax, eax	; They are equal so return zero
	jmp	exitStrcmp

strcmpDiff:
	add	eax, edx	; Set offset into s1 to match s2
	;
	; ecx is the offset from the current position where the two strings
	; do not match, so copy the respective non-matching byte into eax and
	; edx and fill in the remaining bits with zero.
	;
	movzx	eax, byte[eax+ecx]
	movzx	edx, byte[edx+ecx]
	;
	; If s1 is less than s2 return integer less than zero, otherwise return
	; integer greater than zero.
	;
	sub	eax, edx

exitStrcmp:
	pop	ebp
	ret

exit:
				;
				; Invoke the Linux exit system call directly
				;	void _exit(int status)
				;
	mov	ebx, 0		; Arg one: the status
	mov	eax, 1		; Syscall number:
	int 	0x80

Expected output:

String: ABCDABCDABCDABCDABCDABCDABCDABCDABCDABC len: 39
strlen(3): String: ABCDABCDABCDABCDABCDABCDABCDABCDABCDABC len: 39
s1: =This is a string= and s2: =This is a string slightly different string= compare: -32
strcmp(3): s1: =This is a string= and s2: =This is a string slightly different string= compare: -32
s1: =This is a string= and s2: =This is a str= compare: 105
strcmp(3): s1: =This is a string= and s2: =This is a str= compare: 105
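
The strlenSSE42 routine relies on a property of Equal Each that is easy to miss: arg1 is all zero, so every element of arg1 is invalid, and a position of IntRes1 is forced to 1 only where arg2 is invalid as well. The scalar model below, written in portable C, is a sketch of that reasoning under the assumption that xmm0 is zero and IMM8 is 0001000b. The function names are hypothetical. Unlike the real instruction, the model stops reading at the first null instead of loading a full 16 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of: pcmpistri xmm0, [block], 0001000b with xmm0 = 0.
 * Returns the modelled ECX and stores the modelled ZF in *zf. */
static int model_pcmpistri_zero_equal_each(const uint8_t *block, int *zf)
{
    int len2 = 0;                   /* implicit length of arg2 */
    while (len2 < 16 && block[len2] != 0)
        len2++;
    *zf = (len2 < 16);              /* ZF: arg2 contains a null */

    /* Every arg1 element is invalid. Equal Each forces a position to 1
     * when both elements are invalid and to 0 when only one is, so
     * IntRes1 has exactly bits len2..15 set. */
    uint16_t int_res1 = (uint16_t)(0xFFFFu << len2);

    /* Least significant index, 16 when IntRes1 is zero. */
    for (int i = 0; i < 16; i++)
        if (int_res1 & (1u << i))
            return i;
    return 16;
}

/* Mirrors strlenLoop: advance 16 bytes at a time while ZF = 0, then
 * add the index of the null within the final block. */
size_t model_strlen(const uint8_t *s)
{
    size_t offset = 0;
    int zf;
    int ecx = model_pcmpistri_zero_equal_each(s, &zf);

    while (!zf) {
        offset += 16;
        ecx = model_pcmpistri_zero_equal_each(s + offset, &zf);
    }
    return offset + (size_t)ecx;
}
```

For the 39 character ABCD pattern used in the listing, the model walks two full blocks and finds the null at index 7 of the third, giving 32 + 7 = 39.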


pcmpistrm IMM8, arg2, arg1 GAS Syntax
pcmpistrm arg1, arg2, IMM8 Intel Syntax

PCMPISTRM, Packed Compare Implicit Length Strings, Return Mask. Compares strings of implicit length and generates a mask stored in XMM0.

Operands

arg1

  • XMM Register

arg2

  • XMM Register
  • Memory

IMM8

  • 8-bit Immediate value


Modified flags

  1. CF is reset if IntRes2 is zero, set otherwise
  2. ZF is set if a null terminating character is found in arg2, reset otherwise
  3. SF is set if a null terminating character is found in arg1, reset otherwise
  4. OF is set to IntRes2[0]
  5. AF is reset
  6. PF is reset
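
The two mask formats selected by IMM8[6] differ only in how IntRes2 is laid out in XMM0. A minimal sketch in portable C for the byte formats follows. The function model_mask_output is a hypothetical name, and XMM0 is modelled as a 16 byte array with element 0 as the least significant byte.

```c
#include <stdint.h>
#include <string.h>

/* Writes the modelled contents of XMM0 for a 16-bit IntRes2.
 * expand = 0 models the bit mask (IMM8[6] = 0),
 * expand = 1 models the byte mask (IMM8[6] = 1). */
void model_mask_output(uint16_t int_res2, int expand, uint8_t xmm0[16])
{
    memset(xmm0, 0, 16);            /* upper bits are zero in both forms */

    if (!expand) {
        /* Bit mask: IntRes2 sits in the low 16 bits of XMM0. */
        xmm0[0] = (uint8_t)(int_res2 & 0xFFu);
        xmm0[1] = (uint8_t)(int_res2 >> 8);
    } else {
        /* Byte mask: each set bit becomes a byte of all ones. */
        for (int i = 0; i < 16; i++)
            if (int_res2 & (1u << i))
                xmm0[i] = 0xFF;
    }
}
```

With IntRes2 = 0x0844 (bits 2, 6 and 11 set) the bit mask form stores 0x44 and 0x08 in the two lowest bytes, while the byte mask form sets bytes 2, 6 and 11 to 0xFF.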


pcmpestri IMM8, arg2, arg1 GAS Syntax
pcmpestri arg1, arg2, IMM8 Intel Syntax

PCMPESTRI, Packed Compare Explicit Length Strings, Return Index. Compares strings of explicit length and generates index in ECX.

Operands

arg1

  • XMM Register

arg2

  • XMM Register
  • Memory

IMM8

  • 8-bit Immediate value


Implicit Operands

  • EAX holds the length of arg1
  • EDX holds the length of arg2


Modified flags

  1. CF is reset if IntRes2 is zero, set otherwise
  2. ZF is set if the absolute value of EDX is < 16 (for bytes) or 8 (for words), reset otherwise
  3. SF is set if the absolute value of EAX is < 16 (for bytes) or 8 (for words), reset otherwise
  4. OF is set to IntRes2[0]
  5. AF is reset
  6. PF is reset
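
Because the lengths come from EAX and EDX instead of a terminator, the explicit length forms can treat a null byte as ordinary data. The portable C sketch below models PCMPESTRI for unsigned bytes with Equal Any, positive polarity and least significant index. The struct estri_result and both function names are hypothetical. The model assumes the documented behaviour that the absolute value of each length register is used and saturated to 16.

```c
#include <stdint.h>

/* Hypothetical result record mirroring the output register and flags. */
struct estri_result {
    int ecx;             /* index of the first match, 16 if none */
    int cf, zf, sf, of;  /* modelled flag values */
};

/* Absolute value of the length register, saturated to 16 bytes. */
static int explicit_len(int32_t reg)
{
    int64_t n = reg < 0 ? -(int64_t)reg : (int64_t)reg;
    return n > 16 ? 16 : (int)n;
}

struct estri_result model_pcmpestri_equal_any(const uint8_t *arg1, int32_t eax,
                                              const uint8_t *arg2, int32_t edx)
{
    int len1 = explicit_len(eax);
    int len2 = explicit_len(edx);
    uint16_t int_res2 = 0;   /* positive polarity: IntRes2 = IntRes1 */

    /* Only the first len1 and len2 bytes are valid; a zero byte
     * inside that range is compared like any other value. */
    for (int i = 0; i < len2; i++)
        for (int j = 0; j < len1; j++)
            if (arg2[i] == arg1[j])
                int_res2 |= (uint16_t)(1u << i);

    struct estri_result r;
    r.ecx = 16;
    for (int i = 15; i >= 0; i--)
        if (int_res2 & (1u << i))
            r.ecx = i;       /* loop ends on the least significant set bit */
    r.cf = (int_res2 != 0);
    r.zf = (len2 < 16);
    r.sf = (len1 < 16);
    r.of = (int)(int_res2 & 1u);
    return r;
}
```

Searching the five bytes 'x', 0, 'y', 'c', 'z' for the three byte set 'a', 0, 'c' reports index 1, the embedded null, which an implicit length instruction would have treated as the end of the string.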


pcmpestrm IMM8, arg2, arg1 GAS Syntax
pcmpestrm arg1, arg2, IMM8 Intel Syntax

PCMPESTRM, Packed Compare Explicit Length Strings, Return Mask. Compares strings of explicit length and generates a mask stored in XMM0.

Operands

arg1

  • XMM Register

arg2

  • XMM Register
  • Memory

IMM8

  • 8-bit Immediate value


Implicit Operands

  • EAX holds the length of arg1
  • EDX holds the length of arg2


Modified flags

  1. CF is reset if IntRes2 is zero, set otherwise
  2. ZF is set if the absolute value of EDX is < 16 (for bytes) or 8 (for words), reset otherwise
  3. SF is set if the absolute value of EAX is < 16 (for bytes) or 8 (for words), reset otherwise
  4. OF is set to IntRes2[0]
  5. AF is reset
  6. PF is reset