The damned thing won't work. I thought I had it solved, but noooo... if you have some experience with C, here's the code fragment:


typedef struct tagTOKEN {
	char * szContent;
	int nTokenID;
	...   /* the other members are unimportant */
} TOKEN;

int f(char *filename, ...){
	TOKEN current;
	char *buff, *nstr = "\0";
	int ci;     /* int, not char, so the comparison against EOF is reliable */
	char c;
	FILE *src;

	src = fopen(filename, "r");
	if (src == NULL)
		return -1;
	...
	/* do some internal config */
	buff=strdup(nstr);

	while ((ci=fgetc(src)) != EOF){
		c = (char)ci;
		if (c=='<')
			if(strlen(buff)>1){
				current.szContent = buff;
				buff=strdup(nstr);
				/* add current TOKEN to a stream */
				current.nTokenID = TKN_MARKUP;
			}
			else
				current.nTokenID = TKN_MARKUP;
		
		buff=strncat(buff, &c, 1);
		
		if (c=='>' && strlen(buff)>1){
			current.szContent = buff;
			buff=strdup(nstr);
			/* add current TOKEN to a stream */
			current.nTokenID = TKN_TEXT;
		}
	}
	fclose(src);
	return 0;
}

This fragment reads an HTML file and breaks it up into tokens, each holding either markup or plain text. Now, the problem: when I compile and run the program (this is just a fragment of the whole code), this particular part cracks up. buff by itself works okay; printing it produces the correct string. But once I point current.szContent at the same memory as buff (as the code does), the string suddenly comes out truncated to 16 characters. Weird. Or maybe I'm doing something wrong here.
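One thing I haven't tried yet: maybe strdup(nstr) only hands me a single byte (nstr is an empty string), and strncat then happily writes past the end of that allocation, which would corrupt the heap rather than grow the string. If that's it, a buffer that tracks its own capacity might behave better. This is just an untested sketch of the idea; BUF, buf_init, and buf_append are names I made up, not part of my actual code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer: doubles its capacity as needed,
   so appending never writes past the end of the allocation. */
typedef struct {
	char *data;
	size_t len;   /* characters stored, excluding the NUL */
	size_t cap;   /* bytes allocated */
} BUF;

static void buf_init(BUF *b){
	b->cap = 16;
	b->len = 0;
	b->data = malloc(b->cap);   /* error checking omitted in this sketch */
	b->data[0] = '\0';
}

static void buf_append(BUF *b, char c){
	if (b->len + 2 > b->cap){   /* need room for c plus the terminating NUL */
		b->cap *= 2;
		b->data = realloc(b->data, b->cap);
	}
	b->data[b->len++] = c;
	b->data[b->len] = '\0';
}
```

Inside the loop, buff=strncat(buff, &c, 1) would then become buf_append(&buf, c), and handing the string to a token would be current.szContent = strdup(buf.data) followed by resetting buf.len to 0.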

I know, I know. I could have used lex/flex and yacc/bison to generate the tokenizer and parser, but I want to do it on my own, to try things out. I'm using a MinGW compiler, v.2.95. I'm not sure about the exact build, though; I'm typing this in a Net cafe. But if you're willing to help, please, please help.

Previously: Coding, part deux