Smart String Truncation in Lasso
I needed to generate some short intros/teasers from longer body text today, and not having anything readily at hand, decided to see what was available at tagSwap. Searching for “truncate” brought up two tags, one based on the other.
The first is [string_truncate] by John Burwell. It’s pretty straightforward. Pass in a string and the desired length, and it will return the string truncated to exactly the specified length, plus an optional string to indicate the continuation (for instance, an ellipsis). It checks to see if the source string is shorter than the given length, in which case it returns the string unaltered.
This looked like it would do the trick, but I decided to check out the other tag anyway. It was [gf_lowtext] by Gaetano Frascolla. Gaetano’s tag is based on John’s, but adds a check to see if the truncated string ends with a space. If not, it assumes a word is being split, and removes one character at a time until it reaches a space before returning the result.
It’s a nice enhancement, but only checking for a space presents some potential limitations. First of all, there are plenty of other whitespace characters that may break up words, and secondly, it doesn’t take punctuation and other special characters into consideration. I’d rather not see the result end with a comma or apostrophe, especially when followed by an ellipsis.
It seemed as if a little regex and some elbow grease might provide me with a few additional enhancements, so I grabbed Gaetano’s tag and got to work. The result is below:
define_tag(
    'truncate',
    -namespace='string_',
    -req='text',
    -req='length', -type='integer', -copy,
    -priority='replace',
    -encodenone,
    -description='Truncates the given string to the given number of characters.'
);
    // if the original string is shorter than or equal to the desired length,
    // just return it unaltered.
    #text->size <= #length ? return(#text);

    local('out') = string;

    // while #out is empty, #length is still greater than zero,
    // and the last character of the new string is not whitespace...
    while(!#out->size || !#out->iswhitespace(#out->size) && #length);
        // store a new substring in #out
        #out = #text->substring(1, #length);
        // decrement #length by 1
        #length -= 1;
    /while;

    // if we reached zero, return nothing
    !#length ? return;

    // remove any trailing non-alphanumeric characters and whitespace
    #out = string_replaceregexp(
        #out,
        -find='[^A-Za-z0-9]*\\s*$',
        -replace=''
    );

    // return the final result with an ellipsis character appended
    return(#out + '…');
/define_tag;
The changes I made include:
- Using [string->iswhitespace] to check for any whitespace character (tabs, newlines, etc.) instead of just spaces.
- Returning null if there is no reasonable place to truncate the string within the desired length. This may be an unlikely edge case in normal usage, but without the additional check for #length in the [while], the loop could run forever. (My test code below hits this case for lengths 1 through 3.)
- Trimming not only the whitespace from the result, but also any non-alphanumeric characters. This took care of the “hanging punctuation” issue and seemed reasonable for English strings. Additional exceptions could be added for accented characters.
- Appending an HTML-encoded ellipsis character to the result automatically instead of a user-supplied value and/or three periods. This is the only way I’ve ever wanted to show the continuation, so I didn’t bother making it optional.
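For the accented-character exception mentioned in the third point, one option might be to trim on Unicode character classes instead of the ASCII ranges. The snippet below is only a sketch: it assumes the regular expression engine behind [string_replaceregexp] accepts the \p{L} (letter) and \p{N} (number) property classes, which I have not verified across Lasso versions, and the sample string and the variable name 'sample' are made up for illustration.

```lasso
// Hypothetical sample text ending in an accented word plus punctuation.
var('sample') = 'Voilà, le café, ';

// Assumption: \p{L} and \p{N} are supported by the regex engine.
// Strip trailing characters that are neither letters nor digits in any
// script, so an accented letter such as "é" is treated as part of the word.
$sample = string_replaceregexp(
    $sample,
    -find='[^\\p{L}\\p{N}]*$',
    -replace=''
);

// Output the trimmed string.
$sample;
```

With the original ASCII-only pattern, the trailing "é" would count as non-alphanumeric and be trimmed along with the comma and space. If the property classes turn out to be unavailable, listing the needed accented characters explicitly inside the bracket expression would be a fallback.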
To test the tag, I looped the length of a test string to see where it would break given every possible position:
var('str') = 'The quick, brown-fox jumps over the "lazy" dog.';
loop($str->size);
    loop_count + ' - ' + string_truncate($str, loop_count) + '\n';
/loop;
…resulting in the following output:
1 -
2 -
3 -
4 - The…
5 - The…
6 - The…
7 - The…
8 - The…
9 - The…
10 - The…
11 - The quick…
12 - The quick…
13 - The quick…
14 - The quick…
15 - The quick…
16 - The quick…
17 - The quick…
18 - The quick…
19 - The quick…
20 - The quick…
21 - The quick, brown-fox…
22 - The quick, brown-fox…
23 - The quick, brown-fox…
24 - The quick, brown-fox…
25 - The quick, brown-fox…
26 - The quick, brown-fox…
27 - The quick, brown-fox jumps…
28 - The quick, brown-fox jumps…
29 - The quick, brown-fox jumps…
30 - The quick, brown-fox jumps…
31 - The quick, brown-fox jumps…
32 - The quick, brown-fox jumps over…
33 - The quick, brown-fox jumps over…
34 - The quick, brown-fox jumps over…
35 - The quick, brown-fox jumps over…
36 - The quick, brown-fox jumps over the…
37 - The quick, brown-fox jumps over the…
38 - The quick, brown-fox jumps over the…
39 - The quick, brown-fox jumps over the…
40 - The quick, brown-fox jumps over the…
41 - The quick, brown-fox jumps over the…
42 - The quick, brown-fox jumps over the…
43 - The quick, brown-fox jumps over the "lazy…
44 - The quick, brown-fox jumps over the "lazy…
45 - The quick, brown-fox jumps over the "lazy…
46 - The quick, brown-fox jumps over the "lazy…
47 - The quick, brown-fox jumps over the "lazy" dog.
I’m satisfied with the results so far, but of course suggestions are welcome.








Hey Jason,
Have a look at these tags I did a long time ago. They’re packaged with PageBlocks, but you can get the source from the online reference, and I’m pretty sure there’s nothing specific to PB in them.
Demos are here:
http://www.pageblocks.org/demo/getstring
Code is here:
http://www.pageblocks.org/refc/fwpStr_getLeft
http://www.pageblocks.org/refc/fwpStr_getRight
http://www.pageblocks.org/refc/fwpStr_getWords
http://www.pageblocks.org/refc/fwpStr_getSentences
http://www.pageblocks.org/refc/fwpStr_getParagraphs
They could probably be combined into one tag/type for a universal API. The coding style is pretty old. I think I originally wrote them back in LP5 days, and then re-released them for LP8/PB5.
Hi Jason, how about an option to display a string the way Mac OS lists file names when space is constrained? For example: ‘The quick, b … e “lazy” dog.’
I did some work on this a long time ago - but can I find it?