Truncate a string without ending in the middle of a word

I am looking for a way to truncate a string in Python that will not cut off the string in the middle of a word.

For example:

Original:          "This is really awesome."
"Dumb" truncate:   "This is real..."
"Smart" truncate:  "This is really..."

I'm looking for a way to accomplish the "smart" truncate from above.


Asked by: Tess542 | Posted: 28-01-2022






Answer 1

I wrote a solution for this in a recent project of mine; here is a condensed version:

def smart_truncate(content, length=100, suffix='...'):
    if len(content) <= length:
        return content
    else:
        return ' '.join(content[:length+1].split(' ')[0:-1]) + suffix

The if-statement checks whether the content is already no longer than the cutoff. If it is longer, the function slices it to length + 1 characters (one extra, so that a word ending exactly at the cutoff is kept), splits on spaces, removes the last element (so that no partial word remains), and joins the rest back together, appending the suffix.
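One consequence of this approach is that the suffix is not counted toward length, so the result can be up to len(suffix) characters longer than the cutoff. For example, with length=15 the question's sentence comes back as 'This is really...', which is 17 characters. If you need a hard cap, here is a minimal sketch that reserves room for the suffix first. The name smart_truncate_within is hypothetical, and the sketch assumes length is larger than the suffix.

```python
def smart_truncate_within(content, length=100, suffix='...'):
    """Like smart_truncate, but the returned string, suffix included,
    is at most `length` characters. Assumes length > len(suffix)."""
    if len(content) <= length:
        return content
    # Reserve room for the suffix before looking for a word boundary
    budget = length - len(suffix)
    # Slice one character past the budget so a word ending exactly at the
    # budget survives, then drop the last (possibly partial) piece
    kept = ' '.join(content[:budget + 1].split(' ')[:-1])
    return kept + suffix


print(repr(smart_truncate_within("This is really awesome.", 17)))  # 'This is really...'
print(repr(smart_truncate_within("This is really awesome.", 15)))  # 'This is...'
```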

Answered by: Kirsten888 | Posted: 01-03-2022



Answer 2

Here's a slightly better version of the last line of the solution above:

    return content[:length].rsplit(' ', 1)[0] + suffix

(This is slightly more efficient. It also returns a more sensible result when there is no space before the cutoff: the text is hard-cut instead of being reduced to just the suffix.)
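To see the difference, here are both versions side by side. The function names are hypothetical. Besides the no-space case, the two also disagree when the cutoff lands right after a complete word. Answer 1's length + 1 slice keeps that word, while the rsplit version drops it.

```python
def truncate_split_join(content, length=100, suffix='...'):
    # Answer 1: slice one past the cutoff, drop the last space-separated piece
    if len(content) <= length:
        return content
    return ' '.join(content[:length + 1].split(' ')[0:-1]) + suffix


def truncate_rsplit(content, length=100, suffix='...'):
    # Answer 2: slice to the cutoff, drop everything after the last space
    if len(content) <= length:
        return content
    return content[:length].rsplit(' ', 1)[0] + suffix


word = 'Supercalifragilisticexpialidocious'
print(repr(truncate_split_join(word, 10)))  # '...'
print(repr(truncate_rsplit(word, 10)))      # 'Supercalif...'

s = 'This is really awesome.'
print(repr(truncate_split_join(s, 14)))  # 'This is really...'
print(repr(truncate_rsplit(s, 14)))      # 'This is...'
```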

Answered by: Arthur634 | Posted: 01-03-2022



Answer 3

There are a few subtleties that may or may not matter to you. These include handling of tabs (e.g. if you display them as 8 spaces but treat them as 1 character internally), handling the various flavours of breaking and non-breaking whitespace, and allowing breaks at hyphens. If you need any of this, take a look at the textwrap module, e.g.:

import textwrap

def truncate(text, max_size):
    if len(text) <= max_size:
        return text
    # Wrap to max_size-3 to leave room for the "..." suffix
    return textwrap.wrap(text, max_size-3)[0] + "..."

By default, words longer than the wrap width are broken, which makes max_size a hard limit. Some of the other solutions here use a soft limit instead. To get that behaviour, pass break_long_words=False to wrap(); an over-long first word is then returned whole. To do this, replace the last line with:

    lines = textwrap.wrap(text, max_size-3, break_long_words=False)
    return lines[0] + ("..." if len(lines)>1 else "")

There are a few other options like expand_tabs that may be of interest depending on the exact behaviour you want.
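Putting the two variants together, here is a minimal sketch that exposes the hard/soft choice as a parameter. The name wrap_truncate is hypothetical. Unlike the first version above, it only adds the suffix when wrap() actually produced more than one line, and it guards against wrap() returning an empty list for whitespace-only input.

```python
import textwrap


def wrap_truncate(text, max_size, break_long_words=True):
    """Hypothetical combination of both variants above.

    break_long_words=True  -> hard limit: an over-long first word is split.
    break_long_words=False -> soft limit: an over-long first word is kept whole.
    """
    if len(text) <= max_size:
        return text
    # Wrap to max_size - 3 to leave room for the "..." suffix
    lines = textwrap.wrap(text, max_size - 3, break_long_words=break_long_words)
    if not lines:
        # wrap() returns [] for whitespace-only input
        return ""
    # Only add the suffix if wrap() actually had to split the text
    return lines[0] + ("..." if len(lines) > 1 else "")


print(wrap_truncate("This is really awesome.", 20))  # This is really...
print(wrap_truncate("abcdefghijklmnopqrstuvwxyz and more", 10))  # abcdefg...
print(wrap_truncate("abcdefghijklmnopqrstuvwxyz and more", 10, break_long_words=False))
# abcdefghijklmnopqrstuvwxyz...
```

With the soft limit, the result can exceed max_size, because the long word is never split.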

Answered by: Chelsea967 | Posted: 01-03-2022



Answer 4

import re

def smart_truncate1(text, max_length=100, suffix='...'):
    """Returns a string of at most `max_length` characters, cutting
    only at word-boundaries. If the string was truncated, `suffix`
    will be appended.
    """

    if len(text) > max_length:
        # Longest prefix (leaving room for the suffix) that ends in a non-space
        # character followed by whitespace; DOTALL lets '.' also match newlines
        pattern = r'^(.{0,%d}\S)\s.*' % (max_length-len(suffix)-1)
        return re.sub(pattern, r'\1' + suffix, text, flags=re.DOTALL)
    else:
        return text

OR

def smart_truncate2(text, min_length=100, suffix='...'):
    """If the `text` is more than `min_length` characters long,
    it will be cut at the next word-boundary and `suffix` will
    be appended.
    """

    # Shortest prefix of at least min_length characters that ends in a
    # non-space character followed by whitespace
    pattern = r'^(.{%d,}?\S)\s.*' % (min_length-1)
    return re.sub(pattern, r'\1' + suffix, text, flags=re.DOTALL)

OR

def smart_truncate3(text, length=100, suffix='...'):
    """Truncates `text`, on a word boundary, as close to
    the target length as it can.
    """

    slen = len(suffix)
    pattern = r'^(.{0,%d}\S)\s+\S+' % (length-slen-1)
    if len(text) > length:
        match = re.match(pattern, text)
        if match:
            length0 = match.end(0)
            length1 = match.end(1)
            if abs(length0+slen-length) < abs(length1+slen-length):
                return match.group(0) + suffix
            else:
                return match.group(1) + suffix
    return text
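smart_truncate3 compares two candidates: the last word boundary before the limit, and the one just after it. It then keeps whichever brings the result (suffix included) closer to length. Below is a regex-light sketch of the same idea that considers every word end, using re.finditer. The name truncate_nearest_boundary is hypothetical, and this is not a drop-in replacement. For example, it leaves a single over-long word untouched. On the question's sentence with length=16, it and smart_truncate3 should both give 'This is really...'.

```python
import re


def truncate_nearest_boundary(text, length=100, suffix='...'):
    """Cut `text` at whichever word end brings the result (suffix included)
    closest to `length`; like smart_truncate3, it may overshoot slightly."""
    if len(text) <= length:
        return text
    # End offsets of every word except the last (cutting after the last word
    # would not shorten anything)
    ends = [m.end() for m in re.finditer(r'\S+', text)][:-1]
    if not ends:
        return text  # a single long word: no boundary to cut at
    # min() returns the first minimal candidate, so ties go to the shorter cut
    best = min(ends, key=lambda e: abs(e + len(suffix) - length))
    return text[:best] + suffix


print(truncate_nearest_boundary("This is really awesome.", 16))  # This is really...
print(truncate_nearest_boundary("This is really awesome.", 12))  # This is...
```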

Answered by: Lucas567 | Posted: 01-03-2022



Answer 5

>>> import textwrap
>>> textwrap.wrap('The quick brown fox jumps over the lazy dog', 12)
['The quick', 'brown fox', 'jumps over', 'the lazy dog']

Take the first element of that list and you're done (append a suffix such as "..." yourself if the text was cut).
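Since Python 3.4, wrap() also accepts the TextWrapper options max_lines and placeholder. These let it add the suffix itself, and only when text was actually dropped. A small sketch follows. Note that wrap() returns an empty list for empty or whitespace-only input, so real code should check before indexing [0].

```python
import textwrap

text = 'The quick brown fox jumps over the lazy dog'

# Taking the first element directly, as suggested above
print(textwrap.wrap(text, 12)[0])  # The quick

# Python 3.4+: with max_lines=1 the placeholder is appended only when
# words had to be dropped, and it counts toward the width
print(textwrap.wrap(text, 15, max_lines=1, placeholder='...')[0])  # The quick...
print(textwrap.wrap('Short text', 15, max_lines=1, placeholder='...')[0])  # Short text
```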

Answered by: William927 | Posted: 01-03-2022



Answer 6

In Python 3.4+ you can use textwrap.shorten. With the OP's example:

>>> import textwrap
>>> original = "This is really awesome."
>>> textwrap.shorten(original, width=20, placeholder="...")
'This is really...'

textwrap.shorten(text, width, **kwargs)

Collapse and truncate the given text to fit in the given width.

First the whitespace in text is collapsed (all whitespace is replaced by single spaces). If the result fits in the width, it is returned. Otherwise, enough words are dropped from the end so that the remaining words plus the placeholder fit within width.
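A few quick checks of that behaviour follow. The default placeholder is ' [...]', and it counts toward width.

```python
import textwrap

# Whitespace runs (including newlines) are collapsed before measuring
print(repr(textwrap.shorten("Hello  world!", width=12)))          # 'Hello world!'
print(repr(textwrap.shorten("Line one\n\nline two", width=40)))   # 'Line one line two'

# The default placeholder ' [...]' counts toward width
print(repr(textwrap.shorten("Hello  world!", width=11)))          # 'Hello [...]'
print(repr(textwrap.shorten("Hello world", width=10, placeholder="...")))  # 'Hello...'
```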

Answered by: Ada970 | Posted: 01-03-2022



Answer 7

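def smart_truncate(s, width):
    # Nothing to cut if the string already fits (also avoids IndexError on s[width])
    if len(s) <= width:
        return s
    # If the cut lands exactly on whitespace, the last word is complete
    if s[width].isspace():
        return s[:width]
    # Otherwise drop the partial word at the end
    return s[:width].rsplit(None, 1)[0]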

Testing it:

>>> smart_truncate('The quick brown fox jumped over the lazy dog.', 23) + "..."
'The quick brown fox...'
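The test above appends "..." unconditionally, so it would be added even when nothing was cut, and it is not counted toward width. A hypothetical wrapper, truncate_with_suffix, handles both cases. Answer 7's function is repeated here with a length guard so that the block runs on its own.

```python
def smart_truncate(s, width):
    # Answer 7's function, with a guard for strings that already fit
    if len(s) <= width:
        return s
    if s[width].isspace():
        return s[:width]
    return s[:width].rsplit(None, 1)[0]


def truncate_with_suffix(s, width, suffix='...'):
    # Hypothetical wrapper: add the suffix only if something was cut,
    # and reserve room for it within `width`
    if len(s) <= width:
        return s
    return smart_truncate(s, width - len(suffix)).rstrip() + suffix


print(truncate_with_suffix('The quick brown fox jumped over the lazy dog.', 26))
# The quick brown fox...
print(truncate_with_suffix('short', 10))  # short
```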

Answered by: Kelvin936 | Posted: 01-03-2022



Answer 8

For Python 3.4+, I'd use textwrap.shorten.

For older versions:

def truncate(description, max_len=140, suffix='…'):
    description = description.strip()
    if len(description) <= max_len:
        return description
    new_description = ''
    # Add words one at a time until the next one would leave no room for the suffix
    for word in description.split(' '):
        tmp_description = new_description + word
        if len(tmp_description) <= max_len - len(suffix):
            new_description = tmp_description + ' '
        else:
            new_description = new_description.strip() + suffix
            break
    return new_description

Answered by: William534 | Posted: 01-03-2022



Answer 9

If you would rather truncate by whole sentences than by words, here's something to start with:

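import numpy as np

def smart_truncate_by_sentence(content, length=100, suffix='...'):
    # Pass non-strings (e.g. None or NaN) through unchanged
    if not isinstance(content, str):
        return content
    if len(content) <= length:
        return content
    sentences = content.split('.')
    # Running total of sentence lengths (the '.' separators are not counted)
    cs = np.cumsum([len(s) for s in sentences])
    # Keep every sentence whose running total is below length, but at least one
    n = max(1, len(cs[cs < length]))
    return '.'.join(sentences[:n]) + ('. ' + suffix if n < len(sentences) else '')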
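If you would rather not depend on NumPy just for a running total, itertools.accumulate (Python 3.2+) produces the same cumulative sums. Below is a minimal sketch of the same logic on that basis. The '.' separators are still not counted, exactly as in the NumPy version.

```python
from itertools import accumulate


def smart_truncate_by_sentence(content, length=100, suffix='...'):
    # Pass non-strings through unchanged; return short strings as-is
    if not isinstance(content, str) or len(content) <= length:
        return content
    sentences = content.split('.')
    # Running total of sentence lengths, like np.cumsum
    totals = accumulate(len(s) for s in sentences)
    # Keep every sentence whose running total is below length, but at least one
    n = max(1, sum(1 for t in totals if t < length))
    tail = '. ' + suffix if n < len(sentences) else ''
    return '.'.join(sentences[:n]) + tail


print(smart_truncate_by_sentence("First sentence. Second one. Third.", 30))
# First sentence. Second one. ...
```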

Answered by: Kirsten340 | Posted: 01-03-2022



Answer 10

C++ version:

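#include <string>

// Truncate s to at most k characters, cutting only at a space.
// Returns "" if the first word alone is longer than k. No suffix is added.
std::string trim(std::string s, int k) {
    if (k >= 0 && s.size() <= static_cast<std::string::size_type>(k)) return s;
    // Walk back from index k to the nearest space
    while (k >= 0 && s[k] != ' ')
        k--;
    if (k < 0) return "";
    std::string res = s.substr(0, k + 1);
    // Drop the trailing space(s) left at the cut
    while (!res.empty() && res.back() == ' ')
        res.pop_back();
    return res;
}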
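For comparison with the Python answers, here is the same backward scan as a Python sketch. str.rfind replaces the manual loop, and the name trim is simply carried over from the C++ version. Like the C++ code, it adds no suffix and returns an empty string when the first word alone exceeds k.

```python
def trim(s, k):
    # Python port of the C++ trim() above
    if len(s) <= k:
        return s
    # Last space at an index in 0..k (the C++ loop walks back from s[k])
    cut = s.rfind(' ', 0, k + 1)
    if cut < 0:
        return ''  # first word longer than k: nothing fits
    # Keep up to the space, then strip the trailing space(s)
    return s[:cut + 1].rstrip(' ')


print(repr(trim("This is really awesome.", 16)))  # 'This is really'
print(repr(trim("Supercalifragilistic", 5)))      # ''
```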

Answered by: Melissa280 | Posted: 01-03-2022



Similar questions

Python truncate a long string

How does one truncate a string to 75 characters in Python? This is how it is done in JavaScript: var data="saddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsaddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddsadddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd" var info = (data.length > 75) ? data.s...


python - Pad or truncate string based on fixed length

Currently have code that looks something like; print '{: <5}'.format('test') This will pad my string with ' ' if it is less than 5 characters. If the string is more than 5 characters, I'd need the string to be truncated. Without explicitly checking the length of my string before formatting it, is there a better way to pad if less than fixed length or truncate if gr...


file io - Python truncate lines as they are read

I have an application that reads lines from a file and runs its magic on each line as it is read. Once the line is read and properly processed, I would like to delete the line from the file. A backup of the removed line is already being kept. I would like to do something like file = open('myfile.txt', 'rw+') for line in file: processLine(line) file.truncate(line) This seems like a si...


Why truncate when we open a file in 'w' mode in python

I am going through Zed Shaw's Python Book. I am currently working on the opening and reading files chapters. I am wondering why we need to do a truncate, when we are already opening the file in a 'w' mode? print "Opening the file..." target = open(filename, 'w') print "Truncating the file. Goodbye!" target.truncate()


python - How to truncate the values of a 2D numpy array

I have a two-dimensional numpy array(uint16), how can I truncate all values above a certain barrier(say 255) to that barrier? The other values must stay the same. Using a nested loop seems to be inefficient and clumsy.


python - How to truncate data in a dict so that the resulting JSON isn't longer than n bytes?

I have a python 2.7 dict such as {u"eat": u"糖果", u"drink": u"café"}, and I need to transfer it using JSON. The JSON string must be regular ASCII and it must be less than 256 chars. So far, I have coded this: import json def payload_to_json(payload, max_size = 256): while True: json_string = json.dumps(payload, separators = (',', ':')) if len(json_string) <= ma...


How do i truncate url using python



Python Syntax Truncate Error

I'm trying to set up a script that re-writes the interfaces file and eventually it will change the ip address to static, but when I run it I get an error the line that reads ' new_location_interfaces.truncate()' and it says that 'str' object has no attribute truncate. from sys import argv from os.path import exists import os script_name = argv print "You are currently running %s" % script_name print "Vers...


python - How can I truncate a table using pandas?

I have a function that is executed few times, each time it appends elements to a table on SQL Server using this code: import pandas as pd import pandas.io.sql as pdsql import pyodbc params = [(self.key[int(el[0])], bid, label, tr_date, el[1]) for el in elements] df = pd.DataFrame(params, columns=['ID', 'BID', 'Label', 'tr_date', 'Score']) engine = sqlalchemy.create_engine('mssql+pyodbc://MY-SERVER/Test') d...


python - Is it faster to truncate a list by making it equal to a slice, or by using del?

Suppose I have a list TruncList with some number of elements greater than n. If I want to remove n elements from the end of that list, is it faster to redefine the list as a slice of itself preserving the desired elements, as by TruncList = TruncList[:-n], or to delete the slice of unwanted elements from the list, as by


Truncate head of file in Python

How to truncate x bytes of the head file ? I have a log which has 5 GB and I want to cut first 3 GB ( remove old information) .


string - A Python buffer that you can truncate from the left?

Right now, I am buffering bytes using strings, StringIO, or cStringIO. But, I often need to remove bytes from the left side of the buffer. A naive approach would rebuild the entire buffer. Is there an optimal way to do this, if left-truncating is a very common operation? Python's garbage collector should actually GC the truncated bytes. Any sort of algorithm for this (keep the buffer in small pieces?), or an existi...


python - How to TRUNCATE TABLE using Django's ORM?

To empty a database table, I use this SQL Query: TRUNCATE TABLE `books` How to I truncate a table using Django's models and ORM? I've tried this, but it doesn't work: Book.objects.truncate()


python - Truncate a string in mako template

I'd like to find a way to get a title to truncate if too long, like this: 'this is a title' 'this is a very long title that ...' Is there a way to print a string in mako, and automatically truncate with "..." if greater than a certain number of characters? Thanks.


python truncate text around keyword

I have a string and I want to search it for a keyword or phrase and return only a portion of the text before and after the keyword or phrase. Google does exactly what I am talking about. Here is a string I grabbed from the web: "This filter truncates words like the original truncate words Django filter, but instead of being based on the number of words, it's based on the number of characters. I found t...


python - How to truncate the time on a datetime object?

What is a classy way to truncate a python datetime object? In this particular case, to the day. So basically setting hour, minute, seconds, and microseconds to 0. I would like the output to also be a datetime object, not a string.


Truncate to three decimals in Python

How do I get 1324343032.324? As you can see below, the following do not work: >>1324343032.324325235 * 1000 / 1000 1324343032.3243253 >>int(1324343032.324325235 * 1000) / 1000.0 1324343032.3239999 >>round(int(1324343032.324325235 * 1000) / 1000.0,3) 1324343032.3239999 >>str(1324343032.3239999) '1324343032.32'





