Retrieve the two highest item from a list containing 100,000 integers

How can retrieve the two highest item from a list containing 100,000 integers without having to sort the entire list first?


Asked by: Walter444 | Posted: 30-11-2021






Answer 1

Use heapq.nlargest. This is the most flexible approach in case you ever want to handle more than just the top two elements.

Here's an example.

>>> import heapq
>>> import random
>>> x = range(100000)
>>> random.shuffle(x)
>>> heapq.nlargest(2, x)
[99999, 99998]

Answered by: Kimberly703 | Posted: 01-01-2022



Answer 2

JacobM's answer is absolutely the way to go. However, there are a few things to keep in mind while implementing what he described. Here's a little play-along-at-home tutorial to guide you through the trickier parts of solving this problem.

If this code is meant for production use, please use one of the more efficient/concise answers listed. This answer is targetted at someone new to programming.

The idea

The idea is simple.

  • Keep two variables: largest and second_largest.
  • Go through the list.
    • If an item is greater than largest, assign it to largest.
    • If an item is greater than second_largest, but less than largest, assign it to second_largest.

Getting started

Let's start.

def two_largest(inlist):
    """Return the two largest items in the sequence. The sequence must
    contain at least two items."""
    for item in inlist:
        if item > largest:
            largest = item
        elif largest > item > second_largest:
            second_largest = item
    # Return the results as a tuple
    return largest, second_largest

# If we run this script, it will should find the two largest items and
# print those
if __name__ == "__main__":
    inlist = [3, 2, 1]
    print two_largest(inlist)

Okay, we now have JacobM's answer as a Python function. What happens when we try to run it?

Traceback (most recent call last):
  File "twol.py", line 10, in <module>
    print two_largest(inlist)
  File "twol.py", line 3, in two_largest
    if item > largest:
UnboundLocalError: local variable 'largest' referenced before assignment

Apparently, we need to set largest before we start the loop. This probably means we should set second_largest too.

Initializing variables

Let's set largest and second_largest to 0.

def two_largest(inlist):
    """Return the two largest items in the sequence. The sequence must
    contain at least two items."""
    largest = 0 # NEW!
    second_largest = 0 # NEW!
    for item in inlist:
        if item > largest:
            largest = item
        elif largest > item > second_largest:
            second_largest = item
    # Return the results as a tuple
    return largest, second_largest

# If we run this script, it will should find the two largest items and
# print those
if __name__ == "__main__":
    inlist = [3, 2, 1]
    print two_largest(inlist)

Good. Let's run it.

(3, 2)

Great! Now let's test with inlist being [1, 2, 3]

    inlist = [1, 2, 3] # CHANGED!

Let's try it.

(3, 0)

...Uh oh.

Fixing the logic

The largest value (3) seems correct. The second-largest value is completely wrong though. What's going on?

Let's work through what the function is doing.

  • When we start, largest is 0 and second_largest is also 0.
  • The first item in the list we look at is 1, so largest becomes 1.
  • The next item is 2, so largest becomes 2.

But what about second_largest?

When we assign a new value to largest, the largest value actually becomes second-largest. We need to show that in the code.

def two_largest(inlist):
    """Return the two largest items in the sequence. The sequence must
    contain at least two items."""
    largest = 0
    second_largest = 0
    for item in inlist:
        if item > largest:
            second_largest = largest # NEW!
            largest = item
        elif largest > item > second_largest:
            second_largest = item
    # Return the results as a tuple
    return largest, second_largest

# If we run this script, it will should find the two largest items and
# print those
if __name__ == "__main__":
    inlist = [1, 2, 3]
    print two_largest(inlist)

Let's run it.

(3, 2)

Fantastic.

Initializing variables, part 2

Now let's try it with a list of negative numbers.

    inlist = [-1, -2, -3] # CHANGED!

Let's run it.

(0, 0)

That's not right at all. Where did these zeroes come from?

It turns out that the starting values for largest and second_largest were actually larger than all the items in the list. The first thing you might consider is setting largest and second_largest to the lowest values possible in Python. Unfortunately, Python doesn't have a smallest possible value. That means that, even if you set both of them to -1,000,000,000,000,000,000, you can have a list of values smaller than that.

So what's the best thing to do? Let's try setting largest and second_largest to the first and second items in the list. Then, to avoid double-counting any items in the list, we only look at the part of the list after the second item.

def two_largest(inlist):
    """Return the two largest items in the sequence. The sequence must
    contain at least two items."""
    largest = inlist[0] # CHANGED!
    second_largest = inlist[1] # CHANGED!
    # Only look at the part of inlist starting with item 2
    for item in inlist[2:]: # CHANGED!
        if item > largest:
            second_largest = largest
            largest = item
        elif largest > item > second_largest:
            second_largest = item
    # Return the results as a tuple
    return largest, second_largest

# If we run this script, it will should find the two largest items and
# print those
if __name__ == "__main__":
    inlist = [-1, -2, -3]
    print two_largest(inlist)

Let's run it.

(-1, -2)

Great! Let's try with another list of negative numbers.

    inlist = [-3, -2, -1] # CHANGED!

Let's run it.

(-1, -3)

Wait, what?

Initializing variables, part 3

Let's step through our logic again.

  • largest is set to -3
  • second_largest is set to -2

Wait right there. Already, this seems wrong. -2 is larger than -3. Is this what caused the problem? Let's continue.

  • largest is set to -1; second_largest is set to the old value of largest, which is -3

Yes, that looks to be the problem. We need to ensure that largest and second_largest are set correctly.

def two_largest(inlist):
    """Return the two largest items in the sequence. The sequence must
    contain at least two items."""
    if inlist[0] > inlist[1]: # NEW
        largest = inlist[0]
        second_largest = inlist[1]
    else: # NEW
        largest = inlist[1] # NEW
        second_largest = inlist[0] # NEW
    # Only look at the part of inlist starting with item 2
    for item in inlist[2:]:
        if item > largest:
            second_largest = largest
            largest = item
        elif largest > item > second_largest:
            second_largest = item
    # Return the results as a tuple
    return largest, second_largest

# If we run this script, it will should find the two largest items and
# print those
if __name__ == "__main__":
    inlist = [-3, -2, -1]
    print two_largest(inlist)

Let's run it.

(-1, -2)

Excellent.

Conclusion

So here's the code, nicely commented and formatted. It's also had all the bugs I could find beaten from it. Enjoy.

However, assuming this really is a homework question, I hope you get some useful experience from seeing an imperfect piece of code slowly improved. I hope some of these techniques will be useful in future programming assignments.


Efficiency

Not very efficient. But for most purposes, it should be okay: on my computer (Core 2 Duo), a list of 100 000 items can be processed in 0.27 seconds (using timeit, averaged over 100 runs).

Answered by: Kevin899 | Posted: 01-01-2022



Answer 3

You iterate over the list, maintaining variables that contain the value of the highest and the second highest item encountered thus far. Each new item that is encountered will replace whichever of the two the new item is higher than (if any).

Answered by: Dominik576 | Posted: 01-01-2022



Answer 4

A really slick way is to use heapq. Heapify the array (O(n)), then just pop an many elements that you need (log(n)). (Saw this question in an interview once, good question to keep in mind.)

Answered by: Lily355 | Posted: 01-01-2022



Answer 5

"2 highest" is impossible; only one item can be "highest". Perhaps you mean "highest 2". In any case, you need to say what to do when the list contains duplicates. What do you want from [8, 9, 10, 10]: (10, 9) or (10, 10)? If your response is (10, 10), please consider input of [8, 9, 10, 10, 10]. What are you going to do with the "highest two" when you've got them? Please edit your question to give this guidance.

In the meantime, here's an answer that takes the first approach (two unique values):

largest = max(inlist)
second_largest = max(item for item in inlist if item < largest)

You should add guards against fewer than 2 unique values in the list.

Answered by: Nicole843 | Posted: 01-01-2022



Answer 6

This will work, but I don't know if you want to retain the items in the list:

max1 = max(myList)
myList.remove(max1)
max2 = max(myList)

If you do, you can do this:

max1 = max(myList)
idx1 = myList.index(max1)
myList.pop(idx1)

max2 = max(myList)
myList.insert(idx1,max1)

Answered by: Emily491 | Posted: 01-01-2022



Answer 7

Copy your List to List_copy. Retrieve the highest value and get its position by:

Highest_value = max(List_copy)
Highest_position = List_copy.index(max(List_copy))

Assign 0 to the Highest_value.

List_copy[Highest_position] = 0

And run your line again.

Second_Highest = max(List_copy)

Answered by: Lenny635 | Posted: 01-01-2022



Answer 8

I know this topic is old, but here is a simple solution to this problem. Tested against heapq.nlargest and this is a bit faster (no sorting needed):

Works for both positive and negative numbers.

Function below: Max time used: 0.12, max memory used: 29290496 heapq.nlargest: Max time used: 0.14, max memory used: 31088640

def two_highest_numbers(list_to_work):

    first = None
    second = None

    for number in list_to_work:
        if first is None:
            first = number
        elif number > first:
            second = first
            first = number
        else:
            if second is None:
                second = number
            elif number > second:
                second = number

return [first, second]

Answered by: Rafael546 | Posted: 01-01-2022



Answer 9

Sort the list, and if list is not null, extract last two element

>>> a=[0,6,8,5,10,5]
>>> a.sort()
>>> a
[0, 5, 5, 6, 8, 10]
>>> if a:
...  print a[-1],a[-2]
... 
10 8

Simple and most efficient:)

Now if sorting is not required, find max, remove max, find max again

>>> a=[0,6,8,5,10,5]
>>> max(a)
10
>>> a.remove(max(a))
>>> max(a)
8
>>> 

Of course, you will lose the original list but you can create a temporary list as well.

Answered by: Patrick290 | Posted: 01-01-2022



Answer 10

Iterating through the entire list is the only way to do it without sorting.

Answered by: Sienna515 | Posted: 01-01-2022



Answer 11

Without sorting the list the only way to really do it is to iterate through the whole list and save the highest two numbers. I think you'd be better off sorting the list.

Answered by: Anna489 | Posted: 01-01-2022



Answer 12

The second highest item is a fairly simple case, but for the kth highest item what you want is a selection algorithm. That page is pretty thorough so it's probably best just to read that.

Answered by: Edgar203 | Posted: 01-01-2022



Answer 13

The best time you can expect is linear, since you have to at least look through all the elements.

Here is my pseudocode to solve the problem:

//assume list has at least 2 elements
(max, nextMax) = if (list[0] > list[1])
                 then (list[0], list[1])
                 else (list[1], list[0])

for (2 <= i < length) {
    (max, nextMax) = if       (max < list[i])     => (list[i], max)
                     elseif   (nextMax < list[i]) => (max, list[i])
                     else     (no change)         => (max, nextMax)
}

return (max, nextMax)

Answered by: Miranda282 | Posted: 01-01-2022



Answer 14

Another solution that uses only base Python functions can be seen below:

>>> largest = max(lst)
>>> maxIndex = lst.index(largest)
>>> secondLargest = max(max(lst[:maxIndex]), max(lst[maxIndex+1:]))

If we split a list around its largest number, we know that the second largest number is either in the left half or the right half. So, we can trivially find the second largest number by simply finding the larger of the largest number in the left and right half of the list.

It's trivial to show this is O(n) time and O(1) space. We traverse the list once to find the largest element, then again to find the second largest. We only store the largest values themselves and the index of the largest value.

Answered by: Lenny322 | Posted: 01-01-2022



Similar questions

html - How can I retrieve the page title of a webpage using Python?

How can I retrieve the page title of a webpage (title html tag) using Python?


python - How to retrieve an element from a set without removing it?

Suppose the following: &gt;&gt;&gt; s = set([1, 2, 3]) How do I get a value (any value) out of s without doing s.pop()? I want to leave the item in the set until I am sure I can remove it - something I can only be sure of after an asynchronous call to another host. Quick and dirty: &gt;&gt;&gt; elem = s.pop() &gt;&gt;&gt; s.add(elem)


sql server - Python: Retrieve Image from MSSQL

I'm working on a Python project that retrieves an image from MSSQL. My code is able to retrieve the images successfully but with a fixed size of 63KB. if the image is greater than that size, it just brings the first 63KB from the image! The following is my code: #!/usr/bin/python import _mssql mssql=_mssql.connect('&lt;ServerIP&gt;','&lt;UserID&gt;','&lt;Password&gt;') mssql.select_db('&lt;Database...


python - Best way to retrieve variable values from a text file?

Referring on this question, I have a similar -but not the same- problem.. On my way, I'll have some text file, structured like: var_a: 'home' var_b: 'car' var_c: 15.5 And I need that python read the file and then create a variable named var_a with value 'home', and so on. Example...


python - How to retrieve the selected text from the active window

I am trying to create a simple open source utility for windows using Python that can perform user-defined actions on the selected text of the currently active window. The utility should be activated using a pre-defined keyboard shortcut. Usage is partially outlined in the following example: The user selects some text using the mouse or the keyboard (in any application window)


python - How can I retrieve last x elements in Django

I am trying to retrieve the latest 5 posts (by post time) In the views.py, if I try blog_post_list = blogPosts.objects.all()[:5] It retreives the first 5 elements of the blogPosts objects, how can I reverse this to retreive the latest ones? Cheers


python - Retrieve module object from stack frame

Given a frame object, I need to get the corresponding module object. In other words, implement callers_module so this works: import sys from some_other_module import callers_module assert sys.modules[__name__] is callers_module() (That would be equivalent because I can generate a stack trace in the function for this test case. The imports are there simply to make that example complete an...


How do I retrieve Hotmail contacts with python

How can I retrieve contacts from hotmail with python? Is there any example?


linux - How to retrieve the process start time (or uptime) in python

How to retrieve the process start time (or uptime) in python in Linux? I only know, I can call "ps -p my_process_id -f" and then parse the output. But it is not cool.


c++ - How do I retrieve program output in Python?

I'm not a Perl user, but from this question deduced that it's exceedingly easy to retrieve the standard output of a program executed through a Perl script using something akin to: $version = `java -version`; How would I go about getting the same end result in Python? Does t...






Still can't find your answer? Check out these communities...



PySlackers | Full Stack Python | NHS Python | Pythonist Cafe | Hacker Earth | Discord Python



top