Numpy time based vector operations where state of preceding elements matters - are for loops appropriate?
What do numpy arrays provide when performing time based calculations where state matters. In other words, where what has occurred in earlier or later in a sequence is important.
Consider the following time based vectors,
TIME = np.array([0., 10., 20., 30., 40., 50., 60., 70., 80., 90.]) FLOW = np.array([100., 75., 60., 20.0, 60.0, 50.0, 20.0, 30.0, 20.0, 10.0]) TEMP = np.array([300., 310., 305., 300., 310., 305., 310., 305., 300., 295.0])
Let's say that an exponential decay in TEMP should be applied once the FLOW falls below 30 without raising again above 50. In the data above, a function would be applied at TIME=60 above and the last two values of TEMP would be updated by this secondary function which would began with the corresponding TEMP value.
There is a need to "look ahead" to determine if the FLOW rises above 50 in the elements after the <30 condition is requested. It does not seem that the numpy functions are aimed at time based vectors where state is important and the traditional method of nested for loops perhaps remains the way to go. But given given my newness to numpy and the fact that I have to perform alot of these types of state based manipulations, I would appreciate direction or affirmation.
Asked by: Kellan891 | Posted: 30-11-2021
While Joe Kington's answer is certainly correct (and quite flexible), it is rather more circuitous than need be. For someone trying to learn Numpy, I think a more direct route might be easier to understand.
As I noted under your question (and as Joe also noticed), there seems to be an inconsistency between your description of the code's behaviour and your example. Like Joe, I'm also going to assume you describe the correct behaviour.
A couple notes:
- Numpy works well with filter arrays to specify to which elements an operation should be applied. I make use of them several times.
np.flatnonzerofunction returns an array of indices specifying the locations at which the given array is non-zero (or True).
The code uses the example arrays you provided.
import numpy as np TIME = np.array([0., 10., 20., 30., 40., 50., 60., 70., 80., 90.]) FLOW = np.array([100., 75., 60., 20.0, 60.0, 50.0, 20.0, 30.0, 20.0, 10.0]) TEMP = np.array([300., 310., 305., 300., 310., 305., 310., 305., 300., 295.0]) last_high_flow_index = np.flatnonzero(FLOW > 50)[-1] low_flow_indices = np.flatnonzero(FLOW < 30) acceptable_low_flow_indices = low_flow_indices[low_flow_indices > last_high_flow_index] apply_after_index = acceptable_low_flow_indices
We now have the index after which the function should be applied to TEMP. If I am reading your question correctly, you would like the temperature to begin decaying once your condition is met. This can be done as follows:
time_delta = TIME[apply_after_index:] - TIME[apply_after_index] TEMP[apply_after_index:] = TEMP[apply_after_index:] * np.exp(-0.05 * time_delta)
TEMP has been updated, so
print TEMP outputs
[ 300. 310. 305. 300. 310. 305. 310. 184.99185121 110.36383235 65.82339724]
Alternatively, you can apply an arbitrary Python function to the appropriate elements by first vectorizing the function:
def myfunc(x): ''' a normal python function that acts on individual numbers''' return x + 3 myfunc_v = np.vectorize(myfunc)
and then updating the TEMP array:
Answered by: Justin144 | Posted: 01-01-2022
TEMP[apply_after:] = myfunc_v(TEMP[apply_after:])
You can certainly do this without nested loops in numpy. If you want to get really fancy, you could probably vectorize the entire thing, but it's probably the most readable to just vectorize it to the point that you only need one loop.
Generally speaking, try to vectorize things unless it becomes excessively clunky/unreadable or you're having memory usage issues. Then do it another way.
In some cases loops are more readable, and they'll commonly use less memory than vectorized expressions, but they're generally slower than vectorized expressions.
You'd probably be surprised at how flexible the various indexing tricks are, though. It's rare that you have to use loops to do a calculation, but they often wind up being more readable in complex cases.
However, I'm slightly confused by what you state as the correct case... You say that you want to apply a function to portions of temp where the flow falls below 30 without rising above 50. By this logic, the function should be applied to the last 4 elements of the temp array. However, you state that it should only be applied to the last two... I'm confused! I'm going to go with my reading of things and have it applied to the last 4 elements of the array...
Here's how I would do it. This uses random data rather than your data so that there are multiple regions...
Note that there aren't any nested loops, and we're only iterating over the number of contiguous regions in the array where your "asymmetric" threshold conditions are met (i.e. there's only one iteration, in this case).
import numpy as np import matplotlib.pyplot as plt def main(): num = 500 flow = np.cumsum(np.random.random(num) - 0.5) temp = np.cumsum(np.random.random(num) - 0.5) temp -= temp.min() - 10 time = np.linspace(0, 10, num) low, high = -1, 1 # For regions of "flow" where flow drops below low and thereafter # stays below high... for start, stop in asymmetric_threshold_regions(flow, low, high): # Apply an exponential decay function to temp... t = time[start:stop+1] - time[start] temp[start:stop+1] = temp[start] * np.exp(-0.5 * t) plot(flow, temp, time, low, high) def contiguous_regions(condition): """Finds contiguous True regions of the boolean array "condition". Returns a 2D array where the first column is the start index of the region and the second column is the end index.""" # Find the indicies of changes in "condition" d = np.diff(condition) idx, = d.nonzero() if condition: # If the start of condition is True prepend a 0 idx = np.r_[0, idx] if condition[-1]: # If the end of condition is True, append the length of the array idx = np.r_[idx, len(condition)-1] # Reshape the result into two columns idx.shape = (-1,2) return idx def asymmetric_threshold_regions(x, low, high): """Returns an iterator over regions where "x" drops below "low" and doesn't rise above "high".""" # Start with contiguous regions over the high threshold... for region in contiguous_regions(x < high): start, stop = region # Find where "x" drops below low within these below_start, = np.nonzero(x[start:stop] < low) # If it does, start at where "x" drops below "low" instead of where # it drops below "high" if below_start.size > 0: start += below_start yield start, stop def plot(flow, temp, time, low, high): fig = plt.figure() ax1 = fig.add_subplot(2,1,1) ax1.plot(time, flow, 'g-') ax1.set_ylabel('Flow') ax1.axhline(y=low, color='b') ax1.axhline(y=high, color='g') ax1.text(time.min()+1, low, 'Low Threshold', va='top') ax1.text(time.min()+1, high, 'High Threshold', va='bottom') ax2 = fig.add_subplot(2,1,2, sharex=ax1) ax2.plot(time, temp, 'b-') ax2.set_ylabel('Temp') ax2.set_xlabel('Time (s)') for start, stop in asymmetric_threshold_regions(flow, low, high): ax1.axvspan(time[start], time[stop], color='r', alpha=0.5) ax2.axvspan(time[start], time[stop], color='r', alpha=0.5) plt.setp(ax1.get_xticklabels(), visible=False) plt.show() if __name__ == '__main__': main()
Answered by: Dominik376 | Posted: 01-01-2022
python - Which of these scripting languages is more appropriate for pen-testing?
Closed. This question is opinion-based. It is not c...
Help me find an appropriate ruby/python parser generator
The first parser generator I've worked with was Parse::RecDescent, and the guides/tutorials available for it were great, but the most useful feature it has was it's debugging tools, specifically the tracing capabilities ( activated by setting $RD_TRACE to 1 ). I am looking for a parser generator that can help you debug it's rules. The thing is, it has to be written in python or in ruby, and have a verbose mo...
Would python be an appropriate choice for a video library for home use software
I am thinking of creating a video library software which keep track of all my videos and keep track of videos that I already haven't watched and stats like this. The stats will be specific to each user using the software. My question is, is python appropriate to create this software or do I need something like c++.
python - How do I determine the appropriate check interval?
I'm just starting to work on a tornado application that is having some CPU issues. The CPU time will monotonically grow as time goes by, maxing out the CPU at 100%. The system is currently designed to not block the main thread. If it needs to do something that blocks and asynchronous drivers aren't available, it will spawn another thread to do the blocking operation. Thus we have the main thread being almost tot...
arrays - Most appropriate data structure (Python)
I'm new to Python and have what is probably a very basic question about the 'best' way to store data in my code. Any advice much appreciated! I have a long .csv file in the following format: Scenario,Year,Month,Value 1,1961,1,0.5 1,1961,2,0.7 1,1961,3,0.2 etc. My scenario values run from 1 to 100, year goes from 1961 to 1990 and month goes from 1 to 12. My file therefore has 100*29...
python extend or append a list when appropriate
Is there a simple way to append a list if X is a string, but extend it if X is a list? I know I can simply test if an object is a string or list, but I was wondering if there is a quicker way than this?
python - Finding appropriate cut-off values
I try to implement Hampel tanh estimators to normalize highly asymmetric data. In order to do this, I need to perform the following calculation: Given x - a sorted list of numbers and...
python - What is an appropriate way to datamine the total number of results of a keyword search?
newbie programmer and lurker here, hoping for some sensible advice. :) Using a combination of Python, BeautifulSoup, and the Bing API, I was able to find what I wanted with the following code: import urllib2 from BeautifulSoup import BeautifulStoneSoup Appid = #My Appid query = #My query soup = BeautifulStoneSoup(urllib2.urlopen("http://api.search.live.net/xml.aspx?Appid=" + Appid + "&query=" ...
python - Defining appropriate number of processes
I have a python code treating a lot of apache logs (decompress, parse, crunching numbers, regexping etc). One parent process which takes a list of files (up to few millions), and sends a list of files to parse to workers, using multiprocess pool. I wonder, if there is any guidelines / benchmarks / advices which can help me to estimate ideal number of child process ? Ie. having one process per core...