Best way to create a NumPy array from a dictionary?

I'm just starting with NumPy so I may be missing some core concepts...

What's the best way to create a NumPy array from a dictionary whose values are lists?

Something like this:

d = { 1: [10,20,30] , 2: [50,60], 3: [100,200,300,400,500] }

Should turn into something like:

data = [
  [10,20,30,?,?],
  [50,60,?,?,?],
  [100,200,300,400,500]
]

I'm going to do some basic statistics on each row, eg:

deviations = numpy.std(data, axis=1)

Questions:

  • What's the best / most efficient way to create the numpy.array from the dictionary? The dictionary is large; a couple of million keys, each with ~20 items.

  • The number of values in each 'row' is different. If I understand correctly, numpy wants a uniform size, so what do I fill in for the missing items to make std() happy?

Update: One thing I forgot to mention - while the python techniques are reasonable (eg. looping over a few million items is fast), it's constrained to a single CPU. Numpy operations scale nicely to the hardware and hit all the CPUs, so they're attractive.


Asked by: Emma125 | Posted: 30-11-2021






Answer 1

You don't need to create numpy arrays to call numpy.std(). You can call numpy.std() in a loop over all the values of your dictionary. Each list will be converted to a numpy array on the fly to compute the standard deviation.

The downside of this method is that the main loop will be in python and not in C. But I guess this should be fast enough: you will still compute std at C speed, and you will save a lot of memory as you won't have to store 0 values where you have variable size arrays.

  • If you want to further optimize this, you can store your values into a list of numpy arrays, so that you do the python list -> numpy array conversion only once.
  • If you find that this is still too slow, try using Psyco to optimize the python loop.
  • If this is still too slow, try using Cython together with the numpy module. This tutorial claims impressive speed improvements for image processing. Or simply program the whole std function in Cython (see this for benchmarks and examples with the sum function).
  • An alternative to Cython would be to use SWIG with numpy.i.
  • if you want to use only numpy and have everything computed at C level, try grouping all the records of same size together in different arrays and call numpy.std() on each of them. It should look like the following example.

Example with O(N) complexity:

import numpy

# data is the dictionary of lists; rows are grouped by their length.
list_size_1 = []
list_size_2 = []
for row in data.values():
    if len(row) == 1:
        list_size_1.append(row)
    elif len(row) == 2:
        list_size_2.append(row)

# Each group is rectangular, so it converts cleanly to a 2-D array.
list_size_1 = numpy.array(list_size_1)
list_size_2 = numpy.array(list_size_2)
std_1 = numpy.std(list_size_1, axis=1)
std_2 = numpy.std(list_size_2, axis=1)
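
The first bullet above (convert each list to an array once, then reuse the arrays) might be sketched as follows. The dictionary `d` is the small example from the question, and `arrays`, `deviations` and `means` are illustrative names, not part of the original answer.

```python
import numpy

# Small example dictionary from the question.
d = {1: [10, 20, 30], 2: [50, 60], 3: [100, 200, 300, 400, 500]}

# Convert each list to an array once, keeping the key for lookup.
arrays = {key: numpy.array(row, dtype=float) for key, row in d.items()}

# Later passes reuse the arrays without converting again.
deviations = {key: arr.std() for key, arr in arrays.items()}
means = {key: arr.mean() for key, arr in arrays.items()}
```

The loop over keys still runs in python, but each statistic is computed by numpy and no padding values are stored.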

Answered by: Nicole997 | Posted: 01-01-2022



Answer 2

While there are already some pretty reasonable ideas present here, I believe the following is worth mentioning.

Filling missing data with any default value would spoil the statistical characteristics (std, etc). Evidently that's why Mapad proposed the nice trick of grouping same-sized records. The problem with it (assuming there is no a priori data on record lengths at hand) is that it involves even more computations than the straightforward solution:

  1. at least O(N*logN) 'len' calls and comparisons for sorting with an effective algorithm
  2. O(N) checks on the second pass through the list to obtain the groups (their beginning and end indexes on the 'vertical' axis)

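A small check of the claim that filling with a default value spoils the statistics, using the `[50, 60]` row from the question. Padding with NaN and calling `numpy.nanstd`, which ignores NaN entries, is shown as one possible workaround; it is not something proposed in this thread, and it still costs memory for the padding.

```python
import numpy

row = [50, 60]

# Standard deviation of the row as it is.
true_std = numpy.std(row)

# Zero padding to length 5 changes the result.
zero_padded = numpy.array(row + [0, 0, 0], dtype=float)
padded_std = numpy.std(zero_padded)

# NaN padding plus nanstd skips the padding entries.
nan_padded = numpy.array(row + [numpy.nan] * 3)
nan_std = numpy.nanstd(nan_padded)
```

Here `true_std` and `nan_std` agree, while `padded_std` is several times larger because the zeros are counted as data.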
Using Psyco is a good idea (it's strikingly easy to use, so be sure to give it a try).

It seems that the optimal way is to take the strategy described by Mapad in bullet #1, but with a modification - not to generate the whole list, but iterate through the dictionary converting each row into numpy.array and performing required computations. Like this:

for row in data.itervalues():
    np_row = numpy.array(row)    
    this_row_std = numpy.std(np_row)
    # compute any other statistic descriptors needed and then save to some list

In any case, a few million loops in python won't take as long as one might expect. Besides, this doesn't look like a routine computation, so who cares if it takes an extra second or minute when it is run only once in a while, or even just once.


A generalized variant of what was suggested by Mapad:

from numpy import array, mean, shape, std

def get_statistical_descriptors(a):
    # Reduce along the last axis, i.e. per row for a 2-D array.
    ax = len(shape(a)) - 1
    functions = [mean, std]
    return [f(a, axis=ax) for f in functions]


def process_long_list_stats(data):
    import numpy

    groups = {}

    # Group the keys by the length of their row.
    for key, row in data.items():
        size = len(row)
        try:
            groups[size].append(key)
        except KeyError:
            groups[size] = [key]

    results = []

    # One numpy call per group of equal-length rows.
    for gr_keys in groups.values():
        gr_rows = numpy.array([data[k] for k in gr_keys])
        stats = get_statistical_descriptors(gr_rows)
        results.extend(zip(gr_keys, zip(*stats)))

    return dict(results)
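
For reference, a compact self-contained variant of the same grouping idea, run on the example dictionary from the question plus one extra made-up row. The name `grouped_stats` and the use of `collections.defaultdict` are choices made for this sketch, not part of the original answer.

```python
from collections import defaultdict

import numpy


def grouped_stats(data):
    """Return {key: (mean, std)}, computing each length group in one numpy call."""
    # Collect the keys of all rows that share the same length.
    groups = defaultdict(list)
    for key, row in data.items():
        groups[len(row)].append(key)

    results = {}
    for keys in groups.values():
        # Rows in a group have equal length, so this is a regular 2-D array.
        block = numpy.array([data[k] for k in keys], dtype=float)
        means = block.mean(axis=1)
        stds = block.std(axis=1)
        for k, m, s in zip(keys, means, stds):
            results[k] = (float(m), float(s))
    return results


d = {1: [10, 20, 30], 2: [50, 60], 3: [100, 200, 300, 400, 500], 4: [1, 3]}
stats = grouped_stats(d)
```

Grouping with a dictionary keyed by length takes a single pass over the data, so no sorting is needed to form the groups.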

Answered by: Aldus447 | Posted: 01-01-2022



Answer 3

numpy dictionary

You can use a structured array to preserve the ability to address a numpy object by a key, like a dictionary.

import numpy as np


dd = {'a':1,'b':2,'c':3}
# One float field per dictionary key; no eval() is needed to build the dtype.
dtype = [(key, float) for key in dd.keys()]
values = [tuple(dd.values())]
numpy_dict = np.array(values, dtype=dtype)

numpy_dict['c']

will now output

array([ 3.])
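
A structured array can also hold several records, in which case indexing by a field name returns the whole column. A minimal sketch, assuming every record has the same keys; the `records` list is made-up data and the variable names are illustrative.

```python
import numpy as np

# Made-up records sharing the same keys.
records = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}]

# Fix the field order once so every row is built consistently.
names = sorted(records[0])
dtype = [(name, float) for name in names]
rows = [tuple(rec[name] for name in names) for rec in records]
table = np.array(rows, dtype=dtype)

column_c = table['c']          # all 'c' values as a float array
first_row_b = table[0]['b']    # a single value from the first record
```

Note that this only fits data where every record has the same fields, so it does not by itself solve the variable-length rows in the question.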

Answered by: Freddie856 | Posted: 01-01-2022







