What's the best way to calculate a 3D (or n-D) centroid?

As part of a project at work I have to calculate the centroid of a set of points in 3D space. Right now I'm doing it in a way that seems simple but naive, by taking the average of each coordinate over all the points, as in:

centroid = average(x), average(y), average(z)

where x, y and z are arrays of floating-point numbers. I seem to recall that there is a way to get a more accurate centroid, but I haven't found a simple algorithm for doing so. Anyone have any ideas or suggestions? I'm using Python for this, but I can adapt examples from other languages.

Asked by: Victoria965 | Posted: 01-10-2021

Answer 1

Contrary to the common refrain here, there are different ways to define (and calculate) a center of a point cloud. The first and most common solution has been suggested by you already and I will not argue that there is anything wrong with this:

centroid = average(x), average(y), average(z)

The "problem" here is that it will "distort" your center-point depending on the distribution of your points. If, for example, you assume that all your points are within a cubic box or some other geometric shape, but most of them happen to be placed in the upper half, your center-point will also shift in that direction.

As an alternative you could use the mathematical middle (the mean of the extrema) in each dimension to avoid this:

middle = middle(x), middle(y), middle(z)

You can use this when you don't care much about the number of points, but more about the global bounding box, because that's all this is - the center of the bounding box around your points.

Lastly, you could also use the median (the element in the middle) in each dimension:

median = median(x), median(y), median(z)

Now this will sort of do the opposite to the middle and actually help you ignore outliers in your point cloud and find a centerpoint based on the distribution of your points.
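To make the three definitions concrete, here is a minimal sketch in plain Python. The function names (mean_center, middle_center, median_center) are my own for illustration, and it assumes the points are given as a list of equal-length tuples. Note that the median here is taken independently per axis, so the result need not be one of the input points.

```python
from statistics import mean, median

def mean_center(points):
    """Arithmetic mean of each coordinate (the usual centroid)."""
    return tuple(mean(axis) for axis in zip(*points))

def middle_center(points):
    """Midpoint of the extrema of each coordinate (bounding-box center)."""
    return tuple((min(axis) + max(axis)) / 2 for axis in zip(*points))

def median_center(points):
    """Median of each coordinate, computed independently per axis."""
    return tuple(median(axis) for axis in zip(*points))

# Three points close together plus one far-away point.
points = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0), (9.0, 9.0, 9.0)]
print(mean_center(points))    # (3.0, 3.0, 3.0)
print(middle_center(points))  # (4.5, 4.5, 4.5)
print(median_center(points))  # (1.5, 1.5, 1.5)
```

The far-away point pulls the middle the most, the mean somewhat, and the median the least, which matches the behaviour described above.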

A more robust way to find a "good" centerpoint might be to ignore the top and bottom 10% in each dimension and then calculate the average or median. As you can see, you can define the centerpoint in different ways. Below I am showing you examples of two 2D point clouds with these suggestions in mind.

The dark blue dot is the average (mean) centroid. The median is shown in green, and the middle is shown in red. In the second image you will see exactly what I was talking about earlier: the green dot is "closer" to the densest part of the point cloud, while the red dot is farther away from it, taking into account the most extreme boundaries of the point cloud.

[Images: two 2D point clouds with the mean (dark blue), median (green), and middle (red) marked]
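The trimming idea from this answer (drop the top and bottom 10% in each dimension, then average) could be sketched as follows. This is only one possible reading: trimmed_mean_center and its trim parameter are names I made up, and the trimming is done separately per axis, so a point can be dropped in x but kept in y.

```python
def trimmed_mean_center(points, trim=0.10):
    """Per-axis mean after dropping the lowest and highest `trim` fraction.

    Assumes 0 <= trim < 0.5 and at least one point.
    """
    center = []
    for axis in zip(*points):
        values = sorted(axis)
        k = int(len(values) * trim)        # number of values dropped at each end
        kept = values[k:len(values) - k]   # non-empty because trim < 0.5
        center.append(sum(kept) / len(kept))
    return tuple(center)

# Nine well-behaved points plus one extreme outlier.
points = [(float(i), float(i)) for i in range(9)] + [(1000.0, -1000.0)]
print(trimmed_mean_center(points))  # (4.5, 3.5)
```

With ten points and trim=0.10, one value is dropped at each end of every axis, so the outlier's coordinates never enter the average.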

Answered by: Adelaide449 | Posted: 02-11-2021

Answer 2

Nope, that is the only formula for the centroid of a collection of points. See Wikipedia: http://en.wikipedia.org/wiki/Centroid

Answered by: Anna586 | Posted: 02-11-2021

Answer 3

You vaguely mention "a way to get a more accurate centroid". Maybe you're talking about a centroid that isn't affected by outliers. For example, the average household income in the USA is probably very high, because a small number of very rich people skew the average; they are the "outliers". For that reason, statisticians use the median instead. One way to obtain the median is to sort the values, then pick the value halfway down the list.

Maybe you're looking for something like this, but for 2D or 3D points. The problem is, in 2D and higher, you can't sort. There's no natural order. Nevertheless, there are ways to get rid of outliers.

One way is to find the convex hull of the points. The convex hull has all the points on the "outside" of the set of points. If you do this, and throw out the points that are on the hull, you'll be throwing out the outliers, and the points that remain will give a more "representative" centroid. You can even repeat this process several times, and the result is kind of like peeling an onion. In fact, it's called "convex hull peeling".
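Here is a rough sketch of convex hull peeling, restricted to 2D to keep it short and dependency-free. The hull is computed with Andrew's monotone chain algorithm (my choice, not something this answer prescribes), and the names convex_hull and peeled_centroid are hypothetical. It assumes points are (x, y) tuples. For 3D you would need a proper hull routine from a geometry library.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Hull vertices of 2D points via Andrew's monotone chain.

    Points lying on a hull edge but not at a corner are not returned.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def peeled_centroid(points, layers=1):
    """Mean of the points left after removing `layers` convex hull layers."""
    remaining = list(points)
    for _ in range(layers):
        hull = set(convex_hull(remaining))
        inner = [p for p in remaining if p not in hull]
        if not inner:          # stop before peeling everything away
            break
        remaining = inner
    n = len(remaining)
    return (sum(p[0] for p in remaining) / n, sum(p[1] for p in remaining) / n)

points = [(0, 0), (10, 0), (0, 10), (100, 100), (4, 4), (5, 4), (4, 5), (5, 5)]
print(peeled_centroid(points, layers=1))  # (4.5, 4.5)
```

One peel removes the four hull vertices, including the outlier at (100, 100), and the centroid of what remains sits in the middle of the dense cluster. The plain mean of the same points has x = 16.0.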

Answered by: Miller932 | Posted: 02-11-2021

Answer 4

You can use an increased-accuracy summation, such as Kahan summation. Was that what you had in mind?
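For reference, a minimal sketch of Kahan (compensated) summation applied to the centroid. The names kahan_sum, naive_sum and kahan_centroid are just illustrative. In Python you could also reach for math.fsum from the standard library, which is used below as the reference value.

```python
import math

def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total

def kahan_sum(values):
    """Kahan summation: carry the rounding error of each add forward."""
    total = 0.0
    compensation = 0.0                    # running estimate of the lost low-order bits
    for v in values:
        y = v - compensation              # apply the correction to the next term
        t = total + y                     # low-order bits of y may be lost here
        compensation = (t - total) - y    # recover what was lost
        total = t
    return total

def kahan_centroid(points):
    """Per-axis mean using Kahan summation for each coordinate."""
    n = len(points)
    return tuple(kahan_sum(axis) / n for axis in zip(*points))

values = [0.1] * 1_000_000
reference = math.fsum(values)             # correctly rounded sum
print(abs(naive_sum(values) - reference))
print(abs(kahan_sum(values) - reference))
```

On this input the naive loop drifts noticeably from the reference while the compensated sum stays very close to it. Whether that matters depends on how many points you have and how widely their magnitudes vary.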

Answered by: Leonardo488 | Posted: 02-11-2021

Answer 5

Potentially more efficient: if you're calculating this multiple times, you can speed this up quite a bit by keeping two standing variables

N = 0  # number of points
sums = dict(x=0.0, y=0.0, z=0.0)  # running sum of each coordinate over all points

then changing N and sums whenever points are created or destroyed. This changes things from O(N) to O(1) for calculations at the cost of more work every time a point is created, moves, or is destroyed.
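A small sketch of that bookkeeping, assuming you control every place where points are added, moved, or removed. The class name RunningCentroid and its methods are hypothetical. One caveat is that after very many updates the running sums can accumulate rounding error, so an occasional full recomputation may be worthwhile.

```python
class RunningCentroid:
    """Keeps a point count and per-axis sums so the centroid is O(1) to read."""

    def __init__(self, dims=3):
        self.n = 0
        self.sums = [0.0] * dims

    def add(self, point):
        self.n += 1
        for i, coord in enumerate(point):
            self.sums[i] += coord

    def remove(self, point):
        # The caller must only remove points that were previously added.
        self.n -= 1
        for i, coord in enumerate(point):
            self.sums[i] -= coord

    def move(self, old, new):
        # A move changes the sums but not the count.
        for i, (a, b) in enumerate(zip(old, new)):
            self.sums[i] += b - a

    def centroid(self):
        if self.n == 0:
            raise ValueError("centroid of zero points is undefined")
        return tuple(s / self.n for s in self.sums)

rc = RunningCentroid()
rc.add((0.0, 0.0, 0.0))
rc.add((2.0, 4.0, 6.0))
print(rc.centroid())  # (1.0, 2.0, 3.0)
```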

Answered by: John847 | Posted: 02-11-2021

Answer 6

A "more accurate centroid"? I believe the centroid is defined the way you calculated it, hence there can be no "more accurate centroid".

Answered by: Brooke705 | Posted: 02-11-2021

Answer 7

Yes that is the correct formula.

If you have a large number of points you can exploit the symmetry of the problem (be it cylindrical, spherical, or mirror). Otherwise, you can borrow from statistics and average a random sample of the points, accepting a bit of error.
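The sampling idea could look something like this. The function name sampled_centroid and its parameters are assumptions for illustration. The result is only an estimate, and its error shrinks roughly with the square root of the sample size.

```python
import random

def sampled_centroid(points, sample_size, seed=None):
    """Estimate the centroid from a random sample drawn without replacement."""
    rng = random.Random(seed)                 # seed only for reproducibility
    k = min(sample_size, len(points))
    sample = rng.sample(points, k)
    return tuple(sum(axis) / k for axis in zip(*sample))

# Synthetic test data: points spread uniformly over the unit cube,
# so the true centroid is close to (0.5, 0.5, 0.5).
gen = random.Random(0)
points = [(gen.random(), gen.random(), gen.random()) for _ in range(20_000)]
print(sampled_centroid(points, 2_000, seed=1))
```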

Answered by: Kelsey518 | Posted: 02-11-2021

Answer 8

If your n-dimensional vectors are in a list such as [[a0, a1, ..., an], [b0, b1, ..., bn], [c0, c1, ..., cn]], just convert the list to an array and then calculate the centroid like this:

import numpy as np

# Listv is your list of points, one inner list per point
vectors = np.array(Listv)
# averaging over axis 0 gives one mean per coordinate
centroid = np.mean(vectors, axis=0)

Answered by: Fenton548 | Posted: 02-11-2021

Answer 9

You got it. What you are calculating is the centroid, or the mean vector.

Answered by: Maddie356 | Posted: 02-11-2021

Similar questions

math - How to Calculate Centroid in python

I'm beginner to python coding. I'm working over structural coordinates. I have pdb structure which have xyz coordinate information (last three col) ATOM 1 N SER A 1 27.130 7.770 34.390 ATOM 2 1H SER A 1 27.990 7.760 34.930 ATOM 3 2H SER A 1 27.160 6.960 33.790 ATOM 4 3H SER A 1 27.170 8.580 33.790 ATOM 5 CA SER A ...

How Python calculate number?

This question already has answers here:

python - Calculate score in a pyramid score system

I am trying to calculate game scores for a bunch of users and I haven't really got it yet. It is a pyramid game where you can invite people, and the people you invite are placed beneath you in the relations tree. So if I invite X and X invites Y, I get kickback from both of them. Let's say 10%^steps... So from X I get 10% of his score and 1% from Y, and X gets 10% from Y. So to calculate this i was thi...

How to calculate a mod b in Python?

Is there a modulo function in the Python math library? Isn't 15 % 4, 3? But 15 mod 4 is 1, right?

To calculate the sum of numbers in a list by Python

My data 466.67 465.56 464.44 463.33 462.22 461.11 460.00 458.89 ... I run in Python sum(/tmp/1,0) I get an error. How can you calculate the sum of the values by Python?

python - How to calculate a date back from another date with a given number of work days

I need to calculate date (year, month, day) which is (for example) 18 working days back from another date. It would be enough to eliminate just weekends. Example: I've got a date 2009-08-21 and a number of 18 workdays as a parameter, and correct answer should be 2009-07-27. thanks for any help

python - How to calculate the scrape URL for a torrent

I've read the Bit-torrent specification and done a number of searches, trying to find out how I can get the seeds/peers/downloaded data from a torrent tracker (using Python). I can calculate the info hash from a Torrent no problem, which matches up with the info hash given by various working torrent applications. However, when I try to get the information from the tracker I either timeout (the tracker is working) o...

datetime - How to use Python to calculate time

I want to write python script that acts as a time calculator. For example: Suppose the time is now 13:05:00 I want to add 1 hour, 23 minutes, and 10 seconds to it. and I want to print the answer out. How do I do this in Python? What if date is also involved?

c# - Calculate percent at runtime

I have this problem where I have to "audit" a percent of my transactions. If percent is 100 I have to audit them all, if it is 0 I have to skip them all, and if it is 50% I have to review half, etc. The problem (or the opportunity) is that I have to perform the check at runtime. What I tried was: audit = 100/percent So if percent is 50 audit = 100 /...

python - Calculate time between time-1 to time-2?

enter time-1 // eg 01:12 enter time-2 // eg 18:59 calculate: time-1 to time-2 / 12 // i.e time between 01:12 to 18:59 divided by 12 How can it be done in Python. I'm a beginner so I really have no clue where to start. Edited to add: I don't want a timer. Both time-1 and time-2 are entered by the user manually. Thanks in advance for your help.

python - How to calculate positions of holes in a game board?

I'm making a game with Python->PyGame->Albow and ran into a problem with board generation. However I'll try to explain the problem in a language agnostic way. I believe it's not related to python. I've split the game board generation into several parts. Part one generates the board holes. Holes are contained in a list/array. Each hole object has a mapping of angles relating to other...
