# How do you make this code more pythonic?

Could you guys please tell me how I can make the following code more pythonic?

The code is correct. Full disclosure: it's problem 1b in Handout #4 of this machine learning course. I'm supposed to use Newton's method on the two data sets to fit a logistic hypothesis, but they use MATLAB and I'm using SciPy.

For example, one question I have: the matrices kept rounding to integers until I initialized one value to 0.0. Is there a better way?

Thanks

```
import os.path
import math
from numpy import matrix
from scipy.linalg import inv #, det, eig
x = matrix( '0.0;0;1' )
y = 11
grad = matrix( '0.0;0;0' )
hess = matrix('0.0,0,0;0,0,0;0,0,0')
theta = matrix( '0.0;0;0' )
# run until convergence=6or7
for i in range(1, 6):
    #reset
    grad = matrix( '0.0;0;0' )
    hess = matrix('0.0,0,0;0,0,0;0,0,0')
    xfile = open("q1x.dat", "r")
    yfile = open("q1y.dat", "r")
    #over whole set=99 items
    for i in range(1, 100):
        xline = xfile.readline()
        s= xline.split(" ")
        x[0] = float(s[1])
        x[1] = float(s[2])
        y = float(yfile.readline())
        hypoth = 1/ (1+ math.exp(-(theta.transpose() * x)))
        for j in range(0,3):
            grad[j] = grad[j] + (y-hypoth)* x[j]
            for k in range(0,3):
                hess[j,k] = hess[j,k] - (hypoth *(1-hypoth)*x[j]*x[k])
    theta = theta - inv(hess)*grad #update theta after construction
    xfile.close()
    yfile.close()
print "done"
print theta
```

# Answer 1

One obvious change is to get rid of the "for i in range(1, 100):" and just iterate over the file lines. To iterate over both files (xfile and yfile) in step, zip them, i.e. replace that block with something like:

```
import itertools
for xline, yline in itertools.izip(xfile, yfile):
    s= xline.split(" ")
    x[0] = float(s[1])
    x[1] = float(s[2])
    y = float(yline)
    ...
```

This assumes you want the whole file. If you're deliberately restricting to the *first* 100 lines, you could use something like:

```
for i, xline, yline in itertools.izip(range(100), xfile, yfile):
```
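In Python 3, `itertools.izip` no longer exists because the built-in `zip` is already lazy. A minimal sketch of the same lockstep iteration there, assuming Python 3 and using in-memory `io.StringIO` objects with made-up contents as stand-ins for q1x.dat and q1y.dat:

```python
import io
from itertools import islice

# In-memory stand-ins for q1x.dat and q1y.dat (contents are made up).
xfile = io.StringIO(" 1.5 2.5\n 3.0 4.0\n 5.0 6.0\n")
yfile = io.StringIO("0\n1\n1\n")

rows = []
# zip stops at the shorter input; islice caps the number of pairs read.
for xline, yline in islice(zip(xfile, yfile), 2):
    s = xline.split(" ")
    rows.append((float(s[1]), float(s[2]), float(yline)))

print(rows)
```

Here `islice(..., 2)` plays the role of zipping with a range: it stops after the first two line pairs without needing a dummy counter variable.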

However, it's also inefficient to re-read the same files on every pass of the outer loop. It's better to load the data into memory once, in advance, and loop over it there, i.e. outside your loop, have:

```
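xfile = open("q1x.dat", "r")
yfile = open("q1y.dat", "r")
# Parse once: each entry is ([x1, x2], y), with every value converted to float.
data = zip([map(float, line.split(" ")[1:3]) for line in xfile], map(float, yfile))
xfile.close()
yfile.close()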
```

And inside just:

```
for (x1,x2), y in data:
    x[0] = x1
    x[1] = x2
    ...
```
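One caveat if this is ported to Python 3: `zip` returns a one-shot iterator there, so the data would be exhausted after the first pass unless it is stored in a list. A rough sketch of the load-once pattern under that assumption, where `load_data` is a hypothetical helper name, the sample contents are made up, and each x line is assumed to hold exactly two numbers:

```python
import io

def load_data(xfile, yfile):
    """Parse both files once into a list of ((x1, x2), y) tuples."""
    data = []
    for xline, yline in zip(xfile, yfile):
        x1, x2 = (float(v) for v in xline.split())
        data.append(((x1, x2), float(yline)))
    return data

# Made-up sample contents standing in for q1x.dat and q1y.dat.
data = load_data(io.StringIO(" 1.0 2.0\n 3.0 4.0\n"), io.StringIO("0\n1\n"))

# The parsed list can be traversed on every pass without re-reading the files.
for iteration in range(3):
    total = sum(x1 + x2 + y for (x1, x2), y in data)

print(data, total)
```

Calling `split()` with no argument splits on any run of whitespace, so the leading space on each line does not produce an empty first field.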

# Answer 2

```
import math
from numpy import matrix, zeros
from scipy.linalg import inv

x = matrix([[0.],[0],[1]])
theta = matrix(zeros([3,1]))
for i in range(5):
    grad = matrix(zeros([3,1]))
    hess = matrix(zeros([3,3]))
    [xfile, yfile] = [open('q1'+a+'.dat', 'r') for a in 'xy']
    for xline, yline in zip(xfile, yfile):
        x.transpose()[0,:2] = [map(float, xline.split(" ")[1:3])]
        y = float(yline)
        # sigmoid of theta^T x; the minus sign matches the question's code
        hypoth = 1 / (1 + math.exp(-(theta.transpose() * x)))
        grad += (y - hypoth) * x
        hess -= hypoth * (1 - hypoth) * x * x.transpose()
    theta -= inv(hess) * grad  # Newton step: theta - H^-1 * grad
print "done"
print theta
```
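Building the arrays with zeros also bears on the rounding question. As far as I can tell, a matrix created from integer literals only (such as '0;0;0') gets an integer dtype, so floats assigned into it are truncated, whereas zeros defaults to a float dtype. A small sketch of that behaviour, assuming NumPy is installed and using plain `np.array` instead of `matrix` for brevity (the same dtype rules apply):

```python
import numpy as np

# Integer literals give an integer dtype, so assigned floats are truncated.
int_vec = np.array([0, 0, 1])
int_vec[0] = 0.75          # stored as 0

# zeros defaults to float64; dtype=float forces it for explicit literals.
float_vec = np.zeros(3)
float_vec[0] = 0.75        # stored as 0.75
explicit = np.array([0, 0, 1], dtype=float)

print(int_vec.dtype.kind, float_vec.dtype, explicit.dtype)
```

Passing `dtype=float` is an alternative to writing one literal as 0.0 when the initial values are spelled out by hand.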

# Answer 3

> the matrixes kept rounding to integers until I initialized one value to 0.0. Is there a better way?

At the top of your code:

```
from __future__ import division
```

In Python 2, dividing two integers with `/` always returns an integer unless at least one operand is a floating-point number. In Python 3 (and in Python 2 after `from __future__ import division`), division works more how we humans might expect it to: `1/2` gives `0.5`.

If you *want* integer division to return an integer, and you've imported from `__future__`, use a double slash, `//`. That is:

```
from __future__ import division
print 1//2 # prints 0
print 5//2 # prints 2
print 1/2 # prints 0.5
print 5/2 # prints 2.5
```

# Answer 4

You could make use of the **with** statement to open the files; it closes them automatically, even if an exception is raised partway through.
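A minimal sketch of what that might look like, assuming Python 3 (or 2.7, where a single with statement can also manage several files). The file names match the question, but the files are created in a temporary directory with made-up contents so the example runs on its own:

```python
import os
import tempfile

# Create two small throwaway files so the sketch is runnable on its own.
tmpdir = tempfile.mkdtemp()
xpath = os.path.join(tmpdir, "q1x.dat")
ypath = os.path.join(tmpdir, "q1y.dat")
with open(xpath, "w") as f:
    f.write(" 1.0 2.0\n 3.0 4.0\n")
with open(ypath, "w") as f:
    f.write("0\n1\n")

# Both files are closed when the block exits, even if parsing raises.
with open(xpath) as xfile, open(ypath) as yfile:
    pairs = [(xline.split(), float(yline)) for xline, yline in zip(xfile, yfile)]

print(xfile.closed, yfile.closed, pairs)
```

This removes the need for the explicit xfile.close() and yfile.close() calls at the end of each pass.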

# Answer 5

The code that reads the files into lists could be drastically simpler:

```
# One list of floats per line of q1x.dat (a plain loop would keep only the last line).
x = [map(float, line.split(" ")[1:]) for line in open("q1x.dat", "r")]
y = map(float, open("q1y.dat", "r").readlines())
```

