Python UDP Client and Server

I use this often, so I’m pasting the code here as a reference.

UDP Client

From PyMOTW:

import socket
import sys

# Create a UDP socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Must match the address the server below binds to
server_address = ('localhost', 10000)
message = 'This is the message.  It will be repeated.'


try:
    # Send data
    print >>sys.stderr, 'sending "%s"' % message
    sent = sock.sendto(message, server_address)

    # Receive response
    print >>sys.stderr, 'waiting to receive'
    data, server = sock.recvfrom(4096)
    print >>sys.stderr, 'received "%s"' % data
finally:
    print >>sys.stderr, 'closing socket'
    sock.close()

UDP Server

From PyMOTW:

import socket
import sys

# Create a UDP socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Bind the socket to the port
server_address = ('localhost', 10000)
print >>sys.stderr, 'starting up on %s port %s' % server_address
sock.bind(server_address)

while True:
    print >>sys.stderr, '\nwaiting to receive message'
    data, address = sock.recvfrom(4096)

    print >>sys.stderr, 'received %s bytes from %s' % (len(data), address)
    print >>sys.stderr, data

    if data:
        sent = sock.sendto(data, address)
        print >>sys.stderr, 'sent %s bytes back to %s' % (sent, address)
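
The two listings above use Python 2 syntax. As a rough sketch of the same echo exchange under Python 3, where sockets carry bytes rather than str, something like the following should work. The names run_echo_server and udp_roundtrip are made up for this example, and the server runs in a background thread only so the whole thing fits in one script.

```python
import socket
import threading


def run_echo_server(sock, count=1):
    """Echo `count` datagrams back to their senders, then return."""
    for _ in range(count):
        data, address = sock.recvfrom(4096)
        if data:
            sock.sendto(data, address)


def udp_roundtrip(message, timeout=2.0):
    """Start a local echo server, send `message` to it, return the reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_address = server.getsockname()

    worker = threading.Thread(
        target=run_echo_server, args=(server,), daemon=True
    )
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(timeout)  # UDP gives no delivery guarantee; don't wait forever
    try:
        # Python 3 sockets carry bytes, so encode on the way out...
        client.sendto(message.encode("utf-8"), server_address)
        data, _ = client.recvfrom(4096)
        # ...and decode on the way back in.
        return data.decode("utf-8")
    finally:
        client.close()
        worker.join(timeout)
        server.close()


if __name__ == "__main__":
    print(udp_roundtrip("This is the message.  It will be repeated."))
```

Binding to port 0 avoids clashing with anything already listening on 10000; a real server would bind to a fixed, known port as in the listing above.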

AsyncIO Python Pitfalls

In an earlier post I mentioned switching my code to use Python 3’s async/await and not seeing any noticeable improvement. In an article on LWN this week the answer popped out:

Much of the Python standard library is written in blocking fashion, however, so the socket, subprocess, and threading modules (and other modules that use them) and even simple things like time.sleep() cannot be used in async programs. All of the asynchronous frameworks provide their own non-blocking replacements for those modules, but that means “you have to relearn how to do these things that you already know how to do”, Grinberg said.

(From An introduction to asynchronous Python)

That’s an annoyance that I can’t easily fix if I use other libraries that internally use these blocking modules. So I’ll need to see if I can break the logic down into smaller chunks and write async versions of at least some parts (e.g. fetching the webpage, even if not the PDF conversion, or calling a subprocess asynchronously instead of using the existing blocking library).
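
To make the problem concrete, here is a small sketch of what a blocking call does inside a coroutine, and one possible workaround: handing the call to a thread pool with run_in_executor. The function blocking_fetch is a made-up stand-in for a blocking library call, with time.sleep playing the part of the slow work.

```python
import asyncio
import time


def blocking_fetch(delay):
    """Hypothetical stand-in for a blocking library call."""
    time.sleep(delay)
    return delay


async def call_directly(delay):
    # Blocks the whole event loop: no other coroutine runs in the meantime.
    return blocking_fetch(delay)


async def call_in_executor(delay):
    # Runs the blocking call in the default thread pool, so the loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_fetch, delay)


async def timed(worker, n=3, delay=0.2):
    """Run n workers concurrently and return the elapsed wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(worker(delay) for _ in range(n)))
    return time.perf_counter() - start


def compare():
    direct = asyncio.run(timed(call_directly))
    offloaded = asyncio.run(timed(call_in_executor))
    return direct, offloaded


if __name__ == "__main__":
    direct, offloaded = compare()
    print("direct: %.2fs, executor: %.2fs" % (direct, offloaded))
```

With the direct version the three calls run one after another even though they were gathered together, which would explain seeing no speed-up after switching to async/await. The executor version lets them overlap, at the cost of using threads behind the scenes.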

The docs have sections for async transports, protocols, subprocesses and so on, so I’ll need to spend some time reading them before I proceed.
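
As one example from the subprocess section of the docs, asyncio.create_subprocess_exec starts a child process without blocking the event loop. A minimal sketch, assuming the work can be pushed out to a separate command (here just a second Python interpreter printing a word; run_child is a name invented for this example):

```python
import asyncio
import sys


async def run_child(code):
    """Run `python -c code` without blocking the event loop; return its stdout."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", code,
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout.decode().strip()


async def main():
    # Both children run at the same time; gather keeps the argument order.
    return await asyncio.gather(
        run_child("print('first')"),
        run_child("print('second')"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```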

More to come when I make some progress.

web2pdf – A tool to export bookmarks to PDF

Here’s a follow up to my earlier post on archiving. I spent a couple of days coming up with a quick Python app to fill my needs.

Here it is: web2pdf.

Once installed, the configuration simply expects a bookmarks.html file on the filesystem. It reads the file, stores the contents in an SQLite DB and starts saving PDF versions of each link therein.
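
For illustration only, pulling the links out of an exported bookmarks.html can be done with the standard library's HTMLParser. This is a sketch of the idea, not the code web2pdf actually uses; BookmarkParser and extract_links are names invented here.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (url, title) pairs from the <a href=...> tags in a bookmark export."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._title = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._title = []

    def handle_data(self, data):
        if self._href is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._title).strip()))
            self._href = None


def extract_links(html_text):
    parser = BookmarkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Calling extract_links(open("bookmarks.html").read()) would then give a list of (url, title) tuples ready to be written to the database.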

You can kill the script and re-run it at a later point in time and it will continue where it stopped. The output looks like this:

(pdf) bash-4.3 ~/code/web2pdf/web2pdf$ ./ 
Found 2599 links in the bookmark file
Found 2599 rows in the bookmark db
..of which 81 links are already saved
..and 2506 are pending
Hit enter to start downloading pending PDFs
Downloading | experiment-reaffirms-quantum-weirdness

There are more details in the GitHub link. I’m not happy with how slow it is, but I seem to be limited by the library I’m using and by the use case itself: fetching a page is trivial, but it has to be rendered before it can be exported.

As always there is more to do but it works pretty well already. It tags failed bookmarks separately in the DB in case they need retrying later. I’ve tried to speed it up using Python 3’s native async/await, but the performance improvements are not noticeable so far. I’ll try multiprocessing instead and commit whichever one works better.
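
The resume-and-retry behaviour comes down to keeping a status per link. Here is a minimal sketch of that idea with sqlite3; the table name, column names and status values are assumptions made for this example and not necessarily the schema web2pdf uses.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the bookmark DB; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookmarks ("
        " url TEXT PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def add_links(conn, urls):
    # INSERT OR IGNORE keeps the existing status when a URL is already known,
    # so re-reading the bookmark file does not reset finished downloads.
    conn.executemany(
        "INSERT OR IGNORE INTO bookmarks (url) VALUES (?)",
        [(u,) for u in urls],
    )
    conn.commit()


def mark(conn, url, status):
    """Record the outcome for one link, e.g. 'saved' or 'failed'."""
    conn.execute("UPDATE bookmarks SET status = ? WHERE url = ?", (status, url))
    conn.commit()


def pending(conn):
    """Return the URLs that still need a PDF."""
    rows = conn.execute(
        "SELECT url FROM bookmarks WHERE status = 'pending' ORDER BY url"
    )
    return [row[0] for row in rows]
```

Because every outcome is committed as soon as it is known, killing the script loses at most the download that was in flight, and a later retry pass only needs to select the rows marked as failed.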

Python 3 Unicode annoyances

Every once in a while I feel guilty for not using Python 3, so I spin it up for a few rounds. My experience is usually:

  1. Start using Python 3
  2. oops, UnicodeDecodeError
  3. Go back to Python 2

Looks like I’m not the only one who has this frustration. Knowing when to use encode vs decode was always a frustrating exercise in trial and error. There are some good tips in the linked thread, which is worth a thorough read. A useful bit is this comment from redditor Fylwind, part of which is:

  • encode: textual data to binary data.
  • decode: binary data to textual data.

The term “encode” means a transformation from some high-level structure into bytes, hence in the context of strings it means converting text into binary data.

Q. What are the appropriate data types for textual data and binary data?

  • In Python 3:
    • Textual data is str, written as "foo".
    • Binary data is bytes, written as b"foo".
    • The encode function only works on textual data, and the decode function only works on binary data.
  • In Python 2:
    • Textual data is unicode, written as u"foo". If unicode_literals is enabled, then it’s "foo".
    • Binary data is str (alias: bytes), written as "foo". If unicode_literals is enabled, then it’s b"foo"
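
The Python 3 half of that summary can be tried out directly. A short sketch, with to_bytes and to_text being throwaway helper names that only exist to label the direction of each conversion:

```python
def to_bytes(text, encoding="utf-8"):
    """encode: textual data (str) -> binary data (bytes)."""
    return text.encode(encoding)


def to_text(data, encoding="utf-8"):
    """decode: binary data (bytes) -> textual data (str)."""
    return data.decode(encoding)


text = "caf\u00e9"        # str, four characters
raw = to_bytes(text)       # bytes, five of them: the accented e takes two in UTF-8

print(type(text).__name__, len(text))
print(type(raw).__name__, len(raw))
print(to_text(raw) == text)

# Decoding with the wrong codec is a common source of UnicodeDecodeError.
try:
    to_text(raw, "ascii")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

In Python 3, str has no decode method and bytes has no encode method, so mixing the two up fails immediately with an AttributeError instead of silently doing an implicit conversion the way Python 2 could.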

Python square root algorithm

I’m learning Sedgewick’s Algorithms and the examples are all in Java. So I’m converting them to Python for my own understanding.
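Here’s the first example I came across in the first chapter, for finding a square root:

def sqrt(i):
    if i <= 0:
        return None
    e = 1e-15  # relative tolerance; the original value is not shown, so this one is an assumption
    x = float(i)
    while abs(x - i/x) > e * x:
        x = (x + i/x) / 2.0  # Newton's method step: average x and i/x
    return x

In [30]: sqrt(4)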
Out[30]: 2.0
In [31]: sqrt(9)
Out[31]: 3.0
In [32]: sqrt(142857)
Out[32]: 377.96428402694346

I ought to raise an Exception instead of returning None in the beginning. My maths skills have turned painfully rusty: it took me a good 5-10 minutes of reading the example in the book before I understood that it worked, after which I wrote this version without too much trouble.
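
A sketch of what the exception-raising variant could look like. The name newton_sqrt, the choice of ValueError and the default tolerance are my assumptions here; zero is handled as a special case because the update step divides by x.

```python
def newton_sqrt(c, tolerance=1e-15):
    """Approximate the square root of c with Newton's method."""
    if c < 0:
        raise ValueError("cannot take the square root of a negative number")
    if c == 0:
        return 0.0  # avoid dividing by zero in the loop below
    x = float(c)
    # Stop once x and c/x agree to within a relative tolerance.
    while abs(x - c / x) > tolerance * x:
        x = (x + c / x) / 2.0  # average the guess and c / guess
    return x


if __name__ == "__main__":
    print(newton_sqrt(2))
    try:
        newton_sqrt(-1)
    except ValueError as exc:
        print("error:", exc)
```

Raising makes a bad input fail at the call site, whereas a returned None tends to surface later as a confusing TypeError somewhere else.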

Hopefully I’ll stick with this project through to the end.

Basic Twisted echo server

Nothing mind-blowing here. I’m playing with Python’s Twisted networking engine, and here is a slightly modified server from their home page example:

from twisted.internet import protocol, reactor

class Echo(protocol.Protocol):
    def dataReceived(self, data):
        # Reverse the received line and send it back to the client
        newdata = 'reversed: ' + ''.join(reversed(data.strip())) + '\n'
        self.transport.write(newdata)

class EchoFactory(protocol.Factory): # build an Echo object for each connection
    def buildProtocol(self, addr):
        return Echo()

reactor.listenTCP(8000, EchoFactory()) # register a callback
reactor.run() # start the event loop

Run the script and telnet to port 8000 of this box. This is what it looks like:

Connected to erdos.
Escape character is '^]'.
hello
reversed: olleh
reverse this!
reversed: !siht esrever

The example is also here, with a more detailed explanation of the basics.

The API reference helps to see what methods need to be implemented while sub-classing one of its classes.
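
One caveat if this server is run under Python 3: there, Twisted hands dataReceived a bytes object rather than a str, so the string concatenation in the listing above would raise a TypeError. A sketch of the reversing step written for bytes, kept as a plain function so it can be tried without Twisted installed (reverse_line is a name made up for this example):

```python
def reverse_line(data):
    """Build the reply for one chunk of received bytes."""
    # Slicing with [::-1] reverses a bytes object and keeps it as bytes.
    return b"reversed: " + data.strip()[::-1] + b"\n"


if __name__ == "__main__":
    print(reverse_line(b"reverse this!\r\n"))
```

Inside the protocol class, the body of dataReceived would then reduce to self.transport.write(reverse_line(data)).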