Posted in python

Python UDP Client and Server

I use this often, so I’m pasting the code here as a reference.

UDP Client

From PyMOTW:

import socket
import sys

# Create a UDP socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

server_address = ('192.168.0.100', 8080)
message = 'This is the message.  It will be repeated.'

try:

    # Send data
    print >>sys.stderr, 'sending "%s"' % message
    sent = sock.sendto(message, server_address)

    # Receive response
    print >>sys.stderr, 'waiting to receive'
    data, server = sock.recvfrom(4096)
    print >>sys.stderr, 'received "%s"' % data

finally:
    print >>sys.stderr, 'closing socket'
    sock.close()

UDP Server

From PyMOTW:

import socket
import sys

# Create a UDP socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Bind the socket to the port
server_address = ('localhost', 10000)
print >>sys.stderr, 'starting up on %s port %s' % server_address
sock.bind(server_address)
while True:
    print >>sys.stderr, '\nwaiting to receive message'
    data, address = sock.recvfrom(4096)

    print >>sys.stderr, 'received %s bytes from %s' % (len(data), address)
    print >>sys.stderr, data

    if data:
        sent = sock.sendto(data, address)
        print >>sys.stderr, 'sent %s bytes back to %s' % (sent, address)
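
Both listings above are Python 2 (print >>sys.stderr, str payloads). For Python 3, a rough equivalent might look like the sketch below. It is not from PyMOTW: the function names run_echo_server and echo_once, the timeout, and the use of port 0 to get a free port are all choices made here for illustration.

```python
import socket
import threading


def run_echo_server(sock, max_datagrams=1):
    """Echo back up to max_datagrams datagrams received on a bound UDP socket."""
    for _ in range(max_datagrams):
        data, address = sock.recvfrom(4096)
        if data:
            sock.sendto(data, address)


def echo_once(server_address, message, timeout=2.0):
    """Send one datagram and return the reply as text."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP gives no delivery guarantee, so don't wait forever for a reply
        sock.settimeout(timeout)
        # In Python 3, sendto() needs bytes, so encode the text first
        sock.sendto(message.encode("utf-8"), server_address)
        data, _server = sock.recvfrom(4096)
        return data.decode("utf-8")


if __name__ == "__main__":
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    worker = threading.Thread(target=run_echo_server, args=(server_sock,), daemon=True)
    worker.start()
    print(echo_once(server_sock.getsockname(), "This is the message.  It will be repeated."))
    worker.join(2)
    server_sock.close()
```

The main differences from the Python 2 version are the explicit encode/decode around the socket calls and the timeout on the client, which keeps recvfrom() from hanging if the datagram is lost.
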
Posted in python

AsyncIO Python Pitfalls

In an earlier post I mentioned switching my code to use Python 3’s async/await and not seeing any noticeable improvements. In an article on LWN this week the answer popped out:

Much of the Python standard library is written in blocking fashion, however, so the socket, subprocess, and threading modules (and other modules that use them) and even simple things like time.sleep() cannot be used in async programs. All of the asynchronous frameworks provide their own non-blocking replacements for those modules, but that means “you have to relearn how to do these things that you already know how to do”, Grinberg said.

(From An introduction to asynchronous Python)

That’s an annoyance that I can’t easily fix if I use other libraries that internally use these blocking modules. So I’ll need to see if I can break the logic down into smaller chunks and write async versions of at least some parts (e.g. fetching the webpage, even if not the PDF conversion, or calling a subprocess asynchronously instead of using the existing blocking library).

The docs have sections for async transports, protocols, subprocesses and so on, so I’ll need to spend some time looking at it before I proceed.
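
One way to keep a blocking library call from stalling the event loop is to hand it to a thread pool with loop.run_in_executor(). The sketch below assumes Python 3.7+; blocking_job is a hypothetical stand-in for whatever blocking call a library makes (here it just sleeps), and the helper names are made up for this example.

```python
import asyncio
import time


def blocking_job(seconds):
    """Hypothetical stand-in for a blocking library call."""
    time.sleep(seconds)
    return seconds


async def run_jobs(delays):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor
    futures = [loop.run_in_executor(None, blocking_job, d) for d in delays]
    # gather() returns the results in the same order as the inputs
    return await asyncio.gather(*futures)


def timed_run(delays):
    """Run all jobs and return (results, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    results = asyncio.run(run_jobs(delays))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = timed_run([0.3, 0.3, 0.3])
    print(results, "took %.2fs" % elapsed)
```

The three sleeps overlap, so the total should be close to the longest single delay rather than the sum. This only helps when the blocking work releases the GIL (I/O, sleeping, waiting on a child process); for subprocesses specifically, asyncio.create_subprocess_exec() is the non-blocking counterpart.
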

More to come when I make some progress.

Posted in links, python

web2pdf – A tool to export bookmarks to PDF

Here’s a follow up to my earlier post on archiving. I spent a couple of days coming up with a quick Python app to fill my needs.

Here it is: web2pdf.

Once installed, the configuration simply expects a bookmarks.html file on the filesystem. It reads the file, stores the contents in an SQLite DB and starts saving PDF versions of each link therein.

You can kill the script and re-run it at a later point in time and it will continue where it stopped. The output looks like this:

(pdf) bash-4.3 ~/code/web2pdf/web2pdf$ ./web2pdf.py 
Found 2599 links in the bookmark file
Found 2599 rows in the bookmark db
..of which 81 links are already saved
..and 2506 are pending
Hit enter to start downloading pending PDFs
Downloading https://www.quantamagazine.org/20170207-bell-test-quantum-loophole/ | experiment-reaffirms-quantum-weirdness

There are more details in the github link. I’m not happy with how slow it is, but I seem to be limited by the library I’m using and the use-case itself: fetching a page is trivial, but it has to be rendered before it can be exported.

As always there is more to do but it works pretty well already. It tags failed bookmarks separately in the DB in case they need retrying later. I’ve tried to speed it up using Python 3’s native async/await, but the performance improvements are not noticeable so far. I’ll try multiprocessing instead and commit whichever one works better.
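
The stop-and-resume behaviour described above can be sketched with a status column in SQLite. This is not web2pdf’s actual code: the table layout, the status values (‘pending’, ‘saved’, ‘failed’) and the save_pdf callback are all hypothetical.

```python
import sqlite3


def init_db(conn, urls):
    """Create the table if needed and register any links not seen before."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookmarks ("
        "url TEXT PRIMARY KEY, status TEXT NOT NULL DEFAULT 'pending')"
    )
    # INSERT OR IGNORE keeps the status of links recorded on a previous run
    conn.executemany(
        "INSERT OR IGNORE INTO bookmarks (url) VALUES (?)", [(u,) for u in urls]
    )
    conn.commit()


def process_pending(conn, save_pdf):
    """Try every pending link once and record whether it was saved or failed."""
    rows = conn.execute(
        "SELECT url FROM bookmarks WHERE status = 'pending'"
    ).fetchall()
    for (url,) in rows:
        try:
            save_pdf(url)
            status = "saved"
        except Exception:
            status = "failed"  # tagged separately so it can be retried later
        conn.execute("UPDATE bookmarks SET status = ? WHERE url = ?", (status, url))
        conn.commit()  # commit per link so a killed run loses at most one item


def counts(conn):
    """Return a {status: number_of_links} summary."""
    return dict(conn.execute("SELECT status, COUNT(*) FROM bookmarks GROUP BY status"))
```

Because the status lives in the database and is committed after every link, re-running the script only picks up rows that are still ‘pending’.
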

Posted in books, python

Plotting eight years of book buying habits

Since late 2008, I’ve been adding my book purchases to my online catalog at LibraryThing. I thought 8 years’ worth of (a relatively small set of) data would be worthwhile to play with.
 
They had an export option that gave me a tab-separated dump of all these books. One column in this file is the ‘Entry Date’ that shows when I added it to the website (usually within a day or two of buying it). This is what I was interested in.

My first exploration involved matplotlib. It did its job well but I got sidetracked into ‘prettier’ packages like plot.ly and bokeh. The latter is what I ended up using.

The data processing was trivial. I only needed to calculate how many books I added in a given month and plot a bar chart from the resulting counts. The result looks like this:

And it confirms what I suspected!

  • I got married in 2011. The density drops off drastically then, but is still reasonable.
  • Kid #1 popped out in April 2014, and the second little fellow in March 2016. The counts are a lot more sparse from there onwards 😦

I’m trying to get them interested in books so they’ll leave me in peace as well. Let’s see how that goes.

…and here is the code to make it happen. It’s just a simple script so I didn’t make the extra effort in packaging it and so on. I used anaconda/Spyder to develop the script, and it was pretty easy, despite my complete lack of knowledge in this area.
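
The counting step on its own is small. Here is a minimal sketch of it (not the original script), assuming the export has a header row and that the ‘Entry Date’ values look like YYYY-MM-DD. The function name, the date format and the sample rows are assumptions made for this example.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def books_per_month(tsv_file, date_column="Entry Date", date_format="%Y-%m-%d"):
    """Count rows per (year, month) of the entry date, sorted chronologically."""
    counts = Counter()
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        raw = (row.get(date_column) or "").strip()
        if not raw:
            continue  # skip books without an entry date
        added = datetime.strptime(raw, date_format)
        counts[(added.year, added.month)] += 1
    return sorted(counts.items())


# Made-up sample data standing in for the real export
sample = io.StringIO(
    "Title\tEntry Date\n"
    "Book A\t2008-11-03\n"
    "Book B\t2008-11-20\n"
    "Book C\t2011-02-14\n"
)
print(books_per_month(sample))  # [((2008, 11), 2), ((2011, 2), 1)]
```

The resulting list of (month, count) pairs is what gets handed to the plotting library for the bar chart.
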

Posted in python

rssmailer.py – Convert RSS feeds to a digest email

I finally scratched a long-pending itch and wrote a python application that:
  • Reads a list of feeds (that update more frequently than I prefer) once a day,
  • Extracts the last day’s posts,
  • and mails it to me.
I used mailgun’s neat APIs to send out the mails, and feedparser to do the rss/atom parsing.
The application hinges on twisted to asynchronously parse multiple feeds at a time.
Here’s the package on PyPI: https://pypi.python.org/pypi/rssmailer/0.2
And here’s the product page: https://tools.indeliblestamp.com/
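
The ‘last day’s posts’ step can be sketched as a filter over parsed entries. This is not rssmailer’s actual code: recent_entries is a hypothetical helper, and it assumes each entry behaves like a dict with a published_parsed value holding a UTC struct_time, which is how feedparser exposes entry dates.

```python
import calendar
import time


def recent_entries(entries, now=None, max_age_seconds=24 * 60 * 60):
    """Return the entries published within the last max_age_seconds."""
    now = time.time() if now is None else now
    recent = []
    for entry in entries:
        published = entry.get("published_parsed")  # UTC struct_time or missing
        if published is None:
            continue  # undated entries are skipped in this sketch
        # timegm() treats the struct_time as UTC, unlike time.mktime()
        age = now - calendar.timegm(published)
        if 0 <= age <= max_age_seconds:
            recent.append(entry)
    return recent
```

Using calendar.timegm() rather than time.mktime() matters here, since the parsed dates are in UTC and mktime() would interpret them in the local timezone.
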
Below is a sample of the mail I get for one of the sites.
----- Original message -----
From: RSS Digest Mailer
To:
Subject: Digest mail for 3quarksdaily
Date: Thu, 05 May 2016 13:02:46 +0000

Dangerous Fictions: A Pakistani Novelist Tests the Limits

Dexter Filkins in The New Yorker: Hanif got the idea of writing about a nurse in a decrepit hospital. Alice Bhatti (named for his old editor) is a ferociously strong young woman: smart, independent, and rebellious to the point of…

Our brain uses statistics to calculate confidence, make decisions

From PhysOrg: The directions, which came via cell phone, were a little garbled, but as you understood them: “Turn left at the 3rd light and go straight; the restaurant will be on your right side.” Ten minutes ago you made…

Trump-Sanders Phenomenon Signals an Oligarchy on the Brink of a Civilization-Threatening Collapse

Sally Goerner in Evonomics: The media has made a cottage industry out of analyzing the relationship between America’s crumbling infrastructure, outsourced jobs, stagnant wages, and evaporating middle class and the rise of anti-establishment presidential candidates Donald Trump and Bernie Sanders….

Stop telling kids you’re bad at math

Petra Bonfert-Taylor in the Washington Post: Why do smart people enjoy saying that they are bad at math? Few people would consider proudly announcing that they are bad at writing or reading. Our country’s communal math hatred may seem rather…

Blockchain technology will revolutionise far more than money: it will change your life

Dominic Frisby in Aeon: The impact of record-keeping on the course of history cannot be overstated. For example, the act of preserving Judaism and Christianity in written form enabled both to outlive the plethora of other contemporary religions, which were…

Warsan Shire: the Somali-British poet quoted by Beyoncé in Lemonade

Rafia Zakaria in The Guardian: She writes of places where many Beyoncé fans rarely go, the portions of London where the faces are black and brown, where men huddle outside shop-front mosques and veiled women are trailed by long chains…

The Essence of Mathematics, in One Beatles Song

Ben Orlin in Math With Bad Drawings: Okay, here’s a life regret: No one has ever stopped me on the street, grabbed me by the collar, and demanded that I explain to them the essence of mathematics. Me: So, you…

How should we live in a diverse society?

Kenan Malik in Pandaemonium: ‘Can Europe be the same with different people in it?’ So asked the American writer Christopher Caldwell in his book, Reflections on the Revolution in Europe, published a few years ago. It is a question that…

Sarah Palin, Jimmy Kimmel and Scientists on Climate Change

Why did the death of a single lion cause a sustained uproar?

Jason Goldman for Conservation Magazine: When the story of Cecil the lion’s death at the hands of an American hunter hit the media, the global response was “the largest reaction in the history of wildlife conservation,” according to a new…

Posted in python

Python 3 Unicode annoyances

Every once in a while I feel guilty for not using Python 3, so I spin it up for a few rounds. My experience is usually:

  1. Start using Python 3
  2. oops, UnicodeDecodeError
  3. Go back to Python 2

Looks like I’m not the only one who has this frustration. Knowing when to use encode vs decode was always an exercise in trial and error. There are some good tips in the linked thread, and it is worth a thorough read. A useful bit is this comment from redditor Fylwind, part of which is:

  • encode: textual data to binary data.
  • decode: binary data to textual data.

The term “encode” means a transformation from some high-level structure into bytes, hence in the context of strings it means converting text into binary data.

Q. What are the appropriate data types for textual data and binary data?

  • In Python 3:
    • Textual data is str, written as "foo".
    • Binary data is bytes, written as b"foo".
    • The encode function only works on textual data, and the decode function only works on binary data.
  • In Python 2:
    • Textual data is unicode, written as u"foo". If unicode_literals is enabled, then it’s "foo".
    • Binary data is str (alias: bytes), written as "foo". If unicode_literals is enabled, then it’s b"foo"
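
A small Python 3 illustration of the rules above (the sample string is arbitrary, and UTF-8 is assumed as the encoding):

```python
text = "café"                  # str: textual data
data = text.encode("utf-8")    # bytes: binary data
print(type(data).__name__, len(text), len(data))  # bytes 4 5

# decode goes the other way, from bytes back to str
round_trip = data.decode("utf-8")

# Decoding with the wrong codec is the classic source of UnicodeDecodeError
error_name = None
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__

# In Python 3 the methods only exist on the matching type
bytes_has_encode = hasattr(data, "encode")   # False
str_has_decode = hasattr(text, "decode")     # False
```

The length difference (4 characters, 5 bytes) comes from é taking two bytes in UTF-8, which is exactly why the ASCII decode fails.
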