Create charts of HackerNews information: URLs submitted by a user, and users submitting a URL.

See these two images.

The first image: Bob enters the username of someone on HackerNews. Retrieve the last 100 submissions for that user from HackerNews. Create a chart. The X axis shows each distinct URL that user has posted. The Y axis shows how many times the user has submitted each of those URLs. Display this chart.

EG: "Submissions for USERNAME"

example.com, 3 submissions; example.org, 2 submissions; example.co.uk, 27 submissions.
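
The counting step described above could look roughly like this. It is only a sketch: the `domain` and `username` field names are borrowed from the solution script further down, while `count_by_field` and the sample items are made up for illustration and are not real API output.

```python
from collections import Counter


def count_by_field(items, field):
    """Count how often each non-empty value of `field` occurs in `items`."""
    return Counter(item[field] for item in items if item.get(field))


# Hypothetical sample data; real items would come from the HN API.
sample = [
    {"username": "bob", "domain": "example.com"},
    {"username": "bob", "domain": "example.org"},
    {"username": "bob", "domain": "example.com"},
    {"username": "bob", "domain": None},  # e.g. a text post with no URL
    {"username": "bob", "domain": "example.com"},
    {"username": "bob", "domain": "example.org"},
]

counts = count_by_field(sample, "domain")
for domain, n in counts.most_common():
    print("%s, %d submissions" % (domain, n))
```

The same helper works for the second chart by passing `"username"` as the field.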

The second image: Bob enters a URL (with nothing after the TLD). Retrieve the last 100 submissions to HackerNews of that URL. Create a chart. The X axis will have each username that has submitted that URL. The Y axis will have the number of times each of those users has submitted that URL. Display this chart.

EG: "Submissions of Example.com"

Alice, 1; Bob, 8; Cas, 2; Dav, 28.
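
If Bob pastes a full link rather than a bare domain, it could be trimmed to "nothing after the TLD" before querying. A minimal sketch, assuming standard-library URL parsing is good enough; `bare_domain` is a hypothetical helper, and dropping a leading `www.` is my own assumption, not something the bounty asks for.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def bare_domain(url):
    """Return the lowercased host of `url`, without scheme, port, path or 'www.'."""
    if "://" not in url:
        # urlparse only recognises the host when a scheme is present.
        url = "http://" + url
    host = urlparse(url).hostname or ""
    if host.startswith("www."):  # assumption: treat www.example.com as example.com
        host = host[4:]
    return host


print(bare_domain("https://www.example.org:8080/a/b"))
```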

Any solution must use the HN API, and must "behave nicely" toward the HN servers.

http://www.hnsearch.com/api
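
One way to read "behave nicely" is to space requests out. The sketch below is an illustration only: the bounty gives no actual rate limit, so the one-second interval is an assumption, and `Throttle` is a hypothetical helper rather than part of any HN API client.

```python
import time


class Throttle(object):
    """Enforce a minimum delay between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.time, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Sleep just long enough that calls are at least min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(1.0)  # assumption: at most one request per second
# Call throttle.wait() immediately before each HTTP request.
```

A script that fetches everything in a single request with a limit of 100 arguably already satisfies the requirement; a throttle only matters once several queries are made in a row.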

Any language? Would Python + matplotlib be OK?
hashme33 over 1 year ago
Yes! Any language is fine.
danbc over 1 year ago
Is it OK to edit the script to change the query, or should it take console parameters?
hashme33 over 1 year ago
Editing the script is fine. Uh, just have a nice clear comment near whatever needs to be edited.
danbc over 1 year ago
awarded to Wikimedia via hashme33


1 Solution

Winning solution

Example output: https://dl.dropbox.com/u/10235407/scr/out.png
The script itself: https://dl.dropbox.com/u/10235407/scr/hnsearch.py
Do you know how to install Python 2.7 + Matplotlib?

#!/usr/bin/env python
# -*- coding: utf-8 -*-


# --- Settings: edit the values below to change the query and the chart ---
BOTTOM_ADJUST = 0.2  # bottom margin; make this bigger if long usernames run off the border
BAR_WIDTH = 0.5  # width of each bar
FONTSIZE = 8  # font size of the x-axis labels
SPACING = 1.1  # distance between bar positions, as a multiple of BAR_WIDTH
QUERY = "DanBC"  # the username or domain to query
IS_DOMAIN = False  # True if QUERY is a domain, False if it is a username
LIMIT = 100  # maximum number of submissions to fetch
DPI = 120  # make this bigger if the image quality is bad
OUT_FILE = "out.png"  # where to write the chart
FIGURE_WIDTH = 4*BAR_WIDTH  # multiplier on the default figure width; increase if there are a lot of usernames


import json
import urllib
import urllib2
from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np

def get_value_stats(query, limit, is_domain):
    url = "http://api.thriftdb.com/api.hnsearch.com/items/_search"
    field = 'domain' if is_domain else 'username'
    result_field = 'domain' if not is_domain else 'username'
    dct = {'filter[fields][%s]'%field:query,
        'limit':limit,
        'filter[fields][type]':'submission',
        'sortby':'create_ts desc'}
    payload = urllib.urlencode(dct)
    data = urllib2.urlopen(url+"?"+payload).read()
    data = json.loads(data)
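    # Tally how often each domain (or username) appears in the results.
    counter = defaultdict(int)
    data = data['results']
    for d in data:
        value = d['item'][result_field]
        if value is not None:  # skip items that have no domain/username
            counter[value] += 1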
    return counter

stats = get_value_stats(QUERY,LIMIT, IS_DOMAIN)


def build_plot(data, out_file):
    N = len(data)
    ind = np.arange(N)*BAR_WIDTH*SPACING  # the x locations for the groups
    dt = [d[1] for d in data]
    fig = plt.figure()
    fig.subplots_adjust(bottom=BOTTOM_ADJUST)
    fig.set_figwidth(fig.get_figwidth()*FIGURE_WIDTH)
    #fig.set_figwidth(FIGURE_WIDTH*N)
    ax = fig.add_subplot(111)
    ax.set_ylabel("Number of submissions")
    if IS_DOMAIN:
        ax.set_title('Posts by username for %s'%QUERY)
    else:
        ax.set_title('Posts by domain for %s'%QUERY)
    ya = ax.get_yaxis()
    ya.set_major_locator(plt.MaxNLocator(integer=True))
    ax.bar(ind, dt, BAR_WIDTH, color='r')
    ax.set_xticks(ind+BAR_WIDTH/2)
    ax.set_xticklabels( [d[0] for d in data], rotation='vertical')
    for tick in ax.xaxis.get_major_ticks():
        tick.label1.set_fontsize(FONTSIZE)
    ax.axis('tight')
    #plt.show()
    plt.savefig(out_file, dpi=DPI)

build_plot(stats.items(), OUT_FILE)
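
For anyone who would rather pass the query on the command line than edit the constants at the top, something along these lines might work. It is a sketch, not part of the submitted script: `parse_args` and the flag names `--domain`, `--limit` and `--out` are hypothetical, with defaults copied from `LIMIT` and `OUT_FILE` above.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options mirroring QUERY, IS_DOMAIN, LIMIT and OUT_FILE."""
    parser = argparse.ArgumentParser(
        description="Chart HN submissions for a username or a domain.")
    parser.add_argument("query",
                        help="username, or domain when --domain is given")
    parser.add_argument("--domain", action="store_true",
                        help="treat the query as a domain instead of a username")
    parser.add_argument("--limit", type=int, default=100,
                        help="maximum number of submissions to fetch")
    parser.add_argument("--out", default="out.png",
                        help="file to write the chart to")
    return parser.parse_args(argv)


# Passing an explicit list makes the example independent of sys.argv.
args = parse_args(["DanBC", "--limit", "50"])
print(args.query, args.domain, args.limit, args.out)
```

The parsed values could then replace the constants, e.g. `get_value_stats(args.query, args.limit, args.domain)` and `build_plot(stats.items(), args.out)`. Note that `build_plot` reads `IS_DOMAIN` and `QUERY` for the title, so those would need to be passed in as well.
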
@hashme33 Cool solution! Could you post the script here as well? Makes it easier to audit the solution history if there are multiple submissions (I'll make that more clear in the guidelines). Thanks.
bevan over 1 year ago
@bevan, did you mean embedding script code into the comment?
hashme33 over 1 year ago
@hashme33 Thanks for updating the solution, that's what I meant. Cheers.
bevan over 1 year ago
Thank you for this! Feel free to submit it to HN as a "Show HN" if you want to.
danbc over 1 year ago