What I Learned Today

No frills, just learn

Posts in the Science category

A wealth of information in Vapnik's masterpiece

An admissible structure is one satisfying the following three properties.

1) The set S* is everywhere dense in S
2) The VC-dimension of each set of functions contained in S is finite.
3) Any element of the structure contains totally bounded functions

In the first part of the file tutorial2.py in the Tensorflow repository on my github, you will find a commented version of the MNIST tutorial for experts in Tensorflow.

Tensorflow is an open-source library with APIs for several languages, which act as frontends to a quite flexible machine learning backend.

What surprised me is the result of this first small exercise. You take the pixels of the MNIST dataset for character recognition training and flatten them. Each packet of 784 pixels serves as the input (the abscissa) of your linear regression, and the digits from 0 to 9 (actually a vector of 10 components, one per digit) as the output (the ordinate), that is, the possible values contained in the pixel packet.

A linear model fitting the possible outcomes, of first order in the pixels and with no hidden layer (the tutorial uses a softmax regression rather than a plain ordinary least squares), reaches an accuracy around 92%.
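As a sketch of what "first order" means here, and assuming the softmax regression of the Tensorflow tutorial (the symbols W and b for weights and biases are my notation), the whole model is:

```latex
y = \mathrm{softmax}(W x + b), \qquad
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=0}^{9} e^{z_j}}, \qquad
x \in \mathbb{R}^{784},\quad W \in \mathbb{R}^{10 \times 784},\quad b \in \mathbb{R}^{10}
```

The predicted digit is the index of the largest component of y. The only nonlinearity is the final normalisation, so the decision boundaries between digits are hyperplanes in pixel space.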

True, it is terrible and embarrassing (as Google points out). After all, there are only 10 options, and missing 1 in 10 is not a great result, but think about the feat here: image recognition with a linear regression!

I also made a live youtube video about it (in Italian), which received a decent following (about 100 views).

Now for the dubious part: might this condition be what makes neural networks possible and feasible?

Now I'm trying to go a step further with nuclear mass models, and while image recognition is all fine and dandy, even a simple mass formula has several fractional exponents and a polynomial expansion to take into account. Fitting a non-local density functional is something orders of magnitude more complicated!

In fact, one of the limits that I'm noticing with Tensorflow is the choice of optimization algorithms, mostly (all?) based on derivatives (no Nelder-Mead, sadly), thus suited for well-behaved systems but less so for the complex ridges of energy landscapes.

Another issue is the limited optimization for multicore and massively multicore systems. I still have to manage to compile it on the cluster. On my dual-socket workstation with 20 cores and HT, Tensorflow insists on using at most (and rarely) 4-5 cores...


In a two-day coding challenge over the Workers' Day holiday, I decided to play with maps as a means to plot information. What I decided to use are the widely available shapefiles and the python matplotlib package, using mainly pandas and numpy to analyse the data.

You can find everything in this git repository.

First of all, you can start familiarising yourself with shapefiles by plotting the pure patches, as in the function mP in mapPlot.py.

You need these packages:

import shapefile
from matplotlib.patches import Polygon
from matplotlib.collections import PatchCollection
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

and you start by reading the file flnm in this way:

sf = shapefile.Reader(flnm)
shapes = sf.shapes()
Nshp = len(shapes)

sf records all the properties of your map, in particular the shapes attribute (the records attribute contains the dataset, which varies from file to file), and Nshp is the number of shapes in the shapefile. You then just need to cycle over the patches to build a Polygon array in matplotlib,

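ptchs   = []
for nshp in range(Nshp):
    pts     = np.array(shapes[nshp].points)
    prt     = shapes[nshp].parts
    # append the total number of points, so that par[i]:par[i+1] delimits each part
    par     = list(prt) + [pts.shape[0]]

    for pij in range(len(prt)):
        # one matplotlib Polygon per part (assumed loop body; the full function is mP in mapPlot.py)
        ptchs.append(Polygon(pts[par[pij]:par[pij+1]]))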

that can be finally plotted in the usual matplotlib way,

fig     = plt.figure()
ax      = fig.add_subplot(111)

ax.add_collection(PatchCollection(ptchs, facecolor='0.75', edgecolor='w', linewidths=.2))
ax.axis('auto'); ax.axis('off')

to get the final map.

Map of England divided by postcode sectors. From https://borders.ukdataservice.ac.uk/
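The role of the parts attribute is easier to see on a toy example. This is a minimal sketch with made-up coordinates and a hypothetical helper split_parts (not a function of the repository): parts holds the index of the first point of each ring, so consecutive entries delimit one polygon each.

```python
def split_parts(points, parts):
    """Split a flat list of points into rings, given the start index of each ring."""
    # Append the total number of points so that every ring has an end index.
    bounds = list(parts) + [len(points)]
    return [points[bounds[i]:bounds[i + 1]] for i in range(len(parts))]


# Two closed triangles stored one after the other, as in a multi-part shape.
points = [(0, 0), (1, 0), (1, 1), (0, 0), (5, 5), (6, 5), (6, 6), (5, 5)]
parts = [0, 4]

rings = split_parts(points, parts)
for ring in rings:
    print(ring)
```

Each ring can then be turned into one patch, which is why islands and detached areas of the same shape end up as separate polygons.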

However, just juxtaposing patches is not ideal. Libraries like basemap have comprehensive and sophisticated routines to correct for projection and geodetic distortions, and have more interesting options regarding plotting scales, legends, etc.

One has to import a little bit more stuff,

import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
from shapely.geometry import Point, Polygon, MultiPoint, MultiPolygon
from matplotlib.collections import PatchCollection
from descartes import PolygonPatch
import fiona

but the generalization and flexibility gained is worth it.

In mP_Basemap, open the file using fiona,

shp = fiona.open(flnm)
bds = shp.bounds

fiona lets you extract the bounds of the map simply with the .bounds attribute; the bounds can then be used as the latitude and longitude extremes in the basemap plotting routine

extra = 0.01  # relative margin around the map (assumed value)
m = Basemap(
    ellps = 'WGS84',
    llcrnrlon=bds[0] - extra * (bds[2]-bds[0]),
    llcrnrlat=bds[1] - extra + 0.01 * (bds[3]-bds[1]),
    urcrnrlon=bds[2] + extra * (bds[2]-bds[0]),
    urcrnrlat=bds[3] + extra + 0.01 * (bds[3]-bds[1]),
    # further keyword arguments (projection, resolution, ...) as in mP_Basemap on github
)

WGS84 is the chosen reference ellipsoid; have a look at the README.md and at the script inside the Map/ directory for further information on how to convert coordinate systems.

After this you can build a dataframe containing not only the shapes, but also the data of the shapefile, for example the ward name or other properties, in case the map contains those data,

# set up a map dataframe
# (m.map and m.map_info are available after m.readshapefile(..., 'map'))
df_map = pd.DataFrame({
    'poly': [Polygon(xy) for xy in m.map],
    'properties': [ward['name'] for ward in m.map_info]})
df_map['area_m'] = df_map['poly'].map(lambda x: x.area)
df_map['area_km'] = df_map['area_m'] / 1.0e6  # 1 km^2 = 10^6 m^2

# draw ward patches from polygons
df_map['patches'] = df_map['poly'].map(lambda x: PolygonPatch(
    x, ec='#787878', lw=.75, alpha=.9))

To plot, use df_map['patches'], which contains the array of patches.

You can use the ratio between the bounds to get a correct figure scaling, and the ellps argument guarantees the correct coordinate representation.

Have a look at the function files on github for the complete functions.

Having obtained the dataframe, you can start using it. You can print m.map_info to find out what type of labels are inside the map properties. More in general, I import the dictionary keys of map_info into a dataframe in this fashion
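A minimal sketch of the scaling idea, assuming bounds in the (minx, miny, maxx, maxy) order returned by fiona and coordinates where x and y share the same unit. The helper names padded_bounds and figure_size are hypothetical, and the symmetric margin is a simplification of the corner formulas used in the Basemap call.

```python
def padded_bounds(bounds, extra=0.01):
    """Enlarge (minx, miny, maxx, maxy) by a fraction `extra` of the width and height."""
    minx, miny, maxx, maxy = bounds
    w, h = maxx - minx, maxy - miny
    return (minx - extra * w, miny - extra * h, maxx + extra * w, maxy + extra * h)


def figure_size(bounds, width=8.0):
    """Figure size (in inches) with the same height/width ratio as the map bounds."""
    minx, miny, maxx, maxy = bounds
    return (width, width * (maxy - miny) / (maxx - minx))


bounds = (0.0, 0.0, 10.0, 20.0)
print(padded_bounds(bounds, extra=0.1))
print(figure_size(bounds))
```

The tuple returned by figure_size can be passed as figsize to plt.figure, so that the patches are not stretched.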

# one row per shape, one column per key of the map_info dictionaries
temp_df = pd.DataFrame(m.map_info)

and with colnames = list(df.columns.values) you can eventually get a full list of column names to process and automate your shapefile analysis.

The map that I use has the postcode sectors of the whole of England in the column 'label'. The following code joins the DataFrame of this map with an imported DataFrame of data, which has postcode sectors in the column 'Sector', keeping only the data that appears in both

i1 = temp_df.set_index('label').index
i2 = df.set_index('Sector').index

# keep only the rows whose postcode sector also appears in the imported data
temp_df = temp_df[i1.isin(i2)]

# select only the part of the map that corresponds to the imported DataFrame
df_map = pd.concat([df_map, temp_df], axis=1, join='inner')

This new map covers only the part of the map that is also in the imported dataset.

Using the Jenks natural breaks algorithm to color the map, and using the dataset of mortgages selected by postcode over the Greater London area, the result is the following:

Map of Greater London divided by postcode sectors. Colored using the Jenks algorithm according to the quantity of mortgages in each postcode in Q3 2016.
Map from https://borders.ukdataservice.ac.uk/
Data from CML
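For readers who have never met Jenks natural breaks: the idea is to choose class boundaries that minimise the variance inside each class. Below is a small pure-Python sketch using exact dynamic programming (often called Fisher-Jenks). It is not necessarily the implementation used for the figure, the function name jenks_breaks is my own, and it is only meant for short lists since it scales as the square of the number of values.

```python
def jenks_breaks(values, n_classes):
    """Upper bound of each class, minimising the within-class squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give the squared deviation of any slice in constant time.
    s1, s2 = [0.0], [0.0]
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        """Sum of squared deviations from the mean of data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    split = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    split[k][j] = i

    # Walk back through the stored split points to recover the class upper bounds.
    breaks = []
    j = n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = split[k][j]
    return breaks[::-1]


print(jenks_breaks([1, 2, 3, 10, 11, 12, 50, 51], 3))
```

Each value is then colored according to the first break that is greater than or equal to it.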

The data has been taken from CML in this case; the .xlsx spreadsheet has been analysed with openpyxl in order to extract a DataFrame containing only the rows that include a name (e.g. London or Manchester), effectively isolating a geographic area together with the intersection above.

The spreadsheet contains mortgage totals for every quarter between 2012 and 2016 Q3. One can use this data to synthesise an index indicating the market health.

With only 16 datapoints in the time series, any complicated time series analysis (e.g. any type of autoregression or machine learning approach based on the time series) is a steep road. I started by considering the most elementary description of the market: an ordinary least squares (OLS) linear regression.

Many of the most stable results, for the best sectors and districts, were in fact approximating a growing market very well.

Naively, the best condition you can have is a market which is already stable and has a good capital to start with, and which then grows in a stable and sustainable fashion. An area deviating from this regime is riskier to invest in, so if the linear fit has an error on slope and intercept, these can simply be accounted for by subtraction in the index.

For this reason the first index I tried to build out of these data is simply given by the parameters of the OLS: slope, intercept, and the errors on slope and intercept.
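The four ingredients can be obtained with the textbook formulas for a straight-line fit. This is a minimal sketch with a hypothetical helper ols_with_errors and made-up numbers; it only computes the fit parameters and their standard errors, not the index itself.

```python
import math


def ols_with_errors(x, y):
    """Return slope, intercept and their standard errors for y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 points and lists of equal length")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = ssr / (n - 2)
    se_slope = math.sqrt(s2 / sxx)
    se_intercept = math.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return slope, intercept, se_slope, se_intercept


# Made-up quarterly totals for one postcode sector (quarter index, mortgages).
quarters = [0, 1, 2, 3]
totals = [0.0, 1.0, 1.0, 2.0]
print(ols_with_errors(quarters, totals))
```

A sector growing steadily gives a positive slope with small errors, while an erratic one gives large errors even when the slope is the same.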

The index follows the OLS,

where indicates the average value of the slope across the dataset.

This model is not free of defects and deficiencies, but it is a good first approximation and a first attempt on this dataset, yielding sensible results over a vast geographical area.

The final result (with a primitive scale and the correct aspect ratio enabled) for the London area is in the following figure.

Map of Greater London divided by postcode sectors. Colored using the Jenks algorithm according to the index between Q1 2012 and Q3 2016. Map from https://borders.ukdataservice.ac.uk/
Data from

Again, I remind you of the github repository for the complete and hopefully working code (I understand it needs refactoring and debugging; it is only a two-day challenge) and further details. Please write to me with any comment or insult.

Andrea Idini.

I read a lot of misconceptions this morning related to this article regarding Google Translate. It is not exactly fresh news, but this morning in my telegram group @scienza this other popularization article was posted, which completely misunderstood the premises of the original academic article (the so-called informed comments are not really so either), so I decided to try to set the record straight and offer a question.

In the article the approach is referred to as a multitask learning framework:

Our approach is related to the multitask learning framework [4]. Despite its promise, this framework
has seen limited practical success in real world applications. In speech recognition, there have been many
successful reports of modeling multiple languages using a single model (see [17] for an extensive reference and
references therein). Multilingual language processing has also shown to be successful in domains other than
translation [9, 23].

where a neural network (NN) is trained on several tasks simply by implementing extra tokens (in this case the languages) in the input and ground truth layers. The NN learns all the designated languages simultaneously, associating phrases with the ground truth and being scored (BLEU scores) as a whole network, and not separately by direct correlation of two languages. In a sense, it probes the off-diagonal degrees of freedom of the NN.
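To make the token trick concrete, here is a toy sketch. As far as I understand the paper, the artificial token (written like <2es>) is prepended to the source sentence to indicate the target language; the function names and the tiny corpus below are made up for illustration.

```python
def tag_source(sentence, target_lang):
    """Prepend an artificial token telling the model which language to produce."""
    return "<2{}> {}".format(target_lang, sentence)


def build_training_pairs(corpora):
    """Mix several language pairs into a single training set for one model.

    corpora maps (source_lang, target_lang) to a list of (source, target) sentences.
    """
    pairs = []
    for (_, target_lang), sentences in sorted(corpora.items()):
        for source, target in sentences:
            pairs.append((tag_source(source, target_lang), target))
    return pairs


# Hypothetical two-pair corpus: English to Spanish and English to Italian.
corpora = {
    ("en", "es"): [("Hello", "Hola")],
    ("en", "it"): [("Good morning", "Buongiorno")],
}
for source, target in build_training_pairs(corpora):
    print(source, "->", target)
```

Nothing else in the network has to know about languages: the same weights see all the pairs, which is why so little of the original system needs to be re-coded.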

What a lot of people then started fantasising a little bit too much about is the "interlingua" process. Someone even referred to it as a bytecode correspondence between languages, which would be unfeasible and would defeat the purpose of this NN: the aim was to re-code as little as possible of the original Translate algorithm (which already includes all the semantic and linguistic work of Google's researchers and engineers), and a bytecode would not be flexible and would require a complete overhaul of the code!

What Google researchers have seen in the NN is that the same phrase in different languages clusters together according to a specific representation: t-distributed stochastic neighbor embedding (t-SNE), a sophisticated projection which reduces dimensionality while trying to keep points that are neighbours in the original space close to each other (thus creating a low-dimensional space in which distances can be inspected).

I wonder: if I change the metric, would I be able to cluster the languages according to another metric?
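For reference, a sketch of the standard t-SNE definitions, with x_i the N high-dimensional points, y_i their low-dimensional images and sigma_i a per-point bandwidth (the notation is mine, not taken from the Google article):

```latex
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The positions y_i are found by minimising C. The distance entering p is Euclidean here, and that is exactly the ingredient one could think of replacing.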

CP Symmetry violation

Matter-antimatter symmetry violation might have found its first culprit, and the result has recently been published in Nature (here's the arXiv https://arxiv.org/abs/1609.05216 )

Together with the results of AMS (pasted below), it has been an interesting 2016 for big collaborations.

What can nuclear physics contribute to such big and fancy experiments and new quests for dark matter or CP symmetry? All these experiments use nuclei at one point or another of their experimental chain, all of them use the strong force and its low-energy extrapolation...

The First Five Years of the Alpha Magnetic Spectrometer on the International Space Station

Nuclear wave function in function of for the proton state, obtained by diagonalizing the self energy in the exact continuum used above. The first eigenvalue at -23.97 MeV (solid red line) is compared with the corresponding harmonic oscillator state multiplied by the square root of the spectroscopic factor (dashed red line); the second eigenvalue at -2.85 MeV (solid blue line) is similarly compared with the harmonic oscillator state (dashed blue line). The spectroscopic factors are 0.78 and 0.21 respectively.

IMHO a good picture to illustrate didactically the effect of many-body dynamics on nuclei.

I post it here since it did not make it to the final version of the Proceedings for the Pisa conference, but I think it illustrates nicely how, from the combination of several basis-state wavefunctions, we build many-body wavefunctions which have different properties (they are quenched, so possess spectroscopic factors, and have a completely different tail) from the harmonic oscillator starting point (dashed).

This is also why reaction dynamics come out very different when calculated with a complete set of many-body relations instead of a simple mean field picture.

IMHO it would also be difficult to reproduce the richness of these wavefunctions with a single Woods-Saxon, let alone a harmonic oscillator potential (you can notice that the decay of the is quite fat with respect to the exponential of a harmonic oscillator eigenfunction).

Non-local optical potentials are the way to go! 😉

(absolutely objective and no conflict of interest there), more info here: arXiv:1612.01478 [nucl-th], and soon-to-be publication.


Github returns

I am setting up and using the public github repository again after many years. I decided to start releasing some old projects publicly, since it seems apparent I will never have time to write any more papers, nor to polish the code.

Eventually, at least for these little "didactic" subprojects, I will start making little explanatory videos or blog posts instead of articles, just to save time and make the process more fluid.

So, with the philosophy of releasing as a process, not as a product, here is the first project, on Pairing Vibration RPA:


Enjoy and let me know.

The Bayesian concept of probability is linked not with frequency, but with the state of information. The processing of this information proceeds in a propositional sense. In fact there is semantic work related to the use of Bayesian statistics as a foundation for programming languages (BLOG), first-order logic (MEBN, also look at the fun Of Starships and Klingons) and thus the founding principles of network-centric warfare and commerce.

Properties of the Bayesian calculation of probability are sometimes difficult to grasp and very counterintuitive; interesting considerations can hide behind apparently trivial formulas, which can help to answer questions like: if you know your neighbour has a son born on a Tuesday, what is the probability that your neighbour has two sons? (more…)
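The puzzle can be checked by brute-force enumeration. This sketch uses the common reading of the question, which adds assumptions not stated above: the neighbour has exactly two children, sex and weekday of birth are independent and uniform, and the information given is "at least one child is a boy born on a Tuesday".

```python
from fractions import Fraction
from itertools import product


def prob_two_boys_given_tuesday_boy():
    """P(two boys | at least one boy born on a Tuesday) for a two-child family."""
    tuesday = 1  # weekdays are labelled 0..6; which label is Tuesday is irrelevant
    # 14 equally likely (sex, weekday) combinations per child, 196 per family.
    children = list(product(("boy", "girl"), range(7)))
    families = list(product(children, repeat=2))
    informed = [f for f in families if ("boy", tuesday) in f]
    both_boys = [f for f in informed if all(sex == "boy" for sex, _ in f)]
    return Fraction(len(both_boys), len(informed))


print(prob_two_boys_given_tuesday_boy())
```

Under these assumptions the answer is 13/27, neither the 1/3 of the plain "at least one boy" puzzle nor the intuitive 1/2: the apparently irrelevant weekday changes the state of information.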

Value at Risk on expected return

Value at Risk on expected return is the lower bound for the loss in an investment with respect to a given probability.
It is a way of estimating loss probabilities in terms of value, instead of standard deviation. E.g. in the case of , means that the investor has a probability of losing € or more, investing € in the portfolio .

Considering the system ergodic, and thus all the variances of the securities summing up coherently to form a variance of the portfolio distributed according to the normal distribution, this is simply calculated considering the quantile (the value of the distribution corresponding to a certain fraction of probability).

(!) The of a portfolio is not a weighted average of the of its component, because but is given by the covariance matrix due to the correlation of assets .
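A minimal numerical sketch of both statements, under the normal-distribution assumption made above. The symbols are my own choice for illustration: alpha is the probability level, mu and sigma the expected return and standard deviation of the portfolio over the horizon, and all the numbers are made up.

```python
import math
from statistics import NormalDist


def portfolio_sigma(weights, cov):
    """Standard deviation of the portfolio return, sqrt(w^T C w)."""
    n = len(weights)
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    return math.sqrt(variance)


def parametric_var(value, mu, sigma, alpha=0.05):
    """Loss exceeded with probability alpha, for normally distributed returns."""
    z = NormalDist().inv_cdf(alpha)  # quantile of the standard normal, negative for alpha < 0.5
    return -(mu + z * sigma) * value


# Two assets with 20% volatility each, equally weighted.
weights = [0.5, 0.5]
uncorrelated = [[0.04, 0.0], [0.0, 0.04]]
fully_correlated = [[0.04, 0.04], [0.04, 0.04]]

print(portfolio_sigma(weights, uncorrelated))      # below the weighted average of 0.2
print(portfolio_sigma(weights, fully_correlated))  # equal to the weighted average
print(parametric_var(1000.0, 0.0, portfolio_sigma(weights, uncorrelated)))
```

Only for fully correlated assets does the weighted average of the single volatilities give the right answer; in every other case the off-diagonal elements of the covariance matrix lower the risk.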