Skip to content
This repository has been archived by the owner on May 17, 2018. It is now read-only.

Latest commit

 

History

History
435 lines (321 loc) · 12.6 KB

README.md

File metadata and controls

435 lines (321 loc) · 12.6 KB

This repository is a fork of a wider project called matchID.

:squirrel: about matchID :squirrel:

matchID aims at helping developers (and organizations) to match people's identities, a kind of dedupe library but focused mainly on people.

Two main applications :

  • link a database filled with people informations to another database also filled with people informations
  • remove duplicate people from a database

matchID is developed by @rhanka and @SuperKiwi who is part of the "Entrepreneur d'Intérêt Général" 2017 program, a French presidential program aiming to bring tech people into working for an administration for a 10 month period — inspired by Obama's Presidential Innovation Fellows (PIF).

To install matchID, please follow guidelines of matchID-backend and matchID-frontend.

matchID validation

The goal of this repository (the validation sub-project) is to provide an user-friendly interface to display matching results.

It allows users to easily visualize matchs between people.

Main used technologies are VueJs and ElasticSearch.

An elasticsearch instance running is mandatory to make this application work

Screenshot Full



Contents



Installation


Requirements

  • a recent version of node
  • a recent version of npm or yarn
  • a recent version (> 5.x) version of elasticsearch

Elasticsearch

To install and configure elasticsearch on your server or localhost, please follow official guidelines

Note sure about the minimum required version of elasticsearch which is needed. We used 5.x versions to develop matchID


Repository

git clone https://github.com/eig-2017/matchID-validation.git
cd matchID-validation
yarn -OR- npm install

cp -r matchIdConfig.example matchIdConfig

And once, your configuration has been set, use yarn run dev (or npm run dev)



Quick tour


Navbar

Screenshot Navbar

There are four different parts in the navbar:

  • Change the language of your app

  • / Green check or Red cross if connection to Elasticsearch is working

  • Keyboard shortcuts cheatsheet

  • Statistics on processed matchs


Table Controller

There are also four parts in the controller :

  • Query we are sending to Elasticsearch

  • Range filter if you have a score column on your datasets

  • Filter through results from Elasticsearch with text filtering

  • Filter through results from Elasticsearch depending on the fact that a match has been already processed or not


Data Table

Please note that all data displayed here is dummy randomized data

The data table lists all different matchs found by matchID backend. Except the two columns Results and Status, all other columns can be customized.

Generally speaking, many things can be customized to your needs in matchID validation

  • Results column are pre-computed results according to the score (displayed in the previous column). In this example:
    • if the score is ajsonbove 55, the first checkbox (validation_decision) will be set to true
    • if the score is between 40 and 65, the question mark checkbox (validation_indecision) will be set to true
  • Status column is checked to true once you consider that the results (validation_decision and validation_indecision) are correct


Configuration


Elasticsearch

Your data mapping should look like this (names and types are fully customizable) :

{
  "properties": {
    "hashed_hexadecimal": {
      "type": "text",
      "store": true    
    },
    "last_name_1": {
      "type": "text",
      "store": true
    },
    "last_name_2": {
      "type": "text",
      "store": true    
    },
    "first_name_1": {
      "type": "text",
      "store": true
    },
    "first_name_2": {
      "type": "text",
      "store": true    
    },  
    "date_of_birth_1": {
      "type": "text",
      "store": true
    },
    "date_of_birth_2": {
      "type": "text",
      "store": true    
    },
    "...": {
      "...": "..."
    },
    "...": {
      "...": "..."
    },
    "distance_between_birth_cities": {
      "type": "long",
      "store": true
    },
    "score": {
      "type": "long",
      "store": true
    },
    "...": {
      "...": "..."
    },
    "...": {
      "...": "..."
    },
    "validation_decision": {
      "type": "text",
      "fielddata": true
    },
    "validation_done": {
      "type": "long",
      "store": true
    }
  }
}

A few notes :

  • Concerning validation_decision, validation_done and optional validation_indecision, you can either keep field as a text with "fielddata": true or just set field as long (more in Validation)
  • hashed_hexadecimal will be explain in Random Hash

A row example for the previous set up would be :

hashed_hexadecimal last_name_1 last_name_2 first_name_1 first_name_2 date_of_birth_1 date_of_birth_2 score valdation_decision validation_done
a2f34aeb3d777f82 Grenier Grenier Martin Jorge Robert Martin Georges 04-03-1989 03-04-1989 68 null null

Columns

If we keep the above example, your columns.json should look like this :

[
  {
    "field": "hashed_hexadecimal",
    "label": "Hashed Id",
    "display": false,
    "searchable": true
  },
  {
    "field": ["last_name_1", "last_name_2"],
    "label": "Last Name",
    "display": true,
    "searchable": true,
    "callBack": "coloredDiff"
  },
  {
    "field": ["first_name_1", "first_name_2"],
    "label": "First Name",
    "display": true,
    "searchable": true,
    "callBack": "coloredDiff"
  },
  {
    "field": ["date_of_birth_1", "date_of_birth_2"],
    "label": "Date of Birth",
    "display": true,
    "searchable": true,
    "callBack": "formatDate",
    "appliedClass": {
      "head": "head-centered",
      "body": "has-text-centered"
    }
  },
  {
    "field": "distance_cities_of_birth",
    "label": "Distance",
    "display": true,
    "searchable": false,
    "callBack": "formatDistance",
    "appliedClass": {
      "head": "head-centered",
      "body": "has-text-centered"
    }
  },
  {
    "field": "score",
    "label": "Score",
    "display": true,
    "searchable": true,
    "type": "score",
    "appliedClass": {
      "head": "head-centered",
      "body": "has-text-centered min-column-width-100"
    }
  }
]

Mandatory sub-fields : field, label, display, searchable.

Some rules :

  • every displayed column (means it has a column in data table) can be searchable or not.
  • you can (or not) add a callBack function to every displayed column (see custom functions)
  • you can (or not) add classes (that will be applied to head's cell or body's cell) - you will define your custom classes in custom.scss (see here)
  • every scale of numbers that you want to display using a progress bar needs to have the following attribute : "type": "score"

Custom functions

You can (and you should) set up custom functions in formatCell.js. Those functions will be used as callbacks (as defined above in Columns) for every cell in data table.

Do not forget to declare your functions inside the export default declaration.


Custom style

You can add your custom style to columns of cells by declaring your classes inside custom.scss.

Just apply them inside appliedClass in columns.json config file.

Bulma is the CSS framework used


Validation

You can decide to remove validation by setting "display": false in validation.json.

{
  "display": true,
  "action": {
    "label": "Results",
    "indecision_display": true
  },
  "done": {
    "label": "Status"
  }
}

If you enable validation, you need to define labels for both action and done fields.

If you set "indecision_display" to true, a question-mark checkbox will allow you to describe indecision concerning the match.


Scores

Let's check what's inside scores.json :

{
  "column": "score",
  "range": [0, 100],
  "colors": {
    "success": 80,
    "info": 60,
    "warning": 30,
    "danger": 0
  },
  "statisticsInterval" : 10,
  "preComputed": {
    "decision": 55,
    "indecision": [40, 65]
  }
}

In this example, we have score that range from 0 to 100.

We set up colors for different range (in this example, the color of progress bar will be success if score is above 80, info if score is between 60 and 80 and so on).

The preComputed field allows matchID-validation to prefill decision and indecision columns according to scores. In this example, if score is above 55, validation_decision will be set to true ; and if score is between 40 and 65, validation_indecision will be set to true.


Json View

Let's check what's inside view.json :

{
  "display": true,
  "column_name": "View",
  "fields": {
    "operation": "excluded",
    "names": ["distance_cities_of_birth"]
  }
}

If you set display to true, it will add a column on the left of your table with a call-to-action. When clicking it, it will open a pop-up with all informations concerning the row.

Good to know :

  • you can set operation field to excluded or included
  • all columns listed in names fields will be excluded/included to Json View (of course, names can be an empty array [])

Random Hash

Let's check what's inside randomId.json :

{
  "characters": "abcdef0123456789",
  "length": 2,
  "prefix": "*",
  "suffix": "*",
  "default_search_field": "hashed_hexadecimal"
}

The default_search_field will be the default field/used to query elasticsearch.

On this field, we will search for a random 2-length word (composed of following characters) with * as prefix and suffix. Example : *a4*


Localization

You can add several languages by customizing lang.json file.



Keyboard Shortcuts

Ctrl+ Alt will enable/disable the shortcuts.

Once enabled, the following ones are available :

  • a will change the pre-computed decision (validation_decision)
  • e will change the pre-computed indecision (validation_indecision)
  • i will randomly reload the data
  • d will show/hide a json view of row's data
  • Arrow UP and Arrow DOWN allow you to move between rows