Skip to content

project1 multi

Bae jiun, Maybe edited this page Oct 6, 2017 · 2 revisions

This page for documentation Project1: Signal::multi

Review

There's three versions of my implementation.

Features

How it works?

First of all, main idea is aho-corasick algorithm. But my implementation is little bit different. Check from below. And process using multi thread with new thread pool.

Aho-corasick Algorithm

It's duplicated in Project1::master Wiki

This for find matching patterns in various patterns from query. There is KMP Algorithm for find matching single pattern from query. But it takes O(m + zn) in set of patterns (m as total length of patterns, z as count of patterns and n as size of query). Aho-corasick can solve this problem in O(m + n + k) (k as number of patterns in query).

There is three major parts in Aho-corasick Algorithm.

  • Construct Trie for patterns.
    • Trace trie nodes and create node if there are no matches.
  • Make Failure Link and Output Link
    • Failure Link for prevent looking back from the beginning.
    • Output Link ensure that works correctly even when sub string exists.
  • Find matching
    • Follow the trie nodes, find the pattern if there is an output link.

My Implementation

Redesign for performance.

Design Principles: Reusable, readability and meaningful code, using high level features. (e.g. auto, lambda)

  • Redesign well organized structure
    • Aho-corasick automata
    • Define Final, Normal and Init states
    • Tracer for trace requested query from map (It's like a awesome iterator)
    • Change Trie structure to 2d array (maximum state size × char size(a-z))
    • Operator for solve assignment
  • Design efficient add / deletion algorithm
  • Multi thread Implementation
    • new Thread pool
    • High compatibility with lambda with support C++14 standard
    • Resolve race condition
    • Minimize shared memory
    • Implement thread safe queue

Process

  1. Make 2d array for Aho-corasick accept automata.
    1. 0 is Init State, under 0 is Final State, over 0 is Normal State (access like this)
    2. From Init State, trace char of pattern and make node (see Map::_insert)
  2. Process add / deletion.
    1. There is O(n) addition / deletion. (much faster than master implementation)
    2. Processing addition / deletion causes automata instability.
    3. In addition, next return free state for new node and does not affect anything else, so stable.
    4. In deletion, pop erase state only dependent on specific pattern.
  3. Thread pool separate tasks to thread
    1. Divide as the number of worker in pool
    2. Each worker find match for input query(each work has different query)
    3. Worker return found match to own queue
    4. Merge all of queues after pool is empty
  4. Find match.
    1. Map::begin return new Tracer accept input query.
    2. Find the Final State by increasing the Tracer until Out State

Classes

Map as Aho-corasick Automata

Map::Tracer as iterator for automata

Operator as problem solver

Thread::Pool

Thread::Safe::queue

Waiting contributes!

​you can comments and make merge requests to fix my codes. 😄