Update README.md for v1.0.0 #1100

Merged Dec 9, 2024 (33 commits)

Changes shown from 6 commits.

Commits
cfad81f  Fix wrong note in README.md (bo3z, Oct 29, 2024)
d422659  Merge branch 'main' into update-readme (jmitrevs, Nov 5, 2024)
fabcf8c  update the project status (jmitrevs, Nov 5, 2024)
b844acf  restructure of existing documentation (jmitrevs, Nov 5, 2024)
88e84f3  add an internal layers section, and auto precision (jmitrevs, Nov 5, 2024)
6abc8ad  pre-commit fixes (jmitrevs, Nov 5, 2024)
7570c11  Merge remote-tracking branch 'upstream/main' into update-docs (vloncar, Nov 26, 2024)
09bbefb  Typo fixes (vloncar, Nov 26, 2024)
42cb368  Add video tutorial link (bo3z, Dec 3, 2024)
26f4eb2  Merge branch 'main' into update-readme (jmitrevs, Dec 4, 2024)
fedf790  respond to some review comments and update some descriptions (jmitrevs, Dec 4, 2024)
f28f364  fix documentation of channels_last conversion for pytorch (JanFSchulte, Dec 5, 2024)
e55b29c  slightly expand discussion of channels_last in pytorch (JanFSchulte, Dec 5, 2024)
99e3be0  update requirements (jmduarte, Dec 5, 2024)
96b530f  add pointwise documentation (jmduarte, Dec 5, 2024)
a7b6f79  update pointwise description (jmduarte, Dec 5, 2024)
135eaa2  Merge remote-tracking branch 'upstream/main' into update-readme (vloncar, Dec 6, 2024)
6af7fef  Add FAQ to docs and readme (vloncar, Dec 6, 2024)
eac61dd  Nicer link to the tutorial (vloncar, Dec 6, 2024)
c65e915  add doc strings to pytorch-specific padding calculation functions (JanFSchulte, Dec 6, 2024)
7cf4134  Merge branch 'update-readme' of https://github.com/fastmachinelearnin… (JanFSchulte, Dec 6, 2024)
4fc1ea9  clarify default for channels last conversion in pytorch (JanFSchulte, Dec 6, 2024)
548c462  Restructure documentation (vloncar, Dec 6, 2024)
4da52a4  bump version to 1.0.0 (jmduarte, Dec 6, 2024)
6959c71  remove obsolete file references (jmitrevs, Dec 6, 2024)
47d7435  add a touch of text on the backends (jmitrevs, Dec 6, 2024)
05f8a45  expand pytorch frontend documentation (JanFSchulte, Dec 8, 2024)
6f971eb  Merge branch 'main' into update-readme (JanFSchulte, Dec 9, 2024)
536c069  [pre-commit.ci] auto fixes from pre-commit hooks (pre-commit-ci[bot], Dec 9, 2024)
d9d09e0  typos in pytorch frontend documentation (JanFSchulte, Dec 9, 2024)
16c4055  Merge branch 'update-readme' of https://github.com/fastmachinelearnin… (JanFSchulte, Dec 9, 2024)
e69a392  improve description of brevtias -> QONNX -> hlsm4l workflow (JanFSchulte, Dec 9, 2024)
896951a  Add docs on BramFactor (vloncar, Dec 9, 2024)
4 changes: 2 additions & 2 deletions README.md
@@ -49,8 +49,8 @@ hls_model = hls4ml.converters.keras_to_hls(config)
hls4ml.utils.fetch_example_list()
```

-### Building a project with Xilinx Vivado HLS (after downloading and installing from [here](https://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html))
-Note: Vitis HLS is not yet supported. Vivado HLS versions between 2018.2 and 2020.1 are recommended.
+### Building a project
+We will build the project using Xilinx Vivado HLS, which can be downloaded and installed from [here](https://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html). Alongside Vivado HLS, hls4ml also supports Vitis HLS, Intel HLS, and Catapult HLS, and has experimental support for Intel oneAPI. The target backend can be changed using the `backend` argument when building the model.

```Python
# Use Vivado HLS to synthesize the model
```
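
A minimal sketch of switching backends with the current Python API (the model file name and output directory here are hypothetical):

```Python
import hls4ml
from tensorflow.keras.models import load_model

model = load_model('my_model.h5')  # hypothetical model file

# Pick the target backend; 'Vivado' can be swapped for 'Vitis',
# 'Quartus', 'Catapult', or 'oneAPI'.
config = hls4ml.utils.config_from_keras_model(model, granularity='name', backend='Vivado')
hls_model = hls4ml.converters.convert_from_keras_model(
    model, hls_config=config, backend='Vivado', output_dir='my-hls-test'
)
hls_model.compile()  # C simulation of the generated HLS code
hls_model.build()    # run the HLS synthesis flow
```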
2 changes: 1 addition & 1 deletion docs/command.rst → docs/advanced/command.rst
@@ -50,7 +50,7 @@ hls4ml config

hls4ml config [-h] [-m MODEL] [-w WEIGHTS] [-o OUTPUT]

-This creates a conversion configuration file. Visit Configuration section of the :doc:`Setup <setup>` page for more details on how to write a configuration file.
+This creates a conversion configuration file. Visit the Configuration section of the :doc:`Setup <../setup>` page for more details on how to write a configuration file.

**Arguments**

File renamed without changes.
16 changes: 8 additions & 8 deletions docs/advanced/oneapi.rst
@@ -3,18 +3,17 @@ oneAPI Backend
==============

The ``oneAPI`` backend of hls4ml is designed for deploying NNs on Intel/Altera FPGAs. It will eventually
-replace the ``Quartus`` backend, which should really have been called the Intel HLS backend. (The actual Quartus
-program continues to be used with IP produced by the ``oneAPI`` backend.)
-This section discusses details of the ``oneAPI`` backend.
+replace the ``Quartus`` backend, which targeted Intel HLS. (Quartus continues to be used with IP produced by the
+``oneAPI`` backend.) This section discusses details of the ``oneAPI`` backend.

The ``oneAPI`` code uses SYCL kernels to implement the logic that is deployed on FPGAs. It naturally leads to the
-accelerator style of programming. In the IP Component flow, which is currently the only flow supported, the
+accelerator style of programming. In the SYCL HLS (IP Component) flow, which is currently the only flow supported, the
kernel becomes the IP, and the "host code" becomes the testbench. An accelerator flow, with easier deployment on
PCIe accelerator boards, is planned to be added in the future.

The produced work areas use cmake to build the projects in a style based on
`oneAPI-samples <https://github.com/oneapi-src/oneAPI-samples/tree/main/DirectProgramming/C%2B%2BSYCL_FPGA>`_.
-The standard ``fpga_emu``, ``report``, ``fpga_sim``, and ``fpga`` are supported. Additionally, ``make lib``
+The standard ``fpga_emu``, ``report``, ``fpga_sim``, and ``fpga`` make targets are supported. Additionally, ``make lib``
produces the library used for calling the ``predict`` function from hls4ml. The ``compile`` and ``build`` commands
in hls4ml interact with the cmake system, so one does not need to manually use the build system, but it is there
if desired.
@@ -30,6 +29,7 @@ io_parallel and io_stream
As mentioned in the :ref:`I/O Types` section, ``io_parallel`` is for small models, while ``io_stream`` is for
larger models. In ``oneAPI``, there is an additional difference: ``io_stream`` implements each layer on its
own ``task_sequence``. Thus, the layers run in parallel, with pipes connecting the inputs and outputs. This
-is similar in style to the `dataflow` implementation on Vitis, but more explicit. On the other hand, ``io_parallel``
-always uses a single task, relying on pipelining within the task for good performance. In contrast, the Vitis
-backend sometimes uses dataflow with ``io_parallel``.
+is similar in style to the `dataflow` implementation on Vitis HLS, but more explicit. It is also a change
+relative to the Intel HLS-based ``Quartus`` backend. On the other hand, ``io_parallel`` always uses a single task,
+relying on pipelining within the task for good performance. In contrast, the Vitis backend sometimes uses dataflow
+with ``io_parallel``.
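
As a sketch, targeting this backend from Python (assuming ``model`` and ``config`` objects from the usual conversion flow):

.. code-block:: python

   # io_stream places each layer in its own task_sequence, connected by pipes
   hls_model = hls4ml.converters.convert_from_keras_model(
       model,
       hls_config=config,
       backend='oneAPI',
       io_type='io_stream',
       output_dir='my-oneapi-prj',
   )
   hls_model.compile()  # builds the library used by predict() (make lib)
   hls_model.build()    # drives the cmake-based targets described above
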
16 changes: 16 additions & 0 deletions docs/api/auto.rst
@@ -0,0 +1,16 @@
=============================
Automatic precision inference
=============================

The automatic precision inference (implemented in :py:class:`~hls4ml.model.optimizer.passes.infer_precision.InferPrecisionTypes`) attempts to infer the appropriate widths for a given precision.
It is initiated by configuring a precision in the configuration as 'auto'. Functions like :py:class:`~hls4ml.utils.config.config_from_keras_model` and :py:class:`~hls4ml.utils.config.config_from_onnx_model`
automatically set most precisions to 'auto' if the ``'name'`` granularity is used.

.. note::
   It is recommended to pass the backend to the ``config_from_*`` functions so that they can properly extract all the configurable precisions.

The approach taken by the precision inference is to set accumulator and other precisions to never truncate, using only the bitwidths of the inputs (not the values). This is quite conservative,
especially in cases where post-training quantization is used, or if the bit widths were set fairly loosely. The recommended action in that case is to edit the configuration and explicitly set
some widths in it, potentially in an iterative process after seeing what precisions are automatically set. Another option, currently implemented in :py:class:`~hls4ml.utils.config.config_from_keras_model`,
is to pass a maximum bitwidth using the ``max_precision`` option. Then the automatic precision inference will never set a bitwidth larger than the bitwidth, or an integer part larger than the integer part, of
the ``max_precision`` that is passed. (The bitwidth and integer parts are treated separately.)

[Review thread]
Contributor: I think this can be updated now since we added it to QONNX and pytorch as well?
Contributor: I updated this to not say that it's only for keras models.
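
A minimal sketch of capping the inferred precision (the cap value here is arbitrary):

.. code-block:: python

   # 'name' granularity sets most precisions to 'auto'; max_precision caps the inference
   config = hls4ml.utils.config_from_keras_model(
       model,
       granularity='name',
       backend='Vitis',
       max_precision='ap_fixed<24,8>',
   )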
4 changes: 2 additions & 2 deletions docs/api/configuration.rst
@@ -45,8 +45,8 @@ This python dictionary can be edited as needed. A more advanced configuration ca
default_precision='fixed<16,6>',
backend='Vitis')

-This will include per-layer configuration based on the model. Including the backend is recommended because some configation options depend on the backend. Note, the precisions at the
-higher granularites usually default to 'auto', which means that ``hls4ml`` will try to set it automatically. Note that higher granularity settings take precendence
+This will include per-layer configuration based on the model. Including the backend is recommended because some configuration options depend on the backend. Note, the precisions at the
+higher granularities usually default to 'auto', which means that ``hls4ml`` will try to set it automatically (see :ref:`Automatic precision inference`). Note that higher granularity settings take precedence
over model-level settings. See :py:class:`~hls4ml.utils.config.config_from_keras_model` for more information on the various options.
[Review thread]
Contributor: Should we generalize this part of the documentation to not just feature the keras parser as an example?
Contributor: I attempted to do this, though feel free to edit.

One can override specific values before using the configuration:
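For instance, a minimal sketch (the layer name ``fc1`` is hypothetical):

.. code-block:: python

   # Override one layer's precision and reuse factor before conversion
   config['LayerName']['fc1']['Precision']['weight'] = 'ap_fixed<8,2>'
   config['LayerName']['fc1']['ReuseFactor'] = 4
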
File renamed without changes.
20 changes: 16 additions & 4 deletions docs/index.rst
@@ -6,28 +6,40 @@
status
setup
release_notes
details
flows
command
reference

.. toctree::
:hidden:
:glob:
:caption: Quick API Reference

api/*
api/configuration
api/auto
api/details
api/hls-model
api/profiling

.. toctree::
:hidden:
:glob:
:caption: Internal Layers

ir/dense
ir/activations
ir/conv

.. toctree::
:hidden:
:caption: Advanced Features

advanced/flows
advanced/qonnx
advanced/fifo_depth
advanced/extension
advanced/oneapi
advanced/accelerator
advanced/model_optimization
advanced/command

.. toctree::
:hidden:
14 changes: 14 additions & 0 deletions docs/ir/activations.rst
@@ -0,0 +1,14 @@
===========
Activations
===========

Most activations without extra parameters are represented with the ``Activation`` layer, and those with single parameters (leaky ReLU, thresholded ReLU, ELU) as ``ParametrizedActivation``.
``PReLU`` has its own class because it has a parameter matrix (stored as a weight). The hard (piecewise linear) sigmoid and tanh functions are implemented in a ``HardActivation`` layer,
and ``Softmax`` has its own layer class.

Softmax has four implementations that the user can choose from by setting the ``implementation`` parameter:

* **latency**: Good latency, but somewhat high resource usage. It does not work well if there are many output classes.
* **stable**: Slower but with better accuracy, useful in scenarios where higher accuracy is needed.
* **legacy**: An older implementation with poor accuracy, but good performance. Usually the latency implementation is preferred.
* **argmax**: If you don't care about normalized outputs and only care about which one has the highest value, using argmax saves a lot of resources. This sets the highest value to 1, the others to 0.
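
A sketch of selecting an implementation through the layer configuration (the layer name ``softmax`` is hypothetical, and the exact key spelling is an assumption):

.. code-block:: python

   # Choose the softmax implementation for one layer
   config['LayerName']['softmax']['Implementation'] = 'stable'  # or 'latency', 'legacy', 'argmax'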
32 changes: 32 additions & 0 deletions docs/ir/conv.rst
@@ -0,0 +1,32 @@
==================
Convolution Layers
==================

Standard convolutions
=====================

These are the standard 1D and 2D convolutions currently supported by hls4ml, and the fallback if there is no special pointwise implementation.

io_parallel
-----------

Parallel convolutions are for cases where the model needs to be small and fast, though synthesizability limits can be quickly reached. Also note that skip connections
are not supported in io_parallel.

For the Xilinx backends and Catapult, there is a very direct convolution implementation when using the ``Latency`` strategy. This is only for very small models because of the
high number of nested loops. The ``Resource`` strategy in all cases defaults to an algorithm using the *im2col* transformation. This generally supports larger models. The ``Quartus``,
``oneAPI``, and ``Catapult`` backends also implement a ``Winograd`` algorithm, selectable by setting the ``implementation`` to ``Winograd`` or ``combination``. Note that
the Winograd implementation is available for only a handful of filter size configurations, and it is less concerned about bit accuracy and overflow, but it can be faster.

io_stream
---------

There are two main classes of io_stream implementations, ``LineBuffer`` and ``Encoded``. ``LineBuffer`` is always the default, and generally produces marginally better results,
while ``Catapult`` and ``Vivado`` also implement ``Encoded``, selectable with the ``convImplementation`` configuration option. In all cases, the data is processed serially, one pixel
at a time, with a pixel containing an array of all the channel values for the pixel.

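A sketch of how these options might be set in the layer config (the layer name ``conv1`` and the exact key spellings are assumptions based on the option names above):

.. code-block:: python

   # io_parallel: choose the algorithm
   config['LayerName']['conv1']['Strategy'] = 'Resource'        # im2col-based, larger models
   config['LayerName']['conv1']['Implementation'] = 'Winograd'  # Quartus/oneAPI/Catapult only

   # io_stream: choose the streaming implementation
   config['LayerName']['conv1']['ConvImplementation'] = 'Encoded'  # Vivado/Catapult; 'LineBuffer' is the default
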
Depthwise convolutions
======================

Pointwise convolutions
======================
25 changes: 25 additions & 0 deletions docs/ir/dense.rst
@@ -0,0 +1,25 @@
============
Dense Layers
============

One-dimensional Dense Layers
============================

One-dimensional dense layers implement a matrix multiply and bias add. The produced code is also used by other layers to implement the matrix multiplication.


io_parallel
-----------

All the backends implement a ``Resource`` implementation, which explicitly iterates over the reuse factor. There are different implementations depending on whether the reuse factor is
smaller or bigger than the input size. The two Xilinx backends and Catapult also implement a ``Latency`` implementation, which only uses the reuse factor in pragmas.

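A minimal sketch of the relevant configuration (``fc1`` is a hypothetical layer name):

.. code-block:: python

   # Reuse each multiplier 8 times, trading latency for resources
   config['LayerName']['fc1']['Strategy'] = 'Resource'
   config['LayerName']['fc1']['ReuseFactor'] = 8
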
io_stream
---------

The io_stream implementation only wraps the io_parallel implementation with streams or pipes for communication. The data is still transferred in parallel.

Multi-dimensional Dense Layers
==============================

Multi-dimensional Dense layers are converted to pointwise convolutions, and do not directly use the above implementation.
94 changes: 51 additions & 43 deletions docs/setup.rst
@@ -43,23 +43,30 @@ version can be installed directly from ``git``:
Dependencies
============

-The ``hls4ml`` library depends on a number of Python packages and external tools for synthesis and simulation. Python dependencies are automatically managed
+The ``hls4ml`` library requires Python 3.10 or later, and depends on a number of Python packages and external tools for synthesis and simulation. Python dependencies are automatically managed
by ``pip`` or ``conda``.

-* `TensorFlow <https://pypi.org/project/tensorflow/>`_ (version 2.4 and newer) and `QKeras <https://pypi.org/project/qkeras/>`_ are required by the Keras converter.
+* `TensorFlow <https://pypi.org/project/tensorflow/>`_ (version 2.8 to 2.14) and `QKeras <https://pypi.org/project/qkeras/>`_ are required by the Keras converter. One may want to install newer versions of QKeras from GitHub. Newer versions of TensorFlow can be used, but QKeras and hls4ml do not currently support Keras v3.

* `ONNX <https://pypi.org/project/onnx/>`_ (version 1.4.0 and newer) is required by the ONNX converter.

* `PyTorch <https://pytorch.org/get-started>`_ package is optional. If not installed, the PyTorch converter will not be available.

Running C simulation from Python requires a C++11-compatible compiler. On Linux, a GCC C++ compiler ``g++`` is required. Any version from a recent
-Linux should work. On MacOS, the *clang*-based ``g++`` is enough.
+Linux should work. On MacOS, the *clang*-based ``g++`` is enough. For the oneAPI backend, one must have oneAPI installed, along with the FPGA compiler,
+to run C/SYCL simulations.

To run FPGA synthesis, installation of the following tools is required:

-* Xilinx Vivado HLS 2018.2 to 2020.1 for synthesis for Xilinx FPGAs
+* Xilinx Vivado HLS 2018.2 to 2020.1 for synthesis for Xilinx FPGAs using the ``Vivado`` backend.

* Vitis HLS 2022.2 or newer is required for synthesis for Xilinx FPGAs using the ``Vitis`` backend.

-* Intel Quartus 20.1 to 21.4 for the synthesis for Intel FPGAs
+* Intel Quartus 20.1 to 21.4 for the synthesis for Intel/Altera FPGAs using the ``Quartus`` backend.

+* oneAPI 2024.1 to 2025.0 with the FPGA compiler and recent Intel/Altara Quartus for Intel/Altera FPGAs using the ``oneAPI`` backend.
[Review thread]
Contributor Author: Typo: Altara
Contributor: Fixed.


Catapult HLS 2024.1_1 or 2024.2 can be used to synthesize both for ASICs and FPGAs.


Quick Start
@@ -100,76 +107,77 @@ Done! You've built your first project using ``hls4ml``! To learn more about our

If you want to configure your model further, check out our :doc:`Configuration <api/configuration>` page.

..
   Apart from our main API, we also support model conversion using a command line interface, check out our next section to find out more:

   Getting started with hls4ml CLI (deprecated)
   --------------------------------------------

   As an alternative to the recommended Python API, the command-line interface is provided via the ``hls4ml`` command.

   To follow this tutorial, you must first download our ``example-models`` repository:

   .. code-block:: bash

      git clone https://github.com/fastmachinelearning/example-models

   Alternatively, you can clone the ``hls4ml`` repository with submodules:

   .. code-block:: bash

      git clone --recurse-submodules https://github.com/fastmachinelearning/hls4ml

   The model files, along with other configuration parameters, are defined in the ``.yml`` files.
   Further information about ``.yml`` files can be found in the :doc:`Configuration <api/configuration>` page.

   In order to create an example HLS project, first go to ``example-models/`` from the main directory:

   .. code-block:: bash

      cd example-models/

   And use this command to translate a Keras model:

   .. code-block:: bash

      hls4ml convert -c keras-config.yml

   This will create a new HLS project directory with an implementation of a model from the ``example-models/keras/`` directory.
   To build the HLS project, do:

   .. code-block:: bash

      hls4ml build -p my-hls-test -a

   This will create a Vivado HLS project with your model implementation!

   **NOTE:** For the last step, you can alternatively do the following to build the HLS project:

   .. code-block:: bash

      cd my-hls-test
      vivado_hls -f build_prj.tcl

   ``vivado_hls`` can be controlled with:

   .. code-block:: bash

      vivado_hls -f build_prj.tcl "csim=1 synth=1 cosim=1 export=1 vsynth=1"

   Setting the additional parameters from ``1`` to ``0`` disables that step, but disabling ``synth`` also disables ``cosim`` and ``export``.

   Further help
   ^^^^^^^^^^^^

   * For further information about how to use ``hls4ml``\ , do: ``hls4ml --help`` or ``hls4ml -h``
   * If you need help for a particular ``command``\ , ``hls4ml command -h`` will show help for the requested ``command``
   * We provide detailed documentation for each command in the :doc:`Command Help <advanced/command>` section

Existing examples
-----------------

* Examples of model files and weights can be found in `example_models <https://github.com/fastmachinelearning/example-models>`_ directory.
* Training codes and examples of resources needed to train the models can be found in the `tutorial <https://github.com/fastmachinelearning/hls4ml-tutorial>`__.

Uninstalling
------------