The TAPA Library (libtapa)

The Task Instantiation Library

task

struct task

Defines a parent task that instantiates child task instances.

Canonical usage:

tapa::task()
  .invoke(...)
  .invoke(...)
  ...
  ;

A parent task itself does not do any computation. By default, a parent task does not finish until all of its child task instances finish; such child task instances are joined to their parent. The alternative is to detach a child from the parent: if a child task instance is instantiated and detached, the parent no longer waits for it to finish. Detaching is useful for children that run in an infinite loop and therefore never finish on their own.

Public Functions

task(): Constructs a tapa::task.

template<typename Func, typename ...Args> inline task &invoke(Func &&func, Args&&... args)

Invokes a task and instantiates a child task instance.

Parameters:

func – Task function definition of the instantiated child.
args – Arguments passed to func.

Returns:

Reference to the caller tapa::task.

template<int mode, typename Func, typename ...Args> inline task &invoke(Func &&func, Args&&... args)

Invokes a task and instantiates a child task instance with the given instantiation mode.

Template Parameters:

mode – Instantiation mode (join or detach).

Parameters:

func – Task function definition of the instantiated child.
args – Arguments passed to func.

Returns:

Reference to the caller tapa::task.

template<int mode, int n, typename Func, typename ...Args> inline task &invoke(Func &&func, Args&&... args)

Invokes a task n times and instantiates n child task instances with the given instantiation mode.

Template Parameters:

mode – Instantiation mode (join or detach).
n – Instantiation count.

Parameters:

func – Task function definition of the instantiated child.
args – Arguments passed to func.

Returns:

Reference to the caller tapa::task.

struct seq

Class that generates a sequence of integers as task arguments.

Canonical usage:

void TaskFoo(int i, ...) {
  ...
}
tapa::task()
  .invoke<3>(TaskFoo, tapa::seq(), ...)
  ...
  ;

TaskFoo will be invoked three times, receiving 0, 1, and 2 as the first argument, respectively.
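
The expansion described above can be sketched without the TAPA library. This is a minimal model under stated assumptions, not TAPA's implementation: SeqModel, InvokeModel, and CollectSeqArgs are hypothetical names introduced only for illustration, and the modeled instances run one after another, whereas real child task instances run in parallel.

```cpp
#include <vector>

// Hypothetical stand-in for tapa::seq: yields 0, 1, 2, ... on successive calls.
struct SeqModel {
  int next = 0;
  int operator()() { return next++; }
};

// Hypothetical stand-in for invoking a task n times with a sequence argument:
// calls func n times, passing the next sequence value each time.
template <int n, typename Func>
void InvokeModel(Func&& func) {
  SeqModel seq;
  for (int i = 0; i < n; ++i) {
    func(seq());
  }
}

// Records the first argument seen by each of the three modeled instances.
std::vector<int> CollectSeqArgs() {
  std::vector<int> seen;
  InvokeModel<3>([&seen](int i) { seen.push_back(i); });
  return seen;
}
```

In this model each instance sees a distinct index, which is what lets otherwise identical task instances select different streams or memory regions.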

Public Functions

seq() = default: Constructs a tapa::seq. This is the only public API.

The Streaming Library

A blocking operation waits until the stream becomes available (non-empty for a read, non-full for a write) before it completes.
A non-blocking operation always returns immediately.
A destructive operation changes the state of the stream.
A non-destructive operation does not change the state of the stream.
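
The four terms above can be illustrated with a small single-threaded model of a bounded FIFO. This is only a sketch: FifoModel is a hypothetical class, not tapa::stream, and it covers only the non-blocking operations, since a blocking operation needs a second task to make the stream available.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical single-threaded model of a bounded FIFO; not tapa::stream.
template <typename T>
class FifoModel {
 public:
  explicit FifoModel(std::size_t depth) : depth_(depth) {}

  // Non-blocking, non-destructive: only inspect the state.
  bool empty() const { return q_.empty(); }
  bool full() const { return q_.size() >= depth_; }

  // Non-blocking, non-destructive: copies the next token but leaves it queued.
  bool try_peek(T& value) const {
    if (q_.empty()) return false;
    value = q_.front();
    return true;
  }

  // Non-blocking, destructive: removes the next token on success.
  bool try_read(T& value) {
    if (q_.empty()) return false;
    value = q_.front();
    q_.pop_front();
    return true;
  }

  // Non-blocking, destructive: appends a token unless the FIFO is full.
  bool try_write(const T& value) {
    if (full()) return false;
    q_.push_back(value);
    return true;
  }

 private:
  std::size_t depth_;
  std::deque<T> q_;
};
```

In this model every operation returns immediately and reports failure through its return value; a blocking variant would instead retry until the corresponding non-blocking call succeeds.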

istream

template<typename T> class istream : public virtual tapa::internal::basic_stream<T>

Provides consumer-side operations to a tapa::stream where it is used as an input.

This class should only be used in task function parameters and should never be instantiated directly.

Subclassed by tapa::internal::unbound_stream< T >

Public Functions

inline bool empty() const

Tests whether the stream is empty.

This is a non-blocking and non-destructive operation.

Returns:: Whether the stream is empty.

inline bool try_eot(bool &is_eot) const

Tests whether the next token is EoT.

This is a non-blocking and non-destructive operation.

Parameters:: is_eot – [out] Uninitialized if the stream is empty. Otherwise, updated to indicate whether the next token is EoT.
Returns:: Whether is_eot is updated.

inline bool eot(bool &is_success) const

Tests whether the next token is EoT.

This is a non-blocking and non-destructive operation.

Parameters:: is_success – [out] Whether the next token is available.
Returns:: Whether the next token is available and is EoT.

inline bool eot(std::nullptr_t) const

Tests whether the next token is EoT.

This is a non-blocking and non-destructive operation.

Returns:: Whether the next token is available and is EoT.

inline bool try_peek(T &value) const

Peeks the stream.

This is a non-blocking and non-destructive operation.

The next token must not be EoT.

Parameters:: value – [out] Uninitialized if the stream is empty. Otherwise, updated to be the value of the next token.
Returns:: Whether value is updated.

inline T peek(bool &is_success) const

Peeks the stream.

This is a non-blocking and non-destructive operation.

The next token must not be EoT.

Parameters:: is_success – [out] Whether the next token is available.
Returns:: The value of the next token is returned if it is available. Otherwise, default-constructed T() is returned.

inline T peek(std::nullptr_t) const

Peeks the stream.

This is a non-blocking and non-destructive operation.

The next token must not be EoT.

Returns:: The value of the next token is returned if it is available. Otherwise, default-constructed T() is returned.

inline T peek(bool &is_success, bool &is_eot) const

Peeks the stream.

This is a non-blocking and non-destructive operation.

Parameters:

is_success – [out] Whether the next token is available.
is_eot – [out] Set to false if the stream is empty. Otherwise, updated to indicate whether the next token is EoT.

Returns:

The value of the next token is returned if it is available. Otherwise, default-constructed T() is returned.

inline bool try_read(T &value)

Reads the stream.

This is a non-blocking and destructive operation.

The next token must not be EoT.

Parameters:: value – [out] Uninitialized if the stream is empty. Otherwise, updated to be the value of the next token.
Returns:: Whether value is updated.

inline T read()

Reads the stream.

This is a blocking and destructive operation.

The next token must not be EoT.

Returns:: The value of the next token.

inline istream &operator>>(T &value)

Reads the stream.

This is a blocking and destructive operation.

The next token must not be EoT.

Parameters:: value – [out] The value of the next token.
Returns:: *this.

inline T read(bool &is_success)

Reads the stream.

This is a non-blocking and destructive operation.

The next token must not be EoT.

Parameters:: is_success – [out] Whether the next token is available.
Returns:: The value of the next token is returned if it is available. Otherwise, default-constructed T() is returned.

inline T read(std::nullptr_t)

Reads the stream.

This is a non-blocking and destructive operation.

The next token must not be EoT.

Returns:: The value of the next token is returned if it is available. Otherwise, default-constructed T() is returned.

inline T read(const T &default_value, bool *is_success = nullptr)

Reads the stream.

This is a non-blocking and destructive operation.

Parameters:

default_value – [in] Value to return if the stream is empty.
is_success – [out] Updated to indicate whether the next token is available if is_success is not nullptr.

Returns:

The value of the next token is returned if it is available. Otherwise, default_value is returned.
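
The contract of this overload can be sketched on top of a plain queue. ReadOr is a hypothetical free function used only to illustrate the described behavior; it is not how TAPA implements read, and std::deque stands in for the stream.

```cpp
#include <deque>

// Hypothetical model of read(default_value, is_success) on a plain queue.
template <typename T>
T ReadOr(std::deque<T>& q, const T& default_value, bool* is_success = nullptr) {
  const bool available = !q.empty();
  // The flag is only written when the caller asked for it.
  if (is_success != nullptr) *is_success = available;
  if (!available) return default_value;
  T value = q.front();
  q.pop_front();  // destructive: the token is consumed
  return value;
}
```

Passing no pointer is convenient when the default value alone is enough to tell the caller what happened.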

inline bool try_open()

Consumes an EoT token.

This is a non-blocking and destructive operation.

The next token must be EoT.

Returns:: Whether an EoT token is consumed.

inline void open()

Consumes an EoT token.

This is a blocking and destructive operation.

The next token must be EoT.
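
The way the EoT-related operations fit together on the consumer side can be sketched as follows. This is a model under stated assumptions, not TAPA code: Token and SumUntilEot are hypothetical names, std::nullopt stands in for an EoT token, and the stream is a pre-filled queue so nothing ever has to block.

```cpp
#include <deque>
#include <optional>

// Hypothetical token model: std::nullopt stands for an EoT token.
using Token = std::optional<int>;

// Sums data tokens until an EoT token is next, then consumes that EoT
// (the role open() plays), leaving any later tokens in the queue.
int SumUntilEot(std::deque<Token>& stream) {
  int sum = 0;
  while (!stream.empty() && stream.front().has_value()) {
    sum += *stream.front();  // data token: read it
    stream.pop_front();
  }
  if (!stream.empty()) stream.pop_front();  // EoT token: consume it
  return sum;
}
```

Consuming the EoT token is what allows the same stream to carry a second transaction afterwards.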

istreams

template<typename T, uint64_t S> class istreams : public virtual tapa::internal::basic_streams<T>

Provides consumer-side operations to an array of tapa::stream where they are used as inputs.

This class should only be used in task function parameters and should never be instantiated directly.

Subclassed by tapa::internal::unbound_streams< T, S >

Public Functions

inline istream<T> operator[](int pos) const

References a tapa::stream in the array.

Parameters:: pos – Position of the array reference.
Returns:: tapa::istream referenced in the array.

Public Static Attributes

static constexpr int length = S : Length of the tapa::stream array.

ostream

template<typename T> class ostream : public virtual tapa::internal::basic_stream<T>

Provides producer-side operations to a tapa::stream where it is used as an output.

This class should only be used in task function parameters and should never be instantiated directly.

Subclassed by tapa::internal::unbound_stream< T >

Public Functions

inline bool full() const

Tests whether the stream is full.

This is a non-blocking and non-destructive operation.

Returns:: Whether the stream is full.

inline bool try_write(const T &value)

Writes value to the stream.

This is a non-blocking and destructive operation.

Parameters:: value – [in] The value to write.
Returns:: Whether value has been written successfully.

inline void write(const T &value)

Writes value to the stream.

This is a blocking and destructive operation.

Parameters:: value – [in] The value to write.

inline ostream &operator<<(const T &value)

Writes value to the stream.

This is a blocking and destructive operation.

Parameters:: value – [in] The value to write.
Returns:: *this.

inline bool try_close()

Produces an EoT token to the stream.

This is a non-blocking and destructive operation.

Returns:: Whether the EoT token has been written successfully.

inline void close()

Produces an EoT token to the stream.

This is a blocking and destructive operation.

ostreams

template<typename T, uint64_t S> class ostreams : public virtual tapa::internal::basic_streams<T>

Provides producer-side operations to an array of tapa::stream where they are used as outputs.

This class should only be used in task function parameters and should never be instantiated directly.

Subclassed by tapa::internal::unbound_streams< T, S >

Public Functions

inline ostream<T> operator[](int pos) const

References a tapa::stream in the array.

Parameters:: pos – Position of the array reference.
Returns:: tapa::ostream referenced in the array.

Public Static Attributes

static constexpr int length = S : Length of the tapa::stream array.

stream

template<typename T, uint64_t N = kStreamDefaultDepth> class stream : public tapa::internal::unbound_stream<T>

Defines a communication channel between two task instances.

Public Functions

inline stream(): Constructs a tapa::stream.

template<size_t S> inline stream(const char (&name)[S])

Constructs a tapa::stream with the given name for debugging.

Parameters:: name – [in] Name of the communication channel (for debugging only).

Public Static Attributes

static constexpr int depth = N : Depth of the communication channel.

streams

template<typename T, uint64_t S, uint64_t N = kStreamDefaultDepth> class streams : public tapa::internal::unbound_streams<T, S>

Defines an array of tapa::stream.

Public Functions

inline streams(): Constructs a tapa::streams array.

template<size_t name_length> inline streams(const char (&name)[name_length])

Constructs a tapa::streams array with the given base name for debugging.

The actual name of each tapa::stream would be name[i].

Parameters:: name – [in] Base name of the streams (for debugging only).

inline stream<T, N> operator[](int pos) const

References a tapa::stream in the array.

Parameters:: pos – Position of the array reference.
Returns:: tapa::stream referenced in the array.

Public Static Attributes

static constexpr int length = S : Count of tapa::stream in the array.

static constexpr int depth = N : Depth of each tapa::stream in the array.

The MMAP Library

async_mmap

template<typename T> class async_mmap : public tapa::mmap<T>

Defines a view of a piece of contiguous memory with asynchronous random accesses.

Public Types

using addr_t = int64_t: Type of the addresses.

using resp_t = uint8_t: Type of the write responses.

Public Members

ostream<addr_t> read_addr

Provides access to the read address channel.

Each value written to this channel triggers an asynchronous memory read request. Consecutive requests may be coalesced into a long burst request.

istream<T> read_data

Provides access to the read data channel.

Each value read from this channel represents the data retrieved from the underlying memory system.

ostream<addr_t> write_addr

Provides access to the write address channel.

Each value written to this channel triggers an asynchronous memory write request. Consecutive requests may be coalesced into a long burst request.

ostream<T> write_data

Provides access to the write data channel.

Each value written to this channel supplies data to the memory write request.

istream<resp_t> write_resp

Provides access to the write response channel.

Each value read from this channel represents the data count acknowledged by the underlying memory system.
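
The decoupling between the five channels can be illustrated with a single-threaded model. This is only a sketch: AsyncMemModel and SumTwoReads are hypothetical, the element type is fixed to int, burst coalescing is not modeled, and the value pushed to write_resp is a placeholder that does not reproduce the real resp_t encoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical single-threaded model of the channels; not tapa::async_mmap.
struct AsyncMemModel {
  std::vector<int> mem;                 // backing memory
  std::deque<std::int64_t> read_addr;   // kernel -> memory
  std::deque<int> read_data;            // memory -> kernel
  std::deque<std::int64_t> write_addr;  // kernel -> memory
  std::deque<int> write_data;           // kernel -> memory
  std::deque<std::uint8_t> write_resp;  // memory -> kernel

  // Services at most one pending read and one pending write.
  void Step() {
    if (!read_addr.empty()) {
      read_data.push_back(mem[static_cast<std::size_t>(read_addr.front())]);
      read_addr.pop_front();
    }
    if (!write_addr.empty() && !write_data.empty()) {
      mem[static_cast<std::size_t>(write_addr.front())] = write_data.front();
      write_addr.pop_front();
      write_data.pop_front();
      write_resp.push_back(0);  // placeholder; real encoding is not modeled
    }
  }
};

// Issues two read requests back to back, then collects data as it arrives.
int SumTwoReads(AsyncMemModel& m, std::int64_t a, std::int64_t b) {
  m.read_addr.push_back(a);
  m.read_addr.push_back(b);
  int sum = 0;
  for (int received = 0; received < 2;) {
    m.Step();
    if (!m.read_data.empty()) {
      sum += m.read_data.front();
      m.read_data.pop_front();
      ++received;
    }
  }
  return sum;
}
```

The point of the model is that requests and responses travel on separate channels, so several requests can be outstanding before the first response is consumed.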

mmap

template<typename T> class mmap

Defines a view of a piece of contiguous memory with synchronous random accesses.

Subclassed by tapa::async_mmap< T >, tapa::hmap< T, chan_count, chan_size >

Public Functions

inline explicit mmap(T *ptr)

Constructs a tapa::mmap with unknown size.

Parameters:: ptr – Pointer to the start of the mapped memory.

inline mmap(T *ptr, uint64_t size)

Constructs a tapa::mmap with the given size.

Parameters:

ptr – Pointer to the start of the mapped memory.
size – Size of the mapped memory (in unit of element count).

template<typename Container> inline explicit mmap(Container &container)

Constructs a tapa::mmap from the given container.

Parameters:: container – Container holding the memory to be mapped. Must implement data() and size().

inline operator T*()

Implicitly casts to a regular pointer.

tapa::mmap should be used just like a pointer in the kernel.

inline mmap &operator++()

Increments the start of the mapped memory.

Returns:: The incremented tapa::mmap.

inline mmap &operator--()

Decrements the start of the mapped memory.

Returns:: The decremented tapa::mmap.

inline mmap operator++(int)

Increments the start of the mapped memory.

Returns:: The tapa::mmap before incrementation.

inline mmap operator--(int)

Decrements the start of the mapped memory.

Returns:: The tapa::mmap before decrementation.

inline T *get() const

Retrieves the start of the mapped memory.

This should be used on the host only.

Returns:: The start of the mapped memory.

inline uint64_t size() const

Retrieves the size of the mapped memory.

This should be used on the host only.

Returns:: The size of the mapped memory (in unit of element count).

template<uint64_t N> inline mmap<vec_t<T, N>> vectorized() const

Reinterprets the element type of the mapped memory as tapa::vec_t<T, N>.

This should be used on the host only. The size of mapped memory must be a multiple of N.

Template Parameters:: N – Vector length of the new element type.
Returns:: tapa::mmap of the same piece of memory but of type tapa::vec_t<T, N>.

template<typename U> inline mmap<U> reinterpret() const

Reinterprets the element type of the mapped memory as U.

This should be used on the host only. Both T and U must have standard layout. The host memory pointer must be properly aligned. If sizeof(U) > sizeof(T), the size of mapped memory must be a multiple of sizeof(U)/sizeof(T) (which must be an integer itself). If sizeof(U) < sizeof(T), sizeof(T) must be a multiple of sizeof(U).

Template Parameters:: U – The new element type.
Returns:: tapa::mmap<U> of the same piece of memory.
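
The size constraints above amount to a small piece of arithmetic, sketched below. ReinterpretedSize is a hypothetical helper, not part of TAPA, and the resulting element count is inferred from the requirement that the reinterpreted view covers the same piece of memory. The sketch assumes C++17 and that the caller has already checked that size is a multiple of the ratio.

```cpp
#include <cstdint>

// Hypothetical helper (not part of TAPA): element count of the same memory
// after viewing elements of type T as elements of type U.
template <typename T, typename U>
constexpr std::uint64_t ReinterpretedSize(std::uint64_t size) {
  if constexpr (sizeof(U) > sizeof(T)) {
    // Several T elements are packed into one U element.
    static_assert(sizeof(U) % sizeof(T) == 0,
                  "sizeof(U)/sizeof(T) must be an integer");
    return size / (sizeof(U) / sizeof(T));
  } else {
    // One T element is split into one or more U elements.
    static_assert(sizeof(T) % sizeof(U) == 0,
                  "sizeof(T) must be a multiple of sizeof(U)");
    return size * (sizeof(T) / sizeof(U));
  }
}
```

For example, eight 32-bit elements viewed as 64-bit elements give four elements, while three 64-bit elements viewed as 16-bit elements give twelve.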

mmaps

template<typename T, uint64_t S> class mmaps

Defines an array of tapa::mmap.

Public Functions

template<typename PtrContainer, typename SizeContainer> inline mmaps(const PtrContainer &pointers, const SizeContainer &sizes)

Constructs a tapa::mmap array from the given pointers and sizes.

Parameters:

pointers – Pointers to the start of the array of mapped memory.
sizes – Sizes of each mapped memory (in unit of element count).

template<typename Container> inline explicit mmaps(Container &container)

Constructs a tapa::mmap array from the given container.

Parameters:: container – Container holding an array of tapa::mmap. container must implement operator[] that returns a container suitable for constructing a tapa::mmap.

inline mmap<T> &operator[](int idx): References a tapa::mmap in the array.

template<uint64_t N> inline mmaps<vec_t<T, N>, S> vectorized() const

Reinterprets the element type of each mapped memory as tapa::vec_t<T, N>.

This should be used on the host only. The size of each mapped memory must be a multiple of N.

Template Parameters:: N – Vector length of the new element type.
Returns:: tapa::mmaps of the same pieces of memory but of type tapa::vec_t<T, N>.

template<typename U> inline mmaps<U, S> reinterpret() const

Reinterprets the element type of each mapped memory as U.

This should be used on the host only. Both T and U must have standard layout. The host memory pointers must be properly aligned. If sizeof(U) > sizeof(T), the size of each mapped memory must be a multiple of sizeof(U)/sizeof(T) (which must be an integer itself). If sizeof(U) < sizeof(T), sizeof(T) must be a multiple of sizeof(U).

Template Parameters:: U – The new element type.
Returns:: tapa::mmaps<U, S> of the same pieces of memory.

The Utility Library

widthof

template<typename T> inline constexpr int tapa::widthof()

Queries width (in bits) of the type.

Template Parameters:: T – Type to be queried.
Returns:: T::width if it exists, sizeof(T) * CHAR_BIT otherwise.

template<typename T> inline constexpr int tapa::widthof(T object)

Queries width (in bits) of the object.

Note

Unlike sizeof, the argument expression is evaluated (though unused).

Template Parameters:: T – Type of object.
Parameters:: object – Object to be queried.
Returns:: T::width if it exists, sizeof(T) * CHAR_BIT otherwise.
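
The "T::width if it exists" rule can be sketched with the C++17 detection idiom. This is not TAPA's implementation: HasWidth, WidthOfModel, and Int12Model are hypothetical names introduced for illustration.

```cpp
#include <climits>
#include <type_traits>

// Detects whether T has a static member named width.
template <typename T, typename = void>
struct HasWidth : std::false_type {};
template <typename T>
struct HasWidth<T, std::void_t<decltype(T::width)>> : std::true_type {};

// Hypothetical model of the documented rule: prefer T::width when present,
// otherwise fall back to the storage size in bits.
template <typename T>
constexpr int WidthOfModel() {
  if constexpr (HasWidth<T>::value) {
    return T::width;
  } else {
    return static_cast<int>(sizeof(T) * CHAR_BIT);
  }
}

// Hypothetical 12-bit integer type stored in a wider host type.
struct Int12Model {
  static constexpr int width = 12;
  short storage;
};
```

The distinction matters for arbitrary-precision types, whose declared bit width is typically smaller than the storage they occupy on the host.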

The TAPA Compiler (tapac)

Compiles TAPA C++ code into packaged RTL.

usage: tapac [-h] [-V] [-v] [-q] [--tapacc file] [--work-dir dir]
             [--top TASK_NAME] [--clock-period CLOCK_PERIOD]
             [--part-num PART_NUM] [--platform PLATFORM] [--cflags CFLAGS]
             [-o file] [--other-hls-configs OTHER_HLS_CONFIGS] [--run-tapacc]
             [--run-hls] [--generate-task-rtl] [--run-floorplanning]
             [--generate-top-rtl] [--pack-xo] [--connectivity file]
             [--enable-floorplan] [--run-floorplan-dse]
             [--floorplan-dse-step INT] [--enable-hbm-binding-adjustment]
             [--floorplan-output file] [--constraint file]
             [--register-level INT] [--min-area-limit 0-1]
             [--max-area-limit 0-1]
             [--min-slr-width-limit MIN_SLR_WIDTH_LIMIT]
             [--max-slr-width-limit MAX_SLR_WIDTH_LIMIT]
             [--max-search-time MAX_SEARCH_TIME] [--enable-synth-util]
             [--disable-synth-util]
             [--max-parallel-synth-jobs MAX_PARALLEL_SYNTH_JOBS]
             [--floorplan-pre-assignments FLOORPLAN_PRE_ASSIGNMENTS]
             [--read-only-args READ_ONLY_ARGS]
             [--write-only-args WRITE_ONLY_ARGS]
             [--additional-fifo-pipelining]
             [--floorplan-strategy {QUICK_FLOORPLANNING,SLR_LEVEL_FLOORPLANNING,HALF_SLR_LEVEL_FLOORPLANNING}]
             [--floorplan-opt-priority {AREA_PRIORITIZED,SLR_CROSSING_PRIORITIZED}]
             file

Positional Arguments

file: Input file, usually TAPA C++ source code.

Options

-V, --version

Show the program’s version number and exit.

-v, --verbose

Increase logging verbosity.

-q, --quiet

Decrease logging verbosity.

--tapacc

Use a specific tapacc binary instead of searching in PATH.

--work-dir

Use a specific working directory instead of a temporary one.

--top

Name of the top-level task.

--clock-period

Target clock period in nanoseconds.

--part-num

Target FPGA part number.

--platform

Target Vitis platform.

--cflags

Compiler flags for the kernel; may be specified multiple times.

Default: []

-o, --output

Output file.

--other-hls-configs

Additional compile options for Vitis HLS, e.g., --other-hls-configs "config_compile -unsafe_math_optimizations".

Default: “”

Compilation Steps

Selectively run compilation steps (advanced usage).

--run-tapacc: Run tapacc and create program.json.
--run-hls: Run HLS and generate RTL tarballs.
--generate-task-rtl: Generate the RTL for each task.
--run-floorplanning: Floorplan the design.
--generate-top-rtl: Generate the RTL for the top-level task.
--pack-xo: Package RTL as a Xilinx object file.

Floorplanning

Coarse-grained floorplanning via AutoBridge (advanced usage).

--connectivity

Input connectivity.ini specification for mmaps. This is the same file passed to v++.

--enable-floorplan

Enable the floorplanning step. This option can be omitted if the --floorplan-output option is given.

Default: False

--run-floorplan-dse

Generate multiple floorplan configurations.

Default: False

--floorplan-dse-step

The minimum gap in slr_crossing_width between two design points.

Default: 500

--enable-hbm-binding-adjustment

Allow the top-level arguments to be bound to different physical ports based on the floorplan results. This overrides the binding from the --connectivity option.

Default: False

--floorplan-output

Specify the name of the output Tcl file that encodes the floorplan results. If this option is provided, floorplanning is enabled automatically.

--constraint

[deprecated] Specify the name of the output Tcl file that encodes the floorplan results.

--register-level

Use a specific register level of top-level scalar signals instead of inferring from the floorplanning directive.

--min-area-limit

The floorplanner will try to find a solution in which the resource usage of each slot is between min-area-limit and max-area-limit.

Default: 0.65

--max-area-limit

The floorplanner will try to find a solution in which the resource usage of each slot is between min-area-limit and max-area-limit.

Default: 0.85

--min-slr-width-limit

The floorplanner will try to find a solution in which the number of SLR crossing wires at each die boundary is between min-slr-width-limit and max-slr-width-limit.

Default: 10000

--max-slr-width-limit

The floorplanner will try to find a solution in which the number of SLR crossing wires at each die boundary is between min-slr-width-limit and max-slr-width-limit.

Default: 15000

--max-search-time

The maximum runtime (in seconds) of each ILP solving process.

Default: 600

--enable-synth-util

Enable post-synthesis resource utilization report for floorplanning.

Default: False

--disable-synth-util

Disable post-synthesis resource utilization report for floorplanning.

Default: True

--max-parallel-synth-jobs

Limit the number of parallel synthesis jobs if --enable-synth-util is set.

Default: 8

--floorplan-pre-assignments

Path to a JSON file of type Dict[str, List[str]] that stores the manual assignments to be used in floorplanning. The key is the region name; the value is a list of modules. Replaces the outdated --directive option.

--read-only-args

Optionally specify which mmap/async_mmap arguments of the top function are read-only. Regular expressions are supported.

Default: []

--write-only-args

Optionally specify which mmap/async_mmap arguments of the top function are write-only. Regular expressions are supported.

Default: []

Strategy

Choose different strategies for floorplanning and codegen (advanced usage).

--additional-fifo-pipelining

Pipeline a FIFO whose source and destination are in the same region.

Default: False

--floorplan-strategy

Possible choices: QUICK_FLOORPLANNING, SLR_LEVEL_FLOORPLANNING, HALF_SLR_LEVEL_FLOORPLANNING

Override the automatically chosen floorplanning method. QUICK_FLOORPLANNING: use iterative bi-partitioning, which has the best scalability; typically used for designs with hundreds of tasks. SLR_LEVEL_FLOORPLANNING: only partition the device into SLR-level slots, without half-SLR-level floorplanning. HALF_SLR_LEVEL_FLOORPLANNING: partition the device into half-SLR-level slots.

Default: “HALF_SLR_LEVEL_FLOORPLANNING”

--floorplan-opt-priority

Possible choices: AREA_PRIORITIZED, SLR_CROSSING_PRIORITIZED

AREA_PRIORITIZED: give priority to the area usage ratio of each slot. SLR_CROSSING_PRIORITIZED: give priority to the number of SLR crossing wires.

Default: “AREA_PRIORITIZED”