The TAPA Library (libtapa)
The Task Instantiation Library
task
-
struct task
Defines a parent task instantiating children task instances.
Canonical usage:
tapa::task()
  .invoke(...)
  .invoke(...)
  ... ;
A parent task itself does not do any computation. By default, a parent task does not finish until all of its child task instances finish; such children are said to be joined to their parent. The alternative is to detach a child from the parent: if a child task instance is instantiated and detached, the parent no longer waits for it to finish. Detached tasks are most useful for children that run an infinite loop and therefore never finish on their own.
Public Functions
-
task()
Constructs a tapa::task.
-
template<typename Func, typename ...Args>
inline task &invoke(Func &&func, Args&&... args)
Invokes a task and instantiates a child task instance.
- Parameters:
func – Task function definition of the instantiated child.
args – Arguments passed to func.
- Returns:
Reference to the caller tapa::task.
-
template<int mode, typename Func, typename ...Args>
inline task &invoke(Func &&func, Args&&... args)
Invokes a task and instantiates a child task instance with the given instantiation mode.
- Template Parameters:
mode – Instantiation mode (join or detach).
- Parameters:
func – Task function definition of the instantiated child.
args – Arguments passed to func.
- Returns:
Reference to the caller tapa::task.
-
template<int mode, int n, typename Func, typename ...Args>
inline task &invoke(Func &&func, Args&&... args)
Invokes a task n times and instantiates n child task instances with the given instantiation mode.
- Template Parameters:
mode – Instantiation mode (join or detach).
n – Instantiation count.
- Parameters:
func – Task function definition of the instantiated child.
args – Arguments passed to func.
- Returns:
Reference to the caller tapa::task.
-
struct seq
Class that generates a sequence of integers as task arguments.
Canonical usage:
void TaskFoo(int i, ...) { ... }
tapa::task()
  .invoke<tapa::join, 3>(TaskFoo, tapa::seq(), ...)
  ... ;
TaskFoo will be invoked three times, receiving 0, 1, and 2 as the first argument, respectively.
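To make the expansion concrete, here is a minimal sketch in plain C++ that imitates the behavior described above without depending on libtapa. `ToySeq`, `InvokeN`, and `CollectSeqArgs` are hypothetical stand-ins invented for illustration; they are not how TAPA implements `tapa::seq`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for tapa::seq: yields 0, 1, 2, ... on each use.
struct ToySeq {
  int next = 0;
  int operator()() { return next++; }
};

// Hypothetical stand-in for invoking a task n times: each invocation
// receives the next integer of the sequence as its first argument.
template <int n, typename Func>
void InvokeN(Func&& func, ToySeq seq) {
  for (int i = 0; i < n; ++i) {
    func(seq());
  }
}

// Records the first argument seen by each of three invocations.
std::vector<int> CollectSeqArgs() {
  std::vector<int> seen;
  InvokeN<3>([&seen](int i) { seen.push_back(i); }, ToySeq{});
  assert(seen.size() == 3);  // one entry per invocation
  return seen;
}
```

Calling `CollectSeqArgs()` yields the three indices in invocation order, which is the same numbering each `TaskFoo` instance would see as its first argument.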
The Streaming Library
A blocking operation blocks if the stream is not available (empty or full) until the stream becomes available.
A non-blocking operation always returns immediately.
A destructive operation changes the state of the stream.
A non-destructive operation does not change the state of the stream.
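The four terms can be illustrated with a toy, single-threaded FIFO. This is only a sketch: `ToyFifo` and `DemoToyFifo` are hypothetical names, the class is not `tapa::stream`, and blocking operations are left out because they only make sense with a concurrent producer or consumer.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical bounded FIFO that models the terms above.
template <typename T>
class ToyFifo {
 public:
  explicit ToyFifo(std::size_t depth) : depth_(depth) { assert(depth > 0); }

  // Non-blocking, non-destructive: only inspect the state.
  bool empty() const { return queue_.empty(); }
  bool full() const { return queue_.size() >= depth_; }
  bool try_peek(T& value) const {
    if (queue_.empty()) return false;
    value = queue_.front();
    return true;
  }

  // Non-blocking, destructive: change the state on success, return at once.
  bool try_write(const T& value) {
    if (full()) return false;
    queue_.push_back(value);
    return true;
  }
  bool try_read(T& value) {
    if (queue_.empty()) return false;
    value = queue_.front();
    queue_.pop_front();
    return true;
  }

 private:
  std::size_t depth_;
  std::deque<T> queue_;
};

// Walks through the operations and counts how many checks hold.
int DemoToyFifo() {
  ToyFifo<int> fifo(/*depth=*/2);
  int value = 0;
  int passed = 0;
  passed += !fifo.try_read(value);               // empty: fails immediately
  passed += fifo.try_write(7);                   // stores one token
  passed += fifo.try_write(8);
  passed += fifo.full();
  passed += !fifo.try_write(9);                  // full: fails immediately
  passed += fifo.try_peek(value) && value == 7;  // token stays in the FIFO
  passed += fifo.try_read(value) && value == 7;  // token is removed
  passed += fifo.try_read(value) && value == 8;
  passed += fifo.empty();
  return passed;
}
```

A blocking variant would wait inside `try_read` or `try_write` instead of returning `false`; everything else about the state changes stays the same.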
istream
-
template<typename T>
class istream : public virtual tapa::internal::basic_stream<T>
Provides consumer-side operations to a tapa::stream where it is used as an input.
This class should only be used in task function parameters and should never be instantiated directly.
Subclassed by tapa::internal::unbound_stream< T >
Public Functions
-
inline bool empty() const
Tests whether the stream is empty.
This is a non-blocking and non-destructive operation.
- Returns:
Whether the stream is empty.
-
inline bool try_eot(bool &is_eot) const
Tests whether the next token is EoT.
This is a non-blocking and non-destructive operation.
- Parameters:
is_eot – [out] Uninitialized if the stream is empty. Otherwise, updated to indicate whether the next token is EoT.
- Returns:
Whether is_eot is updated.
-
inline bool eot(bool &is_success) const
Tests whether the next token is EoT.
This is a non-blocking and non-destructive operation.
- Parameters:
is_success – [out] Whether the next token is available.
- Returns:
Whether the next token is available and is EoT.
-
inline bool eot(std::nullptr_t) const
Tests whether the next token is EoT.
This is a non-blocking and non-destructive operation.
- Returns:
Whether the next token is available and is EoT.
-
inline bool try_peek(T &value) const
Peeks the stream.
This is a non-blocking and non-destructive operation.
The next token must not be EoT.
- Parameters:
value – [out] Uninitialized if the stream is empty. Otherwise, updated to be the value of the next token.
- Returns:
Whether value is updated.
-
inline T peek(bool &is_success) const
Peeks the stream.
This is a non-blocking and non-destructive operation.
The next token must not be EoT.
- Parameters:
is_success – [out] Whether the next token is available.
- Returns:
The value of the next token is returned if it is available. Otherwise, default-constructed T() is returned.
-
inline T peek(std::nullptr_t) const
Peeks the stream.
This is a non-blocking and non-destructive operation.
The next token must not be EoT.
- Returns:
The value of the next token is returned if it is available. Otherwise, default-constructed T() is returned.
-
inline T peek(bool &is_success, bool &is_eot) const
Peeks the stream.
This is a non-blocking and non-destructive operation.
- Parameters:
is_success – [out] Whether the next token is available.
is_eot – [out] Set to false if the stream is empty. Otherwise, updated to indicate whether the next token is EoT.
- Returns:
The value of the next token is returned if it is available. Otherwise, default-constructed T() is returned.
-
inline bool try_read(T &value)
Reads the stream.
This is a non-blocking and destructive operation.
The next token must not be EoT.
- Parameters:
value – [out] Uninitialized if the stream is empty. Otherwise, updated to be the value of the next token.
- Returns:
Whether value is updated.
-
inline T read()
Reads the stream.
This is a blocking and destructive operation.
The next token must not be EoT.
- Returns:
The value of the next token.
-
inline istream &operator>>(T &value)
Reads the stream.
This is a blocking and destructive operation.
The next token must not be EoT.
- Parameters:
value – [out] The value of the next token.
- Returns:
*this.
-
inline T read(bool &is_success)
Reads the stream.
This is a non-blocking and destructive operation.
The next token must not be EoT.
- Parameters:
is_success – [out] Whether the next token is available.
- Returns:
The value of the next token is returned if it is available. Otherwise, default-constructed T() is returned.
-
inline T read(std::nullptr_t)
Reads the stream.
This is a non-blocking and destructive operation.
The next token must not be EoT.
- Returns:
The value of the next token is returned if it is available. Otherwise, default-constructed T() is returned.
-
inline T read(const T &default_value, bool *is_success = nullptr)
Reads the stream.
This is a non-blocking and destructive operation.
- Parameters:
default_value – [in] Value to return if the stream is empty.
is_success – [out] Updated to indicate whether the next token is available if is_success is not nullptr.
- Returns:
The value of the next token is returned if it is available. Otherwise, default_value is returned.
-
inline bool try_open()
Consumes an EoT token.
This is a non-blocking and destructive operation.
The next token must be EoT.
- Returns:
Whether an EoT token is consumed.
-
inline void open()
Consumes an EoT token.
This is a blocking and destructive operation.
The next token must be EoT.
istreams
-
template<typename T, uint64_t S>
class istreams : public virtual tapa::internal::basic_streams<T>
Provides consumer-side operations to an array of tapa::stream where they are used as inputs.
This class should only be used in task function parameters and should never be instantiated directly.
Subclassed by tapa::internal::unbound_streams< T, S >
Public Functions
-
inline istream<T> operator[](int pos) const
References a tapa::stream in the array.
- Parameters:
pos – Position of the array reference.
- Returns:
tapa::istream referenced in the array.
Public Static Attributes
-
static constexpr int length = S
Length of the tapa::stream array.
ostream
-
template<typename T>
class ostream : public virtual tapa::internal::basic_stream<T>
Provides producer-side operations to a tapa::stream where it is used as an output.
This class should only be used in task function parameters and should never be instantiated directly.
Subclassed by tapa::internal::unbound_stream< T >
Public Functions
-
inline bool full() const
Tests whether the stream is full.
This is a non-blocking and non-destructive operation.
- Returns:
Whether the stream is full.
-
inline bool try_write(const T &value)
Writes value to the stream.
This is a non-blocking and destructive operation.
- Parameters:
value – [in] The value to write.
- Returns:
Whether value has been written successfully.
-
inline void write(const T &value)
Writes value to the stream.
This is a blocking and destructive operation.
- Parameters:
value – [in] The value to write.
-
inline ostream &operator<<(const T &value)
Writes value to the stream.
This is a blocking and destructive operation.
- Parameters:
value – [in] The value to write.
- Returns:
*this.
-
inline bool try_close()
Produces an EoT token to the stream.
This is a non-blocking and destructive operation.
- Returns:
Whether the EoT token has been written successfully.
-
inline void close()
Produces an EoT token to the stream.
This is a blocking and destructive operation.
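The EoT-related operations on both sides fit together as a small protocol: the producer writes data tokens and then closes the transaction, while the consumer tests for EoT before every read and opens the stream again once EoT is seen. The sketch below models this with a toy channel in plain C++. `ToyToken`, `ToyChannel`, and `SumTransactions` are hypothetical names, and the channel is unbounded and single-threaded, unlike a real `tapa::stream`.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical token: a payload plus an end-of-transaction (EoT) flag.
struct ToyToken {
  int value;
  bool is_eot;
};

// Hypothetical unbounded channel; not tapa::stream.
struct ToyChannel {
  std::deque<ToyToken> tokens;

  // Producer side.
  void write(int value) { tokens.push_back({value, false}); }
  void close() { tokens.push_back({0, true}); }

  // Consumer side. try_eot only inspects the next token.
  bool try_eot(bool& is_eot) const {
    if (tokens.empty()) return false;
    is_eot = tokens.front().is_eot;
    return true;
  }
  // The next token must not be EoT.
  int read() {
    assert(!tokens.empty() && !tokens.front().is_eot);
    int value = tokens.front().value;
    tokens.pop_front();
    return value;
  }
  // The next token must be EoT.
  void open() {
    assert(!tokens.empty() && tokens.front().is_eot);
    tokens.pop_front();
  }
};

// Sends two transactions and returns the sum of each one.
std::vector<int> SumTransactions() {
  ToyChannel ch;
  ch.write(1);
  ch.write(2);
  ch.close();
  ch.write(10);
  ch.close();

  std::vector<int> sums;
  int sum = 0;
  bool is_eot = false;
  while (ch.try_eot(is_eot)) {
    if (is_eot) {
      ch.open();  // consume the EoT token and finish this transaction
      sums.push_back(sum);
      sum = 0;
    } else {
      sum += ch.read();  // next token is data, so reading is allowed
    }
  }
  return sums;
}
```

Checking for EoT before each read is what keeps the consumer within the stated preconditions: `read` is only called when the next token is data, and `open` only when it is EoT.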
ostreams
-
template<typename T, uint64_t S>
class ostreams : public virtual tapa::internal::basic_streams<T>
Provides producer-side operations to an array of tapa::stream where they are used as outputs.
This class should only be used in task function parameters and should never be instantiated directly.
Subclassed by tapa::internal::unbound_streams< T, S >
Public Functions
-
inline ostream<T> operator[](int pos) const
References a tapa::stream in the array.
- Parameters:
pos – Position of the array reference.
- Returns:
tapa::ostream referenced in the array.
Public Static Attributes
-
static constexpr int length = S
Length of the tapa::stream array.
stream
-
template<typename T, uint64_t N = kStreamDefaultDepth>
class stream : public tapa::internal::unbound_stream<T>
Defines a communication channel between two task instances.
Public Functions
-
inline stream()
Constructs a tapa::stream.
-
template<size_t S>
inline stream(const char (&name)[S])
Constructs a tapa::stream with the given name for debugging.
- Parameters:
name – [in] Name of the communication channel (for debugging only).
streams
-
template<typename T, uint64_t S, uint64_t N = kStreamDefaultDepth>
class streams : public tapa::internal::unbound_streams<T, S>
Defines an array of tapa::stream.
Public Functions
-
inline streams()
Constructs a tapa::streams array.
-
template<size_t name_length>
inline streams(const char (&name)[name_length])
Constructs a tapa::streams array with the given base name for debugging.
The actual name of each tapa::stream would be name[i].
- Parameters:
name – [in] Base name of the streams (for debugging only).
-
inline stream<T, N> operator[](int pos) const
References a tapa::stream in the array.
- Parameters:
pos – Position of the array reference.
- Returns:
tapa::stream referenced in the array.
Public Static Attributes
-
static constexpr int length = S
Count of tapa::stream in the array.
-
static constexpr int depth = N
Depth of each tapa::stream in the array.
The MMAP Library
async_mmap
-
template<typename T>
class async_mmap : public tapa::mmap<T>
Defines a view of a piece of consecutive memory with asynchronous random accesses.
Public Types
-
using addr_t = int64_t
Type of the addresses.
-
using resp_t = uint8_t
Type of the write responses.
Public Members
-
ostream<addr_t> read_addr
Provides access to the read address channel.
Each value written to this channel triggers an asynchronous memory read request. Consecutive requests may be coalesced into a long burst request.
-
istream<T> read_data
Provides access to the read data channel.
Each value read from this channel represents the data retrieved from the underlying memory system.
-
ostream<addr_t> write_addr
Provides access to the write address channel.
Each value written to this channel triggers an asynchronous memory write request. Consecutive requests may be coalesced into a long burst request.
mmap
-
template<typename T>
class mmap
Defines a view of a piece of consecutive memory with synchronous random accesses.
Subclassed by tapa::async_mmap< T >, tapa::hmap< T, chan_count, chan_size >
Public Functions
-
inline explicit mmap(T *ptr)
Constructs a tapa::mmap with unknown size.
- Parameters:
ptr – Pointer to the start of the mapped memory.
-
inline mmap(T *ptr, uint64_t size)
Constructs a tapa::mmap with the given size.
- Parameters:
ptr – Pointer to the start of the mapped memory.
size – Size of the mapped memory (in unit of element count).
-
template<typename Container>
inline explicit mmap(Container &container)
Constructs a tapa::mmap from the given container.
- Parameters:
container – Container holding a tapa::mmap. Must implement data() and size().
-
inline operator T*()
Implicitly casts to a regular pointer.
tapa::mmap should be used just like a pointer in the kernel.
-
inline mmap &operator++()
Increments the start of the mapped memory.
- Returns:
The incremented tapa::mmap.
-
inline mmap &operator--()
Decrements the start of the mapped memory.
- Returns:
The decremented tapa::mmap.
-
inline mmap operator++(int)
Increments the start of the mapped memory.
- Returns:
The tapa::mmap before incrementation.
-
inline mmap operator--(int)
Decrements the start of the mapped memory.
- Returns:
The tapa::mmap before decrementation.
-
inline T *get() const
Retrieves the start of the mapped memory.
This should be used on the host only.
- Returns:
The start of the mapped memory.
-
inline uint64_t size() const
Retrieves the size of the mapped memory.
This should be used on the host only.
- Returns:
The size of the mapped memory (in unit of element count).
-
template<uint64_t N>
inline mmap<vec_t<T, N>> vectorized() const
Reinterprets the element type of the mapped memory as tapa::vec_t<T, N>.
This should be used on the host only. The size of mapped memory must be a multiple of N.
- Template Parameters:
N – Vector length of the new element type.
- Returns:
tapa::mmap of the same piece of memory but of type tapa::vec_t<T, N>.
-
template<typename U>
inline mmap<U> reinterpret() const
Reinterprets the element type of the mapped memory as U.
This should be used on the host only. Both T and U must have standard layout. The host memory pointer must be properly aligned. If sizeof(U) > sizeof(T), the size of mapped memory must be a multiple of sizeof(U)/sizeof(T) (which must be an integer itself). If sizeof(U) < sizeof(T), sizeof(T) must be a multiple of sizeof(U).
- Template Parameters:
U – The new element type.
- Returns:
tapa::mmap<U> of the same piece of memory.
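The size constraints for reinterpret() can be checked with a few lines of arithmetic. The helper below is a sketch under stated assumptions: `ReinterpretedSize` is a hypothetical function that is not part of libtapa, and it assumes the reinterpreted view covers exactly the same number of bytes as the original one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: computes the element count after reinterpreting
// `size` elements of `sizeof_t` bytes as elements of `sizeof_u` bytes.
// Returns false if the size constraints described above are violated.
inline bool ReinterpretedSize(std::uint64_t size, std::uint64_t sizeof_t,
                              std::uint64_t sizeof_u,
                              std::uint64_t& new_size) {
  assert(sizeof_t > 0 && sizeof_u > 0);  // element sizes are never zero
  if (sizeof_u > sizeof_t) {
    // sizeof(U)/sizeof(T) must be an integer, and `size` a multiple of it.
    if (sizeof_u % sizeof_t != 0) return false;
    const std::uint64_t ratio = sizeof_u / sizeof_t;
    if (size % ratio != 0) return false;
    new_size = size / ratio;
    return true;
  }
  // sizeof(U) <= sizeof(T): sizeof(T) must be a multiple of sizeof(U).
  if (sizeof_t % sizeof_u != 0) return false;
  new_size = size * (sizeof_t / sizeof_u);
  return true;
}
```

For example, eight 4-byte elements can be viewed as two 16-byte elements, while six 4-byte elements cannot, because six is not a multiple of 16/4.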
mmaps
-
template<typename T, uint64_t S>
class mmaps
Defines an array of tapa::mmap.
Public Functions
-
template<typename PtrContainer, typename SizeContainer>
inline mmaps(const PtrContainer &pointers, const SizeContainer &sizes)
Constructs a tapa::mmap array from the given pointers and sizes.
- Parameters:
pointers – Pointers to the start of the array of mapped memory.
sizes – Sizes of each mapped memory (in unit of element count).
-
template<typename Container>
inline explicit mmaps(Container &container)
Constructs a tapa::mmap array from the given container.
- Parameters:
container – Container holding an array of tapa::mmap. container must implement operator[] that returns a container suitable for constructing a tapa::mmap.
-
inline mmap<T> &operator[](int idx)
References a tapa::mmap in the array.
-
template<uint64_t N>
inline mmaps<vec_t<T, N>, S> vectorized() const
Reinterprets the element type of each mapped memory as tapa::vec_t<T, N>.
This should be used on the host only. The size of each mapped memory must be a multiple of N.
- Template Parameters:
N – Vector length of the new element type.
- Returns:
tapa::mmaps of the same pieces of memory but of type tapa::vec_t<T, N>.
-
template<typename U>
inline mmaps<U, S> reinterpret() const
Reinterprets the element type of each mapped memory as U.
This should be used on the host only. Both T and U must have standard layout. The host memory pointers must be properly aligned. If sizeof(U) > sizeof(T), the size of each mapped memory must be a multiple of sizeof(U)/sizeof(T) (which must be an integer itself). If sizeof(U) < sizeof(T), sizeof(T) must be a multiple of sizeof(U).
- Template Parameters:
U – The new element type.
- Returns:
tapa::mmaps<U, S> of the same pieces of memory.
The Utility Library
widthof
-
template<typename T>
inline constexpr int tapa::widthof()
Queries width (in bits) of the type.
- Template Parameters:
T – Type to be queried.
- Returns:
T::width if it exists, sizeof(T) * CHAR_BIT otherwise.
-
template<typename T>
inline constexpr int tapa::widthof(T object)
Queries width (in bits) of the object.
Note
Unlike sizeof, the argument expression is evaluated (though unused).
- Template Parameters:
T – Type of object.
- Parameters:
object – Object to be queried.
- Returns:
T::width if it exists, sizeof(T) * CHAR_BIT otherwise.
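The "T::width if it exists, sizeof(T) * CHAR_BIT otherwise" rule is a classic member-detection pattern. The following sketch shows one way such a query could be written in standard C++. All names (`ToyInt12`, `ToyMakeVoid`, `ToyWidthOf`, `toy_widthof`) are hypothetical and chosen for illustration; this is not TAPA's actual implementation.

```cpp
#include <climits>

// Hypothetical type with an explicit bit width, in the style of
// arbitrary-precision integer types.
struct ToyInt12 {
  static constexpr int width = 12;
  short storage;
};

// Maps any list of types to void; used only to detect valid expressions.
template <typename...>
struct ToyMakeVoid {
  using type = void;
};

// Fallback: the type has no `width` member, so use its storage size in bits.
template <typename T, typename = void>
struct ToyWidthOf {
  static constexpr int value = static_cast<int>(sizeof(T) * CHAR_BIT);
};

// Preferred: selected when `T::width` is a valid expression.
template <typename T>
struct ToyWidthOf<T, typename ToyMakeVoid<decltype(T::width)>::type> {
  static constexpr int value = T::width;
};

// Type-based query, analogous in spirit to the first overload above.
template <typename T>
constexpr int toy_widthof() {
  return ToyWidthOf<T>::value;
}

static_assert(toy_widthof<ToyInt12>() == 12, "uses T::width when present");
static_assert(toy_widthof<char>() == CHAR_BIT, "falls back to sizeof");
```

The partial specialization is preferred whenever `decltype(T::width)` is well-formed; for plain types such as `char` the substitution fails silently and the primary template supplies the `sizeof`-based answer.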
The TAPA Compiler (tapac)
Compiles TAPA C++ code into packaged RTL.
usage: tapac [-h] [-V] [-v] [-q] [--tapacc file] [--work-dir dir]
[--top TASK_NAME] [--clock-period CLOCK_PERIOD]
[--part-num PART_NUM] [--platform PLATFORM] [--cflags CFLAGS]
[-o file] [--other-hls-configs OTHER_HLS_CONFIGS] [--run-tapacc]
[--run-hls] [--generate-task-rtl] [--run-floorplanning]
[--generate-top-rtl] [--pack-xo] [--connectivity file]
[--enable-floorplan] [--run-floorplan-dse]
[--floorplan-dse-step INT] [--enable-hbm-binding-adjustment]
[--floorplan-output file] [--constraint file]
[--register-level INT] [--min-area-limit 0-1]
[--max-area-limit 0-1]
[--min-slr-width-limit MIN_SLR_WIDTH_LIMIT]
[--max-slr-width-limit MAX_SLR_WIDTH_LIMIT]
[--max-search-time MAX_SEARCH_TIME] [--enable-synth-util]
[--disable-synth-util]
[--max-parallel-synth-jobs MAX_PARALLEL_SYNTH_JOBS]
[--floorplan-pre-assignments FLOORPLAN_PRE_ASSIGNMENTS]
[--read-only-args READ_ONLY_ARGS]
[--write-only-args WRITE_ONLY_ARGS]
[--additional-fifo-pipelining]
[--floorplan-strategy {QUICK_FLOORPLANNING,SLR_LEVEL_FLOORPLANNING,HALF_SLR_LEVEL_FLOORPLANNING}]
[--floorplan-opt-priority {AREA_PRIORITIZED,SLR_CROSSING_PRIORITIZED}]
file
Positional Arguments
- file
Input file, usually TAPA C++ source code.
options
- -V, --version
show program’s version number and exit
- -v, --verbose
Increase logging verbosity.
- -q, --quiet
Decrease logging verbosity.
- --tapacc
Use a specific tapacc binary instead of searching in PATH.
- --work-dir
Use a specific working directory instead of a temporary one.
- --top
Name of the top-level task.
- --clock-period
Target clock period in nanoseconds.
- --part-num
Target FPGA part number.
- --platform
Target Vitis platform.
- --cflags
Compiler flags for the kernel, may appear many times.
Default: []
- -o, --output
Output file.
- --other-hls-configs
Additional compile options for Vitis HLS. E.g., --other-hls-configs "config_compile -unsafe_math_optimizations"
Default: “”
Compilation Steps
Selectively run compilation steps (advanced usage).
- --run-tapacc
Run tapacc and create program.json.
- --run-hls
Run HLS and generate RTL tarballs.
- --generate-task-rtl
Generate the RTL for each task
- --run-floorplanning
Floorplan the design.
- --generate-top-rtl
Generate the RTL for the top-level task
- --pack-xo
Package RTL as a Xilinx object file.
Floorplanning
Coarse-grained floorplanning via AutoBridge (advanced usage).
- --connectivity
Input connectivity.ini specification for mmaps. This is the same file passed to v++.
- --enable-floorplan
Enable the floorplanning step. This option can be omitted if the --floorplan-output option is given.
Default: False
- --run-floorplan-dse
Generate multiple floorplan configurations
Default: False
- --floorplan-dse-step
The minimal gap of slr_crossing_width between two design points.
Default: 500
- --enable-hbm-binding-adjustment
Allow the arguments of the top function to be bound to different physical ports based on the floorplan results. This overrides the binding from the --connectivity option.
Default: False
- --floorplan-output
Specify the name of the output Tcl file that encodes the floorplan results. If this option is provided, floorplanning is enabled automatically.
- --constraint
[deprecated] Specify the name of the output tcl file that encodes the floorplan results.
- --register-level
Use a specific register level of top-level scalar signals instead of inferring from the floorplanning directive.
- --min-area-limit
The floorplanner will try to find a solution in which the resource usage of each slot is between min-area-limit and max-area-limit.
Default: 0.65
- --max-area-limit
The floorplanner will try to find a solution in which the resource usage of each slot is between min-area-limit and max-area-limit.
Default: 0.85
- --min-slr-width-limit
The floorplanner will try to find a solution in which the number of SLR-crossing wires at each die boundary is between min-slr-width-limit and max-slr-width-limit.
Default: 10000
- --max-slr-width-limit
The floorplanner will try to find a solution in which the number of SLR-crossing wires at each die boundary is between min-slr-width-limit and max-slr-width-limit.
Default: 15000
- --max-search-time
The max runtime (in seconds) of each ILP solving process
Default: 600
- --enable-synth-util
Enable post-synthesis resource utilization report for floorplanning.
Default: False
- --disable-synth-util
Disable post-synthesis resource utilization report for floorplanning.
Default: True
- --max-parallel-synth-jobs
Limit the number of parallel synthesis jobs if --enable-synth-util is set.
Default: 8
- --floorplan-pre-assignments
A JSON file of type Dict[str, List[str]] storing the manual assignments to be used in floorplanning. The key is the region name; the value is a list of modules. Replaces the outdated --directive option.
- --read-only-args
Optionally specify which mmap/async_mmap arguments of the top function are read-only. Regular expression supported.
Default: []
- --write-only-args
Optionally specify which mmap/async_mmap arguments of the top function are write-only. Regular expression supported.
Default: []
Strategy
Choose different strategies for floorplanning and code generation (advanced usage).
- --additional-fifo-pipelining
Pipeline a FIFO whose source and destination are in the same region.
Default: False
- --floorplan-strategy
Possible choices: QUICK_FLOORPLANNING, SLR_LEVEL_FLOORPLANNING, HALF_SLR_LEVEL_FLOORPLANNING
Override the automatically chosen floorplanning method. QUICK_FLOORPLANNING: use iterative bi-partitioning, which has the best scalability; typically used for designs with hundreds of tasks. SLR_LEVEL_FLOORPLANNING: only partition the device into SLR-level slots, without half-SLR-level floorplanning. HALF_SLR_LEVEL_FLOORPLANNING: partition the device into half-SLR-level slots.
Default: “HALF_SLR_LEVEL_FLOORPLANNING”
- --floorplan-opt-priority
Possible choices: AREA_PRIORITIZED, SLR_CROSSING_PRIORITIZED
AREA_PRIORITIZED: give priority to the area usage ratio of each slot. SLR_CROSSING_PRIORITIZED: give priority to the number of SLR crossing wires.
Default: “AREA_PRIORITIZED”