Beignet is an open-source OpenCL implementation for Intel graphics cores on Linux.
Activity Earlier In The Year
- libocl: Add three work-item built-in functions
Pan Xiuli: Add the global and local linear ID functions, computed from the global/local ID, size, and offset.
- gbe: add vec_type_hint's type into functionAttributes
Luo Xionghu: For SPIR kernels, the user may call clGetKernelInfo with CL_KERNEL_ATTRIBUTES to query the functionAttributes.
- add utest to demo how to run a CM kernel via OpenCL APIs
Guo Yejun: In this test case, the CM kernel is in VISA binary format rather than GenX binary format, so the CM jitter is needed to compile the kernel from VISA format to GenX format. Please refer to cmrt_package_path/jitter/readme.txt to prepare the jitter.
- make Beignet an intermediate layer of CMRT
Guo Yejun: CMRT is C for Media Runtime on Intel GPU, see
- Add an option which can set the benchmark unit properly
Meng Mengmeng: For benchmarks, the units are varied e.g.
- Backend: Add gen9 barrier predication setting
Pan Xiuli: Gen9 has a different context for emitting BarrierInst, which contains a wait instruction, and the wait instruction must not be predicated.
- Backend: add debugwait function
Pan Xiuli: Use the wait function to add a debug function:
void debugwait(void)
This function hangs the GPU until the GPU is reset or the host sends something to release it.
- Backend: enable to choose notification register
Pan Xiuli: There are 3 notification registers that wait can use, so we should be able to choose which one we'd like to use.
- Add profiling info APIs to runtime
- Backend: Add profilingProlog function for GenContext
Junyan He: The profilingProlog will collect useful information for profiling, including XYZ global range and prolog timestamp.
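The linear ID calculation mentioned in the libocl entry above can be sketched on the host side as follows. This is a minimal illustration of the formulas the OpenCL 2.0 specification gives for get_global_linear_id and get_local_linear_id, not Beignet's actual libocl code. The work_item struct and both function names are hypothetical and exist only for this sketch; unused dimensions are assumed to have size 1, ID 0, and offset 0.

```c
#include <stddef.h>

/* Hypothetical host-side model of one work-item's position in an NDRange.
 * Each array holds what the matching OpenCL built-in would return for
 * dimensions 0, 1 and 2. */
typedef struct {
    size_t global_id[3];     /* get_global_id(d)     */
    size_t global_size[3];   /* get_global_size(d)   */
    size_t global_offset[3]; /* get_global_offset(d) */
    size_t local_id[3];      /* get_local_id(d)      */
    size_t local_size[3];    /* get_local_size(d)    */
} work_item;

/* Global linear ID: subtract the offset in each dimension, then flatten
 * with dimension 0 varying fastest. */
size_t global_linear_id(const work_item *w)
{
    size_t x = w->global_id[0] - w->global_offset[0];
    size_t y = w->global_id[1] - w->global_offset[1];
    size_t z = w->global_id[2] - w->global_offset[2];
    return (z * w->global_size[1] + y) * w->global_size[0] + x;
}

/* Local linear ID: same flattening inside the work-group; local IDs
 * carry no offset. */
size_t local_linear_id(const work_item *w)
{
    return (w->local_id[2] * w->local_size[1] + w->local_id[1])
               * w->local_size[0]
           + w->local_id[0];
}
```

For a 4x3x2 global range with offset (2, 1, 0), the work-item with global ID (5, 3, 1) is the last of the 24 items, so its global linear ID is 23.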
Most Popular This Year
- reimplement structurize algorithm
Luo Xionghu: Serial, loop, and if patterns are matched from top to bottom.
- add bswap64 for gen7/gen75 and gen8 separately
Luo Xionghu: As the long type data layout is not contiguous on the gen7/gen75 platforms, the indirect address access pattern is a bit different from gen8.
- runtime: Add cl device's standalone extension
Junyan He: The cl device may have different extensions from the platform.
- add basic function to dump Selection IR
Guo Yejun: Selection IR is a representation between Gen IR and Gen ASM. It is almost a Gen instruction, but *before* register allocation.
Only basic dumping is supported; it is not fully completed yet.
- GBE: Add llvm3.7 support
Yang Rong: Move all LLVM-related includes to llvm_includes.
- Backend: Add half support for CHV and SKL.
- add basic structure for selection IR optimization
Guo Yejun: The idea is that many optimizations can be done at the selection IR level, which is nearly ISA-like but *before* physical register allocation.
- enable CL_UNSIGNED_INT8 for CL_RG to fix regression
Guo Yejun: The regression occurs when only CL_UNORM_INT8 is enabled for CL_RG. The image copy implementation uses an internal kernel that treats all formats as integer formats, so the format becomes unknown while CL_UNSIGNED_INT8 is not enabled.
- GBE: optimize phi elimination
Ruiling Song: This is a special optimization for the situation below:
bb2:
  x = phi [x1, bb1], [x2, bb2]
  x2 = x+1;
after de-ssa:
bb2:
  mov x, x-copy
  add x2, x, 1
  mov x-copy, x2
Obviously x, x-copy and x2 can be mapped to the same virtual register.
v2: only coalesce if the source register comes from insn def.
- Make tgamma meet the accuracy standard
Rebecca N. Palmer: The old tgamma=exp(lgamma) implementation had high rounding error on large outputs, exceeding the 16ulp specification for approx.