High-throughput AES encryption/decryption is a necessity for many modern embedded systems. This article presents Maestro, a high-performance yet cost-efficient AES system. Maestro can be used in a wide range of embedded applications with various requirements and limitations. Maestro is about one million times faster than the pure software implementation. The Maestro architecture is composed of two major components: a soft processor for system initialization and control, and a hardware AES engine for high-performance AES encryption/decryption. A ten-stage implicit pipelined architecture is used for the AES engine. Two novel techniques are proposed in the design of the AES engine, which enable it to reach a throughput of 12.8 Gbps: first, tightly coupled encryption and round key generation units in the encryption unit, and second, ahead-of-time round key generation in the decryption unit. The Altera DE2-115 development and educational FPGA board is used as the platform for Maestro. In the proposed architecture, the DMA modules act as interfaces between data sources and data sinks by loading the input data into the AES engine and transferring the encrypted and generated test data to the target memories.
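The two round key strategies named above can be illustrated in software. The following is a minimal sketch, assuming AES-128 and the standard FIPS-197 key expansion; it is not the Maestro hardware design, and the function names (`next_round_key`, `encryption_round_keys`, `decryption_round_keys`) are hypothetical. It shows that each encryption round key can be derived from the previous one, so key generation can run in step with the rounds, whereas decryption consumes the keys in reverse order and therefore needs them generated beforehand.

```python
"""Sketch of AES-128 round key generation (FIPS-197 key expansion).

Illustrative only: names and structure are assumptions, not Maestro's design.
"""


def _gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def _build_sbox():
    """Build the AES S-box: multiplicative inverse followed by the affine map."""
    sbox = [0] * 256
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the inverse of x in GF(2^8)
                inv = _gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):  # XOR with the four left rotations of inv
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox[x] = s ^ 0x63
    return sbox


SBOX = _build_sbox()


def next_round_key(prev, rcon):
    """Derive round key i+1 from round key i (16 bytes) and the round constant."""
    words = [prev[i:i + 4] for i in range(0, 16, 4)]
    t = words[3][1:] + words[3][:1]             # RotWord
    t = bytes(SBOX[b] for b in t)               # SubWord
    t = bytes([t[0] ^ rcon]) + t[1:]            # add round constant
    out = []
    for word in words:
        t = bytes(a ^ b for a, b in zip(word, t))  # chain XOR across the words
        out.append(t)
    return b"".join(out)


def encryption_round_keys(key):
    """Yield round keys 0..10 one at a time, in the order encryption uses them."""
    if len(key) != 16:
        raise ValueError("this sketch handles AES-128 keys only")
    rk, rcon = bytes(key), 1
    yield rk
    for _ in range(10):
        rk = next_round_key(rk, rcon)
        rcon = _gf_mul(rcon, 2)
        yield rk


def decryption_round_keys(key):
    """Generate all round keys ahead of time and return them last-first."""
    return list(encryption_round_keys(key))[::-1]


if __name__ == "__main__":
    demo_key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS-197 example key
    for i, rk in enumerate(encryption_round_keys(demo_key)):
        print(i, rk.hex())
```

In this sketch the generator form mirrors a key unit that advances one step per round alongside the cipher, while the list form mirrors a decryptor that cannot start until the last round key exists.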