https://github.com/datawhalechina/tiny-universe/blob/main/content/TinyTransformer/tiny_transformer.py#L57 In the code implementation, bias is a lower-triangular matrix.
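For context, a lower-triangular 0/1 matrix is the usual shape of a causal (autoregressive) attention mask. Entry (i, j) is 1 when query position i may attend to key position j, which holds only for j <= i, so each token sees itself and earlier tokens but never later ones. The sketch below assumes bias plays that role. It is a minimal pure-Python illustration, not the code from tiny_transformer.py, and the helper names causal_mask and masked_softmax are hypothetical.

```python
import math


def causal_mask(n):
    # Hypothetical helper: lower-triangular 0/1 matrix.
    # Row i (query) may attend to columns j <= i (keys); future positions are 0.
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def masked_softmax(scores, mask):
    # Hypothetical helper: set disallowed (future) positions to -inf before a
    # row-wise softmax, similar in spirit to masked_fill(mask == 0, -inf)
    # followed by softmax in PyTorch-style attention code.
    out = []
    for row_scores, row_mask in zip(scores, mask):
        masked = [s if m else -math.inf for s, m in zip(row_scores, row_mask)]
        peak = max(masked)  # finite, since the diagonal is never masked
        exps = [math.exp(s - peak) for s in masked]  # exp(-inf) == 0.0
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


mask = causal_mask(3)
for row in mask:
    print(row)
# [1, 0, 0]
# [1, 1, 0]
# [1, 1, 1]

# With equal raw scores, each row spreads its weight evenly over the
# allowed (past and current) positions only.
weights = masked_softmax([[0.0, 0.0, 0.0]] * 3, mask)
```

With uniform scores, row 0 puts all its weight on position 0, row 1 splits it between positions 0 and 1, and row 2 spreads it over all three, which is exactly what the lower-triangular shape enforces.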