
Commit

up
haoxins committed May 2, 2024
1 parent 7b498e0 commit b6ac529
Showing 2 changed files with 61 additions and 7 deletions.
65 changes: 59 additions & 6 deletions 2024/paper-blog-ml.md
@@ -5,12 +5,66 @@
date: 2023-09-07
---



```
```

---

- [OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework](https://arxiv.org/abs/2404.14619)
- Submitted on 22 Apr 2024
- [CoreNet](https://github.com/apple/corenet)

```
At the core of OpenELM lies layer-wise scaling,
enabling more efficient parameter allocation across layers.
This method utilizes smaller latent dimensions in the attention
and feed-forward modules of the transformer layers closer to the input,
and gradually widens the layers as they approach the output.
```

```
We adopt the decoder-only transformer-based architecture.
Following state-of-the-art LLMs, we:
(1) do not use learnable bias parameters in any fully-connected
(a.k.a., linear) layers,
(2) apply pre-normalization using RMSNorm, and
use rotary positional embedding (RoPE)
for encoding positional information,
(3) use grouped query attention (GQA)
instead of multi-head attention (MHA),
(4) replace the feed-forward network (FFN) with SwiGLU FFN,
(5) use flash attention for computing the scaled
dot-product attention, and
(6) use the same tokenizer as LLaMA.
```
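
A minimal PyTorch sketch of points (1), (2) and (4) above: bias-free linear layers, RMSNorm pre-normalization, and a SwiGLU FFN. This is not taken from CoreNet; the class names and dimensions are illustrative.

```
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Pre-normalization via root-mean-square, used in place of LayerNorm."""

    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Normalize by the RMS of the features, then apply a learned scale.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)


class SwiGLUFFN(nn.Module):
    """SwiGLU feed-forward network with bias-free linear layers."""

    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        # SiLU-gated linear unit followed by a down projection.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


x = torch.randn(2, 16, 512)                # (batch, sequence, model dim)
y = SwiGLUFFN(512, 1376)(RMSNorm(512)(x))  # pre-norm, then FFN
print(y.shape)                             # torch.Size([2, 16, 512])
```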

```
Existing LLMs use the same configuration for each
transformer layer in the model, resulting in a uniform
allocation of parameters across layers.
Unlike these models, each transformer layer in OpenELM
has a different configuration
(e.g., number of heads and feed-forward network dimension),
resulting in a variable number of parameters in each layer of the model.
This lets OpenELM better utilize the available parameter budget
for achieving higher accuracies. We implement this non-uniform
allocation of parameters across layers using layer-wise scaling.
```

```
Layer-wise scaling.
A standard transformer layer is composed of
multi-head attention (MHA) and feed-forward network (FFN).
For non-uniform allocation of parameters in the
transformer layer, we adjust the number of attention heads
and the FFN multiplier in each transformer layer.
```
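
A rough sketch of what layer-wise scaling means in practice: linearly interpolate an attention width scale and an FFN multiplier from the first layer to the last, then derive each layer's head count and FFN dimension from them. The interpolation scheme and the ranges below are assumptions for illustration, not the paper's exact parameterization.

```
def layer_wise_scaling(num_layers, d_model, head_dim,
                       alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Per-layer head counts and FFN widths via linear interpolation.

    The alpha/beta ranges here are illustrative assumptions,
    not OpenELM's actual configuration.
    """
    configs = []
    for i in range(num_layers):
        t = i / max(1, num_layers - 1)            # 0 at the input, 1 at the output
        a = alpha[0] + (alpha[1] - alpha[0]) * t  # attention width scale
        b = beta[0] + (beta[1] - beta[0]) * t     # FFN multiplier
        configs.append({
            "layer": i,
            "num_heads": max(1, round(a * d_model / head_dim)),
            "ffn_dim": round(b * d_model),
        })
    return configs


for cfg in layer_wise_scaling(num_layers=4, d_model=512, head_dim=64):
    print(cfg)
# Narrow attention and FFN near the input, progressively wider toward the output.
```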

> Honestly there is not much here: it is mostly a project introduction,
> plus an announcement that Apple has entered the arena.
> It feels a bit like a hastily assembled result.

---

- [Multimodal Foundation Models: From Specialists to General-Purpose Assistants](https://arxiv.org/abs/2309.10020)

@@ -56,14 +110,13 @@
model the deep interaction between image and text representations.
###



---

- [GitHub](https://github.com/HKUDS/OpenGraph)

---

- [GitHub](https://github.com/HKUDS/GraphEdit)

---

@@ -546,7 +599,7 @@
technology while working on metrics and robust evaluation.

- [A Review on Graph Neural Network Methods in Financial Applications](https://arxiv.org/abs/2111.15367)
- 2021 (v1), 2022 (v2)
- [GitHub](https://github.com/ProsusAI/finBERT)

```
While GCN equally treats the neighbors of the target node,
@@ -597,7 +650,7 @@
efficiency of GNN algorithms is worth further exploration.
---

- [Relational Deep Learning: Graph Representation Learning on Relational Databases](https://arxiv.org/abs/2312.04615)
- [GitHub](https://github.com/snap-stanford/relbench)
- Note that "Databases" in this context really means "Datasets"

```
3 changes: 2 additions & 1 deletion 2024/references.md
@@ -25,7 +25,8 @@
date: 2023-12-21

### Crypto

- [CoinGlass](https://www.coinglass.com)
- [BrowserLeaks](https://browserleaks.com)

### Some good WeChat articles

