Question about the starting value of the Item ID Index #24
Hello, may I ask if you mean something like the following?
Yes, exactly! Thank you for clearing up my confusion. Also, while running experiments recently I came across another interesting question, and if you have time I would be grateful for your help 🙏. The specific question is as follows: for sequential recommendation, is RecBole's implementation a seq2seq style? For example, for item_seq [2, 6, 1, 7] and pos_item [8]:
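(A minimal sketch of the two target layouts being compared, assuming, as in the reply further down, that approach 1 supervises only the final position while approach 2 supervises every position with the item that follows it; the variable names are only illustrative.)

```python
# Illustrative only (not RecBole code): the two target layouts for
# item_seq [2, 6, 1, 7] with pos_item [8].
item_seq = [2, 6, 1, 7]
pos_item = 8

# Approach 1: one training instance, only the final position carries a target.
way1_targets = [pos_item]                 # [8]

# Approach 2 (seq2seq, as in the original SASRec repo): every position predicts
# the item that follows it, so one sequence yields len(item_seq) targets.
way2_targets = item_seq[1:] + [pos_item]  # [6, 1, 7, 8]

print(way1_targets, way2_targets)
```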
I made some changes to the SASRec model myself to implement approach 2, but found that recommendation performance drops substantially, e.g. from 0.0774 to 0.0392 on the ml-100k dataset. I would be grateful for any advice, thank you very much! 🙏 The modified code is as follows:
import torch
from recbole.model.abstract_recommender import SequentialRecommender
from recbole.model.layers import TransformerEncoder
from recbole.model.loss import BPRLoss
from torch import nn


class SASRec(SequentialRecommender):
    r"""
    SASRec is the first sequential recommender based on self-attentive mechanism.

    NOTE:
        In the author's implementation, the Point-Wise Feed-Forward Network (PFFN) is implemented
        by CNN with 1x1 kernel. In this implementation, we follow the original BERT implementation
        using Fully Connected Layer to implement the PFFN.
    """

    def __init__(self, config, dataset):
        super(SASRec, self).__init__(config, dataset)

        # load parameters info
        self.n_layers = config["n_layers"]
        self.n_heads = config["n_heads"]
        self.hidden_size = config["hidden_size"]  # same as embedding_size
        self.inner_size = config["inner_size"]  # the dimensionality in feed-forward layer
        self.hidden_dropout_prob = config["hidden_dropout_prob"]
        self.attn_dropout_prob = config["attn_dropout_prob"]
        self.hidden_act = config["hidden_act"]
        self.layer_norm_eps = config["layer_norm_eps"]
        self.initializer_range = config["initializer_range"]
        self.loss_type = config["loss_type"]

        # define layers and loss
        self.item_embedding = nn.Embedding(self.n_items, self.hidden_size, padding_idx=0)
        self.position_embedding = nn.Embedding(self.max_seq_length, self.hidden_size)
        self.trm_encoder = TransformerEncoder(
            n_layers=self.n_layers,
            n_heads=self.n_heads,
            hidden_size=self.hidden_size,
            inner_size=self.inner_size,
            hidden_dropout_prob=self.hidden_dropout_prob,
            attn_dropout_prob=self.attn_dropout_prob,
            hidden_act=self.hidden_act,
            layer_norm_eps=self.layer_norm_eps,
        )
        self.LayerNorm = nn.LayerNorm(self.hidden_size, eps=self.layer_norm_eps)
        self.dropout = nn.Dropout(self.hidden_dropout_prob)

        if self.loss_type == "BPR":
            self.loss_fct = BPRLoss()
        elif self.loss_type == "CE":
            self.loss_fct = nn.CrossEntropyLoss()
        else:
            raise NotImplementedError("Make sure 'loss_type' in ['BPR', 'CE']!")

        # parameters initialization
        self.apply(self._init_weights)

    def _init_weights(self, module):
        """Initialize the weights"""
        if isinstance(module, (nn.Linear, nn.Embedding)):
            # Slightly different from the TF version which uses truncated_normal for initialization
            # cf https://github.com/pytorch/pytorch/pull/5617
            module.weight.data.normal_(mean=0.0, std=self.initializer_range)
        elif isinstance(module, nn.LayerNorm):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)
        if isinstance(module, nn.Linear) and module.bias is not None:
            module.bias.data.zero_()

    def forward(self, item_seq, item_seq_len):
        position_ids = torch.arange(item_seq.size(1), dtype=torch.long, device=item_seq.device)
        position_ids = position_ids.unsqueeze(0).expand_as(item_seq)
        position_embedding = self.position_embedding(position_ids)

        item_emb = self.item_embedding(item_seq)
        input_emb = item_emb + position_embedding
        input_emb = self.LayerNorm(input_emb)
        input_emb = self.dropout(input_emb)

        extended_attention_mask = self.get_attention_mask(item_seq)
        trm_output = self.trm_encoder(input_emb, extended_attention_mask, output_all_encoded_layers=True)
        output = trm_output[-1]

        #### for middle item prediction ###
        # keep the hidden states of all non-padding positions except the last
        # valid one; each of them is later asked to predict its next item
        mask = item_seq != 0
        mask[torch.arange(mask.size(0), device=mask.device), item_seq_len - 1] = False
        middle_output = output[mask]
        ###################################

        # for last item prediction
        target_output = self.gather_indexes(output, item_seq_len - 1)
        return middle_output, target_output

    def calculate_loss(self, interaction):
        item_seq = interaction[self.ITEM_SEQ]
        item_seq_len = interaction[self.ITEM_SEQ_LEN]
        middle_output, seq_output = self.forward(item_seq, item_seq_len)
        pos_items = interaction[self.POS_ITEM_ID]
        if self.loss_type == "BPR":
            # neg_items = interaction[self.NEG_ITEM_ID]
            # pos_items_emb = self.item_embedding(pos_items)
            # neg_items_emb = self.item_embedding(neg_items)
            # pos_score = torch.sum(seq_output * pos_items_emb, dim=-1)  # [B]
            # neg_score = torch.sum(seq_output * neg_items_emb, dim=-1)  # [B]
            # loss = self.loss_fct(pos_score, neg_score)
            # return loss
            raise NotImplementedError
        else:  # self.loss_type = 'CE'
            test_item_emb = self.item_embedding.weight
            # last item prediction loss
            logits = torch.matmul(seq_output, test_item_emb.transpose(0, 1))
            loss = self.loss_fct(logits, pos_items)
            # return loss

            #### for middle item prediction ###
            # next-item targets: the sequence shifted left by one position,
            # flattened in the same order as middle_output above
            item_seq = item_seq[:, 1:]
            mask = item_seq != 0
            targets = item_seq[mask]
            middle_logits = torch.matmul(middle_output, test_item_emb.transpose(0, 1))
            mid_loss = self.loss_fct(middle_logits, targets)
            ###################################
            return loss + mid_loss

    def predict(self, interaction):
        item_seq = interaction[self.ITEM_SEQ]
        item_seq_len = interaction[self.ITEM_SEQ_LEN]
        test_item = interaction[self.ITEM_ID]
        _, seq_output = self.forward(item_seq, item_seq_len)
        test_item_emb = self.item_embedding(test_item)
        scores = torch.mul(seq_output, test_item_emb).sum(dim=1)  # [B]
        return scores

    def full_sort_predict(self, interaction):
        item_seq = interaction[self.ITEM_SEQ]
        item_seq_len = interaction[self.ITEM_SEQ_LEN]
        _, seq_output = self.forward(item_seq, item_seq_len)
        test_items_emb = self.item_embedding.weight
        scores = torch.matmul(seq_output, test_items_emb.transpose(0, 1))  # [B n_items]
        return scores
In theory the two approaches should make little difference; after all, the original SASRec repo trains in approach 2. RecBole currently implements approach 1 mainly for generality: SASRec, for example, could speed up training with approach 2, but GRU4Rec cannot, so for convenience everything uses approach 1, the most general form. Given this, the data RecBole produces is all processed according to approach 1. If you force approach-2 training only at the model level, items near the front of a sequence may be used as training objectives many more times, which could introduce a bias and leave the results misaligned. I have not checked in detail whether that is actually the cause; I am only pointing out that if you change the model code without changing the data code, what you end up implementing may not really be approach 2.
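(To make the possible bias concrete: a minimal sketch, not RecBole code, of what happens when prefix-style data augmentation in the approach-1 style is combined with approach-2 objectives inside the model; the sequence below is purely illustrative.)

```python
from collections import Counter

seq = [2, 6, 1, 7, 8]  # full interaction sequence; the last item is the final target

target_counts = Counter()
for end in range(1, len(seq)):          # prefix instances produced on the data side
    item_seq, pos_item = seq[:end], seq[end]
    # approach-2 objectives inside this one prefix instance:
    # position t predicts item_seq[t + 1], and the last position predicts pos_item
    for target in item_seq[1:] + [pos_item]:
        target_counts[target] += 1

print(target_counts)
# Counter({6: 4, 1: 3, 7: 2, 8: 1})  -> items near the front are supervised far more often
```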