🎈前言
继续是算法的学习笔记,这篇博客来用 rust 实现一个替罪羊树。
替罪羊树(Scapegoat Tree)是一种自平衡二叉搜索树,由Arne Andersson提出。它通过暴力重构操作来维持平衡,而不是像其他平衡树那样依赖旋转操作。
替罪羊树它有一个初始化时设置的平衡因子属性alpha,取值 。对于一个子树node,节点数量不妨记为node.unique(如果节点统计词频的话,这里的节点是和词频无关的、非重复的节点),当它节点数量满足
的时候,这个子树会被重构为一棵平衡的二叉搜索树。具体来说,此时我们会对node进行中序遍历,然后再以中序遍历的中点为根,重构一棵平衡的二叉搜索树。

不太严谨地说,重构得越多,树越平衡,但是插入、删除消耗的成本会增加,因为每次重构需要 时间复杂度, 是重构节点数量。虽然重构消耗大,但是通过设置合理的平衡因子,减少重构次数,它能实现平均 时间复杂度的插入、删除操作,经验上一般这个数值取 0.7。
在删除节点阶段,替罪羊树可以把删除节点的词频置为 0,不必要去每次都调整树结构,从而节省时间。由于有重构的存在,我们只需要在重构时把词频为 0 的节点忽略即可维护树的平衡。
🎆类型定义
没什么好说的,除了平衡二叉树节点常见的key、size、count等属性外,这里需要unique`统计非重复节点个数。此外,在替罪羊树节点上,需要记录重构时的中序遍历、根节点、其亲代节点、以及重构的根节点是左右哪个子树,用来与重构前原有亲代连接。
rust- #[derive(Debug)]
- struct ScapegoatTreeNode {
- key: i32,
- count: i32,
- left: OptionScapegoatTreeRc,
- right: OptionScapegoatTreeRc,
- unique: i32,
- size: i32
- }
- type ScapegoatTreeRc = Rc<RefCell<ScapegoatTreeNode>>;
- type OptionScapegoatTreeRc = Option<
- ScapegoatTreeRc
- >;
- impl ScapegoatTreeNode {
- fn new(key: i32) -> Rc<RefCell<Self>> {
- Rc::new(RefCell::new(Self {
- key,
- count: 1,
- left: None,
- right: None,
- unique: 1,
- size: 1
- }))
- }
- }
- const LEFT_SIDE: i32 = 1;
- const RIGHT_SIDE: i32 = 2;
- struct ScapegoatTree {
- root: OptionScapegoatTreeRc,
- alpha: f64,
- collect: Vec<ScapegoatTreeRc>,
- unbalanced_root: OptionScapegoatTreeRc,
- unbalanced_root_parent: OptionScapegoatTreeRc,
- unbalanced_root_side: i32,
- }
- impl ScapegoatTree {
- fn new(alpha: f64) -> Self {
- ScapegoatTree {
- root: None,
- alpha,
- collect: vec![],
- unbalanced_root: None,
- unbalanced_root_parent: None,
- unbalanced_root_side: 0
- }
- }
- }
🎇更新节点
更新操作只需要更新size和unique。
rust- fn size(node: OptionScapegoatTreeRc) -> i32 {
- match node {
- None => 0,
- Some(x) => {
- x.borrow().size
- }
- }
- }
- fn unique(node: OptionScapegoatTreeRc) -> i32 {
- match node {
- None => 0,
- Some(x) => {
- x.borrow().unique
- }
- }
- }
- fn update(node: OptionScapegoatTreeRc) {
- match node {
- None => {},
- Some(x) => {
- let left = x.borrow().left.clone();
- let right = x.borrow().right.clone();
- let count = x.borrow().count;
- x.borrow_mut().size = Self::size(left.clone()) + Self::size(right.clone()) + count;
- x.borrow_mut().unique = Self::unique(left.clone()) + Self::unique(right.clone()) + if count > 0 {
- 1
- } else {
- 0
- }
- }
- }
- }
⚖判断平衡性
我们来用一个工具方法判断树节点的平衡性,在下面增加删除节点时会用到。
rust- fn balance(&self, node: OptionScapegoatTreeRc) -> bool {
- match node {
- None => return true,
- Some(x) => {
- return self.alpha * x.borrow().unique as f64 >= i32::max(
- Self::unique(x.borrow().left.clone()),
- Self::unique(x.borrow().right.clone())
- ) as f64;
- }
- }
- }
💥重构平衡树
当增加、删除操作破坏了树的平衡性时,替罪羊树将会指定待重构的根节点,和它的亲代节点以及它是哪个子树(下面再说)。然后替罪羊树开始重构平衡树,中序遍历待重构的根节点收集count非 0 的节点,然后从中点开始前序地重新组织平衡树。
rust- fn inorder(&mut self, node: OptionScapegoatTreeRc) {
- match node {
- None => {},
- Some(x) => {
- self.inorder(x.borrow().left.clone());
- if x.borrow().count > 0 {
- self.collect.push(x.clone());
- }
- self.inorder(x.borrow().right.clone());
- }
- }
- }
- fn build(&mut self, l: i32, r: i32) -> OptionScapegoatTreeRc {
- if l > r {
- return None
- }
- let mid = (l + r) / 2;
- let head = self.collect[mid as usize].clone();
- head.borrow_mut().left = self.build(l, mid - 1);
- head.borrow_mut().right = self.build(mid + 1, r);
- Self::update(Some(head.clone()));
- Some(head)
- }
- fn rebuild(&mut self) {
- match self.unbalanced_root.clone() {
- None => {},
- Some(x) => {
- self.collect.clear();
- self.inorder(Some(x.clone()));
- if self.collect.len() > 0 {
- if self.unbalanced_root_parent.is_none() {
- self.root = self.build(0, self.collect.len() as i32 - 1);
- } else if self.unbalanced_root_side == LEFT_SIDE {
- self.unbalanced_root_parent.clone().unwrap().borrow_mut().left = self.build(0, self.collect.len() as i32 - 1);
- } else {
- self.unbalanced_root_parent.clone().unwrap().borrow_mut().right = self.build(0, self.collect.len() as i32 - 1);
- }
- }
- }
- }
- }
➕增加节点
在增加删除节点时,我们需要指定待重构根节点的信息。这里会可能存在一种情况,如果子树及其后代某个子树都是不平衡的,此时需要重构的只有最高层的子树,因为它会把后代不平衡的子树也重新排序。因此我们采用后序遍历,这时指定的待重构节点也是最高层的节点。.
rust- fn insert(&mut self, key: i32) {
- self.unbalanced_root = None;
- self.unbalanced_root_parent = None;
- self.unbalanced_root_side = 0;
- self.insert_helper(self.root.clone(), self.unbalanced_root_parent.clone(), self.unbalanced_root_side, key);
- self.rebuild();
- }
- fn insert_helper(&mut self, node: OptionScapegoatTreeRc, parent: OptionScapegoatTreeRc, side: i32, key: i32) {
- match node {
- None => {
- if parent.is_none() {
- self.root = Some(ScapegoatTreeNode::new(key));
- } else if side == LEFT_SIDE {
- parent.clone().unwrap().borrow_mut();
- parent.clone().unwrap().borrow_mut().left = Some(ScapegoatTreeNode::new(key));
- } else {
- parent.clone().unwrap().borrow_mut().right = Some(ScapegoatTreeNode::new(key));
- }
- },
- Some(x) => {
- if x.borrow().key == key {
- x.borrow_mut().count += 1;
- } else if x.clone().borrow().key > key {
- let left = x.borrow().left.clone();
- self.insert_helper(left, Some(x.clone()), LEFT_SIDE, key);
- } else {
- let right = x.borrow().right.clone();
- self.insert_helper(right, Some(x.clone()), RIGHT_SIDE, key);
- }
- Self::update(Some(x.clone()));
- if !self.balance(Some(x.clone())) {
- self.unbalanced_root = Some(x.clone());
- self.unbalanced_root_parent = parent.clone();
- self.unbalanced_root_side = side;
- }
- }
- }
- }
❌删除节点
类似地,后序遍历找到待删除节点,把count减 1,由于有重构的存在,并不需要每次删除时调整子树结构。另外,这里借助查询节点排序的rank方法(下面说到)来确定节点是否存在。
rust- fn remove(&mut self, key: i32) -> bool {
- if self.rank(key) != self.rank(key + 1) {
- self.unbalanced_root = None;
- self.unbalanced_root_parent = None;
- self.unbalanced_root_side = 0;
- self.remove_helper(self.root.clone(), self.unbalanced_root_parent.clone(), self.unbalanced_root_side, key);
- self.rebuild();
- true
- } else {
- false
- }
- }
- fn remove_helper(&mut self, node: OptionScapegoatTreeRc, parent: OptionScapegoatTreeRc, side: i32, key: i32) {
- match node {
- None => {},
- Some(x) => {
- if x.borrow().key == key {
- x.borrow_mut().count -= 1;
- } else if x.borrow().key < key {
- let right = x.borrow().right.clone();
- self.remove_helper(right, Some(x.clone()), RIGHT_SIDE, key);
- } else {
- let left = x.borrow().left.clone();
- self.remove_helper(left, Some(x.clone()), LEFT_SIDE, key);
- }
- Self::update(Some(x.clone()));
- if !self.balance(Some(x.clone())) {
- self.unbalanced_root = Some(x.clone());
- self.unbalanced_root_parent = parent.clone();
- self.unbalanced_root_side = side;
- }
- }
- }
- }
🔟排序的查询
查询某个key在有序表中的顺序,以及查询有序表中排第idx位的节点,和之前写的 AVL 平衡树的操作类似。
rust- fn rank(&self, key: i32) -> i32 {
- Self::rank_helper(self.root.clone(), key) + 1
- }
- fn rank_helper(node: OptionScapegoatTreeRc, key: i32) -> i32 {
- match node {
- Some(cur) => {
- if cur.borrow().key >= key {
- let left = cur.borrow().left.clone();
- Self::rank_helper(left, key)
- } else {
- let left = cur.borrow().left.clone();
- let right = cur.borrow().right.clone();
- Self::size(left) + cur.borrow().count + Self::rank_helper(right, key)
- }
- },
- None => 0
- }
- }
- fn index(&self, idx: i32) -> i32 {
- Self::index_helper(self.root.clone(), idx)
- }
- fn index_helper(node: OptionScapegoatTreeRc, idx: i32) -> i32 {
- match node {
- None => i32::MIN,
- Some(x) => {
- let left = x.borrow().left.clone();
- let left_size = Self::size(left.clone());
- let current_count = x.borrow().count;
- if left_size >= idx {
- let ans = Self::index_helper(left, idx);
- return ans
- } else if left_size + current_count >= idx {
- return x.borrow().key
- } else {
- let right = x.borrow().right.clone();
- let ans = Self::index_helper(right, idx - left_size - current_count);
- return if ans != i32::MIN { ans } else { i32::MAX }
- }
- }
- }
- }
🖇前驱后继的查询
这也就是找到小于节点的最大值和大于节点的最小值。因为有count为 0 的节点,方便起见,借助rank和index方法来完成查询。
rust- fn pre(&self, key: i32) -> i32 {
- let rank = self.rank(key);
- if rank == 1 {
- i32::MIN
- } else {
- self.index(rank - 1)
- }
- }
- fn next(&self, key: i32) -> i32 {
- let rank = self.rank(key + 1);
- if rank == Self::size(self.root.clone()) + 1 {
- i32::MAX
- } else {
- self.index(rank)
- }
- }
🎁结语
这是复习数据结构的第三篇博客,用 rust 实现了一个替罪羊树。
参考:
【算法讲解150【扩展】有序表专题3-替罪羊树】 https://www.bilibili.com/video/BV1GqDSYbEPR/?share_source=copy_web&vd_source=a06db201a5f7fab5fe54f12bff164f84
