Ticket #1057 (closed bug: fixed)

Opened 22 months ago

Last modified 19 months ago

LAN and Wi-Fi hosts cannot communicate with each other since SVN 1929 build

Reported by: seth.fulton@…
Owned by:
Priority: normal
Milestone: Firmware 2.3.7.0
Component: fon-network
Version: Other
Severity: unknown
Cc:
Hardware: 2.0g (FON2202)

Description (last modified by matthijs) (diff)

(Note: Read "WiFi" where the following says "WAN", see comment:7)

LAN and WAN hosts seem to be isolated to their respective physical networks since the SVN 1929 build. LAN hosts can ping and access services on other LAN hosts fine. WAN hosts can ping and access services on other WAN hosts no problem.

However, when a WAN host tries to ping a LAN host, or vice versa, the communication fails. The same applies to TCP connection attempts.

The router settings have been reset to factory defaults but the problem persists.

Attachments

972-bridge-vlan-fix.patch (1.0 KB) - added by matthijs 19 months ago.

Change History

Changed 22 months ago by matthijs

  • status changed from new to infoneeded

That WAN hosts cannot ping or connect to LAN hosts is expected. The other way around, connecting or pinging from a LAN host to a WAN host, should work. Note that this does not apply to hosts that use the public wifi signal: they can access the internet, but are explicitly denied access to the WAN network.

Furthermore, can you access the internet from a LAN host, or is that also broken? Can you ping the Fonera itself (192.168.10.1 by default)? What IP address does your LAN host get assigned?

When you say "since r1929", does that mean it was still working in r1928, or did you just test r1929?

Changed 22 months ago by seth.fulton@…

It was last working on build 1876. I did not test with any builds between 1876 and 1929. Does that help?

Changed 22 months ago by matthijs

A bit. Looking at the changelog, the most likely candidates for breaking this are the changes made around the following revisions: r1918 - r1920 or r190 - r1912 (though none of them seem to be directly related).

If you have some time, it would be nice if you could further pinpoint the breaking revision, perhaps starting to test around the above revision ranges. If not, no problem, I'll have a go at reproducing the issue tomorrow.

In any case, thanks for testing and reporting this issue :-)

Changed 22 months ago by matthijs

  • version changed from 2.3.6.1 (Gari jr.) to Other

I can't seem to reproduce your problem on my 2.0g.

I just flashed my 2.0g with a fresh r1929 build, resetting to factory settings in the process. I connected my laptop to the 2.0g's LAN port, and everything now works fine. I can use the internet, ping and connect to hosts on my WAN network just fine.

Perhaps your build somehow got messed up? You might want to try to do a "make clean" followed by a "make" to make sure everything gets rebuilt.

If the issue persists, it would be helpful if you could pinpoint the breaking revision more precisely, as I suggested above.

Right now, there's not much I can do to further investigate it. Please try and/or comment on the above and we'll see what the next steps should be.

Changed 21 months ago by seth@…

The problem has been fixed as of build 1972, thank you.

Changed 21 months ago by matthijs

  • status changed from infoneeded to closed
  • resolution set to worksforme

Cool, thanks for the update.

Changed 21 months ago by seth@…

Also, just to clarify, I made a mistake in the title of this bug when I created it. It should have read "LAN and Wi-Fi hosts", not "LAN and WAN hosts". Can you please correct it if possible?

Changed 21 months ago by matthijs

  • description modified (diff)
  • summary changed from LAN and WAN hosts cannot communicate with each other since SVN 1929 build to LAN and Wi-Fi hosts cannot communicate with each other since SVN 1929 build

Hmm, that might explain why I couldn't reproduce the issue. Oh well, it's fixed now, perhaps as a side-effect of another fix.

Changed 21 months ago by danilo.schembri@…

Hi, I can confirm this bug on 2.3.7.0 beta 1.

Wired clients can't ping wireless clients and vice versa; the wired clients run GNU/Linux, the wireless ones run Android. The firewall is set to allow only traffic from Private -> WAN, but even when I allow all other traffic, the problem remains.

In some circumstances this could be a feature (isolating LAN from WiFi?), but then it should be changeable through the firewall policies.

Please, let me know if you want me to test anything.

Changed 21 months ago by danilo.schembri@…

I wanted to reopen this bug, but it seems I simply can't. :-(

Changed 21 months ago by matthijs

  • status changed from closed to reopened
  • resolution worksforme deleted

Apparently the trac workflow configuration doesn't allow me to let anonymous users just reopen a ticket without changing any other properties of the ticket. The config page led me to believe so, but it doesn't work like that (probably a bug, but well...). I've disabled reopening by anonymous users again and will return to manually reopening bugs on request.

Changed 21 months ago by matthijs

  • milestone set to Firmware 2.3.7.0

Changed 21 months ago by matthijs

Hmm, I just thought I could reproduce this issue and then it started working (that is, pinging from WLAN to WLAN and LAN to WLAN worked, source was a Linux laptop, target a Nokia Symbian phone).

I'll have to try again next week, since I did actually think it didn't work the first couple of minutes. Perhaps my tcpdumping of the wireless interface influenced things, I'm not sure.

Changed 21 months ago by danilo.schembri@…

I use a class A IP configuration (10.0.0.0, mask 255.255.255.0); I don't know if this could make a difference. If you want me to try anything (tcpdump or other), please just ask.

Changed 21 months ago by seth@…

The power went out here and when it came back on, the LAN - Wi-Fi isolation bug was back. The 2202g is running SVN 1972. Maybe this has something to do with the configuration part of the filesystem? I'm compiling SVN 1999 right now to see if that fixes it.

Changed 21 months ago by Danilo Schembri <danilo.schembri@…>

Do you want me to paste output from iptables -L?

Changed 21 months ago by anonymous

Upgrading to the SVN 1999 build did not solve the problem. I pulled my Ethernet switch out of the loop and directly connected my PC to the Fonera LAN port, but this did not work either, so the Ethernet switch is not the culprit.

Changed 20 months ago by anonymous

I think this has something to do with the configuration partition. I recently flashed SVN 2010 on my 2202g and the LAN-WiFi bug went away. Then the Fonera decided to spontaneously reboot and suddenly communication between LAN and WiFi was broken again. Incredibly frustrating. Flashing SVN 2012 from the command line did not resolve the issue.

Changed 20 months ago by seth@…

Setting the unit back to factory defaults does not help either. If there's a way to manually blast the squashfs/jffs2 (or whatever) partition clean for troubleshooting purposes, I'm all ears.

Changed 20 months ago by matthijs

A factory reset should do just that: Reset the jffs2 partition and then reboot. See /bin/factory.sh.

Another thing you could try to find out what's wrong is to play around with tcpdump. You can install it using:

root@Fonera:~# opkg update
root@Fonera:~# opkg install tcpdump

and run it using "tcpdump -i eth0.1 icmp". (I added the icmp filter to dump only ICMP / ping packets, so you don't end up dumping your own SSH packets, creating a loop.) You can dump on different interfaces using the -i option.

Perhaps this helps to understand where packets are going?
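To see which interface the ICMP packets actually reach, it can help to run the same capture on each interface in turn. The sketch below only prints one tcpdump command per interface instead of running them. The helper name list_capture_cmds is made up for illustration, and it assumes the interface names can be read from /sys/class/net on this firmware.

```shell
#!/bin/sh
# Print one tcpdump command per network interface found under a
# sysfs-style directory. Nothing is captured here; the commands are
# only printed so you can pick the ones you want to run.
list_capture_cmds() {
    netdir="${1:-/sys/class/net}"    # assumption: sysfs is mounted here
    for path in "$netdir"/*; do
        [ -e "$path" ] || continue   # directory empty or missing
        iface="${path##*/}"          # strip the directory part
        [ "$iface" = "lo" ] && continue   # skip loopback
        # -n avoids name lookups, icmp limits the dump to ping traffic
        echo "tcpdump -n -i $iface icmp"
    done
}

list_capture_cmds
```

Run the printed commands one at a time while pinging from a LAN host to a wifi host, and note on which interfaces the echo requests and replies show up.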

Another thing to try would be to clean the iptables rules, to see if that helps (note that this also disables NAT / internet connectivity):

iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X

Feel free to drop by on IRC as well to discuss your findings, I just missed you this morning :-)

Changed 20 months ago by seth@…

Downgrading to the "Flipper" release will fix this problem.

Changed 20 months ago by Danilo Schembri <danilo.schembri@…>

Hi,

What's happening here?

Let me share with you my tests.

1. I installed tcpdump via opkg
2. I started ping from wired network to wireless -> no response
3. I started tcpdump with tcpdump -i eth0.1 icmp -> no output, but wireless starts to respond and communication works regularly!
4. I stopped tcpdump (with Ctrl+C) -> wired and wireless are isolated again

Are you sure I need to try testing dropping iptables chains? I think not.

Changed 20 months ago by seth@…

Bug still exists on 2024 build. Any other troubleshooting steps? This is incredibly frustrating. I'm about to the point where I'm going to sell my Fonera and hang it up with Fon for good.

Changed 20 months ago by Danilo Schembri <danilo.schembri@…>

Seth, as a dirty workaround, you can keep tcpdump running in the background:

tcpdump -i eth0.1 icmp > /dev/null &

Changed 20 months ago by hans@…

I first noticed this bug on 2.3.7.0 Beta 1, but it could easily have been there before. When I upgraded to 2.3.7.0 Beta 2 it crashed, so I loaded Flipper. After upgrading to 2.3.7.0 Beta 2 again the bug was gone, and I was happy. A couple of days later I wanted to connect from wifi to LAN again and saw that the bug was back. I am testing on a 2.0g with a UMTS dongle for the internet connection.

Changed 20 months ago by hans@…

Did a lot of testing with different firmware. In SVN 1876 Wlan <> Lan was still working. In SVN 1929 Wlan <> Lan is NOT working anymore. I have no SVN in between to test with. I noticed that the file /etc/fonstated/redirect is not in svn1929 anymore. Putting the file back in the directory (in svn2021) did not solve the problem. Hope this can help, I am out for a beer now. ;-)

Changed 19 months ago by matthijs

Thanks for all of the feedback, people. I've managed to reproduce the problem here again, and running tcpdump seems to "fix" the problem here as well. I still have no clue what the problem is, but this tcpdump thing will probably provide a starting point for debugging. I'll keep you posted with any progress.

Changed 19 months ago by matthijs

I think I found the cause of this bug: a bug in the kernel causes the promiscuity of the ethernet device to be wrong, breaking the bridge. This only happens when the bridge and vlan configuration happen in a very specific order, so I assume that an unrelated change to the networking code caused the initialization order to subtly change, exposing this kernel bug.

I'm working on a fix now, which I'll also send upstream. Looks like we can finally close this issue soon.

For anyone wanting a sneak preview of the fix, I'm attaching the patch I'm testing now. Just drop it into openwrt/target/linux/generic-2.6/patches-2.6.26/ and it should include the patch.

Changed 19 months ago by matthijs

Changed 19 months ago by matthijs

  • status changed from reopened to closed
  • resolution set to fixed

(In [2029]) kernel: Add patch to fix VLAN/bridge promiscuity bug.

This patch fixes a problem with the promiscuity of ethernet devices when a vlan device is (de)configured and added/removed to/from a bridge in a specific order.

This patch has also been submitted upstream, see  http://marc.info/?l=linux-netdev&m=132007004619404&w=2

On the 2.0g Fonera, this bug caused communication between wifi clients and wired LAN clients to become impossible, because the eth0 device did not get set to promiscuous mode.

In the past, this problem did not occur. I assume that recent changes in network (re)configuration triggered this kernel bug, whereas it was not triggered before. By applying this patch, the actual problem in the kernel is fixed, making the order of (de)configuration irrelevant.

This patch is only applied to 2.0g, since the VLAN code is significantly different in the kernel version used on 2.0n. Also, the problem does not seem to occur on 2.0n, so it was probably introduced between 2.6.21 and 2.6.26.

Closes: #1057
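For anyone who wants to check whether a device is affected, the promiscuity state described above can be inspected from a shell. This is a rough sketch, not part of the commit: it assumes sysfs is mounted and that /sys/class/net/<dev>/flags exposes the interface flags as a hex number, with IFF_PROMISC at bit 0x100 (its value in the Linux headers). The function names flags_promisc and report_promisc are hypothetical.

```shell
#!/bin/sh
# IFF_PROMISC is bit 0x100 in the Linux interface flags word.
IFF_PROMISC=0x100

# Succeeds (exit status 0) if the given hex flags value has IFF_PROMISC set.
flags_promisc() {
    [ $(( $1 & $IFF_PROMISC )) -ne 0 ]
}

# Report the promiscuous state of an interface by reading its sysfs flags.
# Assumption: /sys/class/net/<dev>/flags exists on the running kernel.
report_promisc() {
    dev="$1"
    flagfile="/sys/class/net/$dev/flags"
    if [ ! -r "$flagfile" ]; then
        echo "$dev: no flags file (interface missing or sysfs not mounted)"
        return 1
    fi
    if flags_promisc "$(cat "$flagfile")"; then
        echo "$dev: promiscuous"
    else
        echo "$dev: NOT promiscuous"
    fi
}

# Check eth0 by default; ignore the error if the interface does not exist.
report_promisc "${1:-eth0}" || true
```

Going by the commit message, eth0 on an affected 2.0g would be expected to report NOT promiscuous while wifi and wired LAN clients cannot reach each other. That expectation has not been verified here.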

Changed 19 months ago by matthijs

Please let me know if this commit actually fixes your problems as well.

Thanks to everyone who helped debug this problem, you were a great help!

Changed 19 months ago by matthijs

For anyone that wants to test the latest trunk, without compiling it yourself, here's a compiled version of r2029:

 http://www.mkit.nl/static/tmp/fonera2-trunk-r2029.img

See sshflash for flashing instructions.
